
Step 7: register-to-BPMN transcoder tool

Adds tools/register-transcoder — a Python tool that reads a published Valsts
Kase accounting-process register (.xlsx/.xlsm) and emits BPMN process
skeletons. For a given sub-process it produces one userTask per register
step, swimlanes from the RACI columns (placing each step in its Responsible
actor's lane), sequence flows reconstructed from the register's own
predecessor/successor step references, and a synthesised start event per
entry step and end event per exit step. Output is an isExecutable=false
skeleton: the deterministic first pass of the transcription pipeline;
refinement into a
Level 4 executable package is the human/AI-assisted second pass that produced
the curated FG3-1/FG3-4/FG3-5 packages. Includes a README and sample-output
skeletons emitted from the FG3 register for sub-processes 3.5.2 and 3.5.3.
2026-05-19 21:38:45 +00:00
parent 37000f77f5
commit a608de41ad
4 changed files with 615 additions and 0 deletions


@@ -0,0 +1,94 @@
# register-transcoder
Transcodes a published Valsts Kase accounting-process register
(`.xlsx` / `.xlsm`) into BPMN process skeletons — one deterministic step in
the `vk-gramatvediba` transcription pipeline.
The Valsts Kase / VPC *Grāmatvedības uzskaites procesu apraksts* is published
as a set of function-group spreadsheets (FG1–FG6). Each row of a register is a
process step with explicit predecessor and successor step references, a RACI
split across the responsible actors, the IT system used, an SLA, and the data
the step produces. That structure is already a process graph; this tool reads
it and emits the corresponding BPMN.
## What it produces
For a given sub-process the tool emits one `.bpmn` file containing a single
`bpmn:process` with `isExecutable="false"`:
- one `bpmn:userTask` per register step, named from the register and carrying
the step's description, system, SLA, RACI and cross-references in
`bpmn:documentation`;
- `bpmn:lane`s derived from the RACI columns — a step is placed in the lane of
its **Responsible** actor (Nodarbinātais / Iestāde / VPC);
- `bpmn:sequenceFlow`s reconstructed from the register's own
*No procesa darbības soļa* (predecessor) and *Uz procesa darbības soli*
(successor) columns, restricted to links whose endpoints are both inside the
emitted sub-process;
- synthesised `bpmn:startEvent` / `bpmn:endEvent` nodes — one per entry step
(no in-group predecessor) and one per exit step (no in-group successor) — so
the fragment's real boundary is visible rather than hidden.
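
The rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code: the function names, the `steps` dictionary shape, the element id scheme and the placeholder `targetNamespace` are all assumptions, and it assumes predecessor and successor references are simply unioned. Lanes are left out for brevity.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("bpmn", BPMN_NS)


def q(tag):
    """Qualify a tag name with the BPMN model namespace."""
    return f"{{{BPMN_NS}}}{tag}"


def in_group_edges(steps):
    """Union predecessor and successor links, keeping only those whose
    two endpoints are both steps of this sub-process."""
    ids = set(steps)
    edges = set()
    for sid, step in steps.items():
        edges.update((p, sid) for p in step["pred"] if p in ids)
        edges.update((sid, s) for s in step["succ"] if s in ids)
    return sorted(edges)


def build_skeleton(process_id, steps):
    """Return a bpmn:definitions element holding one non-executable process."""
    edges = in_group_edges(steps)
    entries = sorted(set(steps) - {dst for _, dst in edges})  # no in-group predecessor
    exits = sorted(set(steps) - {src for src, _ in edges})    # no in-group successor

    # Placeholder namespace, chosen for this sketch only.
    defs = ET.Element(q("definitions"), {"targetNamespace": "urn:example:skeleton"})
    proc = ET.SubElement(defs, q("process"),
                         {"id": process_id, "isExecutable": "false"})

    node = {}
    for sid, step in steps.items():
        node[sid] = "Task_" + sid.replace(".", "_")  # hypothetical id scheme
        task = ET.SubElement(proc, q("userTask"),
                             {"id": node[sid], "name": step["name"]})
        ET.SubElement(task, q("documentation")).text = step.get("doc", "")

    flows = [(node[a], node[b]) for a, b in edges]
    for sid in entries:
        event_id = "Start_" + sid.replace(".", "_")
        ET.SubElement(proc, q("startEvent"), {"id": event_id})
        flows.append((event_id, node[sid]))
    for sid in exits:
        event_id = "End_" + sid.replace(".", "_")
        ET.SubElement(proc, q("endEvent"), {"id": event_id})
        flows.append((node[sid], event_id))

    for i, (src, dst) in enumerate(flows, 1):
        ET.SubElement(proc, q("sequenceFlow"),
                      {"id": f"Flow_{i}", "sourceRef": src, "targetRef": dst})
    return defs


# Made-up step numbers; "9.9" and "8.1" stand for steps outside the sub-process.
demo = {
    "1": {"name": "First step", "pred": ["9.9"], "succ": ["2"]},
    "2": {"name": "Second step", "pred": ["1"], "succ": ["8.1"]},
}
print(ET.tostring(build_skeleton("Process_demo", demo), encoding="unicode"))
```

In this sketch the out-of-group references drop out of the flow list, which is why step `1` gets a start event and step `2` an end event.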
## Register format expected
The parser locates the worksheet and header row by content, not by position,
so it tolerates the leading title rows the registers carry. It expects a
header row containing `Nr.p.k.` and the columns *No procesa darbības soļa*
(predecessor, with the FG-group and step-number in adjacent cells),
*Process, apakšprocess*, *Atbildības sadalījums (RACI)* (a three-column block
for Nodarbinātais / Iestāde / VPC), *Darbību apraksts*, *Izmantotā IS*,
*Izpildes termiņš*, *Sagatavotie dati* and *Uz procesa darbības soli*
(successor). Rows that carry a number and a name but no description and no
RACI are treated as sub-process headers; rows with a description or any RACI
entry are treated as steps. Steps are grouped under the most recent header.
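
As a rough illustration of the rules just described, the sketch below locates the header row by content and classifies rows as sub-process headers or steps. It is a sketch under stated assumptions, not the parser itself: the function names and the record keys (`number`, `name`, `description`, `raci`) are hypothetical, and rows are plain sequences of cell values such as those yielded by openpyxl's `iter_rows(values_only=True)`.

```python
def text(value):
    """Normalise a cell value to a stripped string ('' for empty cells)."""
    return "" if value is None else str(value).strip()


def find_header_row(rows):
    """Return the index of the first row containing a 'Nr.p.k.' cell, or None."""
    for index, row in enumerate(rows):
        if any(text(cell) == "Nr.p.k." for cell in row):
            return index
    return None


def classify(number, name, description, raci):
    """A description or any RACI entry makes a step; a bare number and name
    make a sub-process header; anything else is skipped."""
    if text(description) or any(text(cell) for cell in raci):
        return "step"
    if text(number) and text(name):
        return "header"
    return "skip"


def group_steps(records):
    """Group step numbers under the most recent sub-process header."""
    groups, current = {}, None
    for rec in records:
        kind = classify(rec["number"], rec["name"], rec["description"], rec["raci"])
        if kind == "header":
            current = text(rec["number"])
            groups[current] = []
        elif kind == "step" and current is not None:
            groups[current].append(text(rec["number"]))
    return groups
```

Matching on cell content rather than a fixed row index is what lets leading title rows be skipped without configuration.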
## Usage
```
transcode.py list <register.xlsx>
transcode.py emit <register.xlsx> <subprocess> [-o <output.bpmn>]
```
`list` reports the sub-processes that contain steps, with step counts. `emit`
writes (or, without `-o`, prints) the BPMN skeleton for one sub-process.
```
python3 transcode.py list fg3_process.xlsm
python3 transcode.py emit fg3_process.xlsm 3.5.2 -o 3.5.2.skeleton.bpmn
```
The only dependency is `openpyxl`.
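
For readers who want to wrap or reimplement the command line, the interface above maps naturally onto `argparse` sub-commands. The following is a minimal sketch of that mapping, assuming nothing about how `transcode.py` actually parses its arguments; `build_parser` and the attribute names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented `list` and `emit` commands."""
    parser = argparse.ArgumentParser(prog="transcode.py")
    commands = parser.add_subparsers(dest="command", required=True)

    list_cmd = commands.add_parser("list", help="report sub-processes with step counts")
    list_cmd.add_argument("register", help="path to the .xlsx/.xlsm register")

    emit_cmd = commands.add_parser("emit", help="emit the skeleton for one sub-process")
    emit_cmd.add_argument("register", help="path to the .xlsx/.xlsm register")
    emit_cmd.add_argument("subprocess", help="sub-process number, e.g. 3.5.2")
    # Without -o the skeleton would go to standard output.
    emit_cmd.add_argument("-o", dest="output", metavar="output.bpmn", default=None)
    return parser


args = build_parser().parse_args(["emit", "fg3_process.xlsm", "3.5.2"])
print(args.command, args.register, args.subprocess, args.output)
```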
## Limitations — a skeleton, not an executable
The output is deliberately a faithful mechanical transcription, not a finished
package. It does **not**:
- detect decisions — every step becomes a `userTask`; branching points are not
promoted to `exclusiveGateway`s and no DMN is extracted;
- repair the register — where the register's predecessor and successor columns
disagree, the skeleton reproduces the result as-is (this can surface as a
reciprocal edge / short cycle, or as a step that reaches the rest of its
sub-process only through a cross-FG excursion);
- carry BPMN diagram interchange (`bpmndi`) — the output is a logical model,
laid out by an editor on import;
- emit a UAPF package — there is no `uapf.yaml`, no resources and no metadata.
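
Because the register is not repaired, it can help to check a skeleton's flows for reciprocal edges before refining it. The helper below is a hypothetical sketch of such a check and is not part of the tool; it assumes flows are available as `(source, target)` pairs.

```python
def reciprocal_pairs(edges):
    """Return each pair of nodes that is linked in both directions."""
    edge_set = set(edges)
    return sorted(
        {tuple(sorted((a, b))) for a, b in edge_set if a != b and (b, a) in edge_set}
    )


# Made-up step ids: "a" and "b" point at each other, "c" is a plain successor.
print(reciprocal_pairs([("a", "b"), ("b", "a"), ("b", "c")]))
```
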
The RACI-to-lane rule is a heuristic: the lane is the first actor whose RACI
cell contains `R`. The full RACI is preserved verbatim in each task's
documentation so the heuristic can be checked and corrected.
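
The heuristic can be written down in a few lines. This is a sketch under assumptions: the actor order follows the Nodarbinātais / Iestāde / VPC column order given above, the match is a case-sensitive substring test, and returning `None` when no cell contains `R` is a choice made here, since the README does not say what the tool does in that case.

```python
ACTORS = ("Nodarbinātais", "Iestāde", "VPC")  # assumed RACI column order


def lane_for(raci_cells):
    """Return the first actor whose RACI cell contains 'R', else None."""
    for actor, cell in zip(ACTORS, raci_cells):
        if cell is not None and "R" in str(cell):
            return actor
    return None


print(lane_for(["I", "R", "A"]))
```

A step with `R` in more than one cell lands in the earlier lane, which is exactly the kind of case the preserved RACI documentation lets a reviewer catch.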
## Position in the pipeline
A skeleton is the deterministic first pass. Refining one into a Level 4
executable — introducing explicit gateways, extracting decision logic into
DMN, writing resource roles/agents/mappings and the package manifest — is the
human / AI-assisted second pass. The curated `processes/fg3-1`, `fg3-4` and
`fg3-5` packages are what that second pass yields; `docs/methodology.md`
discusses the transcoder skeleton against the curated executable for the same
sub-process.
`sample-output/` holds skeletons emitted from the FG3 register for
sub-processes 3.5.2 (*Saimnieciskie norēķini*) and 3.5.3 (*Komandējuma
norēķini*) — the two that have curated executable counterparts in this
workspace.