Two changes:

1. INDEX.md (new, generated). Workspace-level index that lists every package grouped by UAPF level (L0/L1/L2/L3/L4) for navigation. Per UAPF specification §01-concepts.md, levels are an aggregation and governance scope only and MUST NOT be used to imply modeling semantics; per §04-folder-structure.md the SHOULD-recommended on-disk layout is enterprise/ + domains/ + processes/ regardless of level. INDEX.md is therefore a level-grouped *view* over the spec-conformant on-disk layout, not an alternative layout. Generated by tools/build-index/build_index.py (new), which walks the workspace manifests and emits INDEX.md from current state.

2. docs/methodology.md updated to document the §07 auto-layout DI as a known deliberate non-conformance. §07-package-format.md requires authored DI (from a conforming OMG modeler) and explicitly forbids automatic layout generation as a substitute. The DI in the five .bpmn files in this workspace is auto-generated by tools/register-transcoder/bpmn_di.py and is therefore non-conformant under §07. It is kept for the POC so artefacts preview visually; the conformant path is to re-author each .bpmn in Camunda Modeler or bpmn-js Studio and recommit, which emits authored DI. New *Known non-conformances* section in methodology.md cites §07 verbatim and explains this rationale. The Pass-1 transcoder section and Final-validation-pass section are softened to call out that the DI is auto-generated.

uapf-cli validate is green on all 13 packages (L1 domains/gramatvediba, L2 fg1-fg6, L3 fg3-2/3/6, L4 fg3-1/4/5).
Methodology — vk-gramatvediba transcription pipeline
Context
The Valsts Kase (State Treasury) together with the Vienotais Pakalpojumu Centrs (VPC) publishes a normative description of public-sector bookkeeping in Latvia — Grāmatvedības uzskaites procesu apraksts — as a set of six function-group spreadsheets (FG1–FG6) spanning planning, expenditure, settlement, fixed assets, payroll and reporting. Each register row is a process step with an explicit responsible actor (RACI), the IT system used, an SLA, the data produced, and predecessor/successor references to other steps in the same or adjacent registers.
This workspace, vk-gramatvediba, is a UAPF 2.2.0 transcription of the FG3
register (saistību un izdevumu uzskaite — obligations and expenditure
accounting) into executable process artefacts. It is one of the projects
accepted into the Latvian AI regulatory sandbox (MIC), with PPPA as
institutional applicant and KISC as pilot partner.
The transcription is not a one-off translation. It is a methodology — a repeatable two-pass pipeline from published normative documents to signed, runnable process packages. This note describes the methodology, shows how the same pipeline applied to two sub-processes (3.5.2 and 3.5.3) produced the curated FG3-4 and FG3-5 packages, and reports the final validation pass over the workspace.
The pipeline
Transcription has two passes with deliberately different epistemic status.
Pass 1 — mechanical transcoding. A deterministic tool reads the
register and emits a BPMN skeleton: one task per register step, swimlanes
from RACI, sequence flows from the register's own predecessor/successor
columns, synthesised start/end events at the fragment's real boundary. The
skeleton is faithful to the register — including its inconsistencies — and
is isExecutable="false". This pass lives in tools/register-transcoder/.
Pass 2 — curated refinement. A human curator, optionally AI-assisted,
takes the skeleton and resolves it into a Level 4 executable: explicit
gateways, decision logic extracted into DMN, resource roles/agents/mappings,
metadata (ownership, lifecycle, policies), and a package manifest. The
result is a processes/<id>/uapf.yaml package that validates against the
UAPF 2.2.0 schemas and against uapf-cli, and is runnable on the
uapf-engine.
The separation is the methodology's load-bearing claim. Pass 1 is deterministic and traceable — re-running the transcoder on the same register produces identical output, and every node has a mechanical provenance to a register row. Pass 2 is interpretive and signable — it adds judgement the register does not contain (rule tables that the register describes only in prose, branches the register implies in footnotes, metadata the register does not carry), and the curator is identified in the package's ownership metadata. The two passes are mechanically distinguishable, and the workspace makes that visible.
Workspace structure — the L0–L4 level model
The workspace's directory layout is grounded in the UAPF SSOT
specification at UAPFormat/UAPF-specification/specification/, in
particular 01-concepts.md (Levels), 04-folder-structure.md,
05-level-composition.md and the conformance checklist in
10-conformance-checklist.md.
The spec defines five levels as aggregation and governance scope only — not as modeling semantics:
- L0: Enterprise process collection index. Workspace-level. MUST NOT contain executable logic. Here: enterprise/enterprise.yaml.
- L1: Domain process collection. Composes L2/L3/L4 packages within a domain. Here: domains/gramatvediba/.
- L2: End-to-end business process. Composes L3/L4 packages. Here: processes/fg1, fg2, fg3, fg4, fg5, fg6, one per Valsts Kase function group.
- L3: Composed subprocess / variant. A composition placeholder that references one or more L4 packages. Here: processes/fg3-2, fg3-3, fg3-6, the FG3 sub-processes that were in scope for the POC but not built out to atomic executables.
- L4: Atomic executable process. MUST include at least one BPMN file and MUST include resource mappings. Cornerstones (BPMN, optional DMN, optional CMMN, resources) live here and only here. Here: processes/fg3-1, fg3-4, fg3-5.
The spec enforces strict containment of executable artefacts at L4: L1–L3
packages MUST reference lower-level packages via includes and MUST NOT
duplicate BPMN/DMN/CMMN files. The validator in uapf-cli and the
conformance rules in 05-level-composition.md reject workspaces that
mix the layers.
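The containment rule above can be sketched as a small check. This is a minimal illustration, not the uapf-cli validator: the `Package` class, its field names and the example file names are hypothetical, and only the rules quoted in this section are encoded.

```python
from dataclasses import dataclass, field

CORNERSTONE_SUFFIXES = (".bpmn", ".dmn", ".cmmn")


@dataclass
class Package:
    """Hypothetical in-memory view of one package (not the uapf-cli data model)."""
    id: str
    level: int
    files: list = field(default_factory=list)      # relative file paths in the package
    includes: list = field(default_factory=list)   # ids of referenced lower-level packages
    has_resource_mappings: bool = False


def level_violations(pkg: Package) -> list:
    """Return human-readable violations of the level rules described above."""
    problems = []
    names = [f.lower() for f in pkg.files]
    if pkg.level == 4:
        # L4: at least one BPMN file and resource mappings are mandatory.
        if not any(n.endswith(".bpmn") for n in names):
            problems.append(f"{pkg.id}: L4 must include at least one BPMN file")
        if not pkg.has_resource_mappings:
            problems.append(f"{pkg.id}: L4 must include resource mappings")
    else:
        # L0-L3: executable artefacts may only live at L4.
        if any(n.endswith(CORNERSTONE_SUFFIXES) for n in names):
            problems.append(f"{pkg.id}: L{pkg.level} must not contain BPMN/DMN/CMMN files")
        # L1-L3: composition happens through includes.
        if pkg.level in (1, 2, 3) and not pkg.includes:
            problems.append(f"{pkg.id}: L{pkg.level} must reference lower-level packages")
    return problems
```

For example, `level_violations(Package("fg3-2", 4))` reports both L4 violations, which is the situation the conformance correction later in this note addresses.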
Pass 1 in detail — the transcoder
The transcoder, tools/register-transcoder/transcode.py, is a small Python
tool with one external dependency (openpyxl) plus a co-installed layout
helper bpmn_di.py (also runnable standalone). It locates the worksheet
and header row by content rather than by position, so it tolerates the
leading title rows the registers carry and applies unchanged to any of the
FG1–FG6 registers. It expects the standard register columns: the
predecessor block (FG-group and step-number in adjacent cells), the step's
Nr.p.k., Process, apakšprocess, the RACI block split across the three
actor sub-columns (Nodarbinātais / Iestāde / VPC), Darbību apraksts,
Izmantotā IS, Izpildes termiņš, Sagatavotie dati, and the successor
block Uz procesa darbības soli.
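Locating the header by content rather than position can be sketched as follows. This is an assumption about the approach, not the code in transcode.py: the function name and the choice of required labels are hypothetical, and the rows are plain tuples of cell values such as those yielded by openpyxl's `ws.iter_rows(values_only=True)`.

```python
def find_header_row(rows, required=("Nr.p.k.", "Darbību apraksts")):
    """Return the index of the first row that contains every required label.

    Leading title rows are skipped simply because they do not contain
    the required column labels.
    """
    for index, row in enumerate(rows):
        cells = {str(cell).strip() for cell in row if cell is not None}
        if all(label in cells for label in required):
            return index
    raise ValueError("no row contains all required column labels")
```

Because the search keys on labels, the same function works regardless of how many title rows precede the header in a given register.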
Rows that carry a number and a name but no description and no RACI entry
are treated as sub-process headers; rows with description or any RACI entry
are treated as steps and assigned to the most recently encountered header.
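The header/step rule can be sketched as below. The row dictionaries and their keys (`number`, `name`, `description`, `raci`) are hypothetical stand-ins for the parsed register columns, not the transcoder's internal representation.

```python
def group_steps(rows):
    """Group step numbers under the most recently encountered sub-process header.

    A row with a number and a name but no description and no RACI entry is a
    header; a row with a description or any RACI entry is a step.
    """
    groups = {}
    current = None
    for row in rows:
        has_raci = any(cell for cell in row["raci"])
        if row["number"] and row["name"] and not row["description"] and not has_raci:
            current = row["number"]
            groups[current] = []
        elif (row["description"] or has_raci) and current is not None:
            groups[current].append(row["number"])
        # Step rows that appear before any header are ignored in this sketch.
    return groups
```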
For a requested sub-process the transcoder emits a single BPMN process
containing one bpmn:userTask per step, with the step's description,
system, SLA, RACI cells and any cross-sub-process or cross-FG references
preserved in bpmn:documentation; swimlanes for each actor that has steps
in the sub-process, with each step placed in the lane of its Responsible
actor; sequence flows reconstructed from the union of predecessor and
successor references whose endpoints are both inside the sub-process; and
one bpmn:startEvent per entry step (no in-group predecessor) and one
bpmn:endEvent per exit step (no in-group successor), so the fragment's
real boundary is visible rather than hidden behind synthesised gateways.
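The flow reconstruction described above can be sketched in a few lines. The data shape (`pred` and `succ` lists keyed by step number) and the reference format are assumptions made for illustration; the transcoder's own structures may differ.

```python
def reconstruct_flows(steps):
    """Derive in-group flows plus entry and exit steps for one sub-process.

    steps maps a step id to {"pred": [...], "succ": [...]} as read from the
    register. References to steps outside the sub-process are dropped here
    (the transcoder keeps them in bpmn:documentation instead).
    """
    inside = set(steps)
    flows = set()
    for step_id, refs in steps.items():
        for pred in refs["pred"]:
            if pred in inside:
                flows.add((pred, step_id))
        for succ in refs["succ"]:
            if succ in inside:
                flows.add((step_id, succ))
    has_incoming = {target for _, target in flows}
    has_outgoing = {source for source, _ in flows}
    entries = sorted(inside - has_incoming)   # each gets a bpmn:startEvent
    exits = sorted(inside - has_outgoing)     # each gets a bpmn:endEvent
    return sorted(flows), entries, exits


# Shape of sub-process 3.5.2 as described later in this note: 3.5.2.1 has only
# external references, and 3.5.2.2 and 3.5.2.3 list each other in both directions.
example = {
    "3.5.2.1": {"pred": [], "succ": ["2.3.2"]},
    "3.5.2.2": {"pred": ["3.5.2.3"], "succ": ["3.5.2.3"]},
    "3.5.2.3": {"pred": ["3.5.2.2"], "succ": ["3.5.2.2"]},
}
flows, entries, exits = reconstruct_flows(example)
```

Taking the union of both columns means a link recorded on only one side is still drawn, and a reciprocal pair shows up as a two-task cycle rather than being silently resolved.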
The output then has BPMN Diagram Interchange (bpmndi:BPMNDiagram with
BPMNShape and BPMNEdge elements) appended by bpmn_di.py via a
swim-lane left-to-right auto-layout, so the resulting file previews in
bpmn.io, Camunda Modeler and the ProcessGit web view. This DI is
auto-generated, not authored — §07 of the UAPF specification requires
authored DI produced by a conforming OMG modeler and explicitly forbids
automatic layout generation as a substitute. The auto-layout output is
kept as a known deliberate non-conformance pending re-authoring of each
.bpmn in a modeler. See Known non-conformances below.
The output is isExecutable="false" and deliberately unembellished: no
inferred gateways, no synthesised decision logic, no compensation for
register-side inconsistencies. Reproducing register defects is a feature —
it makes the refinement step's contribution explicit.
Pass 2 in detail — the refinement
Pass 2 takes a skeleton and authors the material the register implies but does not encode. The operations, in roughly the order they apply:
- Link reconciliation: the skeleton may show reciprocal edges, short cycles, or disjoint fragments where the register's predecessor and successor columns disagree. The curator decides the intended topology and rewrites flows accordingly.
- Gateway promotion: a task whose semantics implies a decision is split into the evaluating step plus an explicit gateway (typically exclusiveGateway) with named branches; multiple register steps that represent alternative outcomes collapse into branches.
- DMN extraction: where the register's prose implies a rule table, the decision is lifted into a separate .dmn file with FIRST or UNIQUE hit policy and named inputs/outputs. The BPMN gets a businessRuleTask whose decision reference points to the DMN decision.
- Resource authoring: resources/roles.yaml, agents.yaml and mappings.yaml enumerate the responsible parties, bind roles to the systems named in Izmantotā IS (HoP, RVS Horizon, ePNS, etc.), and link BPMN tasks to roles and agents.
- Metadata authoring: metadata/ownership.yaml, lifecycle.yaml and policies.yaml record who curated the package, its UAPF lifecycle stage, and policy constraints (retention, signing, jurisdictional applicability).
- Manifest assembly: the package's uapf.yaml lists kind, id, name, level (4 for atomic executables), cornerstones referencing the BPMN, DMN, resources and metadata, exposed inputs/outputs/artifacts, and any MCP exposure.
The refined package validates against the UAPF 2.2.0 schemas and against
uapf-cli validate. Once validated, the package is signable.
Concrete comparison — 3.5.2 / FG3-4
Sub-process 3.5.2 (Saimnieciskie norēķini un to kustība) has three
register steps. The transcoder's sample-output/3.5.2.skeleton.bpmn
contains three userTasks in two lanes (Nodarbinātais for 3.5.2.1 and
3.5.2.2, VPC for 3.5.2.3), four sequence flows, one start event and one
end event. It makes two register-side artefacts visible. Step 3.5.2.1
(the advance request) has only external successor references — it routes
out to FG2/2.3.2 (budget commitment) and back, and is not directly
linked to 3.5.2.2 in the register's columns, so the skeleton shows it as
a small linear fragment of its own. Steps 3.5.2.2 and 3.5.2.3 have
reciprocal predecessor/successor entries in the register — each lists the
other in both directions — so the skeleton renders a two-task cycle.
The curated processes/fg3-4 package resolves both. Its BPMN
Process_SaimnieciskaNorekina has 14 nodes and 14 flows across all three
lanes, a single clean entry and exit, and a businessRuleTask linked to a
separate DMN. The DMN Decision_AvansaNorekins is a FIRST-hit decision
with five rules; its inputs are avansaSituacija and avansaVeids, and
its output norekinResultats takes one of four values — slegts,
atmaksa, papildu-izmaksa, parnesums — the four outcomes the register
describes in prose but does not encode in a table. The gateway downstream
of the DMN routes to the finalisation task for each outcome (close,
reclaim from employee, additional disbursement, carry forward).
Three tasks in the skeleton; 14 nodes plus a 5-rule DMN in the executable. None of what the executable adds is in the register's columns. It is in the register's prose, in the SLA cells, and in the normative documents the register cites — Grāmatvedības likums, MK noteikumi Nr. 749, MK noteikumi Nr. 877. Pass 2 is where that material enters.
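To make the FIRST hit policy concrete, here is a minimal sketch of how such a table is evaluated. The input and output names come from the package as described above, but the rule rows and the input values (`balanced`, `surplus`, `shortfall`, `recurring`, `one-off`) are invented for illustration and are not the five rules of Decision_AvansaNorekins.

```python
# Each rule pairs its constrained inputs with an output value.
# An input that a rule does not mention acts as a wildcard.
# HYPOTHETICAL rules: placeholders, not the contents of the workspace DMN.
RULES = [
    ({"avansaSituacija": "balanced"}, "slegts"),
    ({"avansaSituacija": "surplus", "avansaVeids": "recurring"}, "parnesums"),
    ({"avansaSituacija": "surplus"}, "atmaksa"),
    ({"avansaSituacija": "shortfall"}, "papildu-izmaksa"),
]


def first_hit(rules, inputs):
    """Return the output of the first rule whose conditions all match, else None."""
    for conditions, output in rules:
        if all(inputs.get(name) == expected for name, expected in conditions.items()):
            return output
    return None


result = first_hit(RULES, {"avansaSituacija": "surplus", "avansaVeids": "recurring"})
```

Under FIRST, rule order carries meaning: the more specific surplus rule must precede the general one, otherwise it would never fire. A UNIQUE table would instead require that no two rules can match the same inputs.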
Concrete comparison — 3.5.3 / FG3-5
Sub-process 3.5.3 (Komandējuma (darba brauciena) dokumenti un to
kustība) has four register steps. The transcoder's
sample-output/3.5.3.skeleton.bpmn contains eight nodes and six flows.
The curated processes/fg3-5 package likewise has 14 nodes and 15 flows
across three lanes. It introduces an explicit cancellation branch — a
trip annulled before settlement vs a trip proceeded with — and the DMN
Decision_KomandejumaNorekins whose parnesums rule reconciles an
advance surplus against the next approved business trip from the same
funding line. Neither the cancellation branch nor the carry-forward rule
is in the register's predecessor/successor columns; both are in the
register's prose and in the cited Komandējuma izdevumu noteikumi.
Final validation pass
The workspace contains three Level 4 executable packages, three Level 3 composition stubs, six Level 2 function-group manifests, the Level 1 domain manifest, the Level 0 enterprise index, the transcoder tool, and this methodology note. The validation pass, run after the level-marker correction described in the next section, gave the following results:
- uapf-cli validate processes/fg3-1 → OK: package valid.
- uapf-cli validate processes/fg3-4 → OK: package valid.
- uapf-cli validate processes/fg3-5 → OK: package valid.
- All .bpmn and .dmn files in the workspace are XML well-formed.
- BPMN graph integrity: every sequenceFlow references existing sourceRef/targetRef nodes; every flowNodeRef resolves to a defined node; every incoming/outgoing reference is consistent with the corresponding flow's source/target.
- All .bpmn files carry BPMN Diagram Interchange (auto-generated; see Known non-conformances below), which is sufficient for preview in bpmn.io, Camunda Modeler and ProcessGit's web view.
- The transcoder is byte-deterministic: re-running it on the FG3 register for 3.5.2 and 3.5.3 reproduces the committed sample-output/ files exactly.
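Two of the graph-integrity checks listed above can be sketched with the standard library alone. This is an illustration, not the check that was actually run: the function name is hypothetical, and only sourceRef/targetRef and flowNodeRef resolution are covered.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def integrity_problems(bpmn_xml: str) -> list:
    """List sequence flows and lane references that point at undefined nodes."""
    root = ET.fromstring(bpmn_xml)
    problems = []
    # Elements whose ids are not flow nodes.
    skip = {f"{{{BPMN_NS}}}{tag}" for tag in ("sequenceFlow", "laneSet", "lane")}
    for process in root.iter(f"{{{BPMN_NS}}}process"):
        node_ids = {
            el.get("id")
            for el in process.iter()
            if el is not process and el.tag not in skip and el.get("id")
        }
        for flow in process.iter(f"{{{BPMN_NS}}}sequenceFlow"):
            for attr in ("sourceRef", "targetRef"):
                if flow.get(attr) not in node_ids:
                    problems.append(f"{flow.get('id')}: {attr}={flow.get(attr)!r} is undefined")
        for ref in process.iter(f"{{{BPMN_NS}}}flowNodeRef"):
            target = (ref.text or "").strip()
            if target not in node_ids:
                problems.append(f"flowNodeRef {target!r} is undefined")
    return problems
```

An empty list means every checked reference resolves; `ET.fromstring` raising `ParseError` corresponds to the well-formedness check in the same list.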
Conformance correction — Step-4 level-labelling
An initial pass of this workspace shipped with the FG3 sub-process stubs
(fg3-2, fg3-3, fg3-6) marked level: 4 by template inheritance from
fg3-1, with no BPMN and no resources. That fails the spec's L4
requirement — §01-concepts: "A Level-4 package MUST include BPMN and MUST
include resources and mappings, even if minimal."
The cause was a Step-4 design error: the FG3 sub-process packages were
created in a single sweep with the same level marker as the FG3-1
template, without checking whether each one would actually carry
executable artefacts. Three of them never would in this POC's scope; they
are composition placeholders, which the spec models as L3 (composed
subprocess / variant — 05-level-composition.md).
The correction is a level-marker change: fg3-2, fg3-3, fg3-6 are
now level: 3 with cornerstones.bpmn: false. Their lack of BPMN is now
spec-conformant (L1–L3 MUST NOT duplicate L4 content). The three real
executables (fg3-1, fg3-4, fg3-5) remain L4. The Mermaid diagram in
05-level-composition.md shows L2 → L3 → L4 as a typical chain, but the
spec text is explicit that the diagram is informative and that L2
packages may reference L4 directly when no intermediate composition is
needed. The includes list of fg3 references the three L4s and the three
L3 stubs in parallel, which is conformant.
Known non-conformances
This POC ships with one documented non-conformance against the UAPF specification, retained deliberately for the POC and tracked here.
Authored Diagram Interchange (§07-package-format.md). §07 makes DI a MUST for cornerstone BPMN/DMN/CMMN files and explicitly forbids automatic layout generation as a substitute:
"An implementation MUST NOT rely on automatic layout generation as a substitute for authored DI: a generated layout is not the authored layout and is not deterministic across tools."
The DI in the five .bpmn files (fg3-1, fg3-4, fg3-5, and the two
transcoder samples under tools/register-transcoder/sample-output/) is
produced by tools/register-transcoder/bpmn_di.py using a swim-lane
left-to-right auto-layout. It renders the diagrams in bpmn.io, Camunda
Modeler and ProcessGit's web view, but it is not authored DI and
therefore non-conformant under §07. The auto-layout is kept so the POC
artefacts are reviewable visually; the conformant path is to re-author each
.bpmn in a conforming OMG modeler (Camunda Modeler, bpmn-js Studio) and
recommit, which emits authored DI. The DMN files are unaffected: §07
exempts DMNs whose only logic is decision tables, and all three workspace
DMNs are exactly that.
Implications for the AI regulatory sandbox
The pipeline has four properties that bear on the sandbox's evaluation.
- Provenance: every executable step has a mechanical trace back to a register row, and the refinement layer is recorded in ownership metadata. A reviewer can audit either pass independently.
- Versionability: the register is itself versioned (publication dates, changelog); the workspace is git-versioned on ProcessGit; the packages carry lifecycle metadata. A register change triggers a re-transcode and a diffable refinement.
- Separability of judgement: what the register says and what the refinement adds are mechanically distinguishable. The skeleton is reproducible from the register alone, and the curator's contribution is exactly the diff.
- Coverage: the same tool applies unchanged to FG1, FG2, FG4, FG5 and FG6 because the parser locates headers by content. Three of FG3's nine sub-processes have curated executables in this POC (FG3-1, FG3-4, FG3-5); of the remaining six, three (FG3-2, FG3-3, FG3-6) are present as L3 composition stubs; the other five function groups are untouched.
The sandbox question is whether AI-assisted transcription of regulatory processes into executable, signable artefacts is feasible at production scale. This workspace is one positive existence proof — small, end-to-end, with both passes shipped and reproducible.
Limitations and next steps
The POC has known gaps.
- Semantic validation is structural only: the packages validate against UAPF schemas and graph-integrity rules, not yet against accounting-law semantics. There is no automated check that, for instance, the parnesums rule complies with the relevant MK noteikumi or that the deadlines align with Grāmatvedības likums 28.p(5). A second validator layer that checks rule coverage against the cited statutes is the obvious next step.
- Engine execution is in scope for the uapf-engine in isolation but not yet integrated with a Sledger host that would run the packages against real accounting state.
- Coverage: the POC takes three sub-processes to L4; production-grade coverage of FG3 alone needs six more, and FG1, FG2, FG4, FG5, FG6 are still untouched.
- Refinement is currently human-driven; quantifying the AI-assisted portion of Pass 2, by applying an LLM to the skeleton and measuring the curator's remaining diff, is the natural sandbox experiment that follows.
References inside the workspace: tools/register-transcoder/README.md for
the transcoder's CLI and register-format assumptions;
processes/fg3-1, processes/fg3-4, processes/fg3-5 for the curated
executables; tools/register-transcoder/sample-output/ for the skeletons
the comparisons in this note refer to.