Three corrections grounded in the UAPF SSOT specification (UAPFormat/UAPF-specification, specification/01-concepts.md, 04-folder-structure.md, 05-level-composition.md, 10-conformance-checklist.md), which had not been read in full before the initial workspace build.

1. Level relabel. The FG3 sub-process stubs fg3-2, fg3-3 and fg3-6 had been marked level: 4 by template inheritance from fg3-1 at Step 4 of the build, despite carrying no BPMN and no resources. Per the spec conformance checklist this fails the L4 requirement. The three are composition placeholders, which the spec models as L3 (composed subprocess / variant). Their uapf.yaml is now level: 3 with cornerstones.bpmn: false, which is conformant: L1-L3 packages MUST NOT duplicate L4 content. The three real executables fg3-1, fg3-4 and fg3-5 remain L4.
2. BPMN Diagram Interchange. All five .bpmn files in the workspace now carry a bpmndi:BPMNDiagram with BPMNShape and BPMNEdge elements produced by a swim-lane left-to-right auto-layout, so the diagrams preview in bpmn.io, Camunda Modeler and ProcessGit's web view. The spec doesn't require DI (its own examples have none) but practical reviewability does.
3. Transcoder. tools/register-transcoder gains bpmn_di.py, which is also runnable standalone for retrofitting existing BPMN files. transcode.py now imports it and emits DI by default for newly generated skeletons. sample-output/3.5.2.skeleton.bpmn and 3.5.3.skeleton.bpmn regenerated with DI; the logical-model content is byte-identical to the previous commit, only DI is added.

docs/methodology.md updated: adds an explicit Workspace-structure section grounding L0-L4 in the SSOT spec, a Conformance-correction section documenting the Step-4 mislabel and its fix, and drops the now-untrue 'no DI' line from limitations.

Validation after the change, full L1-L4 sweep: uapf-cli validate green on all 10 packages (domains/gramatvediba, fg1-fg6, fg3, fg3-1..fg3-6); xmllint clean on all 8 .bpmn/.dmn; every .bpmn has BPMNDiagram present.

# Methodology: `vk-gramatvediba` transcription pipeline

## Context

The Valsts Kase (State Treasury) together with the Vienotais Pakalpojumu
Centrs (VPC) publishes a normative description of public-sector bookkeeping
in Latvia, *Grāmatvedības uzskaites procesu apraksts*, as a set of six
function-group spreadsheets (FG1–FG6) spanning planning, expenditure,
settlement, fixed assets, payroll and reporting. Each register row is a
process step with an explicit responsible actor (RACI), the IT system used,
an SLA, the data produced, and predecessor/successor references to other
steps in the same or adjacent registers.

This workspace, `vk-gramatvediba`, is a UAPF 2.2.0 transcription of the FG3
register (*saistību un izdevumu uzskaite*, obligations and expenditure
accounting) into executable process artefacts. It is one of the projects
accepted into the Latvian AI regulatory sandbox (MIC), with PPPA as
institutional applicant and KISC as pilot partner.

The transcription is not a one-off translation. It is a *methodology*:
a repeatable two-pass pipeline from published normative documents to signed,
runnable process packages. This note describes the methodology, shows how
the same pipeline applied to two sub-processes (3.5.2 and 3.5.3) produced
the curated FG3-4 and FG3-5 packages, and reports the final validation pass
over the workspace.

## The pipeline

Transcription has two passes with deliberately different epistemic status.

**Pass 1: mechanical transcoding.** A deterministic tool reads the
register and emits a BPMN skeleton: one task per register step, swimlanes
from RACI, sequence flows from the register's own predecessor/successor
columns, synthesised start/end events at the fragment's real boundary. The
skeleton is faithful to the register (including its inconsistencies) and
is `isExecutable="false"`. This pass lives in `tools/register-transcoder/`.

**Pass 2: curated refinement.** A human curator, optionally AI-assisted,
takes the skeleton and resolves it into a Level 4 executable: explicit
gateways, decision logic extracted into DMN, resource roles/agents/mappings,
metadata (ownership, lifecycle, policies), and a package manifest. The
result is a `processes/<id>/uapf.yaml` package that validates against the
UAPF 2.2.0 schemas and against `uapf-cli`, and is runnable on the
uapf-engine.

The separation is the methodology's load-bearing claim. Pass 1 is
deterministic and traceable: re-running the transcoder on the same
register produces identical output, and every node has a mechanical
provenance to a register row. Pass 2 is interpretive and signable: it
adds judgement the register does not contain (rule tables that the register
describes only in prose, branches the register implies in footnotes,
metadata the register does not carry), and the curator is identified in the
package's ownership metadata. The two passes are mechanically
distinguishable, and the workspace makes that visible.

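The determinism claim for Pass 1 can be checked mechanically. The sketch
below is one way to do so, not the workspace's own test: it hashes the bytes
a generator produces on repeated runs and compares the digests. The
`generate` callable is a hypothetical stand-in for whatever wraps
`transcode.py`; its name and signature are assumptions.

```python
import hashlib
from typing import Callable


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one generated skeleton."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(generate: Callable[[str], bytes],
                    register_path: str, runs: int = 2) -> bool:
    """Run the generator `runs` times and report whether every output
    is byte-identical.

    `generate` is a hypothetical wrapper (register path -> BPMN bytes);
    it is not part of the transcoder's documented interface.
    """
    digests = {digest(generate(register_path)) for _ in range(runs)}
    return len(digests) == 1
```

Comparing digests rather than parsed models is deliberate: the claim is
about bytes, so attribute order and whitespace count as differences.
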
## Workspace structure: the L0–L4 level model

The workspace's directory layout is grounded in the UAPF SSOT
specification at `UAPFormat/UAPF-specification/specification/`, in
particular `01-concepts.md` (Levels), `04-folder-structure.md`,
`05-level-composition.md` and the conformance checklist in
`10-conformance-checklist.md`.

The spec defines five levels as aggregation and governance scope only,
not as modeling semantics:

- **L0: Enterprise process collection index.** Workspace-level. MUST NOT
  contain executable logic. Here: `enterprise/enterprise.yaml`.
- **L1: Domain process collection.** Composes L2/L3/L4 packages within a
  domain. Here: `domains/gramatvediba/`.
- **L2: End-to-end business process.** Composes L3/L4 packages.
  Here: `processes/fg1`, `fg2`, `fg3`, `fg4`, `fg5`, `fg6`, one per
  Valsts Kase function group.
- **L3: Composed subprocess / variant.** A composition placeholder that
  references one or more L4 packages.
  Here: `processes/fg3-2`, `fg3-3`, `fg3-6`, the FG3 sub-processes that
  were in scope for the POC but not built out to atomic executables.
- **L4: Atomic executable process.** MUST include at least one BPMN file
  and MUST include resource mappings. Cornerstones (BPMN, optional DMN,
  optional CMMN, resources) live here and only here.
  Here: `processes/fg3-1`, `fg3-4`, `fg3-5`.

The spec enforces strict containment of executable artefacts at L4: L1–L3
packages MUST reference lower-level packages via `includes` and MUST NOT
duplicate BPMN/DMN/CMMN files. The validator in `uapf-cli` and the
conformance rules in `05-level-composition.md` reject workspaces that
mix the layers.

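The containment rules above are simple enough to sketch as a check over
parsed manifests. This is an illustration of the rules as this note states
them, not a reimplementation of `uapf-cli`. The keys `level`, `includes`
and `cornerstones.bpmn` appear in this note; `cornerstones.resources` and
the dictionary shape are assumptions.

```python
from typing import Dict, List


def check_levels(packages: Dict[str, dict]) -> List[str]:
    """Return a list of level-containment problems (empty means clean).

    `packages` maps a package id to its parsed manifest. The
    `cornerstones.resources` key is an assumed name for illustration.
    """
    problems: List[str] = []
    for pkg_id, manifest in packages.items():
        level = manifest["level"]
        stones = manifest.get("cornerstones", {})
        has_bpmn = bool(stones.get("bpmn"))
        if level == 4:
            # L4 must carry BPMN and resources, even if minimal.
            if not has_bpmn:
                problems.append(f"{pkg_id}: L4 package has no BPMN")
            if not stones.get("resources"):
                problems.append(f"{pkg_id}: L4 package has no resources")
            continue
        # L0-L3 aggregate only; executable artefacts live at L4.
        if has_bpmn:
            problems.append(f"{pkg_id}: L{level} package must not carry BPMN")
        for ref in manifest.get("includes", []):
            target = packages.get(ref)
            if target is None:
                problems.append(f"{pkg_id}: includes unknown package {ref}")
            elif target["level"] <= level:
                problems.append(
                    f"{pkg_id}: includes {ref}, which is not further down the hierarchy"
                )
    return problems


workspace = {
    "fg3": {"level": 2, "includes": ["fg3-1", "fg3-2"]},
    "fg3-1": {"level": 4, "cornerstones": {"bpmn": True, "resources": True}},
    "fg3-2": {"level": 4, "cornerstones": {"bpmn": False}},  # mislabelled stub
}
print(check_levels(workspace))
```

In this toy workspace the stub `fg3-2` is reported twice (no BPMN, no
resources); changing its `level` to 3 clears both findings.
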
## Pass 1 in detail: the transcoder

The transcoder, `tools/register-transcoder/transcode.py`, is a small Python
tool with one external dependency (`openpyxl`) plus a co-installed layout
helper `bpmn_di.py` (also runnable standalone). It locates the worksheet
and header row by content rather than by position, so it tolerates the
leading title rows the registers carry and applies unchanged to any of the
FG1–FG6 registers. It expects the standard register columns: the
predecessor block (FG-group and step-number in adjacent cells), the step's
*Nr.p.k.*, *Process, apakšprocess*, the RACI block split across the three
actor sub-columns (Nodarbinātais / Iestāde / VPC), *Darbību apraksts*,
*Izmantotā IS*, *Izpildes termiņš*, *Sagatavotie dati*, and the successor
block *Uz procesa darbības soli*.

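Locating the header row by content can be sketched as follows. This is a
minimal illustration under stated assumptions, not the code in
`transcode.py`: the function name, the choice of three required labels and
the case-insensitive substring match are all hypothetical. Rows are plain
tuples of cell values, which is what `openpyxl`'s
`worksheet.iter_rows(values_only=True)` yields.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Labels taken from the register columns named above; using these three
# as the detection set is an assumption made for this sketch.
REQUIRED = ("Nr.p.k.", "Darbību apraksts", "Izmantotā IS")


def find_header_row(
    rows: Iterable[Sequence[object]],
    required: Sequence[str] = REQUIRED,
) -> Optional[Tuple[int, Dict[str, int]]]:
    """Return (row index, {label: column index}) for the first row that
    contains every required label, or None if no row qualifies."""
    for idx, row in enumerate(rows):
        cells = ["" if c is None else str(c).strip().lower() for c in row]
        columns: Dict[str, int] = {}
        for label in required:
            for col, text in enumerate(cells):
                if label.lower() in text:
                    columns[label] = col
                    break
        if len(columns) == len(required):
            return idx, columns
    return None


sheet = [
    ("Grāmatvedības uzskaites procesu apraksts", None, None),  # title row
    (None, None, None),
    ("Nr.p.k.", "Darbību apraksts", "Izmantotā IS"),
]
print(find_header_row(sheet))
```

Because the match is on cell text, any number of title rows above the
header is tolerated, and the returned column map absorbs shifted columns.
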
Rows that carry a number and a name but no description and no RACI entry
are treated as sub-process headers; rows with description or any RACI entry
are treated as steps and assigned to the most recently encountered header.
For a requested sub-process the transcoder emits a single BPMN process
containing one `bpmn:userTask` per step, with the step's description,
system, SLA, RACI cells and any cross-sub-process or cross-FG references
preserved in `bpmn:documentation`; swimlanes for each actor that has steps
in the sub-process, with each step placed in the lane of its Responsible
actor; sequence flows reconstructed from the union of predecessor and
successor references whose endpoints are both inside the sub-process; and
one `bpmn:startEvent` per *entry step* (no in-group predecessor) and one
`bpmn:endEvent` per *exit step* (no in-group successor), so the fragment's
real boundary is visible rather than hidden behind synthesised gateways.
The output then has BPMN Diagram Interchange (`bpmndi:BPMNDiagram` with
`BPMNShape` and `BPMNEdge` elements) appended by `bpmn_di.py` using a
swim-lane left-to-right auto-layout, so the resulting file previews in
bpmn.io, Camunda Modeler and the ProcessGit web view without manual
positioning.

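The row classification and flow reconstruction described above can be
sketched in a few lines. The `Step` class, function names and reference
format are hypothetical, and the sample data only approximates what this
note says about sub-process 3.5.2; the transcoder's real data model may
differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set, Tuple


@dataclass
class Step:
    """One register step; references are step numbers as plain strings."""
    number: str
    predecessors: List[str] = field(default_factory=list)
    successors: List[str] = field(default_factory=list)


def is_subprocess_header(description: Optional[str],
                         raci: Sequence[Optional[str]]) -> bool:
    """A row with no description and no RACI entry is a sub-process header."""
    has_description = bool(description and description.strip())
    has_raci = any(cell and cell.strip() for cell in raci)
    return not has_description and not has_raci


def reconstruct(steps: List[Step]) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Return (flows, entry steps, exit steps) for one sub-process.

    Flows are the union of predecessor and successor references whose
    endpoints are both inside the sub-process; external ones are dropped.
    """
    ids = {s.number for s in steps}
    flows: Set[Tuple[str, str]] = set()
    for s in steps:
        flows.update((p, s.number) for p in s.predecessors if p in ids)
        flows.update((s.number, n) for n in s.successors if n in ids)
    targets = {t for _, t in flows}
    sources = {src for src, _ in flows}
    entries = sorted(ids - targets)   # no in-group predecessor -> start event
    exits = sorted(ids - sources)     # no in-group successor -> end event
    return sorted(flows), entries, exits


# Illustrative data modelled on the description of 3.5.2 in this note.
steps = [
    Step("3.5.2.1", successors=["FG2/2.3.2"]),  # external reference only
    Step("3.5.2.2", predecessors=["3.5.2.3"], successors=["3.5.2.3"]),
    Step("3.5.2.3", predecessors=["3.5.2.2"], successors=["3.5.2.2"]),
]
flows, entries, exits = reconstruct(steps)
print(flows, entries, exits)
```

With this input the sketch yields the two-task cycle between 3.5.2.2 and
3.5.2.3, and 3.5.2.1 as both the only entry step and the only exit step.
Adding one start flow and one end flow gives four sequence flows in total,
which agrees with the 3.5.2 skeleton described later in this note.
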
The output is `isExecutable="false"` and deliberately unembellished: no
inferred gateways, no synthesised decision logic, no compensation for
register-side inconsistencies. Reproducing register defects is a feature:
it makes the refinement step's contribution explicit.

## Pass 2 in detail: the refinement

Pass 2 takes a skeleton and authors the material the register implies but
does not encode. The operations, in roughly the order they apply:

- *Link reconciliation*: the skeleton may show reciprocal edges, short
  cycles, or disjoint fragments where the register's predecessor and
  successor columns disagree. The curator decides the intended topology
  and rewrites flows accordingly.
- *Gateway promotion*: a task whose semantics implies a decision is split
  into the evaluating step plus an explicit gateway (typically
  `exclusiveGateway`) with named branches; multiple register steps that
  represent alternative outcomes collapse into branches.
- *DMN extraction*: where the register's prose implies a rule table, the
  decision is lifted into a separate `.dmn` file with FIRST or UNIQUE hit
  policy and named inputs/outputs. The BPMN gets a `businessRuleTask` whose
  decision reference points to the DMN decision.
- *Resource authoring*: `resources/roles.yaml`, `agents.yaml` and
  `mappings.yaml` enumerate the responsible parties, bind roles to the
  systems named in *Izmantotā IS* (HoP, RVS Horizon, ePNS, etc.), and link
  BPMN tasks to roles and agents.
- *Metadata authoring*: `metadata/ownership.yaml`, `lifecycle.yaml` and
  `policies.yaml` record who curated the package, its UAPF lifecycle stage,
  and policy constraints (retention, signing, jurisdictional applicability).
- *Manifest assembly*: the package's `uapf.yaml` lists kind, id, name,
  level (4 for atomic executables), cornerstones referencing the BPMN, DMN,
  resources and metadata, exposed inputs/outputs/artifacts, and any MCP
  exposure.

The refined package validates against the UAPF 2.2.0 schemas and against
`uapf-cli validate`. Once validated, the package is signable.

## Concrete comparison: 3.5.2 / FG3-4

Sub-process 3.5.2 (*Saimnieciskie norēķini un to kustība*) has three
register steps. The transcoder's `sample-output/3.5.2.skeleton.bpmn`
contains three `userTask`s in two lanes (Nodarbinātais for 3.5.2.1 and
3.5.2.2, VPC for 3.5.2.3), four sequence flows, one start event and one
end event. It makes two register-side artefacts visible. Step 3.5.2.1
(the advance request) has only external successor references: it routes
out to *FG2/2.3.2* (budget commitment) and back, and is not directly
linked to 3.5.2.2 in the register's columns, so the skeleton shows it as
a small linear fragment of its own. Steps 3.5.2.2 and 3.5.2.3 have
reciprocal predecessor/successor entries in the register (each lists the
other in both directions), so the skeleton renders a two-task cycle.

The curated `processes/fg3-4` package resolves both. Its BPMN
`Process_SaimnieciskaNorekina` has 14 nodes and 14 flows across all three
lanes, a single clean entry and exit, and a `businessRuleTask` linked to a
separate DMN. The DMN `Decision_AvansaNorekins` is a FIRST-hit decision
with five rules; its inputs are `avansaSituacija` and `avansaVeids`, and
its output `norekinResultats` takes one of four values (`slegts`,
`atmaksa`, `papildu-izmaksa`, `parnesums`), the four outcomes the register
describes in prose but does not encode in a table. The gateway downstream
of the DMN routes to the finalisation task for each outcome (close,
reclaim from employee, additional disbursement, carry forward).

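To make the FIRST hit policy concrete, the sketch below evaluates an
ordered rule list and returns the output of the first matching rule. Only
the input names, the output values and the hit policy come from this note.
The rule conditions and the input values (`balanced`, `shortfall`,
`surplus`, `recurring`) are invented placeholders and do not reproduce the
five rules in the package's `.dmn` file.

```python
from typing import Dict, List, Optional, Tuple

# (conditions, output); a condition value of None matches anything.
Rule = Tuple[Dict[str, Optional[str]], str]

# HYPOTHETICAL rules for illustration only; not the contents of
# Decision_AvansaNorekins.
RULES: List[Rule] = [
    ({"avansaSituacija": "balanced", "avansaVeids": None}, "slegts"),
    ({"avansaSituacija": "shortfall", "avansaVeids": None}, "papildu-izmaksa"),
    ({"avansaSituacija": "surplus", "avansaVeids": "recurring"}, "parnesums"),
    ({"avansaSituacija": "surplus", "avansaVeids": None}, "atmaksa"),
]


def evaluate_first(rules: List[Rule], inputs: Dict[str, str]) -> Optional[str]:
    """FIRST hit policy: rules are tried top to bottom and the first
    match wins; later matching rules are ignored."""
    for conditions, output in rules:
        if all(expected is None or inputs.get(name) == expected
               for name, expected in conditions.items()):
            return output
    return None  # no rule matched


print(evaluate_first(RULES, {"avansaSituacija": "surplus",
                             "avansaVeids": "recurring"}))
```

Rule order carries meaning under FIRST: the third placeholder rule must
precede the fourth, because both match a recurring surplus. Under UNIQUE
the same table would be invalid, since two rules overlap.
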
Three tasks in the skeleton; 14 nodes plus a 5-rule DMN in the executable.
None of what the executable adds is in the register's columns. It is in
the register's prose, in the SLA cells, and in the normative documents the
register cites: *Grāmatvedības likums*, MK noteikumi Nr. 749, MK
noteikumi Nr. 877. Pass 2 is where that material enters.

## Concrete comparison: 3.5.3 / FG3-5

Sub-process 3.5.3 (*Komandējuma (darba brauciena) dokumenti un to
kustība*) has four register steps. The transcoder's
`sample-output/3.5.3.skeleton.bpmn` contains eight nodes and six flows.

The curated `processes/fg3-5` package likewise has 14 nodes, with 15 flows
across three lanes. It introduces an explicit *cancellation branch* (a
trip annulled before settlement versus a trip that went ahead) and the DMN
`Decision_KomandejumaNorekins`, whose *parnesums* rule reconciles an
advance surplus against the next approved business trip from the same
funding line. Neither the cancellation branch nor the carry-forward rule
is in the register's predecessor/successor columns; both are in the
register's prose and in the cited *Komandējuma izdevumu noteikumi*.

## Final validation pass

The workspace contains three Level 4 executable packages, three Level 3
composition stubs, six Level 2 function-group manifests, the Level 1
domain manifest, the Level 0 enterprise index, the transcoder tool, and
this methodology note. The validation pass was run after the level-marker
correction described in the next section, with these results:

- `uapf-cli validate processes/fg3-1` → `OK: package valid`.
- `uapf-cli validate processes/fg3-4` → `OK: package valid`.
- `uapf-cli validate processes/fg3-5` → `OK: package valid`.
- All `.bpmn` and `.dmn` files in the workspace are XML well-formed.
- BPMN graph integrity: every `sequenceFlow` references existing
  `sourceRef`/`targetRef` nodes; every `flowNodeRef` resolves to a defined
  node; every `incoming`/`outgoing` reference is consistent with the
  corresponding flow's source/target.
- All `.bpmn` files now carry BPMN Diagram Interchange, so they preview
  cleanly in bpmn.io, Camunda Modeler and ProcessGit's web view.
- The transcoder is byte-deterministic: re-running it on the FG3 register
  for 3.5.2 and 3.5.3 reproduces the committed `sample-output/` files
  exactly.

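The graph-integrity and DI-presence checks in the list above can be
approximated with the standard library alone. The sketch below is an
assumption about how such a check might look, not the script used for the
validation pass; it only inspects direct children of each `process`
element, so nested sub-processes are out of scope.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple

BPMN = "{http://www.omg.org/spec/BPMN/20100524/MODEL}"
BPMNDI = "{http://www.omg.org/spec/BPMN/20100524/DI}"


def check_bpmn(xml_text: str) -> List[str]:
    """Return integrity problems found in one BPMN document."""
    root = ET.fromstring(xml_text)
    problems: List[str] = []
    if root.find(f"{BPMNDI}BPMNDiagram") is None:
        problems.append("no bpmndi:BPMNDiagram element")
    for process in root.iter(f"{BPMN}process"):
        flows: Dict[str, Tuple[str, str]] = {}
        nodes: Set[str] = set()
        for el in process:
            if el.tag == f"{BPMN}sequenceFlow":
                flows[el.get("id")] = (el.get("sourceRef"), el.get("targetRef"))
            elif el.tag != f"{BPMN}laneSet" and el.get("id"):
                nodes.add(el.get("id"))
        # Every flow endpoint must be a defined node.
        for fid, (src, tgt) in flows.items():
            for role, ref in (("sourceRef", src), ("targetRef", tgt)):
                if ref not in nodes:
                    problems.append(f"{fid}: {role} '{ref}' is not a node")
        # Every lane membership entry must resolve to a node.
        for ref in process.iter(f"{BPMN}flowNodeRef"):
            node_id = (ref.text or "").strip()
            if node_id not in nodes:
                problems.append(f"flowNodeRef '{node_id}' is not a node")
        # incoming must name a flow targeting the node, outgoing one leaving it.
        for el in process:
            for tag, end in (("incoming", 1), ("outgoing", 0)):
                for child in el.findall(f"{BPMN}{tag}"):
                    fid = (child.text or "").strip()
                    if fid not in flows or flows[fid][end] != el.get("id"):
                        problems.append(f"{el.get('id')}: inconsistent {tag} '{fid}'")
    return problems
```

An empty result means the document passes these particular checks; it says
nothing about schema validity, which `uapf-cli validate` covers.
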
## Conformance correction: Step-4 level-labelling

An initial pass of this workspace shipped with the FG3 sub-process stubs
(`fg3-2`, `fg3-3`, `fg3-6`) marked `level: 4` by template inheritance from
`fg3-1`, with no BPMN and no resources. That fails the spec's L4
requirement. *§01-concepts: "A Level-4 package MUST include BPMN and MUST
include resources and mappings, even if minimal."*

The cause was a Step-4 design error: the FG3 sub-process packages were
created in a single sweep with the same level marker as the FG3-1
template, without checking whether each one would actually carry
executable artefacts. Three of them never would in this POC's scope; they
are composition placeholders, which the spec models as **L3** (composed
subprocess / variant, see `05-level-composition.md`).

The correction is a level-marker change: `fg3-2`, `fg3-3`, `fg3-6` are
now `level: 3` with `cornerstones.bpmn: false`. Their lack of BPMN is now
spec-conformant (L1–L3 MUST NOT duplicate L4 content). The three real
executables (`fg3-1`, `fg3-4`, `fg3-5`) remain L4. The Mermaid diagram in
`05-level-composition.md` shows L2 → L3 → L4 as a typical chain, but the
spec text is explicit that the diagram is informative and that L2
packages may reference L4 directly when no intermediate composition is
needed. The `includes` list of `fg3` references the three L4s and the
three L3 stubs in parallel, which is conformant.

## Implications for the AI regulatory sandbox

The pipeline has four properties that bear on the sandbox's evaluation.
*Provenance*: every executable step has a mechanical trace back to a
register row, and the refinement layer is recorded in ownership metadata.
A reviewer can audit either pass independently. *Versionability*: the
register is itself versioned (publication dates, changelog); the workspace
is git-versioned on ProcessGit; the packages carry lifecycle metadata. A
register change triggers a re-transcode and a diffable refinement.
*Separability of judgement*: what the register says and what the
refinement adds are mechanically distinguishable: the skeleton is
reproducible from the register alone, and the curator's contribution is
exactly the diff. *Coverage*: the same tool applies unchanged to FG1, FG2,
FG4, FG5 and FG6, because the parser locates headers by content. Three of
FG3's nine sub-processes have curated executables in this POC (FG3-1,
FG3-4, FG3-5); the remaining six FG3 sub-processes are composition stubs;
the other five function groups are untouched.

The sandbox question is whether AI-assisted transcription of regulatory
processes into executable, signable artefacts is feasible at production
scale. This workspace is one positive existence proof: small,
end-to-end, with both passes shipped and reproducible.

## Limitations and next steps

The POC has known gaps. *Semantic validation* is structural only: the
packages validate against UAPF schemas and graph-integrity rules, not yet
against accounting-law semantics. There is no automated check that, for
instance, the *parnesums* rule complies with the relevant MK noteikumi or
that the deadlines align with *Grāmatvedības likums* 28.p(5). A second
validator layer (rule coverage against the cited statutes) is the
obvious next step. *Engine execution* is in scope for the uapf-engine in
isolation but not yet integrated with a Sledger host that would run the
packages against real accounting state. *Coverage*: the POC takes three
sub-processes to L4; production-grade coverage of FG3 alone needs six
more, and FG1, FG2, FG4, FG5, FG6 are still untouched. *Refinement
automation* is currently human-driven; quantifying the AI-assisted portion
of Pass 2 (applying an LLM to the skeleton and measuring the curator's
remaining diff) is the natural sandbox experiment that follows.

---

References inside the workspace: `tools/register-transcoder/README.md` for
the transcoder's CLI and register-format assumptions;
`processes/fg3-1`, `processes/fg3-4`, `processes/fg3-5` for the curated
executables; `tools/register-transcoder/sample-output/` for the skeletons
the comparisons in this note refer to.