The 1.x package was a single ai.extract call wrapped in three BPMN
service tasks. No decision logic, no dmn cornerstone, no weights — the
risk/routing/validation algorithm lived invisibly in host code. There
was nothing for a runtime to actually execute.
2.0.0 makes it a real process:
- dmn cornerstone added with three decision tables:
  * assess-personal-data-risk: PII regex signals -> risk level
  * gdpr-processing-route: risk x centralisation -> CENTRAL/LOCAL,
    anonymisation, redaction level
  * human-validation-gate: confidence thresholds + PII re-scan
    -> REJECTED/PENDING_REVIEW/APPROVED_AUTO
- BPMN expanded 3 -> 6 nodes (3 serviceTask + 3 businessRuleTask),
  with horizontal DI.
- Task ids, mappings, docs, manifest (dmn:true), uapf.yaml, lifecycle
  and eval-set updated; added a PII-bearing fixture.
Only the semantic extraction remains a model step. Risk classification,
GDPR routing and validation gating are now explicit ranked DMN rules —
inspectable, versioned, portable. Breaking change: structure + outputs.
Semantic Document Analysis
A UAPF Level-4 process package for extracting VDVC-conformant semantic metadata from free-text documents.
What this package is
A real, inspectable process — not a single AI call in BPMN costume. The flow has six executable nodes; three of them are DMN decision tables that carry the actual algorithm, with explicit ranked rules and weights.
Start
-> [service] Detect and redact PII ai.redact@1
-> [decision] Assess personal-data risk DMN assess-personal-data-risk
-> [decision] Decide GDPR processing route DMN gdpr-processing-route
-> [service] Extract semantic metadata ai.extract@1
-> [decision] Determine validation status DMN human-validation-gate
-> [service] Emit completed event event.emit@1
End
Only one node performs model inference (semantic extraction). PII detection, risk classification, GDPR routing and the human-validation gate are deterministic: the host cannot invent their outcomes.
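The ordering of the six nodes can be sketched as a simple sequential runner. This is a minimal illustration, not the UAPF runtime: `run_flow`, the `Context` dict and the convention that each node returns fields to merge into the context are assumptions made for the example. Only the capability ids and decision ids are taken from the flow listing.

```python
from typing import Any, Callable

Context = dict[str, Any]
Node = Callable[[Context], Context]

# (kind, identifier) pairs in the order of the flow listing.
FLOW: list[tuple[str, str]] = [
    ("service", "ai.redact@1"),
    ("decision", "assess-personal-data-risk"),
    ("decision", "gdpr-processing-route"),
    ("service", "ai.extract@1"),
    ("decision", "human-validation-gate"),
    ("service", "event.emit@1"),
]


def run_flow(
    context: Context,
    services: dict[str, Node],   # host-provided capabilities (hypothetical shape)
    decisions: dict[str, Node],  # runtime-evaluated DMN tables (hypothetical shape)
) -> tuple[Context, list[str]]:
    """Run the six nodes in order, merging each node's output into the context."""
    trace: list[str] = []
    for kind, ident in FLOW:
        node = services[ident] if kind == "service" else decisions[ident]
        context = {**context, **node(context)}
        trace.append(ident)
    return context, trace
```

In this sketch the host supplies only the three `service` callables; the three `decision` callables stand in for tables the runtime evaluates itself.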
The decision tables (dmn/)
assess-personal-data-risk
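PII regex signals -> personalDataRisk. A personas kods (Latvian
personal identity code) or an IBAN forces HIGH; two or more PII
categories, or any contact data, give MEDIUM; a single category gives
LOW; no signal gives NONE. Hit policy FIRST (ranked): the first
matching rule in table order wins.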
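The ranked evaluation can be sketched in Python as an ordered rule list where the first match wins. The category names (`personas_kods`, `iban`, `email`, `phone`) and the choice of which categories count as contact data are hypothetical; the actual signal names live in the DMN file and are not given in this README.

```python
# Hypothetical category names; the real regex signal names are not given here.
FORCING = frozenset({"personas_kods", "iban"})
CONTACT = frozenset({"email", "phone"})

# Ranked rules: evaluated top to bottom, first match wins (hit policy FIRST).
RULES = [
    ("HIGH", lambda c: bool(c & FORCING)),
    ("MEDIUM", lambda c: len(c) >= 2 or bool(c & CONTACT)),
    ("LOW", lambda c: len(c) == 1),
    ("NONE", lambda c: True),
]


def assess_personal_data_risk(categories: set[str]) -> str:
    """Return the risk level of the first rule matching the detected categories."""
    detected = frozenset(categories)
    for level, matches in RULES:
        if matches(detected):
            return level
    raise AssertionError("unreachable: the last rule matches everything")
```

Rule order carries the meaning here: a document with both an IBAN and an e-mail address satisfies the HIGH and the MEDIUM rule, and the ranking resolves it to HIGH.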
gdpr-processing-route
personalDataRisk x allowCentralization -> processingRoute
(CENTRAL | LOCAL), anonymizationRequired, redactionLevel. A
sensitive document whose owner has not permitted centralisation stays
LOCAL with full redaction. This is the routing rule lifted out of the
host's generate_semantic_metadata.
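The one rule spelled out above can be sketched as follows. Several points are assumptions: "sensitive" is read as `personalDataRisk == "HIGH"`, "full redaction" is written as the value `"FULL"`, and the `Route` type is invented for the example. The value of anonymizationRequired for this row is not stated here, so the sketch leaves that field out, and it returns `None` for every combination this README does not describe rather than guessing the remaining rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical container for two of the table's outputs."""
    processing_route: str   # "CENTRAL" or "LOCAL"
    redaction_level: str    # "FULL" is an assumed spelling


def gdpr_processing_route(
    personal_data_risk: str, allow_centralization: bool
) -> Optional[Route]:
    # Documented rule: a sensitive document whose owner has not permitted
    # centralisation stays LOCAL with full redaction.
    # Assumption: "sensitive" means HIGH risk.
    if personal_data_risk == "HIGH" and not allow_centralization:
        return Route(processing_route="LOCAL", redaction_level="FULL")
    # The remaining rows are not described in this README.
    return None
```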
human-validation-gate
outputPiiErrorCount, aiConfidenceScore, personalDataRisk ->
humanValidationStatus (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
requiresHumanReview. Any leaked PII or confidence below 0.3 -> REJECTED;
below 0.7 or HIGH risk -> PENDING_REVIEW; 0.7+ with clean output ->
APPROVED_AUTO. The thresholds 0.3 / 0.7 are the weights.
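The gate's ordering can be sketched as a short function. The function and parameter names are illustrative; the thresholds and status values are the ones stated above. How requiresHumanReview is derived from the status is not spelled out here, so the sketch returns only humanValidationStatus.

```python
REJECT_BELOW = 0.3   # confidence below this is rejected outright
REVIEW_BELOW = 0.7   # confidence below this needs a human reviewer


def human_validation_gate(
    output_pii_error_count: int,
    ai_confidence_score: float,
    personal_data_risk: str,
) -> str:
    """Return humanValidationStatus; checks run in ranked order."""
    # Any leaked PII or very low confidence rejects the result.
    if output_pii_error_count > 0 or ai_confidence_score < REJECT_BELOW:
        return "REJECTED"
    # Middling confidence or a HIGH-risk document goes to a reviewer.
    if ai_confidence_score < REVIEW_BELOW or personal_data_risk == "HIGH":
        return "PENDING_REVIEW"
    # 0.7+ with clean output and non-HIGH risk is approved automatically.
    return "APPROVED_AUTO"
```

A confident extraction on a HIGH-risk document still lands in PENDING_REVIEW: `human_validation_gate(0, 0.9, "HIGH")` never reaches the auto-approval branch.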
Capabilities required of the host
| Capability | Used by | Purpose |
|---|---|---|
| ai.redact@1 | Task_DetectRedactPii | Mask PII + return regex signals |
| ai.extract@1 | Task_ExtractSemantics | VDVC semantic extraction |
| event.emit@1 | Task_EmitResult | Publish completion CloudEvent |
DMN decisions need no host capability — the runtime evaluates them.
Output contract
resources/schemas/vdvc-semantic-summary.schema.json defines the
ai.extract@1 output. In addition, the process yields the DMN-decided
fields (personalDataRisk, processingRoute, redactionLevel,
humanValidationStatus, requiresHumanReview).
Compliance
Treated as an Annex III high-risk system under the EU AI Act
(Regulation (EU) 2024/1689); applies the GDPR (Regulation (EU) 2016/679)
data-minimisation principle. See resources/guardrails.yaml and docs/.