dokumenta-semantiska-analize/README.md
UAPF Steward dd69a04355 rewrite 2.0.0: real process — extract the algorithm into DMN
The 1.x package was a single ai.extract call wrapped in three BPMN
service tasks. No decision logic, no dmn cornerstone, no weights — the
risk/routing/validation algorithm lived invisibly in host code. There
was nothing for a runtime to actually execute.

2.0.0 makes it a real process:

- dmn cornerstone added with three decision tables:
  * assess-personal-data-risk  — PII regex signals -> risk level
  * gdpr-processing-route      — risk x centralisation -> CENTRAL/LOCAL,
                                  anonymisation, redaction level
  * human-validation-gate      — confidence thresholds + PII re-scan
                                  -> REJECTED/PENDING_REVIEW/APPROVED_AUTO
- BPMN expanded 3 -> 6 nodes (3 serviceTask + 3 businessRuleTask),
  with horizontal DI.
- Task ids, mappings, docs, manifest (dmn:true), uapf.yaml, lifecycle
  and eval-set updated; added a PII-bearing fixture.

Only the semantic extraction remains a model step. Risk classification,
GDPR routing and validation gating are now explicit ranked DMN rules —
inspectable, versioned, portable. Breaking change: structure + outputs.
2026-05-17 20:00:36 +00:00


# Semantic Document Analysis
A UAPF Level-4 process package for extracting VDVC-conformant semantic
metadata from free-text documents.
## What this package is
A **real, inspectable process** — not a single AI call in BPMN costume.
The flow has six executable nodes; three of them are DMN decision tables
that carry the actual algorithm, with explicit ranked rules and weights.
```
Start
  -> [service]  Detect and redact PII           ai.redact@1
  -> [decision] Assess personal-data risk       DMN assess-personal-data-risk
  -> [decision] Decide GDPR processing route    DMN gdpr-processing-route
  -> [service]  Extract semantic metadata       ai.extract@1
  -> [decision] Determine validation status     DMN human-validation-gate
  -> [service]  Emit completed event            event.emit@1
End
```
Only **one** node performs model inference (semantic extraction). PII
detection is regex-based, and risk classification, GDPR routing and the
human-validation gate are DMN rules evaluated by the runtime, so none of
these outcomes is left to the host's discretion.
## The decision tables (dmn/)
### assess-personal-data-risk
PII regex signals -> `personalDataRisk`. A Personas kods (the Latvian
personal identity code) or an IBAN forces HIGH; two or more PII
categories, or any contact data, give MEDIUM; a single category gives
LOW; no signals give NONE. Hit policy FIRST (ranked), so the first
matching rule wins.
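The ranking described above can be sketched as an ordered rule list in which the first match wins. This is an illustration only, not the package's DMN table: the category labels (`personas_kods`, `iban`, `email`, `phone`) and the function name are assumptions, since the README does not list the signal names.

```python
from typing import Iterable

# Assumed category labels; the README names only "Personas kods", IBAN
# and "contact data" without giving identifiers.
FORCING_CATEGORIES = {"personas_kods", "iban"}
CONTACT_CATEGORIES = {"email", "phone"}

# Ordered rules: (predicate over the set of detected categories, risk level).
# Order matters, mirroring hit policy FIRST.
RULES = [
    (lambda found: bool(found & FORCING_CATEGORIES), "HIGH"),
    (lambda found: len(found) >= 2 or bool(found & CONTACT_CATEGORIES), "MEDIUM"),
    (lambda found: len(found) == 1, "LOW"),
]


def assess_personal_data_risk(categories: Iterable[str]) -> str:
    """Return the risk level of the first rule that matches, else NONE."""
    found = set(categories)
    for predicate, level in RULES:
        if predicate(found):
            return level
    return "NONE"
```

Because the HIGH rule is listed first, a document containing both an IBAN and an e-mail address is rated HIGH rather than MEDIUM.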
### gdpr-processing-route
`personalDataRisk` x `allowCentralization` -> `processingRoute`
(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A
sensitive document whose owner has not permitted centralisation stays
LOCAL with full redaction. This is the routing rule lifted out of the
host's `generate_semantic_metadata`.
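As a rough sketch, the one routing rule stated above could look like the following. Several details are assumptions rather than facts from the table: that "sensitive" means HIGH risk, that the rule also sets `anonymizationRequired`, and that the redaction level is spelled `FULL`. Combinations the README does not describe return `None` instead of a guessed route.

```python
from typing import NamedTuple, Optional


class RouteDecision(NamedTuple):
    processingRoute: str         # "CENTRAL" or "LOCAL", as named in the README
    anonymizationRequired: bool  # value for this rule is an assumption
    redactionLevel: str          # the spelling "FULL" is an assumption


def gdpr_processing_route(
    personal_data_risk: str, allow_centralization: bool
) -> Optional[RouteDecision]:
    """Encode only the rule the README states; other cases are left open."""
    sensitive = personal_data_risk == "HIGH"  # assumption: sensitive == HIGH
    if sensitive and not allow_centralization:
        return RouteDecision("LOCAL", True, "FULL")
    # The remaining rules live in the DMN table and are not reproduced here.
    return None
```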
### human-validation-gate
`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` ->
`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
`requiresHumanReview`. Any leaked PII, or confidence below 0.3 ->
REJECTED; otherwise confidence below 0.7, or HIGH risk -> PENDING_REVIEW;
confidence of 0.7 or above with clean output -> APPROVED_AUTO. The
thresholds 0.3 and 0.7 are the table's weights.
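The gate can be read as three checks applied in order. The sketch below is a hedged paraphrase in Python, not the DMN table itself: the function name is hypothetical, and `requiresHumanReview` is left out because the README does not say how it is derived from the status.

```python
REJECT_BELOW = 0.3  # confidence threshold from the README
REVIEW_BELOW = 0.7  # confidence threshold from the README


def human_validation_status(
    output_pii_error_count: int,
    ai_confidence_score: float,
    personal_data_risk: str,
) -> str:
    """Apply the gate's rules in ranked order and return the status."""
    # Leaked PII in the output, or very low confidence, rejects outright.
    if output_pii_error_count > 0 or ai_confidence_score < REJECT_BELOW:
        return "REJECTED"
    # Middling confidence or a HIGH-risk document goes to a human.
    if ai_confidence_score < REVIEW_BELOW or personal_data_risk == "HIGH":
        return "PENDING_REVIEW"
    # Confidence of 0.7 or above, clean output, risk below HIGH.
    return "APPROVED_AUTO"
```

Note the boundaries: a score of exactly 0.3 is not rejected but sent for review, and a score of exactly 0.7 can be auto-approved, following the "below 0.3" and "0.7+" wording.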
## Capabilities required of the host
| Capability | Used by | Purpose |
|----------------|------------------------|----------------------------------|
| ai.redact@1 | Task_DetectRedactPii | Mask PII + return regex signals |
| ai.extract@1 | Task_ExtractSemantics | VDVC semantic extraction |
| event.emit@1 | Task_EmitResult | Publish completion CloudEvent |

DMN decisions need no host capability; the runtime evaluates them.
## Output contract
`resources/schemas/vdvc-semantic-summary.schema.json` — the ai.extract@1
output. The process additionally yields the DMN-decided fields
(`personalDataRisk`, `processingRoute`, `redactionLevel`,
`humanValidationStatus`, `requiresHumanReview`).
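A consumer might sanity-check the DMN-decided fields before trusting a result. The following is a minimal sketch, assuming the process result arrives as a plain dictionary. The allowed values are those named in this README; the permitted values of `redactionLevel` are not listed here, so only its presence is checked, and treating `requiresHumanReview` as a boolean is an assumption. The helper name `check_decision_fields` is hypothetical.

```python
from typing import Any

# Enumerations taken from the README text.
ALLOWED_VALUES = {
    "personalDataRisk": {"NONE", "LOW", "MEDIUM", "HIGH"},
    "processingRoute": {"CENTRAL", "LOCAL"},
    "humanValidationStatus": {"REJECTED", "PENDING_REVIEW", "APPROVED_AUTO"},
}


def check_decision_fields(result: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the DMN-decided fields."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if field not in result:
            problems.append(f"missing {field}")
        elif result[field] not in allowed:
            problems.append(f"unexpected {field}: {result[field]!r}")
    # Allowed redaction levels are not documented here: presence check only.
    if "redactionLevel" not in result:
        problems.append("missing redactionLevel")
    # Assumption: requiresHumanReview is a boolean flag.
    if not isinstance(result.get("requiresHumanReview"), bool):
        problems.append("requiresHumanReview must be a boolean")
    return problems
```

An empty list means all five fields are present and, where the README names the allowed values, within range.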
## Compliance
EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data
minimisation. See `resources/guardrails.yaml` and `docs/`.