dokumenta-semantiska-analize/README.md

# Semantic Document Analysis

A UAPF Level-4 process package for extracting VDVC-conformant semantic
metadata from free-text documents.

## What this package is

A **real, inspectable process** — not a single AI call in BPMN costume.
The flow has six executable nodes; three of them are DMN decision tables
that carry the actual algorithm, with explicit ranked rules and weights.

```
Start
  -> [service]  Detect and redact PII          ai.redact@1
  -> [decision] Assess personal-data risk      DMN assess-personal-data-risk
  -> [decision] Decide GDPR processing route   DMN gdpr-processing-route
  -> [service]  Extract semantic metadata      ai.extract@1
  -> [decision] Determine validation status    DMN human-validation-gate
  -> [service]  Emit completed event           event.emit@1
End
```

Only **one** node performs model inference (semantic extraction). PII
detection, risk classification, GDPR routing and the human-validation
gate are deterministic: regex signals and DMN rules fix their results,
so the host cannot improvise them.
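
The node order above can be sketched as a plain sequential runner. This is only an illustration of the ordering, not the UAPF runtime: `run_flow`, the `handlers` mapping and the shared context dict are hypothetical names introduced here.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Node = Callable[[Context], Context]

# Node order copied from the flow diagram. "service" nodes call a host
# capability; "decision" nodes are DMN tables evaluated by the runtime.
FLOW: List[Tuple[str, str]] = [
    ("service", "ai.redact@1"),
    ("decision", "assess-personal-data-risk"),
    ("decision", "gdpr-processing-route"),
    ("service", "ai.extract@1"),
    ("decision", "human-validation-gate"),
    ("service", "event.emit@1"),
]


def run_flow(text: str, handlers: Dict[str, Node]) -> Context:
    """Run every node in order, merging each handler's fields into the context."""
    ctx: Context = {"text": text, "trace": []}
    for kind, name in FLOW:
        ctx.update(handlers[name](ctx))        # hypothetical handler signature
        ctx["trace"].append(f"{kind}:{name}")  # record which node just ran
    return ctx
```

Passing one handler per node name keeps the sketch independent of how a real host binds capabilities.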

## The decision tables (dmn/)

### assess-personal-data-risk
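PII regex signals -> `personalDataRisk`. A personas kods (Latvian
personal identity code) or an IBAN forces HIGH; two or more PII
categories, or any contact data, give MEDIUM; a single category gives
LOW; no PII gives NONE. Hit policy is FIRST, so the rules are ranked and
the first match wins.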
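
A minimal Python sketch of the same ranked, first-match evaluation. The category names (`personas_kods`, `iban`, `email`, `phone`) are assumptions made for illustration; the real signal names come from ai.redact@1 and the table in `dmn/`.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical category names, chosen only for this sketch.
STRONG_IDENTIFIERS = frozenset({"personas_kods", "iban"})
CONTACT_CATEGORIES = frozenset({"email", "phone"})

Rule = Tuple[Callable[[FrozenSet[str]], bool], str]

# Rules are ranked: with hit policy FIRST the first matching row wins.
RULES: List[Rule] = [
    (lambda cats: bool(cats & STRONG_IDENTIFIERS), "HIGH"),
    (lambda cats: len(cats) >= 2 or bool(cats & CONTACT_CATEGORIES), "MEDIUM"),
    (lambda cats: len(cats) == 1, "LOW"),
    (lambda cats: True, "NONE"),  # catch-all row
]


def assess_personal_data_risk(categories: Iterable[str]) -> str:
    """Return the risk level of the first rule whose condition matches."""
    cats = frozenset(categories)
    for condition, risk in RULES:
        if condition(cats):
            return risk
    raise AssertionError("unreachable: the last rule is a catch-all")
```

Because the rows are ranked, a document containing only an e-mail address lands on MEDIUM rather than LOW: the contact-data row is checked before the single-category row.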

### gdpr-processing-route
`personalDataRisk` x `allowCentralization` -> `processingRoute`
(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A
sensitive document whose owner has not permitted centralisation stays
LOCAL with full redaction. This is the routing rule lifted out of the
host's `generate_semantic_metadata`.
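
The one rule spelled out above can be sketched as follows. Several details are assumptions, not taken from the table: that "sensitive" means `personalDataRisk == "HIGH"`, that full redaction is encoded as `"FULL"`, and that `anonymizationRequired` is true on this row. The remaining rows live in `dmn/` and are deliberately not guessed here.

```python
from typing import Dict, Optional, Union

RouteOutput = Dict[str, Union[str, bool]]


def decide_gdpr_route(personal_data_risk: str,
                      allow_centralization: bool) -> Optional[RouteOutput]:
    """Encode only the documented row; return None for rows not shown here."""
    sensitive = personal_data_risk == "HIGH"  # assumption: HIGH means sensitive
    if sensitive and not allow_centralization:
        return {
            "processingRoute": "LOCAL",
            "anonymizationRequired": True,  # assumption, not stated in the text
            "redactionLevel": "FULL",       # assumption: label for full redaction
        }
    return None  # other rows are defined in dmn/ and not reproduced in this sketch
```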

### human-validation-gate
`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` ->
`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
`requiresHumanReview`. Any leaked PII or confidence below 0.3 -> REJECTED;
otherwise confidence below 0.7 or HIGH risk -> PENDING_REVIEW; confidence
of 0.7 or above with clean output and risk below HIGH -> APPROVED_AUTO.
The thresholds 0.3 and 0.7 are the table's weights.
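
The gate can be sketched in a few lines of Python, assuming the rules are evaluated top to bottom in the order listed. The function and constant names are hypothetical, and `requiresHumanReview` is left out because the text does not say how it maps to each status.

```python
REJECT_BELOW = 0.3  # confidence below this is rejected outright
REVIEW_BELOW = 0.7  # confidence below this needs a human reviewer


def human_validation_status(output_pii_error_count: int,
                            ai_confidence_score: float,
                            personal_data_risk: str) -> str:
    """Return REJECTED, PENDING_REVIEW or APPROVED_AUTO (first match wins)."""
    # Rule 1: any PII leaked into the output, or very low confidence.
    if output_pii_error_count > 0 or ai_confidence_score < REJECT_BELOW:
        return "REJECTED"
    # Rule 2: middling confidence, or a HIGH-risk source document.
    if ai_confidence_score < REVIEW_BELOW or personal_data_risk == "HIGH":
        return "PENDING_REVIEW"
    # Rule 3: confidence of 0.7 or above with clean output.
    return "APPROVED_AUTO"
```

Note that a HIGH-risk document never reaches APPROVED_AUTO in this reading, however confident the extraction is.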

## Capabilities required of the host

| Capability     | Used by                | Purpose                          |
|----------------|------------------------|----------------------------------|
| ai.redact@1    | Task_DetectRedactPii   | Mask PII + return regex signals  |
| ai.extract@1   | Task_ExtractSemantics  | VDVC semantic extraction         |
| event.emit@1   | Task_EmitResult        | Publish completion CloudEvent    |

DMN decisions need no host capability — the runtime evaluates them.
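
A host could check the table above before starting the process. The sketch below is one possible pre-flight check; `missing_capabilities` and the idea of the host advertising a list of capability strings are assumptions, not part of the package.

```python
from typing import Dict, Iterable, List

# Capability -> BPMN task that uses it, copied from the table above.
REQUIRED_CAPABILITIES: Dict[str, str] = {
    "ai.redact@1": "Task_DetectRedactPii",
    "ai.extract@1": "Task_ExtractSemantics",
    "event.emit@1": "Task_EmitResult",
}


def missing_capabilities(offered: Iterable[str]) -> List[str]:
    """Return the required capabilities the host does not offer, sorted."""
    offered_set = set(offered)
    return sorted(cap for cap in REQUIRED_CAPABILITIES if cap not in offered_set)
```

An empty result means all three service tasks can be bound; the DMN decisions are not listed because the runtime evaluates them itself.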

## Output contract

`resources/schemas/vdvc-semantic-summary.schema.json` defines the
ai.extract@1 output. The process additionally yields the DMN-decided
fields (`personalDataRisk`, `processingRoute`, `redactionLevel`,
`humanValidationStatus`, `requiresHumanReview`).
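
As a rough illustration, the two parts of the result could be combined like this. The flat merge and the `build_process_output` helper are assumptions; the package does not state how the DMN-decided fields are attached to the schema-conformant summary.

```python
from typing import Any, Dict

# DMN-decided fields named in this section.
DMN_FIELDS = (
    "personalDataRisk",
    "processingRoute",
    "redactionLevel",
    "humanValidationStatus",
    "requiresHumanReview",
)


def build_process_output(semantic_summary: Dict[str, Any],
                         decisions: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the ai.extract@1 summary with the five DMN-decided fields."""
    missing = [field for field in DMN_FIELDS if field not in decisions]
    if missing:
        raise KeyError(f"missing DMN-decided fields: {missing}")
    output = dict(semantic_summary)  # copy so the caller's dict is untouched
    output.update({field: decisions[field] for field in DMN_FIELDS})
    return output
```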

## Compliance

Scope: EU AI Act (Regulation (EU) 2024/1689), Annex III high-risk; GDPR
(Regulation (EU) 2016/679), data minimisation. See
`resources/guardrails.yaml` and `docs/`.