
feat(3.0.0): Algorithm Cards per UAPF v2.3.0 chapter 13

Wrap the three opaque UAPF-IP capabilities (ai.redact@1, ai.extract@1,
event.emit@1) in Algorithm Cards under algorithms/, per UAPF v2.3.0
chapter 13. Each Card supplies intent, IO contract, ownership,
validation history, risk class, audit configuration, and (where
relevant) privacy/risk extensions. Cards are referenced from resource
targets in resources/mappings.yaml.

Changes:
- NEW algorithms/pii_redactor.card.yaml — deterministic redactor
- NEW algorithms/vdvc_semantic_extractor.card.yaml — stochastic LLM
  extractor, EU AI Act high-risk, human oversight mandatory
- NEW algorithms/completion_event_emitter.card.yaml — deterministic
  CloudEvents 1.0 emitter
- uapf.yaml + manifest.json: version 2.0.0 -> 3.0.0,
  + paths.algorithms, + algorithm_cards: true
- resources/mappings.yaml: single agent.semantic-extractor target
  split into 3 algorithm-specific targets, each w/ algorithm_card ref
- bpmn/: UNCHANGED (algorithm-card refs live on resource targets,
  not in BPMN — no extension elements required)
- Removed provides_decisions from manifest (was not in SSOT manifest
  schema; DMN decisions are self-describing via the dmn/ cornerstone)
- README rewritten with algorithm-card audit-question table
2026-05-20 12:34:59 +00:00
parent dd69a04355
commit 82fd21a45d
7 changed files with 372 additions and 83 deletions

README.md

@@ -1,68 +1,79 @@
# Semantic Document Analysis
A UAPF Level-4 process package for extracting VDVC-conformant semantic
metadata from free-text documents.
UAPF Level-4 process for semantic analysis of free-text documents,
governed by **UAPF v2.3.0** (Algorithm Cards).
## What this package is
## What this package does
A **real, inspectable process** — not a single AI call in BPMN costume.
The flow has six executable nodes; three of them are DMN decision tables
that carry the actual algorithm, with explicit ranked rules and weights.
Three BPMN service tasks invoke three UAPF-IP host capabilities:
| Task | Capability | Algorithm Card |
|-----------------------|----------------|---------------------------------------------------------------------|
| `Task_DetectRedactPii`| `ai.redact@1` | [`algorithms/pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) |
| `Task_ExtractSemantics`| `ai.extract@1`| [`algorithms/vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) |
| `Task_EmitResult` | `event.emit@1` | [`algorithms/completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) |
Three DMN decision tables encode the deterministic policy:
- `assess-personal-data-risk` — PII regex signals → risk level
- `gdpr-processing-route` — selects CENTRAL vs LOCAL processing,
anonymisation, redaction level
- `human-validation-gate` — confidence thresholds → REJECTED /
PENDING_REVIEW / APPROVED_AUTO
Only `Task_ExtractSemantics` is a model-inference step (governed by the
high-risk `vdvc_semantic_extractor` Card). Everything else is
deterministic.
## v3.0.0 — Algorithm Cards
The three opaque host capabilities are now wrapped in Algorithm Cards
under `algorithms/`. Each Card supplies, per UAPF v2.3.0 chapter 13:
intent, IO contract, ownership, validation history, risk class, audit
configuration, and (where relevant) `privacy` and `risk` extensions.
Audit question → answer-location:
| Auditor asks | Read this |
|-----------------------------------------------|------------------------------------------------|
| What does the redactor detect? | `algorithms/pii_redactor.card.yaml` § io |
| What's the AI Act risk class of the extractor?| `vdvc_semantic_extractor.card.yaml` § risk |
| Who owns each algorithm? | each Card § owners |
| When was each algorithm last validated? | each Card § validation |
| What gets logged, with what retention? | each Card § audit |
| Why is human oversight needed? | `vdvc_semantic_extractor.card.yaml` § confidence |
### Delta from v2.0.0
- **+** `algorithms/` folder with three Cards (one per opaque host capability).
- **+** `algorithm_cards: true` and `paths.algorithms` in `uapf.yaml` / `manifest.json`.
- **~** `resources/mappings.yaml`: single `agent.semantic-extractor` target split into three algorithm-specific targets (`agent.pii_redactor`, `agent.vdvc_semantic_extractor`, `agent.completion_event_emitter`), each carrying its `algorithm_card` reference. Binding shape unchanged.
- **~** `bpmn/semantic-document-analysis.bpmn`: **unchanged**. Algorithm Cards live on resource targets, not in the BPMN — no extension elements required.
- **−** `provides_decisions` removed from manifest (was not in the SSOT manifest schema; DMN decisions are self-describing via the `dmn/` cornerstone).
## Structure
```
Start
-> [service] Detect and redact PII ai.redact@1
-> [decision] Assess personal-data risk DMN assess-personal-data-risk
-> [decision] Decide GDPR processing route DMN gdpr-processing-route
-> [service] Extract semantic metadata ai.extract@1
-> [decision] Determine validation status DMN human-validation-gate
-> [service] Emit completed event event.emit@1
End
.
├── uapf.yaml + manifest.json # Package manifest (UAPF v2.3.0)
├── bpmn/ # 1 BPMN process (unchanged from v2.0.0)
├── dmn/ # 3 DMN decision tables (unchanged from v2.0.0)
├── algorithms/ # 3 Algorithm Cards (NEW in v3.0.0)
├── resources/
│ ├── mappings.yaml # Resource targets w/ algorithm_card refs (REFACTORED)
│ ├── guardrails.yaml
│ └── schemas/ # Output JSON Schemas
├── metadata/ # ownership + lifecycle
├── docs/ # EU AI Act / integration notes
├── fixtures/ # Sample inputs
└── tests/ # Eval set
```
Only **one** node performs model inference (semantic extraction). PII
detection, risk classification, GDPR routing and the human-validation
gate are deterministic — the host cannot make them up.
## Validation
## The decision tables (dmn/)
Validates against UAPF v2.3.0 schemas at
`github.com/UAPFormat/UAPF-specification`:
### assess-personal-data-risk
PII regex signals -> `personalDataRisk`. Personas kods or IBAN forces
HIGH; two or more PII categories, or contact data, gives MEDIUM; one
category LOW; nothing NONE. Hit policy FIRST (ranked).
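
The ranked rules above can be read as a first-match function. The following is only a minimal Python sketch of that reading, not the DMN table itself: the category names (`personas_kods`, `iban`, `email`, `phone`) and the idea that the redactor reports detected categories as a set are assumptions made for illustration.

```python
# Hypothetical category labels; the real signal names come from ai.redact@1.
FORCE_HIGH = frozenset({"personas_kods", "iban"})
CONTACT = frozenset({"email", "phone"})


def assess_personal_data_risk(categories: set[str]) -> str:
    """Return personalDataRisk using FIRST hit policy (rules checked in rank order)."""
    # Rule 1: a personal code or IBAN forces HIGH.
    if categories & FORCE_HIGH:
        return "HIGH"
    # Rule 2: two or more PII categories, or any contact data, gives MEDIUM.
    if len(categories) >= 2 or categories & CONTACT:
        return "MEDIUM"
    # Rule 3: exactly one (non-contact) category gives LOW.
    if len(categories) == 1:
        return "LOW"
    # Rule 4: nothing detected.
    return "NONE"
```

Because the hit policy is FIRST, rule order matters: a lone contact-data category matches rule 2 before rule 3 is ever considered.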
### gdpr-processing-route
`personalDataRisk` x `allowCentralization` -> `processingRoute`
(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A
sensitive document whose owner has not permitted centralisation stays
LOCAL with full redaction. This is the routing rule lifted out of the
host's `generate_semantic_metadata`.
### human-validation-gate
`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` ->
`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
`requiresHumanReview`. Any leaked PII or confidence below 0.3 -> REJECTED;
below 0.7 or HIGH risk -> PENDING_REVIEW; 0.7+ with clean output ->
APPROVED_AUTO. The thresholds 0.3 / 0.7 are the weights.
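
As a rough illustration of the gate described above, the thresholds can be written as a first-match function. This is a sketch under assumptions, not the DMN table: the Python function and parameter names are invented here, and `requiresHumanReview` is left out because the text does not say how it is derived from the status.

```python
REJECT_BELOW = 0.3   # confidence below this is rejected outright
REVIEW_BELOW = 0.7   # confidence below this needs a human reviewer


def human_validation_gate(
    output_pii_error_count: int,
    ai_confidence_score: float,
    personal_data_risk: str,
) -> str:
    """Return humanValidationStatus; rules are checked in rank order."""
    # Any leaked PII or very low confidence: reject.
    if output_pii_error_count > 0 or ai_confidence_score < REJECT_BELOW:
        return "REJECTED"
    # Middling confidence or HIGH personal-data risk: send to review.
    if ai_confidence_score < REVIEW_BELOW or personal_data_risk == "HIGH":
        return "PENDING_REVIEW"
    # Confidence of 0.7 or more with clean output and non-HIGH risk.
    return "APPROVED_AUTO"
```

Note that under this reading a HIGH-risk document is never auto-approved, whatever the confidence score.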
## Capabilities required of the host
| Capability | Used by | Purpose |
|----------------|------------------------|----------------------------------|
| ai.redact@1 | Task_DetectRedactPii | Mask PII + return regex signals |
| ai.extract@1 | Task_ExtractSemantics | VDVC semantic extraction |
| event.emit@1 | Task_EmitResult | Publish completion CloudEvent |
DMN decisions need no host capability — the runtime evaluates them.
## Output contract
`resources/schemas/vdvc-semantic-summary.schema.json` — the ai.extract@1
output. The process additionally yields the DMN-decided fields
(`personalDataRisk`, `processingRoute`, `redactionLevel`,
`humanValidationStatus`, `requiresHumanReview`).
## Compliance
EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data
minimisation. See `resources/guardrails.yaml` and `docs/`.
```bash
python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
```