feat(3.0.0): Algorithm Cards per UAPF v2.3.0 chapter 13

Wrap the three opaque UAPF-IP capabilities (ai.redact@1, ai.extract@1, event.emit@1) in Algorithm Cards under algorithms/, per UAPF v2.3.0 chapter 13. Each Card supplies intent, IO contract, ownership, validation history, risk class, audit configuration, and (where relevant) privacy/risk extensions. Cards are referenced from resource targets in resources/mappings.yaml.

Changes:
- NEW algorithms/pii_redactor.card.yaml: deterministic redactor
- NEW algorithms/vdvc_semantic_extractor.card.yaml: stochastic LLM extractor, EU AI Act high-risk, human oversight mandatory
- NEW algorithms/completion_event_emitter.card.yaml: deterministic CloudEvents 1.0 emitter
- uapf.yaml + manifest.json: version 2.0.0 -> 3.0.0, + paths.algorithms, + algorithm_cards: true
- resources/mappings.yaml: single agent.semantic-extractor target split into 3 algorithm-specific targets, each with an algorithm_card ref
- bpmn/: UNCHANGED (algorithm-card refs live on resource targets, not in BPMN, so no extension elements are required)
- Removed provides_decisions from manifest (was not in SSOT manifest schema; DMN decisions are self-describing via the dmn/ cornerstone)
- README rewritten with algorithm-card audit-question table
2026-05-20 12:34:59 +00:00
parent dd69a04355
commit 82fd21a45d
7 changed files with 372 additions and 83 deletions
--- a/README.md
+++ b/README.md
@@ -1,68 +1,79 @@
 # Semantic Document Analysis

-A UAPF Level-4 process package for extracting VDVC-conformant semantic
-metadata from free-text documents.
+UAPF Level-4 process for semantic analysis of free-text documents,
+governed by **UAPF v2.3.0** (Algorithm Cards).

-## What this package is
+## What this package does

-A **real, inspectable process** — not a single AI call in BPMN costume.
-The flow has six executable nodes; three of them are DMN decision tables
-that carry the actual algorithm, with explicit ranked rules and weights.
+Three BPMN service tasks invoke three UAPF-IP host capabilities:
+
+| Task                  | Capability     | Algorithm Card                                                      |
+|-----------------------|----------------|---------------------------------------------------------------------|
+| `Task_DetectRedactPii` | `ai.redact@1`  | [`algorithms/pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) |
+| `Task_ExtractSemantics` | `ai.extract@1` | [`algorithms/vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) |
+| `Task_EmitResult`     | `event.emit@1` | [`algorithms/completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) |
+
+Three DMN decision tables encode the deterministic policy:
+
+- `assess-personal-data-risk` — PII regex signals → risk level
+- `gdpr-processing-route` — selects CENTRAL vs LOCAL processing,
+  anonymisation, redaction level
+- `human-validation-gate` — confidence thresholds → REJECTED /
+  PENDING_REVIEW / APPROVED_AUTO
+
+Only `Task_ExtractSemantics` is a model-inference step (governed by the
+high-risk `vdvc_semantic_extractor` Card). Everything else is
+deterministic.
+
+## v3.0.0 — Algorithm Cards
+
+The three opaque host capabilities are now wrapped in Algorithm Cards
+under `algorithms/`. Each Card supplies, per UAPF v2.3.0 chapter 13:
+intent, IO contract, ownership, validation history, risk class, audit
+configuration, and (where relevant) `privacy` and `risk` extensions.
+
+Audit question → answer-location:
+
+| Auditor asks                                  | Read this                                      |
+|-----------------------------------------------|------------------------------------------------|
+| What does the redactor detect?                | `algorithms/pii_redactor.card.yaml` § io       |
+| What's the AI Act risk class of the extractor?| `vdvc_semantic_extractor.card.yaml` § risk     |
+| Who owns each algorithm?                      | each Card § owners                             |
+| When was each algorithm last validated?       | each Card § validation                         |
+| What gets logged, with what retention?        | each Card § audit                              |
+| Why is human oversight needed?                | `vdvc_semantic_extractor.card.yaml` § confidence |
+
+### Delta from v2.0.0
+
+- **+** `algorithms/` folder with three Cards (one per opaque host capability).
+- **+** `algorithm_cards: true` and `paths.algorithms` in `uapf.yaml` / `manifest.json`.
+- **~** `resources/mappings.yaml`: single `agent.semantic-extractor` target split into three algorithm-specific targets (`agent.pii_redactor`, `agent.vdvc_semantic_extractor`, `agent.completion_event_emitter`), each carrying its `algorithm_card` reference. Binding shape unchanged.
+- **~** `bpmn/semantic-document-analysis.bpmn`: **unchanged**. Algorithm Cards live on resource targets, not in the BPMN — no extension elements required.
+- **−** `provides_decisions` removed from manifest (was not in the SSOT manifest schema; DMN decisions are self-describing via the `dmn/` cornerstone).
+
+## Structure

 ```
-Start
-  -> [service]  Detect and redact PII          ai.redact@1
-  -> [decision] Assess personal-data risk      DMN assess-personal-data-risk
-  -> [decision] Decide GDPR processing route   DMN gdpr-processing-route
-  -> [service]  Extract semantic metadata      ai.extract@1
-  -> [decision] Determine validation status    DMN human-validation-gate
-  -> [service]  Emit completed event           event.emit@1
-End
+.
+├── uapf.yaml + manifest.json     # Package manifest (UAPF v2.3.0)
+├── bpmn/                          # 1 BPMN process (unchanged from v2.0.0)
+├── dmn/                           # 3 DMN decision tables (unchanged from v2.0.0)
+├── algorithms/                    # 3 Algorithm Cards (NEW in v3.0.0)
+├── resources/
+│   ├── mappings.yaml              # Resource targets w/ algorithm_card refs (REFACTORED)
+│   ├── guardrails.yaml
+│   └── schemas/                   # Output JSON Schemas
+├── metadata/                      # ownership + lifecycle
+├── docs/                          # EU AI Act / integration notes
+├── fixtures/                      # Sample inputs
+└── tests/                         # Eval set
 ```

-Only **one** node performs model inference (semantic extraction). PII
-detection, risk classification, GDPR routing and the human-validation
-gate are deterministic — the host cannot make them up.
+## Validation

-## The decision tables (dmn/)
+Validates against UAPF v2.3.0 schemas at
+`github.com/UAPFormat/UAPF-specification`:

-### assess-personal-data-risk
-PII regex signals -> `personalDataRisk`. Personas kods or IBAN forces
-HIGH; two or more PII categories, or contact data, gives MEDIUM; one
-category LOW; nothing NONE. Hit policy FIRST (ranked).
-
-### gdpr-processing-route
-`personalDataRisk` x `allowCentralization` -> `processingRoute`
-(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A
-sensitive document whose owner has not permitted centralisation stays
-LOCAL with full redaction. This is the routing rule lifted out of the
-host's `generate_semantic_metadata`.
-
-### human-validation-gate
-`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` ->
-`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
-`requiresHumanReview`. Any leaked PII or confidence below 0.3 -> REJECTED;
-below 0.7 or HIGH risk -> PENDING_REVIEW; 0.7+ with clean output ->
-APPROVED_AUTO. The thresholds 0.3 / 0.7 are the weights.
-
-## Capabilities required of the host
-
-| Capability     | Used by                | Purpose                          |
-|----------------|------------------------|----------------------------------|
-| ai.redact@1    | Task_DetectRedactPii   | Mask PII + return regex signals  |
-| ai.extract@1   | Task_ExtractSemantics  | VDVC semantic extraction         |
-| event.emit@1   | Task_EmitResult        | Publish completion CloudEvent    |
-
-DMN decisions need no host capability — the runtime evaluates them.
-
-## Output contract
-
-`resources/schemas/vdvc-semantic-summary.schema.json` — the ai.extract@1
-output. The process additionally yields the DMN-decided fields
-(`personalDataRisk`, `processingRoute`, `redactionLevel`,
-`humanValidationStatus`, `requiresHumanReview`).
-
-## Compliance
-
-EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data
-minimisation. See `resources/guardrails.yaml` and `docs/`.
+```bash
+python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
+```
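Reviewer note: the rewritten README no longer spells out the `human-validation-gate` thresholds that the removed text described (any leaked PII or confidence below 0.3 gives REJECTED; below 0.7 or HIGH risk gives PENDING_REVIEW; 0.7+ with clean output gives APPROVED_AUTO). The Python sketch below restates those rules for reference only. It is not the DMN table itself: the function name, the argument names in snake_case, and the top-to-bottom rule order are assumptions made for illustration.

```python
def human_validation_gate(output_pii_error_count: int,
                          ai_confidence_score: float,
                          personal_data_risk: str) -> str:
    """Illustrative restatement of the human-validation-gate rules.

    Hypothetical helper, not part of the package. Rules are checked
    top to bottom and the first match wins (an assumed evaluation order).
    """
    # Any leaked PII in the output, or very low confidence, rejects the result.
    if output_pii_error_count > 0 or ai_confidence_score < 0.3:
        return "REJECTED"
    # Middling confidence or a HIGH-risk document goes to a human reviewer.
    if ai_confidence_score < 0.7 or personal_data_risk == "HIGH":
        return "PENDING_REVIEW"
    # Confidence of 0.7 or more with clean output is approved automatically.
    return "APPROVED_AUTO"


if __name__ == "__main__":
    print(human_validation_gate(0, 0.9, "LOW"))
    print(human_validation_gate(0, 0.5, "LOW"))
    print(human_validation_gate(1, 0.9, "NONE"))
```

Checking the rejection rule first means a leaked-PII output can never be routed to review or auto-approval, whatever its confidence score.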