# Semantic Document Analysis

UAPF Level-4 process for semantic analysis of free-text documents, governed by **UAPF v2.3.0** (Algorithm Cards).

## What this package does

Three BPMN service tasks invoke three UAPF-IP host capabilities:

| Task                    | Capability     | Algorithm Card                                                                                   |
|-------------------------|----------------|--------------------------------------------------------------------------------------------------|
| `Task_DetectRedactPii`  | `ai.redact@1`  | [`algorithms/pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml)                         |
| `Task_ExtractSemantics` | `ai.extract@1` | [`algorithms/vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml)   |
| `Task_EmitResult`       | `event.emit@1` | [`algorithms/completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) |

Three DMN decision tables encode the deterministic policy:

- `assess-personal-data-risk`: PII regex signals → risk level
- `gdpr-processing-route`: selects CENTRAL vs LOCAL processing, anonymisation, and redaction level
- `human-validation-gate`: confidence thresholds → REJECTED / PENDING_REVIEW / APPROVED_AUTO

Only `Task_ExtractSemantics` is a model-inference step (governed by the high-risk `vdvc_semantic_extractor` Card). Everything else is deterministic.

## v3.0.0: Algorithm Cards

In v3.0.0, the three opaque host capabilities are wrapped in Algorithm Cards under `algorithms/`. Per UAPF v2.3.0 chapter 13, each Card supplies intent, IO contract, ownership, validation history, risk class, audit configuration, and, where relevant, `privacy` and `risk` extensions.

Where each audit question is answered:

| Auditor asks                                       | Read this                                                   |
|----------------------------------------------------|-------------------------------------------------------------|
| What does the redactor detect?                     | `algorithms/pii_redactor.card.yaml` § io                    |
| What is the EU AI Act risk class of the extractor? | `algorithms/vdvc_semantic_extractor.card.yaml` § risk       |
| Who owns each algorithm?                           | each Card § owners                                          |
| When was each algorithm last validated?            | each Card § validation                                      |
| What gets logged, and with what retention?         | each Card § audit                                           |
| Why is human oversight needed?                     | `algorithms/vdvc_semantic_extractor.card.yaml` § confidence |

### Delta from v2.0.0

- **+** `algorithms/` folder with three Cards (one per opaque host capability).
- **+** `algorithm_cards: true` and `paths.algorithms` in `uapf.yaml` / `manifest.json`.
- **~** `resources/mappings.yaml`: the single `agent.semantic-extractor` target is split into three algorithm-specific targets (`agent.pii_redactor`, `agent.vdvc_semantic_extractor`, `agent.completion_event_emitter`), each carrying its `algorithm_card` reference. The binding shape is unchanged.
- **~** `bpmn/semantic-document-analysis.bpmn`: **unchanged**. Algorithm Cards live on resource targets, not in the BPMN, so no extension elements are required.
- **−** `provides_decisions` removed from the manifest (it was not in the SSOT manifest schema; DMN decisions are self-describing via the `dmn/` cornerstone).

## Structure

```
.
├── uapf.yaml + manifest.json   # Package manifest (UAPF v2.3.0)
├── bpmn/                       # 1 BPMN process (unchanged from v2.0.0)
├── dmn/                        # 3 DMN decision tables (unchanged from v2.0.0)
├── algorithms/                 # 3 Algorithm Cards (NEW in v3.0.0)
├── resources/
│   ├── mappings.yaml           # Resource targets w/ algorithm_card refs (REFACTORED)
│   ├── guardrails.yaml
│   └── schemas/                # Output JSON Schemas
├── metadata/                   # Ownership + lifecycle
├── docs/                       # EU AI Act / integration notes
├── fixtures/                   # Sample inputs
└── tests/                      # Eval set
```

## Validation

The package validates against the UAPF v2.3.0 schemas at `github.com/UAPFormat/UAPF-specification`:

```bash
python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
```
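The CLI command above is the authoritative schema check. As a lighter complement, here is a minimal sketch of a local pre-flight check that confirms each Card carries the sections the audit table points to. It assumes each Card parses (for example with PyYAML's `yaml.safe_load`) into a dict whose top-level keys match those section names. The `intent` key name, the `REQUIRED_SECTIONS` / `EXTRA_SECTIONS` constants, and the `missing_sections` / `check_cards` helpers are hypothetical and not part of the UAPF tooling.

```python
"""Hypothetical pre-flight check for the Algorithm Cards under algorithms/.

Assumption: each Card, once parsed from YAML, is a dict whose top-level keys
are the section names used in the README's audit table. The "intent" key name
is an assumption; the README only lists "intent" as Card content.
"""

# Sections every Card is assumed to carry (audit table plus "intent").
REQUIRED_SECTIONS = ("intent", "io", "owners", "validation", "risk", "audit")

# Extra sections assumed only for specific Cards (hypothetical mapping).
EXTRA_SECTIONS = {
    "vdvc_semantic_extractor": ("confidence",),
}


def missing_sections(card_name: str, card: dict) -> list:
    """Return the expected top-level sections absent from a parsed Card."""
    expected = REQUIRED_SECTIONS + EXTRA_SECTIONS.get(card_name, ())
    return [section for section in expected if section not in card]


def check_cards(cards: dict) -> dict:
    """Map each Card name to its missing sections, omitting complete Cards."""
    report = {}
    for name, card in cards.items():
        gaps = missing_sections(name, card)
        if gaps:
            report[name] = gaps
    return report


if __name__ == "__main__":
    # Stand-in data; in practice each dict could come from PyYAML's
    # yaml.safe_load() on algorithms/<name>.card.yaml.
    complete = {key: {} for key in REQUIRED_SECTIONS}
    cards = {
        "pii_redactor": dict(complete),
        # Deliberately missing its "confidence" section.
        "vdvc_semantic_extractor": dict(complete),
    }
    print(check_cards(cards))  # {'vdvc_semantic_extractor': ['confidence']}
```

The check tests only for the presence of sections, not their contents, so it stays independent of the Card schema itself; content rules remain the job of the CLI validator.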