feat(3.0.0): Algorithm Cards per UAPF v2.3.0 chapter 13
Wrap the three opaque UAPF-IP capabilities (ai.redact@1, ai.extract@1, event.emit@1) in Algorithm Cards under algorithms/, per UAPF v2.3.0 chapter 13. Each Card supplies intent, IO contract, ownership, validation history, risk class, audit configuration, and (where relevant) privacy/risk extensions. Cards are referenced from resource targets in resources/mappings.yaml.

Changes:
- NEW algorithms/pii_redactor.card.yaml: deterministic redactor
- NEW algorithms/vdvc_semantic_extractor.card.yaml: stochastic LLM extractor, EU AI Act high-risk, human oversight mandatory
- NEW algorithms/completion_event_emitter.card.yaml: deterministic CloudEvents 1.0 emitter
- uapf.yaml + manifest.json: version 2.0.0 -> 3.0.0, + paths.algorithms, + algorithm_cards: true
- resources/mappings.yaml: single agent.semantic-extractor target split into 3 algorithm-specific targets, each w/ algorithm_card ref
- bpmn/: UNCHANGED (algorithm-card refs live on resource targets, not in BPMN, so no extension elements are required)
- Removed provides_decisions from manifest (was not in SSOT manifest schema; DMN decisions are self-describing via the dmn/ cornerstone)
- README rewritten with algorithm-card audit-question table
127
README.md
@@ -1,68 +1,79 @@
# Semantic Document Analysis

A UAPF Level-4 process package for extracting VDVC-conformant semantic
metadata from free-text documents.

UAPF Level-4 process for semantic analysis of free-text documents,
governed by **UAPF v2.3.0** (Algorithm Cards).

## What this package is
## What this package does
A **real, inspectable process**, not a single AI call in BPMN costume.
The flow has six executable nodes; three of them are DMN decision tables
that carry the actual algorithm, with explicit ranked rules and weights.

Three BPMN service tasks invoke three UAPF-IP host capabilities:

| Task | Capability | Algorithm Card |
|------|------------|----------------|
| `Task_DetectRedactPii` | `ai.redact@1` | [`algorithms/pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) |
| `Task_ExtractSemantics` | `ai.extract@1` | [`algorithms/vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) |
| `Task_EmitResult` | `event.emit@1` | [`algorithms/completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) |

Three DMN decision tables encode the deterministic policy:

- `assess-personal-data-risk`: PII regex signals → risk level
- `gdpr-processing-route`: selects CENTRAL vs LOCAL processing,
  anonymisation, redaction level
- `human-validation-gate`: confidence thresholds → REJECTED /
  PENDING_REVIEW / APPROVED_AUTO

Only `Task_ExtractSemantics` is a model-inference step (governed by the
high-risk `vdvc_semantic_extractor` Card). Everything else is
deterministic.

## v3.0.0: Algorithm Cards
The three opaque host capabilities are now wrapped in Algorithm Cards
under `algorithms/`. Each Card supplies, per UAPF v2.3.0 chapter 13:
intent, IO contract, ownership, validation history, risk class, audit
configuration, and (where relevant) `privacy` and `risk` extensions.

Audit question → answer-location:

| Auditor asks | Read this |
|--------------|-----------|
| What does the redactor detect? | `algorithms/pii_redactor.card.yaml` § io |
| What's the AI Act risk class of the extractor? | `algorithms/vdvc_semantic_extractor.card.yaml` § risk |
| Who owns each algorithm? | each Card § owners |
| When was each algorithm last validated? | each Card § validation |
| What gets logged, with what retention? | each Card § audit |
| Why is human oversight needed? | `algorithms/vdvc_semantic_extractor.card.yaml` § confidence |
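
A reviewer could spot-check that a Card carries the sections the audit table points to. The sketch below is only an illustration under stated assumptions: the Card is taken to be already parsed into a Python dict (for example by a YAML parser, not shown), the top-level keys are assumed to match the § names used above (`io`, `owners`, `validation`, `audit`), `intent` is an assumed key name, and `missing_sections` is a hypothetical helper that is not part of any UAPF tooling.

```python
# Section names mirror the "§" references in the audit table.
# "intent" is an ASSUMED key name; the README names intent only as a concept.
# The key that holds the risk class is not stated, so it is left out here.
REQUIRED_SECTIONS = ("intent", "io", "owners", "validation", "audit")


def missing_sections(card: dict) -> list:
    """Return the required top-level sections absent from a parsed Card."""
    return [name for name in REQUIRED_SECTIONS if name not in card]


# Hypothetical, heavily trimmed Card content used only for illustration.
example_card = {
    "io": {},
    "owners": [],
    "validation": {},
    "audit": {},
}

print(missing_sections(example_card))  # → ['intent']
```

An empty list would mean every assumed section is present; the authoritative check remains schema validation against the UAPF specification.
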
### Delta from v2.0.0

- **+** `algorithms/` folder with three Cards (one per opaque host capability).
- **+** `algorithm_cards: true` and `paths.algorithms` in `uapf.yaml` / `manifest.json`.
- **~** `resources/mappings.yaml`: the single `agent.semantic-extractor` target is split into three algorithm-specific targets (`agent.pii_redactor`, `agent.vdvc_semantic_extractor`, `agent.completion_event_emitter`), each carrying its `algorithm_card` reference. Binding shape unchanged.
- **~** `bpmn/semantic-document-analysis.bpmn`: **unchanged**. Algorithm Cards live on resource targets, not in the BPMN, so no extension elements are required.
- **−** `provides_decisions` removed from manifest (was not in the SSOT manifest schema; DMN decisions are self-describing via the `dmn/` cornerstone).
## Structure

```
Start
  -> [service]  Detect and redact PII          ai.redact@1
  -> [decision] Assess personal-data risk      DMN assess-personal-data-risk
  -> [decision] Decide GDPR processing route   DMN gdpr-processing-route
  -> [service]  Extract semantic metadata      ai.extract@1
  -> [decision] Determine validation status    DMN human-validation-gate
  -> [service]  Emit completed event           event.emit@1
End

.
├── uapf.yaml + manifest.json    # Package manifest (UAPF v2.3.0)
├── bpmn/                        # 1 BPMN process (unchanged from v2.0.0)
├── dmn/                         # 3 DMN decision tables (unchanged from v2.0.0)
├── algorithms/                  # 3 Algorithm Cards (NEW in v3.0.0)
├── resources/
│   ├── mappings.yaml            # Resource targets w/ algorithm_card refs (REFACTORED)
│   ├── guardrails.yaml
│   └── schemas/                 # Output JSON Schemas
├── metadata/                    # ownership + lifecycle
├── docs/                        # EU AI Act / integration notes
├── fixtures/                    # Sample inputs
└── tests/                       # Eval set
```

Only **one** node performs model inference (semantic extraction). PII
detection, risk classification, GDPR routing and the human-validation
gate are deterministic: the host cannot make them up.
## Validation

## The decision tables (dmn/)

Validates against UAPF v2.3.0 schemas at
`github.com/UAPFormat/UAPF-specification`:

### assess-personal-data-risk
PII regex signals -> `personalDataRisk`. A personas kods (Latvian
personal identity code) or an IBAN forces HIGH; two or more PII
categories, or any contact data, give MEDIUM; a single category gives
LOW; no signals give NONE. The hit policy is FIRST, so the rules are
ranked and the first match wins.
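
The ranking described above can be sketched in Python as follows. This is an illustration of the first-match ordering, not the DMN table itself: the category labels (`personas_kods`, `iban`, `email`, `phone`, `address`) and the function name are assumptions, and the actual signal names returned by `ai.redact@1` may differ.

```python
# ASSUMED category labels; the real signal names come from ai.redact@1.
FORCE_HIGH = {"personas_kods", "iban"}
CONTACT = {"email", "phone"}


def assess_personal_data_risk(categories: set) -> str:
    """Map detected PII categories to a risk level; the first matching rule wins."""
    if categories & FORCE_HIGH:
        # A personal identity code or an IBAN forces HIGH on its own.
        return "HIGH"
    if len(categories) >= 2 or categories & CONTACT:
        # Two or more categories, or any contact data, give MEDIUM.
        return "MEDIUM"
    if len(categories) == 1:
        return "LOW"
    return "NONE"


print(assess_personal_data_risk({"iban", "email"}))  # → HIGH
print(assess_personal_data_risk({"address"}))        # → LOW
```

Because the HIGH rule is checked first, a document containing both an IBAN and an e-mail address never falls through to MEDIUM.
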
### gdpr-processing-route
`personalDataRisk` x `allowCentralization` -> `processingRoute`
(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A
sensitive document whose owner has not permitted centralisation stays
LOCAL with full redaction. This is the routing rule lifted out of the
host's `generate_semantic_metadata`.

### human-validation-gate
`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` ->
`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
`requiresHumanReview`. Any leaked PII or confidence below 0.3 -> REJECTED;
below 0.7 or HIGH risk -> PENDING_REVIEW; 0.7+ with clean output ->
APPROVED_AUTO. The thresholds 0.3 / 0.7 are the weights.
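
As a rough illustration, the gate can be written as an ordered chain of checks. The sketch assumes the rules are evaluated in the order listed above, that "leaked PII" means `outputPiiErrorCount` greater than zero, and that a score of exactly 0.7 counts as "0.7+"; the function name is hypothetical, and the derivation of `requiresHumanReview` is left out because the text does not spell it out.

```python
REJECT_BELOW = 0.3   # threshold quoted in the text
REVIEW_BELOW = 0.7   # threshold quoted in the text


def human_validation_status(output_pii_error_count: int,
                            ai_confidence_score: float,
                            personal_data_risk: str) -> str:
    """Return REJECTED, PENDING_REVIEW or APPROVED_AUTO; the first match wins."""
    if output_pii_error_count > 0 or ai_confidence_score < REJECT_BELOW:
        return "REJECTED"
    if ai_confidence_score < REVIEW_BELOW or personal_data_risk == "HIGH":
        return "PENDING_REVIEW"
    # Confidence is at least 0.7, the output is clean, and risk is not HIGH.
    return "APPROVED_AUTO"


print(human_validation_status(0, 0.92, "HIGH"))  # → PENDING_REVIEW
print(human_validation_status(0, 0.92, "LOW"))   # → APPROVED_AUTO
```

Under these assumptions a HIGH-risk document is never auto-approved, however confident the extractor is.
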
## Capabilities required of the host

| Capability     | Used by                 | Purpose                         |
|----------------|-------------------------|---------------------------------|
| `ai.redact@1`  | `Task_DetectRedactPii`  | Mask PII + return regex signals |
| `ai.extract@1` | `Task_ExtractSemantics` | VDVC semantic extraction        |
| `event.emit@1` | `Task_EmitResult`       | Publish completion CloudEvent   |

DMN decisions need no host capability; the runtime evaluates them.

## Output contract

`resources/schemas/vdvc-semantic-summary.schema.json` describes the
`ai.extract@1` output. The process additionally yields the DMN-decided
fields (`personalDataRisk`, `processingRoute`, `redactionLevel`,
`humanValidationStatus`, `requiresHumanReview`).

## Compliance

EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data
minimisation. See `resources/guardrails.yaml` and `docs/`.

```bash
python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
```