# Semantic Document Analysis

UAPF Level-4 process for semantic analysis of free-text documents,
governed by **UAPF v2.3.0** (Algorithm Cards).

## What this package does

Three BPMN service tasks invoke three UAPF-IP host capabilities:

| Task                    | Capability     | Algorithm Card |
|-------------------------|----------------|----------------|
| `Task_DetectRedactPii`  | `ai.redact@1`  | [`algorithms/pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) |
| `Task_ExtractSemantics` | `ai.extract@1` | [`algorithms/vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) |
| `Task_EmitResult`       | `event.emit@1` | [`algorithms/completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) |

Three DMN decision tables encode the deterministic policy:

- `assess-personal-data-risk`: maps PII regex signals to a risk level
- `gdpr-processing-route`: selects CENTRAL vs LOCAL processing, anonymisation, and redaction level
- `human-validation-gate`: maps confidence thresholds to REJECTED / PENDING_REVIEW / APPROVED_AUTO

Only `Task_ExtractSemantics` is a model-inference step (governed by the
high-risk `vdvc_semantic_extractor` Card). Everything else is
deterministic.

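As a rough illustration of what the `human-validation-gate` decision does, the Python sketch below maps an extractor confidence score to one of the three outcomes. The threshold values and the function name are assumptions made for this example only; the authoritative rules are in the DMN table under `dmn/`.

```python
# Hypothetical thresholds, chosen only for illustration. The real values
# are defined in the `human-validation-gate` DMN table, not here.
REJECT_BELOW = 0.50
AUTO_APPROVE_FROM = 0.90


def human_validation_gate(confidence: float) -> str:
    """Map an extractor confidence score in [0, 1] to a gate outcome."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence < REJECT_BELOW:
        return "REJECTED"
    if confidence < AUTO_APPROVE_FROM:
        return "PENDING_REVIEW"
    return "APPROVED_AUTO"


if __name__ == "__main__":
    for score in (0.30, 0.75, 0.95):
        print(score, human_validation_gate(score))
```

Keeping the gate as a pure function of the confidence score mirrors the point made above: the routing policy itself is deterministic, and only the score fed into it comes from model inference.
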
## v3.0.0 — Algorithm Cards

The three opaque host capabilities are now wrapped in Algorithm Cards
under `algorithms/`. Each Card supplies, per UAPF v2.3.0 chapter 13:
intent, IO contract, ownership, validation history, risk class, audit
configuration, and (where relevant) `privacy` and `risk` extensions.

Audit question → answer location:

| Auditor asks                                   | Read this                                                     |
|------------------------------------------------|---------------------------------------------------------------|
| What does the redactor detect?                 | `algorithms/pii_redactor.card.yaml` § io                      |
| What's the AI Act risk class of the extractor? | `algorithms/vdvc_semantic_extractor.card.yaml` § risk         |
| Who owns each algorithm?                       | each Card § owners                                            |
| When was each algorithm last validated?        | each Card § validation                                        |
| What gets logged, with what retention?         | each Card § audit                                             |
| Why is human oversight needed?                 | `algorithms/vdvc_semantic_extractor.card.yaml` § confidence   |

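The table above can be turned into a simple pre-audit check. The sketch below assumes a Card has already been parsed into a Python `dict` whose top-level keys match the section names in the table (`io`, `risk`, `owners`, `validation`, `audit`, `confidence`). Which sections are mandatory for which Card is an assumption of this example, not a statement of the UAPF schema.

```python
# Section names come from the audit table; the split into "common" and
# "high-risk only" sections is an assumption made for this sketch.
COMMON_SECTIONS = ("io", "owners", "validation", "audit")
HIGH_RISK_SECTIONS = ("risk", "confidence")


def missing_sections(card: dict, high_risk: bool = False) -> list[str]:
    """Return the audit-relevant sections that are absent from a parsed Card."""
    required = COMMON_SECTIONS + (HIGH_RISK_SECTIONS if high_risk else ())
    return [name for name in required if name not in card]


if __name__ == "__main__":
    # Hypothetical, heavily abbreviated Card content.
    extractor_card = {"io": {}, "owners": [], "validation": {}, "audit": {}, "risk": {}}
    print(missing_sections(extractor_card, high_risk=True))  # ['confidence']
```

An empty result means every question in the table has a place to be answered; it says nothing about whether the answers themselves are adequate.
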
### Delta from v2.0.0

- **+** `algorithms/` folder with three Cards (one per opaque host capability).
- **+** `algorithm_cards: true` and `paths.algorithms` in `uapf.yaml` / `manifest.json`.
- **~** `resources/mappings.yaml`: the single `agent.semantic-extractor` target is split into three algorithm-specific targets (`agent.pii_redactor`, `agent.vdvc_semantic_extractor`, `agent.completion_event_emitter`), each carrying its `algorithm_card` reference. Binding shape unchanged.
- **=** `bpmn/semantic-document-analysis.bpmn`: **unchanged**. Algorithm Card references live on resource targets, not in the BPMN, so no extension elements are required.
- **−** `provides_decisions` removed from the manifest (it was not in the SSOT manifest schema; DMN decisions are self-describing via the `dmn/` cornerstone).

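To make the target split concrete, the sketch below models the three resource targets as a plain Python dictionary and checks that each one carries an `algorithm_card` reference. The target ids, capabilities, and Card paths are taken from this README; the dictionary layout and the `capability` key are hypothetical stand-ins and are not the actual schema of `resources/mappings.yaml`.

```python
# Hypothetical in-memory view of the three targets. Only the ids,
# capabilities, and Card paths are taken from the README.
TARGETS = {
    "agent.pii_redactor": {
        "capability": "ai.redact@1",
        "algorithm_card": "algorithms/pii_redactor.card.yaml",
    },
    "agent.vdvc_semantic_extractor": {
        "capability": "ai.extract@1",
        "algorithm_card": "algorithms/vdvc_semantic_extractor.card.yaml",
    },
    "agent.completion_event_emitter": {
        "capability": "event.emit@1",
        "algorithm_card": "algorithms/completion_event_emitter.card.yaml",
    },
}


def targets_without_card(targets: dict) -> list[str]:
    """Return ids of targets that lack an algorithm_card reference."""
    return sorted(tid for tid, spec in targets.items() if not spec.get("algorithm_card"))


def cards_by_capability(targets: dict) -> dict[str, str]:
    """Index Card paths by the host capability they describe."""
    return {spec["capability"]: spec["algorithm_card"] for spec in targets.values()}


if __name__ == "__main__":
    print(targets_without_card(TARGETS))
    print(cards_by_capability(TARGETS)["ai.extract@1"])
```

Because the reference sits on the target rather than on the BPMN task, a check like this only needs the mappings file, which is consistent with the BPMN staying unchanged.
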
## Structure

```
.
├── uapf.yaml + manifest.json     # Package manifest (UAPF v2.3.0)
├── bpmn/                         # 1 BPMN process (unchanged from v2.0.0)
├── dmn/                          # 3 DMN decision tables (unchanged from v2.0.0)
├── algorithms/                   # 3 Algorithm Cards (NEW in v3.0.0)
├── resources/
│   ├── mappings.yaml             # Resource targets w/ algorithm_card refs (REFACTORED)
│   ├── guardrails.yaml
│   └── schemas/                  # Output JSON Schemas
├── metadata/                     # ownership + lifecycle
├── docs/                         # EU AI Act / integration notes
├── fixtures/                     # Sample inputs
└── tests/                        # Eval set
```

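A quick layout check can catch a missing folder before running the full validator. The following sketch lists the entries from the tree above and reports any that are absent under a given package root. Treating every entry as required is an assumption of this example, and the helper name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Entries copied from the tree above. Whether each one is strictly
# required by UAPF is not stated here; this sketch assumes they all are.
EXPECTED = (
    "uapf.yaml",
    "manifest.json",
    "bpmn",
    "dmn",
    "algorithms",
    "resources/mappings.yaml",
    "resources/guardrails.yaml",
    "resources/schemas",
    "metadata",
    "docs",
    "fixtures",
    "tests",
)


def missing_entries(root: Path) -> list[str]:
    """Return the expected relative paths that do not exist under root."""
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    print(missing_entries(Path(".")))
```

This complements the schema validation described under Validation; it does not replace it.
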
## Validation

Validates against UAPF v2.3.0 schemas at
`github.com/UAPFormat/UAPF-specification`:

```bash
python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
```