dokumenta-semantiska-analize/README.md

# Semantic Document Analysis

UAPF Level-4 process for semantic analysis of free-text documents,
governed by **UAPF v2.3.0** (Algorithm Cards).

## What this package does

Three BPMN service tasks invoke three UAPF-IP host capabilities:

| Task                    | Capability     | Algorithm Card |
|-------------------------|----------------|----------------|
| `Task_DetectRedactPii`  | `ai.redact@1`  | [`algorithms/pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) |
| `Task_ExtractSemantics` | `ai.extract@1` | [`algorithms/vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) |
| `Task_EmitResult`       | `event.emit@1` | [`algorithms/completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) |

Three DMN decision tables encode the deterministic policy:

- `assess-personal-data-risk` — PII regex signals → risk level
- `gdpr-processing-route` — selects CENTRAL vs LOCAL processing,
  anonymisation, redaction level
- `human-validation-gate` — confidence thresholds → REJECTED /
  PENDING_REVIEW / APPROVED_AUTO

Only `Task_ExtractSemantics` is a model-inference step (governed by the
high-risk `vdvc_semantic_extractor` Card). Everything else is
deterministic.
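
As an illustration of what "deterministic" means for the `human-validation-gate` table, the sketch below maps a confidence score to one of the three outcomes named above. The threshold values and the function name are hypothetical; the actual rules live in the DMN table and may use more inputs than a single score.

```python
# Hypothetical thresholds, for illustration only. The real values are
# defined in the `human-validation-gate` DMN table, not in this README.
REJECT_BELOW = 0.50
AUTO_APPROVE_FROM = 0.90


def human_validation_gate(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a gate outcome."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence < REJECT_BELOW:
        return "REJECTED"
    if confidence < AUTO_APPROVE_FROM:
        return "PENDING_REVIEW"
    return "APPROVED_AUTO"


for score in (0.2, 0.7, 0.95):
    print(score, human_validation_gate(score))
```

The same input always yields the same outcome, which is the property that separates the DMN steps from the model-inference step.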

## v3.0.0 — Algorithm Cards

The three opaque host capabilities are now wrapped in Algorithm Cards
under `algorithms/`. Each Card supplies, per UAPF v2.3.0 chapter 13:
intent, IO contract, ownership, validation history, risk class, audit
configuration, and (where relevant) `privacy` and `risk` extensions.

Where to find the answer to each audit question:

| Auditor asks                                   | Read this                                                   |
|------------------------------------------------|-------------------------------------------------------------|
| What does the redactor detect?                 | `algorithms/pii_redactor.card.yaml` § io                    |
| What's the AI Act risk class of the extractor? | `algorithms/vdvc_semantic_extractor.card.yaml` § risk       |
| Who owns each algorithm?                       | each Card § owners                                          |
| When was each algorithm last validated?        | each Card § validation                                      |
| What gets logged, with what retention?         | each Card § audit                                           |
| Why is human oversight needed?                 | `algorithms/vdvc_semantic_extractor.card.yaml` § confidence |
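
A quick completeness check over a parsed Card could look like the sketch below. It assumes the Card has already been loaded into a plain dict and that the top-level keys match the section names used in the table above (`io`, `risk`, `owners`, `validation`, `audit`); `intent` is an assumed key name, and the authoritative list is the UAPF v2.3.0 schema.

```python
# Section names follow the audit table where available; "intent" is an
# assumed key name. Check the UAPF v2.3.0 schema for the authoritative list.
REQUIRED_SECTIONS = ("intent", "io", "owners", "validation", "risk", "audit")


def missing_sections(card: dict) -> list[str]:
    """Return the required section names that are absent from a Card."""
    return [name for name in REQUIRED_SECTIONS if name not in card]


# Hypothetical, deliberately incomplete Card used only to show the check.
example_card = {"intent": "...", "io": {}, "owners": [], "validation": {}}
print(missing_sections(example_card))  # ['risk', 'audit']
```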

### Delta from v2.0.0

- **+** `algorithms/` folder with three Cards (one per opaque host capability).
- **+** `algorithm_cards: true` and `paths.algorithms` in `uapf.yaml` / `manifest.json`.
- **~** `resources/mappings.yaml`: single `agent.semantic-extractor` target split into three algorithm-specific targets (`agent.pii_redactor`, `agent.vdvc_semantic_extractor`, `agent.completion_event_emitter`), each carrying its `algorithm_card` reference. Binding shape unchanged.
- **~** `bpmn/semantic-document-analysis.bpmn`: **unchanged**. Algorithm Cards live on resource targets, not in the BPMN — no extension elements required.
- **−** `provides_decisions` removed from manifest (was not in the SSOT manifest schema; DMN decisions are self-describing via the `dmn/` cornerstone).
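
The split of resource targets can be pictured as follows. The target names and Card paths come from this README; the dict layout is an assumption, since the real `resources/mappings.yaml` is YAML and its exact shape is not reproduced here.

```python
# Target names and Card paths are taken from this README. The nesting is an
# assumed stand-in for the parsed contents of resources/mappings.yaml.
TARGETS = {
    "agent.pii_redactor": {
        "algorithm_card": "algorithms/pii_redactor.card.yaml",
    },
    "agent.vdvc_semantic_extractor": {
        "algorithm_card": "algorithms/vdvc_semantic_extractor.card.yaml",
    },
    "agent.completion_event_emitter": {
        "algorithm_card": "algorithms/completion_event_emitter.card.yaml",
    },
}


def targets_without_card(targets: dict) -> list[str]:
    """Return the names of targets that lack an algorithm_card reference."""
    return sorted(
        name for name, spec in targets.items() if not spec.get("algorithm_card")
    )


print(targets_without_card(TARGETS))  # []
```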

## Structure

```
.
├── uapf.yaml + manifest.json     # Package manifest (UAPF v2.3.0)
├── bpmn/                          # 1 BPMN process (unchanged from v2.0.0)
├── dmn/                           # 3 DMN decision tables (unchanged from v2.0.0)
├── algorithms/                    # 3 Algorithm Cards (NEW in v3.0.0)
├── resources/
│   ├── mappings.yaml              # Resource targets w/ algorithm_card refs (REFACTORED)
│   ├── guardrails.yaml
│   └── schemas/                   # Output JSON Schemas
├── metadata/                      # ownership + lifecycle
├── docs/                          # EU AI Act / integration notes
├── fixtures/                      # Sample inputs
└── tests/                         # Eval set
```
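
Before running the full validator, a lightweight layout check against the tree above can catch missing folders early. This is a minimal sketch using only the standard library; the function name is hypothetical and it checks presence only, not content.

```python
from pathlib import Path

# Entries mirror the directory tree shown in this README.
EXPECTED_DIRS = (
    "bpmn", "dmn", "algorithms", "resources", "resources/schemas",
    "metadata", "docs", "fixtures", "tests",
)
EXPECTED_FILES = (
    "uapf.yaml", "manifest.json",
    "resources/mappings.yaml", "resources/guardrails.yaml",
)


def missing_entries(root: Path) -> list[str]:
    """Return expected directories and files that are absent under root."""
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_entries(Path("."))` from the package root returns an empty list when every entry in the tree is present.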

## Validation

The package validates against the UAPF v2.3.0 schemas published at
`github.com/UAPFormat/UAPF-specification`:

```bash
python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
```