1
0

3 Commits

Author SHA1 Message Date
7e9cb63a4b Merge pull request 'feat(3.2.0): align with UAPF v2.5.0 — embed algorithm card tests, drop sidecar' (#3) from v3.2.0-embedded-tests into main
Reviewed-on: #3
2026-05-21 08:27:54 +00:00
9b3790c1fa feat(3.2.0): align with UAPF v2.5.0 — embed algorithm card tests, drop sidecar
Per UAPF v2.5.0, tests move from sidecar files
(tests/algorithms/<card-id>.test.yaml — removed in v2.5.0) into a
top-level tests array on each algorithm card. Minimum two entries per
card; the Algorithm Card viewer (UAPF chapter 13.16, ProcessGit
Preview tab) consumes these as its primary interaction surface.

This package's three cards now carry embedded tests:

- algo.semantic_document_analysis.pii_redactor (deterministic redactor)
  — 3 cases: Latvian personas kods inline (positive — three entity
  types detected), plain administrative text (negative — no PII
  signals), financial figures with IBAN (mixed — financial yes,
  personas_kods no).

- algo.semantic_document_analysis.vdvc_semantic_extractor (stochastic
  LLM extractor, EU AI Act high-risk + mandatory oversight) — 2
  cases: regulatory construction-permit appeal (in-domain, expected
  topic + applicable_regulations), non-regulatory thank-you note
  (out-of-domain, low confidence). Both carry ai_confidence_score
  tolerance bands appropriate for a stochastic output.

- algo.semantic_document_analysis.completion_event_emitter
  (deterministic CloudEvents emitter) — 2 cases: successful
  completion event, failure completion event. The emitter does not
  gate on payload contents, so both succeed.

Other changes:
- uapf.yaml + manifest.json: version 3.1.0 -> 3.2.0
- README.md: v3.2.0 section added describing embedded tests and the
  removed sidecar location

BPMN file unchanged from v3.1.0 — uapf:algorithmCardRef on each
service task per UAPF v2.4.0 + ioSpecification synthesis. Mappings
unchanged. DMN tables unchanged.

uapf-cli validate against v2.5.0 schemas passes cleanly.
2026-05-21 08:02:26 +00:00
e97b9d7d40 Merge pull request 'feat(3.1.0): align with UAPF v2.4.0 — algorithm card refs move to BPMN task' (#2) from v3.1.0-bpmn-algorithm-task into main
Reviewed-on: #2
2026-05-20 14:51:21 +00:00
6 changed files with 237 additions and 158 deletions

View File

@@ -98,3 +98,15 @@ Validates against UAPF v2.4.0 schemas at
```bash ```bash
python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
``` ```
## v3.2.0 (UAPF v2.5.0 alignment)
Tests are now **embedded in each algorithm card** under a top-level `tests:` array (minimum 2 entries per card). The old sidecar location `tests/algorithms/<card-id>.test.yaml` is **removed** per UAPF v2.5.0 — that location no longer applies to algorithm cards.
Embedded tests for this package:
- `algo.semantic_document_analysis.pii_redactor` — 3 cases (Latvian personas kods inline, plain text with no PII, financial figures + IBAN)
- `algo.semantic_document_analysis.vdvc_semantic_extractor` — 2 cases (regulatory complaint, non-regulatory thank-you), both with `ai_confidence_score` tolerance bands appropriate for a stochastic LLM extractor
- `algo.semantic_document_analysis.completion_event_emitter` — 2 cases (success completion, failure completion)
The Algorithm Card viewer (UAPF v2.5.0 chapter 13.16, ProcessGit Preview tab) consumes these embedded tests as its primary interaction surface — sample browser for `external` cards, regex/FEEL/source-display for `inline` cards.

View File

@@ -1,17 +1,10 @@
kind: uapf.algorithm.card kind: uapf.algorithm.card
id: algo.semantic_document_analysis.completion_event_emitter id: algo.semantic_document_analysis.completion_event_emitter
version: "1.0.0" version: 1.0.0
name: "Process completion event emitter" name: Process completion event emitter
intent: > intent: |
Publishes a CloudEvents 1.0-conformant event marking the completion Publishes a CloudEvents 1.0-conformant event marking the completion of one semantic analysis cycle, with the DMN-decided fields (personal data risk, processing route, redaction level, human validation status) attached. Personal data is NEVER included in the emitted payload — only the deterministic classification fields.
of one semantic analysis cycle, with the DMN-decided fields
(personal data risk, processing route, redaction level, human
validation status) attached. Personal data is NEVER included in
the emitted payload — only the deterministic classification fields.
algorithm_kind: emitter algorithm_kind: emitter
io: io:
inputs: inputs:
- id: event_type - id: event_type
@@ -23,42 +16,55 @@ io:
outputs: outputs:
- id: published - id: published
type: boolean type: boolean
implementation: implementation:
type: external type: external
medium: mcp_tool medium: mcp_tool
uri: "uapf-ip://capability/event.emit@1" uri: uapf-ip://capability/event.emit@1
hash: "sha256:0000000000000000000000000000000000000000000000000000000000000000" hash: sha256:0000000000000000000000000000000000000000000000000000000000000000
runtime: runtime:
capability: "event.emit@1" capability: event.emit@1
cloud_events_spec: "1.0" cloud_events_spec: '1.0'
determinism: deterministic determinism: deterministic
side_effects: writes_state side_effects: writes_state
confidence: confidence:
type: none type: none
complexity: complexity:
typical_latency_ms: 25 typical_latency_ms: 25
max_latency_ms: 1000 max_latency_ms: 1000
failure_mode: throw — process must complete reliably or fail loudly.
failure_mode: "throw — process must complete reliably or fail loudly."
reference: reference:
standard: "CloudEvents 1.0" standard: CloudEvents 1.0
url: "https://github.com/cloudevents/spec/blob/v1.0/spec.md" url: https://github.com/cloudevents/spec/blob/v1.0/spec.md
owners: owners:
- type: team - type: team
id: uapf-stewards id: uapf-stewards
contact: stewards@uapf.dev contact: stewards@uapf.dev
lifecycle: lifecycle:
status: draft status: draft
since: "2026-05-20" since: '2026-05-20'
audit: audit:
log_inputs: full log_inputs: full
log_outputs: full log_outputs: full
retention: "1y" retention: 1y
tests:
- name: Successful analysis completion
description: Standard happy-path completion event with full payload.
inputs:
event_type: dev.dokumenta.semantic_analysis.completed
payload:
document_id: doc-2026-05-21-001
outcome: ok
confidence: 0.87
expected_outputs:
published: true
- name: Analysis failure completion
description: Failure-path completion event still emits successfully (the emitter
does not gate on payload contents).
inputs:
event_type: dev.dokumenta.semantic_analysis.failed
payload:
document_id: doc-2026-05-21-002
outcome: extraction_failed
reason: low_confidence
expected_outputs:
published: true

View File

@@ -1,17 +1,10 @@
kind: uapf.algorithm.card kind: uapf.algorithm.card
id: algo.semantic_document_analysis.pii_redactor id: algo.semantic_document_analysis.pii_redactor
version: "1.0.0" version: 1.0.0
name: "PII detector and redactor" name: PII detector and redactor
intent: > intent: |
Detects personally identifiable information in free-text documents Detects personally identifiable information in free-text documents (Latvian personas kods, IBAN, phone numbers, e-mail addresses, names) and returns the source text with PII masked plus structured regex-hit signals used by the downstream DMN decision assess-personal-data-risk.
(Latvian personas kods, IBAN, phone numbers, e-mail addresses,
names) and returns the source text with PII masked plus structured
regex-hit signals used by the downstream DMN decision
assess-personal-data-risk.
algorithm_kind: redactor algorithm_kind: redactor
io: io:
inputs: inputs:
- id: content - id: content
@@ -19,14 +12,14 @@ io:
cardinality: single cardinality: single
constraints: constraints:
maxLength: 200000 maxLength: 200000
documentation: "Raw document text submitted for semantic analysis." documentation: Raw document text submitted for semantic analysis.
outputs: outputs:
- id: redacted_content - id: redacted_content
type: string type: string
documentation: "Source text with PII masked by category tokens." documentation: Source text with PII masked by category tokens.
- id: detected_entity_types - id: detected_entity_types
type: array type: array
documentation: "PII category names only — never values." documentation: PII category names only — never values.
- id: personas_koda_present - id: personas_koda_present
type: boolean type: boolean
- id: financial_data_present - id: financial_data_present
@@ -35,53 +28,90 @@ io:
type: boolean type: boolean
- id: pii_category_count - id: pii_category_count
type: integer type: integer
constraints: { minimum: 0 } constraints:
minimum: 0
implementation: implementation:
type: external type: external
medium: mcp_tool medium: mcp_tool
uri: "uapf-ip://capability/ai.redact@1" uri: uapf-ip://capability/ai.redact@1
hash: "sha256:0000000000000000000000000000000000000000000000000000000000000000" hash: sha256:0000000000000000000000000000000000000000000000000000000000000000
runtime: runtime:
capability: "ai.redact@1" capability: ai.redact@1
note: "Host-fulfilled UAPF-IP capability. Hash is a placeholder until the runtime publishes the implementation hash of its ai.redact handler." note: Host-fulfilled UAPF-IP capability. Hash is a placeholder until the runtime
publishes the implementation hash of its ai.redact handler.
determinism: deterministic determinism: deterministic
side_effects: pure side_effects: pure
complexity: complexity:
typical_latency_ms: 250 typical_latency_ms: 250
max_latency_ms: 10000 max_latency_ms: 10000
failure_mode: throw — refuse processing if redactor unavailable; PII risk dominates.
failure_mode: "throw — refuse processing if redactor unavailable; PII risk dominates."
limitations: limitations:
- "Latviešu valodas personu vārdi atpazīstami ~92% gadījumu" - Latviešu valodas personu vārdi atpazīstami ~92% gadījumu
- "Pieņem, ka teksts jau ir digitāls — OCR nav iekļauta" - Pieņem, ka teksts jau ir digitāls — OCR nav iekļauta
reference: reference:
legal: "GDPR 2016/679 5. pants (datu minimizēšana); Fizisko personu datu apstrādes likums." legal: GDPR 2016/679 5. pants (datu minimizēšana); Fizisko personu datu apstrādes
standard: "NIST SP 800-188 — De-Identification of Personal Information." likums.
standard: NIST SP 800-188 — De-Identification of Personal Information.
owners: owners:
- type: role - type: role
id: data_protection_officer id: data_protection_officer
contact: stewards@uapf.dev contact: stewards@uapf.dev
lifecycle: lifecycle:
status: draft status: draft
since: "2026-05-20" since: '2026-05-20'
audit: audit:
log_inputs: redacted log_inputs: redacted
log_outputs: full log_outputs: full
retention: "7y" retention: 7y
privacy: privacy:
processesPII: true processesPII: true
technique: pseudonymization technique: pseudonymization
reidentificationRisk: low reidentificationRisk: low
risk: risk:
aiActRiskClass: limited aiActRiskClass: limited
humanOversight: advisory humanOversight: advisory
tests:
- name: Latvian personas kods inline in text
description: Standard 11-character Latvian personal identity code (NNNNNN-NNNNN)
should be detected and redacted.
inputs:
content: 'Lūgums izskatīt iesniegumu. Iesniedzējs: Jānis Bērziņš, personas kods:
010101-12345. Adrese: Brīvības iela 1, Rīga.'
expected_outputs:
redacted_content: 'Lūgums izskatīt iesniegumu. Iesniedzējs: [NAME], personas kods:
[REDACTED]. Adrese: [ADDRESS].'
detected_entity_types:
- PERSONAS_KODS
- PERSON
- ADDRESS
personas_koda_present: true
financial_data_present: false
contact_data_present: true
pii_category_count: 3
- name: Plain administrative text with no PII
description: Generic administrative paragraph; nothing to redact. Verifies the redactor
doesn't false-positive on plain text.
inputs:
content: Iesniegums tiek izskatīts atbilstoši normatīvajiem aktiem. Lēmums tiks
paziņots noteiktajā kārtībā.
expected_outputs:
redacted_content: Iesniegums tiek izskatīts atbilstoši normatīvajiem aktiem. Lēmums
tiks paziņots noteiktajā kārtībā.
detected_entity_types: []
personas_koda_present: false
financial_data_present: false
contact_data_present: false
pii_category_count: 0
- name: Financial figures and account numbers
description: EUR amounts and IBAN — both detected as financial PII; no personas_kods.
inputs:
content: Maksājums EUR 1250.00 pārskaitīts uz kontu LV80BANK0000435195001.
expected_outputs:
redacted_content: Maksājums EUR [AMOUNT] pārskaitīts uz kontu [IBAN].
detected_entity_types:
- AMOUNT
- IBAN
personas_koda_present: false
financial_data_present: true
contact_data_present: false
pii_category_count: 2

View File

@@ -1,18 +1,10 @@
kind: uapf.algorithm.card kind: uapf.algorithm.card
id: algo.semantic_document_analysis.vdvc_semantic_extractor id: algo.semantic_document_analysis.vdvc_semantic_extractor
version: "1.0.0" version: 1.0.0
name: "VDVC semantic metadata extractor" name: VDVC semantic metadata extractor
intent: > intent: |
Extracts a VDVC v1.1-conformant structured semantic summary from Extracts a VDVC v1.1-conformant structured semantic summary from the redacted document text — primary topic, keywords, classification, summary, sensitivity signals. Output validates against resources/schemas/vdvc-semantic-summary.schema.json. This is the sole model-inference step in the process; everything else in the package is deterministic.
the redacted document text — primary topic, keywords,
classification, summary, sensitivity signals. Output validates
against resources/schemas/vdvc-semantic-summary.schema.json. This
is the sole model-inference step in the process; everything else
in the package is deterministic.
algorithm_kind: extractor algorithm_kind: extractor
io: io:
inputs: inputs:
- id: redacted_content - id: redacted_content
@@ -20,69 +12,108 @@ io:
cardinality: single cardinality: single
constraints: constraints:
maxLength: 200000 maxLength: 200000
documentation: "Output of the upstream PII redactor." documentation: Output of the upstream PII redactor.
- id: schema_ref - id: schema_ref
type: string type: string
documentation: "Path to the JSON Schema the output must validate against." documentation: Path to the JSON Schema the output must validate against.
outputs: outputs:
- id: semantic_summary - id: semantic_summary
type: object type: object
schema: "../resources/schemas/vdvc-semantic-summary.schema.json" schema: ../resources/schemas/vdvc-semantic-summary.schema.json
- id: sensitivity_control - id: sensitivity_control
type: object type: object
- id: ai_confidence_score - id: ai_confidence_score
type: probability type: probability
- id: output_pii_error_count - id: output_pii_error_count
type: integer type: integer
constraints: { minimum: 0 } constraints:
minimum: 0
implementation: implementation:
type: external type: external
medium: llm_prompt medium: llm_prompt
uri: "uapf-ip://capability/ai.extract@1" uri: uapf-ip://capability/ai.extract@1
hash: "sha256:0000000000000000000000000000000000000000000000000000000000000000" hash: sha256:0000000000000000000000000000000000000000000000000000000000000000
runtime: runtime:
capability: "ai.extract@1" capability: ai.extract@1
note: "Host-fulfilled UAPF-IP capability. Specific model identity and prompt hash are runtime concerns of the host; the Card declares the contract, not the implementation choice." note: Host-fulfilled UAPF-IP capability. Specific model identity and prompt hash
are runtime concerns of the host; the Card declares the contract, not the implementation
choice.
determinism: stochastic determinism: stochastic
side_effects: external_call side_effects: external_call
confidence: confidence:
type: probability type: probability
threshold: 0.70 threshold: 0.7
below_threshold: "route-to:human.legal_reviewer (enforced by DMN human-validation-gate)" below_threshold: route-to:human.legal_reviewer (enforced by DMN human-validation-gate)
complexity: complexity:
typical_latency_ms: 8000 typical_latency_ms: 8000
max_latency_ms: 60000 max_latency_ms: 60000
failure_mode: default:null + flag — DMN human-validation-gate routes low-confidence
failure_mode: "default:null + flag — DMN human-validation-gate routes low-confidence outputs to PENDING_REVIEW." outputs to PENDING_REVIEW.
limitations: limitations:
- "Garie dokumenti (>50 000 znaki) tiek apgriezti — pirmie 50K + pēdējie 5K" - Garie dokumenti (>50 000 znaki) tiek apgriezti — pirmie 50K + pēdējie 5K
- "Nav juridisks vērtējums — tikai semantiska klasifikācija" - Nav juridisks vērtējums — tikai semantiska klasifikācija
- "Latviešu valodas juridiskā retorika var samazināt recall" - Latviešu valodas juridiskā retorika var samazināt recall
reference: reference:
legal: "EU AI Act 2024/1689, Pielikums III (augstā riska MI sistēmas), 13. pants (caurspīdība)." legal: EU AI Act 2024/1689, Pielikums III (augstā riska MI sistēmas), 13. pants
url: "https://eur-lex.europa.eu/eli/reg/2024/1689/oj" (caurspīdība).
url: https://eur-lex.europa.eu/eli/reg/2024/1689/oj
owners: owners:
- type: team - type: team
id: uapf-stewards id: uapf-stewards
contact: stewards@uapf.dev contact: stewards@uapf.dev
lifecycle: lifecycle:
status: draft status: draft
since: "2026-05-20" since: '2026-05-20'
audit: audit:
log_inputs: redacted log_inputs: redacted
log_outputs: full log_outputs: full
retention: "7y" retention: 7y
risk: risk:
aiActRiskClass: high aiActRiskClass: high
humanOversight: mandatory humanOversight: mandatory
transparencyTier: tier-3-full transparencyTier: tier-3-full
tests:
- name: Regulatory iesniegums about administrative decision
description: Typical Latvian administrative complaint with redacted PII. The extractor
should identify topic + risk + applicable regulation.
inputs:
redacted_content: Iesniedzējs [NAME] iesniedza sūdzību par būvvaldes lēmumu Nr.
12345 atteikt būvatļauju adresē [ADDRESS]. Tiek lūgts pārskatīt lēmumu.
schema_ref: schemas/iesniegums/v1
expected_outputs:
semantic_summary:
topic: construction-permit-appeal
subject_area: administrative-law
applicable_regulations:
- BL
- APL
language: lv
sensitivity_control:
contains_decision_reference: true
external_communication_recommended: false
ai_confidence_score: 0.87
output_pii_error_count: 0
tolerance:
ai_confidence_score: 0.1
output_pii_error_count: 0
- name: Non-regulatory thank-you note
description: Out-of-domain input. Extractor should yield low-confidence summary
and a sensitivity flag that no decision is referenced.
inputs:
redacted_content: Paldies par jūsu pakalpojumu! Bija ļoti patīkami sadarboties
ar [NAME] no jūsu komandas.
schema_ref: schemas/iesniegums/v1
expected_outputs:
semantic_summary:
topic: non-actionable-correspondence
subject_area: feedback
applicable_regulations: []
language: lv
sensitivity_control:
contains_decision_reference: false
external_communication_recommended: false
ai_confidence_score: 0.62
output_pii_error_count: 0
tolerance:
ai_confidence_score: 0.15
output_pii_error_count: 0

View File

@@ -4,7 +4,7 @@
"name": "Semantic Document Analysis", "name": "Semantic Document Analysis",
"description": "Level-4 UAPF process for semantic analysis of free-text documents.\n\nThree BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,\nai.extract@1 and event.emit@1. Three DMN decision tables encode the\ndeterministic algorithm the host previously hid inside application\ncode: assess-personal-data-risk maps PII regex signals to a risk\nlevel; gdpr-processing-route selects CENTRAL vs LOCAL processing,\nanonymisation and redaction level; human-validation-gate applies the\nconfidence thresholds that decide REJECTED / PENDING_REVIEW /\nAPPROVED_AUTO.\n\nOnly the semantic extraction is a model step. Risk classification,\nGDPR routing and the validation gate are explicit ranked rules in\nversioned DMN \u2014 inspectable, auditable, portable. Extraction output\nvalidates against the VDVC v1.1 semantic-summary JSON Schema.\n\nv3.1.0: aligned with UAPF v2.4.0 \u2014 Algorithm Card references move\nfrom resource targets to the BPMN service tasks themselves (via\nuapf24:algorithmCardRef attribute). Each card's io block is also\ndenormalised into a <bpmn:ioSpecification> on the task so inputs\nand outputs render as visible data objects on the diagram. The\ncards themselves and the DMN decisions are unchanged from v3.0.0.\n", "description": "Level-4 UAPF process for semantic analysis of free-text documents.\n\nThree BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,\nai.extract@1 and event.emit@1. Three DMN decision tables encode the\ndeterministic algorithm the host previously hid inside application\ncode: assess-personal-data-risk maps PII regex signals to a risk\nlevel; gdpr-processing-route selects CENTRAL vs LOCAL processing,\nanonymisation and redaction level; human-validation-gate applies the\nconfidence thresholds that decide REJECTED / PENDING_REVIEW /\nAPPROVED_AUTO.\n\nOnly the semantic extraction is a model step. Risk classification,\nGDPR routing and the validation gate are explicit ranked rules in\nversioned DMN \u2014 inspectable, auditable, portable. Extraction output\nvalidates against the VDVC v1.1 semantic-summary JSON Schema.\n\nv3.1.0: aligned with UAPF v2.4.0 \u2014 Algorithm Card references move\nfrom resource targets to the BPMN service tasks themselves (via\nuapf24:algorithmCardRef attribute). Each card's io block is also\ndenormalised into a <bpmn:ioSpecification> on the task so inputs\nand outputs render as visible data objects on the diagram. The\ncards themselves and the DMN decisions are unchanged from v3.0.0.\n",
"level": 4, "level": 4,
"version": "3.1.0", "version": "3.2.0",
"requires_capabilities": [ "requires_capabilities": [
"ai.redact@1+", "ai.redact@1+",
"ai.extract@1+", "ai.extract@1+",

View File

@@ -26,7 +26,7 @@ description: |
cards themselves and the DMN decisions are unchanged from v3.0.0. cards themselves and the DMN decisions are unchanged from v3.0.0.
level: 4 level: 4
version: "3.1.0" version: "3.2.0"
# ── UAPF-IP integration (capability needs + profile + guardrails) ── # ── UAPF-IP integration (capability needs + profile + guardrails) ──
requires_capabilities: requires_capabilities: