
feat(3.2.0): align with UAPF v2.5.0 — embed algorithm card tests, drop sidecar

Per UAPF v2.5.0, tests move from sidecar files
(tests/algorithms/<card-id>.test.yaml — removed in v2.5.0) into a
top-level tests array on each algorithm card. Minimum two entries per
card; the Algorithm Card viewer (UAPF chapter 13.16, ProcessGit
Preview tab) consumes these as its primary interaction surface.
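
As a rough illustration of the embedded-test rule above, the following Python sketch checks that a card, already parsed into a dict, carries a top-level tests array with at least two entries. The function name `check_embedded_tests` and the required per-entry keys are assumptions drawn from the fields visible in this commit's diff (name, inputs, expected_outputs); it is not how uapf-cli validates cards.

```python
from typing import Any

MIN_TESTS_PER_CARD = 2  # "Minimum two entries per card" per the commit message
REQUIRED_TEST_KEYS = ("name", "inputs", "expected_outputs")  # assumed from the diff


def check_embedded_tests(card: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the card passes."""
    problems: list[str] = []
    tests = card.get("tests")
    if not isinstance(tests, list):
        return ["card has no top-level 'tests' array"]
    if len(tests) < MIN_TESTS_PER_CARD:
        problems.append(
            f"expected at least {MIN_TESTS_PER_CARD} tests, found {len(tests)}"
        )
    for index, test in enumerate(tests):
        if not isinstance(test, dict):
            problems.append(f"tests[{index}] is not a mapping")
            continue
        for key in REQUIRED_TEST_KEYS:
            if key not in test:
                problems.append(f"tests[{index}] is missing '{key}'")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every gap in a card at once.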

This package's three cards now carry embedded tests:

- algo.semantic_document_analysis.pii_redactor (deterministic redactor)
  — 3 cases: Latvian personas kods inline (positive — three entity
  types detected), plain administrative text (negative — no PII
  signals), financial figures with IBAN (mixed — financial yes,
  personas_kods no).

- algo.semantic_document_analysis.vdvc_semantic_extractor (stochastic
  LLM extractor, EU AI Act high-risk + mandatory oversight) — 2
  cases: regulatory construction-permit appeal (in-domain, expected
  topic + applicable_regulations), non-regulatory thank-you note
  (out-of-domain, low confidence). Both carry ai_confidence_score
  tolerance bands appropriate for a stochastic output.

- algo.semantic_document_analysis.completion_event_emitter
  (deterministic CloudEvents emitter) — 2 cases: successful
  completion event, failure completion event. The emitter does not
  gate on payload contents, so both succeed.
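
For the stochastic extractor, exact output matching is not meaningful, which is why its tests carry ai_confidence_score tolerance bands. The sketch below shows one way such a band could be evaluated. The `min`/`max` band shape, the numeric values, and the function name `score_within_band` are hypothetical, since this commit message does not show how the bands are encoded in the card.

```python
def score_within_band(actual: float, band: dict[str, float]) -> bool:
    """Check an observed confidence score against an inclusive [min, max] band.

    The {"min": ..., "max": ...} shape is an assumption for illustration only.
    """
    low, high = band["min"], band["max"]
    if low > high:
        raise ValueError(f"invalid band: min {low} exceeds max {high}")
    return low <= actual <= high


# Hypothetical bands: an in-domain case expects high confidence,
# an out-of-domain case expects low confidence.
in_domain_band = {"min": 0.7, "max": 1.0}
out_of_domain_band = {"min": 0.0, "max": 0.4}

print(score_within_band(0.85, in_domain_band))      # True
print(score_within_band(0.85, out_of_domain_band))  # False
```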

Other changes:
- uapf.yaml + manifest.json: version 3.1.0 -> 3.2.0
- README.md: v3.2.0 section added describing embedded tests and the
  removed sidecar location

BPMN file unchanged from v3.1.0: each service task still carries
uapf:algorithmCardRef per UAPF v2.4.0, plus ioSpecification synthesis.
Mappings and DMN tables are also unchanged.

uapf-cli validate against v2.5.0 schemas passes cleanly.
2026-05-21 08:02:26 +00:00
parent e97b9d7d40
commit 9b3790c1fa
6 changed files with 237 additions and 158 deletions


@@ -1,87 +1,117 @@
kind: uapf.algorithm.card
id: algo.semantic_document_analysis.pii_redactor
version: "1.0.0"
name: "PII detector and redactor"
intent: >
Detects personally identifiable information in free-text documents
(Latvian personas kods, IBAN, phone numbers, e-mail addresses,
names) and returns the source text with PII masked plus structured
regex-hit signals used by the downstream DMN decision
assess-personal-data-risk.
version: 1.0.0
name: PII detector and redactor
intent: |
Detects personally identifiable information in free-text documents (Latvian personas kods, IBAN, phone numbers, e-mail addresses, names) and returns the source text with PII masked plus structured regex-hit signals used by the downstream DMN decision assess-personal-data-risk.
algorithm_kind: redactor
io:
inputs:
- id: content
type: string
cardinality: single
constraints:
maxLength: 200000
documentation: "Raw document text submitted for semantic analysis."
- id: content
type: string
cardinality: single
constraints:
maxLength: 200000
documentation: Raw document text submitted for semantic analysis.
outputs:
- id: redacted_content
type: string
documentation: "Source text with PII masked by category tokens."
- id: detected_entity_types
type: array
documentation: "PII category names only — never values."
- id: personas_koda_present
type: boolean
- id: financial_data_present
type: boolean
- id: contact_data_present
type: boolean
- id: pii_category_count
type: integer
constraints: { minimum: 0 }
- id: redacted_content
type: string
documentation: Source text with PII masked by category tokens.
- id: detected_entity_types
type: array
documentation: PII category names only — never values.
- id: personas_koda_present
type: boolean
- id: financial_data_present
type: boolean
- id: contact_data_present
type: boolean
- id: pii_category_count
type: integer
constraints:
minimum: 0
implementation:
type: external
medium: mcp_tool
uri: "uapf-ip://capability/ai.redact@1"
hash: "sha256:0000000000000000000000000000000000000000000000000000000000000000"
uri: uapf-ip://capability/ai.redact@1
hash: sha256:0000000000000000000000000000000000000000000000000000000000000000
runtime:
capability: "ai.redact@1"
note: "Host-fulfilled UAPF-IP capability. Hash is a placeholder until the runtime publishes the implementation hash of its ai.redact handler."
capability: ai.redact@1
note: Host-fulfilled UAPF-IP capability. Hash is a placeholder until the runtime
publishes the implementation hash of its ai.redact handler.
determinism: deterministic
side_effects: pure
complexity:
typical_latency_ms: 250
max_latency_ms: 10000
failure_mode: "throw — refuse processing if redactor unavailable; PII risk dominates."
failure_mode: throw — refuse processing if redactor unavailable; PII risk dominates.
limitations:
- "Latvian personal names are recognized in ~92% of cases"
- "Assumes the text is already digital; OCR is not included"
- Latvian personal names are recognized in ~92% of cases
- Assumes the text is already digital; OCR is not included
reference:
legal: "GDPR 2016/679 Article 5 (data minimisation); Fizisko personu datu apstrādes likums (Personal Data Processing Law)."
standard: "NIST SP 800-188 — De-Identification of Personal Information."
legal: GDPR 2016/679 Article 5 (data minimisation); Fizisko personu datu apstrādes
likums (Personal Data Processing Law).
standard: NIST SP 800-188 — De-Identification of Personal Information.
owners:
- type: role
id: data_protection_officer
contact: stewards@uapf.dev
- type: role
id: data_protection_officer
contact: stewards@uapf.dev
lifecycle:
status: draft
since: "2026-05-20"
since: '2026-05-20'
audit:
log_inputs: redacted
log_outputs: full
retention: "7y"
retention: 7y
privacy:
processesPII: true
technique: pseudonymization
reidentificationRisk: low
risk:
aiActRiskClass: limited
humanOversight: advisory
tests:
- name: Latvian personas kods inline in text
description: Standard 11-digit Latvian personal identity code (NNNNNN-NNNNN)
should be detected and redacted.
inputs:
content: 'Lūgums izskatīt iesniegumu. Iesniedzējs: Jānis Bērziņš, personas kods:
010101-12345. Adrese: Brīvības iela 1, Rīga.'
expected_outputs:
redacted_content: 'Lūgums izskatīt iesniegumu. Iesniedzējs: [NAME], personas kods:
[REDACTED]. Adrese: [ADDRESS].'
detected_entity_types:
- PERSONAS_KODS
- PERSON
- ADDRESS
personas_koda_present: true
financial_data_present: false
contact_data_present: true
pii_category_count: 3
- name: Plain administrative text with no PII
description: Generic administrative paragraph; nothing to redact. Verifies the redactor
doesn't false-positive on plain text.
inputs:
content: Iesniegums tiek izskatīts atbilstoši normatīvajiem aktiem. Lēmums tiks
paziņots noteiktajā kārtībā.
expected_outputs:
redacted_content: Iesniegums tiek izskatīts atbilstoši normatīvajiem aktiem. Lēmums
tiks paziņots noteiktajā kārtībā.
detected_entity_types: []
personas_koda_present: false
financial_data_present: false
contact_data_present: false
pii_category_count: 0
- name: Financial figures and account numbers
description: EUR amounts and IBAN — both detected as financial PII; no personas_kods.
inputs:
content: Maksājums EUR 1250.00 pārskaitīts uz kontu LV80BANK0000435195001.
expected_outputs:
redacted_content: Maksājums EUR [AMOUNT] pārskaitīts uz kontu [IBAN].
detected_entity_types:
- AMOUNT
- IBAN
personas_koda_present: false
financial_data_present: true
contact_data_present: false
pii_category_count: 2
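
The first test case above relies on the NNNNNN-NNNNN shape of the Latvian personas kods. A minimal sketch of how that single pattern could be matched and masked is shown below. The regular expression and the function name `mask_personas_kods` are illustrative assumptions; the real ai.redact@1 handler is host-fulfilled, so its implementation is not part of this commit. Name, address, amount, and IBAN detection are out of scope for this sketch.

```python
import re

# Six digits, a hyphen, five digits (NNNNNN-NNNNN), not embedded in a longer digit run.
PERSONAS_KODS_RE = re.compile(r"(?<!\d)\d{6}-\d{5}(?!\d)")


def mask_personas_kods(text: str) -> tuple[str, bool]:
    """Replace every personas kods match with [REDACTED]; report whether any was found."""
    masked, count = PERSONAS_KODS_RE.subn("[REDACTED]", text)
    return masked, count > 0


masked, found = mask_personas_kods("personas kods: 010101-12345.")
print(masked)  # personas kods: [REDACTED].
print(found)   # True
```

The boolean mirrors the card's personas_koda_present output, so a deterministic test can assert on both the masked text and the signal.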