feat(3.2.0): align with UAPF v2.5.0 — embed algorithm card tests, drop sidecar
Per UAPF v2.5.0, tests move from sidecar files (tests/algorithms/<card-id>.test.yaml — removed in v2.5.0) into a top-level tests array on each algorithm card. Minimum two entries per card; the Algorithm Card viewer (UAPF chapter 13.16, ProcessGit Preview tab) consumes these as its primary interaction surface.

This package's three cards now carry embedded tests:

- algo.semantic_document_analysis.pii_redactor (deterministic redactor) — 3 cases: Latvian personas kods inline (positive — three entity types detected), plain administrative text (negative — no PII signals), financial figures with IBAN (mixed — financial yes, personas_kods no).
- algo.semantic_document_analysis.vdvc_semantic_extractor (stochastic LLM extractor, EU AI Act high-risk + mandatory oversight) — 2 cases: regulatory construction-permit appeal (in-domain, expected topic + applicable_regulations), non-regulatory thank-you note (out-of-domain, low confidence). Both carry ai_confidence_score tolerance bands appropriate for a stochastic output.
- algo.semantic_document_analysis.completion_event_emitter (deterministic CloudEvents emitter) — 2 cases: successful completion event, failure completion event. The emitter does not gate on payload contents, so both succeed.

Other changes:

- uapf.yaml + manifest.json: version 3.1.0 -> 3.2.0
- README.md: v3.2.0 section added describing embedded tests and the removed sidecar location

BPMN file unchanged from v3.1.0 — uapf:algorithmCardRef on each service task per UAPF v2.4.0 + ioSpecification synthesis. Mappings unchanged. DMN tables unchanged.

uapf-cli validate against v2.5.0 schemas passes cleanly.
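The "minimum two entries per card" rule described above can be sanity-checked locally before running uapf-cli validate. The following is a minimal Python sketch, not part of the package. It assumes the card has already been parsed into a dict, and it assumes every test entry carries name, inputs, and expected_outputs; those are the keys the embedded tests in this commit use, but they may not be the full v2.5.0 schema. The function name check_embedded_tests is hypothetical.

```python
from typing import Any

# "Minimum two entries per card", as stated in the commit message.
MIN_TESTS_PER_CARD = 2

# Assumption: these keys are required on every embedded test entry.
REQUIRED_CASE_KEYS = ("name", "inputs", "expected_outputs")


def check_embedded_tests(card: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the card passes this sketch."""
    tests = card.get("tests")
    if not isinstance(tests, list):
        return ["card has no top-level 'tests' array"]

    problems: list[str] = []
    if len(tests) < MIN_TESTS_PER_CARD:
        problems.append(
            f"expected at least {MIN_TESTS_PER_CARD} tests, found {len(tests)}"
        )
    for index, case in enumerate(tests):
        for key in REQUIRED_CASE_KEYS:
            if not isinstance(case, dict) or key not in case:
                problems.append(f"tests[{index}] is missing '{key}'")
    return problems


# A card with a single embedded test fails the minimum-count rule.
card = {
    "id": "algo.semantic_document_analysis.pii_redactor",
    "tests": [{"name": "only one", "inputs": {}, "expected_outputs": {}}],
}
print(check_embedded_tests(card))  # prints ['expected at least 2 tests, found 1']
```

This only checks structure; schema conformance remains the job of uapf-cli validate.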
@@ -1,87 +1,117 @@
 kind: uapf.algorithm.card
-
 id: algo.semantic_document_analysis.pii_redactor
-version: "1.0.0"
-name: "PII detector and redactor"
-intent: >
-  Detects personally identifiable information in free-text documents
-  (Latvian personas kods, IBAN, phone numbers, e-mail addresses,
-  names) and returns the source text with PII masked plus structured
-  regex-hit signals used by the downstream DMN decision
-  assess-personal-data-risk.
-
+version: 1.0.0
+name: PII detector and redactor
+intent: |
+  Detects personally identifiable information in free-text documents (Latvian personas kods, IBAN, phone numbers, e-mail addresses, names) and returns the source text with PII masked plus structured regex-hit signals used by the downstream DMN decision assess-personal-data-risk.
 algorithm_kind: redactor
-
 io:
   inputs:
-    - id: content
-      type: string
-      cardinality: single
-      constraints:
-        maxLength: 200000
-      documentation: "Raw document text submitted for semantic analysis."
+  - id: content
+    type: string
+    cardinality: single
+    constraints:
+      maxLength: 200000
+    documentation: Raw document text submitted for semantic analysis.
   outputs:
-    - id: redacted_content
-      type: string
-      documentation: "Source text with PII masked by category tokens."
-    - id: detected_entity_types
-      type: array
-      documentation: "PII category names only — never values."
-    - id: personas_koda_present
-      type: boolean
-    - id: financial_data_present
-      type: boolean
-    - id: contact_data_present
-      type: boolean
-    - id: pii_category_count
-      type: integer
-      constraints: { minimum: 0 }
-
+  - id: redacted_content
+    type: string
+    documentation: Source text with PII masked by category tokens.
+  - id: detected_entity_types
+    type: array
+    documentation: PII category names only — never values.
+  - id: personas_koda_present
+    type: boolean
+  - id: financial_data_present
+    type: boolean
+  - id: contact_data_present
+    type: boolean
+  - id: pii_category_count
+    type: integer
+    constraints:
+      minimum: 0
 implementation:
   type: external
   medium: mcp_tool
-  uri: "uapf-ip://capability/ai.redact@1"
-  hash: "sha256:0000000000000000000000000000000000000000000000000000000000000000"
+  uri: uapf-ip://capability/ai.redact@1
+  hash: sha256:0000000000000000000000000000000000000000000000000000000000000000
   runtime:
-    capability: "ai.redact@1"
-    note: "Host-fulfilled UAPF-IP capability. Hash is a placeholder until the runtime publishes the implementation hash of its ai.redact handler."
-
+    capability: ai.redact@1
+    note: Host-fulfilled UAPF-IP capability. Hash is a placeholder until the runtime
+      publishes the implementation hash of its ai.redact handler.
 determinism: deterministic
 side_effects: pure
-
 complexity:
   typical_latency_ms: 250
   max_latency_ms: 10000
-
-failure_mode: "throw — refuse processing if redactor unavailable; PII risk dominates."
-
+failure_mode: throw — refuse processing if redactor unavailable; PII risk dominates.
 limitations:
-  - "Latviešu valodas personu vārdi atpazīstami ~92% gadījumu"
-  - "Pieņem, ka teksts jau ir digitāls — OCR nav iekļauta"
-
+- Latviešu valodas personu vārdi atpazīstami ~92% gadījumu
+- Pieņem, ka teksts jau ir digitāls — OCR nav iekļauta
 reference:
-  legal: "GDPR 2016/679 5. pants (datu minimizēšana); Fizisko personu datu apstrādes likums."
-  standard: "NIST SP 800-188 — De-Identification of Personal Information."
-
+  legal: GDPR 2016/679 5. pants (datu minimizēšana); Fizisko personu datu apstrādes
+    likums.
+  standard: NIST SP 800-188 — De-Identification of Personal Information.
 owners:
-  - type: role
-    id: data_protection_officer
-    contact: stewards@uapf.dev
-
+- type: role
+  id: data_protection_officer
+  contact: stewards@uapf.dev
 lifecycle:
   status: draft
-  since: "2026-05-20"
-
+  since: '2026-05-20'
 audit:
   log_inputs: redacted
   log_outputs: full
-  retention: "7y"
-
+  retention: 7y
 privacy:
   processesPII: true
   technique: pseudonymization
   reidentificationRisk: low
-
 risk:
   aiActRiskClass: limited
   humanOversight: advisory
+tests:
+- name: Latvian personas kods inline in text
+  description: Standard 11-character Latvian personal identity code (NNNNNN-NNNNN)
+    should be detected and redacted.
+  inputs:
+    content: 'Lūgums izskatīt iesniegumu. Iesniedzējs: Jānis Bērziņš, personas kods:
+      010101-12345. Adrese: Brīvības iela 1, Rīga.'
+  expected_outputs:
+    redacted_content: 'Lūgums izskatīt iesniegumu. Iesniedzējs: [NAME], personas kods:
+      [REDACTED]. Adrese: [ADDRESS].'
+    detected_entity_types:
+    - PERSONAS_KODS
+    - PERSON
+    - ADDRESS
+    personas_koda_present: true
+    financial_data_present: false
+    contact_data_present: true
+    pii_category_count: 3
+- name: Plain administrative text with no PII
+  description: Generic administrative paragraph; nothing to redact. Verifies the redactor
+    doesn't false-positive on plain text.
+  inputs:
+    content: Iesniegums tiek izskatīts atbilstoši normatīvajiem aktiem. Lēmums tiks
+      paziņots noteiktajā kārtībā.
+  expected_outputs:
+    redacted_content: Iesniegums tiek izskatīts atbilstoši normatīvajiem aktiem. Lēmums
+      tiks paziņots noteiktajā kārtībā.
+    detected_entity_types: []
+    personas_koda_present: false
+    financial_data_present: false
+    contact_data_present: false
+    pii_category_count: 0
+- name: Financial figures and account numbers
+  description: EUR amounts and IBAN — both detected as financial PII; no personas_kods.
+  inputs:
+    content: Maksājums EUR 1250.00 pārskaitīts uz kontu LV80BANK0000435195001.
+  expected_outputs:
+    redacted_content: Maksājums EUR [AMOUNT] pārskaitīts uz kontu [IBAN].
+    detected_entity_types:
+    - AMOUNT
+    - IBAN
+    personas_koda_present: false
+    financial_data_present: true
+    contact_data_present: false
+    pii_category_count: 2
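
One way to exercise embedded cases like these is to call the algorithm with each case's inputs and compare the result to expected_outputs key by key. The Python sketch below assumes exact equality, which probably only suits deterministic cards such as this one. The names run_case and toy_redactor are hypothetical, and the regex stand-in is not the host's ai.redact@1 handler: it only recognises personas kods, EUR amounts, and Latvian IBANs, so it has no name or address detection.

```python
import re
from typing import Any, Callable

Outputs = dict[str, Any]

# Toy patterns (assumptions for illustration only), applied in this order.
PATTERNS = (
    ("PERSONAS_KODS", re.compile(r"\b\d{6}-\d{5}\b"), "[REDACTED]"),
    ("AMOUNT", re.compile(r"(?<=EUR )\d+(?:\.\d{2})?"), "[AMOUNT]"),
    ("IBAN", re.compile(r"\bLV\d{2}[A-Z]{4}[A-Z0-9]{13}\b"), "[IBAN]"),
)


def toy_redactor(content: str) -> Outputs:
    """Regex-only stand-in that returns the card's six output ids."""
    found: list[str] = []
    text = content
    for label, pattern, token in PATTERNS:
        text, hits = pattern.subn(token, text)
        if hits:
            found.append(label)
    return {
        "redacted_content": text,
        "detected_entity_types": found,
        "personas_koda_present": "PERSONAS_KODS" in found,
        "financial_data_present": "AMOUNT" in found or "IBAN" in found,
        "contact_data_present": False,  # the toy has no contact-data detection
        "pii_category_count": len(found),
    }


def run_case(algorithm: Callable[..., Outputs], case: dict[str, Any]) -> dict[str, tuple]:
    """Run one embedded test; return {output_id: (expected, actual)} for mismatches."""
    actual = algorithm(**case["inputs"])
    return {
        key: (expected, actual.get(key))
        for key, expected in case["expected_outputs"].items()
        if actual.get(key) != expected
    }


# The third embedded case from the card, written out as a Python dict.
FINANCIAL_CASE = {
    "name": "Financial figures and account numbers",
    "inputs": {
        "content": "Maksājums EUR 1250.00 pārskaitīts uz kontu LV80BANK0000435195001."
    },
    "expected_outputs": {
        "redacted_content": "Maksājums EUR [AMOUNT] pārskaitīts uz kontu [IBAN].",
        "detected_entity_types": ["AMOUNT", "IBAN"],
        "personas_koda_present": False,
        "financial_data_present": True,
        "contact_data_present": False,
        "pii_category_count": 2,
    },
}

print(run_case(toy_redactor, FINANCIAL_CASE))  # prints {}
```

The first embedded case would report mismatches with this stand-in, because it expects [NAME] and [ADDRESS] tokens that a regex-only toy cannot produce.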