kind: uapf.algorithm.card
id: algo.semantic_document_analysis.pii_redactor
version: 1.0.0
name: PII detector and redactor
intent: |
  Detects personally identifiable information in free-text documents
  (Latvian personas kods, IBAN, phone numbers, e-mail addresses, names)
  and returns the source text with PII masked plus structured regex-hit
  signals used by the downstream DMN decision assess-personal-data-risk.
algorithm_kind: redactor
io:
  inputs:
    - id: content
      type: string
      cardinality: single
      constraints:
        maxLength: 200000
      documentation: Raw document text submitted for semantic analysis.
  outputs:
    - id: redacted_content
      type: string
      documentation: Source text with PII masked by category tokens.
    - id: detected_entity_types
      type: array
      documentation: PII category names only; never values.
    - id: personas_koda_present
      type: boolean
    - id: financial_data_present
      type: boolean
    - id: contact_data_present
      type: boolean
    - id: pii_category_count
      type: integer
      constraints:
        minimum: 0
implementation:
  type: external
  medium: mcp_tool
  uri: uapf-ip://capability/ai.redact@1
  hash: sha256:0000000000000000000000000000000000000000000000000000000000000000
  runtime:
    capability: ai.redact@1
    note: Host-fulfilled UAPF-IP capability. Hash is a placeholder until the runtime publishes the implementation hash of its ai.redact handler.
determinism: deterministic
side_effects: pure
complexity:
  typical_latency_ms: 250
  max_latency_ms: 10000
failure_mode: throw (refuse processing if the redactor is unavailable; PII risk dominates).
limitations:
  - Latvian personal names are recognized in about 92% of cases
  - Assumes the text is already digital; OCR is not included
reference:
  legal: GDPR (Regulation (EU) 2016/679), Article 5 (data minimisation); Fizisko personu datu apstrādes likums (Personal Data Processing Law).
  standard: NIST SP 800-188, De-Identifying Government Datasets.
owners:
  - type: role
    id: data_protection_officer
    contact: stewards@uapf.dev
lifecycle:
  status: draft
  since: '2026-05-20'
audit:
  log_inputs: redacted
  log_outputs: full
  retention: 7y
privacy:
  processesPII: true
  technique: pseudonymization
  reidentificationRisk: low
risk:
  aiActRiskClass: limited
  humanOversight: advisory
tests:
  - name: Latvian personas kods inline in text
    description: Standard 11-digit Latvian personal identity code (NNNNNN-NNNNN) should be detected and redacted.
    inputs:
      content: 'Lūgums izskatīt iesniegumu. Iesniedzējs: Jānis Bērziņš, personas kods: 010101-12345. Adrese: Brīvības iela 1, Rīga.'
    expected_outputs:
      redacted_content: 'Lūgums izskatīt iesniegumu. Iesniedzējs: [NAME], personas kods: [REDACTED]. Adrese: [ADDRESS].'
      detected_entity_types:
        - PERSONAS_KODS
        - PERSON
        - ADDRESS
      personas_koda_present: true
      financial_data_present: false
      contact_data_present: true
      pii_category_count: 3
  - name: Plain administrative text with no PII
    description: Generic administrative paragraph; nothing to redact. Verifies that the redactor produces no false positives on plain text.
    inputs:
      content: Iesniegums tiek izskatīts atbilstoši normatīvajiem aktiem. Lēmums tiks paziņots noteiktajā kārtībā.
    expected_outputs:
      redacted_content: Iesniegums tiek izskatīts atbilstoši normatīvajiem aktiem. Lēmums tiks paziņots noteiktajā kārtībā.
      detected_entity_types: []
      personas_koda_present: false
      financial_data_present: false
      contact_data_present: false
      pii_category_count: 0
  - name: Financial figures and account numbers
    description: EUR amount and IBAN, both detected as financial PII; no personas kods.
    inputs:
      content: Maksājums EUR 1250.00 pārskaitīts uz kontu LV80BANK0000435195001.
    expected_outputs:
      redacted_content: Maksājums EUR [AMOUNT] pārskaitīts uz kontu [IBAN].
      detected_entity_types:
        - AMOUNT
        - IBAN
      personas_koda_present: false
      financial_data_present: true
      contact_data_present: false
      pii_category_count: 2
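# Illustrative sketch only, not part of the card as authored. The intent lists
# phone numbers and e-mail addresses, but none of the tests above exercises them;
# a test along the following lines could cover that path. The sample text, the
# category names EMAIL and PHONE, and the [EMAIL] / [PHONE] mask tokens are all
# assumptions. Replace them with whatever the ai.redact handler actually emits
# before uncommenting the entry.
#
#  - name: E-mail address and phone number in text
#    description: Contact details only; expects contact_data_present without financial or personas kods signals.
#    inputs:
#      content: 'Saziņai: janis@example.com, tālrunis +371 20000000.'
#    expected_outputs:
#      redacted_content: 'Saziņai: [EMAIL], tālrunis [PHONE].'
#      detected_entity_types:
#        - EMAIL
#        - PHONE
#      personas_koda_present: false
#      financial_data_present: false
#      contact_data_present: true
#      pii_category_count: 2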