v1.0.0: dev.uapf.semantic-document-analysis

UAPF v1.1 SSOT-conformant Level 4 process package — reusable semantic document analysis, shareable across DMS / intake / mailroom systems. Structure: - uapf.yaml (kind: uapf.package, level 4) + manifest.json engine-compat - bpmn/semantic-document-analysis.bpmn.xml — 3 service tasks invoking reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1 - resources/mappings.yaml — task->target bindings with I/O contracts - resources/schemas/vdvc-semantic-summary.schema.json — output contract - resources/guardrails.yaml — GDPR + EU AI Act constraints - metadata/ownership.yaml + metadata/lifecycle.yaml - docs/, fixtures/, tests/eval-set.json Validates clean against UAPFormat/UAPF-specification schemas.
2026-05-16 09:32:55 +00:00
commit ae0c646021
16 changed files with 422 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,3 @@
+*.uapf
+__pycache__/
+.DS_Store
--- a/README.md
+++ b/README.md
@@ -0,0 +1,19 @@
+# dev.uapf.semantic-document-analysis
+
+UAPF v1.1 SSOT-conformant Level 4 process package providing reusable
+semantic document analysis as `Process_SemanticDocumentAnalysis`.
+
+See `docs/00-overview.md` for what it does, `docs/01-eu-ai-act.md` for
+the regulatory analysis, and `docs/02-integration.md` for runtime
+integration notes.
+
+## Validates against
+
+Run `jsonschema -i <each yaml-rendered-as-json> <each schema>` against
+the canonical schemas in
+`github.com/UAPFormat/UAPF-specification/schemas/`:
+
+- `uapf-manifest.schema.json` (root manifest)
+- `ownership.schema.json` (metadata/ownership.yaml)
+- `lifecycle.schema.json` (metadata/lifecycle.yaml)
+- `resource-mapping.schema.json` (resources/mappings.yaml)
--- a/bpmn/semantic-document-analysis.bpmn.xml
+++ b/bpmn/semantic-document-analysis.bpmn.xml
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<bpmn:definitions
+    xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
+    xmlns:uapf="https://uapf.dev/bpmn-ext/v1"
+    id="Definitions_SemanticAnalysis"
+    targetNamespace="https://uapf.dev/processes/semantic-document-analysis">
+
+  <bpmn:process id="Process_SemanticDocumentAnalysis"
+                name="Semantic document analysis"
+                isExecutable="true">
+
+    <bpmn:startEvent id="Start" name="Document text received"/>
+
+    <bpmn:serviceTask id="Task_RedactPii"
+                      name="Redact personally identifiable information"
+                      uapf:capability="ai.redact@1">
+      <bpmn:documentation>
+        Calls ai.redact@1 to mask names, identifiers, addresses, financial
+        and health data before downstream extraction. Required by
+        resources/guardrails.yaml (GDPR Art. 5 minimisation).
+      </bpmn:documentation>
+    </bpmn:serviceTask>
+
+    <bpmn:serviceTask id="Task_ExtractSemantics"
+                      name="Extract semantic metadata"
+                      uapf:capability="ai.extract@1"
+                      uapf:schemaRef="resources/schemas/vdvc-semantic-summary.schema.json">
+      <bpmn:documentation>
+        Calls ai.extract@1 with the redacted text and the VDVC v1.1 output
+        schema (resources/schemas/vdvc-semantic-summary.schema.json). The
+        host's AI agent must produce output that validates against that
+        schema. Output records aiModelVersion + aiConfidenceScore per
+        EU AI Act Art. 13.
+      </bpmn:documentation>
+    </bpmn:serviceTask>
+
+    <bpmn:serviceTask id="Task_EmitResultEvent"
+                      name="Emit semantic-analysis-completed event"
+                      uapf:capability="event.emit@1"
+                      uapf:eventType="document.semantic-analysis.completed.v1">
+      <bpmn:documentation>
+        Calls event.emit@1 to publish a CloudEvent containing the extracted
+        semantic summary. Downstream processes consume this event.
+      </bpmn:documentation>
+    </bpmn:serviceTask>
+
+    <bpmn:endEvent id="End" name="Semantic analysis complete"/>
+
+    <bpmn:sequenceFlow id="f1" sourceRef="Start"                 targetRef="Task_RedactPii"/>
+    <bpmn:sequenceFlow id="f2" sourceRef="Task_RedactPii"        targetRef="Task_ExtractSemantics"/>
+    <bpmn:sequenceFlow id="f3" sourceRef="Task_ExtractSemantics" targetRef="Task_EmitResultEvent"/>
+    <bpmn:sequenceFlow id="f4" sourceRef="Task_EmitResultEvent"  targetRef="End"/>
+
+  </bpmn:process>
+</bpmn:definitions>
--- a/docs/00-overview.md
+++ b/docs/00-overview.md
@@ -0,0 +1,32 @@
+# dev.uapf.semantic-document-analysis — Overview
+
+**UAPF v1.1 SSOT-conformant** Level 4 process package providing
+reusable semantic document analysis.
+
+## What
+
+A 3-step BPMN process that, given free-text document content:
+
+1. Redacts PII via `ai.redact@1`
+2. Extracts VDVC v1.1 structured semantic metadata via `ai.extract@1`
+3. Emits `document.semantic-analysis.completed.v1` CloudEvent via `event.emit@1`
+
+## What's portable
+
+The package ships:
+- The BPMN flow (the algorithm shape)
+- The VDVC output JSON Schema (the output contract)
+- The resource mapping (input/output contracts, timeouts, retries)
+- The guardrails policy (GDPR + EU AI Act constraints)
+
+The host system supplies the actual AI agent that fulfils the three
+capabilities. Multiple hosts can implement the same capabilities;
+multiple packages can require the same capabilities.
+
+## How to consume
+
+Drop this `.uapf` into any UAPF-conformant runtime. The runtime
+exposes `uapf.run_process` (per UAPF-specification §6.3.1) targeting
+`Process_SemanticDocumentAnalysis`. The runtime resolves the resource
+mapping to find a target with the three required capabilities and
+invokes them in order per the BPMN flow.
--- a/docs/01-eu-ai-act.md
+++ b/docs/01-eu-ai-act.md
@@ -0,0 +1,14 @@
+# EU AI Act analysis
+
+**Classification:** Annex III §5(a) and §8(a) — high-risk AI system
+under Regulation 2024/1689 when used for triage in public-service or
+justice-administration contexts.
+
+**Operator obligations** (the deploying organisation, not this package):
+
+- Art. 13 transparency — output records `summarySource`, `aiConfidenceScore`, `aiModelVersion`
+- Art. 14 human oversight — `humanValidationStatus` blocks consequential action
+- Art. 9 risk management — see `resources/guardrails.yaml`
+- Art. 12 logging — host MUST log invocation with package version + model id
+
+See `resources/guardrails.yaml` for the machine-readable policy.
--- a/docs/02-integration.md
+++ b/docs/02-integration.md
@@ -0,0 +1,21 @@
+# Integration with UAPF-conformant runtimes
+
+This package validates against the UAPF v1.1 SSOT schemas:
+- `uapf.yaml` → `uapf-manifest.schema.json`
+- `metadata/ownership.yaml` → `ownership.schema.json`
+- `metadata/lifecycle.yaml` → `lifecycle.schema.json`
+- `resources/mappings.yaml` → `resource-mapping.schema.json`
+
+It conforms to UAPF Conformance Checklist §"Level 4 requirements":
+- ✓ uapf.yaml present
+- ✓ ≥1 BPMN file (`bpmn/semantic-document-analysis.bpmn.xml`)
+- ✓ `resources/mappings.yaml` present
+- ✓ `metadata/ownership.yaml` + `metadata/lifecycle.yaml` present
+- ✓ Cornerstones declared correctly (bpmn + resources)
+
+## Engine compatibility notes
+
+`uapf-engine@1.0.0` only reads `manifest.json` (pre-v1.1 file naming).
+This package ships both `uapf.yaml` (SSOT-mandated) and `manifest.json`
+(byte-identical-semantics JSON copy) for compatibility until upstream
+engine supports YAML.
--- a/fixtures/child-rights-input.json
+++ b/fixtures/child-rights-input.json
@@ -0,0 +1,3 @@
+{
+  "text": "Vēršos Tiesībsarga birojā par bāriņtiesas lēmumu attiecībā uz manu mazdēlu (vecums 7 gadi). Lēmums pieņemts steidzamā kārtībā 2026. gada martā. Lūdzu izvērtēt rīcības atbilstību Bērnu tiesību aizsardzības likumam."
+}
--- a/fixtures/discrimination-input.json
+++ b/fixtures/discrimination-input.json
@@ -0,0 +1,3 @@
+{
+  "text": "Iesniedzējs ar otrās grupas invaliditāti norāda, ka valsts iestādes darba intervijā atklāti pateikts: 'nevaram pieņemt cilvēkus ar īpašām vajadzībām'. Lūdz Tiesībsargu izmeklēt."
+}
--- a/manifest.json
+++ b/manifest.json
@@ -0,0 +1,45 @@
+{
+  "kind": "uapf.package",
+  "id": "dev.uapf.semantic-document-analysis",
+  "name": "Semantic Document Analysis (UAPF reference algorithm)",
+  "description": "Level-4 UAPF process for extracting VDVC-conformant semantic metadata\n(topic, summary, urgency, risk, sensitivity) from a free-text document.\n\nPortable across document management systems, intake portals, mailroom\nscanners, case-management platforms. Three BPMN service tasks invoke\nthe reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1.\nThe host fulfils each capability with its own AI agent; this package\nsupplies the BPMN flow, the VDVC output JSON Schema, the guardrails,\nand the resource mapping contract.\n",
+  "level": 4,
+  "version": "1.0.0",
+  "includes": [],
+  "dependencies": {},
+  "cornerstones": {
+    "bpmn": true,
+    "dmn": false,
+    "cmmn": false,
+    "resources": true
+  },
+  "paths": {
+    "bpmn": "bpmn",
+    "dmn": "dmn",
+    "cmmn": "cmmn",
+    "resources": "resources",
+    "metadata": "metadata"
+  },
+  "exposure": {
+    "mcp": {
+      "enabled": true,
+      "runnable": true,
+      "exposedEntrypoints": [
+        "Process_SemanticDocumentAnalysis"
+      ],
+      "exposedArtifacts": [
+        "manifest",
+        "bpmn",
+        "docs"
+      ]
+    }
+  },
+  "owners": [
+    {
+      "type": "team",
+      "id": "uapf-stewards",
+      "contact": "stewards@uapf.dev"
+    }
+  ],
+  "lifecycle": "draft"
+}
--- a/metadata/lifecycle.yaml
+++ b/metadata/lifecycle.yaml
@@ -0,0 +1,9 @@
+kind: uapf.metadata.lifecycle
+status: draft
+created: "2026-05-15T20:30:00Z"
+lastModified: "2026-05-15T20:30:00Z"
+changeHistory:
+  - version: "1.0.0"
+    date: "2026-05-15"
+    summary: "Initial release. Reusable UAPF v1.1 Level 4 process for VDVC semantic metadata extraction. Three reserved-namespace capabilities (ai.redact@1, ai.extract@1, event.emit@1). VDVC output schema in resources/schemas/."
+    author: "uapf-stewards"
--- a/metadata/ownership.yaml
+++ b/metadata/ownership.yaml
@@ -0,0 +1,9 @@
+kind: uapf.metadata.ownership
+owners:
+  - type: team
+    id: uapf-stewards
+    name: UAPF Stewards
+    contact: stewards@uapf.dev
+    role: owner
+approvers:
+  - uapf-stewards
--- a/resources/guardrails.yaml
+++ b/resources/guardrails.yaml
@@ -0,0 +1,32 @@
+# Non-normative supplementary file. UAPF v1.1 does NOT cornerstone guardrails;
+# they live under resources/ as a host-readable policy snapshot.
+authority: dev.uapf.stewards
+version: "1.0.0"
+
+privacy:
+  forbidden_in_output:
+    - personal_name
+    - personal_id_number
+    - postal_address
+    - phone_number
+    - email_address
+    - bank_account
+    - iban
+    - health_record_value
+    - biometric_value
+  pii_handling:
+    - "Detected PII MUST be listed in sensitivityControl.detectedEntityTypes as TYPE names only, never values."
+    - "Set personalDataRisk according to detected types: NONE < LOW < MEDIUM < HIGH."
+
+eu_ai_act:
+  classification: "Annex III §5(a) and §8(a) — high-risk per Regulation 2024/1689"
+  required_transparency_fields:
+    - "semanticSummary.summarySource MUST be \"AI\""
+    - "semanticSummary.aiConfidenceScore MUST be 0.0–1.0"
+    - "semanticSummary.aiModelVersion MUST be the exact model identifier"
+  human_oversight: "humanValidationStatus MUST be PENDING or REQUIRED on completion; consuming higher-level process MUST surface to a human before any consequential action."
+
+accuracy:
+  - "Do not fabricate fields not supported by source text."
+  - "Set aiConfidenceScore below 0.3 when classification is uncertain."
+  - "If document is unreadable or too short, set humanValidationStatus to REQUIRED."
--- a/resources/mappings.yaml
+++ b/resources/mappings.yaml
@@ -0,0 +1,54 @@
+kind: uapf.resources.mapping
+
+targets:
+  - id: agent.semantic-extractor
+    type: ai_agent
+    name: Semantic Extraction AI Agent
+    description: |
+      Host-provided AI agent that fulfils ai.redact@1, ai.extract@1, and
+      event.emit@1 for this process. Implementation is the host's choice
+      (Claude, GPT, on-prem LLM, etc.); this package supplies the BPMN
+      flow, the output schema, and the guardrails.
+    capabilities:
+      - capability.ai.redact
+      - capability.ai.extract
+      - capability.event.emit
+
+bindings:
+  - source: { type: bpmn.serviceTask, ref: Task_RedactPii }
+    targetId: agent.semantic-extractor
+    mode: autonomous
+    contract:
+      input:
+        - { name: text,       type: string, required: true }
+        - { name: categories, type: array,  required: false, description: "Optional PII categories; defaults to host policy." }
+      output:
+        - { name: redactedText, type: string }
+        - { name: detections,   type: array }
+      timeout: "10s"
+    requiredCapabilities: [capability.ai.redact]
+
+  - source: { type: bpmn.serviceTask, ref: Task_ExtractSemantics }
+    targetId: agent.semantic-extractor
+    mode: autonomous
+    contract:
+      input:
+        - { name: text,   type: string, required: true,  description: "Redacted text from previous task." }
+        - { name: schema, type: object, required: true,  description: "VDVC v1.1 output schema. Reference: resources/schemas/vdvc-semantic-summary.schema.json" }
+      output:
+        - { name: extracted,  type: object, description: "Validates against resources/schemas/vdvc-semantic-summary.schema.json" }
+        - { name: confidence, type: number }
+        - { name: modelUsed,  type: string }
+      timeout: "30s"
+      retries: { maxAttempts: 2, backoffMs: 2000 }
+    requiredCapabilities: [capability.ai.extract]
+
+  - source: { type: bpmn.serviceTask, ref: Task_EmitResultEvent }
+    targetId: agent.semantic-extractor
+    mode: autonomous
+    contract:
+      input:
+        - { name: eventType, type: string, required: true }
+        - { name: payload,   type: object, required: true }
+      timeout: "5s"
+    requiredCapabilities: [capability.event.emit]
--- a/resources/schemas/vdvc-semantic-summary.schema.json
+++ b/resources/schemas/vdvc-semantic-summary.schema.json
@@ -0,0 +1,49 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://uapf.dev/schemas/vdvc/semantic-summary-v1.1.json",
+  "title": "VDVC Semantic Summary v1.1",
+  "description": "Output contract for ai.extract@1 when invoked by Process_SemanticDocumentAnalysis. The host's AI agent MUST produce output validating against this schema.",
+  "type": "object",
+  "required": ["semanticSummary", "sensitivityControl"],
+  "properties": {
+    "semanticSummary": {
+      "type": "object",
+      "required": ["primaryTopic", "summary", "summarySource", "aiConfidenceScore", "aiModelVersion", "humanValidationStatus"],
+      "properties": {
+        "primaryTopic":          { "type": "string", "maxLength": 200 },
+        "subTopics":             { "type": "array", "items": { "type": "string" } },
+        "summary":               { "type": "string", "maxLength": 4000 },
+        "documentPurpose":       { "type": "string", "maxLength": 200 },
+        "requestedAction":       { "type": "string", "maxLength": 200 },
+        "involvedPartyTypes":    { "type": "array", "items": { "type": "string" }, "description": "Party TYPES only, never names." },
+        "geographicScope":       { "type": "string" },
+        "sectorTags":            { "type": "array", "items": { "type": "string" } },
+        "legalDomain":           { "type": "string" },
+        "estimatedRiskLevel":    { "enum": ["LOW", "MEDIUM", "HIGH", "CRITICAL"] },
+        "urgencyLevel":          { "enum": ["LOW", "NORMAL", "HIGH", "URGENT"] },
+        "keywords":              { "type": "array", "maxItems": 20, "items": { "type": "string" } },
+        "detectedLanguage":      { "type": "string", "pattern": "^[a-z]{2}$" },
+        "summarySource":         { "const": "AI" },
+        "aiConfidenceScore":     { "type": "number", "minimum": 0, "maximum": 1 },
+        "aiModelVersion":        { "type": "string" },
+        "humanValidationStatus": { "enum": ["PENDING", "REQUIRED", "VALIDATED", "REJECTED"] },
+        "mentions_child":        { "type": "boolean" },
+        "ongoing_harm":          { "type": "boolean" },
+        "vulnerable_group":      { "type": "boolean" },
+        "criminal_indication":   { "type": "boolean" }
+      }
+    },
+    "sensitivityControl": {
+      "type": "object",
+      "required": ["personalDataRisk", "allowCentralization", "redactionLevel"],
+      "properties": {
+        "personalDataRisk":      { "enum": ["NONE", "LOW", "MEDIUM", "HIGH"] },
+        "allowCentralization":   { "type": "boolean" },
+        "redactionLevel":        { "enum": ["NONE", "PARTIAL", "FULL"] },
+        "accessRestrictionBasis":{ "type": "string" },
+        "classifiedInformation": { "type": "boolean" },
+        "detectedEntityTypes":   { "type": "array", "items": { "type": "string" } }
+      }
+    }
+  }
+}
--- a/tests/eval-set.json
+++ b/tests/eval-set.json
@@ -0,0 +1,24 @@
+{
+  "algorithm": "Process_SemanticDocumentAnalysis",
+  "package_version": "1.0.0",
+  "cases": [
+    {
+      "id": "child-rights",
+      "input_fixture": "fixtures/child-rights-input.json",
+      "expected_facets": {
+        "mentions_child": true,
+        "vulnerable_group": true,
+        "humanValidationStatus": "PENDING"
+      }
+    },
+    {
+      "id": "discrimination-disability",
+      "input_fixture": "fixtures/discrimination-input.json",
+      "expected_facets": {
+        "vulnerable_group": true,
+        "humanValidationStatus": "PENDING"
+      }
+    }
+  ],
+  "success_criteria": { "min_pass_rate": 0.95, "max_p95_latency_ms": 15000 }
+}
--- a/uapf.yaml
+++ b/uapf.yaml
@@ -0,0 +1,50 @@
+kind: uapf.package
+id: dev.uapf.semantic-document-analysis
+name: Semantic Document Analysis (UAPF reference algorithm)
+description: |
+  Level-4 UAPF process for extracting VDVC-conformant semantic metadata
+  (topic, summary, urgency, risk, sensitivity) from a free-text document.
+
+  Portable across document management systems, intake portals, mailroom
+  scanners, case-management platforms. Three BPMN service tasks invoke
+  the reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1.
+  The host fulfils each capability with its own AI agent; this package
+  supplies the BPMN flow, the VDVC output JSON Schema, the guardrails,
+  and the resource mapping contract.
+
+level: 4
+version: "1.0.0"
+
+includes: []
+dependencies: {}
+
+cornerstones:
+  bpmn: true
+  dmn: false
+  cmmn: false
+  resources: true
+
+paths:
+  bpmn: bpmn
+  dmn: dmn
+  cmmn: cmmn
+  resources: resources
+  metadata: metadata
+
+exposure:
+  mcp:
+    enabled: true
+    runnable: true
+    exposedEntrypoints:
+      - "Process_SemanticDocumentAnalysis"
+    exposedArtifacts:
+      - manifest
+      - bpmn
+      - docs
+
+owners:
+  - type: team
+    id: uapf-stewards
+    contact: stewards@uapf.dev
+
+lifecycle: draft