v1.0.0: dev.uapf.semantic-document-analysis

UAPF v1.1 SSOT-conformant Level 4 process package — reusable semantic
document analysis, shareable across DMS / intake / mailroom systems.

Structure:
- uapf.yaml (kind: uapf.package, level 4) + manifest.json engine-compat
- bpmn/semantic-document-analysis.bpmn.xml — 3 service tasks invoking
  reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1
- resources/mappings.yaml — task->target bindings with I/O contracts
- resources/schemas/vdvc-semantic-summary.schema.json — output contract
- resources/guardrails.yaml — GDPR + EU AI Act constraints
- metadata/ownership.yaml + metadata/lifecycle.yaml
- docs/, fixtures/, tests/eval-set.json

Validates clean against UAPFormat/UAPF-specification schemas.
2026-05-16 09:32:55 +00:00
commit ae0c646021
16 changed files with 422 additions and 0 deletions

3
.gitignore vendored Normal file

@@ -0,0 +1,3 @@
*.uapf
__pycache__/
.DS_Store

19
README.md Normal file

@@ -0,0 +1,19 @@
# dev.uapf.semantic-document-analysis

UAPF v1.1 SSOT-conformant Level 4 process package providing reusable
semantic document analysis as `Process_SemanticDocumentAnalysis`.

See `docs/00-overview.md` for what it does, `docs/01-eu-ai-act.md` for
the regulatory analysis, and `docs/02-integration.md` for runtime
integration notes.

## Validates against

Render each YAML file as JSON, then run
`jsonschema -i <rendered-json> <schema>` for each file against its
canonical schema in
`github.com/UAPFormat/UAPF-specification/schemas/`:

- `uapf-manifest.schema.json` (uapf.yaml, the root manifest)
- `ownership.schema.json` (metadata/ownership.yaml)
- `lifecycle.schema.json` (metadata/lifecycle.yaml)
- `resource-mapping.schema.json` (resources/mappings.yaml)
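
The four validation runs can be scripted. The following is a minimal sketch, not part of the package: `SCHEMA_FOR`, `validation_commands`, and the `schemas` / `build` directory names are hypothetical, and it assumes each YAML file has already been rendered to JSON under the same relative path (that rendering step is not shown). It only builds the `jsonschema -i` command lines; it does not run them.

```python
import shlex

# Package file -> canonical schema that validates it (pairs as listed in
# this README and in docs/02-integration.md).
SCHEMA_FOR = {
    "uapf.yaml": "uapf-manifest.schema.json",
    "metadata/ownership.yaml": "ownership.schema.json",
    "metadata/lifecycle.yaml": "lifecycle.schema.json",
    "resources/mappings.yaml": "resource-mapping.schema.json",
}


def validation_commands(schema_dir: str, json_dir: str) -> list[str]:
    """Build one `jsonschema -i <instance> <schema>` command per file.

    Assumes (hypothetically) that every YAML file was rendered to JSON
    under `json_dir`, same relative path, with a `.json` suffix.
    """
    commands = []
    for yaml_path, schema_name in SCHEMA_FOR.items():
        instance = f"{json_dir}/{yaml_path.removesuffix('.yaml')}.json"
        schema = f"{schema_dir}/{schema_name}"
        commands.append(shlex.join(["jsonschema", "-i", instance, schema]))
    return commands


for command in validation_commands("schemas", "build"):
    print(command)
```

Each printed line can be pasted into a shell once the rendered JSON files exist.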

bpmn/semantic-document-analysis.bpmn.xml Normal file

@@ -0,0 +1,55 @@
<?xml version="1.0" encoding="UTF-8"?>
<bpmn:definitions
xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
xmlns:uapf="https://uapf.dev/bpmn-ext/v1"
id="Definitions_SemanticAnalysis"
targetNamespace="https://uapf.dev/processes/semantic-document-analysis">
<bpmn:process id="Process_SemanticDocumentAnalysis"
name="Semantic document analysis"
isExecutable="true">
<bpmn:startEvent id="Start" name="Document text received"/>
<bpmn:serviceTask id="Task_RedactPii"
name="Redact personally identifiable information"
uapf:capability="ai.redact@1">
<bpmn:documentation>
Calls ai.redact@1 to mask names, identifiers, addresses, financial
and health data before downstream extraction. Required by
resources/guardrails.yaml (GDPR Art. 5 minimisation).
</bpmn:documentation>
</bpmn:serviceTask>
<bpmn:serviceTask id="Task_ExtractSemantics"
name="Extract semantic metadata"
uapf:capability="ai.extract@1"
uapf:schemaRef="resources/schemas/vdvc-semantic-summary.schema.json">
<bpmn:documentation>
Calls ai.extract@1 with the redacted text and the VDVC v1.1 output
schema (resources/schemas/vdvc-semantic-summary.schema.json). The
host's AI agent must produce output that validates against that
schema. Output records aiModelVersion + aiConfidenceScore per
EU AI Act Art. 13.
</bpmn:documentation>
</bpmn:serviceTask>
<bpmn:serviceTask id="Task_EmitResultEvent"
name="Emit semantic-analysis-completed event"
uapf:capability="event.emit@1"
uapf:eventType="document.semantic-analysis.completed.v1">
<bpmn:documentation>
Calls event.emit@1 to publish a CloudEvent containing the extracted
semantic summary. Downstream processes consume this event.
</bpmn:documentation>
</bpmn:serviceTask>
<bpmn:endEvent id="End" name="Semantic analysis complete"/>
<bpmn:sequenceFlow id="f1" sourceRef="Start" targetRef="Task_RedactPii"/>
<bpmn:sequenceFlow id="f2" sourceRef="Task_RedactPii" targetRef="Task_ExtractSemantics"/>
<bpmn:sequenceFlow id="f3" sourceRef="Task_ExtractSemantics" targetRef="Task_EmitResultEvent"/>
<bpmn:sequenceFlow id="f4" sourceRef="Task_EmitResultEvent" targetRef="End"/>
</bpmn:process>
</bpmn:definitions>

32
docs/00-overview.md Normal file

@@ -0,0 +1,32 @@
# dev.uapf.semantic-document-analysis: Overview

**UAPF v1.1 SSOT-conformant** Level 4 process package providing
reusable semantic document analysis.

## What

A 3-step BPMN process that, given free-text document content:

1. Redacts PII via `ai.redact@1`
2. Extracts VDVC v1.1 structured semantic metadata via `ai.extract@1`
3. Emits a `document.semantic-analysis.completed.v1` CloudEvent via `event.emit@1`

## What's portable

The package ships:

- The BPMN flow (the algorithm shape)
- The VDVC output JSON Schema (the output contract)
- The resource mapping (input/output contracts, timeouts, retries)
- The guardrails policy (GDPR + EU AI Act constraints)

The host system supplies the actual AI agent that fulfils the three
capabilities. Multiple hosts can implement the same capabilities;
multiple packages can require the same capabilities.

## How to consume

Drop this `.uapf` package into any UAPF-conformant runtime. The runtime
exposes `uapf.run_process` (per UAPF-specification §6.3.1) targeting
`Process_SemanticDocumentAnalysis`. It resolves the resource mapping
to find a target with the three required capabilities and invokes
them in order per the BPMN flow.
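
The invocation order can be illustrated with a small host-side sketch. This is an assumption-laden illustration, not the runtime's actual API: `run_semantic_analysis` and the `handlers` registry are hypothetical names, and the stub handlers stand in for a real AI agent. The request and response field names (`text`, `redactedText`, `schema`, `extracted`, `eventType`, `payload`) follow the contracts in `resources/mappings.yaml`.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]

# Capabilities in BPMN flow order (Start -> redact -> extract -> emit -> End).
REQUIRED = ["ai.redact@1", "ai.extract@1", "event.emit@1"]
EVENT_TYPE = "document.semantic-analysis.completed.v1"


def run_semantic_analysis(text: str, schema: Dict[str, Any],
                          handlers: Dict[str, Handler]) -> Dict[str, Any]:
    """Invoke the three capabilities in order.

    `handlers` is a hypothetical registry: capability name -> callable.
    """
    missing = [cap for cap in REQUIRED if cap not in handlers]
    if missing:
        raise LookupError(f"host does not provide: {', '.join(missing)}")

    redacted = handlers["ai.redact@1"]({"text": text})
    result = handlers["ai.extract@1"](
        {"text": redacted["redactedText"], "schema": schema})
    handlers["event.emit@1"](
        {"eventType": EVENT_TYPE, "payload": result["extracted"]})
    return result["extracted"]


# Stub handlers standing in for a real host agent (illustration only).
emitted = []
stubs = {
    "ai.redact@1": lambda req: {
        "redactedText": "[REDACTED] asks for review",
        "detections": ["personal_name"]},
    "ai.extract@1": lambda req: {
        "extracted": {"summary": req["text"]},
        "confidence": 0.5, "modelUsed": "stub-model"},
    "event.emit@1": lambda req: emitted.append(req) or {},
}
summary = run_semantic_analysis("Jane Doe asks for review", {}, stubs)
```

Because extraction only ever receives `redactedText`, the unredacted input never reaches `ai.extract@1` or the emitted event in this sketch.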

14
docs/01-eu-ai-act.md Normal file

@@ -0,0 +1,14 @@
# EU AI Act analysis

**Classification:** Annex III §5(a) and §8(a), high-risk AI system
under Regulation (EU) 2024/1689 when used for triage in public-service or
justice-administration contexts.

**Operator obligations** (the deploying organisation, not this package):

- Art. 13 transparency: output records `summarySource`, `aiConfidenceScore`, `aiModelVersion`
- Art. 14 human oversight: `humanValidationStatus` gates any consequential action on human review
- Art. 9 risk management: see `resources/guardrails.yaml`
- Art. 12 logging: host MUST log each invocation with package version + model id

See `resources/guardrails.yaml` for the machine-readable policy.
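
A host could check the transparency and oversight fields before accepting a result. The sketch below is one possible reading of the rules in `resources/guardrails.yaml`; `transparency_problems` is a hypothetical helper, and it takes the `semanticSummary` object as a plain dictionary. It does not replace validation against the VDVC JSON Schema.

```python
from typing import Any, Dict, List


def transparency_problems(summary: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems = []

    if summary.get("summarySource") != "AI":
        problems.append('summarySource must be "AI"')

    score = summary.get("aiConfidenceScore")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0.0 <= score <= 1.0:
        problems.append("aiConfidenceScore must be a number in 0.0-1.0")

    model = summary.get("aiModelVersion")
    if not isinstance(model, str) or not model.strip():
        problems.append("aiModelVersion must identify the model")

    if summary.get("humanValidationStatus") not in ("PENDING", "REQUIRED"):
        problems.append(
            "humanValidationStatus must be PENDING or REQUIRED on completion")

    return problems


ok = transparency_problems({
    "summarySource": "AI",
    "aiConfidenceScore": 0.42,
    "aiModelVersion": "stub-model-1",  # placeholder identifier
    "humanValidationStatus": "PENDING",
})
```

Returning a list rather than raising lets the host log every violation in one pass, which fits the Art. 12 logging obligation.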

21
docs/02-integration.md Normal file

@@ -0,0 +1,21 @@
# Integration with UAPF-conformant runtimes

This package validates against the UAPF v1.1 SSOT schemas:

- `uapf.yaml` → `uapf-manifest.schema.json`
- `metadata/ownership.yaml` → `ownership.schema.json`
- `metadata/lifecycle.yaml` → `lifecycle.schema.json`
- `resources/mappings.yaml` → `resource-mapping.schema.json`

It conforms to UAPF Conformance Checklist §"Level 4 requirements":

- ✓ `uapf.yaml` present
- ✓ ≥1 BPMN file (`bpmn/semantic-document-analysis.bpmn.xml`)
- ✓ `resources/mappings.yaml` present
- ✓ `metadata/ownership.yaml` + `metadata/lifecycle.yaml` present
- ✓ Cornerstones declared correctly (bpmn + resources)

## Engine compatibility notes

`uapf-engine@1.0.0` only reads `manifest.json` (pre-v1.1 file naming).
This package therefore ships both `uapf.yaml` (SSOT-mandated) and
`manifest.json` (a semantically identical JSON copy) for compatibility
until the upstream engine supports YAML.

fixtures/child-rights-input.json Normal file

@@ -0,0 +1,3 @@
{
"text": "Vēršos Tiesībsarga birojā par bāriņtiesas lēmumu attiecībā uz manu mazdēlu (vecums 7 gadi). Lēmums pieņemts steidzamā kārtībā 2026. gada martā. Lūdzu izvērtēt rīcības atbilstību Bērnu tiesību aizsardzības likumam."
}

fixtures/discrimination-input.json Normal file

@@ -0,0 +1,3 @@
{
"text": "Iesniedzējs ar otrās grupas invaliditāti norāda, ka valsts iestādes darba intervijā atklāti pateikts: 'nevaram pieņemt cilvēkus ar īpašām vajadzībām'. Lūdz Tiesībsargu izmeklēt."
}

45
manifest.json Normal file

@@ -0,0 +1,45 @@
{
"kind": "uapf.package",
"id": "dev.uapf.semantic-document-analysis",
"name": "Semantic Document Analysis (UAPF reference algorithm)",
"description": "Level-4 UAPF process for extracting VDVC-conformant semantic metadata\n(topic, summary, urgency, risk, sensitivity) from a free-text document.\n\nPortable across document management systems, intake portals, mailroom\nscanners, case-management platforms. Three BPMN service tasks invoke\nthe reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1.\nThe host fulfils each capability with its own AI agent; this package\nsupplies the BPMN flow, the VDVC output JSON Schema, the guardrails,\nand the resource mapping contract.\n",
"level": 4,
"version": "1.0.0",
"includes": [],
"dependencies": {},
"cornerstones": {
"bpmn": true,
"dmn": false,
"cmmn": false,
"resources": true
},
"paths": {
"bpmn": "bpmn",
"dmn": "dmn",
"cmmn": "cmmn",
"resources": "resources",
"metadata": "metadata"
},
"exposure": {
"mcp": {
"enabled": true,
"runnable": true,
"exposedEntrypoints": [
"Process_SemanticDocumentAnalysis"
],
"exposedArtifacts": [
"manifest",
"bpmn",
"docs"
]
}
},
"owners": [
{
"type": "team",
"id": "uapf-stewards",
"contact": "stewards@uapf.dev"
}
],
"lifecycle": "draft"
}

9
metadata/lifecycle.yaml Normal file

@@ -0,0 +1,9 @@
kind: uapf.metadata.lifecycle
status: draft
created: "2026-05-15T20:30:00Z"
lastModified: "2026-05-15T20:30:00Z"
changeHistory:
  - version: "1.0.0"
    date: "2026-05-15"
    summary: "Initial release. Reusable UAPF v1.1 Level 4 process for VDVC semantic metadata extraction. Three reserved-namespace capabilities (ai.redact@1, ai.extract@1, event.emit@1). VDVC output schema in resources/schemas/."
    author: "uapf-stewards"

9
metadata/ownership.yaml Normal file

@@ -0,0 +1,9 @@
kind: uapf.metadata.ownership
owners:
  - type: team
    id: uapf-stewards
    name: UAPF Stewards
    contact: stewards@uapf.dev
    role: owner
approvers:
  - uapf-stewards

32
resources/guardrails.yaml Normal file

@@ -0,0 +1,32 @@
# Non-normative supplementary file. UAPF v1.1 does NOT cornerstone guardrails;
# they live under resources/ as a host-readable policy snapshot.
authority: dev.uapf.stewards
version: "1.0.0"
privacy:
  forbidden_in_output:
    - personal_name
    - personal_id_number
    - postal_address
    - phone_number
    - email_address
    - bank_account
    - iban
    - health_record_value
    - biometric_value
  pii_handling:
    - "Detected PII MUST be listed in sensitivityControl.detectedEntityTypes as TYPE names only, never values."
    - "Set personalDataRisk according to detected types: NONE < LOW < MEDIUM < HIGH."
eu_ai_act:
  classification: "Annex III §5(a) and §8(a) — high-risk per Regulation 2024/1689"
  required_transparency_fields:
    - "semanticSummary.summarySource MUST be \"AI\""
    - "semanticSummary.aiConfidenceScore MUST be 0.0–1.0"
    - "semanticSummary.aiModelVersion MUST be the exact model identifier"
  human_oversight: "humanValidationStatus MUST be PENDING or REQUIRED on completion; consuming higher-level process MUST surface to a human before any consequential action."
accuracy:
  - "Do not fabricate fields not supported by source text."
  - "Set aiConfidenceScore below 0.3 when classification is uncertain."
  - "If document is unreadable or too short, set humanValidationStatus to REQUIRED."

54
resources/mappings.yaml Normal file

@@ -0,0 +1,54 @@
kind: uapf.resources.mapping
targets:
  - id: agent.semantic-extractor
    type: ai_agent
    name: Semantic Extraction AI Agent
    description: |
      Host-provided AI agent that fulfils ai.redact@1, ai.extract@1, and
      event.emit@1 for this process. Implementation is the host's choice
      (Claude, GPT, on-prem LLM, etc.); this package supplies the BPMN
      flow, the output schema, and the guardrails.
    capabilities:
      - capability.ai.redact
      - capability.ai.extract
      - capability.event.emit
bindings:
  - source: { type: bpmn.serviceTask, ref: Task_RedactPii }
    targetId: agent.semantic-extractor
    mode: autonomous
    contract:
      input:
        - { name: text, type: string, required: true }
        - { name: categories, type: array, required: false, description: "Optional PII categories; defaults to host policy." }
      output:
        - { name: redactedText, type: string }
        - { name: detections, type: array }
    timeout: "10s"
    requiredCapabilities: [capability.ai.redact]
  - source: { type: bpmn.serviceTask, ref: Task_ExtractSemantics }
    targetId: agent.semantic-extractor
    mode: autonomous
    contract:
      input:
        - { name: text, type: string, required: true, description: "Redacted text from previous task." }
        - { name: schema, type: object, required: true, description: "VDVC v1.1 output schema. Reference: resources/schemas/vdvc-semantic-summary.schema.json" }
      output:
        - { name: extracted, type: object, description: "Validates against resources/schemas/vdvc-semantic-summary.schema.json" }
        - { name: confidence, type: number }
        - { name: modelUsed, type: string }
    timeout: "30s"
    retries: { maxAttempts: 2, backoffMs: 2000 }
    requiredCapabilities: [capability.ai.extract]
  - source: { type: bpmn.serviceTask, ref: Task_EmitResultEvent }
    targetId: agent.semantic-extractor
    mode: autonomous
    contract:
      input:
        - { name: eventType, type: string, required: true }
        - { name: payload, type: object, required: true }
    timeout: "5s"
    requiredCapabilities: [capability.event.emit]

resources/schemas/vdvc-semantic-summary.schema.json Normal file

@@ -0,0 +1,49 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://uapf.dev/schemas/vdvc/semantic-summary-v1.1.json",
"title": "VDVC Semantic Summary v1.1",
"description": "Output contract for ai.extract@1 when invoked by Process_SemanticDocumentAnalysis. The host's AI agent MUST produce output validating against this schema.",
"type": "object",
"required": ["semanticSummary", "sensitivityControl"],
"properties": {
"semanticSummary": {
"type": "object",
"required": ["primaryTopic", "summary", "summarySource", "aiConfidenceScore", "aiModelVersion", "humanValidationStatus"],
"properties": {
"primaryTopic": { "type": "string", "maxLength": 200 },
"subTopics": { "type": "array", "items": { "type": "string" } },
"summary": { "type": "string", "maxLength": 4000 },
"documentPurpose": { "type": "string", "maxLength": 200 },
"requestedAction": { "type": "string", "maxLength": 200 },
"involvedPartyTypes": { "type": "array", "items": { "type": "string" }, "description": "Party TYPES only, never names." },
"geographicScope": { "type": "string" },
"sectorTags": { "type": "array", "items": { "type": "string" } },
"legalDomain": { "type": "string" },
"estimatedRiskLevel": { "enum": ["LOW", "MEDIUM", "HIGH", "CRITICAL"] },
"urgencyLevel": { "enum": ["LOW", "NORMAL", "HIGH", "URGENT"] },
"keywords": { "type": "array", "maxItems": 20, "items": { "type": "string" } },
"detectedLanguage": { "type": "string", "pattern": "^[a-z]{2}$" },
"summarySource": { "const": "AI" },
"aiConfidenceScore": { "type": "number", "minimum": 0, "maximum": 1 },
"aiModelVersion": { "type": "string" },
"humanValidationStatus": { "enum": ["PENDING", "REQUIRED", "VALIDATED", "REJECTED"] },
"mentions_child": { "type": "boolean" },
"ongoing_harm": { "type": "boolean" },
"vulnerable_group": { "type": "boolean" },
"criminal_indication": { "type": "boolean" }
}
},
"sensitivityControl": {
"type": "object",
"required": ["personalDataRisk", "allowCentralization", "redactionLevel"],
"properties": {
"personalDataRisk": { "enum": ["NONE", "LOW", "MEDIUM", "HIGH"] },
"allowCentralization": { "type": "boolean" },
"redactionLevel": { "enum": ["NONE", "PARTIAL", "FULL"] },
"accessRestrictionBasis":{ "type": "string" },
"classifiedInformation": { "type": "boolean" },
"detectedEntityTypes": { "type": "array", "items": { "type": "string" } }
}
}
}
}

24
tests/eval-set.json Normal file

@@ -0,0 +1,24 @@
{
"algorithm": "Process_SemanticDocumentAnalysis",
"package_version": "1.0.0",
"cases": [
{
"id": "child-rights",
"input_fixture": "fixtures/child-rights-input.json",
"expected_facets": {
"mentions_child": true,
"vulnerable_group": true,
"humanValidationStatus": "PENDING"
}
},
{
"id": "discrimination-disability",
"input_fixture": "fixtures/discrimination-input.json",
"expected_facets": {
"vulnerable_group": true,
"humanValidationStatus": "PENDING"
}
}
],
"success_criteria": { "min_pass_rate": 0.95, "max_p95_latency_ms": 15000 }
}

50
uapf.yaml Normal file

@@ -0,0 +1,50 @@
kind: uapf.package
id: dev.uapf.semantic-document-analysis
name: Semantic Document Analysis (UAPF reference algorithm)
description: |
  Level-4 UAPF process for extracting VDVC-conformant semantic metadata
  (topic, summary, urgency, risk, sensitivity) from a free-text document.

  Portable across document management systems, intake portals, mailroom
  scanners, case-management platforms. Three BPMN service tasks invoke
  the reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1.
  The host fulfils each capability with its own AI agent; this package
  supplies the BPMN flow, the VDVC output JSON Schema, the guardrails,
  and the resource mapping contract.
level: 4
version: "1.0.0"
includes: []
dependencies: {}
cornerstones:
  bpmn: true
  dmn: false
  cmmn: false
  resources: true
paths:
  bpmn: bpmn
  dmn: dmn
  cmmn: cmmn
  resources: resources
  metadata: metadata
exposure:
  mcp:
    enabled: true
    runnable: true
    exposedEntrypoints:
      - "Process_SemanticDocumentAnalysis"
    exposedArtifacts:
      - manifest
      - bpmn
      - docs
owners:
  - type: team
    id: uapf-stewards
    contact: stewards@uapf.dev
lifecycle: draft