Import UAPF package
rewrite 2.0.0: real process — extract the algorithm into DMN
The 1.x package was a single ai.extract call wrapped in three BPMN
service tasks. No decision logic, no dmn cornerstone, no weights — the
risk/routing/validation algorithm lived invisibly in host code. There
was nothing for a runtime to actually execute.
2.0.0 makes it a real process:
- dmn cornerstone added with three decision tables:
  * assess-personal-data-risk — PII regex signals -> risk level
  * gdpr-processing-route — risk x centralisation -> CENTRAL/LOCAL,
    anonymisation, redaction level
  * human-validation-gate — confidence thresholds + PII re-scan
    -> REJECTED/PENDING_REVIEW/APPROVED_AUTO
- BPMN expanded 3 -> 6 nodes (3 serviceTask + 3 businessRuleTask),
  with horizontal DI.
- Task ids, mappings, docs, manifest (dmn:true), uapf.yaml, lifecycle
  and eval-set updated; added a PII-bearing fixture.
Only the semantic extraction remains a model step. Risk classification,
GDPR routing and validation gating are now explicit ranked DMN rules —
inspectable, versioned, portable. Breaking change: structure + outputs.
77
README.md
```diff
@@ -1,19 +1,68 @@
-# dev.uapf.semantic-document-analysis
+# Semantic Document Analysis
 
-UAPF v1.1 SSOT-conformant Level 4 process package providing reusable
-semantic document analysis as `Process_SemanticDocumentAnalysis`.
+A UAPF Level-4 process package for extracting VDVC-conformant semantic
+metadata from free-text documents.
 
-See `docs/00-overview.md` for what it does, `docs/01-eu-ai-act.md` for
-the regulatory analysis, and `docs/02-integration.md` for runtime
-integration notes.
+## What this package is
 
-## Validates against
-
-Run `jsonschema -i <each yaml-rendered-as-json> <each schema>` against
-the canonical schemas in
-`github.com/UAPFormat/UAPF-specification/schemas/`:
+A **real, inspectable process** — not a single AI call in BPMN costume.
+The flow has six executable nodes; three of them are DMN decision tables
+that carry the actual algorithm, with explicit ranked rules and weights.
+
+```
+Start
+  -> [service]  Detect and redact PII         ai.redact@1
+  -> [decision] Assess personal-data risk     DMN assess-personal-data-risk
+  -> [decision] Decide GDPR processing route  DMN gdpr-processing-route
+  -> [service]  Extract semantic metadata     ai.extract@1
+  -> [decision] Determine validation status   DMN human-validation-gate
+  -> [service]  Emit completed event          event.emit@1
+End
+```
 
-- `uapf-manifest.schema.json` (root manifest)
-- `ownership.schema.json` (metadata/ownership.yaml)
-- `lifecycle.schema.json` (metadata/lifecycle.yaml)
-- `resource-mapping.schema.json` (resources/mappings.yaml)
+Only **one** node performs model inference (semantic extraction). PII
+detection, risk classification, GDPR routing and the human-validation
+gate are deterministic — the host cannot make them up.
+
+## The decision tables (dmn/)
+
+### assess-personal-data-risk
+PII regex signals -> `personalDataRisk`. Personas kods or IBAN forces
+HIGH; two or more PII categories, or contact data, gives MEDIUM; one
+category LOW; nothing NONE. Hit policy FIRST (ranked).
+
+### gdpr-processing-route
+`personalDataRisk` x `allowCentralization` -> `processingRoute`
+(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A
+sensitive document whose owner has not permitted centralisation stays
+LOCAL with full redaction. This is the routing rule lifted out of the
+host's `generate_semantic_metadata`.
+
+### human-validation-gate
+`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` ->
+`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
+`requiresHumanReview`. Any leaked PII or confidence below 0.3 -> REJECTED;
+below 0.7 or HIGH risk -> PENDING_REVIEW; 0.7+ with clean output ->
+APPROVED_AUTO. The thresholds 0.3 / 0.7 are the weights.
+
+## Capabilities required of the host
+
+| Capability     | Used by                | Purpose                          |
+|----------------|------------------------|----------------------------------|
+| ai.redact@1    | Task_DetectRedactPii   | Mask PII + return regex signals  |
+| ai.extract@1   | Task_ExtractSemantics  | VDVC semantic extraction         |
+| event.emit@1   | Task_EmitResult        | Publish completion CloudEvent    |
+
+DMN decisions need no host capability — the runtime evaluates them.
+
+## Output contract
+
+`resources/schemas/vdvc-semantic-summary.schema.json` — the ai.extract@1
+output. The process additionally yields the DMN-decided fields
+(`personalDataRisk`, `processingRoute`, `redactionLevel`,
+`humanValidationStatus`, `requiresHumanReview`).
+
+## Compliance
+
+EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data
+minimisation. See `resources/guardrails.yaml` and `docs/`.
```
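Read top to bottom, the README diagram is a strictly sequential pipeline: each node consumes fields produced earlier and adds its own. The following is a minimal Python sketch of that shape, not a UAPF runtime. `Node`, `run_flow`, `demo` and every stub output are hypothetical; the stubs return fixed values where a real host would call `ai.redact@1`, `ai.extract@1` and `event.emit@1`, and where a real runtime would evaluate the three DMN tables.

```python
"""Hypothetical straight-line runner for the six-node flow (illustration only)."""
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Step = Callable[[Context], Context]


@dataclass(frozen=True)
class Node:
    kind: str  # "service" = host capability, "decision" = DMN table run by the runtime
    name: str
    ref: str   # capability id (e.g. "ai.redact@1") or DMN decision id


# Node order transcribed from the README diagram.
FLOW: List[Node] = [
    Node("service", "Detect and redact PII", "ai.redact@1"),
    Node("decision", "Assess personal-data risk", "assess-personal-data-risk"),
    Node("decision", "Decide GDPR processing route", "gdpr-processing-route"),
    Node("service", "Extract semantic metadata", "ai.extract@1"),
    Node("decision", "Determine validation status", "human-validation-gate"),
    Node("service", "Emit completed event", "event.emit@1"),
]


def run_flow(flow: List[Node], capabilities: Dict[str, Step],
             decisions: Dict[str, Step], ctx: Context) -> Tuple[Context, List[str]]:
    """Run nodes in order, merging each step's outputs into one shared context."""
    trace: List[str] = []
    for node in flow:
        impls = capabilities if node.kind == "service" else decisions
        if node.ref not in impls:
            raise KeyError(f"no implementation for {node.kind} node {node.ref}")
        ctx = {**ctx, **impls[node.ref](ctx)}
        trace.append(node.ref)
    return ctx, trace


def demo() -> Tuple[Context, List[str]]:
    # Stub host capabilities: fixed outputs stand in for real redaction/inference.
    capabilities: Dict[str, Step] = {
        "ai.redact@1": lambda c: {"redactedContent": c["content"], "piiCategoryCount": 0},
        "ai.extract@1": lambda c: {"aiConfidenceScore": 0.82, "outputPiiErrorCount": 0},
        "event.emit@1": lambda c: {"emitted": True},
    }
    # Stub decisions: a real runtime would evaluate the DMN tables in dmn/.
    decisions: Dict[str, Step] = {
        "assess-personal-data-risk": lambda c: {"personalDataRisk": "NONE"},
        "gdpr-processing-route": lambda c: {"processingRoute": "CENTRAL"},
        "human-validation-gate": lambda c: {"humanValidationStatus": "APPROVED_AUTO"},
    }
    return run_flow(FLOW, capabilities, decisions, {"content": "Sample document text."})


if __name__ == "__main__":
    final_ctx, executed = demo()
    print(executed)
    print(final_ctx["humanValidationStatus"])
```

Keeping capability-backed nodes and decision nodes in separate lookup tables mirrors the README's split: only the three capability ids need a host implementation, while the decision ids are served by the runtime itself.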
```diff
@@ -14,45 +14,89 @@
 
     <bpmn:startEvent id="Start" name="Document text received"/>
 
-    <bpmn:serviceTask id="Task_RedactPii"
-        name="Redact personally identifiable information"
+    <bpmn:serviceTask id="Task_DetectRedactPii"
+        name="Detect and redact PII"
         uapf:capability="ai.redact@1">
       <bpmn:documentation>
-        Calls ai.redact@1 to mask names, identifiers, addresses, financial
-        and health data before downstream extraction. Required by
-        resources/guardrails.yaml (GDPR Art. 5 minimisation).
+        Calls ai.redact@1 over the source text. Beyond masking, the host
+        runs the four Latvian PII regex detectors (personas kods, IBAN,
+        e-mail, phone) and returns the deterministic signal set the risk
+        decision consumes: personasKodaPresent, financialDataPresent,
+        contactDataPresent, piiCategoryCount, detectedEntityTypes, plus
+        redactedContent. No model inference — pure pattern detection.
       </bpmn:documentation>
     </bpmn:serviceTask>
 
+    <bpmn:businessRuleTask id="Decision_AssessRisk"
+        name="Assess personal-data risk"
+        uapf:decision="assess-personal-data-risk">
+      <bpmn:documentation>
+        DMN dmn/assess-personal-data-risk.dmn. Maps the PII signal set to
+        personalDataRisk (NONE | LOW | MEDIUM | HIGH) by explicit ranked
+        rules. Personas kods or IBAN forces HIGH; two or more categories
+        or contact data gives MEDIUM. Deterministic and auditable.
+      </bpmn:documentation>
+    </bpmn:businessRuleTask>
+
+    <bpmn:businessRuleTask id="Decision_GdprRoute"
+        name="Decide GDPR processing route"
+        uapf:decision="gdpr-processing-route">
+      <bpmn:documentation>
+        DMN dmn/gdpr-processing-route.dmn. From personalDataRisk and
+        allowCentralization decides processingRoute (CENTRAL | LOCAL),
+        anonymizationRequired and redactionLevel. This is the routing
+        rule extracted from the host's generate_semantic_metadata: a
+        sensitive document where centralisation is not permitted stays
+        LOCAL with full redaction.
+      </bpmn:documentation>
+    </bpmn:businessRuleTask>
+
     <bpmn:serviceTask id="Task_ExtractSemantics"
         name="Extract semantic metadata"
         uapf:capability="ai.extract@1"
         uapf:schemaRef="resources/schemas/vdvc-semantic-summary.schema.json">
       <bpmn:documentation>
-        Calls ai.extract@1 with the redacted text and the VDVC v1.1 output
-        schema (resources/schemas/vdvc-semantic-summary.schema.json). The
-        host's AI agent must produce output that validates against that
-        schema. Output records aiModelVersion + aiConfidenceScore per
-        EU AI Act Art. 13.
+        Calls ai.extract@1 on redactedContent with the VDVC v1.1 output
+        schema. This is the single bounded model step: it produces the
+        semanticSummary (topic, summary, keywords, urgency, risk) and
+        must validate against resources/schemas/vdvc-semantic-summary.
+        The host also returns flat aiConfidenceScore and the result of
+        the post-extraction PII re-scan as outputPiiErrorCount.
       </bpmn:documentation>
     </bpmn:serviceTask>
 
-    <bpmn:serviceTask id="Task_EmitResultEvent"
+    <bpmn:businessRuleTask id="Decision_ValidationGate"
+        name="Determine human-validation status"
+        uapf:decision="human-validation-gate">
+      <bpmn:documentation>
+        DMN dmn/human-validation-gate.dmn. From outputPiiErrorCount,
+        aiConfidenceScore and personalDataRisk decides
+        humanValidationStatus (REJECTED | PENDING_REVIEW | APPROVED_AUTO)
+        and requiresHumanReview. Any leaked PII or confidence below 0.3
+        rejects; below 0.7, or HIGH risk, forces review; 0.7 and above
+        with clean output auto-approves. The thresholds are the weights.
+      </bpmn:documentation>
+    </bpmn:businessRuleTask>
+
+    <bpmn:serviceTask id="Task_EmitResult"
         name="Emit semantic-analysis-completed event"
         uapf:capability="event.emit@1"
         uapf:eventType="document.semantic-analysis.completed.v1">
       <bpmn:documentation>
-        Calls event.emit@1 to publish a CloudEvent containing the extracted
-        semantic summary. Downstream processes consume this event.
+        Calls event.emit@1 to publish a CloudEvent carrying the semantic
+        summary, the routing decision and the validation status.
       </bpmn:documentation>
     </bpmn:serviceTask>
 
     <bpmn:endEvent id="End" name="Semantic analysis complete"/>
 
-    <bpmn:sequenceFlow id="f1" sourceRef="Start" targetRef="Task_RedactPii"/>
-    <bpmn:sequenceFlow id="f2" sourceRef="Task_RedactPii" targetRef="Task_ExtractSemantics"/>
-    <bpmn:sequenceFlow id="f3" sourceRef="Task_ExtractSemantics" targetRef="Task_EmitResultEvent"/>
-    <bpmn:sequenceFlow id="f4" sourceRef="Task_EmitResultEvent" targetRef="End"/>
+    <bpmn:sequenceFlow id="f1" sourceRef="Start" targetRef="Task_DetectRedactPii"/>
+    <bpmn:sequenceFlow id="f2" sourceRef="Task_DetectRedactPii" targetRef="Decision_AssessRisk"/>
+    <bpmn:sequenceFlow id="f3" sourceRef="Decision_AssessRisk" targetRef="Decision_GdprRoute"/>
+    <bpmn:sequenceFlow id="f4" sourceRef="Decision_GdprRoute" targetRef="Task_ExtractSemantics"/>
+    <bpmn:sequenceFlow id="f5" sourceRef="Task_ExtractSemantics" targetRef="Decision_ValidationGate"/>
+    <bpmn:sequenceFlow id="f6" sourceRef="Decision_ValidationGate" targetRef="Task_EmitResult"/>
+    <bpmn:sequenceFlow id="f7" sourceRef="Task_EmitResult" targetRef="End"/>
 
   </bpmn:process>
 
```
```diff
@@ -61,33 +105,54 @@
       <bpmndi:BPMNShape id="Start_di" bpmnElement="Start">
         <dc:Bounds x="152" y="102" width="36" height="36"/>
       </bpmndi:BPMNShape>
-      <bpmndi:BPMNShape id="Task_RedactPii_di" bpmnElement="Task_RedactPii">
-        <dc:Bounds x="240" y="80" width="100" height="80"/>
+      <bpmndi:BPMNShape id="Task_DetectRedactPii_di" bpmnElement="Task_DetectRedactPii">
+        <dc:Bounds x="240" y="90" width="110" height="80"/>
+      </bpmndi:BPMNShape>
+      <bpmndi:BPMNShape id="Decision_AssessRisk_di" bpmnElement="Decision_AssessRisk">
+        <dc:Bounds x="410" y="90" width="110" height="80"/>
+      </bpmndi:BPMNShape>
+      <bpmndi:BPMNShape id="Decision_GdprRoute_di" bpmnElement="Decision_GdprRoute">
+        <dc:Bounds x="580" y="90" width="120" height="80"/>
       </bpmndi:BPMNShape>
       <bpmndi:BPMNShape id="Task_ExtractSemantics_di" bpmnElement="Task_ExtractSemantics">
-        <dc:Bounds x="420" y="80" width="100" height="80"/>
+        <dc:Bounds x="760" y="90" width="110" height="80"/>
       </bpmndi:BPMNShape>
-      <bpmndi:BPMNShape id="Task_EmitResultEvent_di" bpmnElement="Task_EmitResultEvent">
-        <dc:Bounds x="600" y="80" width="100" height="80"/>
+      <bpmndi:BPMNShape id="Decision_ValidationGate_di" bpmnElement="Decision_ValidationGate">
+        <dc:Bounds x="930" y="90" width="120" height="80"/>
+      </bpmndi:BPMNShape>
+      <bpmndi:BPMNShape id="Task_EmitResult_di" bpmnElement="Task_EmitResult">
+        <dc:Bounds x="1110" y="90" width="110" height="80"/>
       </bpmndi:BPMNShape>
       <bpmndi:BPMNShape id="End_di" bpmnElement="End">
-        <dc:Bounds x="780" y="102" width="36" height="36"/>
+        <dc:Bounds x="1290" y="102" width="36" height="36"/>
       </bpmndi:BPMNShape>
       <bpmndi:BPMNEdge id="f1_di" bpmnElement="f1">
         <di:waypoint x="188" y="120"/>
         <di:waypoint x="240" y="120"/>
       </bpmndi:BPMNEdge>
       <bpmndi:BPMNEdge id="f2_di" bpmnElement="f2">
-        <di:waypoint x="340" y="120"/>
-        <di:waypoint x="420" y="120"/>
+        <di:waypoint x="350" y="120"/>
+        <di:waypoint x="410" y="120"/>
       </bpmndi:BPMNEdge>
       <bpmndi:BPMNEdge id="f3_di" bpmnElement="f3">
         <di:waypoint x="520" y="120"/>
-        <di:waypoint x="600" y="120"/>
+        <di:waypoint x="580" y="120"/>
       </bpmndi:BPMNEdge>
       <bpmndi:BPMNEdge id="f4_di" bpmnElement="f4">
         <di:waypoint x="700" y="120"/>
-        <di:waypoint x="780" y="120"/>
+        <di:waypoint x="760" y="120"/>
+      </bpmndi:BPMNEdge>
+      <bpmndi:BPMNEdge id="f5_di" bpmnElement="f5">
+        <di:waypoint x="870" y="120"/>
+        <di:waypoint x="930" y="120"/>
+      </bpmndi:BPMNEdge>
+      <bpmndi:BPMNEdge id="f6_di" bpmnElement="f6">
+        <di:waypoint x="1050" y="120"/>
+        <di:waypoint x="1110" y="120"/>
+      </bpmndi:BPMNEdge>
+      <bpmndi:BPMNEdge id="f7_di" bpmnElement="f7">
+        <di:waypoint x="1220" y="120"/>
+        <di:waypoint x="1290" y="120"/>
       </bpmndi:BPMNEdge>
     </bpmndi:BPMNPlane>
   </bpmndi:BPMNDiagram>
```
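The new DI lays all eight elements out on one horizontal row. A quick way to sanity-check such a layout is to confirm that every sequence flow starts on the right edge of its source shape and ends on the left edge of its target, and that the flows chain Start to End. The sketch below hard-codes the new `dc:Bounds` and waypoint values from this diff instead of parsing the XML; the function names are illustrative only, and it assumes one outgoing flow per node.

```python
"""Consistency check for the 2.0.0 BPMN DI, using values copied from this diff."""
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # dc:Bounds as (x, y, width, height)
Flow = Tuple[str, str, List[Tuple[int, int]]]  # (sourceRef, targetRef, waypoints)

BOUNDS: Dict[str, Box] = {
    "Start": (152, 102, 36, 36),
    "Task_DetectRedactPii": (240, 90, 110, 80),
    "Decision_AssessRisk": (410, 90, 110, 80),
    "Decision_GdprRoute": (580, 90, 120, 80),
    "Task_ExtractSemantics": (760, 90, 110, 80),
    "Decision_ValidationGate": (930, 90, 120, 80),
    "Task_EmitResult": (1110, 90, 110, 80),
    "End": (1290, 102, 36, 36),
}

FLOWS: Dict[str, Flow] = {
    "f1": ("Start", "Task_DetectRedactPii", [(188, 120), (240, 120)]),
    "f2": ("Task_DetectRedactPii", "Decision_AssessRisk", [(350, 120), (410, 120)]),
    "f3": ("Decision_AssessRisk", "Decision_GdprRoute", [(520, 120), (580, 120)]),
    "f4": ("Decision_GdprRoute", "Task_ExtractSemantics", [(700, 120), (760, 120)]),
    "f5": ("Task_ExtractSemantics", "Decision_ValidationGate", [(870, 120), (930, 120)]),
    "f6": ("Decision_ValidationGate", "Task_EmitResult", [(1050, 120), (1110, 120)]),
    "f7": ("Task_EmitResult", "End", [(1220, 120), (1290, 120)]),
}


def check_layout(bounds: Dict[str, Box], flows: Dict[str, Flow]) -> List[str]:
    """Flag edges whose end points do not touch the side of their source/target shape."""
    problems: List[str] = []
    for fid, (src, tgt, pts) in flows.items():
        sx, sy, sw, sh = bounds[src]
        tx, ty, _, th = bounds[tgt]
        (x0, y0), (x1, y1) = pts[0], pts[-1]
        if x0 != sx + sw or not sy <= y0 <= sy + sh:
            problems.append(f"{fid}: start {pts[0]} is not on the right edge of {src}")
        if x1 != tx or not ty <= y1 <= ty + th:
            problems.append(f"{fid}: end {pts[-1]} is not on the left edge of {tgt}")
    return problems


def linear_order(flows: Dict[str, Flow], start: str = "Start", end: str = "End") -> List[str]:
    """Follow sourceRef -> targetRef from start to end; fail on a cycle or dead end."""
    successor = {src: tgt for src, tgt, _ in flows.values()}  # one outgoing flow per node
    order = [start]
    while order[-1] != end:
        if order[-1] not in successor or len(order) > len(successor):
            raise ValueError(f"flow does not reach {end}: {order}")
        order.append(successor[order[-1]])
    return order


def centre_offsets(bounds: Dict[str, Box], flows: Dict[str, Flow]) -> Dict[str, float]:
    """Vertical gap between each edge's end point and its target shape's centre line."""
    return {
        fid: pts[-1][1] - (bounds[tgt][1] + bounds[tgt][3] / 2)
        for fid, (_, tgt, pts) in flows.items()
    }


if __name__ == "__main__":
    print(check_layout(BOUNDS, FLOWS))
    print(linear_order(FLOWS))
    print(centre_offsets(BOUNDS, FLOWS))
```

With these values `check_layout` reports no problems and `linear_order` visits all eight nodes in BPMN order. `centre_offsets` returns -10 for every task target: the edges still run at y=120, while the new task shapes (y=90, height 80) are centred on y=130. The connections remain valid, just not centred; the 1.x shapes at y=80 were centred on 120.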
71
dmn/assess-personal-data-risk.dmn
Normal file
```diff
@@ -0,0 +1,71 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<dmn:definitions xmlns:dmn="https://www.omg.org/spec/DMN/20191111/MODEL/"
+    id="Definitions_AssessPersonalDataRisk"
+    namespace="https://uapf.dev/processes/semantic-document-analysis">
+  <dmn:decision id="assess-personal-data-risk" name="Assess personal-data risk">
+    <dmn:decisionTable hitPolicy="FIRST">
+      <dmn:input id="i_pk" label="Personas kods present">
+        <dmn:inputExpression typeRef="boolean"><dmn:text>personasKodaPresent</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:input id="i_fin" label="Financial identifier present">
+        <dmn:inputExpression typeRef="boolean"><dmn:text>financialDataPresent</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:input id="i_contact" label="Contact data present">
+        <dmn:inputExpression typeRef="boolean"><dmn:text>contactDataPresent</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:input id="i_cnt" label="Distinct PII categories">
+        <dmn:inputExpression typeRef="number"><dmn:text>piiCategoryCount</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:output id="o_risk" label="Personal-data risk" name="personalDataRisk" typeRef="string"/>
+      <dmn:output id="o_rat" label="Rationale" name="riskRationale" typeRef="string"/>
+      <dmn:rule id="R1_personas_kods">
+        <dmn:inputEntry><dmn:text>true</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"HIGH"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"Personas kods (national ID) detected"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R2_financial">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>true</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"HIGH"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"Financial account identifier (IBAN) detected"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R3_multi_category">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>[2..999]</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"MEDIUM"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"Two or more distinct PII categories present"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R4_contact">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>true</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"MEDIUM"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"Contact data (email/phone) detected"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R5_single">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>1</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"LOW"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"Single PII category present"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R6_none">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"NONE"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"No personal data detected by regex scan"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+    </dmn:decisionTable>
+  </dmn:decision>
+</dmn:definitions>
```
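Hit policy FIRST means rules are tested in table order and the first rule whose input entries all match supplies the outputs; a `-` entry matches anything. Below, as a sketch, the six rules are transcribed by hand into Python predicates rather than parsed from the DMN XML. `Signals`, `Rule`, `assess` and `reachable_risks` are hypothetical names, and this is not a DMN engine.

```python
"""Hand transcription of dmn/assess-personal-data-risk.dmn (hit policy FIRST)."""
from itertools import product
from typing import Callable, List, NamedTuple, Set


class Signals(NamedTuple):
    personas_koda_present: bool   # personasKodaPresent
    financial_data_present: bool  # financialDataPresent
    contact_data_present: bool    # contactDataPresent
    pii_category_count: int       # piiCategoryCount


class Rule(NamedTuple):
    rule_id: str
    matches: Callable[[Signals], bool]
    risk: str
    rationale: str


# Rows in table order; a "-" input entry simply adds no condition.
RULES: List[Rule] = [
    Rule("R1_personas_kods", lambda s: s.personas_koda_present,
         "HIGH", "Personas kods (national ID) detected"),
    Rule("R2_financial", lambda s: s.financial_data_present,
         "HIGH", "Financial account identifier (IBAN) detected"),
    Rule("R3_multi_category", lambda s: 2 <= s.pii_category_count <= 999,  # [2..999]
         "MEDIUM", "Two or more distinct PII categories present"),
    Rule("R4_contact", lambda s: s.contact_data_present,
         "MEDIUM", "Contact data (email/phone) detected"),
    Rule("R5_single", lambda s: s.pii_category_count == 1,
         "LOW", "Single PII category present"),
    Rule("R6_none", lambda s: True,
         "NONE", "No personal data detected by regex scan"),
]


def assess(signals: Signals) -> Rule:
    """FIRST hit policy: the first rule, top to bottom, whose entries all match wins."""
    for rule in RULES:
        if rule.matches(signals):
            return rule
    raise LookupError("no rule matched")  # not reached: R6_none matches everything


def reachable_risks() -> Set[str]:
    """Risk levels the table can produce if only the four named detectors feed it.

    Assumptions (not stated in the DMN): contactDataPresent is true exactly when
    the e-mail or phone detector fires, and piiCategoryCount counts firing detectors.
    """
    risks: Set[str] = set()
    for pk, iban, email, phone in product([False, True], repeat=4):
        signals = Signals(pk, iban, email or phone, sum([pk, iban, email, phone]))
        risks.add(assess(signals).risk)
    return risks


if __name__ == "__main__":
    print(assess(Signals(False, False, True, 1)).rule_id)
    print(sorted(reachable_risks()))
```

Ranking matters: a document with only an e-mail address (`contactDataPresent` true, one category) hits R4 and comes out MEDIUM, not LOW. `reachable_risks` makes a further point under the two assumptions in its docstring: if only the four detectors named in `Task_DetectRedactPii` feed the table, any single category is personas kods, IBAN or contact data, which R1, R2 or R4 catch first, so R5 (LOW) never fires and the result set is {HIGH, MEDIUM, NONE}.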
60
dmn/gdpr-processing-route.dmn
Normal file
```diff
@@ -0,0 +1,60 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<dmn:definitions xmlns:dmn="https://www.omg.org/spec/DMN/20191111/MODEL/"
+    id="Definitions_GdprProcessingRoute"
+    namespace="https://uapf.dev/processes/semantic-document-analysis">
+  <dmn:decision id="gdpr-processing-route" name="GDPR processing route">
+    <dmn:decisionTable hitPolicy="FIRST">
+      <dmn:input id="i_risk" label="Personal-data risk">
+        <dmn:inputExpression typeRef="string"><dmn:text>personalDataRisk</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:input id="i_central" label="Centralisation permitted">
+        <dmn:inputExpression typeRef="boolean"><dmn:text>allowCentralization</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:output id="o_route" label="Processing route" name="processingRoute" typeRef="string"/>
+      <dmn:output id="o_anon" label="Anonymisation required" name="anonymizationRequired" typeRef="boolean"/>
+      <dmn:output id="o_redact" label="Redaction level" name="redactionLevel" typeRef="string"/>
+      <dmn:rule id="R1_sensitive_local">
+        <dmn:inputEntry><dmn:text>"HIGH","MEDIUM"</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>false</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"LOCAL"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>true</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"FULL"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R2_any_local">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>false</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"LOCAL"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>false</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"PARTIAL"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R3_sensitive_central">
+        <dmn:inputEntry><dmn:text>"HIGH","MEDIUM"</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>true</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"CENTRAL"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>true</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"FULL"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R4_low_central">
+        <dmn:inputEntry><dmn:text>"LOW"</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>true</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"CENTRAL"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>false</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"PARTIAL"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R5_none_central">
+        <dmn:inputEntry><dmn:text>"NONE"</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>true</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"CENTRAL"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>false</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"NONE"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="R6_default">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"CENTRAL"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>false</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>"PARTIAL"</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+    </dmn:decisionTable>
+  </dmn:decision>
+</dmn:definitions>
```
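The routing table can be checked the same way: transcribe the rules, then enumerate all eight combinations of `personalDataRisk` and `allowCentralization` to see which rule fires. This is a hand-written sketch, not a DMN engine; `Route`, `route` and `coverage` are hypothetical names, and it assumes, as in FEEL equality, that a null input does not match a literal `true` or `false` entry.

```python
"""Hand transcription of dmn/gdpr-processing-route.dmn (hit policy FIRST)."""
from itertools import product
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Tuple


class Route(NamedTuple):
    processing_route: str         # processingRoute
    anonymization_required: bool  # anonymizationRequired
    redaction_level: str          # redactionLevel


SENSITIVE = frozenset({"HIGH", "MEDIUM"})

# (rule id, accepted personalDataRisk values or None for "-",
#  accepted allowCentralization value or None for "-", outputs)
RULES: List[Tuple[str, Optional[FrozenSet[str]], Optional[bool], Route]] = [
    ("R1_sensitive_local", SENSITIVE, False, Route("LOCAL", True, "FULL")),
    ("R2_any_local", None, False, Route("LOCAL", False, "PARTIAL")),
    ("R3_sensitive_central", SENSITIVE, True, Route("CENTRAL", True, "FULL")),
    ("R4_low_central", frozenset({"LOW"}), True, Route("CENTRAL", False, "PARTIAL")),
    ("R5_none_central", frozenset({"NONE"}), True, Route("CENTRAL", False, "NONE")),
    ("R6_default", None, None, Route("CENTRAL", False, "PARTIAL")),
]


def route(risk: Optional[str], allow_centralization: Optional[bool]) -> Tuple[str, Route]:
    """FIRST hit policy; a None (null) input never equals a literal entry (assumption)."""
    for rule_id, risks, central, out in RULES:
        risk_ok = risks is None or risk in risks
        central_ok = central is None or (
            allow_centralization is not None and allow_centralization == central)
        if risk_ok and central_ok:
            return rule_id, out
    raise LookupError("no rule matched")  # not reached: R6_default has only "-" entries


def coverage() -> Dict[Tuple[str, bool], str]:
    """Which rule fires for every in-domain combination of the two inputs."""
    return {
        (risk, central): route(risk, central)[0]
        for risk, central in product(["NONE", "LOW", "MEDIUM", "HIGH"], [True, False])
    }


if __name__ == "__main__":
    for combo, rule_id in coverage().items():
        print(combo, rule_id)
    print(route("HIGH", None))
```

Under that assumption `coverage()` never selects R6_default for a valid input; R6 fires only when an input is missing or out of domain. For instance `route("HIGH", None)` falls through to R6 and returns CENTRAL with PARTIAL redaction, so a sensitive document whose centralisation flag never arrived would be routed centrally. Whether that fallback is intended is for the table's author to decide.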
62
dmn/human-validation-gate.dmn
Normal file
```diff
@@ -0,0 +1,62 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<dmn:definitions xmlns:dmn="https://www.omg.org/spec/DMN/20191111/MODEL/"
+    id="Definitions_HumanValidationGate"
+    namespace="https://uapf.dev/processes/semantic-document-analysis">
+  <dmn:decision id="human-validation-gate" name="Human-validation gate">
+    <dmn:decisionTable hitPolicy="FIRST">
+      <dmn:input id="i_pii" label="Output PII error count">
+        <dmn:inputExpression typeRef="number"><dmn:text>outputPiiErrorCount</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:input id="i_conf" label="AI confidence score">
+        <dmn:inputExpression typeRef="number"><dmn:text>aiConfidenceScore</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:input id="i_risk" label="Personal-data risk">
+        <dmn:inputExpression typeRef="string"><dmn:text>personalDataRisk</dmn:text></dmn:inputExpression>
+      </dmn:input>
+      <dmn:output id="o_status" label="Human-validation status" name="humanValidationStatus" typeRef="string"/>
+      <dmn:output id="o_review" label="Requires human review" name="requiresHumanReview" typeRef="boolean"/>
+      <dmn:rule id="V1_pii_leak">
+        <dmn:inputEntry><dmn:text>[1..9999]</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"REJECTED"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>true</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="V2_low_confidence">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>[0..0.3)</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"REJECTED"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>true</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="V3_mid_confidence">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>[0.3..0.7)</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"PENDING_REVIEW"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>true</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="V4_high_risk_review">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>"HIGH"</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"PENDING_REVIEW"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>true</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="V5_auto_approve">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>[0.7..1]</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"APPROVED_AUTO"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>false</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+      <dmn:rule id="V6_default">
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:inputEntry><dmn:text>-</dmn:text></dmn:inputEntry>
+        <dmn:outputEntry><dmn:text>"PENDING_REVIEW"</dmn:text></dmn:outputEntry>
+        <dmn:outputEntry><dmn:text>true</dmn:text></dmn:outputEntry>
+      </dmn:rule>
+    </dmn:decisionTable>
+  </dmn:decision>
+</dmn:definitions>
```
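The gate's thresholds sit in FEEL ranges, where `[` and `]` include an endpoint and `)` excludes it. The sketch below spells out the six rules in table order with those range semantics; `Gate`, `in_interval` and `validation_gate` are hypothetical names, and treating a missing (None) input as never matching a range is an assumption modelled on FEEL null handling.

```python
"""Hand transcription of dmn/human-validation-gate.dmn (hit policy FIRST)."""
from typing import NamedTuple, Optional


class Gate(NamedTuple):
    rule_id: str
    status: str            # humanValidationStatus
    requires_review: bool  # requiresHumanReview


def in_interval(x: Optional[float], lo: float, hi: float, hi_closed: bool) -> bool:
    """FEEL-style range test: [lo..hi] if hi_closed else [lo..hi). None never matches."""
    if x is None:
        return False
    return lo <= x <= hi if hi_closed else lo <= x < hi


def validation_gate(pii_errors: Optional[int], confidence: Optional[float],
                    risk: Optional[str]) -> Gate:
    # Rules are tested top to bottom; the first match wins (FIRST).
    if in_interval(pii_errors, 1, 9999, True):            # V1: [1..9999]
        return Gate("V1_pii_leak", "REJECTED", True)
    if in_interval(confidence, 0, 0.3, False):            # V2: [0..0.3)
        return Gate("V2_low_confidence", "REJECTED", True)
    if in_interval(confidence, 0.3, 0.7, False):          # V3: [0.3..0.7)
        return Gate("V3_mid_confidence", "PENDING_REVIEW", True)
    if risk == "HIGH":                                     # V4: "HIGH"
        return Gate("V4_high_risk_review", "PENDING_REVIEW", True)
    if in_interval(confidence, 0.7, 1, True):              # V5: [0.7..1]
        return Gate("V5_auto_approve", "APPROVED_AUTO", False)
    return Gate("V6_default", "PENDING_REVIEW", True)     # V6: all "-"


if __name__ == "__main__":
    for args in [(0, 0.3, "NONE"), (0, 0.7, "NONE"), (0, 0.95, "HIGH"), (0, None, "LOW")]:
        print(args, validation_gate(*args))
```

Some boundary cases follow directly: confidence 0.3 lands in V3 (review), exactly 0.7 auto-approves unless risk is HIGH, and a HIGH-risk document with confidence 0.95 still goes to review because V4 ranks above V5. Missing or out-of-range confidence falls through to V6, which fails safe. One side effect of the closed upper bound in V1: a PII error count above 9999 would skip the rejection rule; a FEEL unary test such as `>= 1` would not have that gap.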
@@ -1,32 +1,46 @@
|
|||||||
# dev.uapf.semantic-document-analysis — Overview
|
# dev.uapf.semantic-document-analysis — Overview
|
||||||
|
|
||||||
**UAPF v1.1 SSOT-conformant** Level 4 process package providing
|
**UAPF v1.1 SSOT-conformant** Level 4 process package for semantic
|
||||||
reusable semantic document analysis.
|
document analysis.
|
||||||
|
|
||||||
## What
|
## What
|
||||||
|
|
||||||
A 3-step BPMN process that, given free-text document content:
|
A six-node BPMN process that, given free-text document content:
|
||||||
|
|
||||||
1. Redacts PII via `ai.redact@1`
|
1. **Detect and redact PII** (`ai.redact@1`) — masks PII and returns the
|
||||||
2. Extracts VDVC v1.1 structured semantic metadata via `ai.extract@1`
|
deterministic regex signal set (personas kods / IBAN / contact data /
|
||||||
3. Emits `document.semantic-analysis.completed.v1` CloudEvent via `event.emit@1`
|
category count).
|
||||||
|
2. **Assess personal-data risk** (DMN `assess-personal-data-risk`) —
|
||||||
|
ranked rules map the signal set to `personalDataRisk`.
|
||||||
|
3. **Decide GDPR processing route** (DMN `gdpr-processing-route`) —
|
||||||
|
`personalDataRisk` x `allowCentralization` -> CENTRAL/LOCAL,
|
||||||
|
anonymisation and redaction level.
|
||||||
|
4. **Extract semantic metadata** (`ai.extract@1`) — the one model step;
|
||||||
|
produces VDVC v1.1 structured metadata.
|
||||||
|
+5. **Determine validation status** (DMN `human-validation-gate`) —
+confidence thresholds + PII re-scan -> REJECTED / PENDING_REVIEW /
+APPROVED_AUTO.
+6. **Emit** `document.semantic-analysis.completed.v1` (`event.emit@1`).
+
+## Why this shape
+
+The previous 1.x package was a single `ai.extract` call wrapped in
+BPMN. The decision logic — risk, routing, validation gating — lived
+invisibly in host code. Version 2.0 extracts that logic into three
+versioned DMN decision tables. The algorithm is now in the package:
+inspectable, diff-able, portable. The host supplies inference for one
+bounded step only.
 
 ## What's portable
 
-The package ships:
-- The BPMN flow (the algorithm shape)
-- The VDVC output JSON Schema (the output contract)
-- The resource mapping (input/output contracts, timeouts, retries)
-- The guardrails policy (GDPR + EU AI Act constraints)
+- The BPMN flow (the process shape)
+- Three DMN decision tables (the algorithm and its weights)
+- The VDVC output JSON Schema (the extraction contract)
+- The resource mapping and the guardrails policy
 
-The host system supplies the actual AI agent that fulfils the three
-capabilities. Multiple hosts can implement the same capabilities;
-multiple packages can require the same capabilities.
-
 ## How to consume
 
-Drop this `.uapf` into any UAPF-conformant runtime. The runtime
-exposes `uapf.run_process` (per UAPF-specification §6.3.1) targeting
-`Process_SemanticDocumentAnalysis`. The runtime resolves the resource
-mapping to find a target with the three required capabilities and
-invokes them in order per the BPMN flow.
+Drop this `.uapf` into any UAPF-conformant runtime and run
+`Process_SemanticDocumentAnalysis`. The runtime evaluates the DMN
+decisions itself and resolves the resource mapping for the three
+capability-backed service tasks.
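The split described in the new README (DMN decisions evaluated by the runtime, the three service tasks fulfilled by the host) can be sketched in a few lines of Python. This is a minimal sketch, not the UAPF runtime's API: `Node`, `run_process`, the `Rule_*` node ids and the exact node order are assumptions; only the service-task ids (from the resource mapping), the capability names and the decision ids come from this commit.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Vars = Dict[str, Any]
Step = Callable[[Vars], Vars]


@dataclass(frozen=True)
class Node:
    node_id: str  # BPMN element id
    kind: str     # "serviceTask" or "businessRuleTask"
    ref: str      # capability (serviceTask) or DMN decision id (businessRuleTask)


# Assumed order of the six nodes; the Rule_* ids are hypothetical.
FLOW: List[Node] = [
    Node("Task_DetectRedactPii", "serviceTask", "ai.redact@1"),
    Node("Rule_AssessRisk", "businessRuleTask", "assess-personal-data-risk"),
    Node("Rule_GdprRoute", "businessRuleTask", "gdpr-processing-route"),
    Node("Task_ExtractSemantics", "serviceTask", "ai.extract@1"),
    Node("Rule_ValidationGate", "businessRuleTask", "human-validation-gate"),
    Node("Task_EmitResult", "serviceTask", "event.emit@1"),
]


def run_process(nodes: List[Node], decisions: Dict[str, Step],
                capabilities: Dict[str, Step], variables: Vars) -> Vars:
    """Run nodes in sequence, merging each node's outputs into the process variables."""
    for node in nodes:
        if node.kind == "businessRuleTask":
            # Evaluated by the runtime from the package's dmn/ cornerstone.
            outputs = decisions[node.ref](variables)
        elif node.kind == "serviceTask":
            # Resolved through the resource mapping to a host-provided agent.
            outputs = capabilities[node.ref](variables)
        else:
            raise ValueError(f"unsupported node kind: {node.kind}")
        variables = {**variables, **outputs}
    return variables
```

A host passes callables for the three capabilities only; the three decision callables come from the package, which is exactly the boundary 2.0.0 makes explicit.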
@@ -1,3 +1,3 @@
 {
-"text": "Vēršos Tiesībsarga birojā par bāriņtiesas lēmumu attiecībā uz manu mazdēlu (vecums 7 gadi). Lēmums pieņemts steidzamā kārtībā 2026. gada martā. Lūdzu izvērtēt rīcības atbilstību Bērnu tiesību aizsardzības likumam."
+"content": "Vēršos Tiesībsarga birojā par bāriņtiesas lēmumu attiecībā uz manu mazdēlu (vecums 7 gadi). Lēmums pieņemts steidzamā kārtībā 2026. gada martā. Lūdzu izvērtēt rīcības atbilstību Bērnu tiesību aizsardzības likumam."
 }
@@ -1,3 +1,3 @@
 {
-"text": "Iesniedzējs ar otrās grupas invaliditāti norāda, ka valsts iestādes darba intervijā atklāti pateikts: 'nevaram pieņemt cilvēkus ar īpašām vajadzībām'. Lūdz Tiesībsargu izmeklēt."
+"content": "Iesniedzējs ar otrās grupas invaliditāti norāda, ka valsts iestādes darba intervijā atklāti pateikts: 'nevaram pieņemt cilvēkus ar īpašām vajadzībām'. Lūdz Tiesībsargu izmeklēt."
 }
3
fixtures/pii-bearing-input.json
Normal file
@@ -0,0 +1,3 @@
+{
+"content": "Iesniedzējs (personas kods 010180-12345) lūdz pārskatīt lēmumu. Saziņai: janis.berzins@example.lv, tālrunis +371 29123456. Norēķini: LV80BANK0000435195001."
+}
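The new fixture carries one value of each PII kind that the `ai.redact` contract reports on. A minimal Python sketch of how a host might derive those signals; the regular expressions, the entity-type names (`PERSONAS_KODS`, `IBAN`, `EMAIL`, `PHONE`) and `detect_and_redact` are hypothetical, because the package fixes the output fields (`redactedContent`, `detectedEntityTypes`, `personasKodaPresent`, `financialDataPresent`, `contactDataPresent`, `piiCategoryCount`) but not the patterns behind them.

```python
import re
from typing import Any, Dict, List

# Hypothetical patterns: the package defines the output fields, not the host's regexes.
PII_PATTERNS = [
    ("PERSONAS_KODS", re.compile(r"\b\d{6}-\d{5}\b")),          # Latvian personal code, DDMMYY-NNNNN
    ("IBAN", re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")),  # generic IBAN shape, no checksum test
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+\d{1,3} ?\d{6,12}\b")),
]


def detect_and_redact(content: str) -> Dict[str, Any]:
    """Return the ai.redact output fields declared in the resource mapping."""
    detected: List[str] = []
    redacted = content
    for entity_type, pattern in PII_PATTERNS:
        # Patterns run in order on the progressively masked text.
        redacted, hits = pattern.subn(f"[{entity_type}]", redacted)
        if hits:
            detected.append(entity_type)  # keep the TYPE only, never the value
    return {
        "redactedContent": redacted,
        "detectedEntityTypes": detected,
        "personasKodaPresent": "PERSONAS_KODS" in detected,
        "financialDataPresent": "IBAN" in detected,
        "contactDataPresent": "EMAIL" in detected or "PHONE" in detected,
        "piiCategoryCount": len(detected),
    }
```

Counting e-mail and phone as separate categories is itself a choice; a host could just as well fold both into one contact category.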
@@ -1,15 +1,15 @@
 {
 "kind": "uapf.package",
 "id": "dev.uapf.semantic-document-analysis",
-"name": "Semantic Document Analysis (UAPF reference algorithm)",
-"description": "Level-4 UAPF process for extracting VDVC-conformant semantic metadata\n(topic, summary, urgency, risk, sensitivity) from a free-text document.\n\nPortable across document management systems, intake portals, mailroom\nscanners, case-management platforms. Three BPMN service tasks invoke\nthe reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1.\nThe host fulfils each capability with its own AI agent; this package\nsupplies the BPMN flow, the VDVC output JSON Schema, the guardrails,\nand the resource mapping contract.\n",
+"name": "Semantic Document Analysis",
+"description": "Level-4 UAPF process for semantic analysis of free-text documents.\n\nThree BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,\nai.extract@1 and event.emit@1. Three DMN decision tables encode the\ndeterministic algorithm the host previously hid inside application\ncode: assess-personal-data-risk maps PII regex signals to a risk\nlevel; gdpr-processing-route selects CENTRAL vs LOCAL processing,\nanonymisation and redaction level; human-validation-gate applies the\nconfidence thresholds that decide REJECTED / PENDING_REVIEW /\nAPPROVED_AUTO.\n\nOnly the semantic extraction is a model step. Risk classification,\nGDPR routing and the validation gate are explicit ranked rules in\nversioned DMN \u2014 inspectable, auditable, portable. Extraction output\nvalidates against the VDVC v1.1 semantic-summary JSON Schema.\n",
 "level": 4,
-"version": "1.0.0",
+"version": "2.0.0",
 "includes": [],
 "dependencies": {},
 "cornerstones": {
 "bpmn": true,
-"dmn": false,
+"dmn": true,
 "cmmn": false,
 "resources": true
 },
@@ -30,6 +30,7 @@
 "exposedArtifacts": [
 "manifest",
 "bpmn",
+"dmn",
 "docs"
 ]
 }
@@ -42,4 +43,4 @@
 }
 ],
 "lifecycle": "draft"
 }
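The manifest description says `assess-personal-data-risk` maps PII regex signals to a risk level through ranked rules. A sketch of such a table in Python, modelled with DMN's FIRST hit policy (rows tested top to bottom, first match wins). The hit policy, the rows and the `MEDIUM`/`LOW` levels are assumptions; only `HIGH` and `NONE` are confirmed by the eval set in this commit, and `evaluate_first_hit` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List, Tuple

Inputs = Dict[str, Any]
Rule = Tuple[Callable[[Inputs], bool], Dict[str, str]]

# Hypothetical rows: only HIGH and NONE appear in the package's eval set.
ASSESS_PERSONAL_DATA_RISK: List[Rule] = [
    (lambda i: i["personasKodaPresent"], {"personalDataRisk": "HIGH"}),
    (lambda i: i["financialDataPresent"], {"personalDataRisk": "HIGH"}),
    (lambda i: i["contactDataPresent"] and i["piiCategoryCount"] >= 2, {"personalDataRisk": "MEDIUM"}),
    (lambda i: i["contactDataPresent"], {"personalDataRisk": "LOW"}),
    (lambda i: True, {"personalDataRisk": "NONE"}),  # catch-all row
]


def evaluate_first_hit(rules: List[Rule], inputs: Inputs) -> Dict[str, str]:
    """DMN FIRST hit policy: test rows top to bottom and return the first match."""
    for matches, outputs in rules:
        if matches(inputs):
            return dict(outputs)
    raise LookupError("no rule matched the inputs")
```

Row order carries the weighting: moving the contact rows above the national-ID row would silently downgrade documents that contain both.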
@@ -1,9 +1,13 @@
 kind: uapf.metadata.lifecycle
 status: draft
 created: "2026-05-15T20:30:00Z"
-lastModified: "2026-05-15T20:30:00Z"
+lastModified: "2026-05-17T00:00:00Z"
 changeHistory:
 - version: "1.0.0"
 date: "2026-05-15"
-summary: "Initial release. Reusable UAPF v1.1 Level 4 process for VDVC semantic metadata extraction. Three reserved-namespace capabilities (ai.redact@1, ai.extract@1, event.emit@1). VDVC output schema in resources/schemas/."
+summary: "Initial release. Three reserved-namespace capabilities; VDVC output schema."
+author: "uapf-stewards"
+- version: "2.0.0"
+date: "2026-05-17"
+summary: "Full rewrite into a real process. Added the dmn cornerstone with three decision tables (assess-personal-data-risk, gdpr-processing-route, human-validation-gate) carrying the risk, routing and validation-gating algorithm previously hidden in host code. BPMN expanded from 3 to 6 nodes. ai.redact now returns deterministic PII regex signals; ai.extract returns flat aiConfidenceScore and outputPiiErrorCount. Breaking: package structure and outputs changed."
 author: "uapf-stewards"
@@ -3,16 +3,24 @@
 version: 1
 
 server:
-name: "Semantic Document Analysis - UAPF reference algorithm"
-description: "MCP server for the semantic document analysis process (UAPF package dev.uapf.semantic-document-analysis)."
+name: "Semantic Document Analysis"
+description: "MCP server for the semantic document analysis UAPF package (dev.uapf.semantic-document-analysis) - BPMN flow plus three DMN decision tables."
 instructions: |
-This repository is a UAPF process package - the reference algorithm for
-extracting VDVC-conformant semantic metadata from a document. Use
-'search' and 'get_entity' to explore the BPMN flow, and 'validate' to
-check the model. The process executes via UAPF-IP; see the /uapf-ip
-endpoint of this repo.
+This repository is a UAPF process package for semantic document
+analysis. The algorithm lives in dmn/ as three decision tables; the
+BPMN in bpmn/ wires them with three capability-backed service tasks.
+Use 'search' and 'get_entity' to explore, 'validate' to check models.
 
 sources:
 - path: "bpmn/semantic-document-analysis.bpmn"
 type: "xml"
-description: "BPMN - redact, extract, emit flow"
+description: "BPMN - the six-node process flow"
+- path: "dmn/assess-personal-data-risk.dmn"
+type: "xml"
+description: "DMN - PII signals to personal-data risk"
+- path: "dmn/gdpr-processing-route.dmn"
+type: "xml"
+description: "DMN - risk to CENTRAL/LOCAL processing route"
+- path: "dmn/human-validation-gate.dmn"
+type: "xml"
+description: "DMN - confidence thresholds to validation status"
@@ -1,49 +1,60 @@
 kind: uapf.resources.mapping
 
+# Host-readable contract for the capability-backed service tasks. The three
+# DMN decisions (assess-personal-data-risk, gdpr-processing-route,
+# human-validation-gate) are NOT listed here: they are evaluated by the
+# UAPF runtime against the dmn/ cornerstone and need no host resource.
+
 targets:
 - id: agent.semantic-extractor
 type: ai_agent
 name: Semantic Extraction AI Agent
 description: |
-Host-provided AI agent that fulfils ai.redact@1, ai.extract@1, and
-event.emit@1 for this process. Implementation is the host's choice
-(Claude, GPT, on-prem LLM, etc.); this package supplies the BPMN
-flow, the output schema, and the guardrails.
+Host-provided agent fulfilling ai.redact@1, ai.extract@1 and
+event.emit@1. Implementation is the host's choice; this package
+supplies the BPMN flow, the DMN decision logic, the output schema
+and the guardrails.
 capabilities:
 - capability.ai.redact
 - capability.ai.extract
 - capability.event.emit
 
 bindings:
-- source: { type: bpmn.serviceTask, ref: Task_RedactPii }
+- source: { type: bpmn.serviceTask, ref: Task_DetectRedactPii }
 targetId: agent.semantic-extractor
 mode: autonomous
 contract:
 input:
-- { name: text, type: string, required: true }
-- { name: categories, type: array, required: false, description: "Optional PII categories; defaults to host policy." }
+- { name: content, type: string, required: true }
 output:
-- { name: redactedText, type: string }
-- { name: detections, type: array }
+- { name: redactedContent, type: string, description: "Source text with PII masked." }
+- { name: detectedEntityTypes, type: array, description: "PII TYPE names only, never values." }
+- { name: personasKodaPresent, type: boolean, description: "Latvian national ID regex hit." }
+- { name: financialDataPresent,type: boolean, description: "IBAN regex hit." }
+- { name: contactDataPresent, type: boolean, description: "E-mail or phone regex hit." }
+- { name: piiCategoryCount, type: number, description: "Count of distinct PII categories detected." }
 timeout: "10s"
 requiredCapabilities: [capability.ai.redact]
+feeds: [assess-personal-data-risk]
 
 - source: { type: bpmn.serviceTask, ref: Task_ExtractSemantics }
 targetId: agent.semantic-extractor
 mode: autonomous
 contract:
 input:
-- { name: text, type: string, required: true, description: "Redacted text from previous task." }
-- { name: schema, type: object, required: true, description: "VDVC v1.1 output schema. Reference: resources/schemas/vdvc-semantic-summary.schema.json" }
+- { name: redactedContent, type: string, required: true }
+- { name: schemaRef, type: string, required: true, description: "resources/schemas/vdvc-semantic-summary.schema.json" }
 output:
-- { name: extracted, type: object, description: "Validates against resources/schemas/vdvc-semantic-summary.schema.json" }
-- { name: confidence, type: number }
-- { name: modelUsed, type: string }
+- { name: semanticSummary, type: object, description: "Validates against the VDVC v1.1 schema." }
+- { name: sensitivityControl, type: object }
+- { name: aiConfidenceScore, type: number, description: "Flat 0.0-1.0; consumed by human-validation-gate." }
+- { name: outputPiiErrorCount, type: number, description: "PII re-scan hits on extracted text; consumed by human-validation-gate." }
 timeout: "30s"
 retries: { maxAttempts: 2, backoffMs: 2000 }
 requiredCapabilities: [capability.ai.extract]
+feeds: [human-validation-gate]
 
-- source: { type: bpmn.serviceTask, ref: Task_EmitResultEvent }
+- source: { type: bpmn.serviceTask, ref: Task_EmitResult }
 targetId: agent.semantic-extractor
 mode: autonomous
 contract:
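`human-validation-gate` consumes the two new flat `ai.extract` outputs, `aiConfidenceScore` and `outputPiiErrorCount`. A sketch of the gate as an ordered rule chain (equivalent to a first-hit table); the thresholds 0.50 and 0.85, the decision to reject on any PII re-scan hit, and the function name are hypothetical, since the DMN file itself is not part of this view.

```python
# Hypothetical thresholds; the real values live in dmn/human-validation-gate.dmn.
REJECT_BELOW = 0.50
AUTO_APPROVE_FROM = 0.85


def human_validation_gate(ai_confidence_score: float, output_pii_error_count: int) -> str:
    """Map the two ai.extract outputs to a validation status; the first matching rule wins."""
    if not 0.0 <= ai_confidence_score <= 1.0:
        raise ValueError("aiConfidenceScore must lie in [0.0, 1.0]")
    if output_pii_error_count > 0:
        return "REJECTED"        # PII re-scan found values in the extracted text
    if ai_confidence_score < REJECT_BELOW:
        return "REJECTED"
    if ai_confidence_score < AUTO_APPROVE_FROM:
        return "PENDING_REVIEW"
    return "APPROVED_AUTO"
```

Checking the PII re-scan before confidence means a leak can never be auto-approved, however confident the model is.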
@@ -1,24 +1,41 @@
 {
 "algorithm": "Process_SemanticDocumentAnalysis",
-"package_version": "1.0.0",
+"package_version": "2.0.0",
 "cases": [
 {
-"id": "child-rights",
-"input_fixture": "fixtures/child-rights-input.json",
-"expected_facets": {
-"mentions_child": true,
-"vulnerable_group": true,
-"humanValidationStatus": "PENDING"
+"id": "pii-bearing-high-risk",
+"input_fixture": "fixtures/pii-bearing-input.json",
+"input_vars": {
+"allowCentralization": false
+},
+"expect_decisions": {
+"assess-personal-data-risk": {
+"personalDataRisk": "HIGH"
+},
+"gdpr-processing-route": {
+"processingRoute": "LOCAL",
+"redactionLevel": "FULL"
+}
 }
 },
 {
-"id": "discrimination-disability",
+"id": "no-pii-discrimination",
 "input_fixture": "fixtures/discrimination-input.json",
-"expected_facets": {
-"vulnerable_group": true,
-"humanValidationStatus": "PENDING"
+"input_vars": {
+"allowCentralization": true
+},
+"expect_decisions": {
+"assess-personal-data-risk": {
+"personalDataRisk": "NONE"
+},
+"gdpr-processing-route": {
+"processingRoute": "CENTRAL"
+}
 }
 }
 ],
-"success_criteria": { "min_pass_rate": 0.95, "max_p95_latency_ms": 15000 }
+"success_criteria": {
+"min_pass_rate": 0.95,
+"max_p95_latency_ms": 20000
+}
 }
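The rewritten eval set checks decision outputs rather than extraction facets. A sketch of a harness for its `gdpr-processing-route` expectations, assuming that output keys absent from a case's `expect_decisions` are simply not checked (the second case lists no `redactionLevel`). The route rows beyond the two eval cases, the `PARTIAL`/`NONE` redaction levels, and the names `gdpr_processing_route` and `case_passes` are hypothetical.

```python
from typing import Any, Dict, List


def gdpr_processing_route(personal_data_risk: str, allow_centralization: bool) -> Dict[str, str]:
    """Hypothetical ranked rows; the first matching rule wins."""
    if personal_data_risk == "HIGH":
        return {"processingRoute": "LOCAL", "redactionLevel": "FULL"}
    if not allow_centralization:
        return {"processingRoute": "LOCAL", "redactionLevel": "PARTIAL"}
    if personal_data_risk == "NONE":
        return {"processingRoute": "CENTRAL", "redactionLevel": "NONE"}
    return {"processingRoute": "CENTRAL", "redactionLevel": "PARTIAL"}


def case_passes(case: Dict[str, Any]) -> bool:
    """Compare only the output keys the case lists under expect_decisions."""
    expected = case["expect_decisions"]
    # Shortcut: feed the expected risk forward instead of re-running assess-personal-data-risk.
    risk = expected["assess-personal-data-risk"]["personalDataRisk"]
    actual = gdpr_processing_route(risk, case["input_vars"]["allowCentralization"])
    return all(actual.get(k) == v for k, v in expected["gdpr-processing-route"].items())


# The two 2.0.0 cases, trimmed to the fields used here.
CASES: List[Dict[str, Any]] = [
    {"id": "pii-bearing-high-risk",
     "input_vars": {"allowCentralization": False},
     "expect_decisions": {"assess-personal-data-risk": {"personalDataRisk": "HIGH"},
                          "gdpr-processing-route": {"processingRoute": "LOCAL", "redactionLevel": "FULL"}}},
    {"id": "no-pii-discrimination",
     "input_vars": {"allowCentralization": True},
     "expect_decisions": {"assess-personal-data-risk": {"personalDataRisk": "NONE"},
                          "gdpr-processing-route": {"processingRoute": "CENTRAL"}}},
]

pass_rate = sum(case_passes(c) for c in CASES) / len(CASES)
```

A full runner would also evaluate `assess-personal-data-risk` on each fixture and time the run against `max_p95_latency_ms`; this sketch covers only the pass-rate half of `success_criteria`.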
37
uapf.yaml
@@ -1,28 +1,38 @@
 kind: uapf.package
 id: dev.uapf.semantic-document-analysis
-name: Semantic Document Analysis (UAPF reference algorithm)
+name: Semantic Document Analysis
 description: |
-Level-4 UAPF process for extracting VDVC-conformant semantic metadata
-(topic, summary, urgency, risk, sensitivity) from a free-text document.
+Level-4 UAPF process for semantic analysis of free-text documents.
 
-Portable across document management systems, intake portals, mailroom
-scanners, case-management platforms. Three BPMN service tasks invoke
-the reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1.
-The host fulfils each capability with its own AI agent; this package
-supplies the BPMN flow, the VDVC output JSON Schema, the guardrails,
-and the resource mapping contract.
+Three BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,
+ai.extract@1 and event.emit@1. Three DMN decision tables encode the
+deterministic algorithm the host previously hid inside application
+code: assess-personal-data-risk maps PII regex signals to a risk
+level; gdpr-processing-route selects CENTRAL vs LOCAL processing,
+anonymisation and redaction level; human-validation-gate applies the
+confidence thresholds that decide REJECTED / PENDING_REVIEW /
+APPROVED_AUTO.
+
+Only the semantic extraction is a model step. Risk classification,
+GDPR routing and the validation gate are explicit ranked rules in
+versioned DMN — inspectable, auditable, portable. Extraction output
+validates against the VDVC v1.1 semantic-summary JSON Schema.
 
 level: 4
-version: "1.0.0"
+version: "2.0.0"
 
 # ── UAPF-IP integration (capability needs + profile + guardrails) ──
-# Declared so a UAPF-IP runtime / the ProcessGit /uapf-ip endpoint can
-# discover what this package requires before loading it.
 requires_capabilities:
 - ai.redact@1+
 - ai.extract@1+
 - event.emit@1+
 
+# DMN decisions are evaluated by the runtime itself — no host capability.
+provides_decisions:
+- assess-personal-data-risk
+- gdpr-processing-route
+- human-validation-gate
+
 profiles_supported:
 - uapf-ip-orchestrated
 
@@ -33,7 +43,7 @@ dependencies: {}
 
 cornerstones:
 bpmn: true
-dmn: false
+dmn: true
 cmmn: false
 resources: true
 
@@ -53,6 +63,7 @@ exposure:
 exposedArtifacts:
 - manifest
 - bpmn
+- dmn
 - docs
 
 owners:
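`requires_capabilities` lists entries such as `ai.redact@1+`, while the new `provides_decisions` block needs no host capability at all. A sketch of the pre-load check a runtime might run against a host's capability table, assuming `@1+` means major version 1 or newer and a bare `@1` means exactly major 1; that grammar, `unmet_requirements` and the `{name: major}` host table are assumptions, not something this commit specifies.

```python
import re
from typing import Dict, List

# Assumed grammar: "name@N+" = major N or newer, "name@N" = exactly major N.
REQUIREMENT = re.compile(r"^(?P<name>[a-z][a-z0-9.]*)@(?P<major>\d+)(?P<plus>\+)?$")


def unmet_requirements(required: List[str], host: Dict[str, int]) -> List[str]:
    """Return the requirements that the host's capability majors do not satisfy."""
    unmet: List[str] = []
    for req in required:
        m = REQUIREMENT.match(req)
        if m is None:
            raise ValueError(f"malformed capability requirement: {req!r}")
        have = host.get(m["name"])
        want = int(m["major"])
        satisfied = have is not None and (have >= want if m["plus"] else have == want)
        if not satisfied:
            unmet.append(req)
    return unmet


REQUIRES = ["ai.redact@1+", "ai.extract@1+", "event.emit@1+"]
```

Returning the unmet entries, rather than a bare boolean, lets a runtime report exactly which capability a host is missing before loading the package.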