# VDVC Document Classification Schema — Assessment & Transformation Proposal
**Subject:** VARAM DVS "Namejs" Document Classification Schema 2026
**VDVC namespace:** `urn:vdvc:classification:2026`
**Regulatory basis:** MK noteikumi Nr. 282 (07.05.2024) "Dokumentu un arhīvu pārvaldības noteikumi"
**Prepared by:** Rihards / PwC Latvia — Digitalization, AI & Cybersecurity
**Date:** February 2026
---
## 1. Executive Summary
VARAM's Document Management System (DVS "Namejs") relies on a classification schema ("klasifikācijas shēma", formerly "lietu nomenklatūra") maintained as a human-edited Excel spreadsheet with **647 coded entries** across 3 domains and up to 5 hierarchy levels.
This assessment identifies **three layers of problems**: data quality issues in the spreadsheet itself (fixable mechanically), structural design issues in the schema (fixable with refactoring), and a **fundamental architectural problem** — the classification philosophy conflates normative document origins with functional classification, producing an unmanageably large, duplicate-heavy taxonomy that is hostile to both human clerks and DVS systems.
The proposed solution is a VDVC-namespaced, Git-versioned XML repository on ProcessGit with a **custom web-based management GUI** (not Excel) backed by XSD-validated XML, served via MCP endpoint for AI-assisted document classification.
---
## 2. Regulatory Framework
### 2.1 MK noteikumi Nr. 282 (07.05.2024)
The governing regulation prescribes a **function-based hierarchical classification** (§33):
| Level | MK Nr. 282 Definition | What It Should Contain |
|-------|----------------------|----------------------|
| **L1** | Institūcijas **funkcija** vai augstākā struktūrvienība | Broad organizational function (e.g., "Management", "HR", "Procurement") |
| **L2** | Funkcijas izpildes nepieciešamie **uzdevumi (procesi)** | Processes within the function (e.g., "Recruitment", "Payroll") |
| **L3** | Uzdevumu veikšanai nepieciešamās **darbības** | Specific activities/document types (e.g., "Employment contracts", "Timesheets") |
Key regulatory requirements:
- Schema must be synchronized with Latvijas Nacionālais arhīvs (LNA) every **5 years** (§42)
- Sector-level schemas ("nozares klasifikācijas shēma") every **8 years** (§42)
- Must specify: index, name, retention term, responsible unit, media type, IS location (§31)
- Classification basis: functions, structural units, document types, or **mixed** (§33)
### 2.2 VDVC Context
The schema should use the **VDVC** (Valsts Dokumentu Vadības Centrs) namespace since VDVC is the country-wide document management authority under VARAM management, and this classification approach could be standardized across government institutions — not just VARAM internally.
---
## 3. Critical Assessment: Why Is the Classification So Complicated?
### 3.1 The Core Problem — Normative Document Proliferation
VARAM's explanation is that every new document category originates from a normative act (law, MK regulation, EU directive) that delegates a process to the organization. When a new regulation is adopted, a new classification entry is created. **This approach is fundamentally flawed** for the following reasons:
#### Problem A: Confusing "What Triggered the Document" with "What Kind of Document It Is"
MK Nr. 282 §33 defines classification by **function and process**, not by **legal basis**. The legal basis for a document is metadata (a property of the document), not a structural category. When VARAM creates a separate category for "Sarakste ar valsts pārvaldes iestādēm DIENESTA VAJADZĪBĀM" (P-1-13-5) vs "Sarakste ar valsts pārvaldes iestādēm, juridiskām un fiziskām personām" (P-1-13-2) vs "Sarakste ar valsts pārvaldes iestādēm jautājumiem, kas saistīti ar valsts noslēpumu" (P-1-13-9), **these are all the same function** (correspondence) with different metadata attributes (audience, classification level).
A proper design would have:
```
P-1-13 Sarakste (Correspondence)
    → metadata: audience = [government | private | foreign | internal | classified]
    → metadata: securityLevel = [public | restricted | secret]
```
This single category replaces the 9 current sub-categories of correspondence, all of which contain identical document types.
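As a minimal sketch of this remodelling, the snippet below maps the three legacy correspondence codes quoted above onto the single `P-1-13` category plus metadata. The attribute values assigned to each legacy code are illustrative assumptions rather than an agreed mapping, and `Classification`, `LEGACY_TO_METADATA`, and `reclassify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    code: str            # functional category code
    audience: str        # government | private | foreign | internal | classified
    security_level: str  # public | restricted | secret


# Assumed mapping, for illustration only: the real attribute values
# would have to be agreed with VARAM records staff.
LEGACY_TO_METADATA = {
    "P-1-13-2": Classification("P-1-13", "government", "public"),
    "P-1-13-5": Classification("P-1-13", "government", "restricted"),
    "P-1-13-9": Classification("P-1-13", "classified", "secret"),
}


def reclassify(legacy_code: str) -> Classification:
    """Return the functional category and metadata for a legacy code."""
    try:
        return LEGACY_TO_METADATA[legacy_code]
    except KeyError:
        raise ValueError(f"no mapping defined for {legacy_code!r}") from None


if __name__ == "__main__":
    for legacy in LEGACY_TO_METADATA:
        print(legacy, "->", reclassify(legacy))
```

All three legacy codes resolve to the same functional code; only the metadata differs, which is the point of the redesign.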
#### Problem B: EU Investment Project Explosion
The most egregious example is **I2 (Investīciju projektu ieviešana)** with **33 top-level entries** — each representing a specific EU-funded project:
```
I2-1 Projekta "Informācijas sistēmu ... Nr. 2.2.1.1/17/I/012" dokumenti
I2-2 Projekta "Atvērto datu ... Nr. 2.2.1.1/19/I/004" dokumenti
...
I2-33 Projekta "Valsts pārvaldes vienota valsts finanšu..." dokumenti
```
Each of these 33 projects then has **identical sub-structure**: correspondence, contracts, orders, communications materials. This is a textbook example of **data masquerading as structure**. The project identity is a data attribute, not a classification level.
A proper design:
```
I2-1 Investīciju projektu ieviešana (Project Implementation)
    I2-1-1 Korespondence (Correspondence)
    I2-1-2 Līgumi (Contracts)
    I2-1-3 Rīkojumi, protokoli (Orders, protocols)
    I2-1-4 Komunikācijas materiāli (Communications)
    → metadata: projectId = "2.2.1.1/17/I/012"
    → metadata: projectName = "Informācijas sistēmu..."
    → metadata: fundingSource = "ERAF" | "ANM" | ...
```
This would reduce I2 from **~166 entries to ~10**, while preserving all information through metadata.
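The collapse from per-project sub-trees to shared functional categories can be sketched as a simple grouping step. This is an illustration under stated assumptions: the two project numbers come from the schema excerpts above, but the legacy sub-codes in `LEGACY_ROWS` and the function name `collapse_projects` are invented for the example.

```python
from collections import defaultdict

# Hypothetical legacy rows: (legacy code, project number, sub-category name).
# The sub-codes are made up; only the project numbers appear in the schema.
LEGACY_ROWS = [
    ("I2-1-1", "2.2.1.1/17/I/012", "Korespondence"),
    ("I2-1-2", "2.2.1.1/17/I/012", "Līgumi"),
    ("I2-2-1", "2.2.1.1/19/I/004", "Korespondence"),
    ("I2-2-2", "2.2.1.1/19/I/004", "Līgumi"),
]


def collapse_projects(rows):
    """Group rows by function name; project numbers become metadata."""
    projects_by_function = defaultdict(list)
    for _legacy_code, project_id, function_name in rows:
        if project_id not in projects_by_function[function_name]:
            projects_by_function[function_name].append(project_id)
    # New codes are assigned sequentially in first-seen order.
    return [
        {"code": f"I2-1-{index}", "name": name, "projectRefs": refs}
        for index, (name, refs) in enumerate(projects_by_function.items(), start=1)
    ]


if __name__ == "__main__":
    for category in collapse_projects(LEGACY_ROWS):
        print(category)
```

Four project-specific rows become two functional categories, each tagged with both projects; adding a further project adds a tag, not a branch.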
#### Problem C: I1 (Investīciju programmu vadība) Duplicates
Similarly, I1 has **16+ programme-level groups** (I1-1 through I1-16), each with largely identical sub-structures for different EU operational programmes. The programmes differ in retention dates and responsible departments, but these are metadata, not structure.
- Current: **327 entries** in I1
- Proposed: **~40-50 entries** (function-based) + programme as metadata
#### Problem D: Category Count vs. Clerk Cognitive Capacity
With **~400 leaf categories** (plus ~250 structural grouping rows), a clerk filing a new document faces an unrealistic cognitive load. Work on categorization and short-term memory (Rosch, 1978; Miller, 1956) is commonly read as suggesting that people choose reliably among only about 7±2 options at a time, which makes it a useful rule of thumb for the number of categories per level. VARAM's schema has:
- 3 domain categories (P, I1, I2): **good**
- 9-33 L1 categories per domain: **borderline to unmanageable**
- Up to 13+ L2 per L1: **too many, especially without descriptions**
The result is predictable: clerks default to a handful of "safe" categories, misclassify documents, or spend excessive time navigating the hierarchy — defeating the purpose of classification.
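A fan-out check of this kind is easy to automate. The sketch below counts direct children per parent code and flags parents above a limit; it assumes codes are already hyphen-separated, takes 9 (the upper end of 7±2) as the default limit, and uses the hypothetical names `fan_out` and `overloaded`.

```python
from collections import Counter


def fan_out(codes):
    """Count direct children per parent (the code minus its last segment)."""
    children = Counter()
    for code in codes:
        parent, separator, _last = code.rpartition("-")
        if separator:  # codes without a hyphen are domain roots
            children[parent] += 1
    return children


def overloaded(codes, limit=9):
    """Return the parents whose number of direct children exceeds the limit."""
    return {parent: n for parent, n in fan_out(codes).items() if n > limit}


if __name__ == "__main__":
    # 33 project entries under I2, plus a small slice of P for contrast.
    codes = [f"I2-{i}" for i in range(1, 34)] + ["P-1", "P-1-1", "P-1-2"]
    print(overloaded(codes))  # {'I2': 33}
```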
### 3.2 Structural Assessment Summary
| Metric | Current | Proposed (after normalization) |
|--------|---------|-------------------------------|
| Total entries | 647 | ~120-150 |
| Leaf categories (clerk-facing) | ~496 | ~80-100 |
| I2 entries | 166 | ~10-15 |
| I1 entries | 327 | ~40-50 |
| Max L1 categories per domain | 33 | ≤10 |
| Project-specific categories | ~200+ | 0 (metadata) |
| Duplicate structural patterns | ~30 identical sub-trees | 0 |
### 3.3 What Should Be Structure vs. What Should Be Metadata
| Currently a Category Level | Should Be | Reason |
|---------------------------|-----------|--------|
| Specific EU project name | **Metadata tag** | Project is an instance, not a function |
| Specific EU programme | **Metadata tag** | Programme is a funding context |
| Audience of correspondence | **Metadata enum** | Audience doesn't change the document type |
| Security classification | **Metadata field** | Orthogonal to document function |
| EU Commission flag on retention | **Metadata boolean** | Compliance attribute, not structure |
| Department assignment | **Metadata reference** | Departments change; functions don't |
---
## 4. Data Quality Issues (Spreadsheet-Level)
### 4.1 Issues Summary
| # | Issue | Severity | Scope |
|---|-------|----------|-------|
| 1 | Mixed code separators (hyphens and dots) | CRITICAL | 221/647 codes (34%) |
| 2 | NBSP (`\xa0`) disguised as empty retention | HIGH | 93 rows |
| 3 | 50+ retention term format variants | HIGH | 496 retention values |
| 4 | Zero descriptions in Description column | HIGH | 100% of rows |
| 5 | Multi-department free-text assignments | MEDIUM | 67 rows |
| 6 | Level data in wrong columns | MEDIUM | 64 rows |
| 7 | Typo: `Il-9-2` instead of `I1-9-2` | LOW | 1 row |
| 8 | Trailing dot in code `I1-13-1.1.` | LOW | 1 row |
| 9 | Typo: "Patstāvīgi" instead of "Pastāvīgi" | LOW | 4 rows |
*(Full technical detail in previous assessment — omitted for brevity)*
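Several of these issues (1, 2, 7 and 8) are mechanical and can be handled by a small normalizer. The following is a sketch, not the project's import tool: it assumes the hyphen is the target separator, and `normalize_code` and `is_blank_retention` are hypothetical names.

```python
import re


def normalize_code(raw: str) -> str:
    """Normalize a classification code to hyphen-separated form."""
    code = raw.replace("\xa0", " ").strip()    # NBSP to space, then trim
    code = re.sub(r"^Il(?=[-.])", "I1", code)  # issue 7: letter l typed for digit 1
    code = code.replace(".", "-")              # issue 1: unify separators
    code = re.sub(r"-{2,}", "-", code)         # collapse doubled separators
    return code.strip("-")                     # issue 8: drop a trailing separator


def is_blank_retention(value) -> bool:
    """Issue 2: treat missing, whitespace-only or NBSP-only cells as empty."""
    return value is None or value.replace("\xa0", " ").strip() == ""


if __name__ == "__main__":
    for sample in ["Il-9-2", "I1-13-1.1.", "P-1.13.5"]:
        print(repr(sample), "->", normalize_code(sample))
```

Running the normalizer before any structural work means the old-to-new code mapping is built on one canonical spelling per code.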
### 4.2 Retention Term Normalization
50+ free-text variants need consolidation to 5 structured types:
| Type | Example Input | Structured Output |
|------|--------------|-------------------|
| Permanent | "Pastāvīgi", "Patstāvīgi" | `<permanent>true</permanent>` |
| Duration | "5 gadi", "75 gadi" | `<duration years="5"/>` |
| Duration + trigger | "5 gadi pēc projekta noslēguma..." | `<duration years="5" triggerEvent="project_closure"/>` |
| Fixed date | "31.12.2034.", "2031-12-31 00:00:00" | `<fixedDate>2034-12-31</fixedDate>` |
| EU flagged | "31.12.2032. EK" | `<retention euCommission="true"><fixedDate>2032-12-31</fixedDate></retention>` |
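The consolidation in the table can be sketched as a small parser. This is a hedged illustration covering only the example inputs shown: the record layout, the `unparsed` fallback, and the names `parse_retention` and `PERMANENT_SPELLINGS` are assumptions, and the remaining variants in the spreadsheet would need their own rules.

```python
import re
from datetime import date

PERMANENT_SPELLINGS = {"pastāvīgi", "patstāvīgi"}  # the second is the known typo


def _result(kind, original, eu_flag, **fields):
    # Keep the untouched source text so nothing is lost in conversion.
    return {"type": kind, "euCommission": eu_flag, "originalText": original, **fields}


def parse_retention(raw: str) -> dict:
    """Convert one free-text retention term into a structured record."""
    text = raw.replace("\xa0", " ").strip()
    eu_flag = re.search(r"\bEK\b", text) is not None  # "EK" marks the EU Commission flag
    text = re.sub(r"\bEK\b", "", text).strip()

    if text.lower() in PERMANENT_SPELLINGS:
        return _result("permanent", raw, eu_flag)

    match = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})\.?", text)  # 31.12.2034.
    if match:
        day, month, year = (int(part) for part in match.groups())
        return _result("fixedDate", raw, eu_flag, date=date(year, month, day))

    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2})(?:\s+\d{2}:\d{2}:\d{2})?", text)
    if match:  # date serialized by Excel as "2031-12-31 00:00:00"
        return _result("fixedDate", raw, eu_flag, date=date.fromisoformat(match.group(1)))

    match = re.match(r"(\d+)\s+gad\w*\s*(.*)", text, flags=re.DOTALL)  # "5 gadi ..."
    if match:
        rest = match.group(2).strip()
        if "projekta noslēgum" in rest.lower():
            trigger = "project_closure"
        else:
            trigger = rest or None  # unmapped trigger text is kept verbatim
        return _result("duration", raw, eu_flag,
                       years=int(match.group(1)), triggerEvent=trigger)

    return _result("unparsed", raw, eu_flag)  # left for manual review
```

Keeping `originalText` on every record, including unparsed ones, lets reviewers compare each structured value against the spreadsheet cell it came from.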
---
## 5. Proposed Architecture
### 5.1 Design Principles
1. **VDVC namespace** (`urn:vdvc:classification:2026`): reusable across government
2. **Function-first classification** — per MK Nr. 282 §33, classify by what the organization does, not by what regulation triggered the document
3. **Metadata-rich, structure-lean** — project, programme, audience as tags, not tree levels
4. **No Excel** — custom web GUI that edits backend XML directly; prevents spreadsheet drift
5. **Git-versioned SSOT** — XML on ProcessGit with full audit trail
6. **MCP-served** — machine-readable API for DVS integration and AI-assisted classification
### 5.2 XSD Schema (VDVC Domain)
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:vdvc:classification:2026"
           xmlns:vdvc="urn:vdvc:classification:2026">
  <!-- Note: MetadataType, DomainListType, DeptListType, ProgrammeListType,
       ProjectListType and RetTermListType are referenced below but are
       not defined in this listing. -->
  <xs:element name="classificationSchema" type="vdvc:SchemaType"/>
  <xs:complexType name="SchemaType">
    <xs:sequence>
      <xs:element name="metadata" type="vdvc:MetadataType"/>
      <xs:element name="vocabularies" type="vdvc:VocabulariesType"/>
      <xs:element name="domains" type="vdvc:DomainListType"/>
    </xs:sequence>
    <xs:attribute name="version" type="xs:string" use="required"/>
    <xs:attribute name="effectiveDate" type="xs:date" use="required"/>
    <xs:attribute name="institution" type="xs:string" use="required"/>
  </xs:complexType>
  <!-- Controlled vocabularies (departments, programmes, projects) -->
  <xs:complexType name="VocabulariesType">
    <xs:sequence>
      <xs:element name="departments" type="vdvc:DeptListType"/>
      <xs:element name="programmes" type="vdvc:ProgrammeListType" minOccurs="0"/>
      <xs:element name="projects" type="vdvc:ProjectListType" minOccurs="0"/>
      <xs:element name="retentionTerms" type="vdvc:RetTermListType"/>
    </xs:sequence>
  </xs:complexType>
  <!-- Category node (recursive, function-based) -->
  <xs:complexType name="CategoryType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="description" type="xs:string" minOccurs="0"/>
      <xs:element name="legalBasis" type="xs:string" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element name="retention" type="vdvc:RetentionType" minOccurs="0"/>
      <xs:element name="responsibleUnits" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="unitRef" type="xs:string" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="mediaType" type="vdvc:MediaTypeEnum" minOccurs="0"/>
      <xs:element name="system" type="xs:string" minOccurs="0"/>
      <xs:element name="applicableContexts" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="programmeRef" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="projectRef" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="subcategory" type="vdvc:CategoryType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string" use="required"/>
    <xs:attribute name="level" type="xs:positiveInteger" use="required"/>
    <xs:attribute name="status" type="vdvc:StatusEnum" default="active"/>
  </xs:complexType>
  <!-- Retention: structured, not free-text -->
  <xs:complexType name="RetentionType">
    <xs:choice>
      <xs:element name="permanent" type="xs:boolean"/>
      <xs:element name="duration">
        <xs:complexType>
          <xs:attribute name="years" type="xs:positiveInteger" use="required"/>
          <xs:attribute name="triggerEvent" type="xs:string"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="fixedDate" type="xs:date"/>
    </xs:choice>
    <xs:attribute name="euCommission" type="xs:boolean" default="false"/>
    <xs:attribute name="legalReference" type="xs:string"/>
    <xs:attribute name="originalText" type="xs:string"/>
  </xs:complexType>
  <!-- Enumerations -->
  <xs:simpleType name="MediaTypeEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="electronic"/>
      <xs:enumeration value="paper"/>
      <xs:enumeration value="hybrid"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="StatusEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="deprecated"/>
      <xs:enumeration value="draft"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```
**Key design decisions:**
- `legalBasis` is a **metadata field on categories**, not a structural level: it records which normative acts require the category without creating separate tree branches
- `applicableContexts` with `programmeRef` / `projectRef` replaces the 33 duplicate I2 sub-trees: a single "Korespondence" category can be tagged with all applicable projects
- `status` enables deprecation without deletion (audit trail)
- `RetentionType` with `legalReference` links retention to its legal source
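To make the decisions above concrete, the sketch below builds one category element with Python's standard `xml.etree.ElementTree`, emitting children in the order `CategoryType` requires. The element name `category` is an assumption (the listing does not show how categories are named inside `domains`), the helper `build_category` is hypothetical, and namespace handling is left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_category(code, level, name, legal_basis=(), project_refs=()):
    """Build a category element; child order follows the CategoryType sequence."""
    category = ET.Element(
        "category",  # assumed element name, not confirmed by the XSD listing
        {"code": code, "level": str(level), "status": "active"},
    )
    ET.SubElement(category, "name").text = name
    for basis in legal_basis:  # metadata, not a tree level
        ET.SubElement(category, "legalBasis").text = basis
    if project_refs:
        contexts = ET.SubElement(category, "applicableContexts")
        for ref in project_refs:  # one tag per project instead of one sub-tree
            ET.SubElement(contexts, "projectRef").text = ref
    return category


if __name__ == "__main__":
    element = build_category(
        "I2-1", 1, "Korespondence",
        legal_basis=["MK noteikumi Nr. 282"],  # illustrative value
        project_refs=["2.2.1.1/17/I/012", "2.2.1.1/19/I/004"],
    )
    print(ET.tostring(element, encoding="unicode"))
```

A new project then means appending one `projectRef` to existing categories rather than cloning a sub-tree.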
### 5.3 Proposed Simplified Classification Tree
```
VARAM Classification Schema (VDVC:2026)
P — Pārvalde (Administration)
├── P-1 Iestādes vadība (Institutional Management)
│ ├── P-1-1 Normatīvie dokumenti (Regulatory documents)
│ ├── P-1-2 Rīkojumi (Orders)
│ ├── P-1-3 Sanāksmes un protokoli (Meetings & protocols)
│ ├── P-1-4 Plānošana un pārskati (Planning & reporting)
│ ├── P-1-5 Sarakste (Correspondence)
│ │ → metadata: audience, securityLevel
│ ├── P-1-6 Pilnvaras un lēmumi (Authorizations & decisions)
│ └── P-1-7 Drošība un trauksme (Security & whistleblowing)
├── P-2 Budžets (Budget Planning)
├── P-3 Personālvadība (HR Management)
│ ├── P-3-1 Darba līgumi (Employment contracts)
│ ├── P-3-2 Personāla lietas (Personnel files)
│ ├── P-3-3 Apmācības (Training)
│ └── P-3-4 Novērtēšana (Performance evaluation)
├── P-4 Saimnieciskie jautājumi (Facilities)
├── P-5 Iepirkumi (Procurement)
├── P-6 Juridiskā funkcija (Legal)
├── P-7 Komunikācija (Communications)
├── P-8 Audits (Audit)
└── P-9 Finanšu vadība (Financial Management)
I1 — Investīciju programmu vadība (Programme Management)
├── I1-1 Programmu plānošana (Programme planning)
├── I1-2 Uzraudzība un kontrole (Monitoring & control)
├── I1-3 Finanšu pārvaldība (Financial management)
├── I1-4 Ziņojumi un pārskati (Reports)
├── I1-5 Maksājumi un pārbaudes (Payments & verification)
└── I1-6 Sarakste un lēmumi (Correspondence & decisions)
→ metadata: programmeRef = [ERAF, ANM, ESF, ...]
I2 — Investīciju projektu ieviešana (Project Implementation)
├── I2-1 Korespondence (Correspondence)
├── I2-2 Līgumi un grozījumi (Contracts & amendments)
├── I2-3 Rīkojumi un protokoli (Orders & protocols)
├── I2-4 Komunikācija (Communications materials)
├── I2-5 Finanšu dokumentācija (Financial documentation)
└── I2-6 Noslēguma dokumenti (Closure documents)
→ metadata: projectRef = [project-001, project-002, ...]
```
**From 647 entries → ~80-100 functional categories** + rich metadata vocabularies.
---
## 6. Custom Web GUI (Not Excel)
### 6.1 Why Not Excel
| Problem with Excel | Impact |
|-------------------|--------|
| People edit the Excel directly, bypassing validation | Reintroduces data quality issues |
| Cannot enforce controlled vocabularies | Free-text retention terms return |
| Cannot represent metadata (project/programme tags) on categories | Structural duplication returns |
| No validation against XSD schema | Invalid data enters the system |
| No version control / audit trail | Changes are invisible |
| Cannot embed business logic (retention calculation, department lookup) | Manual errors |
| Multiple people can have different versions | No SSOT guarantee |
### 6.2 GUI Architecture
```
┌─────────────────────────────────────────────┐
│  VDVC Classification Editor                 │
│  ┌───────────────────────────────────────┐  │
│  │ Tree Navigator (collapsible)          │  │
│  │ ├── P — Pārvalde                      │  │
│  │ │   ├── P-1 Iestādes vadība           │  │
│  │ │   │   ├── P-1-1 Normatīvie dok.     │  │
│  │ │   │   └── P-1-2 Rīkojumi ←[EDIT]    │  │
│  │ └── I1 — Investīciju programmas       │  │
│  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────┐  │
│  │ Category Detail Panel                 │  │
│  │ Code: [P-1-2]  Status: [Active ▼]     │  │
│  │ Name: [Rīkojumi un to pielikumi...]   │  │
│  │ Description: [Ministru rīkojumi...]   │  │
│  │ ─── Retention ───                     │  │
│  │ Type: [Permanent ▼]                   │  │
│  │ Legal ref: [MK Nr. 282 §31]           │  │
│  │ ─── Responsibility ───                │  │
│  │ Departments: [LN ×] [KD ×] [+ Add]    │  │
│  │ ─── Context ───                       │  │
│  │ Programmes: [all]                     │  │
│  │ Media: [Electronic ▼]                 │  │
│  │ System: [DVS Namejs ▼]                │  │
│  │ ─── Legal Basis ───                   │  │
│  │ [+ Add normative reference]           │  │
│  └───────────────────────────────────────┘  │
│  [Save] [Validate] [Preview XML] [History]  │
└─────────────────────────────────────────────┘
```
### 6.3 GUI Features
| Feature | Purpose |
|---------|---------|
| **Tree navigation** | Hierarchical browse/search with drag-drop reordering |
| **Controlled vocabulary dropdowns** | Departments, retention types, media types — no free-text |
| **Inline XSD validation** | Real-time validation as users edit; cannot save invalid data |
| **Retention calculator** | Input retention rule → system shows calculated expiry per document date |
| **Department lookup** | Autocomplete from VDVC organization registry (ProcessGit VARAM MCP) |
| **Diff / history view** | Git-backed change tracking with who-changed-what |
| **Bulk import** | One-time import from current Excel, then Excel is retired |
| **Export views** | Generate read-only Excel, HTML, PDF for stakeholders |
| **Legal basis linker** | Reference normative acts by Latvijas Vēstnesis number |
| **Multi-user with roles** | Lietvedis (view), department editor, schema admin |
### 6.4 Technology Stack
```
Frontend:    React + Tailwind (ProcessGit-integrated SPA)
Backend:     ProcessGit API + MCP server
Storage:     Git repository (XML + XSD)
Validation:  Client-side XSD validation + server-side on commit
Auth:        ProcessGit OAuth / VARAM SSO
Deploy:      processgit.org/VARAM/Document_classification_schema/
```
---
## 7. ProcessGit Repository Structure
```
VARAM/Document_classification_schema/
├── README.md
├── schema/
│   ├── vdvc-classification-2026.xsd    ← Schema definition
│   └── vdvc-vocabularies.xsd           ← Shared controlled vocabularies
├── data/
│   ├── varam-classification-2026.xml   ← Canonical SSOT
│   ├── vocabularies/
│   │   ├── departments.xml             ← Cross-ref with VARAM org registry
│   │   ├── programmes.xml              ← EU programme registry
│   │   └── projects.xml                ← Active project registry
│   └── archive/
│       └── original-excel-2026.xlsx    ← Original for audit trail
├── gui/
│   ├── index.html                      ← Classification editor SPA
│   ├── src/                            ← React components
│   └── package.json
├── render/
│   ├── classification.xslt             ← Human-readable transform
│   └── classification.html             ← Auto-generated view
├── mcp/
│   └── server-config.yaml              ← MCP server endpoint
├── tools/
│   ├── import-excel.py                 ← One-time Excel import
│   ├── export-excel.py                 ← Read-only Excel generation
│   ├── validate.py                     ← XSD validation
│   └── retention-calculator.py         ← Retention date computation
└── docs/
    ├── migration-mapping.md            ← Old code → new code mapping
    └── normative-basis.md              ← Legal references
---
## 8. MCP Server Integration
Extend the existing ProcessGit MCP pattern (already live for VARAM Organizations Register):
| MCP Tool | Input | Output |
|----------|-------|--------|
| `vdvc:search` | Full-text query in LV/EN | Matching categories with context |
| `vdvc:get_category` | Category code | Full details + metadata |
| `vdvc:list_categories` | Filters: domain, level, dept, programme | Filtered list |
| `vdvc:suggest_category` | Document title + body text | Top 3-5 category suggestions with confidence |
| `vdvc:validate_code` | Category code | Validity check + active status |
| `vdvc:calculate_retention` | Category code + document date | Retention expiry date |
| `vdvc:describe_model` | — | Schema structure, vocabularies, stats |
The `vdvc:suggest_category` tool is the **key efficiency enabler**: instead of a clerk navigating ~100 categories, the AI reads the document and recommends the best matches.
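Of the tools listed, `vdvc:calculate_retention` is the most mechanical, and a minimal sketch of its core logic follows. It is written under stated assumptions: the rule is a plain dict with a `type` of `permanent`, `fixedDate` or `duration`; the period runs from the document date unless a trigger event applies; and the function and parameter names are hypothetical. The actual start-of-period rule would need to be confirmed against the regulation before use.

```python
from datetime import date
from typing import Optional


def _add_years(start: date, years: int) -> date:
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # 29 February with a non-leap target year
        return start.replace(year=start.year + years, day=28)


def calculate_retention(rule: dict, document_date: date,
                        trigger_date: Optional[date] = None) -> Optional[date]:
    """Return the retention expiry date, or None for permanent retention."""
    kind = rule["type"]
    if kind == "permanent":
        return None
    if kind == "fixedDate":
        return rule["date"]
    if kind == "duration":
        start = document_date
        if rule.get("triggerEvent"):
            if trigger_date is None:
                raise ValueError("trigger event date is not known yet")
            start = trigger_date  # e.g. the project closure date
        return _add_years(start, rule["years"])
    raise ValueError(f"unknown retention type: {kind!r}")


if __name__ == "__main__":
    print(calculate_retention({"type": "duration", "years": 5}, date(2026, 2, 8)))
```

Raising an error when the trigger date is missing, rather than falling back to the document date, avoids silently computing an expiry that is too early.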
---
## 9. Roadmap
| Phase | Duration | Deliverables |
|-------|----------|-------------|
| **Phase 1**: Assessment approval & schema design | 1 week | Approved XSD, normalization rules, migration mapping |
| **Phase 2**: Data cleaning & functional restructure | 2-3 weeks | Normalized XML with ~100 categories; old→new code mapping |
| **Phase 3**: GUI development | 3-4 weeks | React SPA on ProcessGit; tree editor, validation, export |
| **Phase 4**: ProcessGit deployment & MCP server | 1-2 weeks | Live repo, MCP endpoint, vocabularies |
| **Phase 5**: AI description generation | 1-2 weeks | AI-drafted Latvian descriptions for all categories |
| **Phase 6**: DVS "Namejs" integration | 2-3 weeks | Classification import adapter, clerk-facing AI assist |
**Total: 10-15 weeks**
---
## 10. Risk Assessment
| Risk | Impact | Mitigation |
|------|--------|------------|
| LNA (Latvijas Nacionālais arhīvs) rejects restructured schema | HIGH | Maintain old↔new code mapping; preserve all retention terms with `originalText`; engage LNA early |
| Lietvedis staff resist moving from Excel | MEDIUM | GUI provides Excel-like table view; generate read-only Excel exports on demand |
| Normative acts explicitly reference old codes | MEDIUM | Deprecate rather than delete; old codes resolve to new via alias table |
| Project-as-metadata breaks DVS "Namejs" import format | MEDIUM | Provide flat-file export that expands metadata back to rows for legacy DVS |
| Functional restructure conflicts with department ownership | MEDIUM | Map departments to functions, not to categories; allow multi-department tags |
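The "deprecate rather than delete" mitigation relies on an alias table that resolves old codes to new ones. A minimal sketch follows; the legacy codes and `P-1-5` appear in this proposal, but the mapping between them, the data layout, and the name `resolve_code` are assumptions made for illustration.

```python
# Hypothetical data: statuses and aliases are assumed, not taken from the schema.
CATEGORIES = {
    "P-1-5": {"status": "active"},
    "P-1-13-5": {"status": "deprecated"},
    "P-1-13-9": {"status": "deprecated"},
}
ALIASES = {"P-1-13-5": "P-1-5", "P-1-13-9": "P-1-5"}


def resolve_code(code, categories=CATEGORIES, aliases=ALIASES, max_hops=10):
    """Follow aliases from a non-active code to the active code replacing it."""
    current = code
    for _ in range(max_hops):
        entry = categories.get(current)
        if entry is None:
            raise KeyError(f"unknown classification code: {current}")
        if entry["status"] == "active":
            return current
        if current not in aliases:
            raise LookupError(f"{current} is {entry['status']} and has no alias")
        current = aliases[current]
    # Guard against alias loops or unreasonably long chains.
    raise RuntimeError(f"alias chain from {code} is too long or cyclic")


if __name__ == "__main__":
    print(resolve_code("P-1-13-5"))  # P-1-5
```

Because deprecated entries stay in the table, a normative act or legacy document that cites an old code still resolves to a valid category.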
---
## 11. Conclusion
The current classification schema is complicated not because document management is inherently complex, but because **normative document origins have been used as structural taxonomy levels** instead of metadata. Every new EU project, every new regulatory delegation, creates a new branch in the tree rather than a new tag on an existing functional category.
The proposed approach:
1. **Restructures** the tree from 647 entries to ~100 functional categories
2. **Enriches** each category with metadata (project, programme, legal basis, audience)
3. **Replaces Excel** with a validated, Git-backed web GUI
4. **Serves** the schema via MCP for AI-assisted classification
5. **Complies** with MK Nr. 282 §33 function-based classification requirements
The VDVC namespace ensures this approach can be replicated across government institutions, not just VARAM.