VDVC Document Classification Schema — Assessment & Transformation Proposal
Subject: VARAM DVS "Namejs" Document Classification Schema 2026
VDVC namespace: urn:vdvc:classification:2026
Regulatory basis: MK noteikumi Nr. 282 (07.05.2024) "Dokumentu un arhīvu pārvaldības noteikumi"
Prepared by: Rihards / PwC Latvia — Digitalization, AI & Cybersecurity
Date: February 2026
1. Executive Summary
VARAM's Document Management System (DVS "Namejs") relies on a classification schema ("klasifikācijas shēma", formerly "lietu nomenklatūra") maintained as a human-edited Excel spreadsheet with 647 coded entries across 3 domains and up to 5 hierarchy levels.
This assessment identifies three layers of problems: data quality issues in the spreadsheet itself (fixable mechanically), structural design issues in the schema (fixable with refactoring), and a fundamental architectural problem — the classification philosophy conflates normative document origins with functional classification, producing an unmanageably large, duplicate-heavy taxonomy that is hostile to both human clerks and DVS systems.
The proposed solution is a VDVC-namespaced, Git-versioned XML repository on ProcessGit with a custom web-based management GUI (not Excel) backed by XSD-validated XML, served via MCP endpoint for AI-assisted document classification.
2. Regulatory Framework
2.1 MK noteikumi Nr. 282 (07.05.2024)
The governing regulation prescribes a function-based hierarchical classification (§33):
| Level | MK Nr. 282 Definition | What It Should Contain |
|---|---|---|
| L1 | Institūcijas funkcija vai augstākā struktūrvienība | Broad organizational function (e.g., "Management", "HR", "Procurement") |
| L2 | Funkcijas izpildes nepieciešamie uzdevumi (procesi) | Processes within the function (e.g., "Recruitment", "Payroll") |
| L3 | Uzdevumu veikšanai nepieciešamās darbības | Specific activities/document types (e.g., "Employment contracts", "Timesheets") |
Key regulatory requirements:
- Schema must be synchronized with Latvijas Nacionālais arhīvs (LNA) every 5 years (§42)
- Sector-level schemas ("nozares klasifikācijas shēma") every 8 years (§42)
- Must specify: index, name, retention term, responsible unit, media type, IS location (§31)
- Classification basis: functions, structural units, document types, or mixed (§33)
2.2 VDVC Context
The schema should use the VDVC (Valsts Dokumentu Vadības Centrs) namespace since VDVC is the country-wide document management authority under VARAM management, and this classification approach could be standardized across government institutions — not just VARAM internally.
3. Critical Assessment: Why Is the Classification So Complicated?
3.1 The Core Problem — Normative Document Proliferation
VARAM's explanation is that every new document category originates from a normative act (law, MK regulation, EU directive) that delegates a process to the organization. When a new regulation is adopted, a new classification entry is created. This approach is fundamentally flawed for the following reasons:
Problem A: Confusing "What Triggered the Document" with "What Kind of Document It Is"
MK Nr. 282 §33 defines classification by function and process, not by legal basis. The legal basis for a document is metadata (a property of the document), not a structural category. When VARAM creates a separate category for "Sarakste ar valsts pārvaldes iestādēm DIENESTA VAJADZĪBĀM" (P-1-13-5) vs "Sarakste ar valsts pārvaldes iestādēm, juridiskām un fiziskām personām" (P-1-13-2) vs "Sarakste ar valsts pārvaldes iestādēm jautājumiem, kas saistīti ar valsts noslēpumu" (P-1-13-9), these are all the same function (correspondence) with different metadata attributes (audience, classification level).
A proper design would have:
P-1-13 Sarakste (Correspondence)
→ metadata: audience = [government | private | foreign | internal | classified]
→ metadata: securityLevel = [public | restricted | secret]
This would replace the 9 sub-categories of correspondence, which currently contain identical document types.
Problem B: EU Investment Project Explosion
The most egregious example is I2 (Investīciju projektu ieviešana) with 33 top-level entries — each representing a specific EU-funded project:
I2-1 Projekta "Informācijas sistēmu ... Nr. 2.2.1.1/17/I/012" dokumenti
I2-2 Projekta "Atvērto datu ... Nr. 2.2.1.1/19/I/004" dokumenti
...
I2-33 Projekta "Valsts pārvaldes vienota valsts finanšu..." dokumenti
Each of these 33 projects then has identical sub-structure: correspondence, contracts, orders, communications materials. This is a textbook example of data masquerading as structure. The project identity is a data attribute, not a classification level.
A proper design:
I2-1 Investīciju projektu ieviešana (Project Implementation)
I2-1-1 Korespondence (Correspondence)
I2-1-2 Līgumi (Contracts)
I2-1-3 Rīkojumi, protokoli (Orders, protocols)
I2-1-4 Komunikācijas materiāli (Communications)
→ metadata: projectId = "2.2.1.1/17/I/012"
→ metadata: projectName = "Informācijas sistēmu..."
→ metadata: fundingSource = "ERAF" | "ANM" | ...
This would reduce I2 from 166 entries to ~10-15 while preserving all information through metadata.
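The collapse described above can be sketched in a few lines of Python. This is a minimal illustration, not the migration tool itself: the row layout, the placeholder project names "A" and "B", and the assumption that each project row precedes its children are all hypothetical; only the two project numbers come from the examples above.

```python
import re
from collections import defaultdict

# Hypothetical legacy rows (code, name); project names are placeholders.
LEGACY_ROWS = [
    ("I2-1", 'Projekta "A Nr. 2.2.1.1/17/I/012" dokumenti'),
    ("I2-1-1", "Korespondence"),
    ("I2-1-2", "Līgumi"),
    ("I2-2", 'Projekta "B Nr. 2.2.1.1/19/I/004" dokumenti'),
    ("I2-2-1", "Korespondence"),
    ("I2-2-2", "Līgumi"),
]

# Project number as it appears after "Nr." in the project row name.
PROJECT_NO = re.compile(r"Nr\.\s*([0-9A-Z./]+)")


def collapse(rows):
    """Return one entry per function name, with project numbers as metadata.

    Assumes rows are ordered so that a project row precedes its children.
    """
    project_by_code = {}
    projects_by_function = defaultdict(set)
    for code, name in rows:
        match = PROJECT_NO.search(name)
        if match:
            # A project-level row: remember its number, emit no category.
            project_by_code[code] = match.group(1)
            continue
        parent = code.rsplit("-", 1)[0]
        projects_by_function[name].add(project_by_code[parent])
    return {name: sorted(ids) for name, ids in projects_by_function.items()}


collapsed = collapse(LEGACY_ROWS)
```

Six legacy rows become two functional categories, each carrying the list of projects it applies to.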
Problem C: I1 (Investīciju programmu vadība) Duplicates
Similarly, I1 has 16+ programme-level groups (I1-1 through I1-16), each with largely identical sub-structures for different EU operational programmes. The programmes differ in retention dates and responsible departments, but these are metadata, not structure.
Current: 327 entries in I1. Proposed: ~40-50 function-based entries, with the programme recorded as metadata.
Problem D: Category Count vs. Clerk Cognitive Capacity
With ~400 leaf categories (plus ~250 structural grouping rows), a clerk creating a new document faces an unrealistic cognitive task. Miller (1956) found that people reliably handle only about 7±2 items in absolute-judgment and short-term memory tasks, and Rosch (1978) showed that people categorize most readily at an intermediate "basic" level of abstraction. Taken as a rule of thumb, each level of a hierarchy should offer roughly 7±2 choices. VARAM's schema has:
- 3 domain categories (P, I1, I2): good
- 9-33 L1 categories per domain: borderline to unmanageable
- Up to 13+ L2 per L1: too many, especially without descriptions
The result is predictable: clerks default to a handful of "safe" categories, misclassify documents, or spend excessive time navigating the hierarchy — defeating the purpose of classification.
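A fan-out check along these lines could flag overloaded nodes automatically. The sketch below is an assumption-laden illustration: it derives each parent by dropping the last hyphen-separated segment of a code, uses 9 (the upper end of 7±2) as a hypothetical limit, and runs on made-up sample codes rather than the real spreadsheet.

```python
from collections import Counter


def fan_out(codes, separator="-"):
    """Count direct children per parent code.

    The parent is obtained by dropping the last segment of the code.
    """
    counts = Counter()
    for code in codes:
        if separator in code:
            counts[code.rsplit(separator, 1)[0]] += 1
    return counts


def overloaded(codes, limit=9):
    """Return the parents whose number of direct children exceeds `limit`."""
    return {parent: n for parent, n in fan_out(codes).items() if n > limit}


# Made-up sample: P-1 has 13 children, P-2 has one.
sample_codes = ["P-1"] + [f"P-1-{i}" for i in range(1, 14)] + ["P-2", "P-2-1"]
flagged = overloaded(sample_codes)
```

Run against the full code list, the same function would list every node a clerk cannot reasonably scan in one pass.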
3.2 Structural Assessment Summary
| Metric | Current | Proposed (after normalization) |
|---|---|---|
| Total entries | 647 | ~120-150 |
| Leaf categories (clerk-facing) | ~496 | ~80-100 |
| I2 entries | 166 | ~10-15 |
| I1 entries | 327 | ~40-50 |
| Max L1 categories per domain | 33 | ≤10 |
| Project-specific categories | ~200+ | 0 (metadata) |
| Duplicate structural patterns | ~30 identical sub-trees | 0 |
3.3 What Should Be Structure vs. What Should Be Metadata
| Currently a Category Level | Should Be | Reason |
|---|---|---|
| Specific EU project name | Metadata tag | Project is an instance, not a function |
| Specific EU programme | Metadata tag | Programme is a funding context |
| Audience of correspondence | Metadata enum | Audience doesn't change the document type |
| Security classification | Metadata field | Orthogonal to document function |
| EU Commission flag on retention | Metadata boolean | Compliance attribute, not structure |
| Department assignment | Metadata reference | Departments change; functions don't |
4. Data Quality Issues (Spreadsheet-Level)
4.1 Issues Summary
| # | Issue | Severity | Scope |
|---|---|---|---|
| 1 | Mixed code separators (hyphens and dots) | CRITICAL | 221/647 codes (34%) |
| 2 | NBSP (\xa0) disguised as empty retention | HIGH | 93 rows |
| 3 | 50+ retention term format variants | HIGH | 496 retention values |
| 4 | Zero descriptions in Description column | HIGH | 100% of rows |
| 5 | Multi-department free-text assignments | MEDIUM | 67 rows |
| 6 | Level data in wrong columns | MEDIUM | 64 rows |
| 7 | Typo: `Il-9-2` instead of `I1-9-2` | LOW | 1 row |
| 8 | Trailing dot in code `I1-13-1.1.` | LOW | 1 row |
| 9 | Typo: "Patstāvīgi" instead of "Pastāvīgi" | LOW | 4 rows |
(Full technical detail in previous assessment — omitted for brevity)
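Several of the issues above are mechanical enough to fix with a small cleaning pass. The sketch below is illustrative only: the function names are hypothetical, and treating the hyphen as the canonical separator is an assumption that should be confirmed before any dot-separated code is rewritten.

```python
import re


def normalize_code(raw: str) -> str:
    """Clean one classification code (assumes hyphen is the canonical separator)."""
    code = raw.replace("\xa0", " ").strip()
    code = code.rstrip(".")             # issue 8: trailing dot, e.g. "I1-13-1.1."
    code = code.replace(".", "-")       # issue 1: unify mixed separators
    code = re.sub(r"^Il\b", "I1", code)  # issue 7: lowercase "l" typed for digit "1"
    return code


def is_blank(cell) -> bool:
    """Issue 2: treat NBSP-only cells as empty.

    str.strip() already treats U+00A0 as whitespace in Python 3.
    """
    return cell is None or str(cell).strip() == ""
```

For example, `normalize_code("Il-9-2")` yields `"I1-9-2"`, and a retention cell containing only `"\xa0"` is reported as blank.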
4.2 Retention Term Normalization
50+ free-text variants need consolidation to 5 structured types:
| Type | Example Input | Structured Output |
|---|---|---|
| Permanent | "Pastāvīgi", "Patstāvīgi" | `<permanent>true</permanent>` |
| Duration | "5 gadi", "75 gadi" | `<duration years="5"/>` |
| Duration + trigger | "5 gadi pēc projekta noslēguma..." | `<duration years="5" triggerEvent="project_closure"/>` |
| Fixed date | "31.12.2034.", "2031-12-31 00:00:00" | `<fixedDate>2034-12-31</fixedDate>` |
| EU flagged | "31.12.2032. EK" | `<retention euCommission="true"><fixedDate>2032-12-31</fixedDate></retention>` |
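The mapping in this table could be implemented roughly as follows. This is a sketch under assumptions, not the actual import tool: `parse_retention` and its dictionary keys are hypothetical, the trigger phrase table covers only the one example shown above, and the `triggerEvent` key follows the attribute name used in the XSD in section 5.2. Anything unrecognized is returned as `unparsed` for manual review rather than guessed.

```python
import re

PERMANENT_SPELLINGS = {"pastāvīgi", "patstāvīgi"}  # second form is the known typo
TRIGGERS = {"pēc projekta noslēguma": "project_closure"}  # hypothetical phrase table


def parse_retention(text: str) -> dict:
    """Turn one free-text retention value into a structured record."""
    raw = text.replace("\xa0", " ").strip()
    lowered = raw.lower()
    result = {"originalText": text}  # always keep the source text for audit

    if lowered in PERMANENT_SPELLINGS:
        result["type"] = "permanent"
        return result

    dotted = re.match(r"(\d{2})\.(\d{2})\.(\d{4})", raw)   # 31.12.2034.
    iso = re.match(r"(\d{4})-(\d{2})-(\d{2})", raw)        # 2031-12-31 00:00:00
    if dotted or iso:
        if dotted:
            day, month, year = dotted.groups()
        else:
            year, month, day = iso.groups()
        result.update(
            type="fixedDate",
            date=f"{year}-{month}-{day}",
            euCommission=bool(re.search(r"\bEK\b", raw)),
        )
        return result

    duration = re.match(r"(\d+)\s+gadi\b\s*(.*)", lowered)  # "5 gadi ..."
    if duration:
        result.update(type="duration", years=int(duration.group(1)))
        for phrase, event in TRIGGERS.items():
            if duration.group(2).startswith(phrase):
                result["triggerEvent"] = event
        return result

    result["type"] = "unparsed"
    return result
```

Keeping `originalText` on every record matches the `originalText` attribute in the schema and lets reviewers compare each structured value with what the spreadsheet actually said.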
5. Proposed Architecture
5.1 Design Principles
- VDVC namespace: `urn:vdvc:classification:2026`, reusable across government
- Function-first classification: per MK Nr. 282 §33, classify by what the organization does, not by what regulation triggered the document
- Metadata-rich, structure-lean: project, programme, audience as tags, not tree levels
- No Excel: custom web GUI that edits backend XML directly; prevents spreadsheet drift
- Git-versioned SSOT: XML on ProcessGit with full audit trail
- MCP-served: machine-readable API for DVS integration and AI-assisted classification
5.2 XSD Schema (VDVC Domain)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:vdvc:classification:2026"
           xmlns:vdvc="urn:vdvc:classification:2026"
           elementFormDefault="qualified">
  <!-- MetadataType, DomainListType, DeptListType, ProgrammeListType,
       ProjectListType and RetTermListType are referenced below but not shown here -->
  <xs:element name="classificationSchema" type="vdvc:SchemaType"/>
  <xs:complexType name="SchemaType">
    <xs:sequence>
      <xs:element name="metadata" type="vdvc:MetadataType"/>
      <xs:element name="vocabularies" type="vdvc:VocabulariesType"/>
      <xs:element name="domains" type="vdvc:DomainListType"/>
    </xs:sequence>
    <xs:attribute name="version" type="xs:string" use="required"/>
    <xs:attribute name="effectiveDate" type="xs:date" use="required"/>
    <xs:attribute name="institution" type="xs:string" use="required"/>
  </xs:complexType>
  <!-- Controlled vocabularies (departments, programmes, projects) -->
  <xs:complexType name="VocabulariesType">
    <xs:sequence>
      <xs:element name="departments" type="vdvc:DeptListType"/>
      <xs:element name="programmes" type="vdvc:ProgrammeListType" minOccurs="0"/>
      <xs:element name="projects" type="vdvc:ProjectListType" minOccurs="0"/>
      <xs:element name="retentionTerms" type="vdvc:RetTermListType"/>
    </xs:sequence>
  </xs:complexType>
  <!-- Category node (recursive, function-based) -->
  <xs:complexType name="CategoryType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="description" type="xs:string" minOccurs="0"/>
      <xs:element name="legalBasis" type="xs:string" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element name="retention" type="vdvc:RetentionType" minOccurs="0"/>
      <xs:element name="responsibleUnits" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="unitRef" type="xs:string" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="mediaType" type="vdvc:MediaTypeEnum" minOccurs="0"/>
      <xs:element name="system" type="xs:string" minOccurs="0"/>
      <xs:element name="applicableContexts" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="programmeRef" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="projectRef" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <!-- Recursion: a category may contain further categories -->
      <xs:element name="subcategory" type="vdvc:CategoryType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string" use="required"/>
    <xs:attribute name="level" type="xs:positiveInteger" use="required"/>
    <xs:attribute name="status" type="vdvc:StatusEnum" default="active"/>
  </xs:complexType>
  <!-- Retention: structured, not free-text -->
  <xs:complexType name="RetentionType">
    <xs:choice>
      <xs:element name="permanent" type="xs:boolean"/>
      <xs:element name="duration">
        <xs:complexType>
          <xs:attribute name="years" type="xs:positiveInteger" use="required"/>
          <xs:attribute name="triggerEvent" type="xs:string"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="fixedDate" type="xs:date"/>
    </xs:choice>
    <xs:attribute name="euCommission" type="xs:boolean" default="false"/>
    <xs:attribute name="legalReference" type="xs:string"/>
    <xs:attribute name="originalText" type="xs:string"/>
  </xs:complexType>
  <!-- Enumerations -->
  <xs:simpleType name="MediaTypeEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="electronic"/>
      <xs:enumeration value="paper"/>
      <xs:enumeration value="hybrid"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="StatusEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="deprecated"/>
      <xs:enumeration value="draft"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
Key design decisions:
- `legalBasis` is a metadata field on categories, not a structural level: it records which normative acts require the category without creating separate tree branches.
- `applicableContexts` with `programmeRef` / `projectRef` replaces the 33 duplicate I2 sub-trees: a single "Korespondence" category can be tagged with all applicable projects.
- `status` enables deprecation without deletion (audit trail).
- `RetentionType` with `legalReference` links retention to its legal source.
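To make these decisions concrete, the following sketch builds one category instance with Python's standard `xml.etree.ElementTree`. It is an illustration under assumptions: `build_category` is a hypothetical helper, child elements are assumed to be namespace-qualified, the 5-year retention is an invented example value, and the output is not validated against the XSD (that would need a separate XSD-capable library).

```python
import xml.etree.ElementTree as ET

NS = "urn:vdvc:classification:2026"
ET.register_namespace("vdvc", NS)


def q(tag: str) -> str:
    """Qualify a tag name with the VDVC namespace."""
    return f"{{{NS}}}{tag}"


def build_category(code, level, name, project_refs=(), years=None):
    """Build one CategoryType node; child order follows the xs:sequence."""
    category = ET.Element(q("subcategory"), {"code": code, "level": str(level)})
    ET.SubElement(category, q("name")).text = name
    if years is not None:
        retention = ET.SubElement(category, q("retention"))
        ET.SubElement(retention, q("duration"), {"years": str(years)})
    if project_refs:
        contexts = ET.SubElement(category, q("applicableContexts"))
        for ref in project_refs:
            ET.SubElement(contexts, q("projectRef")).text = ref
    return category


# One "Korespondence" node tagged with two projects instead of two sub-trees.
node = build_category("I2-1", 2, "Korespondence",
                      project_refs=["project-001", "project-002"], years=5)
xml_text = ET.tostring(node, encoding="unicode")
```

Adding a third project means appending one more `projectRef`, not copying the whole sub-tree.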
5.3 Proposed Simplified Classification Tree
VARAM Classification Schema (VDVC:2026)
P — Pārvalde (Administration)
├── P-1 Iestādes vadība (Institutional Management)
│ ├── P-1-1 Normatīvie dokumenti (Regulatory documents)
│ ├── P-1-2 Rīkojumi (Orders)
│ ├── P-1-3 Sanāksmes un protokoli (Meetings & protocols)
│ ├── P-1-4 Plānošana un pārskati (Planning & reporting)
│ ├── P-1-5 Sarakste (Correspondence)
│ │ → metadata: audience, securityLevel
│ ├── P-1-6 Pilnvaras un lēmumi (Authorizations & decisions)
│ └── P-1-7 Drošība un trauksme (Security & whistleblowing)
├── P-2 Budžets (Budget Planning)
├── P-3 Personālvadība (HR Management)
│ ├── P-3-1 Darba līgumi (Employment contracts)
│ ├── P-3-2 Personāla lietas (Personnel files)
│ ├── P-3-3 Apmācības (Training)
│ └── P-3-4 Novērtēšana (Performance evaluation)
├── P-4 Saimnieciskie jautājumi (Facilities)
├── P-5 Iepirkumi (Procurement)
├── P-6 Juridiskā funkcija (Legal)
├── P-7 Komunikācija (Communications)
├── P-8 Audits (Audit)
└── P-9 Finanšu vadība (Financial Management)
I1 — Investīciju programmu vadība (Programme Management)
├── I1-1 Programmu plānošana (Programme planning)
├── I1-2 Uzraudzība un kontrole (Monitoring & control)
├── I1-3 Finanšu pārvaldība (Financial management)
├── I1-4 Ziņojumi un pārskati (Reports)
├── I1-5 Maksājumi un pārbaudes (Payments & verification)
└── I1-6 Sarakste un lēmumi (Correspondence & decisions)
→ metadata: programmeRef = [ERAF, ANM, ESF, ...]
I2 — Investīciju projektu ieviešana (Project Implementation)
├── I2-1 Korespondence (Correspondence)
├── I2-2 Līgumi un grozījumi (Contracts & amendments)
├── I2-3 Rīkojumi un protokoli (Orders & protocols)
├── I2-4 Komunikācija (Communications materials)
├── I2-5 Finanšu dokumentācija (Financial documentation)
└── I2-6 Noslēguma dokumenti (Closure documents)
→ metadata: projectRef = [project-001, project-002, ...]
From 647 entries → ~80-100 functional categories + rich metadata vocabularies.
6. Custom Web GUI (Not Excel)
6.1 Why Not Excel
| Problem with Excel | Impact |
|---|---|
| People edit the Excel directly, bypassing validation | Reintroduces data quality issues |
| Cannot enforce controlled vocabularies | Free-text retention terms return |
| Cannot represent metadata (project/programme tags) on categories | Structural duplication returns |
| No validation against XSD schema | Invalid data enters the system |
| No version control / audit trail | Changes are invisible |
| Cannot embed business logic (retention calculation, department lookup) | Manual errors |
| Multiple people can have different versions | No SSOT guarantee |
6.2 GUI Architecture
┌─────────────────────────────────────────────┐
│ VDVC Classification Editor │
│ ┌───────────────────────────────────────┐ │
│ │ Tree Navigator (collapsible) │ │
│ │ ├── P — Pārvalde │ │
│ │ │ ├── P-1 Iestādes vadība │ │
│ │ │ │ ├── P-1-1 Normatīvie dok. │ │
│ │ │ │ └── P-1-2 Rīkojumi ←[EDIT] │ │
│ │ └── I1 — Investīciju programmas │ │
│ └───────────────────────────────────────┘ │
│ ┌───────────────────────────────────────┐ │
│ │ Category Detail Panel │ │
│ │ Code: [P-1-2] Status: [Active ▼] │ │
│ │ Name: [Rīkojumi un to pielikumi...] │ │
│ │ Description: [Ministru rīkojumi...] │ │
│ │ ─── Retention ─── │ │
│ │ Type: [Permanent ▼] │ │
│ │ Legal ref: [MK Nr. 282 §31] │ │
│ │ ─── Responsibility ─── │ │
│ │ Departments: [LN ×] [KD ×] [+ Add] │ │
│ │ ─── Context ─── │ │
│ │ Programmes: [all] │ │
│ │ Media: [Electronic ▼] │ │
│ │ System: [DVS Namejs ▼] │ │
│ │ ─── Legal Basis ─── │ │
│ │ [+ Add normative reference] │ │
│ └───────────────────────────────────────┘ │
│ [Save] [Validate] [Preview XML] [History] │
└─────────────────────────────────────────────┘
6.3 GUI Features
| Feature | Purpose |
|---|---|
| Tree navigation | Hierarchical browse/search with drag-drop reordering |
| Controlled vocabulary dropdowns | Departments, retention types, media types — no free-text |
| Inline XSD validation | Real-time validation as users edit; cannot save invalid data |
| Retention calculator | Input retention rule → system shows calculated expiry per document date |
| Department lookup | Autocomplete from VDVC organization registry (ProcessGit VARAM MCP) |
| Diff / history view | Git-backed change tracking with who-changed-what |
| Bulk import | One-time import from current Excel, then Excel is retired |
| Export views | Generate read-only Excel, HTML, PDF for stakeholders |
| Legal basis linker | Reference normative acts by Latvijas Vēstnesis number |
| Multi-user with roles | Lietvedis (view), department editor, schema admin |
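The retention calculator feature could work along the following lines. This is a sketch, not the planned implementation: the rule dictionaries and the function name are hypothetical, and counting a duration from the document date (or from a supplied trigger date) is an assumption that should be checked against the applicable archival rules.

```python
from datetime import date
from typing import Optional


def retention_expiry(rule: dict, document_date: date,
                     trigger_date: Optional[date] = None) -> Optional[date]:
    """Return the expiry date for a document, or None for permanent retention."""
    kind = rule["type"]
    if kind == "permanent":
        return None
    if kind == "fixedDate":
        return date.fromisoformat(rule["date"])
    if kind == "duration":
        # Rules with a trigger event count from the trigger date, not the document date.
        start = trigger_date if rule.get("triggerEvent") else document_date
        if start is None:
            raise ValueError("trigger date required for this retention rule")
        try:
            return start.replace(year=start.year + rule["years"])
        except ValueError:
            # 29 February in a non-leap target year falls back to 28 February.
            return start.replace(year=start.year + rule["years"], day=28)
    raise ValueError(f"unknown retention type: {kind}")
```

For instance, a 5-year rule applied to a document dated 2026-02-10 expires on 2031-02-10, while a rule triggered by project closure cannot be evaluated until the closure date is known.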
6.4 Technology Stack
Frontend: React + Tailwind (ProcessGit-integrated SPA)
Backend: ProcessGit API + MCP server
Storage: Git repository (XML + XSD)
Validation: Client-side XSD validation + server-side on commit
Auth: ProcessGit OAuth / VARAM SSO
Deploy: processgit.org/VARAM/Document_classification_schema/
7. ProcessGit Repository Structure
VARAM/Document_classification_schema/
├── README.md
├── schema/
│ ├── vdvc-classification-2026.xsd ← Schema definition
│ └── vdvc-vocabularies.xsd ← Shared controlled vocabularies
├── data/
│ ├── varam-classification-2026.xml ← Canonical SSOT
│ ├── vocabularies/
│ │ ├── departments.xml ← Cross-ref with VARAM org registry
│ │ ├── programmes.xml ← EU programme registry
│ │ └── projects.xml ← Active project registry
│ └── archive/
│ └── original-excel-2026.xlsx ← Original for audit trail
├── gui/
│ ├── index.html ← Classification editor SPA
│ ├── src/ ← React components
│ └── package.json
├── render/
│ ├── classification.xslt ← Human-readable transform
│ └── classification.html ← Auto-generated view
├── mcp/
│ └── server-config.yaml ← MCP server endpoint
├── tools/
│ ├── import-excel.py ← One-time Excel import
│ ├── export-excel.py ← Read-only Excel generation
│ ├── validate.py ← XSD validation
│ └── retention-calculator.py ← Retention date computation
└── docs/
├── migration-mapping.md ← Old code → new code mapping
└── normative-basis.md ← Legal references
8. MCP Server Integration
Extend the existing ProcessGit MCP pattern (already live for VARAM Organizations Register):
| MCP Tool | Input | Output |
|---|---|---|
| `vdvc:search` | Full-text query in LV/EN | Matching categories with context |
| `vdvc:get_category` | Category code | Full details + metadata |
| `vdvc:list_categories` | Filters: domain, level, dept, programme | Filtered list |
| `vdvc:suggest_category` | Document title + body text | Top 3-5 category suggestions with confidence |
| `vdvc:validate_code` | Category code | Validity check + active status |
| `vdvc:calculate_retention` | Category code + document date | Retention expiry date |
| `vdvc:describe_model` | (none) | Schema structure, vocabularies, stats |
The suggest_category tool is the key efficiency enabler: instead of a clerk navigating ~100 categories, the AI reads the document and recommends the best matches.
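As an illustration of the simpler tools in the table, the logic behind `vdvc:validate_code` might look like the sketch below. It is plain Python over an in-memory index, not an MCP server: the index layout, the `replacedBy` field, and the mapping of the old code P-1-13-5 to the proposed P-1-5 are all assumptions made for the example.

```python
# Hypothetical in-memory index; a real server would load it from the XML SSOT.
CATEGORIES = {
    "P-1-2": {"name": "Rīkojumi", "status": "active"},
    "P-1-5": {"name": "Sarakste", "status": "active"},
    "P-1-13-5": {"name": "Sarakste (legacy)", "status": "deprecated",
                 "replacedBy": "P-1-5"},
}


def validate_code(code: str, categories=CATEGORIES) -> dict:
    """Report whether a code exists and whether it is still active."""
    entry = categories.get(code)
    if entry is None:
        return {"code": code, "valid": False}
    result = {
        "code": code,
        "valid": True,
        "status": entry["status"],
        "active": entry["status"] == "active",
    }
    if "replacedBy" in entry:
        # Deprecated codes point the caller at their successor.
        result["replacedBy"] = entry["replacedBy"]
    return result
```

A DVS integration could call this before saving a document and, for a deprecated code, offer the successor instead of rejecting the entry outright.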
9. Roadmap
| Phase | Duration | Deliverables |
|---|---|---|
| Phase 1: Assessment approval & schema design | 1 week | Approved XSD, normalization rules, migration mapping |
| Phase 2: Data cleaning & functional restructure | 2-3 weeks | Normalized XML with ~100 categories; old→new code mapping |
| Phase 3: GUI development | 3-4 weeks | React SPA on ProcessGit; tree editor, validation, export |
| Phase 4: ProcessGit deployment & MCP server | 1-2 weeks | Live repo, MCP endpoint, vocabularies |
| Phase 5: AI description generation | 1-2 weeks | AI-drafted Latvian descriptions for all categories |
| Phase 6: DVS "Namejs" integration | 2-3 weeks | Classification import adapter, clerk-facing AI assist |
Total: 10-15 weeks
10. Risk Assessment
| Risk | Impact | Mitigation |
|---|---|---|
| LNA (Latvijas Nacionālais arhīvs) rejects restructured schema | HIGH | Maintain old↔new code mapping; preserve all retention terms with originalText; engage LNA early |
| Lietvedis staff resist moving from Excel | MEDIUM | GUI provides Excel-like table view; generate read-only Excel exports on demand |
| Normative acts explicitly reference old codes | MEDIUM | Deprecate rather than delete; old codes resolve to new via alias table |
| Project-as-metadata breaks DVS "Namejs" import format | MEDIUM | Provide flat-file export that expands metadata back to rows for legacy DVS |
| Functional restructure conflicts with department ownership | MEDIUM | Map departments to functions, not to categories; allow multi-department tags |
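The flat-file mitigation for legacy DVS import can be sketched as an expansion of metadata back into one row per project and category. Everything here is hypothetical: the migration table, the legacy codes in it, and the row layout stand in for whatever the real old-to-new mapping defines.

```python
# Hypothetical migration table: (new code, projectRef) -> legacy code.
MIGRATION = {
    ("I2-1", "project-001"): "I2-1-1",
    ("I2-2", "project-001"): "I2-1-2",
    ("I2-1", "project-002"): "I2-2-1",
}

# Hypothetical new-style categories with projects attached as metadata.
NEW_CATEGORIES = [
    {"code": "I2-1", "name": "Korespondence",
     "projectRefs": ["project-001", "project-002"]},
    {"code": "I2-2", "name": "Līgumi un grozījumi",
     "projectRefs": ["project-001"]},
]


def expand_for_legacy(categories, migration):
    """Produce one legacy row per (category, project) pair found in the mapping."""
    rows = []
    for category in categories:
        for ref in category["projectRefs"]:
            legacy_code = migration.get((category["code"], ref))
            if legacy_code is not None:
                rows.append((legacy_code, category["name"], ref))
    return sorted(rows)


legacy_rows = expand_for_legacy(NEW_CATEGORIES, MIGRATION)
```

Pairs with no legacy code are skipped here; a real export would more likely report them so the mapping can be completed.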
11. Conclusion
The current classification schema is complicated not because document management is inherently complex, but because normative document origins have been used as structural taxonomy levels instead of metadata. Every new EU project, every new regulatory delegation, creates a new branch in the tree rather than a new tag on an existing functional category.
The proposed approach:
- Restructures the tree from 647 entries to ~100 functional categories
- Enriches each category with metadata (project, programme, legal basis, audience)
- Replaces Excel with a validated, Git-backed web GUI
- Serves the schema via MCP for AI-assisted classification
- Complies with MK Nr. 282 §33 function-based classification requirements
The VDVC namespace ensures this approach can be replicated across government institutions, not just VARAM.