# VDVC Document Classification Schema: Assessment & Transformation Proposal

**Subject:** VARAM DVS "Namejs" Document Classification Schema 2026

**VDVC namespace:** `urn:vdvc:classification:2026`

**Regulatory basis:** MK noteikumi Nr. 282 (07.05.2024) "Dokumentu un arhīvu pārvaldības noteikumi"

**Prepared by:** Rihards / PwC Latvia (Digitalization, AI & Cybersecurity)

**Date:** February 2026

---

## 1. Executive Summary

VARAM's Document Management System (DVS "Namejs") relies on a classification schema ("klasifikācijas shēma", formerly "lietu nomenklatūra") maintained as a human-edited Excel spreadsheet with **647 coded entries** across 3 domains and up to 5 hierarchy levels.

This assessment identifies **three layers of problems**: data quality issues in the spreadsheet itself (fixable mechanically), structural design issues in the schema (fixable with refactoring), and a **fundamental architectural problem**, namely that the classification philosophy conflates normative document origins with functional classification, producing an unmanageably large, duplicate-heavy taxonomy that is hostile to both human clerks and DVS systems.

The proposed solution is a VDVC-namespaced, Git-versioned XML repository on ProcessGit with a **custom web-based management GUI** (not Excel) backed by XSD-validated XML, served via MCP endpoint for AI-assisted document classification.

---

## 2. Regulatory Framework

### 2.1 MK noteikumi Nr. 282 (07.05.2024)

The governing regulation prescribes a **function-based hierarchical classification** (§33):

| Level | MK Nr. 282 Definition | What It Should Contain |
|-------|----------------------|----------------------|
| **L1** | Institūcijas **funkcija** vai augstākā struktūrvienība | Broad organizational function (e.g., "Management", "HR", "Procurement") |
| **L2** | Funkcijas izpildes nepieciešamie **uzdevumi (procesi)** | Processes within the function (e.g., "Recruitment", "Payroll") |
| **L3** | Uzdevumu veikšanai nepieciešamās **darbības** | Specific activities/document types (e.g., "Employment contracts", "Timesheets") |

Key regulatory requirements:

- Schema must be synchronized with Latvijas Nacionālais arhīvs (LNA) every **5 years** (§42)
- Sector-level schemas ("nozares klasifikācijas shēma") every **8 years** (§42)
- Must specify: index, name, retention term, responsible unit, media type, IS location (§31)
- Classification basis: functions, structural units, document types, or **mixed** (§33)

### 2.2 VDVC Context

The schema should use the **VDVC** (Valsts Dokumentu Vadības Centrs) namespace since VDVC is the country-wide document management authority under VARAM management, and this classification approach could be standardized across government institutions, not just within VARAM.

---

## 3. Critical Assessment: Why Is the Classification So Complicated?

### 3.1 The Core Problem: Normative Document Proliferation

VARAM's explanation is that every new document category originates from a normative act (law, MK regulation, EU directive) that delegates a process to the organization. When a new regulation is adopted, a new classification entry is created.

**This approach is fundamentally flawed** for the following reasons:

#### Problem A: Confusing "What Triggered the Document" with "What Kind of Document It Is"

MK Nr. 282 §33 defines classification by **function and process**, not by **legal basis**. The legal basis for a document is metadata (a property of the document), not a structural category.
When VARAM creates a separate category for "Sarakste ar valsts pārvaldes iestādēm DIENESTA VAJADZĪBĀM" (P-1-13-5, correspondence with government institutions marked for official use) vs "Sarakste ar valsts pārvaldes iestādēm, juridiskām un fiziskām personām" (P-1-13-2, correspondence with government institutions, legal and natural persons) vs "Sarakste ar valsts pārvaldes iestādēm jautājumiem, kas saistīti ar valsts noslēpumu" (P-1-13-9, correspondence with government institutions on matters involving state secrets), **these are all the same function** (correspondence) with different metadata attributes (audience, classification level).

A proper design would have:

```
P-1-13 Sarakste (Correspondence)
  → metadata: audience = [government | private | foreign | internal | classified]
  → metadata: securityLevel = [public | restricted | secret]
```

This single category would replace the 9 correspondence sub-categories, which currently hold identical document types.

#### Problem B: EU Investment Project Explosion

The most egregious example is **I2 (Investīciju projektu ieviešana)** with **33 top-level entries**, each representing a specific EU-funded project:

```
I2-1  Projekta "Informācijas sistēmu ... Nr. 2.2.1.1/17/I/012" dokumenti
I2-2  Projekta "Atvērto datu ... Nr. 2.2.1.1/19/I/004" dokumenti
...
I2-33 Projekta "Valsts pārvaldes vienota valsts finanšu..." dokumenti
```

Each of these 33 projects then has an **identical sub-structure**: correspondence, contracts, orders, communications materials. This is a textbook example of **data masquerading as structure**. The project identity is a data attribute, not a classification level.

A proper design:

```
I2-1 Investīciju projektu ieviešana (Project Implementation)
  I2-1-1 Korespondence (Correspondence)
  I2-1-2 Līgumi (Contracts)
  I2-1-3 Rīkojumi, protokoli (Orders, protocols)
  I2-1-4 Komunikācijas materiāli (Communications)
  → metadata: projectId = "2.2.1.1/17/I/012"
  → metadata: projectName = "Informācijas sistēmu..."
  → metadata: fundingSource = "ERAF" | "ANM" | ...
```

This would reduce I2 from **~166 entries to ~10**, while preserving all information through metadata.
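
The collapse described for I2 can be illustrated with a minimal Python sketch. It is not the actual migration tool: the `LegacyEntry` shape, the legacy sub-codes in the sample rows, and the `projectRefs` key are assumptions made for illustration, and only the two project numbers come from the spreadsheet excerpts above. The sketch groups legacy rows by function and keeps the project identity as a metadata list on each resulting category.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyEntry:
    """One leaf row of the legacy per-project tree (hypothetical shape)."""
    code: str        # legacy code; the sub-codes used below are illustrative
    project_id: str  # project number taken from the legacy L1 title
    function: str    # sub-category name, e.g. "Līgumi"


def collapse_projects(entries):
    """Group legacy rows by function and keep project identity as metadata."""
    projects_by_function = defaultdict(set)
    for entry in entries:
        projects_by_function[entry.function].add(entry.project_id)

    categories = []
    # Sort function names so the generated codes are deterministic.
    for index, function in enumerate(sorted(projects_by_function), start=1):
        categories.append({
            "code": f"I2-{index}",
            "name": function,
            "projectRefs": sorted(projects_by_function[function]),
        })
    return categories


# Two projects with the same sub-structure: four legacy rows.
legacy = [
    LegacyEntry("I2-1-1", "2.2.1.1/17/I/012", "Korespondence"),
    LegacyEntry("I2-1-2", "2.2.1.1/17/I/012", "Līgumi"),
    LegacyEntry("I2-2-1", "2.2.1.1/19/I/004", "Korespondence"),
    LegacyEntry("I2-2-2", "2.2.1.1/19/I/004", "Līgumi"),
]

collapsed = collapse_projects(legacy)
for category in collapsed:
    print(category["code"], category["name"], category["projectRefs"])
```

With this shape, adding a 34th project adds one entry to each `projectRefs` list rather than a new sub-tree, so the number of categories a clerk sees stays constant.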
#### Problem C: I1 (Investīciju programmu vadība) Duplicates

Similarly, I1 has **16+ programme-level groups** (I1-1 through I1-16), each with largely identical sub-structures for different EU operational programmes. The programmes differ in retention dates and responsible departments, but these are metadata, not structure.

- Current: **327 entries** in I1
- Proposed: **~40-50 entries** (function-based) + programme as metadata

#### Problem D: Category Count vs. Clerk Cognitive Capacity

With **~400 leaf categories** (plus ~250 structural grouping rows), a clerk creating a new document faces an impossible cognitive task. Research in classification science (Rosch, 1978; Miller, 1956) suggests humans can reliably distinguish roughly 7±2 categories at each level. VARAM's schema has:

- 3 domain categories (P, I1, I2): **good**
- 9-33 L1 categories per domain: **borderline to unmanageable**
- Up to 13+ L2 per L1: **too many, especially without descriptions**

The result is predictable: clerks default to a handful of "safe" categories, misclassify documents, or spend excessive time navigating the hierarchy, defeating the purpose of classification.

### 3.2 Structural Assessment Summary

| Metric | Current | Proposed (after normalization) |
|--------|---------|-------------------------------|
| Total entries | 647 | ~120-150 |
| Leaf categories (clerk-facing) | ~496 | ~80-100 |
| I2 entries | 166 | ~10-15 |
| I1 entries | 327 | ~40-50 |
| Max L2 categories per L1 | 33 | ≤10 |
| Project-specific categories | ~200+ | 0 (metadata) |
| Duplicate structural patterns | ~30 identical sub-trees | 0 |

### 3.3 What Should Be Structure vs. What Should Be Metadata

| Currently a Category Level | Should Be | Reason |
|---------------------------|-----------|--------|
| Specific EU project name | **Metadata tag** | Project is an instance, not a function |
| Specific EU programme | **Metadata tag** | Programme is a funding context |
| Audience of correspondence | **Metadata enum** | Audience doesn't change the document type |
| Security classification | **Metadata field** | Orthogonal to document function |
| EU Commission flag on retention | **Metadata boolean** | Compliance attribute, not structure |
| Department assignment | **Metadata reference** | Departments change; functions don't |

---

## 4. Data Quality Issues (Spreadsheet-Level)

### 4.1 Issues Summary

| # | Issue | Severity | Scope |
|---|-------|----------|-------|
| 1 | Mixed code separators (hyphens and dots) | CRITICAL | 221/647 codes (34%) |
| 2 | NBSP (`\xa0`) disguised as empty retention | HIGH | 93 rows |
| 3 | 50+ retention term format variants | HIGH | 496 retention values |
| 4 | Zero descriptions in Description column | HIGH | 100% of rows |
| 5 | Multi-department free-text assignments | MEDIUM | 67 rows |
| 6 | Level data in wrong columns | MEDIUM | 64 rows |
| 7 | Typo: `Il-9-2` instead of `I1-9-2` | LOW | 1 row |
| 8 | Trailing dot in code `I1-13-1.1.` | LOW | 1 row |
| 9 | Typo: "Patstāvīgi" instead of "Pastāvīgi" | LOW | 4 rows |

*(Full technical detail is in the previous assessment; omitted here for brevity.)*

### 4.2 Retention Term Normalization

50+ free-text variants need consolidation to 5 structured types:

| Type | Example Input | Structured Output |
|------|--------------|-------------------|
| Permanent | "Pastāvīgi", "Patstāvīgi" | `` |
| Duration | "5 gadi", "75 gadi" | `` |
| Duration + trigger | "5 gadi pēc projekta noslēguma..." | `` |
| Fixed date | "31.12.2034.", "2031-12-31 00:00:00" | `2034-12-31` |
| EU flagged | "31.12.2032. EK" | `2032-12-31` |

---

## 5. Proposed Architecture

### 5.1 Design Principles

1. **VDVC namespace**: `urn:vdvc:classification:2026`, reusable across government
2. **Function-first classification**: per MK Nr. 282 §33, classify by what the organization does, not by what regulation triggered the document
3. **Metadata-rich, structure-lean**: project, programme, audience as tags, not tree levels
4. **No Excel**: custom web GUI that edits backend XML directly; prevents spreadsheet drift
5. **Git-versioned SSOT**: XML on ProcessGit with full audit trail
6. **MCP-served**: machine-readable API for DVS integration and AI-assisted classification

### 5.2 XSD Schema (VDVC Domain)

```xml
```

**Key design decisions:**

- `legalBasis` is a **metadata field on categories**, not a structural level: it records which normative acts require the category, without creating separate tree branches
- `applicableContexts` with `programmeRef` / `projectRef` replaces the 33 duplicate I2 sub-trees: a single "Korespondence" category can be tagged with all applicable projects
- `status` enables deprecation without deletion (audit trail)
- `retentionType` with `legalReference` links retention to its legal source

### 5.3 Proposed Simplified Classification Tree

```
VARAM Classification Schema (VDVC:2026)

P - Pārvalde (Administration)
├── P-1 Iestādes vadība (Institutional Management)
│   ├── P-1-1 Normatīvie dokumenti (Regulatory documents)
│   ├── P-1-2 Rīkojumi (Orders)
│   ├── P-1-3 Sanāksmes un protokoli (Meetings & protocols)
│   ├── P-1-4 Plānošana un pārskati (Planning & reporting)
│   ├── P-1-5 Sarakste (Correspondence)
│   │       → metadata: audience, securityLevel
│   ├── P-1-6 Pilnvaras un lēmumi (Authorizations & decisions)
│   └── P-1-7 Drošība un trauksme (Security & whistleblowing)
├── P-2 Budžets (Budget Planning)
├── P-3 Personālvadība (HR Management)
│   ├── P-3-1 Darba līgumi (Employment contracts)
│   ├── P-3-2 Personāla lietas (Personnel files)
│   ├── P-3-3 Apmācības (Training)
│   └── P-3-4 Novērtēšana (Performance evaluation)
├── P-4 Saimnieciskie jautājumi (Facilities)
├── P-5 Iepirkumi (Procurement)
├── P-6 Juridiskā funkcija (Legal)
├── P-7 Komunikācija (Communications)
├── P-8 Audits (Audit)
└── P-9 Finanšu vadība (Financial Management)

I1 - Investīciju programmu vadība (Programme Management)
├── I1-1 Programmu plānošana (Programme planning)
├── I1-2 Uzraudzība un kontrole (Monitoring & control)
├── I1-3 Finanšu pārvaldība (Financial management)
├── I1-4 Ziņojumi un pārskati (Reports)
├── I1-5 Maksājumi un pārbaudes (Payments & verification)
└── I1-6 Sarakste un lēmumi (Correspondence & decisions)
    → metadata: programmeRef = [ERAF, ANM, ESF, ...]

I2 - Investīciju projektu ieviešana (Project Implementation)
├── I2-1 Korespondence (Correspondence)
├── I2-2 Līgumi un grozījumi (Contracts & amendments)
├── I2-3 Rīkojumi un protokoli (Orders & protocols)
├── I2-4 Komunikācija (Communications materials)
├── I2-5 Finanšu dokumentācija (Financial documentation)
└── I2-6 Noslēguma dokumenti (Closure documents)
    → metadata: projectRef = [project-001, project-002, ...]
```

**From 647 entries → ~80-100 functional categories** + rich metadata vocabularies.

---

## 6. Custom Web GUI (Not Excel)

### 6.1 Why Not Excel

| Problem with Excel | Impact |
|-------------------|--------|
| People edit the Excel directly, bypassing validation | Reintroduces data quality issues |
| Cannot enforce controlled vocabularies | Free-text retention terms return |
| Cannot represent metadata (project/programme tags) on categories | Structural duplication returns |
| No validation against XSD schema | Invalid data enters the system |
| No version control / audit trail | Changes are invisible |
| Cannot embed business logic (retention calculation, department lookup) | Manual errors |
| Multiple people can have different versions | No SSOT guarantee |

### 6.2 GUI Architecture

```
┌─────────────────────────────────────────────┐
│ VDVC Classification Editor                  │
│  ┌───────────────────────────────────────┐  │
│  │ Tree Navigator (collapsible)          │  │
│  │ ├── P - Pārvalde                      │  │
│  │ │   ├── P-1 Iestādes vadība           │  │
│  │ │   │   ├── P-1-1 Normatīvie dok.     │  │
│  │ │   │   └── P-1-2 Rīkojumi ←[EDIT]    │  │
│  │ └── I1 - Investīciju programmas       │  │
│  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────┐  │
│  │ Category Detail Panel                 │  │
│  │ Code: [P-1-2]  Status: [Active ▼]     │  │
│  │ Name: [Rīkojumi un to pielikumi...]   │  │
│  │ Description: [Ministru rīkojumi...]   │  │
│  │ ─── Retention ───                     │  │
│  │ Type: [Permanent ▼]                   │  │
│  │ Legal ref: [MK Nr. 282 §31]           │  │
│  │ ─── Responsibility ───                │  │
│  │ Departments: [LN ×] [KD ×] [+ Add]    │  │
│  │ ─── Context ───                       │  │
│  │ Programmes: [all]                     │  │
│  │ Media: [Electronic ▼]                 │  │
│  │ System: [DVS Namejs ▼]                │  │
│  │ ─── Legal Basis ───                   │  │
│  │ [+ Add normative reference]           │  │
│  └───────────────────────────────────────┘  │
│ [Save] [Validate] [Preview XML] [History]   │
└─────────────────────────────────────────────┘
```

### 6.3 GUI Features

| Feature | Purpose |
|---------|---------|
| **Tree navigation** | Hierarchical browse/search with drag-drop reordering |
| **Controlled vocabulary dropdowns** | Departments, retention types, media types; no free text |
| **Inline XSD validation** | Real-time validation as users edit; cannot save invalid data |
| **Retention calculator** | Input retention rule → system shows calculated expiry per document date |
| **Department lookup** | Autocomplete from VDVC organization registry (ProcessGit VARAM MCP) |
| **Diff / history view** | Git-backed change tracking with who-changed-what |
| **Bulk import** | One-time import from current Excel, then Excel is retired |
| **Export views** | Generate read-only Excel, HTML, PDF for stakeholders |
| **Legal basis linker** | Reference normative acts by Latvijas Vēstnesis number |
| **Multi-user with roles** | Lietvedis (view), department editor, schema admin |

### 6.4 Technology Stack

```
Frontend:   React + Tailwind (ProcessGit-integrated SPA)
Backend:    ProcessGit API + MCP server
Storage:    Git repository (XML + XSD)
Validation: Client-side XSD validation + server-side on commit
Auth:       ProcessGit OAuth / VARAM SSO
Deploy:     processgit.org/VARAM/Document_classification_schema/
```

---

## 7. ProcessGit Repository Structure

```
VARAM/Document_classification_schema/
├── README.md
├── schema/
│   ├── vdvc-classification-2026.xsd      ← Schema definition
│   └── vdvc-vocabularies.xsd             ← Shared controlled vocabularies
├── data/
│   ├── varam-classification-2026.xml     ← Canonical SSOT
│   ├── vocabularies/
│   │   ├── departments.xml               ← Cross-ref with VARAM org registry
│   │   ├── programmes.xml                ← EU programme registry
│   │   └── projects.xml                  ← Active project registry
│   └── archive/
│       └── original-excel-2026.xlsx      ← Original for audit trail
├── gui/
│   ├── index.html                        ← Classification editor SPA
│   ├── src/                              ← React components
│   └── package.json
├── render/
│   ├── classification.xslt               ← Human-readable transform
│   └── classification.html               ← Auto-generated view
├── mcp/
│   └── server-config.yaml                ← MCP server endpoint
├── tools/
│   ├── import-excel.py                   ← One-time Excel import
│   ├── export-excel.py                   ← Read-only Excel generation
│   ├── validate.py                       ← XSD validation
│   └── retention-calculator.py           ← Retention date computation
└── docs/
    ├── migration-mapping.md              ← Old code → new code mapping
    └── normative-basis.md                ← Legal references
```

---

## 8. MCP Server Integration

Extend the existing ProcessGit MCP pattern (already live for VARAM Organizations Register):

| MCP Tool | Input | Output |
|----------|-------|--------|
| `vdvc:search` | Full-text query in LV/EN | Matching categories with context |
| `vdvc:get_category` | Category code | Full details + metadata |
| `vdvc:list_categories` | Filters: domain, level, dept, programme | Filtered list |
| `vdvc:suggest_category` | Document title + body text | Top 3-5 category suggestions with confidence |
| `vdvc:validate_code` | Category code | Validity check + active status |
| `vdvc:calculate_retention` | Category code + document date | Retention expiry date |
| `vdvc:describe_model` | (none) | Schema structure, vocabularies, stats |

The `suggest_category` tool is the **key efficiency enabler**: instead of a clerk navigating ~100 categories, the AI reads the document and recommends the best matches.

---

## 9. Roadmap

| Phase | Duration | Deliverables |
|-------|----------|-------------|
| **Phase 1**: Assessment approval & schema design | 1 week | Approved XSD, normalization rules, migration mapping |
| **Phase 2**: Data cleaning & functional restructure | 2-3 weeks | Normalized XML with ~100 categories; old→new code mapping |
| **Phase 3**: GUI development | 3-4 weeks | React SPA on ProcessGit; tree editor, validation, export |
| **Phase 4**: ProcessGit deployment & MCP server | 1-2 weeks | Live repo, MCP endpoint, vocabularies |
| **Phase 5**: AI description generation | 1-2 weeks | AI-drafted Latvian descriptions for all categories |
| **Phase 6**: DVS "Namejs" integration | 2-3 weeks | Classification import adapter, clerk-facing AI assist |

**Total: 10-15 weeks**

---

## 10. Risk Assessment

| Risk | Impact | Mitigation |
|------|--------|------------|
| LNA (Latvijas Nacionālais arhīvs) rejects restructured schema | HIGH | Maintain old↔new code mapping; preserve all retention terms with `originalText`; engage LNA early |
| Lietvedis staff resist moving from Excel | MEDIUM | GUI provides Excel-like table view; generate read-only Excel exports on demand |
| Normative acts explicitly reference old codes | MEDIUM | Deprecate rather than delete; old codes resolve to new via alias table |
| Project-as-metadata breaks DVS "Namejs" import format | MEDIUM | Provide flat-file export that expands metadata back to rows for legacy DVS |
| Functional restructure conflicts with department ownership | MEDIUM | Map departments to functions, not to categories; allow multi-department tags |

---

## 11. Conclusion

The current classification schema is complicated not because document management is inherently complex, but because **normative document origins have been used as structural taxonomy levels** instead of metadata. Every new EU project, every new regulatory delegation, creates a new branch in the tree rather than a new tag on an existing functional category.

The proposed approach:

1. **Restructures** the tree from 647 entries to ~100 functional categories
2. **Enriches** each category with metadata (project, programme, legal basis, audience)
3. **Replaces Excel** with a validated, Git-backed web GUI
4. **Serves** the schema via MCP for AI-assisted classification
5. **Complies** with MK Nr. 282 §33 function-based classification requirements

The VDVC namespace ensures this approach can be replicated across government institutions, not just VARAM.
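
As a supplementary illustration of the retention normalization in Section 4.2, the following Python sketch maps the example inputs from that table to structured records. It is a sketch under assumptions, not the contents of `tools/import-excel.py`: the type labels (`permanent`, `duration`, `duration_trigger`, `fixed_date`, `eu_flagged`), the extra `missing` and `unparsed` fallbacks, and the record keys other than `originalText` are hypothetical names chosen for illustration.

```python
import re
from datetime import date

# Both spellings occur in the spreadsheet; the second is the known typo.
PERMANENT_SPELLINGS = {"pastāvīgi", "patstāvīgi"}
DATE_LV = re.compile(r"^(\d{2})\.(\d{2})\.(\d{4})\.?$")
DATE_ISO = re.compile(r"^(\d{4})-(\d{2})-(\d{2})(?:\s+\d{2}:\d{2}:\d{2})?$")
DURATION = re.compile(r"^(\d+)\s+gad\w*\s*(.*)$")


def parse_fixed_date(text):
    """Return an ISO date string for the two date layouts seen in the data."""
    match = DATE_LV.match(text)
    if match:
        day, month, year = (int(part) for part in match.groups())
        return date(year, month, day).isoformat()
    match = DATE_ISO.match(text)
    if match:
        year, month, day = (int(part) for part in match.groups())
        return date(year, month, day).isoformat()
    return None


def normalize_retention(raw):
    """Map one free-text retention cell to a structured record."""
    text = raw.replace("\xa0", " ").strip()  # NBSP-only cells become empty
    record = {"originalText": raw}           # always keep the source text
    if not text:
        return {**record, "type": "missing"}
    if text.lower() in PERMANENT_SPELLINGS:
        return {**record, "type": "permanent"}

    # A trailing "EK" marks the EU Commission flag on a fixed date.
    eu_flagged = text.endswith(" EK")
    date_text = text[: -len(" EK")].strip() if eu_flagged else text
    iso_date = parse_fixed_date(date_text)
    if iso_date:
        kind = "eu_flagged" if eu_flagged else "fixed_date"
        return {**record, "type": kind, "date": iso_date}

    match = DURATION.match(text)
    if match:
        years, trigger = int(match.group(1)), match.group(2).strip()
        if trigger:
            return {**record, "type": "duration_trigger",
                    "years": years, "trigger": trigger}
        return {**record, "type": "duration", "years": years}

    # Anything else is left for manual review rather than guessed.
    return {**record, "type": "unparsed"}


for sample in ["Pastāvīgi", "75 gadi", "31.12.2034.", "31.12.2032. EK", "\xa0"]:
    print(repr(sample), "->", normalize_retention(sample))
```

Keeping `originalText` on every record supports the LNA mitigation in the risk table, and routing unrecognized values to `unparsed` avoids silently inventing a retention term.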