VDVC Document Classification Schema — Assessment & Transformation Proposal

Subject: VARAM DVS "Namejs" Document Classification Schema 2026
VDVC namespace: urn:vdvc:classification:2026
Regulatory basis: MK noteikumi Nr. 282 (07.05.2024) "Dokumentu un arhīvu pārvaldības noteikumi"
Prepared by: Rihards / PwC Latvia (Digitalization, AI & Cybersecurity)
Date: February 2026


1. Executive Summary

VARAM's Document Management System (DVS "Namejs") relies on a classification schema ("klasifikācijas shēma", formerly "lietu nomenklatūra") maintained as a human-edited Excel spreadsheet with 647 coded entries across 3 domains and up to 5 hierarchy levels.

This assessment identifies three layers of problems: data quality issues in the spreadsheet itself (fixable mechanically), structural design issues in the schema (fixable with refactoring), and a fundamental architectural problem — the classification philosophy conflates normative document origins with functional classification, producing an unmanageably large, duplicate-heavy taxonomy that is hostile to both human clerks and DVS systems.

The proposed solution is a VDVC-namespaced, Git-versioned XML repository on ProcessGit with a custom web-based management GUI (not Excel) backed by XSD-validated XML, served via MCP endpoint for AI-assisted document classification.


2. Regulatory Framework

2.1 MK noteikumi Nr. 282 (07.05.2024)

The governing regulation prescribes a function-based hierarchical classification (§33):

Level | MK Nr. 282 Definition | What It Should Contain
L1 | Institūcijas funkcija vai augstākā struktūrvienība | Broad organizational function (e.g., "Management", "HR", "Procurement")
L2 | Funkcijas izpildes nepieciešamie uzdevumi (procesi) | Processes within the function (e.g., "Recruitment", "Payroll")
L3 | Uzdevumu veikšanai nepieciešamās darbības | Specific activities/document types (e.g., "Employment contracts", "Timesheets")

Key regulatory requirements:

  • Schema must be synchronized with Latvijas Nacionālais arhīvs (LNA) every 5 years (§42)
  • Sector-level schemas ("nozares klasifikācijas shēma") every 8 years (§42)
  • Must specify: index, name, retention term, responsible unit, media type, IS location (§31)
  • Classification basis: functions, structural units, document types, or mixed (§33)

2.2 VDVC Context

The schema should use the VDVC (Valsts Dokumentu Vadības Centrs) namespace since VDVC is the country-wide document management authority under VARAM management, and this classification approach could be standardized across government institutions — not just VARAM internally.


3. Critical Assessment: Why Is the Classification So Complicated?

3.1 The Core Problem — Normative Document Proliferation

VARAM's explanation is that every new document category originates from a normative act (law, MK regulation, EU directive) that delegates a process to the organization. When a new regulation is adopted, a new classification entry is created. This approach is fundamentally flawed for the following reasons:

Problem A: Confusing "What Triggered the Document" with "What Kind of Document It Is"

MK Nr. 282 §33 defines classification by function and process, not by legal basis. The legal basis for a document is metadata (a property of the document), not a structural category. When VARAM creates a separate category for "Sarakste ar valsts pārvaldes iestādēm DIENESTA VAJADZĪBĀM" (P-1-13-5) vs "Sarakste ar valsts pārvaldes iestādēm, juridiskām un fiziskām personām" (P-1-13-2) vs "Sarakste ar valsts pārvaldes iestādēm jautājumiem, kas saistīti ar valsts noslēpumu" (P-1-13-9), these are all the same function (correspondence) with different metadata attributes (audience, classification level).

A proper design would have:

P-1-13  Sarakste (Correspondence)
  → metadata: audience = [government | private | foreign | internal | classified]
  → metadata: securityLevel = [public | restricted | secret]

A single category with metadata replaces the 9 correspondence sub-categories, which all contain identical document types.
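The idea can be sketched in a few lines of Python. This is a minimal illustration, not the DVS data model: the class names, the `LEGACY_TO_TAG` table, and the audience and security values assigned to each legacy code are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    GOVERNMENT = "government"
    PRIVATE = "private"
    FOREIGN = "foreign"
    INTERNAL = "internal"
    CLASSIFIED = "classified"


class SecurityLevel(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"
    SECRET = "secret"


@dataclass(frozen=True)
class DocumentTag:
    """One functional category code plus the metadata that used to be tree levels."""
    category_code: str
    audience: Audience
    security_level: SecurityLevel = SecurityLevel.PUBLIC


# Illustrative mapping only: the metadata values per legacy code are assumed.
LEGACY_TO_TAG = {
    "P-1-13-2": DocumentTag("P-1-13", Audience.GOVERNMENT),
    "P-1-13-5": DocumentTag("P-1-13", Audience.GOVERNMENT, SecurityLevel.RESTRICTED),
    "P-1-13-9": DocumentTag("P-1-13", Audience.GOVERNMENT, SecurityLevel.SECRET),
}

# All three legacy branches land on the same functional category.
distinct_codes = {tag.category_code for tag in LEGACY_TO_TAG.values()}
print(distinct_codes)
```

The three legacy codes differ only in their metadata, which is the point of Problem A.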

Problem B: EU Investment Project Explosion

The most egregious example is I2 (Investīciju projektu ieviešana) with 33 top-level entries — each representing a specific EU-funded project:

I2-1   Projekta "Informācijas sistēmu ... Nr. 2.2.1.1/17/I/012" dokumenti
I2-2   Projekta "Atvērto datu ... Nr. 2.2.1.1/19/I/004" dokumenti
...
I2-33  Projekta "Valsts pārvaldes vienota valsts finanšu..." dokumenti

Each of these 33 projects then has identical sub-structure: correspondence, contracts, orders, communications materials. This is a textbook example of data masquerading as structure. The project identity is a data attribute, not a classification level.

A proper design:

I2-1   Investīciju projektu ieviešana (Project Implementation)
  I2-1-1  Korespondence (Correspondence)
  I2-1-2  Līgumi (Contracts)
  I2-1-3  Rīkojumi, protokoli (Orders, protocols)
  I2-1-4  Komunikācijas materiāli (Communications)
  → metadata: projectId = "2.2.1.1/17/I/012"
  → metadata: projectName = "Informācijas sistēmu..."
  → metadata: fundingSource = "ERAF" | "ANM" | ...

This would reduce I2 from ~166 entries to ~10, while preserving all information through metadata.
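A rough sketch of that collapse, assuming legacy codes follow an `I2-<project>-<function>` pattern and that identical names mean identical functions. The sample rows are invented for illustration and are not taken from the spreadsheet.

```python
import re
from collections import defaultdict

# Hypothetical legacy rows (code, name); the real schema has 33 project branches.
legacy_rows = [
    ("I2-1-1", "Korespondence"),
    ("I2-1-2", "Līgumi"),
    ("I2-2-1", "Korespondence"),
    ("I2-2-2", "Līgumi"),
    ("I2-33-1", "Korespondence"),
]


def collapse_by_function(rows):
    """Group rows sharing a name; return {name: legacy project branches}."""
    grouped = defaultdict(set)
    for code, name in rows:
        match = re.fullmatch(r"(I2-\d+)-\d+", code)
        if match is None:
            continue  # not a project-level sub-entry
        grouped[name].add(match.group(1))
    return {
        name: sorted(branches, key=lambda b: int(b.split("-")[1]))
        for name, branches in grouped.items()
    }


collapsed = collapse_by_function(legacy_rows)
print(collapsed)
```

Each key becomes one functional category, and the list of branches becomes the set of project references attached to it.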

Problem C: I1 (Investīciju programmu vadība) Duplicates

Similarly, I1 has 16+ programme-level groups (I1-1 through I1-16), each with largely identical sub-structures for different EU operational programmes. The programmes differ in retention dates and responsible departments, but these are metadata, not structure.

Current: 327 entries in I1
Proposed: ~40-50 entries (function-based) + programme as metadata

Problem D: Category Count vs. Clerk Cognitive Capacity

With ~400 leaf categories (plus ~250 structural grouping rows), a clerk creating a new document faces an unmanageable choice. Work on categorization and short-term memory (Rosch, 1978; Miller, 1956) suggests that people reliably handle only about 7±2 alternatives at a time. VARAM's schema has:

  • 3 domain categories (P, I1, I2): good
  • 9-33 L1 categories per domain: borderline to unmanageable
  • Up to 13+ L2 per L1: too many, especially without descriptions

The result is predictable: clerks default to a handful of "safe" categories, misclassify documents, or spend excessive time navigating the hierarchy — defeating the purpose of classification.
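A fan-out check of this kind is easy to automate. The sketch below assumes codes are already normalized to hyphen separators and treats everything before the last hyphen as the parent. The limit of 9 is the upper end of the 7±2 range, and the sample codes are synthetic.

```python
from collections import Counter


def fan_out(codes):
    """Count direct children per parent, e.g. the parent of 'P-1-13' is 'P-1'."""
    counts = Counter()
    for code in codes:
        parent, sep, _ = code.rpartition("-")
        if sep:  # codes without a hyphen are roots and have no parent
            counts[parent] += 1
    return counts


def over_limit(codes, limit=9):
    """Return parents whose number of direct children exceeds the limit."""
    return {parent: n for parent, n in fan_out(codes).items() if n > limit}


# Synthetic example: 33 project branches under I2, two branches under P.
sample_codes = [f"I2-{i}" for i in range(1, 34)] + ["P-1", "P-2"]
print(over_limit(sample_codes))
```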

3.2 Structural Assessment Summary

Metric | Current | Proposed (after normalization)
Total entries | 647 | ~120-150
Leaf categories (clerk-facing) | ~496 | ~80-100
I2 entries | 166 | ~10-15
I1 entries | 327 | ~40-50
Max L1 categories per domain | 33 | ≤10
Project-specific categories | ~200+ | 0 (metadata)
Duplicate structural patterns | ~30 identical sub-trees | 0

3.3 What Should Be Structure vs. What Should Be Metadata

Currently a Category Level | Should Be | Reason
Specific EU project name | Metadata tag | Project is an instance, not a function
Specific EU programme | Metadata tag | Programme is a funding context
Audience of correspondence | Metadata enum | Audience doesn't change the document type
Security classification | Metadata field | Orthogonal to document function
EU Commission flag on retention | Metadata boolean | Compliance attribute, not structure
Department assignment | Metadata reference | Departments change; functions don't

4. Data Quality Issues (Spreadsheet-Level)

4.1 Issues Summary

# | Issue | Severity | Scope
1 | Mixed code separators (hyphens and dots) | CRITICAL | 221/647 codes (34%)
2 | NBSP (\xa0) disguised as empty retention | HIGH | 93 rows
3 | 50+ retention term format variants | HIGH | 496 retention values
4 | Zero descriptions in Description column | HIGH | 100% of rows
5 | Multi-department free-text assignments | MEDIUM | 67 rows
6 | Level data in wrong columns | MEDIUM | 64 rows
7 | Typo: Il-9-2 instead of I1-9-2 | LOW | 1 row
8 | Trailing dot in code I1-13-1.1. | LOW | 1 row
9 | Typo: "Patstāvīgi" instead of "Pastāvīgi" | LOW | 4 rows

(Full technical detail in previous assessment — omitted for brevity)
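Several of these issues (1, 2, 7, 8, 9) can be repaired mechanically. The following is a minimal sketch, assuming the hyphen is chosen as the canonical separator (the assessment does not state which separator wins) and that the helper names are free to pick. It is not the actual import script.

```python
import re

NBSP = "\xa0"


def normalize_code(raw: str) -> str:
    """Clean one classification code from the spreadsheet."""
    code = raw.replace(NBSP, " ").strip()
    code = code.rstrip(".")                   # issue 8: "I1-13-1.1." -> "I1-13-1.1"
    code = code.replace(".", "-")             # issue 1: assumed canonical separator
    code = re.sub(r"^Il(?=-|$)", "I1", code)  # issue 7: lowercase L typed for digit 1
    return code


def normalize_retention(raw):
    """Return cleaned retention text, or None when the cell is effectively empty."""
    if raw is None:
        return None
    text = raw.replace(NBSP, " ").strip()     # issue 2: NBSP-only cells
    if not text:
        return None
    return text.replace("Patstāvīgi", "Pastāvīgi")  # issue 9


print(normalize_code("Il-9-2"), normalize_code("I1-13-1.1."))
```

Issues 4 to 6 need human or AI-assisted judgment and are deliberately left out of this sketch.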

4.2 Retention Term Normalization

50+ free-text variants need consolidation to 5 structured types:

Type | Example Input | Structured Output
Permanent | "Pastāvīgi", "Patstāvīgi" | <permanent>true</permanent>
Duration | "5 gadi", "75 gadi" | <duration years="5"/>
Duration + trigger | "5 gadi pēc projekta noslēguma..." | <duration years="5" triggerEvent="project_closure"/>
Fixed date | "31.12.2034.", "2031-12-31 00:00:00" | <fixedDate>2034-12-31</fixedDate>
EU flagged | "31.12.2032. EK" | <retention euCommission="true"><fixedDate>2032-12-31</fixedDate></retention>
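A parser for the example inputs above might look like the sketch below. It only covers the variants listed in the table; the remaining 50+ spreadsheet variants would need further patterns. The `Retention` dataclass and the `project_closure` trigger name are illustrative assumptions. Unrecognized text raises an error rather than being guessed at.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Retention:
    kind: str                            # "permanent" | "duration" | "fixedDate"
    years: Optional[int] = None
    trigger_event: Optional[str] = None
    fixed_date: Optional[date] = None
    eu_commission: bool = False
    original_text: str = ""              # kept for audit, as in the proposed XSD


def parse_retention(raw: str) -> Optional[Retention]:
    text = raw.replace("\xa0", " ").strip()
    if not text:
        return None                      # NBSP-only cell: no retention given
    lowered = text.lower()
    if lowered in ("pastāvīgi", "patstāvīgi"):
        return Retention("permanent", original_text=raw)
    m = re.match(r"(\d+)\s*gad\w*(.*)", lowered)
    if m:
        trigger = "project_closure" if "projekta noslēguma" in m.group(2) else None
        return Retention("duration", years=int(m.group(1)),
                         trigger_event=trigger, original_text=raw)
    m = re.match(r"(\d{2})\.(\d{2})\.(\d{4})\.?\s*(ek)?$", lowered)
    if m:
        fixed = date(int(m.group(3)), int(m.group(2)), int(m.group(1)))
        return Retention("fixedDate", fixed_date=fixed,
                         eu_commission=m.group(4) is not None, original_text=raw)
    m = re.match(r"(\d{4})-(\d{2})-(\d{2})", text)
    if m:
        fixed = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        return Retention("fixedDate", fixed_date=fixed, original_text=raw)
    raise ValueError(f"unrecognized retention term: {raw!r}")


print(parse_retention("31.12.2032. EK"))
```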

5. Proposed Architecture

5.1 Design Principles

  1. VDVC namespace (urn:vdvc:classification:2026): reusable across government
  2. Function-first classification — per MK Nr. 282 §33, classify by what the organization does, not by what regulation triggered the document
  3. Metadata-rich, structure-lean — project, programme, audience as tags, not tree levels
  4. No Excel — custom web GUI that edits backend XML directly; prevents spreadsheet drift
  5. Git-versioned SSOT — XML on ProcessGit with full audit trail
  6. MCP-served — machine-readable API for DVS integration and AI-assisted classification

5.2 XSD Schema (VDVC Domain)

<?xml version="1.0" encoding="UTF-8"?>
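<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:vdvc:classification:2026"
           xmlns:vdvc="urn:vdvc:classification:2026"
           elementFormDefault="qualified">

  <!-- Excerpt: MetadataType, DomainListType, DeptListType, ProgrammeListType,
       ProjectListType and RetTermListType are referenced below but not
       defined in this listing. -->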

  <xs:element name="classificationSchema" type="vdvc:SchemaType"/>

  <xs:complexType name="SchemaType">
    <xs:sequence>
      <xs:element name="metadata" type="vdvc:MetadataType"/>
      <xs:element name="vocabularies" type="vdvc:VocabulariesType"/>
      <xs:element name="domains" type="vdvc:DomainListType"/>
    </xs:sequence>
    <xs:attribute name="version" type="xs:string" use="required"/>
    <xs:attribute name="effectiveDate" type="xs:date" use="required"/>
    <xs:attribute name="institution" type="xs:string" use="required"/>
  </xs:complexType>

  <!-- Controlled vocabularies (departments, programmes, projects) -->
  <xs:complexType name="VocabulariesType">
    <xs:sequence>
      <xs:element name="departments" type="vdvc:DeptListType"/>
      <xs:element name="programmes" type="vdvc:ProgrammeListType" minOccurs="0"/>
      <xs:element name="projects" type="vdvc:ProjectListType" minOccurs="0"/>
      <xs:element name="retentionTerms" type="vdvc:RetTermListType"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Category node (recursive, function-based) -->
  <xs:complexType name="CategoryType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="description" type="xs:string" minOccurs="0"/>
      <xs:element name="legalBasis" type="xs:string" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element name="retention" type="vdvc:RetentionType" minOccurs="0"/>
      <xs:element name="responsibleUnits" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="unitRef" type="xs:string" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="mediaType" type="vdvc:MediaTypeEnum" minOccurs="0"/>
      <xs:element name="system" type="xs:string" minOccurs="0"/>
      <xs:element name="applicableContexts" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="programmeRef" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="projectRef" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="subcategory" type="vdvc:CategoryType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string" use="required"/>
    <xs:attribute name="level" type="xs:positiveInteger" use="required"/>
    <xs:attribute name="status" type="vdvc:StatusEnum" default="active"/>
  </xs:complexType>

  <!-- Retention: structured, not free-text -->
  <xs:complexType name="RetentionType">
    <xs:choice>
      <xs:element name="permanent" type="xs:boolean"/>
      <xs:element name="duration">
        <xs:complexType>
          <xs:attribute name="years" type="xs:positiveInteger" use="required"/>
          <xs:attribute name="triggerEvent" type="xs:string"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="fixedDate" type="xs:date"/>
    </xs:choice>
    <xs:attribute name="euCommission" type="xs:boolean" default="false"/>
    <xs:attribute name="legalReference" type="xs:string"/>
    <xs:attribute name="originalText" type="xs:string"/>
  </xs:complexType>

  <!-- Enumerations -->
  <xs:simpleType name="MediaTypeEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="electronic"/>
      <xs:enumeration value="paper"/>
      <xs:enumeration value="hybrid"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="StatusEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="deprecated"/>
      <xs:enumeration value="draft"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Key design decisions:

  • legalBasis is a metadata field on categories, not a structural level: it records which normative acts require the category, but those acts do not create separate tree branches
  • applicableContexts with programmeRef / projectRef replaces the 33 duplicate I2 sub-trees — a single "Korespondence" category can be tagged with all applicable projects
  • status enables deprecation without deletion (audit trail)
  • RetentionType carries a legalReference attribute that links each retention term to its legal source
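To make these decisions concrete, the sketch below builds one category fragment in the child order that CategoryType prescribes. The element name `category` is an assumption, since the listing does not show how DomainListType names its children. The 5-year retention is a placeholder value, and namespace handling is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_category(code, level, name, legal_basis=(), project_refs=(),
                   retention_years=None, legal_reference=None):
    """Build a CategoryType-shaped element (child order follows the xs:sequence)."""
    cat = ET.Element("category",
                     {"code": code, "level": str(level), "status": "active"})
    ET.SubElement(cat, "name").text = name
    for basis in legal_basis:
        ET.SubElement(cat, "legalBasis").text = basis
    if retention_years is not None:
        retention = ET.SubElement(cat, "retention")
        if legal_reference:
            retention.set("legalReference", legal_reference)
        ET.SubElement(retention, "duration", {"years": str(retention_years)})
    if project_refs:
        contexts = ET.SubElement(cat, "applicableContexts")
        for ref in project_refs:
            ET.SubElement(contexts, "projectRef").text = ref
    return cat


# One functional category tagged with two of the project numbers cited earlier.
cat = build_category(
    "I2-1", 1, "Korespondence",
    legal_basis=["MK noteikumi Nr. 282"],
    project_refs=["2.2.1.1/17/I/012", "2.2.1.1/19/I/004"],
    retention_years=5,  # placeholder, not a value from the spreadsheet
)
print(ET.tostring(cat, encoding="unicode"))
```

A single element now carries what previously took one sub-tree per project.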

5.3 Proposed Simplified Classification Tree

VARAM Classification Schema (VDVC:2026)

P — Pārvalde (Administration)
├── P-1  Iestādes vadība (Institutional Management)
│   ├── P-1-1  Normatīvie dokumenti (Regulatory documents)
│   ├── P-1-2  Rīkojumi (Orders)
│   ├── P-1-3  Sanāksmes un protokoli (Meetings & protocols)
│   ├── P-1-4  Plānošana un pārskati (Planning & reporting)
│   ├── P-1-5  Sarakste (Correspondence)
│   │         → metadata: audience, securityLevel
│   ├── P-1-6  Pilnvaras un lēmumi (Authorizations & decisions)
│   └── P-1-7  Drošība un trauksme (Security & whistleblowing)
├── P-2  Budžets (Budget Planning)
├── P-3  Personālvadība (HR Management)
│   ├── P-3-1  Darba līgumi (Employment contracts)
│   ├── P-3-2  Personāla lietas (Personnel files)
│   ├── P-3-3  Apmācības (Training)
│   └── P-3-4  Novērtēšana (Performance evaluation)
├── P-4  Saimnieciskie jautājumi (Facilities)
├── P-5  Iepirkumi (Procurement)
├── P-6  Juridiskā funkcija (Legal)
├── P-7  Komunikācija (Communications)
├── P-8  Audits (Audit)
└── P-9  Finanšu vadība (Financial Management)

I1 — Investīciju programmu vadība (Programme Management)
├── I1-1  Programmu plānošana (Programme planning)
├── I1-2  Uzraudzība un kontrole (Monitoring & control)
├── I1-3  Finanšu pārvaldība (Financial management)
├── I1-4  Ziņojumi un pārskati (Reports)
├── I1-5  Maksājumi un pārbaudes (Payments & verification)
└── I1-6  Sarakste un lēmumi (Correspondence & decisions)
          → metadata: programmeRef = [ERAF, ANM, ESF, ...]

I2 — Investīciju projektu ieviešana (Project Implementation)
├── I2-1  Korespondence (Correspondence)
├── I2-2  Līgumi un grozījumi (Contracts & amendments)
├── I2-3  Rīkojumi un protokoli (Orders & protocols)
├── I2-4  Komunikācija (Communications materials)
├── I2-5  Finanšu dokumentācija (Financial documentation)
└── I2-6  Noslēguma dokumenti (Closure documents)
          → metadata: projectRef = [project-001, project-002, ...]

From 647 entries → ~80-100 functional categories + rich metadata vocabularies.


6. Custom Web GUI (Not Excel)

6.1 Why Not Excel

Problem with Excel | Impact
People edit the Excel directly, bypassing validation | Reintroduces data quality issues
Cannot enforce controlled vocabularies | Free-text retention terms return
Cannot represent metadata (project/programme tags) on categories | Structural duplication returns
No validation against XSD schema | Invalid data enters the system
No version control / audit trail | Changes are invisible
Cannot embed business logic (retention calculation, department lookup) | Manual errors
Multiple people can have different versions | No SSOT guarantee

6.2 GUI Architecture

┌─────────────────────────────────────────────┐
│           VDVC Classification Editor         │
│  ┌───────────────────────────────────────┐  │
│  │  Tree Navigator (collapsible)          │  │
│  │  ├── P — Pārvalde                      │  │
│  │  │   ├── P-1 Iestādes vadība          │  │
│  │  │   │   ├── P-1-1 Normatīvie dok.    │  │
│  │  │   │   └── P-1-2 Rīkojumi ←[EDIT]  │  │
│  │  └── I1 — Investīciju programmas       │  │
│  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────┐  │
│  │  Category Detail Panel                 │  │
│  │  Code: [P-1-2]     Status: [Active ▼] │  │
│  │  Name: [Rīkojumi un to pielikumi...]   │  │
│  │  Description: [Ministru rīkojumi...]   │  │
│  │  ─── Retention ───                     │  │
│  │  Type: [Permanent ▼]                   │  │
│  │  Legal ref: [MK Nr. 282 §31]          │  │
│  │  ─── Responsibility ───                │  │
│  │  Departments: [LN ×] [KD ×] [+ Add]   │  │
│  │  ─── Context ───                       │  │
│  │  Programmes: [all]                     │  │
│  │  Media: [Electronic ▼]                 │  │
│  │  System: [DVS Namejs ▼]               │  │
│  │  ─── Legal Basis ───                   │  │
│  │  [+ Add normative reference]           │  │
│  └───────────────────────────────────────┘  │
│  [Save] [Validate] [Preview XML] [History]  │
└─────────────────────────────────────────────┘

6.3 GUI Features

Feature | Purpose
Tree navigation | Hierarchical browse/search with drag-drop reordering
Controlled vocabulary dropdowns | Departments, retention types, media types; no free-text
Inline XSD validation | Real-time validation as users edit; cannot save invalid data
Retention calculator | Input retention rule → system shows calculated expiry per document date
Department lookup | Autocomplete from VDVC organization registry (ProcessGit VARAM MCP)
Diff / history view | Git-backed change tracking with who-changed-what
Bulk import | One-time import from current Excel, then Excel is retired
Export views | Generate read-only Excel, HTML, PDF for stakeholders
Legal basis linker | Reference normative acts by Latvijas Vēstnesis number
Multi-user with roles | Lietvedis (view), department editor, schema admin
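The retention calculator is the one feature here with real date arithmetic. A minimal sketch follows, assuming a duration is counted from the document date itself; archival practice may instead count from a trigger event or from the start of the following year, which this example does not model. The function names are illustrative.

```python
from datetime import date
from typing import Optional


def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def retention_expiry(document_date: date, *, permanent: bool = False,
                     years: Optional[int] = None,
                     fixed_date: Optional[date] = None) -> Optional[date]:
    """Return the expiry date, or None for permanently retained documents."""
    if permanent:
        return None
    if fixed_date is not None:
        return fixed_date
    if years is None:
        raise ValueError("rule needs permanent, years, or fixed_date")
    return add_years(document_date, years)


print(retention_expiry(date(2026, 2, 8), years=5))
```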

6.4 Technology Stack

Frontend:  React + Tailwind (ProcessGit-integrated SPA)
Backend:   ProcessGit API + MCP server
Storage:   Git repository (XML + XSD)
Validation: Client-side XSD validation + server-side on commit
Auth:      ProcessGit OAuth / VARAM SSO
Deploy:    processgit.org/VARAM/Document_classification_schema/

7. ProcessGit Repository Structure

VARAM/Document_classification_schema/
├── README.md
├── schema/
│   ├── vdvc-classification-2026.xsd       ← Schema definition
│   └── vdvc-vocabularies.xsd              ← Shared controlled vocabularies
├── data/
│   ├── varam-classification-2026.xml      ← Canonical SSOT
│   ├── vocabularies/
│   │   ├── departments.xml                ← Cross-ref with VARAM org registry
│   │   ├── programmes.xml                 ← EU programme registry
│   │   └── projects.xml                   ← Active project registry
│   └── archive/
│       └── original-excel-2026.xlsx       ← Original for audit trail
├── gui/
│   ├── index.html                         ← Classification editor SPA
│   ├── src/                               ← React components
│   └── package.json
├── render/
│   ├── classification.xslt               ← Human-readable transform
│   └── classification.html               ← Auto-generated view
├── mcp/
│   └── server-config.yaml                ← MCP server endpoint
├── tools/
│   ├── import-excel.py                   ← One-time Excel import
│   ├── export-excel.py                   ← Read-only Excel generation
│   ├── validate.py                       ← XSD validation
│   └── retention-calculator.py           ← Retention date computation
└── docs/
    ├── migration-mapping.md              ← Old code → new code mapping
    └── normative-basis.md                ← Legal references

8. MCP Server Integration

Extend the existing ProcessGit MCP pattern (already live for VARAM Organizations Register):

MCP Tool | Input | Output
vdvc:search | Full-text query in LV/EN | Matching categories with context
vdvc:get_category | Category code | Full details + metadata
vdvc:list_categories | Filters: domain, level, dept, programme | Filtered list
vdvc:suggest_category | Document title + body text | Top 3-5 category suggestions with confidence
vdvc:validate_code | Category code | Validity check + active status
vdvc:calculate_retention | Category code + document date | Retention expiry date
vdvc:describe_model | (none) | Schema structure, vocabularies, stats

The suggest_category tool is the key efficiency enabler: instead of a clerk navigating ~100 categories, the AI reads the document and recommends the best matches.
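The shape of such a tool can be illustrated with a deliberately crude stand-in. The sketch below ranks categories by word overlap between the document text and the category name. It is a baseline for showing the input and output contract, not the AI model the proposal envisages, and the score is not a calibrated confidence. The three categories are taken from the simplified tree; everything else is assumed.

```python
import re

# Small excerpt of the proposed tree, used as an in-memory index.
CATEGORIES = {
    "P-1-2": "Rīkojumi (Orders)",
    "P-3-1": "Darba līgumi (Employment contracts)",
    "I2-2": "Līgumi un grozījumi (Contracts & amendments)",
}


def tokens(text):
    """Lower-cased word set; \\w is Unicode-aware, so Latvian letters are kept."""
    return set(re.findall(r"\w+", text.lower()))


def suggest_category(title, body="", top_n=3):
    """Rank categories by the share of their name words found in the document."""
    doc_tokens = tokens(title + " " + body)
    scored = []
    for code, name in CATEGORIES.items():
        name_tokens = tokens(name)
        overlap = len(doc_tokens & name_tokens)
        if overlap:
            scored.append({"code": code, "score": overlap / len(name_tokens)})
    scored.sort(key=lambda s: (-s["score"], s["code"]))
    return scored[:top_n]


suggestions = suggest_category("Darba līgumi ar jaunu darbinieku")
print(suggestions)
```

A production version would swap the scoring function for a language model while keeping the same ranked-list output.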


9. Roadmap

Phase | Duration | Deliverables
Phase 1: Assessment approval & schema design | 1 week | Approved XSD, normalization rules, migration mapping
Phase 2: Data cleaning & functional restructure | 2-3 weeks | Normalized XML with ~100 categories; old→new code mapping
Phase 3: GUI development | 3-4 weeks | React SPA on ProcessGit; tree editor, validation, export
Phase 4: ProcessGit deployment & MCP server | 1-2 weeks | Live repo, MCP endpoint, vocabularies
Phase 5: AI description generation | 1-2 weeks | AI-drafted Latvian descriptions for all categories
Phase 6: DVS "Namejs" integration | 2-3 weeks | Classification import adapter, clerk-facing AI assist

Total: 10-15 weeks


10. Risk Assessment

Risk | Impact | Mitigation
LNA (Latvijas Nacionālais arhīvs) rejects restructured schema | HIGH | Maintain old↔new code mapping; preserve all retention terms with originalText; engage LNA early
Lietvedis staff resist moving from Excel | MEDIUM | GUI provides Excel-like table view; generate read-only Excel exports on demand
Normative acts explicitly reference old codes | MEDIUM | Deprecate rather than delete; old codes resolve to new via alias table
Project-as-metadata breaks DVS "Namejs" import format | MEDIUM | Provide flat-file export that expands metadata back to rows for legacy DVS
Functional restructure conflicts with department ownership | MEDIUM | Map departments to functions, not to categories; allow multi-department tags
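Two of these mitigations, the alias table and the flat-file expansion, are simple lookups. The sketch below assumes hypothetical legacy sub-codes (`I2-1-1`, `I2-2-1`) for the correspondence entries of the first two projects and maps them to the proposed `I2-1` category. The table layout and function names are illustrative only.

```python
# Hypothetical alias table: legacy code -> (new code, metadata recovered from the old branch).
ALIASES = {
    "I2-1-1": ("I2-1", {"projectRef": "2.2.1.1/17/I/012"}),
    "I2-2-1": ("I2-1", {"projectRef": "2.2.1.1/19/I/004"}),
}


def resolve(code):
    """Resolve a possibly deprecated code to its current category plus metadata."""
    new_code, metadata = ALIASES.get(code, (code, {}))
    alias = code if code in ALIASES else None
    return {"code": new_code, "deprecatedAlias": alias, **metadata}


def expand_for_legacy(new_code):
    """List the legacy codes a flat-file export must emit for one new category."""
    return sorted(old for old, (new, _) in ALIASES.items() if new == new_code)


print(resolve("I2-2-1"))
print(expand_for_legacy("I2-1"))
```

Because the mapping is kept in both directions, the old↔new table requested for LNA review can be generated from the same data.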

11. Conclusion

The current classification schema is complicated not because document management is inherently complex, but because normative document origins have been used as structural taxonomy levels instead of metadata. Every new EU project, every new regulatory delegation, creates a new branch in the tree rather than a new tag on an existing functional category.

The proposed approach:

  1. Restructures the tree from 647 entries to ~100 functional categories
  2. Enriches each category with metadata (project, programme, legal basis, audience)
  3. Replaces Excel with a validated, Git-backed web GUI
  4. Serves the schema via MCP for AI-assisted classification
  5. Complies with MK Nr. 282 §33 function-based classification requirements

The VDVC namespace ensures this approach can be replicated across government institutions, not just VARAM.