VDVC Document Classification Schema — Assessment & Transformation Proposal

Subject: VARAM DVS "Namejs" Document Classification Schema 2026
VDVC namespace: urn:vdvc:classification:2026
Regulatory basis: MK noteikumi Nr. 282 (07.05.2024) "Dokumentu un arhīvu pārvaldības noteikumi" (Document and Archive Management Regulations)
Prepared by: Rihards / PwC Latvia, Digitalization, AI & Cybersecurity
Date: February 2026

1. Executive Summary

VARAM's Document Management System (DVS "Namejs") relies on a classification schema ("klasifikācijas shēma", formerly "lietu nomenklatūra", i.e. file nomenclature) maintained as a human-edited Excel spreadsheet with 647 coded entries across 3 domains and up to 5 hierarchy levels.

This assessment identifies three layers of problems: data quality issues in the spreadsheet itself (fixable mechanically), structural design issues in the schema (fixable with refactoring), and a fundamental architectural problem: the classification philosophy conflates normative document origins with functional classification, producing an unmanageably large, duplicate-heavy taxonomy that is hostile to both human clerks and the DVS itself.

The proposed solution is a VDVC-namespaced, Git-versioned XML repository on ProcessGit with a custom web-based management GUI (not Excel) backed by XSD-validated XML and served via an MCP endpoint for AI-assisted document classification.

2. Regulatory Framework

2.1 MK noteikumi Nr. 282 (07.05.2024)

The governing regulation prescribes a function-based hierarchical classification (§33):

Level	MK Nr. 282 Definition	What It Should Contain
L1	Institūcijas funkcija vai augstākā struktūrvienība (function of the institution or top-level structural unit)	Broad organizational function (e.g., "Management", "HR", "Procurement")
L2	Funkcijas izpildes nepieciešamie uzdevumi (procesi) (tasks, i.e. processes, needed to perform the function)	Processes within the function (e.g., "Recruitment", "Payroll")
L3	Uzdevumu veikšanai nepieciešamās darbības (activities needed to carry out the tasks)	Specific activities/document types (e.g., "Employment contracts", "Timesheets")

Key regulatory requirements:

Schema must be synchronized with Latvijas Nacionālais arhīvs (LNA) every 5 years (§42)
Sector-level schemas ("nozares klasifikācijas shēma") every 8 years (§42)
Must specify: index, name, retention term, responsible unit, media type, IS location (§31)
Classification basis: functions, structural units, document types, or mixed (§33)

2.2 VDVC Context

The schema should use the VDVC (Valsts Dokumentu Vadības Centrs, State Document Management Centre) namespace because VDVC is the country-wide document management authority operating under VARAM, and this classification approach could be standardized across government institutions rather than used only inside VARAM.

3. Critical Assessment: Why Is the Classification So Complicated?

3.1 The Core Problem — Normative Document Proliferation

VARAM's explanation is that every new document category originates from a normative act (law, MK regulation, EU directive) that delegates a process to the organization. When a new regulation is adopted, a new classification entry is created. This approach is fundamentally flawed for the following reasons:

Problem A: Confusing "What Triggered the Document" with "What Kind of Document It Is"

MK Nr. 282 §33 defines classification by function and process, not by legal basis. The legal basis for a document is metadata (a property of the document), not a structural category. VARAM nevertheless keeps separate categories for "Sarakste ar valsts pārvaldes iestādēm DIENESTA VAJADZĪBĀM" (correspondence with public administration bodies, for official use only; P-1-13-5), "Sarakste ar valsts pārvaldes iestādēm, juridiskām un fiziskām personām" (correspondence with public administration bodies, legal and natural persons; P-1-13-2) and "Sarakste ar valsts pārvaldes iestādēm jautājumiem, kas saistīti ar valsts noslēpumu" (correspondence with public administration bodies on matters involving state secrets; P-1-13-9). All three are the same function (correspondence), distinguished only by metadata attributes (audience, security classification level).

A proper design would have:

P-1-13  Sarakste (Correspondence)
  → metadata: audience = [government | private | foreign | internal | classified]
  → metadata: securityLevel = [public | restricted | secret]

Instead of 9 sub-categories of correspondence with identical document types inside them.
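To make the point concrete, here is a minimal Python sketch, assuming documents are filed under a single P-1-13 correspondence category and that the audience and securityLevel vocabularies are exactly the ones listed above. CorrespondenceRecord and make_record are hypothetical names for illustration, not part of DVS "Namejs".

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    # Controlled vocabulary copied from the audience list above
    GOVERNMENT = "government"
    PRIVATE = "private"
    FOREIGN = "foreign"
    INTERNAL = "internal"
    CLASSIFIED = "classified"


class SecurityLevel(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"
    SECRET = "secret"


@dataclass(frozen=True)
class CorrespondenceRecord:
    """A document filed under the single correspondence category."""
    category: str                     # always the function code, e.g. "P-1-13"
    title: str
    audience: Audience
    security_level: SecurityLevel = SecurityLevel.PUBLIC


def make_record(title: str, audience: str, security: str = "public") -> CorrespondenceRecord:
    # Enum lookup raises ValueError for values outside the vocabulary,
    # so free-text variants cannot creep back in
    return CorrespondenceRecord("P-1-13", title, Audience(audience), SecurityLevel(security))


# What used to be a separate sub-category is now just an attribute value
r = make_record("Letter to a ministry", "government", "restricted")
print(r.category, r.audience.value, r.security_level.value)  # P-1-13 government restricted
```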

Problem B: EU Investment Project Explosion

The most egregious example is I2 (Investīciju projektu ieviešana) with 33 top-level entries — each representing a specific EU-funded project:

I2-1   Projekta "Informācijas sistēmu ... Nr. 2.2.1.1/17/I/012" dokumenti
I2-2   Projekta "Atvērto datu ... Nr. 2.2.1.1/19/I/004" dokumenti
...
I2-33  Projekta "Valsts pārvaldes vienota valsts finanšu..." dokumenti

Each of these 33 projects then has identical sub-structure: correspondence, contracts, orders, communications materials. This is a textbook example of data masquerading as structure. The project identity is a data attribute, not a classification level.

A proper design:

I2-1   Investīciju projektu ieviešana (Project Implementation)
  I2-1-1  Korespondence (Correspondence)
  I2-1-2  Līgumi (Contracts)
  I2-1-3  Rīkojumi, protokoli (Orders, protocols)
  I2-1-4  Komunikācijas materiāli (Communications)
  → metadata: projectId = "2.2.1.1/17/I/012"
  → metadata: projectName = "Informācijas sistēmu..."
  → metadata: fundingSource = "ERAF" | "ANM" | ...

This would reduce I2 from ~166 entries to ~10, while preserving all information through metadata.
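A minimal sketch of the collapse, assuming each legacy row can be reduced to (legacy code, project id, sub-category name). The sample rows and the pairing of legacy sub-codes with sub-category names are illustrative, not taken from the spreadsheet; FUNCTION_CODES is a hypothetical mapping that follows the I2-1-1 / I2-1-2 codes in the design above.

```python
from collections import defaultdict

# Illustrative legacy rows: (legacy code, project id, sub-category name)
LEGACY_ROWS = [
    ("I2-1-1", "2.2.1.1/17/I/012", "Korespondence"),
    ("I2-1-2", "2.2.1.1/17/I/012", "Līgumi"),
    ("I2-2-1", "2.2.1.1/19/I/004", "Korespondence"),
    ("I2-2-2", "2.2.1.1/19/I/004", "Līgumi"),
]

# Hypothetical target codes in the function-based I2 tree
FUNCTION_CODES = {"Korespondence": "I2-1-1", "Līgumi": "I2-1-2"}


def collapse(rows):
    """Merge per-project sub-trees into one category per function.

    Returns {new_code: {"name": ..., "projectRefs": [...], "legacyCodes": [...]}}.
    """
    merged = defaultdict(lambda: {"name": None, "projectRefs": [], "legacyCodes": []})
    for legacy_code, project_id, sub_name in rows:
        new_code = FUNCTION_CODES[sub_name]      # KeyError flags an unmapped function
        entry = merged[new_code]
        entry["name"] = sub_name
        if project_id not in entry["projectRefs"]:
            entry["projectRefs"].append(project_id)
        entry["legacyCodes"].append(legacy_code)  # kept for the old→new code mapping
    return dict(merged)


for code, entry in sorted(collapse(LEGACY_ROWS).items()):
    print(code, entry["name"], entry["projectRefs"])
```

Keeping legacyCodes next to each merged category is what feeds the old-to-new code mapping that LNA and existing normative references will need. Note that new and legacy codes can coincide (I2-1-1 exists in both), so the mapping has to record which version a code belongs to.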

Problem C: I1 (Investīciju programmu vadība) Duplicates

Similarly, I1 has 16+ programme-level groups (I1-1 through I1-16), each with largely identical sub-structures for different EU operational programmes. The programmes differ in retention dates and responsible departments, but these are metadata, not structure.

Current: 327 entries in I1
Proposed: ~40-50 entries (function-based) + programme as metadata

Problem D: Category Count vs. Clerk Cognitive Capacity

With ~496 leaf categories (plus ~150 structural grouping rows), a clerk creating a new document faces an impossible cognitive task. Cognitive research (Rosch, 1978; Miller, 1956) suggests that people handle only a small number of alternatives reliably at any one time, commonly summarized as 7±2 items. Against that yardstick, VARAM's schema has:

3 domain categories (P, I1, I2): good
9-33 L1 categories per domain: borderline to unmanageable
Up to 13+ L2 per L1: too many, especially without descriptions

The result is predictable: clerks default to a handful of "safe" categories, misclassify documents, or spend excessive time navigating the hierarchy — defeating the purpose of classification.

3.2 Structural Assessment Summary

Metric	Current	Proposed (after normalization)
Total entries	647	~120-150
Leaf categories (clerk-facing)	~496	~80-100
I2 entries	166	~10-15
I1 entries	327	~40-50
Max child categories per node	33	≤10
Project-specific categories	~200+	0 (metadata)
Duplicate structural patterns	~30 identical sub-trees	0
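The counts in this table can be recomputed mechanically once codes are normalized. A small sketch, assuming hyphen-only codes where a code's parent is obtained by dropping its last segment; the sample list is illustrative, not VARAM data.

```python
from collections import Counter


def parent(code: str):
    """Parent code by dropping the last hyphen segment; domains have no parent."""
    return code.rsplit("-", 1)[0] if "-" in code else None


def schema_metrics(codes):
    """Total entries, leaf count and maximum fan-out for a list of codes."""
    codes = set(codes)
    # Number of direct children per parent code
    fan_out = Counter(parent(c) for c in codes if parent(c) is not None)
    # A leaf is any code that is nobody's parent
    leaves = [c for c in codes if c not in fan_out]
    return {
        "total": len(codes),
        "leaves": len(leaves),
        "max_children": max(fan_out.values(), default=0),
    }


sample = ["P", "P-1", "P-1-1", "P-1-2", "I2", "I2-1", "I2-2", "I2-3"]
print(schema_metrics(sample))  # {'total': 8, 'leaves': 5, 'max_children': 3}
```

Run against the full code list, max_children is the number to keep at or below 10 after restructuring.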

3.3 What Should Be Structure vs. What Should Be Metadata

Currently a Category Level	Should Be	Reason
Specific EU project name	Metadata tag	Project is an instance, not a function
Specific EU programme	Metadata tag	Programme is a funding context
Audience of correspondence	Metadata enum	Audience doesn't change the document type
Security classification	Metadata field	Orthogonal to document function
EU Commission flag on retention	Metadata boolean	Compliance attribute, not structure
Department assignment	Metadata reference	Departments change; functions don't

4. Data Quality Issues (Spreadsheet-Level)

4.1 Issues Summary

#	Issue	Severity	Scope
1	Mixed code separators (hyphens and dots)	CRITICAL	221/647 codes (34%)
2	NBSP (\xa0) disguised as empty retention	HIGH	93 rows
3	50+ retention term format variants	HIGH	496 retention values
4	Zero descriptions in Description column	HIGH	100% of rows
5	Multi-department free-text assignments	MEDIUM	67 rows
6	Level data in wrong columns	MEDIUM	64 rows
7	Typo: `Il-9-2` instead of `I1-9-2`	LOW	1 row
8	Trailing dot in code `I1-13-1.1.`	LOW	1 row
9	Typo: "Patstāvīgi" instead of "Pastāvīgi"	LOW	4 rows

(Full technical detail in previous assessment — omitted for brevity)
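Issues 1, 2, 7 and 8 are mechanical and can be fixed during import. A hedged sketch of such a step (normalize_code and clean_cell are hypothetical helpers, e.g. for tools/import-excel.py). It assumes that a dot in a code is just another hierarchy separator, so `I1-13-1.1.` becomes `I1-13-1-1`; that assumption should be confirmed against the source rows before use.

```python
import re

# Expected shape after normalization: domain (P, I1, I2) + hyphen-separated numbers
CODE_RE = re.compile(r"^(P|I1|I2)(-\d+)*$")


def normalize_code(raw: str) -> str:
    code = raw.strip()                        # str.strip() also removes NBSP (\xa0)
    code = code.rstrip(".")                   # issue 8: trailing dot
    code = code.replace(".", "-")             # issue 1: dots -> hyphens (assumed equivalent)
    code = re.sub(r"^Il(?=-)", "I1", code)    # issue 7: letter l typed instead of digit 1
    if not CODE_RE.match(code):
        raise ValueError(f"unrecognised code after normalization: {raw!r}")
    return code


def clean_cell(value):
    """Issue 2: treat cells holding only whitespace or NBSP as genuinely empty."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


print(normalize_code("I1-13-1.1."), normalize_code("Il-9-2"))  # I1-13-1-1 I1-9-2
```

Raising on anything that still does not match keeps unexpected variants visible instead of silently importing them.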

4.2 Retention Term Normalization

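50+ free-text variants need consolidation into five input patterns, which map onto the three structured forms of RetentionType (permanent, duration, fixed date) plus its triggerEvent and euCommission attributes:

Type	Example Input	Structured Output
Permanent	"Pastāvīgi", "Patstāvīgi" (typo)	`<permanent>true</permanent>`
Duration	"5 gadi", "75 gadi" (5 years, 75 years)	`<duration years="5"/>`
Duration + trigger	"5 gadi pēc projekta noslēguma..." (5 years after project closure)	`<duration years="5" triggerEvent="project_closure"/>`
Fixed date	"31.12.2034.", "2031-12-31 00:00:00"	`<fixedDate>2034-12-31</fixedDate>`, `<fixedDate>2031-12-31</fixedDate>`
EU flagged	"31.12.2032. EK" (EK = European Commission)	`<retention euCommission="true"><fixedDate>2032-12-31</fixedDate></retention>`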
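A sketch of this normalization as a parser, assuming the five input patterns above cover the common cases; anything unrecognized raises instead of being guessed. The output dict mirrors the RetentionType fields (kind, years, triggerEvent, euCommission, originalText). The TRIGGERS table is a hypothetical starting point holding only the one example from the table.

```python
import re
from datetime import date, datetime

# Trigger phrase -> controlled triggerEvent code. Only the example from the
# table above is included; the real list would be built from the 50+ variants.
TRIGGERS = {"projekta noslēguma": "project_closure"}


def parse_retention(raw: str) -> dict:
    """Turn one free-text retention cell into RetentionType-shaped fields."""
    text = raw.strip()                       # str.strip() also removes NBSP
    result = {"originalText": raw, "euCommission": False}

    # "EK" (European Commission) sets the flag and is removed from the text
    if re.search(r"\bEK\b", text):
        result["euCommission"] = True
        text = re.sub(r"\s*\bEK\b", "", text).strip()

    # "Patstāvīgi" is the known typo for "Pastāvīgi" (permanently)
    if text.lower() in ("pastāvīgi", "patstāvīgi"):
        result["kind"] = "permanent"
        return result

    m = re.match(r"(\d+)\s+gad", text)       # "5 gadi", "75 gadi", "5 gadi pēc ..."
    if m:
        result.update(kind="duration", years=int(m.group(1)))
        for phrase, code in TRIGGERS.items():
            if phrase in text:
                result["triggerEvent"] = code
        return result

    m = re.fullmatch(r"(\d{2})\.(\d{2})\.(\d{4})\.?", text)   # "31.12.2034."
    if m:
        day, month, year = map(int, m.groups())
        result.update(kind="fixedDate", date=date(year, month, day))
        return result

    try:                                     # Excel datetime, "2031-12-31 00:00:00"
        parsed = datetime.fromisoformat(text).date()
    except ValueError:
        raise ValueError(f"unrecognised retention term: {raw!r}") from None
    result.update(kind="fixedDate", date=parsed)
    return result


print(parse_retention("31.12.2032. EK"))
```

Keeping originalText on every result preserves the spreadsheet wording for LNA review, which the risk register below relies on.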

5. Proposed Architecture

5.1 Design Principles

VDVC namespace — urn:vdvc:classification:2026 — reusable across government
Function-first classification — per MK Nr. 282 §33, classify by what the organization does, not by what regulation triggered the document
Metadata-rich, structure-lean — project, programme, audience as tags, not tree levels
No Excel — custom web GUI that edits backend XML directly; prevents spreadsheet drift
Git-versioned SSOT — XML on ProcessGit with full audit trail
MCP-served — machine-readable API for DVS integration and AI-assisted classification

5.2 XSD Schema (VDVC Domain)

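<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:vdvc:classification:2026"
           xmlns:vdvc="urn:vdvc:classification:2026"
           elementFormDefault="qualified">

  <!-- MetadataType and the vocabulary list types (DeptListType, ProgrammeListType,
       ProjectListType, RetTermListType) are not defined in this file; they are
       assumed to live in the shared vocabulary schema, which must declare the
       same targetNamespace for xs:include to work -->
  <xs:include schemaLocation="vdvc-vocabularies.xsd"/>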

  <xs:element name="classificationSchema" type="vdvc:SchemaType"/>

  <xs:complexType name="SchemaType">
    <xs:sequence>
      <xs:element name="metadata" type="vdvc:MetadataType"/>
      <xs:element name="vocabularies" type="vdvc:VocabulariesType"/>
      <xs:element name="domains" type="vdvc:DomainListType"/>
    </xs:sequence>
    <xs:attribute name="version" type="xs:string" use="required"/>
    <xs:attribute name="effectiveDate" type="xs:date" use="required"/>
    <xs:attribute name="institution" type="xs:string" use="required"/>
  </xs:complexType>

  <!-- Controlled vocabularies (departments, programmes, projects) -->
  <xs:complexType name="VocabulariesType">
    <xs:sequence>
      <xs:element name="departments" type="vdvc:DeptListType"/>
      <xs:element name="programmes" type="vdvc:ProgrammeListType" minOccurs="0"/>
      <xs:element name="projects" type="vdvc:ProjectListType" minOccurs="0"/>
      <xs:element name="retentionTerms" type="vdvc:RetTermListType"/>
    </xs:sequence>
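  </xs:complexType>

  <!-- Domains (P, I1, I2) holding the top-level categories; the element
       names "domain" and "category" are assumed -->
  <xs:complexType name="DomainListType">
    <xs:sequence>
      <xs:element name="domain" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="category" type="vdvc:CategoryType"
                        maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute name="code" type="xs:string" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>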

  <!-- Category node (recursive, function-based) -->
  <xs:complexType name="CategoryType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="description" type="xs:string" minOccurs="0"/>
      <xs:element name="legalBasis" type="xs:string" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element name="retention" type="vdvc:RetentionType" minOccurs="0"/>
      <xs:element name="responsibleUnits" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="unitRef" type="xs:string" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="mediaType" type="vdvc:MediaTypeEnum" minOccurs="0"/>
      <xs:element name="system" type="xs:string" minOccurs="0"/>
      <xs:element name="applicableContexts" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="programmeRef" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="projectRef" type="xs:string"
                        minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="subcategory" type="vdvc:CategoryType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string" use="required"/>
    <xs:attribute name="level" type="xs:positiveInteger" use="required"/>
    <xs:attribute name="status" type="vdvc:StatusEnum" default="active"/>
  </xs:complexType>

  <!-- Retention: structured, not free-text -->
  <xs:complexType name="RetentionType">
    <xs:choice>
      <xs:element name="permanent" type="xs:boolean"/>
      <xs:element name="duration">
        <xs:complexType>
          <xs:attribute name="years" type="xs:positiveInteger" use="required"/>
          <xs:attribute name="triggerEvent" type="xs:string"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="fixedDate" type="xs:date"/>
    </xs:choice>
    <xs:attribute name="euCommission" type="xs:boolean" default="false"/>
    <xs:attribute name="legalReference" type="xs:string"/>
    <xs:attribute name="originalText" type="xs:string"/>
  </xs:complexType>

  <!-- Enumerations -->
  <xs:simpleType name="MediaTypeEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="electronic"/>
      <xs:enumeration value="paper"/>
      <xs:enumeration value="hybrid"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="StatusEnum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="deprecated"/>
      <xs:enumeration value="draft"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Key design decisions:

legalBasis is a metadata field on categories, not a structural level: it lists the normative acts that require a category without creating a separate tree branch for each act
applicableContexts with programmeRef / projectRef replaces the 33 duplicate I2 sub-trees: a single "Korespondence" category can be tagged with all applicable projects
status enables deprecation without deletion (audit trail)
RetentionType's legalReference attribute links each retention term to its legal source
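The applicableContexts idea can be exercised with the Python standard library alone. This sketch assumes namespace-qualified child elements and uses `<v:category>` as the element name for top-level categories (an assumption); the project ids are placeholders. It answers the question that the 33 project sub-trees used to answer structurally: which categories apply to a given project?

```python
import xml.etree.ElementTree as ET

NS = {"v": "urn:vdvc:classification:2026"}

# Illustrative fragment in the shape of CategoryType; ids are placeholders
FRAGMENT = """<v:domains xmlns:v="urn:vdvc:classification:2026">
  <v:category code="I2-1" level="1">
    <v:name>Korespondence</v:name>
    <v:applicableContexts>
      <v:projectRef>project-001</v:projectRef>
      <v:projectRef>project-002</v:projectRef>
    </v:applicableContexts>
  </v:category>
  <v:category code="I2-2" level="1">
    <v:name>Līgumi un grozījumi</v:name>
    <v:applicableContexts>
      <v:projectRef>project-002</v:projectRef>
    </v:applicableContexts>
  </v:category>
</v:domains>"""


def categories_for_project(root, project_id):
    """Codes of all elements with a code attribute whose projectRefs include project_id."""
    hits = []
    for el in root.iter():                   # walks every depth, incl. subcategories
        if el.get("code") is None:
            continue
        refs = el.findall("v:applicableContexts/v:projectRef", NS)
        if any(ref.text == project_id for ref in refs):
            hits.append(el.get("code"))
    return hits


root = ET.fromstring(FRAGMENT)
print(categories_for_project(root, "project-002"))  # ['I2-1', 'I2-2']
print(categories_for_project(root, "project-001"))  # ['I2-1']
```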

5.3 Proposed Simplified Classification Tree

VARAM Classification Schema (VDVC:2026)

P — Pārvalde (Administration)
├── P-1  Iestādes vadība (Institutional Management)
│   ├── P-1-1  Normatīvie dokumenti (Regulatory documents)
│   ├── P-1-2  Rīkojumi (Orders)
│   ├── P-1-3  Sanāksmes un protokoli (Meetings & protocols)
│   ├── P-1-4  Plānošana un pārskati (Planning & reporting)
│   ├── P-1-5  Sarakste (Correspondence)
│   │         → metadata: audience, securityLevel
│   ├── P-1-6  Pilnvaras un lēmumi (Authorizations & decisions)
│   └── P-1-7  Drošība un trauksme (Security & whistleblowing)
├── P-2  Budžets (Budget Planning)
├── P-3  Personālvadība (HR Management)
│   ├── P-3-1  Darba līgumi (Employment contracts)
│   ├── P-3-2  Personāla lietas (Personnel files)
│   ├── P-3-3  Apmācības (Training)
│   └── P-3-4  Novērtēšana (Performance evaluation)
├── P-4  Saimnieciskie jautājumi (Facilities)
├── P-5  Iepirkumi (Procurement)
├── P-6  Juridiskā funkcija (Legal)
├── P-7  Komunikācija (Communications)
├── P-8  Audits (Audit)
└── P-9  Finanšu vadība (Financial Management)

I1 — Investīciju programmu vadība (Programme Management)
├── I1-1  Programmu plānošana (Programme planning)
├── I1-2  Uzraudzība un kontrole (Monitoring & control)
├── I1-3  Finanšu pārvaldība (Financial management)
├── I1-4  Ziņojumi un pārskati (Reports)
├── I1-5  Maksājumi un pārbaudes (Payments & verification)
└── I1-6  Sarakste un lēmumi (Correspondence & decisions)
          → metadata: programmeRef = [ERAF, ANM, ESF, ...]

I2 — Investīciju projektu ieviešana (Project Implementation)
├── I2-1  Korespondence (Correspondence)
├── I2-2  Līgumi un grozījumi (Contracts & amendments)
├── I2-3  Rīkojumi un protokoli (Orders & protocols)
├── I2-4  Komunikācija (Communications materials)
├── I2-5  Finanšu dokumentācija (Financial documentation)
└── I2-6  Noslēguma dokumenti (Closure documents)
          → metadata: projectRef = [project-001, project-002, ...]

From 647 entries → ~80-100 functional categories + rich metadata vocabularies.

6. Custom Web GUI (Not Excel)

6.1 Why Not Excel

Problem with Excel	Impact
People edit the Excel directly, bypassing validation	Reintroduces data quality issues
Cannot enforce controlled vocabularies	Free-text retention terms return
Cannot represent metadata (project/programme tags) on categories	Structural duplication returns
No validation against XSD schema	Invalid data enters the system
No version control / audit trail	Changes are invisible
Cannot embed business logic (retention calculation, department lookup)	Manual errors
Multiple people can have different versions	No SSOT guarantee

6.2 GUI Architecture

┌─────────────────────────────────────────────┐
│           VDVC Classification Editor         │
│  ┌───────────────────────────────────────┐  │
│  │  Tree Navigator (collapsible)          │  │
│  │  ├── P — Pārvalde                      │  │
│  │  │   ├── P-1 Iestādes vadība          │  │
│  │  │   │   ├── P-1-1 Normatīvie dok.    │  │
│  │  │   │   └── P-1-2 Rīkojumi ←[EDIT]  │  │
│  │  └── I1 — Investīciju programmas       │  │
│  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────┐  │
│  │  Category Detail Panel                 │  │
│  │  Code: [P-1-2]     Status: [Active ▼] │  │
│  │  Name: [Rīkojumi un to pielikumi...]   │  │
│  │  Description: [Ministru rīkojumi...]   │  │
│  │  ─── Retention ───                     │  │
│  │  Type: [Permanent ▼]                   │  │
│  │  Legal ref: [MK Nr. 282 §31]          │  │
│  │  ─── Responsibility ───                │  │
│  │  Departments: [LN ×] [KD ×] [+ Add]   │  │
│  │  ─── Context ───                       │  │
│  │  Programmes: [all]                     │  │
│  │  Media: [Electronic ▼]                 │  │
│  │  System: [DVS Namejs ▼]               │  │
│  │  ─── Legal Basis ───                   │  │
│  │  [+ Add normative reference]           │  │
│  └───────────────────────────────────────┘  │
│  [Save] [Validate] [Preview XML] [History]  │
└─────────────────────────────────────────────┘

6.3 GUI Features

Feature	Purpose
Tree navigation	Hierarchical browse/search with drag-drop reordering
Controlled vocabulary dropdowns	Departments, retention types, media types — no free-text
Inline XSD validation	Real-time validation as users edit; cannot save invalid data
Retention calculator	Input retention rule → system shows calculated expiry per document date
Department lookup	Autocomplete from VDVC organization registry (ProcessGit VARAM MCP)
Diff / history view	Git-backed change tracking with who-changed-what
Bulk import	One-time import from current Excel, then Excel is retired
Export views	Generate read-only Excel, HTML, PDF for stakeholders
Legal basis linker	Reference normative acts by Latvijas Vēstnesis number
Multi-user with roles	Lietvedis / records clerk (view), department editor, schema admin
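A minimal sketch of the retention calculator, taking a rule dict such as {"kind": "duration", "years": 5}. The counting convention is an assumption: it counts whole calendar years from the given start date (the document date, or the trigger-event date for triggered rules). If the applicable rules prescribe a different starting point, only add_years() and its caller need to change. Function names are hypothetical.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """start + N calendar years; 29 February falls back to 28 February."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def expiry_date(rule: dict, start: date):
    """Expiry date for a structured retention rule, or None for permanent retention.

    Assumption: `start` is the document date, or the trigger-event date when
    the rule carries a triggerEvent.
    """
    kind = rule["kind"]
    if kind == "permanent":
        return None
    if kind == "fixedDate":
        return rule["date"]
    if kind == "duration":
        return add_years(start, rule["years"])
    raise ValueError(f"unknown retention kind: {kind!r}")


print(expiry_date({"kind": "duration", "years": 5}, date(2026, 2, 8)))   # 2031-02-08
print(expiry_date({"kind": "duration", "years": 1}, date(2024, 2, 29)))  # 2025-02-28
print(expiry_date({"kind": "permanent"}, date(2026, 2, 8)))              # None
```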

6.4 Technology Stack

Frontend:  React + Tailwind (ProcessGit-integrated SPA)
Backend:   ProcessGit API + MCP server
Storage:   Git repository (XML + XSD)
Validation: Client-side XSD validation + server-side on commit
Auth:      ProcessGit OAuth / VARAM SSO
Deploy:    processgit.org/VARAM/Document_classification_schema/

7. ProcessGit Repository Structure

VARAM/Document_classification_schema/
├── README.md
├── schema/
│   ├── vdvc-classification-2026.xsd       ← Schema definition
│   └── vdvc-vocabularies.xsd              ← Shared controlled vocabularies
├── data/
│   ├── varam-classification-2026.xml      ← Canonical SSOT
│   ├── vocabularies/
│   │   ├── departments.xml                ← Cross-ref with VARAM org registry
│   │   ├── programmes.xml                 ← EU programme registry
│   │   └── projects.xml                   ← Active project registry
│   └── archive/
│       └── original-excel-2026.xlsx       ← Original for audit trail
├── gui/
│   ├── index.html                         ← Classification editor SPA
│   ├── src/                               ← React components
│   └── package.json
├── render/
│   ├── classification.xslt               ← Human-readable transform
│   └── classification.html               ← Auto-generated view
├── mcp/
│   └── server-config.yaml                ← MCP server endpoint
├── tools/
│   ├── import-excel.py                   ← One-time Excel import
│   ├── export-excel.py                   ← Read-only Excel generation
│   ├── validate.py                       ← XSD validation
│   └── retention-calculator.py           ← Retention date computation
└── docs/
    ├── migration-mapping.md              ← Old code → new code mapping
    └── normative-basis.md                ← Legal references
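XSD validation catches type errors, but the schema as drafted does not express every hierarchy rule. A hedged sketch of extra checks that tools/validate.py could run after XSD validation: unique codes, child codes that extend the parent code, and a level attribute equal to nesting depth (assuming top-level categories such as P-1 carry level 1). The function name, root element name and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

V = "{urn:vdvc:classification:2026}"


def structural_errors(category, parent_code=None, depth=1, seen=None):
    """Return rule violations for a category element and all its subcategories."""
    seen = set() if seen is None else seen
    errors = []
    code = category.get("code")
    if code in seen:
        errors.append(f"duplicate code {code}")
    seen.add(code)
    # Child code must be the parent code plus one more "-N" segment prefix
    if parent_code is not None and not code.startswith(parent_code + "-"):
        errors.append(f"{code} is not under parent {parent_code}")
    if category.get("level") != str(depth):
        errors.append(f"{code}: level={category.get('level')} but depth is {depth}")
    for sub in category.findall(V + "subcategory"):
        errors += structural_errors(sub, code, depth + 1, seen)
    return errors


SAMPLE = """<category xmlns="urn:vdvc:classification:2026" code="P-1" level="1">
  <subcategory code="P-1-1" level="2"/>
  <subcategory code="P-2-1" level="2"/>
  <subcategory code="P-1-3" level="3"/>
</category>"""

for err in structural_errors(ET.fromstring(SAMPLE)):
    print(err)
```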

8. MCP Server Integration

Extend the existing ProcessGit MCP pattern (already live for the VARAM Organizations Register):

MCP Tool	Input	Output
`vdvc:search`	Full-text query in LV/EN	Matching categories with context
`vdvc:get_category`	Category code	Full details + metadata
`vdvc:list_categories`	Filters: domain, level, dept, programme	Filtered list
`vdvc:suggest_category`	Document title + body text	Top 3-5 category suggestions with confidence
`vdvc:validate_code`	Category code	Validity check + active status
`vdvc:calculate_retention`	Category code + document date	Retention expiry date
`vdvc:describe_model`	—	Schema structure, vocabularies, stats

The suggest_category tool is the key efficiency enabler: instead of a clerk navigating ~100 categories, the AI reads the document and recommends the best matches.
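The tool contract (text in, ranked category codes out) can be prototyped before any AI model is wired in. The sketch below is deliberately a plain token-overlap baseline, not the AI classifier the proposal describes; the category labels are hypothetical, and the raw overlap count stands in for a confidence score. It also shows why descriptions matter: Latvian inflection (līgumi vs līgumā) already defeats exact token matching.

```python
import re
from collections import Counter

# Hypothetical labels; in practice built from each category's name and description
CATEGORIES = {
    "P-3-1": "Darba līgumi employment contracts",
    "P-5": "Iepirkumi procurement tender",
    "I2-2": "Līgumi un grozījumi contracts amendments",
}


def tokens(text):
    """Lower-cased word tokens; Python's re treats ā, ī, ē as word characters."""
    return re.findall(r"\w+", text.lower())


def suggest(text, categories=CATEGORIES, top_n=3):
    """Rank categories by shared-token count: a lexical baseline, not an AI model."""
    doc = Counter(tokens(text))
    scored = []
    for code, label in categories.items():
        overlap = sum(doc[t] for t in set(tokens(label)))
        if overlap:
            scored.append((code, overlap))
    # Highest overlap first; code as a stable tie-breaker
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


print(suggest("Grozījumi līgumā: contracts amendments for project-001"))
# [('I2-2', 3), ('P-3-1', 1)]
```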

9. Roadmap

Phase	Duration	Deliverables
Phase 1: Assessment approval & schema design	1 week	Approved XSD, normalization rules, migration mapping
Phase 2: Data cleaning & functional restructure	2-3 weeks	Normalized XML with ~100 categories; old→new code mapping
Phase 3: GUI development	3-4 weeks	React SPA on ProcessGit; tree editor, validation, export
Phase 4: ProcessGit deployment & MCP server	1-2 weeks	Live repo, MCP endpoint, vocabularies
Phase 5: AI description generation	1-2 weeks	AI-drafted Latvian descriptions for all categories
Phase 6: DVS "Namejs" integration	2-3 weeks	Classification import adapter, clerk-facing AI assist

Total: 10-15 weeks

10. Risk Assessment

Risk	Impact	Mitigation
LNA (Latvijas Nacionālais arhīvs) rejects restructured schema	HIGH	Maintain old↔new code mapping; preserve all retention terms with `originalText`; engage LNA early
Records clerks (lietvedis role) resist moving away from Excel	MEDIUM	GUI provides Excel-like table view; generate read-only Excel exports on demand
Normative acts explicitly reference old codes	MEDIUM	Deprecate rather than delete; old codes resolve to new via alias table
Project-as-metadata breaks DVS "Namejs" import format	MEDIUM	Provide flat-file export that expands metadata back to rows for legacy DVS
Functional restructure conflicts with department ownership	MEDIUM	Map departments to functions, not to categories; allow multi-department tags
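Old-to-new resolution can be a plain lookup table. A sketch, assuming a hypothetical LEGACY_ALIASES table of the kind docs/migration-mapping.md would hold; the entries only illustrate the pattern (for example, P-1-13-9 resolving to the merged correspondence category P-1-5 with a securityLevel value). Because old and new codes overlap (I2-1 exists in both trees), the resolver only accepts codes the caller declares as legacy.

```python
# Hypothetical alias table: legacy code -> (new code, metadata replacing the dropped level)
LEGACY_ALIASES = {
    "P-1-13-5": ("P-1-5", {"audience": "government", "securityLevel": "restricted"}),
    "P-1-13-9": ("P-1-5", {"audience": "government", "securityLevel": "secret"}),
    "I2-1-1": ("I2-1", {"projectRef": "project-001"}),
    "I2-2-1": ("I2-1", {"projectRef": "project-002"}),
}

# Hypothetical set of categories with status="active" in the new schema
ACTIVE_CODES = {"P-1-5", "I2-1"}


def resolve_legacy(code):
    """Map a legacy code to (new_code, metadata); the caller asserts it is a legacy code."""
    if code not in LEGACY_ALIASES:
        raise KeyError(f"no alias for legacy code {code}")
    new_code, metadata = LEGACY_ALIASES[code]
    if new_code not in ACTIVE_CODES:
        raise LookupError(f"{code} maps to inactive category {new_code}")
    return new_code, dict(metadata)          # copy so callers cannot mutate the table


print(resolve_legacy("P-1-13-9"))
```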

11. Conclusion

The current classification schema is complicated not because document management is inherently complex, but because normative document origins have been used as structural taxonomy levels instead of metadata. Every new EU project, every new regulatory delegation, creates a new branch in the tree rather than a new tag on an existing functional category.

The proposed approach:

Restructures the tree from 647 entries to ~100 functional categories
Enriches each category with metadata (project, programme, legal basis, audience)
Replaces Excel with a validated, Git-backed web GUI
Serves the schema via MCP for AI-assisted classification
Complies with MK Nr. 282 §33 function-based classification requirements

The VDVC namespace ensures this approach can be replicated across government institutions, not just VARAM.
