A retrieval-augmented LLM cut drafting time for Amsterdam's municipal objection response letters from hours to minutes while preserving high legal consistency. The human-in-the-loop system delivered large efficiency gains in a single-city deployment but is tailored to Dutch municipal law and a narrow task set.

LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters

Virgill van der Meer, Julien Rossi · May 12, 2026

arXiv · descriptive · medium evidence · 7/10 relevance

LegalCheck, a RAG/CAG LLM system with curated legal knowledge and expert-in-the-loop review, produced near-final municipal objection response letters in minutes and captured most essential legal reasoning while reducing reviewers' workload in a Municipality of Amsterdam deployment.

Public-sector legal departments in the Netherlands face acute staff shortages, increased case volumes, and increased pressure to meet regulatory compliance. This paper presents LegalCheck, a novel system that addresses these challenges by automating the drafting of objection response letters through a combination of Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG). Using a large language model (LLM) alongside curated legal knowledge bases, LegalCheck performs retrieval of relevant laws and precedents, and uses controlled prompting to incorporate both external knowledge and case-specific details into a coherent draft. An expert-in-the-loop review ensures that each generated letter is legally sound and contextually appropriate. In a real-world deployment within the Municipality of Amsterdam, LegalCheck produced near-final advice letters in minutes rather than hours, while maintaining high legal consistency and factual accuracy. The output is based on actual regulations and prior cases, providing explainable outputs that captured the vast majority of required legal reasoning (often 80% to 100% of essential content). Legal professionals found that the system reduced their workload and ensured a consistent application of legal standards, without replacing human judgment. These results demonstrate substantial efficiency gains, improved legal consistency, and positive user acceptance. More broadly, this work illustrates how responsible AI can be deployed in the legal domain by augmenting LLMs with domain knowledge and governance mechanisms.

Summary

Main Finding

LegalCheck, deployed in the Municipality of Amsterdam, combines Retrieval-Augmented Generation (RAG) with a multi-stage Context-Augmented Generation (CAG) loop and human-in-the-loop review to draft near-final municipal legal advice letters in minutes rather than hours. Grounding generation in a curated case library and in iterative jurist feedback produced high legal consistency and factual accuracy (drafts often captured roughly 80%–100% of essential content), reduced jurists' workload, and made the application of legal standards more uniform, while final responsibility stayed with the human reviewer.

Key Points

Novelty: First reported in vivo deployment of an LLM-based RAG+CAG pipeline for official municipal legal advice letters (objection responses) integrated into everyday legal operations.
Architecture:
- RAG layer: semantic retrieval of prior cases, statutes, and policy snippets to ground generation.
- CAG loop: two-stage prompt approach—(1) initial draft conditioned on retrieved materials + case docs + “steering advice”; (2) refinement pass where jurist annotations are incorporated to regenerate an improved draft.
- Human-in-the-loop: jurist specifies dictum (uphold/reject), can provide steering advice, reviews and finalizes drafts; early deployment used a second reviewer as additional safeguard.
Models & settings:
- Embeddings: OpenAI text-embedding-ada-002 (1536-d).
- Generator: GPT-4o via Azure OpenAI (extended context window), low temperature (0.1) to prioritize consistency.
- Retrieval: documents chunked by section; Top-K retrieval typically 50–200 chunks; domain indexes sized ~150 to 14,000 cases.
Implementation details:
- Web app (Flask) with domain-specific pipelines and templates.
- Optional de-identification/re-identification steps for privacy.
- Reuse of same LLM for initial generation and refinement to preserve style/consistency.
Outcomes from pilot:
- Dramatic time savings: drafting time reduced from hours to minutes.
- Content coverage: generated drafts contained the majority of essential legal reasoning (often 80%–100%).
- User acceptance: legal professionals reported reduced workload and better consistency; system viewed as augmentation, not replacement.
- Explainability: retrieval grounding allowed citations/traceability to statutes and prior cases, reducing hallucination risk.
Governance & safety:
- Human final sign-off mandated; system designed to augment, not automate final decisions.
- Privacy and transparency constraints considered (de-identification, traceability).
- Early double-review step used as an additional safety layer.
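The retrieval step described under Architecture and Models & settings can be sketched as a cosine-similarity Top-K lookup over an in-memory index. This is a minimal illustration, not the authors' implementation: the `Chunk` fields, the `top_k` helper, and the 3-dimensional toy vectors are assumptions made for the example (the deployment reportedly used 1536-dimensional text-embedding-ada-002 vectors and Top-K values of 50 to 200).

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    case_id: str          # hypothetical identifier of the source case or statute
    section: str          # section title the document was split on
    vector: list[float]   # embedding; 1536-d in the deployment, 3-d in this toy


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: list[Chunk], k: int) -> list[Chunk]:
    """Return the k chunks most similar to the query, best first."""
    ranked = sorted(index, key=lambda c: cosine(query, c.vector), reverse=True)
    return ranked[:k]


# Toy in-memory index with made-up vectors (placeholders, not real embeddings).
INDEX = [
    Chunk("case-001", "Explanation", [0.9, 0.1, 0.0]),
    Chunk("case-002", "Facts", [0.0, 1.0, 0.0]),
    Chunk("statute-A", "Article text", [0.7, 0.7, 0.0]),
]

# In the real pipeline the query vector would be the embedding of the officer
# report, objection letter, and steering advice; here it is hand-written.
hits = top_k([1.0, 0.0, 0.0], INDEX, k=2)
print([c.case_id for c in hits])
```

A brute-force scan like this is plausible for indexes of the reported size (roughly 150 to 14,000 cases per domain); larger corpora would likely call for an approximate nearest-neighbour index instead.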

Data & Methods

Data sources:
- Centralized municipal case library (Zaakbibliotheek) consolidating ~55,000 past objection cases and decisions.
- Pilot focused domains: waste-fine objections and removal/towing of bicycles, motor vehicles, and boats.
- Knowledge base included prior advice letters, frequently-cited statutes/regulations, and policy/guidelines.
Preprocessing & retrieval:
- Documents split into semantically meaningful chunks (by section), embedded offline, stored in memory-based indexes (given moderate domain sizes).
- Query constructed from officer report, objection letter, and any jurist steering advice; encoded and matched to Top-K similar chunks.
Generation & prompting:
- Prompt template includes case docs, retrieved chunks, and style/format instructions (e.g., “Explanation” section tone and structure).
- Low temperature to limit variance; model instructed to base arguments on provided references.
CAG refinement:
- Jurist annotations on draft are incorporated into a refinement prompt that includes the draft, case docs, annotations, and retrieved sources; generate revised draft (v2).
- Typical workflow: zero or one refinement pass; multiple iterations supported but rare.
Evaluation:
- Mixed quantitative and qualitative evaluation during real-world deployment.
- Metrics reported: drafting time reduction, content coverage estimates (80%–100% of essential elements), legal/factual accuracy and consistency (expert-validated).
- Qualitative feedback from legal professionals on usability, trust, and workflow impact.
System constraints & deployment logistics:
- Different domain configurations per case type; adding a new domain requires curating prior cases and configuring prompts/templates.
- Privacy safeguards (de-identification) and governance practices (human sign-off, double review early on).
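The two-stage prompting flow described above (initial draft, then an optional refinement pass driven by jurist annotations) might look roughly like the sketch below. All function names, prompt wording, and the `llm` callable are hypothetical; the paper's actual templates are not reproduced here, and the model call is stubbed so the example runs without any API access.

```python
from typing import Callable, Optional


def _numbered(sources: list[str]) -> str:
    """Number retrieved chunks so the draft can cite them."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, 1))


def build_draft_prompt(case_docs: str, retrieved: list[str], dictum: str,
                       steering: Optional[str] = None) -> str:
    """Stage 1: initial draft from case documents, sources, and steering advice."""
    parts = [
        "Draft the 'Explanation' section of an objection response letter.",
        "Base every argument only on the numbered sources below.",
        f"Dictum chosen by the jurist: {dictum}",
        f"Case documents:\n{case_docs}",
        f"Sources:\n{_numbered(retrieved)}",
    ]
    if steering:
        parts.append(f"Steering advice:\n{steering}")
    return "\n\n".join(parts)


def build_refinement_prompt(draft: str, annotations: list[str],
                            case_docs: str, retrieved: list[str]) -> str:
    """Stage 2: revise the draft using the jurist's annotations."""
    notes = "\n".join(f"- {a}" for a in annotations)
    return "\n\n".join([
        "Revise the draft below so that it addresses every jurist annotation.",
        f"Draft:\n{draft}",
        f"Jurist annotations:\n{notes}",
        f"Case documents:\n{case_docs}",
        f"Sources:\n{_numbered(retrieved)}",
    ])


def draft_letter(llm: Callable[[str], str], case_docs: str, retrieved: list[str],
                 dictum: str, steering: Optional[str] = None,
                 annotations: Optional[list[str]] = None) -> str:
    """Run zero or one refinement pass, the typical workflow reported."""
    draft = llm(build_draft_prompt(case_docs, retrieved, dictum, steering))
    if annotations:
        draft = llm(build_refinement_prompt(draft, annotations, case_docs, retrieved))
    return draft


# Stub standing in for the real generator; a production call would use the
# same model for both stages at low temperature (0.1 in the deployment).
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"DRAFT v{len(calls)}"


final = draft_letter(
    fake_llm,
    case_docs="Officer report and objection letter (placeholder text).",
    retrieved=["Prior advice letter excerpt.", "Regulation excerpt."],
    dictum="reject",
    annotations=["Address the objector's claim about signage."],
)
print(final)
```

Passing the model in as a callable keeps the prompt logic testable and makes it easy to reuse the same generator for both stages, which the summary notes was done to preserve style and consistency.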

Implications for AI Economics

Productivity & labor economics:
- Large productivity gains in clerical/cognitive drafting tasks—time per letter reduced from hours to minutes—imply direct labor-cost savings and higher throughput. For municipal departments with growing caseloads (e.g., 35% increase cited), this can materially reduce backlog and waiting times.
- Augmentation effect: systems like LegalCheck are complements to jurists (boosting output per worker) rather than pure substitutes when human finalization is enforced. This suggests reallocation of human effort toward higher-value, complex legal tasks and oversight.
Value of organizational data:
- Municipal historical case libraries become high-value assets. Quality of retrieval grounding depends on curated, well-structured internal datasets; investments in data curation and indexing yield recurring returns via improved AI outputs.
Deployment & governance costs:
- Realizing gains requires non-trivial upfront costs: dataset curation, prompt engineering, domain-specific templates, privacy/de-identification pipelines, and training staff on human-AI workflows.
- Ongoing governance (human review, audits, liability management, transparency measures) creates recurring operational expenses that must be balanced against labor savings.
Standardization and quality effects:
- Increased consistency in decision-writing can reduce variance-driven inefficiencies and potential legal disputes tied to inconsistent reasoning. This standardization has welfare implications: potentially lower litigation costs and faster resolutions.
Market and scaling considerations:
- Public-sector demand for domain-specific RAG+CAG tools could create niche markets (municipal/legal operations software). Economies of scale accrue to platforms that can reuse models and pipelines across similar legal domains, but domain-specific retrieval datasets remain necessary.
- Scalability depends on ability to curate sufficient prior-case corpora and on the costs of model inference for high-volume use.
Labor-market dynamics & long-term effects:
- Short-to-medium term: increased productivity may reduce need for additional entry-level drafting hires but increase demand for roles focused on supervision, validation, prompt engineering, and data curation.
- Long-term substitution risk exists if automation pressures expand beyond drafting into substantive adjudicative tasks; robust governance and legal/regulatory constraints will shape that trajectory.
Access to justice and social impact:
- Efficiency gains can improve public service responsiveness and access by shortening wait times and freeing staff to handle complex or vulnerable cases. Conversely, over-reliance without strong oversight risks errors with public-facing consequences.
Research and policy priorities:
- Need for formal cost-benefit analyses comparing labor savings to governance and implementation costs.
- Empirical study of how augmentation affects case outcomes, appeals, and downstream litigation costs.
- Policy frameworks to allocate liability, ensure transparency of sources, and manage workforce transitions.

Suggestions for further economic analysis:
- Quantify per-letter time and cost savings and estimate break-even for implementation given curation/governance expenses.
- Model labor reallocation effects (e.g., fewer junior drafters, more oversight/analytics staff) and wage/skill implications.
- Assess externalities such as reduced litigation rates and administrative consistency benefits.
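A break-even estimate of the kind suggested here could start from a very simple model: net saving per letter, divided into the upfront implementation cost. The sketch below is illustrative only. Every number in it is a made-up placeholder, not a figure from the paper, and the function names are hypothetical.

```python
import math
from typing import Optional


def per_letter_saving(hours_before: float, hours_after: float,
                      hourly_cost: float, inference_cost: float) -> float:
    """Net saving per letter: labor time saved minus model inference cost.

    hours_after should include the jurist's review and finalization time.
    """
    return (hours_before - hours_after) * hourly_cost - inference_cost


def break_even_letters(upfront_cost: float, saving: float) -> Optional[int]:
    """Letters needed to recover upfront costs; None if there is no net saving."""
    if saving <= 0:
        return None
    return math.ceil(upfront_cost / saving)


# Hypothetical inputs (placeholders, NOT reported in the paper).
saving = per_letter_saving(hours_before=3.0, hours_after=0.5,
                           hourly_cost=60.0, inference_cost=0.5)
letters = break_even_letters(upfront_cost=50_000.0, saving=saving)
print(saving, letters)
```

A fuller analysis would add recurring governance costs (audits, double review, retraining) as a per-period term and discount future savings, but the same structure applies.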


Assessment

Paper Type: descriptive

Evidence Strength: medium. The paper reports real-world deployment metrics (time saved, percent of essential content covered) and user feedback from a municipal legal department, which provide credible, applied evidence of efficiency and quality improvements; however, there is no randomized or quasi-experimental identification, no counterfactual or control group, limited reporting of sample sizes and statistical tests, and possible selection and measurement biases.

Methods Rigor: medium. Methods include a production deployment, retrieval-augmented system design, curated legal knowledge bases, and expert-in-the-loop review, all appropriate for a systems evaluation, but the evaluation lacks rigorous causal design, pre-specified metrics and protocols, comprehensive quantitative validation (e.g., blinded ratings, inter-rater reliability, statistical inference), and clarity on sample sizes and case mix.

Sample: Real-world deployment in the Municipality of Amsterdam's public-sector legal department using a curated Dutch legal knowledge base and prior-case database; outputs evaluated by legal professionals in that department (expert-in-the-loop), with reported metrics (e.g., drafting time reduced from hours to minutes, generated letters covering ~80–100% of essential content); exact number of cases, users, and time period not specified.

Themes: human_ai_collab, productivity, adoption

Generalizability:
- Single-city, single-organization deployment (Municipality of Amsterdam); may not generalize to other municipalities or countries.
- System evaluated on a specific task (objection response letters); results may not extend to other legal document types or more complex litigation.
- Context-specific to Dutch law and curated local precedents; limited transferability to other legal systems or languages.
- Performance depends on curated knowledge bases and expert reviewers; scaling to organizations without similar resources may yield different outcomes.
- Short-term deployment metrics; long-term effects on workload, quality, and labor allocation not assessed.

Claims (8)

Claim	Category	Direction	Confidence	Outcome	Details
LegalCheck produced near-final advice letters in minutes rather than hours.	Task Completion Time	positive	high	time to produce advice/objection response letters	minutes rather than hours 0.18
LegalCheck maintained high legal consistency and factual accuracy when generating draft letters.	Output Quality	positive	high	legal consistency and factual accuracy of generated letters	0.18
The system's output captured the vast majority of required legal reasoning, often 80% to 100% of essential content.	Output Quality	positive	high	proportion of essential legal reasoning/content captured in generated drafts	80% to 100% of essential content 0.18
Legal professionals found that the system reduced their workload.	Worker Satisfaction	positive	high	perceived workload of legal professionals	0.18
Legal professionals found that the system ensured a consistent application of legal standards without replacing human judgment.	Output Quality	positive	high	consistency in application of legal standards and preservation of human oversight	0.18
LegalCheck uses a combination of Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG) with curated legal knowledge bases and controlled prompting to retrieve relevant laws and precedents and incorporate case-specific details into coherent drafts.	Other	positive	high	n/a (system design / method description)	0.3
The system produced explainable outputs based on actual regulations and prior cases, providing citations/explainability that support legal reasoning.	Output Quality	positive	high	explainability / traceability of generated legal reasoning to source regulations and cases	0.18
Deploying LegalCheck in the Municipality of Amsterdam demonstrated substantial efficiency gains, improved legal consistency, and positive user acceptance.	Organizational Efficiency	positive	high	efficiency (time), legal consistency, user acceptance	0.18