The Commonplace

A Governed Memory layer consolidates agent-specific memories into a governed shared store. In controlled tests it produced zero cross-entity leakage and full governance compliance under adversarial queries, cut context tokens, and achieved high fact recall and governance-routing precision; on a public benchmark (LoCoMo), schema enforcement showed no measured retrieval-quality penalty.

Governed Memory: A Production Architecture for Multi-Agent Workflows
Hamed Taheri · March 18, 2026
arXiv · descriptive · medium evidence · 7/10 relevance
Governed Memory is a shared memory and governance layer that combines a dual-memory model, tiered routing, reflection-bounded retrieval, and a closed-loop schema lifecycle. In controlled tests and on a benchmark, it reduces token usage, prevents cross-entity leakage, enforces governance, and preserves retrieval quality.

Enterprise AI deploys dozens of autonomous agent nodes across workflows, each acting on the same entities with no shared memory and no common governance. We identify five structural challenges arising from this memory governance gap: memory silos across agent workflows; governance fragmentation across teams and tools; unstructured memories unusable by downstream systems; redundant context delivery in autonomous multi-step executions; and silent quality degradation without feedback loops. We present Governed Memory, a shared memory and governance layer addressing this gap through four mechanisms: a dual memory model combining open-set atomic facts with schema-enforced typed properties; tiered governance routing with progressive context delivery; reflection-bounded retrieval with entity-scoped isolation; and a closed-loop schema lifecycle with AI-assisted authoring and automated per-property refinement. We validate each mechanism through controlled experiments (N=250, five content types): 99.6% fact recall with complementary dual-modality coverage; 92% governance routing precision; 50% token reduction from progressive delivery; zero cross-entity leakage across 500 adversarial queries; 100% adversarial governance compliance; and output quality saturation at approximately seven governed memories per entity. On the LoCoMo benchmark, the architecture achieves 74.8% overall accuracy, confirming that governance and schema enforcement impose no retrieval quality penalty. The system is in production at Personize.ai.

Summary

Main Finding

Governed Memory is a production architecture that closes the “memory governance gap” in multi-agent enterprise AI by providing a shared memory and governance layer. It combines a dual memory model (open-set atomic facts plus schema-enforced typed properties), tiered governance routing with session-aware progressive delivery, reflection-bounded retrieval with entity-scoped isolation, and a closed-loop schema lifecycle. The architecture runs in production at Personize.ai. In the paper's controlled experiments it achieves high fact recall, high governance-routing precision, roughly 50% token savings, and zero cross-entity leakage under adversarial queries, and on an external benchmark (LoCoMo) it shows no measured retrieval-quality penalty.
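
Session-aware progressive delivery can be illustrated with a small sketch. The class name `DeliverySession`, the section names, and the whitespace token count are hypothetical stand-ins, not the Personize.ai API; the sketch only assumes that governance guidance is split into named sections and that each session remembers what it has already injected. Keying each section on a content hash as well as its name is a design choice of this sketch, so that edited guidance is delivered again.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(name: str, text: str) -> str:
    """Identify a guidance section by its name plus a hash of its current content."""
    return name + ":" + hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class DeliverySession:
    """Hypothetical per-session record of governance sections already injected."""

    delivered: set[str] = field(default_factory=set)

    def delta(self, sections: dict[str, str]) -> dict[str, str]:
        """Return only sections not yet delivered in this session, then mark them delivered."""
        new: dict[str, str] = {}
        for name, text in sections.items():
            fp = fingerprint(name, text)
            if fp not in self.delivered:
                new[name] = text
                self.delivered.add(fp)
        return new


def approx_tokens(sections: dict[str, str]) -> int:
    """Crude whitespace token count, standing in for a real tokenizer."""
    return sum(len(text.split()) for text in sections.values())


if __name__ == "__main__":
    guidance = {
        "tone": "Use a formal tone with enterprise buyers.",
        "pii": "Never include personal phone numbers in outbound messages.",
    }
    session = DeliverySession()
    step1 = session.delta(guidance)  # first step: every section is new
    step2 = session.delta({**guidance, "pricing": "Quote list prices only."})  # later step: only the new section
    print(approx_tokens(step1), approx_tokens(step2))
```

Both calls carry the same policy set, but the second step pays only for the newly added pricing section. The paper's ~50% figure is an average over its own workloads and is not something this toy reproduces.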

Key Points

  • The memory governance gap: in enterprise settings many autonomous agent nodes act on the same entities but lack a shared memory/governance layer, producing five structural problems: memory silos, governance fragmentation, unstructured memories unusable downstream, redundant context injection across multi-step execution, and silent quality degradation.
  • Four architectural mechanisms:
    • Dual memory model
      • Open-set memories: coreference-resolved atomic facts stored as vector embeddings, with lightweight quality gates (coreference, self-containment, temporal).
      • Schema-enforced memories: typed property values (text, number, date, boolean, options, arrays) with per-property confidence and update semantics.
      • Dual extraction: a single LLM pass outputs both modalities; write-side deduplication (cosine ≥ 0.92) and background consolidation (threshold 0.95).
    • Tiered governance routing
      • Fast mode (~850 ms, no LLM): embeddings + keyword heuristics + HyPE (Hypothetical Prompt Enrichment).
      • Full mode (~2–5 s): embedding pre-filter + LLM structured analysis.
      • Progressive context delivery: session-aware delta injection that tracks delivered variables/sections to avoid re-injecting previously delivered guidance.
    • Reflection-bounded retrieval
      • Vector search scoped by org partition + CRM key (entity isolation).
      • Bounded reflection loop (default ≤ 2 rounds): an LLM judges completeness (temperature 0.1) and, if needed, generates follow-up queries (temperature 0.3).
    • Schema lifecycle and closed-loop quality
      • AI-assisted schema authoring, rubric scoring, execution logging, and per-property auto-refinement to detect and fix schema drift and extraction degradation.
  • System design choices for safety/ops: organization + entity scoping, two-phase PII redaction, provenance metadata, SDK/MCP interface for multi-agent integration.
  • Relationship to prior work: builds on RAG and memory primitives (SimpleMem/Mem0) but operates at an infrastructure/governance layer above retrieval primitives.
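
The dual memory model and its write-side deduplication rule can be sketched as follows. Only the 0.92 cosine threshold comes from the paper; the schema, property names, two-dimensional toy embeddings, and the `EntityMemory` class are hypothetical, and a real deployment would use model embeddings (the paper names text-embedding-3-small) and a vector store rather than a Python list.

```python
import math
from dataclasses import dataclass, field
from datetime import date

DEDUP_THRESHOLD = 0.92  # write-side cosine threshold reported in the paper

# Hypothetical schema: property name -> required Python type.
SCHEMA: dict[str, type] = {"employee_count": int, "renewal_date": date, "is_customer": bool}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class EntityMemory:
    """Both memory modalities for one entity (hypothetical structure)."""

    facts: list[tuple[str, list[float]]] = field(default_factory=list)  # open-set atomic facts
    properties: dict[str, tuple[object, float]] = field(default_factory=dict)  # typed value + confidence

    def write_fact(self, text: str, embedding: list[float]) -> bool:
        """Store a fact unless a near-duplicate (cosine >= threshold) already exists."""
        for _, existing in self.facts:
            if cosine(existing, embedding) >= DEDUP_THRESHOLD:
                return False
        self.facts.append((text, embedding))
        return True

    def set_property(self, name: str, value: object, confidence: float) -> None:
        """Write a schema-enforced property; unknown names raise KeyError, wrong types raise TypeError."""
        expected = SCHEMA[name]
        # bool is a subclass of int in Python, so reject it explicitly for non-boolean properties.
        if (isinstance(value, bool) and expected is not bool) or not isinstance(value, expected):
            raise TypeError(f"{name} expects {expected.__name__}, got {type(value).__name__}")
        self.properties[name] = (value, confidence)


if __name__ == "__main__":
    memory = EntityMemory()
    print(memory.write_fact("Acme uses Salesforce.", [1.0, 0.0]))  # stored
    print(memory.write_fact("Acme runs on Salesforce.", [0.99, 0.05]))  # near-duplicate, skipped
    memory.set_property("employee_count", 500, confidence=0.9)
    print(memory.properties)
```

Open-set facts accept any text but are deduplicated by similarity, while typed properties accept only schema-defined names and types. That split is what makes the typed side usable downstream (for example, CRM sync).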

Data & Methods

  • Architecture / algorithms
    • Dual extraction pipeline: pre-extraction PII redaction → chunking → property selection via embedding similarity → single LLM dual extraction (open-set facts + typed properties) → post-redaction → cross-chunk deduplication → quality gates → embeddings → write deduplication. Embedding model: text-embedding-3-small. Storage: LanceDB + DynamoDB (per paper).
    • Governance variable enrichment includes HyPE-generated synthetic queries, scope inference, and content-aware embeddings.
    • Governance routing modes: Fast (embedding + keyword) vs Full (LLM classification), with Auto mode choosing between them.
    • Reflection loop: bounded rounds (default 2), LLM judge at low temp for completeness, generation of 1–2 follow-up queries when incomplete, re-embed & search, merge/dedup across rounds.
    • Session-aware progressive delivery records delivered variables/sections to inject only deltas on subsequent steps.
  • Evaluation / experiments (as reported)
    • Controlled experiments: N = 250 across five content types (the types are not enumerated in the material summarized here). Reported validation metrics:
      • Fact recall: 99.6% (with complementary dual-modality coverage).
      • Governance routing: 92% precision and 88% recall.
      • Progressive delivery: ~50% token reduction on average.
      • Entity isolation/adversarial validation: zero cross-entity leakage across 500 adversarial queries; adversarial governance compliance 100%.
      • Adversarial validation details: 500 queries returning 3,800 results; 2.74% of results were flagged, and every flag was a false positive (no true leakage).
      • Reflection gains: +25.7 percentage points of completeness with 2 rounds and a 62.8% gain on hard multi-hop queries; most of the gain comes from the first additional round.
      • Output quality saturation: quality improvements saturate at ~7 governed memories per entity.
    • External benchmark: LoCoMo overall accuracy of 74.8%, which the authors interpret as showing that governance and schema enforcement do not reduce retrieval accuracy.
    • Latency measures: Fast governance path ≈ 850 ms (no LLM), Full path ≈ 2–5 s (LLM analysis).
  • Productionization: system implemented with standard API/SDK to let heterogeneous agents read/write/govern memory; per-organization partitioning and CRM-key-based entity scoping enforce isolation; two-phase content redaction pipeline.
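
The reflection loop and entity scoping described above can be combined in a short sketch. This is a hedged illustration, not the paper's code. Word overlap stands in for vector search. `is_complete` and `follow_ups` are caller-supplied stand-ins for the LLM completeness judge (temperature 0.1) and the follow-up query generator (temperature 0.3). The limit of 2 rounds and the cap of two follow-up queries per round follow the paper's reported defaults, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Memory:
    org_id: str  # organization partition
    crm_key: str  # entity key, e.g. a CRM record id
    text: str


def overlap(query: str, text: str) -> int:
    """Stand-in for vector similarity: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def scoped_search(store: list[Memory], query: str, org_id: str, crm_key: str, k: int = 3) -> list[Memory]:
    # Isolation is a hard filter applied before ranking, so other entities' memories are never scored.
    candidates = [m for m in store if m.org_id == org_id and m.crm_key == crm_key]
    ranked = sorted(candidates, key=lambda m: overlap(query, m.text), reverse=True)
    return [m for m in ranked if overlap(query, m.text) > 0][:k]


def reflective_retrieve(
    query: str,
    org_id: str,
    crm_key: str,
    store: list[Memory],
    is_complete: Callable[[str, list[Memory]], bool],
    follow_ups: Callable[[str, list[Memory]], list[str]],
    max_rounds: int = 2,
    k: int = 3,
) -> list[Memory]:
    results = scoped_search(store, query, org_id, crm_key, k)
    for _ in range(max_rounds):  # bounded: at most max_rounds reflection rounds
        if is_complete(query, results):
            break
        for extra_query in follow_ups(query, results)[:2]:  # at most two follow-up queries per round
            for memory in scoped_search(store, extra_query, org_id, crm_key, k):
                if memory not in results:  # merge and deduplicate across rounds
                    results.append(memory)
    return results
```

Because the scope filter runs before ranking, a better-matching memory from another entity or organization is never even a candidate. The round bound caps the extra LLM calls per query regardless of how the judge behaves.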

Implications for AI Economics

  • Productivity and cost efficiency
    • Token / compute savings: progressive delivery (~50% token reduction) reduces per-step LLM cost in autonomous multi-step workflows, directly lowering variable usage costs for repeated governance/context injection.
    • Latency-cost trade-offs: the fast path provides low-latency, low-cost governance routing, while the full LLM path is costlier and slower but used selectively; this tiering enables economical scaling across many agent nodes.
  • Value of governance as infrastructure
    • Centralized memory + governance converts dispersed agent outputs into organizational capital. That can increase the marginal productivity of agents by enabling reuse, aggregation, and structured downstream consumption (CRM sync, analytics), increasing the realized value per piece of generated content.
    • Schema-enforced memories create queryable, monetizable data (e.g., inputs to scoring models or sales/marketing automation), whereas free-text memories serve mainly as prompt augmentation.
  • Risk reduction and compliance economics
    • Zero cross-entity leakage in adversarial tests and 100% adversarial governance compliance suggest lower regulatory and compliance risk and lower expected costs from privacy breaches and policy violations, which matters most in regulated industries.
    • Centralized policy propagation (one source of truth) reduces coordination costs and the economic friction of keeping many agent configs and teams consistent.
  • Maintenance and long-run operating costs
    • The closed-loop schema lifecycle (rubrics, execution logging, per-property refinement) addresses silent quality degradation and schema drift, reducing long-tail maintenance costs and the probability of large latent defects (e.g., bad CRM fields).
    • Quality saturation at ~7 governed memories per entity suggests diminishing returns to storing ever-more memories per entity; this informs optimal storage and retrieval budget planning.
  • Market and vendor dynamics
    • Middleware value capture: a governance/memory layer that becomes central across agents could create strong vendor leverage and switching costs for organizations that adopt a particular provider (standardization risk).
    • Standardization externalities: if architectures like Governed Memory become de facto standards, they could shift where value accrues in the AI stack, away from model providers and toward governance/memory infrastructure providers.
  • Research and measurement suggestions for economists
    • Quantify cost savings from token reductions and fewer downstream error corrections using A/B experiments in production workflows.
    • Measure productivity gains (time-to-resolution, conversion uplift) attributable to shared memories and schema enforcement versus siloed agent deployments.
    • Model adoption dynamics: analyze investment thresholds for centralizing memory/governance versus incremental integration of per-agent safeguards.
    • Study concentration effects: evaluate how central governance platforms affect competition among agent/tool vendors and the bargaining power of middleware providers.
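
The token-cost point under "Productivity and cost efficiency" can be made concrete with simple arithmetic. Every number below (steps per workflow, tokens per step, price per million tokens) is hypothetical. The sketch also assumes the ~50% reduction applies only to the governance-context portion of each prompt; under that assumption, total savings scale with the share of the prompt that governance context occupies.

```python
def workflow_cost(
    steps: int,
    context_tokens: int,
    other_tokens: int,
    price_per_million: float,
    context_reduction: float = 0.0,
) -> float:
    """Input-token cost of one multi-step workflow (hypothetical cost model).

    context_tokens: governance context injected per step, before any reduction.
    other_tokens: task-specific prompt tokens per step, unaffected by progressive delivery.
    context_reduction: fraction of governance context avoided (0.5 for a ~50% reduction).
    """
    per_step = context_tokens * (1 - context_reduction) + other_tokens
    return steps * per_step * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10 steps, 2,000 governance + 2,000 task tokens per step, $2.50 per 1M tokens.
    baseline = workflow_cost(10, 2_000, 2_000, 2.50)
    progressive = workflow_cost(10, 2_000, 2_000, 2.50, context_reduction=0.5)
    print(f"baseline ${baseline:.3f}, progressive ${progressive:.3f}, saving {1 - progressive / baseline:.0%}")
```

Under these assumptions, halving governance context cuts total input cost by a quarter. The economic value of the reported 50% figure therefore depends on how much of each prompt is governance context in a given deployment.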

Potential trade-offs and caveats (economic considerations)

  • Upfront engineering and operational costs to deploy and maintain a governance layer, plus potential vendor lock-in effects.
  • Added latency and cost when using full-mode governance routing; organizations must balance speed against depth of governance.
  • Centralized storage of entity memories raises the stakes of a breach; robust security and legal controls are essential.
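
The caveat about full-mode routing latency is the trade-off the paper's Auto mode manages by choosing between the fast and full paths. A minimal sketch follows. It uses a keyword-overlap (Jaccard) score as a stand-in for the fast path's embeddings, keyword heuristics, and HyPE, and a caller-supplied `full_mode` function in place of the LLM analysis. The escalation rule (escalate when no variable clears `min_score`) is an assumption of this sketch, not the paper's rule.

```python
from typing import Callable


def keyword_score(task: str, description: str) -> float:
    """Jaccard overlap of lowercase words: a cheap, no-LLM relevance proxy."""
    a, b = set(task.lower().split()), set(description.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route(
    task: str,
    variables: dict[str, str],
    full_mode: Callable[[str, dict[str, str]], list[str]],
    mode: str = "auto",
    min_score: float = 0.2,
    top_k: int = 2,
) -> tuple[str, list[str]]:
    """Return (path_used, selected governance variable names)."""
    if mode not in ("fast", "full", "auto"):
        raise ValueError(f"unknown mode: {mode}")
    scored = sorted(((keyword_score(task, d), name) for name, d in variables.items()), reverse=True)
    fast = [name for score, name in scored[:top_k] if score >= min_score]
    if mode == "fast" or (mode == "auto" and fast):
        return "fast", fast
    # Full mode, or Auto mode with no confident fast match: pay for the slower, deeper analysis.
    return "full", full_mode(task, variables)
```

In this sketch, tasks with an obvious match never reach the LLM, which is the economic point of tiering. The paper reports ~850 ms for the fast path versus ~2–5 s for the full path.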

Summary conclusion

Governed Memory formalizes and operationalizes a governance-oriented memory layer for multi-agent enterprise AI that appears to materially improve recall, governance precision, privacy isolation, and token efficiency while enabling structured downstream use and closed-loop quality management. For AI economists, this architecture is an example where infrastructure design (governance + shared memory) has direct, measurable economic impacts on cost, productivity, compliance risk, and market structure.

Assessment

Paper Type: descriptive

Evidence Strength: medium. The paper reports controlled experiments (N=250 across five content types), adversarial tests (500 queries), and evaluation on the LoCoMo benchmark, providing multiple quantitative metrics that support the engineering claims. However, results appear proprietary, key dataset and baseline details are missing, statistical uncertainty is not reported, and tests may reflect implementation-specific tuning rather than broadly generalizable effects.

Methods Rigor: medium. The authors run structured experiments (including adversarial checks) and a public benchmark, and quantify recall, routing precision, token savings, leakage, and compliance. However, the methodology omits crucial details (data provenance, selection/sampling, baselines and comparators, model/back-end specifics, evaluation protocols, and statistical tests), limiting reproducibility and independent assessment.

Sample: Controlled evaluation on 250 items spanning five content types, 500 adversarial entity-isolation queries, and performance reporting on the LoCoMo benchmark, plus an in-production deployment at Personize.ai. The abstract does not specify whether the 250 items are real enterprise records, synthetic prompts, or public datasets, nor does it name the LLM/back-end variants used.

Themes: org_design, adoption, human_ai_collab, productivity

Generalizability:
  • Single-vendor/production environment (Personize.ai) may reflect bespoke engineering rather than general enterprise stacks.
  • Undisclosed LLM/back-end and index/storage implementations could materially affect results.
  • Five content types and N=250 are limited relative to enterprise heterogeneity.
  • Adversarial tests and benchmarks may not capture real-world workflow diversity or long-run degradation.
  • Governance policies and schema requirements vary widely across organizations, limiting policy portability.

Claims (16)

  • The paper identifies five structural challenges arising from the memory governance gap: memory silos across agent workflows; governance fragmentation across teams and tools; unstructured memories unusable by downstream systems; redundant context delivery in autonomous multi-step executions; and silent quality degradation without feedback loops.
    Category: Other · Direction: negative · Confidence: medium
    Outcome: presence/identification of five structural governance challenges
    Details: Five structural challenges identified · 0.11
  • The paper presents Governed Memory, a shared memory and governance layer addressing the memory governance gap.
    Category: Other · Direction: positive · Confidence: medium
    Outcome: existence of an architecture called Governed Memory
    Details: Governed Memory architecture described · 0.11
  • Governed Memory implements a dual memory model combining open-set atomic facts with schema-enforced typed properties.
    Category: Other · Direction: positive · Confidence: medium
    Outcome: memory model design: open-set atomic facts + schema-enforced typed properties
    Details: dual memory model (open-set atomic facts + schema-enforced properties) · 0.11
  • Governed Memory uses tiered governance routing with progressive context delivery.
    Category: Other · Direction: positive · Confidence: medium
    Outcome: governance routing strategy (tiered) and context delivery method (progressive)
    Details: tiered governance routing with progressive delivery · 0.11
  • Governed Memory uses reflection-bounded retrieval with entity-scoped isolation.
    Category: Other · Direction: positive · Confidence: medium
    Outcome: retrieval strategy (reflection-bounded) and isolation scope (entity-scoped)
    Details: reflection-bounded retrieval; entity-scoped isolation · 0.11
  • Governed Memory implements a closed-loop schema lifecycle with AI-assisted authoring and automated per-property refinement.
    Category: Other · Direction: positive · Confidence: medium
    Outcome: schema lifecycle process including AI-assisted authoring and per-property refinement
    Details: closed-loop schema lifecycle with AI-assisted authoring · 0.11
  • Controlled experiments were run with N = 250 across five content types to validate the mechanisms.
    Category: Other · Direction: null_result · Confidence: high
    Outcome: experimental sample size and content-type breadth (N=250, 5 content types)
    Details: n=250 · Controlled experiments across 5 content types; N=250 · 0.18
  • The system achieved 99.6% fact recall (with complementary dual-modality coverage) in the controlled experiments.
    Category: Output Quality · Direction: positive · Confidence: high
    Outcome: fact recall (percentage recall of facts)
    Details: n=250 · 99.6% fact recall · 0.18
  • Governance routing precision was 92% in the experiments.
    Category: Output Quality · Direction: positive · Confidence: high
    Outcome: governance routing precision (percentage)
    Details: n=250 · 92% governance routing precision · 0.18
  • Progressive context delivery yielded a 50% token reduction.
    Category: Organizational Efficiency · Direction: positive · Confidence: high
    Outcome: token usage reduction (percentage)
    Details: n=250 · 50% token reduction from progressive context delivery · 0.18
  • There was zero cross-entity leakage across 500 adversarial queries.
    Category: AI Safety and Ethics · Direction: positive · Confidence: high
    Outcome: cross-entity information leakage (count/occurrence across 500 queries)
    Details: n=500 · zero cross-entity leakage across 500 adversarial queries · 0.18
  • Adversarial governance compliance was 100%.
    Category: Governance and Regulation · Direction: positive · Confidence: high
    Outcome: governance compliance under adversarial queries (percentage)
    Details: n=500 · 100% adversarial governance compliance · 0.18
  • Output quality saturates at approximately seven governed memories per entity.
    Category: Output Quality · Direction: null_result · Confidence: medium
    Outcome: output quality as a function of number of governed memories per entity (saturation point ≈ 7)
    Details: n=250 · output quality saturates at ≈7 governed memories per entity · 0.11
  • On the LoCoMo benchmark, the architecture achieves 74.8% overall accuracy.
    Category: Output Quality · Direction: positive · Confidence: high
    Outcome: overall accuracy on the LoCoMo benchmark (percentage)
    Details: 74.8% overall accuracy on LoCoMo benchmark · 0.18
  • The LoCoMo result confirms that governance and schema enforcement impose no retrieval quality penalty.
    Category: Output Quality · Direction: positive · Confidence: medium
    Outcome: inferred retrieval quality impact of governance/schema enforcement (no penalty)
    Details: No retrieval quality penalty inferred from LoCoMo results · 0.11
  • The system is in production at Personize.ai.
    Category: Adoption Rate · Direction: positive · Confidence: medium
    Outcome: deployment status (production at Personize.ai)
    Details: deployed in production at Personize.ai · 0.11

Notes