A stateless memory design gives enterprises the auditability and determinism they need without sacrificing accuracy: Deterministic Projection Memory cuts decision-time LLM calls to one, runs 7–15× faster under tight budgets, and at 20× compression improves factual precision (+0.52) and reasoning coherence (+0.53) over summarization pipelines.
Enterprise deployment of long-horizon decision agents in regulated domains (underwriting, claims adjudication, tax examination) is dominated by retrieval-augmented pipelines despite a decade of increasingly sophisticated stateful memory architectures. We argue this reflects a hidden requirement: regulated deployment is load-bearing on four systems properties (deterministic replay, auditable rationale, multi-tenant isolation, statelessness for horizontal scale), and stateful architectures violate them by construction. We propose Deterministic Projection Memory (DPM): an append-only event log plus one task-conditioned projection at decision time. On ten regulated decisioning cases at three memory budgets, DPM matches summarization-based memory at generous budgets and substantially outperforms it when the budget binds: at a 20x compression ratio, DPM improves factual precision by +0.52 (Cohen's h=1.17, p=0.0014) and reasoning coherence by +0.53 (h=1.13, p=0.0034), paired permutation, n=10. DPM is additionally 7-15x faster at binding budgets, making one LLM call at decision time instead of N. A determinism study of 10 replays per case at temperature zero shows both architectures inherit residual API-level nondeterminism, but the asymmetry is structural: DPM exposes one nondeterministic call; summarization exposes N compounding calls. The audit surface follows the same one-versus-N pattern: DPM logs two LLM calls per decision while summarization logs 83-97 on LongHorizon-Bench. We conclude with TAMS, a practitioner heuristic for architecture selection, and a failure analysis of stateful memory under enterprise operating conditions. The contribution is the argument that statelessness is the load-bearing property explaining enterprise's preference for weaker but replayable retrieval pipelines, and that DPM demonstrates this property is attainable without the decisioning penalty retrieval pays.
Summary
Main Finding
Deterministic Projection Memory (DPM) — an append-only event log plus a single task-conditioned, temperature-zero projection at decision time — preserves the enterprise-friendly systems properties (deterministic replay, auditable rationale, multi-tenant isolation, and statelessness for horizontal scale) while matching or improving decision alignment vs. a strong stateful summarization baseline. At tight memory budgets DPM substantially improves factual precision and reasoning coherence, while also being 7–15× faster (one LLM call vs N incremental calls). The core contribution is a systems argument: statelessness is the load-bearing property driving enterprise adoption, and DPM demonstrates that property can be achieved without sacrificing decision quality on practical regulated tasks that fit within a single projection call.
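The architecture described above can be illustrated with a minimal sketch. This is not the authors' implementation: `EventLog`, `project`, `replay_digest`, and `stub_llm` are hypothetical names, the prompt wording is invented, and the stub stands in for a real temperature-zero model call.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import Callable, Tuple

# Hypothetical type: any function mapping a prompt to text (e.g. an LLM client).
Projector = Callable[[str], str]


@dataclass(frozen=True)
class EventLog:
    """Append-only log: append returns a new log; existing entries never change."""
    events: Tuple[str, ...] = ()

    def append(self, event: str) -> "EventLog":
        return EventLog(self.events + (event,))


def project(log: EventLog, task: str, budget_chars: int, llm: Projector) -> str:
    """Single task-conditioned projection pi(E, T, B) -> M, made at decision time."""
    indexed = "\n".join(f"[{i}] {e}" for i, e in enumerate(log.events))
    prompt = (
        f"Task: {task}\n"
        f"Budget: at most {budget_chars} characters.\n"
        "Emit facts / reasoning / compliance notes, citing event indices.\n"
        f"Events:\n{indexed}"
    )
    memory = llm(prompt)          # the only LLM call on the memory path
    return memory[:budget_chars]  # hard cap so the budget always holds


def replay_digest(log: EventLog, task: str, budget_chars: int, llm: Projector) -> str:
    """Hash of the projection; equal digests across replays mean byte-exact replay."""
    return sha256(project(log, task, budget_chars, llm).encode("utf-8")).hexdigest()


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a model: echoes the indexed event lines."""
    return "\n".join(line for line in prompt.splitlines() if line.startswith("["))


log = EventLog().append("Applicant income verified").append("Credit report pulled")
memory = project(log, "underwriting decision", 200, stub_llm)
```

Because no state is carried between decisions, replaying a decision only requires the log, the task, the budget, and the same model; with a live API the digest comparison would surface any residual drift in that one call.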
Key Points
- Enterprise constraints matter: regulated decision systems require deterministic replay, auditable rationale, multi-tenant isolation, and statelessness for horizontal scaling. Many stateful memory architectures violate these properties by design (they accumulate mutable state via repeated LLM calls).
- DPM architecture: immutable append-only event log E plus a single projection π(E, T, B) → M at decision time. Projection emits structured memory (facts / reasoning / compliance), cites event indices, runs at temperature 0, and is budget-bounded.
- Operational advantage: DPM reduces the nondeterminism/replay surface from N intermediate LLM calls to a single projection call, making byte-exact replay feasible if paired with deterministic inference.
- Empirical outcome:
  - Benchmark: LongHorizon-Bench (10 cases: 5 mortgage, 5 claims; ~26–28k chars, 82–96 events/case).
  - Conditions: Summ-only baseline (incremental summarization after each event) vs. DPM.
  - Budgets: tight=1,338 chars (20× compression), moderate=5,352 (5×), loose=13,381 (2×).
  - Metrics: FRP (factual precision, measured as anchor recovery), RCS (reasoning coherence), EDA (decision accuracy), CRR (compliance reconstruction).
  - Results (paired tests, n=10):
    - Tight budget: FRP +0.515 (p=0.001, Cohen’s h=1.17); RCS +0.533 (p=0.003, h=1.13). EDA and CRR improved (Δ=+0.50 each) with p≈0.065–0.066 (large effect sizes, but short of the conventional p<0.05 threshold).
    - Moderate & loose budgets: no statistically significant difference on the four axes.
  - Speed: DPM is 7–15× faster because it makes one LLM call at decision time instead of many incremental consolidation calls.
  - Determinism study: temperature-zero calls against a live API (Anthropic claude-haiku) show residual API-level nondeterminism (byte drift on the order of single-digit tokens). Structural asymmetry: DPM exposes one nondeterministic call; stateful summarization exposes N compounding calls.
- Scope & limitations:
  - DPM applies to trajectory memory (events within a single decision); it does not replace corpus retrieval/indexing.
  - Single-projection DPM requires the trajectory to fit a single model context window; hierarchical DPM for longer horizons is future work and reintroduces intermediate calls.
  - Byte-exact replay still requires a deterministic inference runtime; DPM only minimizes the practical surface that must be made deterministic.
- Practitioner output: TAMS, a task-property heuristic for selecting between stateless (DPM/RAG) and stateful memory architectures. Choose DPM/stateless when replay, audit, isolation, and scale are primary and the trajectory fits a single projection; choose stateful when an agent must edit memory mid-trajectory or requires richer internal deliberation that a single projection cannot capture.
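The reported budgets and call counts can be cross-checked with a little arithmetic. The sketch below uses only figures from this summary; the per-architecture call breakdown in `llm_calls_logged` (a hypothetical helper) is an assumption that happens to reproduce the reported 83–97 and 2 logged calls, not something the source states.

```python
# Budgets (chars) and compression ratios as reported in the summary.
budgets = {"tight": (1_338, 20), "moderate": (5_352, 5), "loose": (13_381, 2)}

# Implied trajectory length = budget x compression ratio.
implied_source_chars = {name: b * r for name, (b, r) in budgets.items()}


def llm_calls_logged(n_events: int, architecture: str) -> int:
    """ASSUMED breakdown (not stated in the source):
    summarization = one consolidation call per event + one decision call;
    DPM = one projection call + one decision call."""
    if architecture == "summarization":
        return n_events + 1
    if architecture == "dpm":
        return 2
    raise ValueError(f"unknown architecture: {architecture}")


# Event counts per case range from 82 to 96 in the benchmark description.
summ_range = (llm_calls_logged(82, "summarization"), llm_calls_logged(96, "summarization"))
dpm_calls = llm_calls_logged(96, "dpm")
```

All three budgets imply a trajectory of roughly 26.8k characters, consistent with the stated ~26–28k characters per case.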
Data & Methods
- Benchmark: LongHorizon-Bench (regulated decisioning domains: mortgage underwriting under ECOA/Reg B; insurance claims adjudication).
  - 10 cases (5 loan, 5 claim), each ~26–28k characters, 82–96 events.
  - Ground truth constructed by decision-first inversion so all required anchors are derivable.
- Architectures compared:
  - Summ-only (stateful incremental summarization; summary updated after each event).
  - DPM (append-only log, single projection at decision time).
- Memory budgets: tight 1,338 chars (20× compression), moderate 5,352 (5×), loose 13,381 (2×). Under Summ-only these were running-summary caps; under DPM they were target projection lengths.
- Backend: claude-haiku-4-5-20251001 for agents and judges, temperature=0, fixed seed in call stack; judge calls for RCS/CRR used case-specific rubrics.
- Statistics: paired permutation tests (10,000 resamples) paired by case; paired-bootstrap 95% CIs on mean deltas; Cohen’s h for effect sizes on proportion metrics. Four decision-alignment axes evaluated: FRP, RCS, EDA, CRR.
- Determinism experiment: 10 replays per case at temperature zero; measured byte-level drift.
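The statistical procedure named above (paired permutation test, Cohen's h) can be sketched with the standard library alone. This is a generic sign-flip formulation, not the authors' code; the function names are hypothetical and the per-case scores are made-up illustrative values, not data from the paper.

```python
import math
import random
from typing import Sequence


def paired_permutation_p(a: Sequence[float], b: Sequence[float],
                         resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided paired permutation (sign-flip) test on the mean difference."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(resamples):
        # Under the null, each case's difference is equally likely to flip sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (resamples + 1)


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Hypothetical per-case scores (n=10), for illustration only.
dpm_scores = [0.90, 0.80, 0.85, 0.90, 0.75, 0.80, 0.95, 0.85, 0.90, 0.80]
summ_scores = [0.40, 0.30, 0.35, 0.30, 0.25, 0.40, 0.35, 0.30, 0.45, 0.30]

p_value = paired_permutation_p(dpm_scores, summ_scores)
effect = cohens_h(sum(dpm_scores) / 10, sum(summ_scores) / 10)
```

With n=10 there are only 2^10 = 1,024 distinct sign patterns, so the smallest two-sided p-value such a test can produce is about 0.002; the reported p=0.0014 is near that floor.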
Implications for AI Economics
- Compute & API cost reductions:
  - One-shot projection vs. N incremental consolidations cuts memory-path LLM calls per decision from O(N) to O(1), directly lowering API/compute spend. The measured 7–15× speedup implies substantial per-decision resource savings and higher throughput on fixed compute budgets.
- Operational & compliance cost reductions:
  - Structural determinism and auditable rationales simplify regulatory investigations and internal audits (fewer artifacts to log, inspect, or reconstruct). This reduces legal/compliance risk and the engineering overhead of retrofitting audits onto stateful systems.
  - Multi-tenant isolation by construction reduces privacy/leakage risk and lowers costs for data governance and tenant-scoping infrastructure.
- Scaling economics:
  - Statelessness enables elastic horizontal scaling without per-request node affinity or heavyweight shared state coordination. That lowers operational complexity and OPEX for large-scale enterprise deployments (fewer persistent caches, less stateful orchestration).
- Product & market implications:
  - Enterprises will likely favor memory architectures that trade expressive power for operational guarantees when regulated decisions are involved. This explains continued industry preference for RAG-like pipelines despite academic gains in stateful memory accuracy.
  - Research on stateful memory must internalize systems constraints (determinism, auditability, tenancy, scale) to increase enterprise adoption; delivering decision-quality gains alone is not sufficient.
- Trade-offs and investment choices:
  - To obtain bit-exact replay, DPM still requires investing in deterministic inference runtimes (self-hosted weights, deterministic samplers). This upfront capex/engineering cost is now more tractable because only one projection call must be pinned.
  - For tasks requiring in-trajectory memory editing/deliberation or for trajectories exceeding current model context windows, stateful architectures or hierarchical DPM variants may still be necessary; enterprises must evaluate the economic trade-off between increased model/API costs and the business value of additional agent capabilities.
- Recommendation for decision-makers:
  - When compliance, auditability, tenant isolation, and scalable throughput are primary drivers (typical in regulated verticals), adopt stateless projection designs (DPM/RAG) where feasible. This reduces operating costs and regulatory risk with little or no decision-quality penalty in many practical settings.
  - Budget and context-window constraints matter: DPM is especially advantageous when memory budgets are tight (where it outperforms summarization).
Summary takeaways for AI economists: stateless projection architectures materially change the cost-risk profile of deploying long-horizon decision agents in regulated settings by lowering per-decision compute, simplifying regulatory evidence assembly, improving isolation, and enabling simpler horizontal scaling. These operational benefits can explain enterprise adoption patterns and should be included in economic models of memory-augmented agent deployment and in evaluations of new memory research.
Assessment
Claims (10)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Enterprise deployment of long-horizon decision agents in regulated domains (underwriting, claims adjudication, tax examination) is dominated by retrieval-augmented pipelines despite a decade of increasingly sophisticated stateful memory architectures. | Adoption Rate | positive | medium | prevalence of retrieval-augmented pipelines in enterprise deployment | 0.01 |
| Regulated deployment imposes four load-bearing systems properties (deterministic replay, auditable rationale, multi-tenant isolation, statelessness for horizontal scale), and stateful architectures violate them by construction. | Governance And Regulation | negative | high | compatibility of stateful architectures with regulatory/system properties | 0.02 |
| We propose Deterministic Projection Memory (DPM): an append-only event log plus one task-conditioned projection at decision time. | Other | positive | high | architecture design (DPM specification) | 0.02 |
| On ten regulated decisioning cases at three memory budgets, DPM matches summarization-based memory at generous budgets and substantially outperforms it when the budget binds. | Output Quality | positive | high | relative performance (match/outperform) of DPM vs summarization-based memory across memory budgets | n=10; 0.12 |
| At a 20x compression ratio, DPM improves factual precision by +0.52 (Cohen's h=1.17, p=0.0014) compared to summarization-based memory (paired permutation, n=10). | Output Quality | positive | high | factual precision | n=10; +0.52 (Cohen's h=1.17, p=0.0014); 0.12 |
| At a 20x compression ratio, DPM improves reasoning coherence by +0.53 (Cohen's h=1.13, p=0.0034) compared to summarization-based memory (paired permutation, n=10). | Output Quality | positive | high | reasoning coherence | n=10; +0.53 (h=1.13, p=0.0034); 0.12 |
| DPM is additionally 7-15x faster at binding budgets, making one LLM call at decision time instead of N. | Task Completion Time | positive | high | decision-time latency / number of LLM calls | 7-15x faster; one LLM call at decision time instead of N; 0.12 |
| A determinism study of 10 replays per case at temperature zero shows both architectures inherit residual API-level nondeterminism, but DPM exposes one nondeterministic call while summarization exposes N compounding calls. | AI Safety And Ethics | mixed | high | system nondeterminism / number of nondeterministic LLM calls exposed per decision | n=10; DPM: one nondeterministic call, summarization: N compounding calls; 0.12 |
| The audit surface follows the same one-versus-N pattern: DPM logs two LLM calls per decision while summarization logs 83-97 on LongHorizon-Bench. | Governance And Regulation | positive | high | number of LLM calls logged per decision (audit surface) | DPM logs two LLM calls per decision, summarization logs 83-97; 0.12 |
| Statelessness is the load-bearing property explaining enterprises' preference for weaker but replayable retrieval pipelines, and DPM demonstrates this property is attainable without the decisioning penalty retrieval pays. | Governance And Regulation | positive | high | trade-off between stateless architectures and decisioning performance / auditability | n=10; 0.12 |