PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents

AI coding assistants now support a growing share of software work, from quick scripts to production applications. Yet these agents remain largely stateless: each new session re-reads project files, re-derives prior decisions, and - most costly - may repeat debugging attempts that already failed. Reconstructing this context can consume an estimated 5,000-20,000 tokens per session; the bottleneck is often not model capability but missing project memory. We present projectmem, an open-source, local-first memory and judgment layer for AI coding agents. projectmem records development as an append-only, plain-text event log of typed events - issues, attempts, fixes, decisions, and notes - and deterministically projects that log into compact, AI-readable summaries served through the Model Context Protocol (MCP). Beyond storage, projectmem adds a deterministic pre-action gate that warns an agent before it repeats a previously failed fix or edits a known-fragile file. We frame this as Memory-as-Governance: memory that does not merely answer the agent but acts on its next action. The system runs fully offline with no telemetry; its immutable log also serves as a provenance trail for reproducible, auditable AI-assisted development. projectmem ships as a three-dependency Python package (14 MCP tools, 19 CLI commands, 37 automated tests) and is evaluated through a two-month self-study across 10 projects comprising 207 logged events. Source code: https://github.com/riponcm/projectmem.

Summary

Main Finding

projectmem is a local-first, event-sourced memory and judgment layer for AI coding agents. It aims to reduce repeated debugging effort and the cost of re-establishing context by (1) recording development as an immutable plain-text event log, (2) projecting deterministic, compact summaries for agents, and (3) providing a deterministic pre-action judgment gate that warns agents before they repeat previously failed fixes or edit fragile files. The authors argue that this Memory-as-Governance design lowers wasted tokens and developer time, preserves privacy and auditability, and enables cross-project lessons without cloud telemetry.

Key Points

Core idea: an append-only, human-readable event log (JSONL + Markdown) of typed events (issue / attempt / fix / decision / note) that is deterministically projected into compact summaries (summary.md, PROJECT_MAP.md) agents read.
Judgment gate: precheck_file(path) performs a deterministic lookup into the log to warn before an action that would repeat past failures or touch high-churn/issue files. This converts passive memory into actionable governance (Memory-as-Governance).
Local-first and privacy-preserving: no default network telemetry, secret redaction on write, fully offline operation; memory is git-native, grep-able, diff-able, auditable.
No vector DB / no embeddings: avoids nondeterministic retrieval costs and recurring read/embedding overheads; projection is deterministic and rebuildable.
MCP-native and tool-agnostic: exposes 14 typed MCP tools (9 read, 5 write) so multiple MCP clients can consume identical memory; also provides a Markdown bridge for non-MCP tools.
Cross-project memory: library-level lessons (“gotchas”) can be promoted to a machine-wide global store and surfaced in projects that match the same detected stack, with source attribution; the store remains local only.
Operational tooling: repository backfill from git history, git hooks and optional file-churn watcher for automatic capture, token-budgeted context assembly, ROI/token-savings estimator, visualization dashboard.
Implementation footprint: three-runtime-dependency Python package, pip-installable, ~5 MB, 19 CLI commands, 14 MCP tools, 37 automated tests.
Safety/usability features: secret redaction by default (patterns for common credential tokens), write tools return readable errors and suppress stdout to avoid protocol corruption.
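
The core idea and the redaction-on-write behavior listed above can be illustrated with a minimal sketch. It is not projectmem's actual code: the field names (ts, type, location, text), the single redaction pattern, the function names, and the summary layout are all assumptions made for illustration. Only the event types and the events.jsonl file name come from the description above.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

# Event types named in the summary; everything else here is illustrative.
EVENT_TYPES = {"issue", "attempt", "fix", "decision", "note"}

# Hypothetical redaction pattern: only catches "key=value" style assignments.
SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+")


def redact(text: str) -> str:
    """Replace the value of anything that looks like a credential assignment."""
    return SECRET_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def append_event(log_path: Path, event_type: str, text: str,
                 location: Optional[str] = None,
                 timestamp: Optional[str] = None) -> dict:
    """Append one typed event as a JSON line; existing lines are never rewritten."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    event = {
        "ts": timestamp or datetime.now(timezone.utc).isoformat(),
        "type": event_type,
        "location": location,
        "text": redact(text),  # redaction happens before anything hits disk
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return event


def read_events(log_path: Path) -> List[dict]:
    """Load every event from the log in the order it was written."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def project_summary(events: List[dict]) -> str:
    """Pure function of the log: the same events always yield the same Markdown."""
    lines = ["# Project summary", ""]
    for event_type in sorted(EVENT_TYPES):
        matching = [e for e in events if e["type"] == event_type]
        if not matching:
            continue
        lines.append(f"## {event_type} ({len(matching)})")
        for e in matching:
            where = f" [{e['location']}]" if e["location"] else ""
            lines.append(f"- {e['ts']}{where}: {e['text']}")
        lines.append("")
    return "\n".join(lines)
```

Because the projection reads nothing but the log, deleting the generated summary and rebuilding it gives an identical file, which is the property that makes the memory rebuildable and diff-able.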

Data & Methods

System design and implementation:
- Event schema: typed events (issue, attempt, fix, decision, note) with timestamp, optional location, free text; events appended to events.jsonl in .projectmem/.
- Deterministic projections: regenerate summary.md and PROJECT_MAP.md from the log; these are the agent-facing artifacts.
- Judgment gate: precheck_file reads only the log (no model call) and returns deterministic warnings derived from failed attempts, open issues, and churn.
- MCP server: FastMCP-based server exposing typed read/write tools; universal Markdown bridge and pjm wrap for non-MCP clients.
- Cross-project/global store: ~/.projectmem/global for machine-wide gotchas keyed by stack detection (package manifests).
- Security: default-on credential redaction with tests to avoid false positives; no cloud sync.
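
The judgment gate described in this list can be sketched as a plain lookup over logged events. This is a hedged illustration, not the real precheck_file: the function name precheck_file_sketch, the "outcome" field used to mark failed attempts, the rule that a later fix closes earlier issues on the same path, and the use of logged-event counts as a stand-in for file churn are all assumptions.

```python
from typing import Dict, List


def precheck_file_sketch(events: List[Dict], path: str,
                         churn_threshold: int = 3) -> List[str]:
    """Return warnings for `path` using only the event log (no model call)."""
    touching = [e for e in events if e.get("location") == path]
    warnings: List[str] = []

    # 1. Attempts that were recorded as failed (assumed "outcome" field).
    for e in touching:
        if e["type"] == "attempt" and e.get("outcome") == "failed":
            warnings.append(f"previously failed attempt: {e['text']}")

    # 2. Issues still open: assumed closed by any later fix on the same path.
    open_issues: List[Dict] = []
    for e in touching:
        if e["type"] == "issue":
            open_issues.append(e)
        elif e["type"] == "fix":
            open_issues.clear()
    for e in open_issues:
        warnings.append(f"open issue: {e['text']}")

    # 3. Churn proxy: number of logged events that touch the file.
    if len(touching) >= churn_threshold:
        warnings.append(f"high churn: {len(touching)} logged events touch {path}")

    return warnings
```

A short usage example under the same assumptions:

```python
log = [
    {"type": "issue", "location": "db.py", "text": "connection pool leak"},
    {"type": "attempt", "location": "db.py", "text": "raise pool size",
     "outcome": "failed"},
]
for warning in precheck_file_sketch(log, "db.py"):
    print(warning)
```

Since the function depends only on its inputs, the same log and path always produce the same warnings, which is what distinguishes this kind of gate from similarity-based retrieval.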
Evaluation:
- Implementation release: open-source repository with the package and code.
- Usage study: two-month self-study across 10 projects, producing 207 logged events (issues/attempts/fixes/etc.).
- Empirical-cost framing: the authors estimate that re-establishing context previously consumed about 5,000–20,000 tokens per session; they report that a pre-action warning can save roughly 30 minutes of re-debugging per prevented repeat in illustrative cases.
- Operational metrics in package: 14 MCP tools, 19 CLI commands, 37 automated tests; backfill and automated capture mechanisms tested in instrumented workflows.
- Token-budget tooling: get_context / pjm wrap produce token-bounded context summaries for non-MCP clients; pjm score reports estimated hours/tokens/dollars saved (machine-readable).
Limitations of evaluation:
- The study is a self-study (single-operator, ten projects) rather than a randomized controlled trial; reported token and time savings are estimated rather than measured at scale across teams.
- No comparative benchmark vs. vector-store memories on task-accuracy benchmarks; the evaluation emphasizes usability, determinism, and anecdotal ROI.
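
The token-bounded context assembly mentioned under Evaluation can be pictured with a small sketch. It does not reproduce get_context or pjm wrap: the function names, the four-characters-per-token heuristic, and the newest-first selection policy are assumptions chosen only to show how a fixed budget might be enforced deterministically.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude size estimate (assumption): about four characters per token."""
    return max(1, (len(text) + 3) // 4)


def assemble_context(snippets: List[str], budget: int) -> str:
    """Keep the newest whole snippets that fit the budget, in original order.

    `snippets` is assumed to be ordered oldest to newest. Separator
    newlines are not counted against the budget in this sketch.
    """
    kept: List[str] = []
    used = 0
    for snippet in reversed(snippets):  # walk newest first
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            break  # stop at the first snippet that does not fit
        kept.append(snippet)
        used += cost
    return "\n".join(reversed(kept))  # restore chronological order
```

A real implementation would likely use the target model's tokenizer rather than a character heuristic; the point here is only that selection under a budget needs no retrieval model.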

Implications for AI Economics

Direct cost savings on model inference and API spend:
- By avoiding re-submission of large code contexts each session (authors estimate 5k–20k tokens per session), projectmem reduces API token consumption for cloud-hosted LLMs. Deterministic local projections plus token-budgeted context injection shrink per-turn token usage, lowering per-session inference costs for pay-per-token models.
- The deterministic pre-action gate prevents wasted cycles of failed fixes, saving developer time and associated billable hours or opportunity cost; even modest per-warning time savings (e.g., ~30 minutes) compound across teams and projects.
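
The token-cost argument above reduces to simple arithmetic, sketched here. Only the 5,000–20,000 token range comes from the authors' estimate; the session count and the per-token price are hypothetical placeholders, and the result is an upper bound that assumes the entire reconstruction cost is avoided.

```python
def reconstruction_cost_usd(tokens_per_session: int, sessions: int,
                            usd_per_million_tokens: float) -> float:
    """Cost of re-sending context: tokens x sessions x unit price."""
    return tokens_per_session * sessions * usd_per_million_tokens / 1_000_000


# Token range from the authors' estimate; the other two inputs are hypothetical.
SESSIONS = 100                # hypothetical number of agent sessions
USD_PER_MILLION_TOKENS = 3.0  # hypothetical input-token price

low = reconstruction_cost_usd(5_000, SESSIONS, USD_PER_MILLION_TOKENS)
high = reconstruction_cost_usd(20_000, SESSIONS, USD_PER_MILLION_TOKENS)
print(f"Upper bound on avoidable spend: ${low:.2f} to ${high:.2f}")
```

Under these placeholder inputs the token savings are small in absolute terms, which suggests that the developer-time savings in the next point may matter more than API spend for many teams.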
Reduced compute and bandwidth externalities:
- Local-first design shifts work from repeated cloud calls to local reads/projections, decreasing cloud compute demand and network bandwidth for enterprises that otherwise rely on hosted memory or retrievers.
Vendor lock-in and pricing leverage:
- A locally-hosted, no-telemetry memory layer reduces dependence on hosted memory providers (vector DB-as-a-service or managed memory), allowing firms to avoid recurring service fees and retain negotiating leverage over hosted LLM providers.
Auditability, compliance, and liability:
- Immutable, git-native logs create auditable trails of agent actions and decision provenance. This can reduce compliance costs in regulated industries (finance, healthcare, critical infrastructure) where traceability of automation decisions matters, and reduce liability and post-incident investigation costs.
Product-market and business model effects:
- Memory-as-Governance creates a distinct product space from retrieval-as-a-service: vendors could compete on determinism, governance rules, and local privacy guarantees rather than just retrieval accuracy. This opens both open-source and paid enterprise offerings (local appliances, hybrid sync for cross-team sharing) with differentiated pricing.
- Cross-project, machine-local gotchas form a non-cloud shared knowledge asset: firms may value internal aggregation (improved on-boarding, fewer repeated library pitfalls) and pay for curated, private memory layers or tooling integrations.
Labor and organizational effects:
- Increased agent effectiveness and fewer repeated failures likely raise per-developer productivity, shifting the skill mix toward higher-level design/coordination tasks. This may reduce marginal demand for routine debugging labor but augment demand for roles that build and maintain memory/governance infrastructure.
- Memory-as-Governance can change workflow incentives: deterministic gates reduce low-value repetition but could create over-reliance on prior judgments or false negatives if logs are incomplete. Organizations will need governance over what gets logged/promoted to avoid stale or over-broad restrictions.
Distributional considerations and externalities:
- Smaller teams or individual developers benefit from free/open local tooling that reduces token costs; large orgs may capture more value via curated, cross-project gotchas. This can widen productivity gaps unless community-shared memory artifacts emerge.
- If many firms adopt local memory layers, aggregate cloud inference demand could fall, affecting LLM providers’ revenue models and encouraging providers to monetize value-added managed memory, integrations, or fine-tuning services.
Risks and possible negative economic effects:
- Maintenance costs: operating projectmem (backfills, guardrail tuning, redaction maintenance) entails engineering overhead. If upkeep is high, net gains could be smaller for some teams.
- Moral hazard: deterministic warnings may discourage exploration of fixes that are contextually different; governance rules need careful scope and review to avoid stifling effective novel fixes.
- Capitalization of memory: firms might begin to treat internal memories as proprietary assets, raising switching costs and potential anti-competitive effects if memory artifacts are export-controlled or vendor-locked.
Policy and regulatory implications:
- Immutable, auditable logs align with regulatory demand for explainability of automated development in safety-critical domains. Conversely, long-lived local logs could become a compliance risk if they leak, so the cost of maintaining redaction and access control matters.
- Deterministic, local governance may be favored in regulated industries over opaque cloud-based safety models; this could shape procurement and compliance standards.

Overall, projectmem suggests that investment in deterministic, local, governance-oriented memory infrastructure can yield token and labor-cost savings (so far estimated rather than measured at scale), alter vendor economics by reducing reliance on cloud retrieval services, and create new organizational assets (auditable memory) with both productivity and regulatory value. It also introduces maintenance and governance trade-offs that organizations must manage.

Assessment

Paper Type: descriptive

Evidence Strength: low. Evaluation is a small, non-randomized two-month self-study across 10 projects with 207 logged events and no control group, objective productivity metrics, or external validation, so claims about effectiveness are provisional.

Methods Rigor: low. Design and implementation appear engineering-solid (tests, CLI, MCP integration), but the empirical assessment lacks rigor: small N, likely author-led self-study, no pre-registered protocol, no baseline/comparison, and limited quantitative outcome measures.

Sample: A developer-facing, open-source, local-first Python package (three dependencies) implementing an append-only, plain-text project event log and deterministic projection into AI-readable summaries via the Model Context Protocol; evaluated in a two-month self-study across 10 projects with 207 logged events (authors provide code, 14 MCP tools, 19 CLI commands, 37 tests), run fully offline with no telemetry; likely author/early-adopter users rather than a representative user sample.

Themes: human_ai_collab, productivity, governance

Generalizability:
- Small, non-random self-selected project sample (10 projects) limits external validity
- Short duration (two months) may not capture long-term maintenance or scaling issues
- Unknown diversity of project types, languages, team sizes, and domain complexity
- Likely author or early-adopter bias (no independent users or organizations)
- No evaluation in collaborative/multi-developer settings or production-scale codebases

Claims (9)

Claim	Category	Direction	Outcome	Confidence & Evidence	Details
Reconstructing this context can consume an estimated 5,000-20,000 tokens per session.	Task Completion Time	negative	context_size_in_tokens_per_session	Reading fidelity: high; study strength: low	5,000-20,000 tokens per session 0.09
The bottleneck is often not model capability but missing project memory.	Developer Productivity	negative	primary_bottleneck_for_ai_coding_agents	Reading fidelity: high; study strength: low	0.09
We present projectmem, an open-source, local-first memory and judgment layer for AI coding agents.	Other	positive	system_availability_and_design	Reading fidelity: high; study strength: medium	0.18
projectmem records development as an append-only, plain-text event log of typed events (issues, attempts, fixes, decisions, and notes) and deterministically projects that log into compact, AI-readable summaries served through the Model Context Protocol (MCP).	Other	positive	memory_representation_and_summary_serving	Reading fidelity: high; study strength: high	0.3
projectmem adds a deterministic pre-action gate that warns an agent before it repeats a previously failed fix or edits a known-fragile file.	Error Rate	positive	pre-action_warnings_to_prevent_repeated_failures	Reading fidelity: high; study strength: medium	0.18
We frame this as Memory-as-Governance: memory that does not merely answer the agent but acts on its next action.	Governance And Regulation	positive	memory_behaviour_as_governance_mechanism	Reading fidelity: high; study strength: speculative	0.03
The system runs fully offline with no telemetry; its immutable log also serves as a provenance trail for reproducible, auditable AI-assisted development.	Regulatory Compliance	positive	privacy_and_provenance_for_auditing	Reading fidelity: high; study strength: medium	0.18
projectmem ships as a three-dependency Python package (14 MCP tools, 19 CLI commands, 37 automated tests).	Other	positive	software_package_composition	Reading fidelity: high; study strength: high	three-dependency Python package (14 MCP tools, 19 CLI commands, 37 automated tests) 0.3
projectmem is evaluated through a two-month self-study across 10 projects comprising 207 logged events.	Other	positive	evaluation_scope_and_sample	Reading fidelity: high; study strength: medium	n=10 two-month self-study across 10 projects comprising 207 logged events 0.18

A local, append-only project memory nudges AI coding assistants away from repeating past failures and preserves an auditable provenance trail; a two-month self-study over 10 projects (207 events) demonstrates feasibility but does not yet provide rigorous evidence of productivity gains.