Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions. We demonstrate six attack scenarios against a production 42-million-node code knowledge graph, providing the first empirical demonstration of knowledge graph poisoning against a production-scale agentic system, distinct from CTI embedding poisoning. Primary evaluation uses real SDK tool-use across nine models from three providers (N=30 per model), where models autonomously invoke a graph query tool and reason from results. The result is unambiguous: every tested model trusts poisoned data at 100% at moderate attacker sophistication (L2), with 269 valid trials (of 270) accepting fabricated security claims under directed queries. Under open-ended prompts, trust drops to 3-55%, confirming prompt framing as a confound; we report both conditions. An attacker sophistication gradient reveals discrete break points, a minimum skill at which trust flips from 0% to 100%, reframing the attack as a question not of whether but of how much. A controlled delivery-mode comparison shows that inline evaluation produces false negatives: GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use, demonstrating that delivery mode is a first-order confound. We evaluate five defences; read-only access control eliminates the direct mutation vector, while the remaining four are partial and model-dependent. Analysis of four additional platforms suggests the attack may generalise across the knowledge-graph ecosystem.

Summary

Main Finding

Oracle Poisoning is a practical, high-impact attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime (via MCP/tool-use). Because agents treat tool responses as ground truth, they reason correctly over false facts and produce wrong but internally consistent conclusions. The authors demonstrate this on a production 42-million-node code knowledge graph and show that, under directed queries and moderate attacker sophistication (L2), all nine tested models accepted the poisoned facts in 100% of valid trials. Read-only access control eliminates the direct mutation vector; the other mitigations are partial and model-dependent.

Key Points

Definition: Oracle Poisoning = modification of a queryable knowledge graph (nodes, edges, properties) to cause AI agents to draw incorrect conclusions through correct reasoning. Distinct from prompt injection, RAG/embedding poisoning, and tool metadata poisoning.
Threat model: attacker can (with write access) create nodes/edges or modify properties; does not need to compromise the model, MCP server, or tool definitions. Insider, credential theft, or ingestion pipeline compromise are realistic vectors.
Preconditions for vulnerability (P1–P5): a structured data store queried by agents; write access via shared creds/no RBAC; no per-entry provenance; agents treat results as ground truth; no integrity verification against authoritative sources.
Attack variants: node creation (fake packages, fake call-graph sanitizers, fake telemetry routes, vulnerable code patterns) and property modification on existing nodes (zero-node injection).
Scenarios demonstrated (six): Fork-for-a-Package; Transitive Dependency Chain; Call-Graph Security Evasion; Telemetry Rerouting; Code-Generation Poisoning; Property Modification.
Empirical results:
- Production target: Neo4j 5.x graph with ~42M nodes.
- Primary evaluation via real SDK agentic tool-use (MCP v1.0) across nine models from three providers (OpenAI, Anthropic, Google); N = 30 per model per condition.
- Under directed queries at moderate attacker sophistication (L2), models accepted poisoned claims in 269 of 269 valid trials (100%; one of the 270 trials was excluded as invalid), i.e., universal trust in poisoned results under these conditions.
- Under open-ended prompts, trust dropped (3–55%), demonstrating prompt framing is a confound.
- Delivery-mode confound: identical poisoned content presented inline vs via tool-use produced different outcomes (example: GPT-5.1 trusted 0% inline but 100% when data came through agentic tool-use), so the MCP/tool trust channel matters.
- Property-modification attacks (no new nodes) achieved 96.6–100% acceptance across models; minimum budget often 1–2 node/edge modifications per objective.
Defences tested:
- Read-only access control: eliminates the direct mutation vector (most effective).
- Multi-tool cross-verification: reduces blind trust from 100% to 0–25% by enabling contradiction detection across sources (partial).
- System prompt hardening: no measurable effect.
- Blind “devil’s advocate” checks: operationally useless (catch rate equals false-positive rate).
- Defence effectiveness validated under both inline and real tool-use.
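The multi-tool cross-verification idea above can be sketched as a small agreement check over claims returned by different tools. This is a minimal illustration, not the paper's implementation: the `Claim` fields, the tool names in the usage note, and the `min_sources` threshold are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    source: str     # hypothetical tool name, e.g. "graph" or "static_analysis"
    subject: str    # entity the claim is about
    attribute: str  # property being asserted
    value: str      # asserted value

def cross_verify(claims, min_sources=2):
    """Label each (subject, attribute) pair as conflict, unverified, or consistent."""
    grouped = {}
    for c in claims:
        # Keep one value per source for each (subject, attribute) pair.
        grouped.setdefault((c.subject, c.attribute), {})[c.source] = c.value
    report = {}
    for key, by_source in grouped.items():
        if len(set(by_source.values())) > 1:
            report[key] = "conflict"      # sources disagree: do not trust blindly
        elif len(by_source) < min_sources:
            report[key] = "unverified"    # only one source has spoken
        else:
            report[key] = "consistent"    # independent sources agree
    return report
```

For example, if a hypothetical `graph` tool reports that `parse_input` is sanitized while a hypothetical `static_analysis` tool reports that it is not, the pair `("parse_input", "sanitized")` is labelled `conflict`. Agreement between sources does not prove correctness, which fits the summary's description of this defence as partial.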
Generalisability: structural analysis of other code-intel platforms suggests the attack may generalise across the ecosystem when the preconditions hold.
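A structural check against the five preconditions listed under Key Points might look like the following sketch. The field names are paraphrases invented for illustration, and the rule that a deployment is exposed only when all five hold is an assumption drawn from the phrase "when the preconditions hold", not a procedure confirmed by the paper.

```python
from dataclasses import dataclass, fields

@dataclass
class Deployment:
    # Hypothetical field names; each mirrors one precondition from the summary.
    agents_query_structured_store: bool    # P1
    shared_or_unscoped_write_access: bool  # P2
    lacks_per_entry_provenance: bool       # P3
    agents_treat_results_as_truth: bool    # P4
    lacks_integrity_verification: bool     # P5

def unmet_preconditions(d):
    """Return the names of preconditions that do NOT hold for this deployment."""
    return [f.name for f in fields(d) if not getattr(d, f.name)]

def exposed(d):
    """Assumption: exposure requires every precondition to hold."""
    return not unmet_preconditions(d)
```

Under this reading, breaking any single precondition (for instance, replacing shared credentials with per-user RBAC) would move a deployment out of the exposed set, and `unmet_preconditions` shows which control is doing that work.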

Data & Methods

Target platform: production-scale code knowledge graph (Neo4j 5.x) with ~42 million nodes, many call/dependency/telemetry edges; MCP v1.0 tool integrations used by agents.
Experimental procedure:
- Reconnaissance queries to learn schema/naming conventions and likely agent queries.
- Crafting malicious nodes/edges/properties to fit conventions and schema.
- Injection via authorised write paths (CREATE/SET Cypher statements) through the existing MCP-connected pipeline.
- Verification queries to ensure poisoned data would be returned by standard agent queries.
- Clean-up for created nodes (19 nodes created across scenarios 1–5, all cleaned; 0 remaining).
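Because the injection step depends on CREATE/SET statements reaching the graph, a tool-side guard could refuse queries that contain write clauses before they are forwarded. The sketch below is a deliberately conservative keyword filter and only an illustration: the keyword list is an assumption and not exhaustive, it ignores backtick-quoted identifiers, and it is no substitute for enforcing read-only roles in the database itself.

```python
import re

# Assumed list of Cypher clauses treated as potentially mutating.
# CALL is included because procedures may write; this also blocks harmless calls.
WRITE_KEYWORDS = re.compile(
    r"\b(CREATE|MERGE|SET|DELETE|REMOVE|DROP|FOREACH|CALL|LOAD\s+CSV)\b",
    re.IGNORECASE,
)

# Matches single- or double-quoted string literals, including escaped quotes.
STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'|\"(?:\\.|[^\"\\])*\"")

def is_read_only(query):
    """Return True if no write keyword appears outside string literals."""
    # Blank out string contents so text such as 'please SET this' is not flagged.
    stripped = STRING_LITERAL.sub("''", query)
    return WRITE_KEYWORDS.search(stripped) is None
```

With hypothetical labels and properties, `is_read_only("MATCH (p:Package) RETURN p")` passes, while `is_read_only("MATCH (f:Function) SET f.sanitized = true")` is rejected. The filter errs toward false positives (a property literally named `set` would be blocked), which seems the safer failure direction for a guard of this kind.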
Evaluation:
- Primary: real SDK agentic tool-use; models decide to call the graph-query tool and reason from results.
- Controlled comparisons: inline text delivery of identical poisoned content vs tool-use delivery.
- Models: nine models from OpenAI, Anthropic, and Google (examples mentioned in paper: GPT-5-mini, GPT-5.1, Claude Sonnet 4.6, Gemini 2.5 Pro); N=30 per model per condition, reporting 95% Clopper–Pearson CIs.
- Metrics: whether model/agent accepted the fabricated claim (binary), and rate across trials.
- Additional platform analysis: structural checks against four other code-intel platforms using the five preconditions.
Key quantitative results: 269/269 valid directed-query trials accepted fabricated security claims at moderate attacker sophistication (L2); acceptance under open-ended prompts ranged from 3% to 55%; property-modification attacks reached 96.6–100% acceptance; the minimum poisoning budget was often 1–2 modifications per objective.
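The binary acceptance metric and the 95% Clopper–Pearson intervals mentioned above can be reproduced with the standard library. The sketch below computes the exact interval by bisecting on the binomial CDF; the function names are illustrative, and it is one common way to compute the interval, not necessarily the authors' code.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _solve_decreasing(f, target):
    """Find p in [0, 1] with f(p) = target, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials."""
    # Lower bound solves P(X >= x | p) = alpha/2, i.e. cdf(x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper
```

When every trial accepts the claim, the lower bound has the closed form (alpha/2)^(1/n). For 30 of 30 acceptances that is roughly 0.884, so even a 100% observed rate on a single model is compatible with a true acceptance rate somewhat below 90%. Pooling all 269 valid trials tightens the lower bound to roughly 0.986.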

Implications for AI Economics

Low-cost, high-impact attack vector with large externalities:
- Small investments by attackers (1–2 node edits) can induce widespread incorrect outputs for many developers, magnifying downstream costs (security incidents, faulty deployments, remediation).
- This creates negative externalities across organizations relying on shared knowledge graphs — the attacker cost is private and small, but social cost is large.
Value of trust and reliability in knowledge-graph services:
- Knowledge graphs used as oracles become a critical economic asset whose integrity directly affects productivity and risk exposure.
- Providers offering "secure" knowledge-graph-as-a-service can command premium pricing or SLAs (availability, integrity, provenance guarantees). Certification/attestation of graph integrity becomes monetizable.
Market incentives and misaligned risk:
- Many deployments use shared credentials and weak RBAC for operational convenience; this reduces short-term cost/complexity but materially increases systemic risk.
- Firms balancing productivity gains from agentic tooling against security costs may underinvest in integrity controls absent regulation or liability exposure.
Insurance, liability, and regulation:
- The demonstrable ease and impact of Oracle Poisoning may push cyber-insurers to require graph integrity controls as underwriting conditions (RBAC, provenance, read-only modes).
- Liability for harms caused by poisoned oracles will influence contractual terms between platform providers, customers, and third-party integrators.
- Regulators concerned with software supply-chain security and critical-infrastructure integrity may require provenance, audit logging, and access-control baselines for KG-backed agentic systems.
Cost–benefit trade-offs for mitigation:
- Read-only access is highly effective but may degrade agent utility by preventing write-enabled workflows. Economic decisions must weigh productivity losses against risk reduction.
- Multi-source verification (cross-tool checks) helps but is partial and increases compute/latency costs; might spawn a market for "verification-as-a-service" products that cross-check KG responses against other sources.
- Provenance and cryptographic attestation (tamper-evident logs, signed snapshots) can be costly to implement at scale but reduce risk and may become standard for high-assurance customers.
Product and service opportunities:
- New security products: real-time KG-integrity monitors, provenance stores, multi-source verification layers, and agent-side safety toolkits designed to detect inconsistencies across oracles.
- Auditing and certification services for knowledge-graph deployments.
- Liability/insurance products explicitly covering KG-poisoning incidents, with premiums tied to implemented mitigations.
Incentives for platform design:
- Architectures that separate read vs write paths, enforce per-user RBAC for tool integrations, and expose provenance to agents (or force agents to verify) reduce susceptibility; vendors may adopt these as premium features.
- Providers of agentic tool frameworks (MCP-like standards) will face pressure to specify and standardize integrity and provenance mechanisms; compliance could become a competitive differentiator.
Impacts on developer productivity and costs:
- Short-term productivity gains from agentic assistance may be offset by increased risk of propagating vulnerabilities or incorrect dependency advice, leading to rework costs and security incidents.
- Organizations may internalize higher operational costs (controls, audits, slower/deferred agent capabilities) to obtain acceptable risk levels.
Research and investment signals:
- Demand for research into runtime integrity checks, economical provenance metadata, and agent strategies for uncertainty-aware reasoning (e.g., probabilistic weighting of tool responses) will grow.
- Venture and corporate R&D investment likely to flow toward verification, attestation, and tooling that enables safe agent–KG interactions.
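The tamper-evident logs mentioned under the cost–benefit trade-offs can be illustrated with a hash chain over graph mutations. This is a minimal sketch under stated assumptions: the record layout and function names are hypothetical, entries are not signed, and the chain only helps if the latest hash is also stored somewhere an attacker with graph write access cannot reach.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash, record):
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append(log, record):
    """Append a mutation record, linking it to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})

def verify(log):
    """Return the index of the first inconsistent entry, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return i
        prev = entry["hash"]
    return None
```

If each authorised write to the graph is appended as a record, for example `{"op": "SET", "node": "fn:42", "prop": "sanitized", "value": True}` (a made-up record), later edits to the logged history change the recomputed hashes and `verify` reports where the chain breaks. The per-entry cost is one SHA-256 computation, which gives a rough sense of why such provenance can be economical compared with full cryptographic attestation.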

Limitations and caveats relevant to economic assessment:
- The attack requires write capability (insider or compromise); defensive investments that eliminate or greatly reduce write exposure (per-user RBAC, separate ingestion channels) would materially raise attacker costs.
- Defence efficacy is model-dependent; heterogeneous agent/model deployments change risk calculus.
- Delivery-mode confounds imply that attacks exploit institutional trust channels; redesigning agent trust models (agents that verify rather than accept tool outputs) shifts costs back toward the agent side.

Overall economic takeaway

Oracle Poisoning converts a modest, low-cost attacker capability into outsized economic and security risk because many organizations and agent frameworks currently treat graph query results as unquestioned ground truth. This creates a market imperative for integrity guarantees, provenance services, insurance requirements, and potentially regulation. Providers and customers will need to weigh the productivity benefits of agentic tool integration against the systemic risks of mutable in-flight knowledge oracles, and economic incentives (pricing, SLAs, liability, certification) are likely to drive rapid adoption of mitigation services and new products.

Assessment

Paper Type: descriptive

Evidence Strength: high. Large-scale, production-grade testbed (42M-node code KG), 270 guided trials (269 positive attacks) across nine models and three providers, systematic attacker-sophistication gradient, and checks for confounds (prompt framing, delivery mode), yielding a clear and replicable pattern of effects; however, findings are limited to the tested KG domain and model population.

Methods Rigor: high. Experimental design manipulated key causal levers (poisoning presence/skill, delivery mode, prompt type) and measured agent behaviour under realistic tool-use; sample sizes per condition are substantial and multiple defenses and platforms were evaluated; potential limitations include scope (single KG domain), possible selection choices in scenarios and models, and limited disclosure here of pre-registration or blinding.

Sample: A production 42-million-node code knowledge graph used by a deployed agentic system; nine LLM models from three providers were tested (N=30 trials per model, total 270 directed trials, plus open-ended prompt trials and tests on four additional platforms); attacker skill levels (an explicit gradient up to L2), multiple delivery modes (inline, simulated agentic tool-use, real agentic tool-use), and five defenses were evaluated.

Themes: governance, adoption

Identification: Controlled lab-style experiments that manipulate the presence and form of knowledge-graph poisoning (attacker sophistication levels, delivery mode, and prompt framing) and then observe model behaviour when agents autonomously query the production 42M-node code knowledge graph; comparisons across nine models from three providers, directed vs open-ended prompts, and multiple delivery modes provide variation to attribute changes in acceptance/trust to the poisoning intervention.

Generalizability:
- Single domain: a code-focused knowledge graph; results may differ for non-code or heterogeneous knowledge graphs (medical, financial, encyclopedic).
- Limited model/provider set: nine models from three providers; other models or future model updates may behave differently.
- Specific tool-use/protocols: results depend on the tool interface and agent orchestration; different agent frameworks could mitigate or exacerbate effects.
- Attack feasibility depends on access: real-world ability to poison a KG varies by organizational controls and deployment.
- Prompt framing sensitivity: open-ended vs directed queries strongly affect outcomes, so generalization to naturalistic user queries is uncertain.

Claims (11)

Claim	Category	Direction	Confidence	Outcome	Details
We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning.	Decision Quality	negative	high	agent reasoning correctness when querying corrupted knowledge graphs	0.03
Oracle Poisoning manipulates the data agents reason over, not their instructions, distinguishing it from prompt injection.	AI Safety And Ethics	neutral	high	mechanism of attack (data-layer vs instruction-layer manipulation)	0.03
We demonstrate six attack scenarios against a production 42-million-node code knowledge graph, providing the first empirical demonstration of knowledge graph poisoning against a production-scale agentic system.	AI Safety And Ethics	negative	high	successful execution of poisoning attacks on a production-scale knowledge graph	n=42000000 six attack scenarios against a 42-million-node graph 0.18
Primary evaluation uses real SDK tool-use across nine models from three providers (N=30 per model), where models autonomously invoke a graph query tool and reason from results.	Research Productivity	neutral	high	experimental coverage and evaluation methodology (models invoked graph query tool autonomously)	n=270 9 models from 3 providers, N=30 per model (270 trials total) 0.3
Every tested model trusts poisoned data at 100% at moderate attacker sophistication (L2), with 269 valid trials (of 270) accepting fabricated security claims under directed queries.	Decision Quality	negative	high	rate at which models accept fabricated security claims when querying poisoned graph under directed prompts	n=270 269 valid trials (of 270) accepting fabricated security claims 0.3
Under open-ended prompts, trust drops to 3-55%, confirming prompt framing as a confound; we report both conditions.	Decision Quality	mixed	high	model trust rate in accepting poisoned data under open-ended prompts	trust rates of 3-55% under open-ended prompts 0.3
An attacker sophistication gradient reveals discrete break points, a minimum skill at which trust flips from 0% to 100%, reframing the attack as a question not of whether but of how much.	Decision Quality	negative	medium	change in model trust/acceptance rate as attacker sophistication increases	0.11
A controlled delivery-mode comparison shows that inline evaluation produces false negatives: GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use, demonstrating that delivery mode is a first-order confound.	Decision Quality	negative	high	GPT-5.1 trust rate depending on delivery mode (inline vs agentic tool-use)	0% trust inline vs 100% trust under simulated and real agentic tool-use 0.3
We evaluate five defences; read-only access control eliminates the direct mutation vector, while the remaining four are partial and model-dependent.	AI Safety And Ethics	positive	high	effectiveness of mitigation strategies in preventing or limiting Oracle Poisoning	read-only access control eliminates direct mutation vector; four other defences partial/model-dependent 0.18
Analysis of four additional platforms suggests the attack may generalise across the knowledge-graph ecosystem.	AI Safety And Ethics	negative	medium	presence of vulnerability to Oracle Poisoning across additional knowledge-graph platforms	n=4 analysis across four additional platforms suggests generalisability 0.11
This paper provides the first empirical demonstration of knowledge graph poisoning against a production-scale agentic system, distinct from CTI embedding poisoning.	Research Productivity	neutral	medium	novelty of empirical demonstration relative to prior literature	0.02

Corrupt a production code knowledge graph and AI agents will believe it: in directed tests across nine models and three providers, attackers fabricated security claims that agents accepted in 269 of 270 trials; making the graph read-only removes the attack vector.