A single compromised LLM can hijack group decisions in multi-agent systems via a persuasion cascade; dynamically reducing trust in suspected adversaries (while tuning agent stubbornness and scale) curbs influence at the cost of some coordination or compute overhead.

Don't Trust Stubborn Neighbors: A Security Framework for Agentic Networks

Samira Abedini, Sina Mavali, Lea Schönherr, Martin Pawelczyk, Rebekka Burkholz · March 16, 2026

arXiv · theoretical · low evidence · 7/10 relevance

Under a Friedkin–Johnsen model of opinion dynamics, a single highly stubborn and persuasive LLM agent can trigger a persuasion cascade that steers collective opinion in a multi-agent system. Dynamic trust adaptation, alongside design levers such as adding benign agents or raising agent stubbornness, can substantially limit adversarial influence, though each lever trades off against coordination performance or compute cost.

Large Language Model (LLM)-based Multi-Agent Systems (MASs) are increasingly deployed for agentic tasks, such as web automation, itinerary planning, and collaborative problem solving. Yet, their interactive nature introduces new security risks: malicious or compromised agents can exploit communication channels to propagate misinformation and manipulate collective outcomes. In this paper, we study how such manipulation can arise and spread by borrowing the Friedkin-Johnsen opinion formation model from social sciences to propose a general theoretical framework to study LLM-MAS. Remarkably, this model closely captures LLM-MAS behavior, as we verify in extensive experiments across different network topologies and attack and defense scenarios. Theoretically and empirically, we find that a single highly stubborn and persuasive agent can take over MAS dynamics, underscoring the systems' high susceptibility to attacks by triggering a persuasion cascade that reshapes collective opinion. Our theoretical analysis reveals three mechanisms to increase system security: a) increasing the number of benign agents, b) increasing the innate stubbornness or peer-resistance of agents, or c) reducing trust in potential adversaries. Because scaling is computationally expensive and high stubbornness degrades the network's ability to reach consensus, we propose a new mechanism to mitigate threats by a trust-adaptive defense that dynamically adjusts inter-agent trust to limit adversarial influence while maintaining cooperative performance. Extensive experiments confirm that this mechanism effectively defends against manipulation.

Summary

Main Finding

A single malicious or compromised LLM agent with high stubbornness and persuasive power can trigger a persuasion cascade that steers the collective opinion of a multi-agent LLM system (MAS). Modeling LLM-MAS with the Friedkin–Johnsen opinion-formation framework both theoretically and empirically explains this vulnerability. Practical defenses include increasing benign agent count, increasing agent stubbornness, or reducing trust in adversaries; the authors propose a trust-adaptive mechanism that dynamically adjusts inter-agent trust to limit adversarial influence while preserving cooperative performance, and show it is effective in experiments.

Key Points

Threat: The interactive communication channels in LLM-based MASs create a new attack surface where agents can propagate misinformation and manipulate group outcomes.
Modeling: The Friedkin–Johnsen opinion-dynamics model (innate opinions + interpersonal influence weights + stubbornness) closely captures LLM-MAS behavior across settings.
Vulnerability: A single highly stubborn and persuasive agent can dominate network dynamics, producing a persuasion cascade that reshapes collective opinion.
Security levers:
- Increase the number of benign agents (dilutes adversarial influence).
- Increase agents' innate stubbornness or peer-resistance (reduces susceptibility).
- Reduce trust in suspected adversaries (limits their weight on others’ updates).
Tradeoffs:
- Adding agents is computationally costly.
- High stubbornness makes the system more robust to manipulation but impairs the network’s ability to reach consensus or coordinate.
- Naïvely lowering trust may hinder cooperation/performance.
Proposed defense: A trust-adaptive strategy that dynamically adjusts trust weights to limit the influence of adversaries while maintaining cooperative function.
Empirical support: Extensive experiments across different network topologies and attack/defense scenarios validate the model and the effectiveness of the trust-adaptive defense.
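
The three security levers can be illustrated numerically. The sketch below is not taken from the paper. It assumes the standard Friedkin–Johnsen update x_i(t+1) = g_i * s_i + (1 - g_i) * sum_j W_ij * x_j(t), a fully connected network, benign agents with innate opinion 0, and one fully stubborn adversary with innate opinion 1. The function name adversarial_sway, the parameter adv_trust_scale, and all numeric values are hypothetical choices made for illustration.

```python
import numpy as np

def adversarial_sway(n_benign, gamma_benign, adv_trust_scale=1.0):
    """Mean equilibrium opinion of the benign agents (0 = unaffected, 1 = fully captured).

    Assumed setup (illustrative, not the paper's experiments): complete graph,
    benign innate opinion 0, one adversary with innate opinion 1 and stubbornness 1.
    """
    if n_benign < 2:
        raise ValueError("need at least two benign agents")
    n = n_benign + 1                       # last index is the adversary
    W = np.ones((n, n))
    np.fill_diagonal(W, 0.0)               # no self-trust; stubbornness handles self-weight
    W[:-1, -1] *= adv_trust_scale          # lever (c): benign agents' trust in the adversary
    W /= W.sum(axis=1, keepdims=True)      # keep every row summing to 1

    s = np.zeros(n)
    s[-1] = 1.0                            # adversary's innate opinion
    gamma = np.full(n, float(gamma_benign))
    gamma[-1] = 1.0                        # adversary never updates

    # Equilibrium of x = gamma * s + (1 - gamma) * (W @ x)
    A = np.eye(n) - (1.0 - gamma)[:, None] * W
    x_star = np.linalg.solve(A, gamma * s)
    return float(x_star[:-1].mean())

baseline = adversarial_sway(n_benign=4, gamma_benign=0.1)
more_agents = adversarial_sway(n_benign=20, gamma_benign=0.1)        # lever (a)
more_stubborn = adversarial_sway(n_benign=4, gamma_benign=0.5)       # lever (b)
less_trust = adversarial_sway(n_benign=4, gamma_benign=0.1,
                              adv_trust_scale=0.25)                  # lever (c)
print(baseline, more_agents, more_stubborn, less_trust)
```

In this toy setting each lever lowers the benign agents' equilibrium opinion relative to the baseline. The sketch says nothing about the costs listed under Tradeoffs, which depend on the task and on compute budgets that are not modeled here.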

Data & Methods

Theoretical framework:
- Adopt the Friedkin–Johnsen model: each agent has an innate opinion, updates its expressed opinion as a weighted combination of its innate stance and neighbors’ opinions; stubbornness is the weight on the innate opinion; interpersonal trust forms a (time-varying) influence matrix.
- Analyze fixed points and influence propagation to identify conditions under which a single adversary can dominate.
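
As a minimal sketch of the dynamics described above, the following code iterates the textbook Friedkin–Johnsen update and compares the result with the closed-form fixed point x* = (I - (I - G) W)^(-1) G s, where G is the diagonal matrix of stubbornness values. The paper's exact parametrization may differ. The names fj_step and fj_fixed_point, the five-agent network, and the stubbornness values are assumptions made for illustration.

```python
import numpy as np

def fj_step(x, s, W, gamma):
    """One update: x_i <- gamma_i * s_i + (1 - gamma_i) * sum_j W_ij * x_j."""
    return gamma * s + (1.0 - gamma) * (W @ x)

def fj_fixed_point(s, W, gamma):
    """Solve (I - (I - G) W) x = G s for the equilibrium opinions."""
    n = len(s)
    A = np.eye(n) - (1.0 - gamma)[:, None] * W
    return np.linalg.solve(A, gamma * s)

# Toy network (hypothetical): 4 benign agents with innate opinion 0,
# 1 adversary with innate opinion 1 who never changes its mind.
n = 5
s = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
gamma = np.array([0.1, 0.1, 0.1, 0.1, 1.0])   # stubbornness = weight on innate opinion
W = np.full((n, n), 1.0 / (n - 1))            # uniform trust in every other agent
np.fill_diagonal(W, 0.0)

x = s.copy()
for _ in range(500):
    x = fj_step(x, s, W, gamma)

x_star = fj_fixed_point(s, W, gamma)
print("iterated:   ", np.round(x, 4))
print("closed form:", np.round(x_star, 4))
```

In this toy configuration the benign agents settle at 9/13 (about 0.69) even though all four started at 0, which gives a small-scale picture of how one fully stubborn agent can pull a weakly stubborn group toward its own position.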
Empirical evaluation:
- Simulation experiments of LLM-based MASs mapped to the Friedkin–Johnsen dynamics.
- Varied network topologies (e.g., dense vs. sparse, different trust matrices), attacker profiles (stubbornness, persuasiveness), and defensive strategies.
- Metrics: collective opinion trajectories, final consensus/opinion, extent of adversarial sway, and cooperative task performance under defenses.
Defense implementation:
- Trust-adaptive mechanism that monitors influence patterns and dynamically reduces trust weights assigned to agents suspected of adversarial behavior while adjusting others to preserve coordination.
- Evaluated tradeoffs between security (reduced adversarial influence) and utility (task performance, convergence).
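
The summary does not state the authors' trust-update rule, so the sketch below substitutes a simple stand-in heuristic to show the general shape of a trust-adaptive defense. In every round, each agent's incoming trust is scaled by exp(-beta * d_j), where d_j is the distance of agent j's expressed opinion from the group median, and rows are then renormalized. The function run_fj, the parameter beta, the median-based suspicion score, and the toy network are all hypothetical.

```python
import numpy as np

def run_fj(s, W0, gamma, rounds=200, beta=0.0):
    """Friedkin-Johnsen dynamics with an optional, hypothetical trust-decay defense.

    beta = 0 reproduces the static-trust model; beta > 0 shrinks trust in
    agents whose expressed opinion sits far from the current group median.
    """
    x = s.astype(float).copy()
    W = W0.astype(float).copy()
    for _ in range(rounds):
        if beta > 0.0:
            dist = np.abs(x - np.median(x))         # suspicion score per agent
            W = W * np.exp(-beta * dist)[None, :]   # scale column j by trust in j
            W = W / W.sum(axis=1, keepdims=True)    # rows must still sum to 1
        x = gamma * s + (1.0 - gamma) * (W @ x)
    return x, W

# Toy network (hypothetical): 4 benign agents at 0, 1 fully stubborn adversary at 1.
n = 5
s = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
gamma = np.array([0.1, 0.1, 0.1, 0.1, 1.0])
W0 = np.full((n, n), 1.0 / (n - 1))
np.fill_diagonal(W0, 0.0)

x_plain, _ = run_fj(s, W0, gamma, beta=0.0)
x_def, W_def = run_fj(s, W0, gamma, beta=2.0)
print("benign mean, static trust:  ", x_plain[:4].mean())
print("benign mean, adaptive trust:", x_def[:4].mean())
print("final trust in adversary:   ", W_def[0, 4])
```

A median-based score is only one possible detector. It fails if adversaries form a majority, and it also penalizes honest dissenters, which is one concrete form of the security versus utility tradeoff noted above.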

Implications for AI Economics

Security as an economic externality: MAS security failures impose social costs (misinformation, poor collective decisions). Providers and deployers face incentives to invest in defenses; underinvestment risks negative externalities.
Cost–benefit tradeoffs for robustness:
- Scaling by adding benign agents improves resistance but has clear computational and operational costs; economic decisions must weigh marginal security benefits vs. compute expense.
- Increasing agent stubbornness reduces manipulation risk but impairs consensus and collaborative efficiency — a tradeoff between robustness and joint performance that affects product design and pricing.
- Trust-lowering measures can reduce adversarial impact but may degrade cooperation and value delivered; adaptive trust schemes can better align incentives by preserving utility while reducing attack surface.
Product and market design:
- Market differentiation for “robust” MAS offerings: buyers may pay premiums for systems with built-in trust-adaptive defenses or provable resistance guarantees.
- Contracting and liability: principals may demand technical specifications (e.g., adaptive-trust mechanisms) in service-level agreements to mitigate systemic manipulation risk.
Regulation and standards:
- Regulatory frameworks could require minimum defenses or auditing of inter-agent influence dynamics for high-stakes MAS deployments (finance, critical infrastructure, information dissemination).
- Standardized metrics (e.g., susceptibility to persuasion cascades) would help buyers compare security-performance tradeoffs.
Incentive alignment and governance:
- Robustness investments may be underprovided absent clear liability or reputational pressures; insurance markets and certification bodies can create incentives for better defenses.
- Governance of agent trust assignment (who controls trust weights, how adjustments are verified) becomes an economic governance question—design choices affect competition and coordination costs.
Research and investment priorities:
- Funding and R&D should target efficient defenses (like trust-adaptive mechanisms) that obtain security gains with minimal performance loss and compute overhead.
- Empirical economic studies are needed to quantify deployment costs, pricing of robust MAS, and social welfare impacts from persuasion cascades in deployed systems.

Limitations and next steps (brief): the paper’s validation appears simulation-based; real-world deployments may introduce more complex behaviors (strategic adversaries, noisy signals, heterogeneous task payoffs). For economic modeling, mapping technical parameters (stubbornness, trust matrices) to monetary costs and user utility is a useful next step.

Assessment

Paper Type: theoretical
Evidence Strength: low. Findings are supported by rigorous theoretical analysis and extensive simulations, which clearly demonstrate a plausible mechanism (persuasion cascades) and the effectiveness of proposed defenses in modelled settings; however, there is no real-world deployment, observational identification, or exogenous variation to validate that the modeled dynamics and defense performance obtain in practice, leaving external validity and strategic-adversary behavior untested.
Methods Rigor: medium. The paper uses an established opinion-dynamics framework (Friedkin–Johnsen), provides analytic characterization of fixed points and influence propagation, and runs systematic simulation sweeps across topologies and attacker/defender parameters with clear metrics; nevertheless, methods are limited to simulation and model-based assumptions (no empirical deployments, limited modelling of strategic/adaptive adversaries, and simplified task/payoff structures).
Sample: Synthetic simulations of LLM-based multi-agent systems where agents are parameterized by innate opinions, stubbornness (self-weight), and pairwise trust/influence weights; experiments vary network topology (dense vs sparse, different trust matrices), attacker profiles (stubbornness, persuasiveness), and defensive strategies (increasing benign agent count, raising stubbornness, static/dynamic trust adjustments); metrics include opinion trajectories, final consensus, adversarial sway, and task performance under defenses; no field or observational data.
Themes: governance, org_design
Identification: Analytical derivation under the Friedkin–Johnsen opinion-dynamics model combined with controlled simulation experiments that manipulate attacker stubbornness, persuasiveness, network topology, and inter-agent trust; causal claims rest on the model's structural assumptions and simulated counterfactuals rather than on exogenous real-world variation or randomized field data.
Generalizability:
- Simulation-only: results may not hold in real-world deployed systems with richer, noisier dynamics.
- Mapping from LLM dialogue behavior to the Friedkin–Johnsen model may be approximate and omit important behavioral modes.
- Adversaries in simulations are exogenously specified and may not capture strategic, adaptive, or covert real attackers.
- Heterogeneous task payoffs, user incentives, and multi-stage interactions in production deployments are not modelled.
- Computational and operational costs of scaling benign-agent counts or trust-adaptive mechanisms may be higher in practice.
- Trust-update rules and detection thresholds used may be hard to implement or verify in commercial systems.

Claims (10)

Claim	Theme	Direction	Confidence	Outcome	Details
A single malicious or compromised LLM agent with high stubbornness and persuasive power can trigger a persuasion cascade that steers the collective opinion of a multi-agent LLM system (MAS).	AI Safety and Ethics	negative	high	extent of adversarial sway / shift in collective opinion (final consensus and opinion trajectories)	0.06
The Friedkin–Johnsen opinion-dynamics model (innate opinions + interpersonal influence weights + stubbornness) closely captures LLM-MAS behavior across settings, both theoretically and empirically.	AI Safety and Ethics	positive	medium	fit between model-predicted opinion trajectories/fixed points and simulated LLM-MAS opinion trajectories/final consensus	0.04
Analytical conditions on stubbornness and influence weights identify when a single adversary can dominate network dynamics (i.e., influence propagation criteria derived from FJ fixed-point analysis).	AI Safety and Ethics	negative	medium-high	theoretical criteria predicting when an agent's influence weight leads to dominance of network opinion (mathematical conditions)	0.01
Increasing the number of benign agents dilutes an adversary's relative influence and thereby reduces the probability and magnitude of persuasion cascades.	AI Safety and Ethics	positive	medium	adversarial sway (magnitude of shift in collective opinion) and final consensus as a function of benign agent count	0.04
Raising agents' innate stubbornness (peer resistance) reduces susceptibility to adversarial manipulation but impairs the network's ability to reach consensus or coordinate effectively.	AI Safety and Ethics	mixed	medium	adversarial influence (reduction) and network coordination/consensus metrics or cooperative task performance (degradation)	0.04
Naïvely lowering trust weights assigned to suspected adversaries can limit adversarial influence but may also hinder cooperation and reduce task performance.	AI Safety and Ethics	mixed	medium	adversarial influence (reduction) and cooperative task performance / convergence (decrease)	0.04
A trust-adaptive defense that dynamically reduces trust in agents suspected of adversarial behavior can limit adversarial influence while preserving cooperative performance better than static trust-lowering strategies.	AI Safety and Ethics	positive	medium	reduction in adversarial influence and retention of cooperative task performance / convergence	0.04
Extensive simulation experiments across different network topologies and attacker/defense scenarios validate both the FJ modeling of LLM-MAS and the effectiveness of the trust-adaptive defense.	AI Safety and Ethics	positive	medium	agreement between model predictions and simulation outcomes; effectiveness metrics of defenses (adversarial sway, final consensus, task performance)	0.04
Increasing benign-agent count and agent stubbornness are practical levers for improving robustness, but both carry costs: added compute/operational cost for scaling agents, and degraded consensus/coordination when stubbornness is high.	AI Safety and Ethics	mixed	medium	robustness to manipulation (improvement), computational/operational cost (increased), consensus/coordination metrics (degradation)	0.04
Security of LLM-based MASs functions as an economic externality: failures can impose social costs (misinformation, poor collective decisions), and absent liability or market incentives providers may underinvest in robustness.	Governance and Regulation	negative	speculative	investment in defenses (underprovision) and social costs from MAS security failures (conceptual economic outcomes)	0.01