Large language models tend toward a stronger baseline ‘algorithmic monoculture’ than humans but still adjust strategically to incentives; they coordinate exceptionally well on similar actions yet lag humans in maintaining necessary diversity when divergence is rewarded.

Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games

Gonzalo Ballestero, Hadi Hosseini, Samarth Khanna, Ran I. Shorrer · April 10, 2026

arxiv · rct · medium evidence · 7/10 relevance

LLMs display high baseline action-similarity and, like humans, modulate similarity in response to coordination incentives, excelling at coordination when convergence is rewarded but underperforming humans at sustaining heterogeneity when divergence is optimal.

AI agents increasingly operate in multi-agent environments where outcomes depend on coordination. We distinguish primary algorithmic monoculture -- baseline action similarity -- from strategic algorithmic monoculture, whereby agents adjust similarity in response to incentives. We implement a simple experimental design that cleanly separates these forces, and deploy it on human and large language model (LLM) subjects. LLMs exhibit high levels of baseline similarity (primary monoculture) and, like humans, they regulate it in response to coordination incentives (strategic monoculture). While LLMs coordinate extremely well on similar actions, they lag behind humans in sustaining heterogeneity when divergence is rewarded.

Summary

Main Finding

AI agents (LLMs) display both high baseline similarity (primary algorithmic monoculture) and strategic adjustment of similarity in response to incentives (strategic algorithmic monoculture). Relative to humans, LLMs are exceptionally good at coordinating on the same action but substantially worse at sustaining coordinated divergence when divergence is rewarded. This divergence deficit is only partially explained by limited randomization and persists across temperature and identity/persona manipulations.

Key Points

Taxonomy introduced:
- Primary algorithmic monoculture — baseline action similarity absent incentives.
- Strategic algorithmic monoculture — agents deliberately adjust similarity in response to coordination/divergence incentives (includes secondary and Schelling salience).
Experimental treatments: picking (valid answer only), coordination (reward match), divergence (reward mismatch).
Agreement rate (probability two independent agents give the same answer) is the main performance metric.
Empirical results:
- LLMs show much higher baseline agreement than humans in picking (strong primary monoculture).
- Both humans and LLMs adjust agreement up in coordination and down in divergence (evidence of strategic monoculture).
- Magnitude asymmetry:
  - Coordination arm: LLM self-pairs average ≈72% agreement vs humans ≈31%.
  - Divergence arm: LLM self-pairs ≈27% agreement vs humans ≈3.5% (LLMs persistently over-agree when they should diverge).
- Pairing different LLMs or assigning personas reduces agreement (improving divergence performance), but the qualitative ordering (LLMs better at coordination and worse at divergence than humans) remains.
Textual reasoning: LLMs often articulate correct strategic logic (they "know" they should pick obscure answers to diverge) but still fail to execute sufficiently.
Randomization experiments:
- Asking LLMs to generate large lists and then pick randomly improves divergence performance, and raising sampling temperature reduces agreement across arms (helpful for divergence, harmful for coordination).
- However, even extreme temperature settings do not eliminate the divergence gap relative to humans.
Identity/information manipulations (telling LLMs that the co-player is an identical copy vs “another person”) change LLMs’ reasoning but have little average effect on agreement outcomes.
Theoretical framework highlights a tradeoff: homogeneity aids coordination on the same action but harms coordinated divergence; randomization or heterogeneity are key for divergence success.
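The agreement rate used throughout these points can be estimated directly from a pool of answers. The following is a minimal sketch, not the authors' code: the function names, the toy answer lists, and the use of exact string matching are all assumptions made for illustration.

```python
from collections import Counter


def self_pair_agreement(answers):
    """Share of distinct unordered pairs of answers that are identical."""
    n = len(answers)
    if n < 2:
        raise ValueError("need at least two answers to form a pair")
    counts = Counter(answers)
    # An answer given c times contributes c*(c-1)/2 matching pairs.
    matching_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return matching_pairs / (n * (n - 1) // 2)


def cross_pair_agreement(answers_a, answers_b):
    """Probability that a random draw from pool A equals a random draw from pool B."""
    if not answers_a or not answers_b:
        raise ValueError("both pools must be non-empty")
    counts_a = Counter(answers_a)
    counts_b = Counter(answers_b)
    matches = sum(counts_a[x] * counts_b[x] for x in counts_a)
    return matches / (len(answers_a) * len(answers_b))


# Hypothetical toy data for a "name a city" task (not taken from the paper).
homogeneous = ["Paris", "Paris", "Paris", "London"]
diverse = ["Paris", "Oslo", "Lima", "Cairo"]

print(self_pair_agreement(homogeneous))            # 3 matching pairs out of 6
print(self_pair_agreement(diverse))                # no matching pairs
print(cross_pair_agreement(homogeneous, diverse))  # 3 matches out of 16 pairings
```

A high self-pair value is good news in the coordination treatment and bad news in the divergence treatment, which is the tradeoff the theoretical framework describes.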

Data & Methods

Subjects: human participants and 16 different LLMs (each evaluated across tasks); experimental design compares humans vs AI on identical tasks.
Tasks: open-ended naming tasks across topics (e.g., a letter, a city). Three treatments assigned between-subjects: picking, coordination, divergence.
Primary outcome: agreement rate measured by pairing independent draws from the same subject type (self-pairs) and across different models.
Additional manipulations and robustness checks:
- Temperature sweeps to change LLM stochasticity.
- Prompting interventions: instruct LLMs to produce a list then choose randomly.
- Persona assignments to LLMs (mimicking human characteristics).
- Information about co-player identity (identical copy vs “another person”).
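To see mechanically why a temperature sweep moves agreement, the toy model below applies temperature-scaled softmax to a fixed set of candidate answers and computes the probability that two independent samples match. The logits are hypothetical and the model is a deliberate simplification; it is not a description of how the paper's LLMs were sampled.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax over a fixed list of candidate-answer scores."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def agreement_rate(probs):
    """Probability that two independent draws from `probs` coincide."""
    return sum(p * p for p in probs)


# Hypothetical scores for four candidate answers (assumption, not paper data).
logits = [3.0, 1.0, 0.5, 0.0]

for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: agreement={agreement_rate(probs):.3f}")
```

In this toy setting agreement falls as temperature rises but never drops below 1/n for n candidates, so temperature alone can only help divergence up to the breadth of the candidate set.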
Text analysis: large-scale analysis of LLM-produced textual reasoning to link stated strategy to choices.
Theoretical model: two-player coordination and coordinated-divergence normal-form games; formal definitions of algorithmic players and agreement rate; propositions showing that uniform randomization is the unique neutral, anonymous strategy and that identical deterministic algorithms yield extreme outcomes (best for coordination, worst for divergence).
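The extreme cases named in the theoretical model can be worked out in a few lines. The notation below (action set $A$ with $n$ elements, mixed strategies $\sigma$, agreement rate $\alpha$) and the 0/1 payoff normalization are assumptions introduced here for illustration, not necessarily the paper's own formalism.

```latex
% Agreement rate of two independent players with mixed strategies \sigma_1, \sigma_2 over A, |A| = n.
\[
  \alpha(\sigma_1,\sigma_2) \;=\; \sum_{a \in A} \sigma_1(a)\,\sigma_2(a)
\]
% Assumed payoffs: coordination pays 1 on a match, divergence pays 1 on a mismatch.
\[
  \mathbb{E}[u_{\mathrm{coord}}] = \alpha(\sigma_1,\sigma_2),
  \qquad
  \mathbb{E}[u_{\mathrm{div}}] = 1 - \alpha(\sigma_1,\sigma_2)
\]
% Identical deterministic algorithms both play some fixed a^*: best for coordination, worst for divergence.
\[
  \alpha(\delta_{a^*},\delta_{a^*}) = 1
\]
% Identical uniform randomizers \sigma^{U}(a) = 1/n.
\[
  \alpha(\sigma^{U},\sigma^{U}) = \sum_{a \in A} \frac{1}{n^{2}} = \frac{1}{n}
\]
% For any self-paired strategy, Cauchy-Schwarz gives a lower bound attained only by the uniform strategy.
\[
  \alpha(\sigma,\sigma) = \sum_{a \in A} \sigma(a)^{2}
  \;\ge\; \frac{1}{n}\Bigl(\sum_{a \in A} \sigma(a)\Bigr)^{2} = \frac{1}{n}
\]
```

Under these assumptions, two copies of the same algorithm cannot push agreement below $1/n$, so further gains in divergence require a larger effective action set or heterogeneity between the players.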

Implications for AI Economics

Practical tradeoff in AI deployment:
- Monoculture (high similarity across deployed models) can be an asset in settings where uniform coordination is socially desirable (network effects, safety-aligned coordination).
- The same monoculture poses systemic fragility where diversity is socially valuable (hiring, screening, markets, decentralized decision-making) because LLMs struggle to sustain coordinated diversity.
Design and policy recommendations:
- Encourage heterogeneity in deployed algorithms (model diversity, multiple vendors, varied prompts/personas) to mitigate coordinated-divergence failures.
- Provide mechanisms for reliable randomness in algorithmic decision-making (explicit randomization protocols, vetted sampling methods).
- Evaluate multi-agent and societal outcomes, not just single-agent accuracy, and assess how model similarity translates into aggregate systemic risk.
- Consider disclosures or standards about model provenance/identities where coordination externalities are important so agents can reason about counterpart behavior.
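One way to read the recommendation on explicit randomization protocols is to separate candidate generation from the random draw. The sketch below assumes a model has already produced a list of candidate answers and hands the final pick to an external random source. Moving the draw outside the model is a variation chosen here for illustration, not the protocol tested in the paper, and the function name and sample list are hypothetical.

```python
import secrets


def pick_with_external_randomness(candidates, rng_choice=secrets.choice):
    """Normalize and deduplicate candidates, then let an external RNG pick one."""
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    unique = list(dict.fromkeys(c.strip().lower() for c in candidates if c.strip()))
    if not unique:
        raise ValueError("no valid candidates")
    return rng_choice(unique)


# Hypothetical list as a model might produce it (assumption, not paper data).
candidates = ["Lima", "Oslo", "lima ", "Cairo", "Quito"]
choice = pick_with_external_randomness(candidates)
print(choice)
```

Passing a deterministic `rng_choice` (for example `lambda xs: xs[0]`) makes the function reproducible for testing, while the default draws from the operating system's randomness source.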
Research directions:
- Study dynamic and larger-group coordination/divergence, mixed human–AI settings, richer payoffs (asymmetric costs), and field contexts (hiring platforms, markets).
- Explore algorithmic interventions that reconcile the tradeoff (adaptive heterogeneity, strategic randomizers, mechanism design to incentivize socially optimal diversity).
Cautionary note: findings are from controlled laboratory-style coordination/divergence games; while they identify important mechanisms and clear patterns, implementing solutions requires testing in realistic, higher-stakes environments.

Assessment

Paper Type: rct
Evidence Strength: medium. The randomized experimental design gives good internal validity for causal claims about how incentives change similarity, and the direct comparison between humans and LLMs is informative; however, external validity is limited by artificial lab tasks, likely small/selected model and human samples, and potential sensitivity to prompt framing and model selection.
Methods Rigor: medium. The study uses a clean experimental intervention and clear outcome measures separating baseline vs strategic similarity, but potential concerns include limited reporting (e.g., power, robustness to alternative similarity metrics, number and selection of LLMs, prompt sensitivity), ecological validity of the task, and uncertainty about randomization details and pre-registration.
Sample: Experimental sample comprised human subjects (recruited participants) and multiple instantiations of large language models acting as agents; units of observation are actions in a controlled multi-agent coordination task under different incentive treatments (baseline vs convergence/divergence incentives).
Themes: human_ai_collab, productivity
Identification: Randomized experimental manipulation of coordination incentives with pre-treatment measurement of baseline action similarity; comparison of behavior across incentive treatments and across subject types (human participants vs LLM instantiations) isolates baseline (primary) monoculture from incentive-driven (strategic) monoculture.
Generalizability:
- Results derive from laboratory-style coordination tasks that may not reflect complex real-world organizational settings.
- Findings may depend on the specific LLM architectures, versions, and prompt designs used and may not generalize across models.
- Human sample composition (e.g., MTurk/Prolific, demographics) may limit representativeness.
- Short-run experimental interactions may not capture long-run learning, adaptation, or institutional constraints.
- Metric of action similarity and task specification may influence measured monoculture effects.

Claims (6)

Claim	Category	Direction	Confidence	Outcome	Details
We distinguish primary algorithmic monoculture -- baseline action similarity -- from strategic algorithmic monoculture, whereby agents adjust similarity in response to incentives.	Other	positive	high	definition/separation of two forms of algorithmic monoculture (primary vs strategic)	0.6
We implement a simple experimental design that cleanly separates these forces, and deploy it on human and large language model (LLM) subjects.	Other	positive	high	experimental implementation (ability to separate primary vs strategic monoculture)	0.6
LLMs exhibit high levels of baseline similarity (primary monoculture).	Task Allocation	positive	high	action similarity (baseline)	0.6
Like humans, [LLMs] regulate [action similarity] in response to coordination incentives (strategic monoculture).	Task Allocation	positive	high	change in action similarity in response to incentives	0.6
LLMs coordinate extremely well on similar actions.	Team Performance	positive	high	coordination success when similar actions are favored	0.6
LLMs lag behind humans in sustaining heterogeneity when divergence is rewarded.	Task Allocation	negative	high	ability to sustain heterogeneity/divergence under incentives	0.6