AI agents favor their own when group identities are visible: in a controlled simulation, label visibility produced per-turn in-group targeting differentials of 5–16 percentage points across six model families, compounding into measurable structural trust biases over 500 interactions even though the mix of action types did not change.
As autonomous AI agents are deployed in persistent, interacting networks -- coordinating tasks, routing resources, and accumulating reputational histories -- the social dynamics that emerge will determine who receives opportunity and who does not, at scales no human institution can supervise. We ran a controlled multi-agent simulation in which instruction-tuned language model agents interacted across 500 turns under three conditions manipulating group label salience and resource scarcity, across six model families with 20 seeds each. When group labels were visible, we observed in-group trust bias, action homophily, and network assortativity -- all absent when labels were hidden -- a pattern structurally consistent with salience-dependence in human social psychology. This discrimination was invisible to standard action-log audits: bias operated entirely through who received each action, not what actions were chosen, with action-type distributions showing no increase in negative actions across conditions. Per-turn in-group versus out-group differentials of 5 to 16 percentage points were statistically significant for all six models (Wilcoxon signed-rank, all Benjamini-Hochberg-corrected p < 0.001), establishing group-contingent targeting as a robust property of instruction-tuned language models across architectures and training regimes. Compounded through 500 turns of reciprocation, these differentials accumulated into in-group trust biases of +0.014 to +0.100 (d = 0.84-4.52) -- illustrating how modest per-interaction targeting propagates into structural inequality in persistent networks.
Summary
Main Finding
Instruction-tuned language-model agents deployed as persistent multi-agent systems exhibit robust, human-like in-group bias that is triggered by the explicit visibility of group labels. Small per-interaction targeting differentials (6–30 percentage-point imbalances across action channels; per-turn action homophily of 0.011–0.054) compound over repeated reciprocation into meaningful structural trust inequalities (+0.014 to +0.100 in-group trust bias over 500 turns). This effect is (a) absent when labels are hidden, (b) consistent across six instruction-tuned model families, (c) implemented by concentrating prosocial actions on in-group members while directing a disproportionate share of neutral actions toward out-group members, and (d) largely invisible to audits that examine only action-type distributions (negative actions did not increase).
Key Points
- Causal trigger: In these simulations, label salience (surfacing arbitrary group labels in prompts) was necessary and sufficient to produce in-group favoritism; hidden labels produced no bias at any point.
- Cross-model robustness: All six instruction-tuned families tested (Qwen3, Falcon, OLMo, LLaMA, Mistral, Gemma) showed statistically significant increases from condition A (labels hidden) to condition B (labels visible) (paired Wilcoxon signed-rank, BH-corrected p < 0.001).
- Magnitude and heterogeneity: Per-turn action-homophily ranged 0.011–0.054; accumulated trust biases after 500 rounds ranged +0.014 to +0.100 (paired d ≈ 0.84–4.52). Magnitude varied by family but the mechanism (positive actions concentrated in-group; neutral disproportionately out-group) was uniform.
- Scarcity effect: Introducing a cooperation budget (resource scarcity) did not consistently increase inter-group hostility across models; effects under scarcity varied by family and were not uniformly amplified.
- Audit invisibility: Discrimination operated through choice of recipient rather than action type frequency, so standard action-log audits that only inspect aggregate action distributions can miss it.
- Mechanism control: Partner selection was uniformly random (no frequency homophily), and agents received full trust vectors in-context, isolating targeting behavior as the source of bias.
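The audit-invisibility point can be illustrated with a minimal Python sketch. It builds two hypothetical action logs that contain exactly the same number of each action type but differ in who receives them. The log format and the helper names `action_type_counts` and `in_group_share` are assumptions made for illustration; they are not the study's code or data.

```python
from collections import Counter

# Hypothetical log rows: (actor_group, recipient_group, action).
# Both logs contain four "cooperate" and four "neutral" actions.
unbiased_log = [
    ("Kappa", "Kappa", "cooperate"), ("Kappa", "Tilon", "cooperate"),
    ("Kappa", "Kappa", "neutral"),   ("Kappa", "Tilon", "neutral"),
    ("Tilon", "Tilon", "cooperate"), ("Tilon", "Kappa", "cooperate"),
    ("Tilon", "Tilon", "neutral"),   ("Tilon", "Kappa", "neutral"),
]
biased_log = [
    ("Kappa", "Kappa", "cooperate"), ("Kappa", "Kappa", "cooperate"),
    ("Kappa", "Tilon", "neutral"),   ("Kappa", "Tilon", "neutral"),
    ("Tilon", "Tilon", "cooperate"), ("Tilon", "Tilon", "cooperate"),
    ("Tilon", "Kappa", "neutral"),   ("Tilon", "Kappa", "neutral"),
]


def action_type_counts(log):
    """Aggregate view: how often each action type occurs, ignoring recipients."""
    return Counter(action for _, _, action in log)


def in_group_share(log, action):
    """Recipient-level view: fraction of `action` uses aimed at the actor's own group."""
    pairs = [(actor, recipient) for actor, recipient, act in log if act == action]
    if not pairs:
        return float("nan")
    return sum(actor == recipient for actor, recipient in pairs) / len(pairs)


if __name__ == "__main__":
    # The aggregate audit cannot tell the two logs apart.
    print(action_type_counts(unbiased_log) == action_type_counts(biased_log))
    # The recipient-level audit can.
    for name, log in [("unbiased", unbiased_log), ("biased", biased_log)]:
        print(name, in_group_share(log, "cooperate"), in_group_share(log, "neutral"))
```

In the biased log every cooperative action stays in-group and every neutral action goes out-group, yet the action-type counts match the unbiased log exactly. An audit therefore needs to condition on the recipient's group to detect this pattern.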
Data & Methods
- Simulation setup:
  - 360 simulations (6 model families × 3 conditions × 20 seeds).
  - Each simulation: N = 20 agents, 500 sequential turns.
  - Action space: 6 actions (compliment: +0.15 trust; cooperate: +0.20; alliance offer: +0.10 if accepted; neutral: ±0.00; gossip: −0.05 applied to two pairwise trust values; criticize: −0.15). Trust is initialized at 0.5 and updated deterministically according to these action rules.
  - Agents: balanced, arbitrary binary group labels (Kappa/Tilon), personality descriptors, and a rolling memory updated every 20 turns; the trust vector is included in the prompt context.
  - Partner selection: uniformly random each turn.
- Experimental conditions:
  - A: labels hidden (labels assigned but not visible in prompts).
  - B: labels visible (labels shown in prompts).
  - C: labels visible plus scarcity (cooperation budget of 2 high-value prosocial uses per 10 actor-turns; remaining budget shown in context).
- Models & runtime:
  - Six instruction-tuned model families run locally through HuggingFace Transformers (bfloat16); sampling with temperature 0.7, top-p 0.9, and a maximum of 300 new tokens.
  - 20 independent seeds per model–condition pair.
- Metrics & inference:
  - In-group bias: per-agent mean difference between trust in in-group peers and trust in out-group peers (computed over final trust vectors).
  - Action homophily H: E[∆t | same group] − E[∆t | different group] (the per-turn targeting differential).
  - Network assortativity: computed on the trust graph, with an edge wherever mutual mean trust ≥ 0.6.
  - Statistical tests: paired one-sided Wilcoxon signed-rank tests (pre-specified directional hypotheses) with Benjamini–Hochberg correction within confirmatory families; chi-square tests on action distributions were treated as exploratory.
- Important design choices:
  - Including the trust vector in the prompt creates feedback and compounding dynamics across turns (intentional, to model reputational persistence).
  - Random partner selection removes confounding from contact frequency, isolating targeted action choice.
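The trust-update rules and the two headline metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes an action changes the recipient's trust in the actor, that trust is clipped to [0, 1], and that `trust[i][j]` means agent i's trust in agent j. Gossip is left out because its two-pair update is not fully specified in this summary, and network assortativity is not covered. All function names are hypothetical.

```python
from statistics import mean

# Trust deltas taken from the action list above (gossip omitted, see lead-in).
ACTION_DELTA = {
    "compliment": 0.15,
    "cooperate": 0.20,
    "alliance_accepted": 0.10,
    "neutral": 0.0,
    "criticize": -0.15,
}


def apply_action(trust, actor, recipient, action):
    """ASSUMPTION: the action shifts the recipient's trust in the actor,
    clipped to [0, 1]. Returns the nominal delta so it can be logged."""
    delta = ACTION_DELTA[action]
    updated = trust[recipient][actor] + delta
    trust[recipient][actor] = min(1.0, max(0.0, updated))
    return delta


def in_group_bias(trust, groups):
    """Mean over agents of (mean trust in in-group peers - mean trust in out-group peers)."""
    n = len(groups)
    per_agent = []
    for i in range(n):
        same = [trust[i][j] for j in range(n) if j != i and groups[j] == groups[i]]
        diff = [trust[i][j] for j in range(n) if groups[j] != groups[i]]
        per_agent.append(mean(same) - mean(diff))
    return mean(per_agent)


def action_homophily(log, groups):
    """H = E[delta | same group] - E[delta | different group] over logged actions."""
    same = [d for actor, recipient, d in log if groups[actor] == groups[recipient]]
    diff = [d for actor, recipient, d in log if groups[actor] != groups[recipient]]
    return mean(same) - mean(diff)


# Toy run with four agents and a hand-written (hypothetical) action script.
groups = ["Kappa", "Kappa", "Tilon", "Tilon"]
trust = [[0.5] * 4 for _ in range(4)]  # trust initialized at 0.5
script = [
    (0, 1, "cooperate"), (0, 2, "neutral"), (1, 0, "compliment"),
    (2, 3, "cooperate"), (3, 2, "cooperate"), (3, 1, "neutral"),
]
log = [(a, r, apply_action(trust, a, r, act)) for a, r, act in script]

if __name__ == "__main__":
    print("in-group bias:", in_group_bias(trust, groups))
    print("action homophily H:", action_homophily(log, groups))
```

In this toy script every prosocial action is aimed in-group and every neutral action out-group, so both metrics come out positive. A script that ignored group membership would drive both toward zero.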
Implications for AI Economics
- Small targeting biases compound into structural economic effects:
  - Even modest per-interaction favoritism can, when repeated and reciprocated across persistent networks, concentrate trust, referrals, and high-value interactions within favored groups, amplifying inequality in the allocation of opportunities, transactions, and resources.
  - In marketplaces, recommender/routing systems, automated hiring, lending, procurement, or multi-agent service architectures, such compounding can produce durable market segmentation and concentration without obvious single-interaction culpability.
- Invisible discrimination risk in automated allocation:
  - Because bias operates via recipient selection rather than increased use of negative actions, many conventional audits (which focus on action-type frequencies or content sentiment) will miss it. Economic outcomes may diverge while surface-level logs appear neutral.
- Policy and auditing recommendations for economic systems using agentic LLMs:
  - Audit recipient-level flows and network outcomes, not just per-action distributions. Track network metrics (assortativity, in-group/out-group trust differentials, resource concentration over time).
  - Require simulation-based stress tests that expose label-salience effects (e.g., ablations with visible vs. hidden labels, longer-run simulations to observe accumulation).
  - Mandate logging of target-selection decisions and demographic/group metadata where ethically and legally appropriate, enabling detection of group-contingent targeting.
  - Limit or carefully control explicit group-label propagation in contexts where labels correlate with protected attributes; prefer hiding or de-emphasizing labels unless they are necessary.
  - Introduce algorithmic constraints or reward shaping for inter-group parity in recipient targeting (e.g., budget-aware fairness, randomized routing, explicit incentives to diversify targets).
  - Encourage design patterns that reduce feedback reinforcement (e.g., dampening reputation amplification, capping reciprocation gains, or decaying trust over time).
- Business risks and competitive dynamics:
  - Firms deploying agentic LLM fleets may inadvertently bias partner selection and resource distribution, creating regulatory, reputational, and compliance risks.
  - At the market level, unchecked bias can produce winner-take-more dynamics across platforms, affecting competition and market access for disadvantaged groups.
- Research and validation needs for economic deployments:
  - Validate these simulation findings in domain-specific deployments (customer routing, procurement, gig platforms) under realistic inputs and real demographic/organizational groupings.
  - Study interventions that are implementable at scale (prompt design, policy fine-tuning, explicit fairness objectives) and quantify trade-offs with utility/performance.
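One of the design patterns listed above, decaying trust over time, can be illustrated with a small worked sketch. The decay rule, the per-turn increments, and the function name `simulate_gap` are all hypothetical and chosen only to show the arithmetic; they are not taken from the study. The sketch assumes trust is pulled back toward a 0.5 baseline by a fixed fraction each turn before the turn's increment is added.

```python
def simulate_gap(turns, d_in, d_out, decay=0.0, baseline=0.5):
    """Return the in-group minus out-group trust gap after `turns` turns.

    ASSUMPTION (illustrative rule, not from the source): each turn, trust
    relaxes toward `baseline` by a fraction `decay`, then receives a constant
    expected increment (`d_in` for in-group, `d_out` for out-group).
    """
    t_in = t_out = baseline
    for _ in range(turns):
        t_in = baseline + (1.0 - decay) * (t_in - baseline) + d_in
        t_out = baseline + (1.0 - decay) * (t_out - baseline) + d_out
    return t_in - t_out


if __name__ == "__main__":
    # Hypothetical per-turn expected increments with a small in-group advantage.
    d_in, d_out = 0.0004, 0.0002
    print("no decay:  ", simulate_gap(500, d_in, d_out))
    print("decay 0.02:", simulate_gap(500, d_in, d_out, decay=0.02))
```

Without decay the gap grows linearly with the number of turns, at `turns * (d_in - d_out)`. With decay it levels off near `(d_in - d_out) / decay`, so the same per-turn differential yields a much smaller accumulated gap. Whether such dampening is acceptable depends on how much legitimate reputational signal it discards, which is the utility trade-off noted above.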
Limitations (brief): results derive from a stylized simulation with deterministic trust updates and arbitrary labels; prompts included the full trust state (a design choice that models reputational systems); the findings are robust across multiple instruction-tuned families but require domain-specific validation before direct policy application.
Assessment
Claims (7)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We ran a controlled multi-agent simulation in which instruction-tuned language model agents interacted across 500 turns under three conditions manipulating group label salience and resource scarcity, across six model families with 20 seeds each. | Other | positive | high | experimental setup / simulation configuration (turns, conditions, models, seeds) | n=120; 0.8 |
| When group labels were visible, we observed in-group trust bias. | Task Allocation | positive | high | in-group trust bias | n=120; 0.48 |
| When group labels were visible, we observed action homophily. | Task Allocation | positive | high | action homophily (agents preferentially taking actions toward same-group agents) | n=120; 0.48 |
| When group labels were visible, we observed network assortativity (all absent when labels were hidden). | Task Allocation | positive | high | network assortativity | n=120; 0.48 |
| This discrimination was invisible to standard action-log audits: bias operated entirely through who received each action, not what actions were chosen, with action-type distributions showing no increase in negative actions across conditions. | Governance And Regulation | null_result | high | action-type distribution (no increase in negative actions) and detectability of discrimination via standard action-log audits | n=120; no increase in negative actions across conditions; 0.48 |
| Per-turn in-group versus out-group differentials of 5 to 16 percentage points were statistically significant for all six models (Wilcoxon signed-rank, all Benjamini-Hochberg-corrected p < 0.001), establishing group-contingent targeting as a robust property of instruction-tuned language models across architectures and training regimes. | Task Allocation | positive | high | per-turn in-group vs out-group targeting differential | n=20; 5 to 16 percentage points; 0.48 |
| Compounded through 500 turns of reciprocation, these differentials accumulated into in-group trust biases of +0.014 to +0.100 (d = 0.84-4.52), illustrating how modest per-interaction targeting propagates into structural inequality in persistent networks. | Inequality | positive | high | accumulated in-group trust bias over 500 turns (absolute change and Cohen's d) | n=120; +0.014 to +0.100 (d = 0.84-4.52); 0.48 |