AI agents favor their own when group identities are visible: in a controlled simulation, label visibility produced 5–16 percentage-point per-turn in-group targeting across six model families, compounding into measurable structural trust biases over 500 interactions even though the mix of action types did not change.

Human-like in-group bias in instruction-tuned language model agents

Messi H. J. Lee · May 27, 2026

arXiv · quasi-experimental · medium evidence · 7/10 relevance

In simulated persistent networks of instruction-tuned language model agents, visible group labels produced robust per-turn in-group targeting (5–16 percentage points) that accumulated into significant structural trust biases over 500 turns, despite no change in action-type distributions.

As autonomous AI agents are deployed in persistent, interacting networks -- coordinating tasks, routing resources, and accumulating reputational histories -- the social dynamics that emerge will determine who receives opportunity and who does not, at scales no human institution can supervise. We ran a controlled multi-agent simulation in which instruction-tuned language model agents interacted across 500 turns under three conditions manipulating group label salience and resource scarcity, across six model families with 20 seeds each. When group labels were visible, we observed in-group trust bias, action homophily, and network assortativity -- all absent when labels were hidden -- a pattern structurally consistent with salience-dependence in human social psychology. This discrimination was invisible to standard action-log audits: bias operated entirely through who received each action, not what actions were chosen, with action-type distributions showing no increase in negative actions across conditions. Per-turn in-group versus out-group differentials of 5 to 16 percentage points were statistically significant for all six models (Wilcoxon signed-rank, all Benjamini-Hochberg-corrected p < 0.001), establishing group-contingent targeting as a robust property of instruction-tuned language models across architectures and training regimes. Compounded through 500 turns of reciprocation, these differentials accumulated into in-group trust biases of +0.014 to +0.100 (d = 0.84-4.52) -- illustrating how modest per-interaction targeting propagates into structural inequality in persistent networks.

Summary

Main Finding

Instruction-tuned language-model agents, when deployed as persistent multi-agent systems, exhibit robust human-like in-group bias that is triggered by the explicit visibility of group labels. Small per-interaction targeting differentials (6–30 percentage-point imbalances across action channels; per-turn action-homophily 0.011–0.054) compound over repeated reciprocation into meaningful structural trust inequalities (+0.014 to +0.100 in-group trust bias over 500 turns). This effect is (a) absent when labels are hidden, (b) consistent across six different instruct model families, (c) implemented by directing prosocial actions toward in-group members while withholding/redirecting neutral actions toward out-group members, and (d) largely invisible to audits that examine only action-type distributions (there was no increase in negative actions).

Key Points

Causal trigger: Within the simulation, label salience (surfacing arbitrary group labels in prompts) was both necessary and sufficient to produce in-group favouritism; hidden labels produced no bias at any point.
Cross-model robustness: All six instruction-tuned families tested (Qwen3, Falcon, OLMo, LLaMA, Mistral, Gemma) showed statistically significant effects when moving from hidden labels (condition A) to visible labels (condition B) (paired Wilcoxon signed-rank, BH-corrected p < 0.001).
Magnitude and heterogeneity: Per-turn action homophily ranged from 0.011 to 0.054; accumulated trust biases after 500 turns ranged from +0.014 to +0.100 (paired d ≈ 0.84–4.52). Magnitude varied by family, but the mechanism (positive actions concentrated in-group; neutral actions disproportionately out-group) was uniform.
Scarcity effect: Introducing a cooperation budget (resource scarcity) did not consistently increase inter-group hostility across models; effects under scarcity varied by family and were not uniformly amplified.
Audit invisibility: Discrimination operated through choice of recipient rather than action type frequency, so standard action-log audits that only inspect aggregate action distributions can miss it.
Mechanism control: Partner selection was uniformly random (no frequency homophily), and agents received full trust vectors in-context, isolating targeting behavior as the source of bias.
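
The audit-invisibility point can be illustrated with a toy example. The sketch below is not the paper's audit code: the log schema (actor group, target group, action) and both logs are made up for illustration. It shows two logs whose action-type counts are identical, and whose in-group and out-group contact counts are identical, yet only one of them directs every cooperative action in-group.

```python
from collections import Counter

# Hypothetical log entries: (actor_group, target_group, action).
# Both logs contain two in-group and two out-group contacts.
unbiased_log = [
    ("Kappa", "Kappa", "cooperate"),
    ("Kappa", "Tilon", "cooperate"),
    ("Kappa", "Kappa", "neutral"),
    ("Kappa", "Tilon", "neutral"),
]
biased_log = [
    ("Kappa", "Kappa", "cooperate"),
    ("Kappa", "Kappa", "cooperate"),
    ("Kappa", "Tilon", "neutral"),
    ("Kappa", "Tilon", "neutral"),
]

def action_type_counts(log):
    """Aggregate view: how often each action type was used."""
    return Counter(action for _, _, action in log)

def in_group_share(log, action):
    """Recipient view: fraction of `action` events aimed at the actor's own group."""
    pairs = [(actor, target) for actor, target, act in log if act == action]
    if not pairs:
        return None
    return sum(actor == target for actor, target in pairs) / len(pairs)

# The aggregate audit cannot tell the two logs apart.
print(action_type_counts(unbiased_log) == action_type_counts(biased_log))  # True
# The recipient-level audit can.
print(in_group_share(unbiased_log, "cooperate"))  # 0.5
print(in_group_share(biased_log, "cooperate"))    # 1.0
```

An audit that stops at `action_type_counts` would pass both logs; conditioning on the recipient's group is what exposes the difference.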

Data & Methods

Simulation setup:
- 360 simulations (6 model families × 3 conditions × 20 seeds).
- Each simulation: N = 20 agents, 500 sequential turns.
- Action space: 6 actions (compliment +0.15 trust, cooperate +0.20, alliance offer +0.10 if accepted, neutral ±0.00, gossip −0.05 to two pairwise trusts, criticize −0.15). Trust initialized at 0.5 and updated deterministically per action rules.
- Agents: balanced arbitrary binary group labels (Kappa/Tilon), personality descriptors, rolling memory updated every 20 turns; trust vector included in prompt context.
- Partner selection: uniformly random each turn.
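
The deterministic trust bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The deltas, initial trust, and agent count come from the setup; the direction of the update (the target's trust in the actor changes), the clipping to [0, 1], and all function names are assumptions. Gossip and alliance acceptance are omitted because the summary does not specify them in enough detail.

```python
import random

N_AGENTS = 20          # from the setup above
INITIAL_TRUST = 0.5    # from the setup above

# Per-action trust deltas from the setup above (gossip and alliance omitted).
TRUST_DELTA = {
    "compliment": 0.15,
    "cooperate": 0.20,
    "neutral": 0.00,
    "criticize": -0.15,
}

def init_trust(n=N_AGENTS):
    """trust[i][j] is agent i's trust in agent j; the diagonal is unused."""
    return [[INITIAL_TRUST] * n for _ in range(n)]

def apply_action(trust, actor, target, action):
    """Assumption: the target's trust in the actor moves by the delta,
    and trust is clipped to the interval [0, 1]."""
    updated = trust[target][actor] + TRUST_DELTA[action]
    trust[target][actor] = min(1.0, max(0.0, updated))

def pick_partner(rng, actor, n=N_AGENTS):
    """Uniformly random partner among the other n - 1 agents."""
    partner = rng.randrange(n - 1)
    return partner if partner < actor else partner + 1

rng = random.Random(0)
trust = init_trust()
partner = pick_partner(rng, actor=0)
apply_action(trust, actor=0, target=partner, action="cooperate")
```

Because the partner is drawn uniformly, any in-group skew in the resulting trust matrix has to come from which action an agent picks once it knows its partner's group.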
Experimental conditions:
- A — Labels hidden (labels assigned but not visible in prompts).
- B — Labels visible (labels shown in prompts).
- C — Labels visible + scarcity (cooperation budget: 2 high-value prosocial uses per 10 actor-turns; remaining budget shown in context).
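
One way to read the scarcity rule in condition C is as a per-agent counter that resets every 10 actor-turns. The sketch below follows that reading. The fixed (non-sliding) window, the behaviour when the budget is exhausted, the class name, and the wording of the prompt line are all assumptions; the summary only gives the "2 per 10 actor-turns" figure and says the remaining budget is shown in context.

```python
class CooperationBudget:
    """Tracks high-value prosocial uses within fixed windows of actor-turns."""

    def __init__(self, uses_per_window=2, window=10):
        self.uses_per_window = uses_per_window
        self.window = window
        self.turns_in_window = 0
        self.used = 0

    def remaining(self):
        return self.uses_per_window - self.used

    def start_turn(self):
        """Call once at the start of each turn in which this agent acts."""
        if self.turns_in_window == self.window:
            # Assumption: the window is fixed, so the budget resets in full.
            self.turns_in_window = 0
            self.used = 0
        self.turns_in_window += 1

    def try_spend(self):
        """Consume one use if any remain; return whether it succeeded."""
        if self.used < self.uses_per_window:
            self.used += 1
            return True
        return False

    def prompt_line(self):
        """Hypothetical wording for surfacing the budget in the agent's context."""
        return f"Cooperation budget remaining: {self.remaining()} of {self.uses_per_window}"
```

A caller would invoke `start_turn()` whenever the agent is the actor, append `prompt_line()` to the prompt, and call `try_spend()` only when the agent chooses a budgeted action.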
Models & runtime:
- Six instruction-tuned model families run locally through HuggingFace Transformers (bfloat16); sampling: temp 0.7, top-p 0.9, max 300 new tokens.
- 20 independent seeds per model–condition.
Metrics & inference:
- In-group bias: mean difference in trust to in-group vs out-group peers per agent (computed over final trust vectors).
- Action homophily H: E[∆t | same group] − E[∆t | different group] (per-turn targeting differential).
- Network assortativity computed on trust graph with mutual-mean-trust ≥ 0.6 edges.
- Statistical tests: paired one-sided Wilcoxon signed-rank (pre-specified directional hypotheses), Benjamini–Hochberg correction within confirmatory families; action-distribution chi-square tests treated as exploratory.
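
The first two metrics and the multiple-testing correction are simple enough to sketch in plain Python. This is an illustration under stated assumptions rather than the paper's analysis code: `trust[i][j]` is taken to be agent i's trust in agent j, ∆t is taken to be the trust delta of the action on a given turn, and the function names are invented here. The Wilcoxon signed-rank test itself is available as `scipy.stats.wilcoxon`, which accepts an `alternative` argument for one-sided tests, and is not reimplemented.

```python
from statistics import mean

def in_group_bias(trust, groups):
    """Mean over agents of (mean trust in in-group peers minus mean trust in out-group peers)."""
    n = len(groups)
    per_agent = []
    for i in range(n):
        same = [trust[i][j] for j in range(n) if j != i and groups[j] == groups[i]]
        diff = [trust[i][j] for j in range(n) if groups[j] != groups[i]]
        if same and diff:
            per_agent.append(mean(same) - mean(diff))
    return mean(per_agent)

def action_homophily(events, groups):
    """H = E[delta | same group] - E[delta | different group].

    `events` is a list of (actor, target, trust_delta) tuples.
    """
    same = [d for a, t, d in events if groups[a] == groups[t]]
    diff = [d for a, t, d in events if groups[a] != groups[t]]
    return mean(same) - mean(diff)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Computing `in_group_bias` once per seed under hidden and visible labels yields the paired samples that a signed-rank test compares.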
Important design choices:
- Including the trust vector in prompt creates feedback/compounding dynamics across turns (intentional to model reputational persistence).
- Random partner selection removes confounding from contact frequency, isolating targeted action choice.

Implications for AI Economics

Small targeting biases compound into structural economic effects:
- Even modest per-interaction favoritism can, when repeated and reciprocated across persistent networks, concentrate trust, referrals, and high-value interactions within favoured groups, amplifying inequality in allocation of opportunities, transactions, and resources.
- In marketplaces, recommender/routing systems, automated hiring, lending, procurement, or multi-agent service architectures, such compounding can produce durable market segmentation and concentration without obvious single-interaction culpability.
Invisible discrimination risk in automated allocation:
- Because bias operates via recipient selection rather than increased use of negative actions, many conventional audits (which focus on action-type frequencies or content sentiment) will miss it. Economic outcomes may diverge while surface-level logs appear neutral.
Policy and auditing recommendations for economic systems using agentic LLMs:
- Audit recipient-level flows and network outcomes, not just per-action distributions. Track network metrics (assortativity, in-group/out-group trust differentials, resource-concentration over time).
- Require simulation-based stress tests that expose label salience effects (e.g., ablations with visible vs hidden labels, longer-run simulations to observe accumulation).
- Mandate logging of target selection decisions and demographic/group metadata where ethically and legally appropriate, enabling detection of group-contingent targeting.
- Limit or carefully control explicit group-label propagation in contexts where labels correlate with protected attributes; prefer label-hiding or de-emphasis unless necessary.
- Introduce algorithmic constraints or reward shaping for inter-group parity in recipient targeting (e.g., budget-aware fairness, randomized routing, explicit incentives to diversify targets).
- Encourage design patterns that reduce feedback reinforcement (e.g., dampening reputation amplification, capping reciprocation gains, or decaying trust over time).
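
As a rough illustration of the last recommendation, trust decay can be modelled as pulling each trust value a small fraction of the way back toward its starting point every turn. The decay rate and function names below are hypothetical, and the source does not report testing this intervention; the sketch only shows the arithmetic.

```python
BASELINE = 0.5  # initial trust value from the simulation setup

def decay_toward_baseline(trust_value, rate=0.01, baseline=BASELINE):
    """Move trust a fraction `rate` of the way back toward `baseline`."""
    return trust_value + rate * (baseline - trust_value)

def gap_after(in_group, out_group, turns, rate=0.01):
    """In-group minus out-group trust after `turns` of decay with no new actions."""
    for _ in range(turns):
        in_group = decay_toward_baseline(in_group, rate)
        out_group = decay_toward_baseline(out_group, rate)
    return in_group - out_group

# With no further actions, an initial gap shrinks by a factor (1 - rate) per turn.
print(gap_after(in_group=0.6, out_group=0.5, turns=100))
```

Under this rule an accumulated gap persists only if fresh in-group targeting keeps replenishing it, which is the sense in which decay dampens reputational feedback.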
Business risks and competitive dynamics:
- Firms deploying agentic LLM fleets may inadvertently bias partner selection and resource distribution, creating regulatory, reputational, and compliance risks.
- Conversely, unchecked bias can produce winner-take-more dynamics across platforms, affecting market competition and access to markets for disadvantaged groups.
Research and validation needs for economic deployments:
- Validate these simulation findings in domain-specific deployments (customer routing, procurement, gig platforms) under realistic inputs and real demographic/organizational groupings.
- Study interventions that are implementable at scale (prompt design, policy fine-tuning, explicit fairness objectives) and quantify trade-offs with utility/performance.

Limitations (brief): results derive from a stylized simulation with deterministic trust updates and arbitrary labels; prompting included full trust state (a design choice modeling reputational systems); findings are robust across multiple instruct families but require domain-specific validation before direct policy application.

Assessment

Paper Type: quasi-experimental

Evidence Strength: medium. Findings are internally consistent and statistically robust across multiple model families and seeds, establishing a reproducible causal link between label salience and in-group targeting within the simulated environment; however, evidence comes from synthetic simulations of instruction-tuned LMs rather than field or real-world deployments, limiting external validity for real-world economic outcomes.

Methods Rigor: high. The study uses multiple model families, 20 random seeds per family, a long interaction horizon (500 turns), explicit experimental manipulation of key variables, appropriate nonparametric hypothesis tests with multiple-testing correction, and measures both per-action and accumulated network metrics, though details on agent population size, task specification, and sensitivity analyses are not reported in the provided summary.

Sample: A controlled multi-agent simulation of instruction-tuned language model agents interacting for 500 turns under three experimental conditions (manipulating group-label visibility and resource scarcity), run across six distinct model families with 20 random seeds each; outcomes include action recipients, action types, per-turn in-group vs out-group targeting rates, and aggregated trust/assortativity measures.

Themes: inequality, governance

Identification: Controlled multi-agent simulation with randomized seeds and manipulated experimental conditions (group-label visibility and resource scarcity) across six model families; causal effects inferred by comparing outcomes between the label-visible and label-hidden conditions and testing per-turn and accumulated differences with Wilcoxon signed-rank tests and Benjamini–Hochberg correction.

Generalizability:
- Simulation setting may not capture complexities of deployed multi-agent systems or human-AI ecosystems.
- Results are limited to instruction-tuned language models and the specific model families and prompts used.
- Task design, agent population size, and network topology (not specified) may drive results and limit transferability.
- Artificial group labels and simplified resource dynamics may not map onto real social categories or economic institutions.
- Behavior in synthetic, closed environments may differ from agents interacting with humans or heterogeneous external systems.

Claims (7)

Claim	Category	Direction	Confidence	Outcome	Details
We ran a controlled multi-agent simulation in which instruction-tuned language model agents interacted across 500 turns under three conditions manipulating group label salience and resource scarcity, across six model families with 20 seeds each.	Other	positive	high	experimental setup / simulation configuration (turns, conditions, models, seeds)	n=120 0.8
When group labels were visible, we observed in-group trust bias.	Task Allocation	positive	high	in-group trust bias	n=120 0.48
When group labels were visible, we observed action homophily.	Task Allocation	positive	high	action homophily (agents preferentially taking actions toward same-group agents)	n=120 0.48
When group labels were visible, we observed network assortativity (all absent when labels were hidden).	Task Allocation	positive	high	network assortativity	n=120 0.48
This discrimination was invisible to standard action-log audits: bias operated entirely through who received each action, not what actions were chosen, with action-type distributions showing no increase in negative actions across conditions.	Governance And Regulation	null_result	high	action-type distribution (no increase in negative actions) and detectability of discrimination via standard action-log audits	n=120 no increase in negative actions across conditions 0.48
Per-turn in-group versus out-group differentials of 5 to 16 percentage points were statistically significant for all six models (Wilcoxon signed-rank, all Benjamini-Hochberg-corrected p < 0.001), establishing group-contingent targeting as a robust property of instruction-tuned language models across architectures and training regimes.	Task Allocation	positive	high	per-turn in-group vs out-group targeting differential	n=20 5 to 16 percentage points 0.48
Compounded through 500 turns of reciprocation, these differentials accumulated into in-group trust biases of +0.014 to +0.100 (d = 0.84-4.52), illustrating how modest per-interaction targeting propagates into structural inequality in persistent networks.	Inequality	positive	high	accumulated in-group trust bias over 500 turns (absolute change and Cohen's d)	n=120 +0.014 to +0.100 (d = 0.84-4.52) 0.48