The Commonplace

When AI agents debate, fairness can emerge from interaction: aligned retrieval-augmented models partially correct biased counterparts in simulated triage negotiations, producing more equitable allocations than either agent alone — but model leanings and Arrow-style aggregation limits mean deliberation trades off rather than guarantees fairness.

Beyond Arrow's Impossibility: Fairness as an Emergent Property of Multi-Agent Collaboration
Sayan Kumar Chaki, Antoine Gourru, Julien Velcin · April 15, 2026
arXiv · descriptive · low evidence · 7/10 relevance
In simulated hospital-triage negotiations, retrieval-augmented aligned agents steer allocation strategies and can partially correct a biased counterpart so that the joint, deliberated allocation meets fairness criteria that neither agent would achieve alone, though intrinsic model biases and aggregation limits persist.

Fairness in language models is typically studied as a property of a single, centrally optimized model. As large language models become increasingly agentic, we propose that fairness emerges through interaction and exchange. We study this via a controlled hospital triage framework in which two agents negotiate over three structured debate rounds. One agent is aligned to a specific ethical framework via retrieval-augmented generation (RAG), while the other is either unaligned or adversarially prompted to favor demographic groups over clinical need. We find that alignment systematically shapes negotiation strategies and allocation patterns, and that neither agent's allocation is ethically adequate in isolation, yet their joint final allocation can satisfy fairness criteria that neither would have reached alone. Aligned agents partially moderate bias through contestation rather than override, acting as corrective patches that restore access for marginalized groups without fully converting a biased counterpart. We further observe that even explicitly aligned agents exhibit intrinsic biases toward certain frameworks, consistent with known left-leaning tendencies in LLMs. We connect these limits to Arrow's Impossibility Theorem: no aggregation mechanism can simultaneously satisfy all desiderata of collective rationality, and multi-agent deliberation navigates rather than resolves this constraint. Our results reposition fairness as an emergent, procedural property of decentralized agent interaction, and the system rather than the individual agent as the appropriate unit of evaluation.

Summary

Main Finding

Fairness can emerge as a systemic, procedural property from multi-agent interaction rather than as a property of any single aligned model. In a controlled hospital-triage negotiation, an agent aligned to an ethical framework (via RAG) systematically shaped negotiation strategy and allocation patterns; although neither aligned nor biased agents produced ethically adequate allocations in isolation, their multi-round deliberation often converged to final allocations that satisfied fairness criteria that neither could attain alone. These dynamics are constrained by Arrow’s Impossibility Theorem: multi-agent deliberation navigates unavoidable aggregation trade-offs rather than resolving them.
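
The aggregation constraint can be made concrete with a classic Condorcet cycle, one of the failures that Arrow's theorem generalizes. The sketch below is an illustration only and is not taken from the paper: the three rankers (named after ethical frameworks purely as labels) and the candidate allocations X, Y, Z are hypothetical.

```python
from itertools import combinations

# Hypothetical rankings (best first) over three candidate allocations.
# The labels and orderings are illustrative assumptions, not paper results.
rankings = {
    "utilitarian": ["X", "Y", "Z"],
    "egalitarian": ["Y", "Z", "X"],
    "rawlsian": ["Z", "X", "Y"],
}

def majority_prefers(a, b, rankings):
    """True if a strict majority of rankers place option a above option b."""
    votes = sum(r.index(a) < r.index(b) for r in rankings.values())
    return votes > len(rankings) / 2

def pairwise_winners(rankings):
    """Majority winner of every head-to-head contest (odd number of rankers)."""
    options = sorted(next(iter(rankings.values())))
    return {
        (a, b): a if majority_prefers(a, b, rankings) else b
        for a, b in combinations(options, 2)
    }

def condorcet_winner(rankings):
    """Option that beats every other option head-to-head, or None."""
    options = next(iter(rankings.values()))
    for a in options:
        if all(majority_prefers(a, b, rankings) for b in options if b != a):
            return a
    return None

winners = pairwise_winners(rankings)
for (a, b), w in winners.items():
    print(f"{a} vs {b}: majority prefers {w}")
print("Condorcet winner:", condorcet_winner(rankings))
```

Here X beats Y, Y beats Z, and Z beats X, so pairwise majority voting yields no consistent collective ranking. A deliberation protocol facing such preferences has to settle on one trade-off rather than satisfy every framework at once.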

Key Points

  • Problem framed as non-degenerate multi-resource allocation so different welfare objectives (utilitarian, egalitarian, Rawlsian, prioritarian, libertarian, care ethics) conflict and no single trivial solution exists.
  • Experimental arena: two agents debate for T rounds, each round consisting of structured proposals plus normative justifications. Agent A is the experimental variable (aligned via RAG to one of several ethical frameworks); its counterpart is either Agent B, an unaligned baseline, or Agent C, an adversarially biased agent.
  • Primary empirical findings:
    • Alignment strongly shapes negotiation strategies and which allocation trade-offs are pursued.
    • Interaction produces corrective dynamics: aligned agents tend to moderate biased proposals through contestation (arguing and proposing alternatives) rather than by fully overriding the biased agent.
    • Stronger misalignment (more adversarial counterpart) can amplify corrective contestation, moving the joint outcome toward improved fairness.
    • Even explicitly aligned agents retain intrinsic biases (e.g., consistent leanings toward particular frameworks), so alignment via RAG is partial, not absolute.
    • Because of Arrow-style impossibility constraints, no debate protocol yields a universally fair aggregation; deliberation selects among trade-offs procedurally.
  • Conceptual reframing: fairness is emergent and procedural — the appropriate evaluation unit is the multi-agent system and its negotiation protocol, not the isolated model.
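
Non-degeneracy, the condition that different welfare objectives pick different allocations, can be checked by brute force on a toy instance. The sketch below is a minimal illustration, not the paper's cake or triage problem: the two people, the three indivisible units, and the linear utility rates in `RATES` are all hypothetical.

```python
from itertools import product

UNITS = 3            # indivisible units to share (hypothetical)
RATES = [3.0, 1.0]   # hypothetical utility per unit for person 0 and person 1

def feasible_allocations(units, n_people):
    """All ways to hand out exactly `units` whole units among n_people."""
    return [a for a in product(range(units + 1), repeat=n_people)
            if sum(a) == units]

def utilities(alloc):
    """Linear utilities: each person's rate times the units they receive."""
    return [rate * a for rate, a in zip(RATES, alloc)]

def utilitarian(alloc):
    """Utilitarian welfare: total utility."""
    return sum(utilities(alloc))

def rawlsian(alloc):
    """Rawlsian maximin welfare: utility of the worst-off person."""
    return min(utilities(alloc))

allocs = feasible_allocations(UNITS, len(RATES))
best_util = max(allocs, key=utilitarian)
best_rawls = max(allocs, key=rawlsian)
non_degenerate = best_util != best_rawls

print("utilitarian optimum:", best_util)
print("maximin optimum:", best_rawls)
print("non-degenerate:", non_degenerate)
```

The utilitarian objective gives everything to the person with the higher rate, while maximin splits the units to protect the worse-off person. Because the two optimizers disagree, the instance is non-degenerate in the sense used above.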

Data & Methods

  • Formal setup:
    • N individuals, K resource types; allocations A ∈ feasible set (budget constraints).
    • Utilities U_i(a_i) per individual; a set Φ of M ethical frameworks induces welfare functionals W_m (utilitarian, egalitarian via Gini, Rawlsian maximin, prioritarian with weights, libertarian via a variance-based metric, care-ethics weighted).
    • Non-degeneracy requires that the welfare optimizers disagree across frameworks.
    • Example problems: the Non-Degenerate Cake Problem (a 6-person illustration); the main experiments use a hospital triage instance (8 patients per cohort).
  • Experimental scenario:
    • Hospital triage cohorts: patients with varied demographics, clinical needs (ICU, ventilator, meds, nursing, surgery), and discretized survival-probability labels. Example resource limits: 3 ICU, 2 Vent, 60 Med-A, 50 Med-B, 80 nursing hrs/week, 3 surgical slots/week.
    • Metrics: CNSS (Clinical Need Satisfaction Scale) per patient (fraction of clinically required resources received). Aggregate metrics mapped to ethical frameworks:
      • ESG (Expected Survival Gain) — utilitarian (maximize ∑ pi·CNSSi).
      • RMG (Rawlsian Minimum Guarantee) — maximize min CNSSi.
      • Gini — egalitarian (minimize inequality of CNSS).
      • VWCI (Vulnerability-Weighted Care Intensity) — care ethics (weights by age/gender vulnerabilities).
      • DW-ESG (Disadvantage-Weighted ESG) — prioritarian (weights socio-demographic disadvantage).
      • Var — libertarian measure (variance or related).
  • Agent instantiation:
    • Models: LLaMA 3.3 and Qwen 2.5 (open-weight), served locally (Ollama).
    • RAG pipeline: LangChain + Chroma vector DB; embeddings from nomic-embed-text-v2-moe. Aligned agents retrieve canonical philosophical/ethical texts as context.
    • Agent profiles: PA (aligned via RAG + ethical docs), PB (baseline unaligned, no RAG), PC (biased via toxic prompts/biased doc injection prioritizing protected attributes).
    • Interaction: structured rounds (T = 3 in the described protocol). In each round, agents propose an allocation matrix plus a natural-language justification; the interaction history is recorded, and the final allocations A_{l,T} are evaluated on the metrics above.
  • Instance generation: batches of cohorts sampled to span ethical tensions (age, SES, race, survival prognoses). Survival probabilities drawn uniformly and discretized to categories (Acute, Low, Mid, High).
  • Analysis: compare individual proposals and final negotiated allocations across alignment configurations and adversarial pressure; quantify metric improvements and shifts in welfare functionals.
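
The unweighted metrics listed above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the paper's implementation: CNSS is assumed here to be the mean, over the resources a patient needs, of the fraction received; the four patients, their needs, and the numeric survival probabilities are hypothetical; only the ICU and ventilator capacities echo the example limits. VWCI and DW-ESG are omitted because their weights are not given in this summary.

```python
from statistics import pvariance

# Capacities echo two of the example limits; everything else is hypothetical.
CAPACITY = {"icu": 3, "vent": 2}
needs = [{"icu": 1, "vent": 1}, {"icu": 1, "vent": 1},
         {"icu": 1, "vent": 1}, {"icu": 1}]
alloc = [{"icu": 1, "vent": 1}, {"icu": 1, "vent": 1}, {"icu": 1}, {}]
survival = [0.9, 0.6, 0.3, 0.5]  # hypothetical numeric survival probabilities

def is_feasible(alloc, capacity):
    """Check that total allocation of each resource stays within capacity."""
    return all(sum(a.get(r, 0) for a in alloc) <= cap
               for r, cap in capacity.items())

def cnss(need, got):
    """Assumed CNSS: mean fraction of each needed resource received, capped at 1."""
    fractions = [min(got.get(r, 0) / q, 1.0) for r, q in need.items() if q > 0]
    return sum(fractions) / len(fractions) if fractions else 1.0

def gini(xs):
    """Gini coefficient via mean absolute difference; 0 means perfect equality."""
    n, mean = len(xs), sum(xs) / len(xs)
    if mean == 0:
        return 0.0
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mean)

def evaluate(needs, alloc, survival):
    """Compute the unweighted framework metrics from per-patient CNSS."""
    scores = [cnss(n, a) for n, a in zip(needs, alloc)]
    return {
        "ESG": sum(p * s for p, s in zip(survival, scores)),  # utilitarian
        "RMG": min(scores),                                   # Rawlsian
        "Gini": gini(scores),                                 # egalitarian
        "Var": pvariance(scores),                             # variance-based
    }

assert is_feasible(alloc, CAPACITY)
metrics = evaluate(needs, alloc, survival)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy allocation the high ESG coexists with an RMG of zero, since one patient receives nothing. That is the kind of tension between frameworks the negotiation is meant to surface.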

Implications for AI Economics

  • Unit of evaluation shifts from individual models to multi-agent systems:
    • Economic assessments (costs/benefits, social welfare) should account for emergent system-level properties produced by agent interaction protocols, not only per-model fairness metrics.
  • Mechanism and market design:
    • Arrow-style impossibility constraints imply designers must choose which welfare/scoring desiderata to prioritize. Market designers and regulators should expect trade-offs and design negotiation/aggregation protocols (procedural rules, voting/consensus mechanisms, deliberation formats) to reflect chosen trade-offs.
    • Multi-agent ensembles can serve as a decentralized corrective mechanism (pluralism benefits): incorporating heterogeneous agents with explicit normative commitments may improve aggregate fairness over single-agent solutions, but protocol design matters.
  • Alignment and procurement:
    • Alignment via RAG or constitution-like corpora can moderate bias but is not a panacea; procurement and deployment decisions should require system-level stress tests with adversarial agents and transparency on retrieval corpora.
    • There are economic trade-offs: adding aligned agents, RAG infrastructure, and longer deliberation rounds increases compute and latency costs — weigh these against social-welfare gains from improved allocations.
  • Regulatory and accountability design:
    • Because fairness is procedural and emergent, regulation should mandate auditing of multi-agent decision processes (interaction logs, justifications, retrieval provenance) and require disclosure of aggregation rules and agent profiles.
    • Liability frameworks may need to assign responsibility at the system/protocol level rather than only to a single deployed model.
  • Incentives and strategic behavior:
    • The influence of adversarial agents (malicious prompts, biased retrieval corpora) can be amplified in multi-agent settings; economic incentives should favor robust retrieval curation, adversarial testing, and diversity of agent objectives to reduce manipulability.
  • Research and policy priorities:
    • Invest in design of deliberation protocols (number of rounds, roles, weighting of justifications) akin to market institutions — these are policy levers that trade off welfare dimensions.
    • Develop metrics and benchmarks that evaluate emergent fairness and welfare under heterogeneous agent interactions (costly to run but necessary for accurate economic assessment).
  • Limitations relevant to economics:
    • Results are from synthetic cohorts and a small set of models (open-weight LLMs) and structured short debates; external validity to deployed high-stakes markets (real hospitals, financial systems) remains to be empirically validated.
    • Arrow-type constraints guarantee trade-offs; economic policy must focus on selecting acceptable trade-offs and on procedural design to reduce social cost.

Suggested directions for AI-economics research: formalize the welfare-aggregation trade-offs in economic terms (social-welfare functions over agent-aggregated allocations), analyze cost-effectiveness of different deliberation protocols, and design incentive-compatible mechanisms that operationalize chosen normative priorities in decentralized agentic systems.

Assessment

Paper Type: descriptive

Evidence Strength: low. Findings come from simulated interactions among LLM agents in a toy hospital-triage environment rather than from real-world deployments or human-subjects trials; results are sensitive to choice of model, prompts, RAG implementation, and evaluation metrics, limiting external validity and causal generalization to economic outcomes.

Methods Rigor: medium. The study uses a clearly controlled experimental manipulation (aligned vs unaligned/adversarial agents) and structured negotiation rounds with predefined evaluation criteria, which supports internal consistency; however, the paper appears to lack robustness checks across multiple model families, large-scale sensitivity analyses, or real-world validation.

Sample: Simulated dataset of repeated two-agent negotiations in a stylized hospital triage framework (three structured debate rounds per case); one agent uses retrieval-augmented generation to implement an ethical framework, the other is either unaligned or adversarially prompted to prioritize demographics over clinical need; outcomes evaluated against a set of fairness/allocation criteria (exact model family, number of runs, and dataset of patient vignettes not specified here).

Themes: human_ai_collab, governance

Identification: Controlled computational experiments that manipulate agent alignment: one agent is retrieval-augmented and aligned to a specified ethical framework (treatment) while the other is unaligned or adversarially prompted (control); outcomes are compared across repeated simulated two-agent negotiations over a structured hospital triage task.

Generalizability:
  • Simulated agents, not human decision-makers or deployed multi-agent systems
  • Simplified hospital triage task may not capture clinical complexity or institutional constraints
  • Results likely sensitive to choice of LLM, prompt templates, and RAG sources
  • Only two-agent interactions studied; multi-agent or market-scale dynamics may differ
  • Normative fairness criteria and ethical frameworks used may not generalize across cultures or policy contexts
  • No field validation or observed economic outcomes like productivity, wages, or firm behavior

Claims (7)

  • Claim: Fairness in language models emerges through interaction and exchange among agents, rather than being solely a property of a single, centrally optimized model. [Decision Quality]
    • Direction: positive · Confidence: high · Details: 0.18
    • Outcome: emergent fairness of joint allocations produced by multi-agent interaction
  • Claim: Alignment systematically shapes negotiation strategies and allocation patterns between agents. [Task Allocation]
    • Direction: mixed · Confidence: high · Details: 0.18
    • Outcome: negotiation strategies and resource allocation patterns
  • Claim: Neither agent's allocation is ethically adequate in isolation, yet their joint final allocation can satisfy fairness criteria that neither would have reached alone. [Decision Quality]
    • Direction: positive · Confidence: high · Details: 0.18
    • Outcome: ethical adequacy / fairness of allocations (individual vs joint)
  • Claim: Aligned agents partially moderate bias through contestation rather than override, acting as corrective patches that restore access for marginalized groups without fully converting a biased counterpart. [Decision Quality]
    • Direction: positive · Confidence: high · Details: 0.18
    • Outcome: change in allocations for marginalized groups due to contestation in multi-agent deliberation
  • Claim: Even explicitly aligned agents exhibit intrinsic biases toward certain ethical frameworks, consistent with known left-leaning tendencies in large language models. [AI Safety and Ethics]
    • Direction: negative · Confidence: high · Details: 0.09
    • Outcome: intrinsic alignment bias (preference for certain ethical frameworks / ideological tilt)
  • Claim: No aggregation mechanism can simultaneously satisfy all desiderata of collective rationality (connection to Arrow's Impossibility Theorem); multi-agent deliberation navigates rather than resolves this constraint. [Governance and Regulation]
    • Direction: mixed · Confidence: high · Details: 0.03
    • Outcome: satisfiability of collective rationality desiderata under aggregation mechanisms
  • Claim: Fairness should be evaluated at the system level (the interacting agents) rather than solely at the level of individual models, because fairness can be an emergent, procedural property of decentralized agent interaction. [Decision Quality]
    • Direction: positive · Confidence: high · Details: 0.18
    • Outcome: appropriateness of system-level versus model-level evaluation for fairness
