Autonomous scheduling agents that reward cooperative behavior using only aggregate demand signals can substantially improve simulated electricity-load coordination and permit beneficial unilateral adoption. In some parameter regimes, however, non-adopters capture outsized gains, creating free-rider risks.

Hybrid Human-Agent Social Dilemmas in Energy Markets

Isuri Perera, Frits de Nijs, Julian Garcia · March 12, 2026

arXiv · theoretical · low evidence · 7/10 relevance

Using game-theoretic analysis and simulations, the paper shows that autonomous consumer agents endowed with intrinsic rewards based on aggregate signals can steer learning dynamics toward cooperative turn-taking in load scheduling, and partial adoption can improve system outcomes while sometimes allowing non-adopters to free-ride.

In hybrid populations where humans delegate strategic decision-making to autonomous agents, understanding when and how cooperative behaviors can emerge remains a key challenge. We study this problem in the context of energy load management: consumer agents schedule their appliance use under demand-dependent pricing. This structure can create a social dilemma where everybody would benefit from coordination, but in equilibrium agents often choose to incur the congestion costs that cooperative turn-taking would avoid. To address the problem of coordination, we introduce artificial agents that use globally observable signals to increase coordination. Using evolutionary dynamics, and reinforcement learning experiments, we show that artificial agents can shift the learning dynamics to favour coordination outcomes. An often neglected problem is partial adoption: what happens when the technology of artificial agents is in the early adoption stages? We analyze mixed populations of adopters and non-adopters, demonstrating that unilateral entry is feasible: adopters are not structurally penalized, and partial adoption can still improve aggregate outcomes. However, in some parameter regimes, non-adopters may benefit disproportionately from the cooperation induced by adopters. This asymmetry, while not precluding beneficial entry, warrants consideration in deployment, and highlights strategic issues around the adoption of AI technology in multiagent settings.

Summary

Main Finding

Artificial (delegated) agents that condition on globally observable signals can change learning dynamics in hybrid human–AI populations so that efficient, cooperative scheduling (turn-taking) emerges in an energy load-management setting. Partial adoption is feasible: early adopters are not necessarily penalized and can improve aggregate outcomes, but adoption can create distributional asymmetries where non-adopters sometimes capture a disproportionate share of the gains.

Key Points

Problem setup
- Consumers schedule appliance use under demand-dependent pricing, creating a coordination problem: everyone would be better off if agents coordinated turn-taking to avoid congestion, but myopic equilibrium behavior often leads to costly simultaneous use.
- The focus is on hybrid populations where strategic decision-making is delegated to autonomous agents alongside non-adopters (humans or legacy algorithms).
Intervention
- Introduce artificial agents that use globally observable signals (public signals) to coordinate across the population and implement turn-taking-style cooperation.
Outcomes
- In fully adopted populations, artificial agents steer learning dynamics toward cooperative equilibria, reducing congestion costs and improving aggregate welfare.
- In mixed (partial-adoption) populations, unilateral entry of artificial agents is generally feasible: adopters are not structurally penalized and can improve system-level outcomes even if adoption is incomplete.
- Distributional asymmetry can occur: non-adopters may sometimes benefit more than adopters from the coordination induced by adopters (free-riding effect). This does not necessarily block beneficial entry but creates strategic and fairness considerations.
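
The free-riding effect described in the last two points can be illustrated with a small payoff calculation. This is a toy sketch, not the paper's model: the step price, the capacity, the utility, the inconvenience cost `p`, and the rule that adopters move to an off-peak slot while non-adopters stay in their preferred slot are all assumptions chosen for illustration.

```python
def step_price(load, capacity, low=0.2, high=0.8):
    """Assumed step price: cheap while the slot load is within capacity, expensive above it."""
    return low if load <= capacity else high

def group_payoffs(n_agents, n_adopters, capacity, utility=1.0, p=0.3):
    """Return (adopter payoff, non-adopter payoff) for one period.

    Hypothetical rule: adopters shift to an off-peak slot and pay an
    inconvenience cost p; non-adopters stay in the preferred (peak) slot.
    """
    peak_load = n_agents - n_adopters
    off_peak_load = n_adopters
    non_adopter = utility - step_price(peak_load, capacity)
    adopter = utility - step_price(off_peak_load, capacity) - p
    return adopter, non_adopter

# Baseline: nobody adopts, all four agents congest the peak slot.
baseline = group_payoffs(n_agents=4, n_adopters=0, capacity=2)[1]

# Partial adoption: two of four agents adopt and move off-peak.
adopter, non_adopter = group_payoffs(n_agents=4, n_adopters=2, capacity=2)

adopter_gain = adopter - baseline          # adopters still gain relative to baseline
non_adopter_gain = non_adopter - baseline  # non-adopters gain more: they skip the cost p

print(f"baseline={baseline:.2f} adopter={adopter:.2f} non_adopter={non_adopter:.2f}")
print(f"adopter gain={adopter_gain:.2f} non-adopter gain={non_adopter_gain:.2f}")
```

Under these assumed numbers both groups do better than in the congested baseline, so entry is not penalized, yet non-adopters capture the larger gain because they enjoy the relieved peak slot without bearing the inconvenience cost.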

Data & Methods

Modeling environment
- Energy load-management/peak-demand context with demand-dependent pricing and discrete scheduling choices for consumer agents (scheduling windows for appliance use).
- Payoffs incorporate individual utility from appliance use minus congestion-dependent price costs.
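
As a rough illustration of the payoff structure just described, the sketch below builds a two-player, two-action stage game with a step price. The action names PST and Away and the inconvenience parameter `p` follow the terms used in the assessment further down; every numeric value (utility, price levels, capacity, `p = 0.7`) is an assumption chosen to make the dilemma visible, not a value taken from the paper.

```python
from itertools import product

PST, AWAY = "PST", "Away"  # the two scheduling actions in the abstracted stage game

def step_price(load, capacity=1, low=0.2, high=0.8):
    """Assumed step price: low while the slot load is within capacity, high above it."""
    return low if load <= capacity else high

def stage_payoffs(actions, utility=1.0, p=0.7):
    """Payoff per agent: utility minus the price of the chosen slot,
    minus an inconvenience cost p when the agent chooses Away."""
    payoffs = []
    for action in actions:
        load = sum(1 for other in actions if other == action)  # agents sharing this slot
        cost = step_price(load) + (p if action == AWAY else 0.0)
        payoffs.append(utility - cost)
    return tuple(payoffs)

# Print the full 2x2 payoff table.
for profile in product((PST, AWAY), repeat=2):
    print(profile, tuple(round(v, 2) for v in stage_payoffs(profile)))

# Both agents using the preferred slot at once (congestion).
congested = stage_payoffs((PST, PST))[0]

# Turn-taking: agents alternate who goes Away, so each earns the average of the two roles.
turn_taking_avg = sum(stage_payoffs((PST, AWAY))) / 2

print(f"congested={congested:.2f} turn-taking average={turn_taking_avg:.2f}")
```

With these assumed values, PST is the better one-shot reply to either action, so simultaneous use is the stage-game outcome, while alternating turns would give each agent a higher average payoff. Other values of `p` change the character of the game, which is why the inconvenience parameter matters in the analysis.
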
Analytical and computational approach
- Evolutionary dynamics analysis: investigate how population share of strategies evolves under replicator-like or learning dynamics to identify stable outcomes and basins of attraction.
- Reinforcement Learning (RL) experiments: multi-agent RL simulations in which agents learn scheduling policies (with and without access to global signals) to observe emergent equilibria and welfare outcomes.
- Mixed-population experiments: vary fraction of adopters vs. non-adopters to study partial-adoption regimes and measure aggregate welfare, individual payoffs (adopters vs non-adopters), and equilibrium selection probabilities.
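
The idea of using replicator dynamics with random initializations to estimate basins of attraction can be sketched as follows. This is a deliberately simplified stand-in: it uses two populations with two pure actions each and an assumed payoff matrix, whereas the paper's analysis works with repeated-game strategies. The matrix `A`, the Euler step size, and the sample count are all illustrative assumptions.

```python
import random

# Assumed stage payoffs for the focal player, indexed [own action][other's action],
# with 0 = PST and 1 = Away. Illustrative numbers only.
A = [[0.2, 0.8],
     [0.5, -0.1]]

def advantage(share_other):
    """Expected payoff of PST minus Away against a population
    in which a fraction share_other plays PST."""
    f_pst = share_other * A[0][0] + (1 - share_other) * A[0][1]
    f_away = share_other * A[1][0] + (1 - share_other) * A[1][1]
    return f_pst - f_away

def simulate(x, y, dt=0.1, steps=2000):
    """Euler-integrate two-population replicator dynamics.
    x and y are the shares playing PST in populations 1 and 2."""
    for _ in range(steps):
        dx = x * (1 - x) * advantage(y)
        dy = y * (1 - y) * advantage(x)
        x, y = x + dt * dx, y + dt * dy
    return x, y

def basin_share(samples=200, seed=0):
    """Monte Carlo estimate of the share of random starting points that end
    with population 1 mostly on PST and population 2 mostly on Away."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = simulate(rng.random(), rng.random())
        hits += x > y
    return hits / samples

print(simulate(0.9, 0.1))  # heads toward population 1 on PST, population 2 on Away
print(basin_share())
```

In this symmetric toy game the two asymmetric outcomes split the state space roughly evenly. The summary's claim about artificial agents corresponds to changing the payoffs one population perceives, for example through an intrinsic reward, so that the basin of the coordinated outcome grows.
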
Measured outcomes
- Aggregate welfare (total surplus or system-level cost reductions).
- Congestion costs and frequency of coordinated (turn-taking) outcomes.
- Distributional payoffs to adopters versus non-adopters.
- Robustness across parameter regimes (e.g., demand sensitivity, signal informativeness, fraction of adopters).

Implications for AI Economics

Design and diffusion of AI coordination technologies
- Public signals or standardized coordination protocols can be a low-cost lever to induce efficient equilibria when agents are learning.
- Early adoption can be welfare-improving at the system level, making policy support for initial deployment (subsidies, pilots) potentially desirable.
Strategic adoption and distributional effects
- Partial adoption can produce positive externalities that non-adopters capture; this free-riding raises questions about incentives for adoption and fairness.
- Policymakers and platform designers should consider mechanisms to ensure adopters are not undercompensated (e.g., credits, differential pricing, contracts that internalize benefits).
Market design and regulation
- Pricing and information-design choices matter: demand-dependent prices plus publicly observable coordination signals can be deliberately engineered to guide market dynamics toward socially desirable equilibria.
- Regulation may be warranted to prevent exploitative outcomes where adopters bear coordination costs while others reap outsized benefits.
Research directions
- Empirical testing in real-world pilot deployments to quantify adoption thresholds and distributional impacts.
- Mechanism design to align private incentives with socially optimal adoption (tax/subsidy schemes, side-payments, platform-level revenue sharing).
- Extension to other multiagent domains (traffic routing, shared compute resources, decentralized markets) where learned coordination and partial adoption interact.

Limitations to note: results depend on the informativeness of public signals, the particular learning dynamics and environment parameters, and model assumptions about agent objectives and observability. Practical deployment should evaluate parameter sensitivity and distributional safeguards.

Assessment

Paper Type: theoretical

Evidence Strength: low. Results come from analytic toy games and simulation experiments (replicator dynamics and RL) rather than from field data or natural experiments; while internally consistent and useful for mechanism exploration, they do not provide strong empirical evidence that the same effects will obtain in real electricity markets or among human principals.

Methods Rigor: medium. The paper uses standard, appropriate theoretical tools (repeated games, Folk-theorem intuition, memory‑1 strategy enumeration), two‑population replicator dynamics, Monte Carlo initialization sweeps, and RL baselines with centralized optima for benchmarking; however, key assumptions (two-action abstraction, symmetric agents, step-price function, memory-1 restriction, choice of intrinsic reward form and hyperparameters, high patience δ regimes) are strong and not extensively stress‑tested or validated against behavioral/field data.

Sample: No empirical sample; work uses simulated data and models: (a) a centralized MiniZinc optimal benchmark and decentralized reinforcement‑learning agents in a simulated DSLM environment (experiments reported over ~130 testing days and across small populations, RL hyperparameters in appendix), (b) a minimal 2-player, 2-action stage game abstracting PST vs Away with memory‑1 strategies (8 deterministic strategies), and (c) two‑population replicator dynamics with Monte Carlo draws (1000 simulations reported) varying inconvenience parameter p and discount factor δ.

Themes: human_ai_collab, adoption

Identification: No empirical causal identification; causal claims are supported by a mechanistic/theoretical strategy: (i) formalization of a stage game and repeated-game equilibria, (ii) analysis with replicator dynamics (two-population) and Monte Carlo simulation to show basin shifts, and (iii) reinforcement-learning simulations and centralized optimization benchmarks to illustrate the mechanisms under specified assumptions.

Generalizability
- Simulation-only results; no field or experimental human data to validate assumptions or behavior.
- Heavy abstraction: two-action, memory‑1 strategies, symmetric players; real consumers and appliances are heterogeneous.
- Assumes globally observable aggregate signals and truthful reporting; ignores information frictions, privacy, and measurement noise.
- Relies on specific step‑price cost structure and particular intrinsic‑reward design; results may be sensitive to pricing rules and reward hyperparameters.
- Cooperation often requires high patience (δ close to 1) in some regimes, which may not reflect real decision horizons.
- No accounting for implementation costs, regulatory constraints, or strategic responses by utilities.

Claims (7)

Claim	Category	Direction	Confidence	Outcome	Details
Demand-dependent pricing in the modeled energy load management setting creates a social dilemma: everyone would benefit from coordination, but in equilibrium agents often choose to incur congestion costs that cooperative turn-taking would avoid.	Consumer Welfare	negative	medium-high	presence of congestion costs vs coordinated turn-taking (system efficiency/total cost)	Demand-dependent pricing creates social dilemma: equilibrium congestion costs vs coordinated turn-taking 0.01
Introducing artificial agents that use globally observable signals increases coordination among agents.	Organizational Efficiency	positive	medium	coordination level (e.g., frequency of cooperative turn-taking, reduction in congestion)	Introducing artificial agents using global signals increases coordination among agents 0.04
Artificial agents can shift the learning dynamics to favour coordination outcomes.	Team Performance	positive	medium	learning dynamics (probability / prevalence of converging to coordinated equilibria)	Artificial agents can shift learning dynamics to favour coordination outcomes 0.04
Unilateral entry of artificial-agent technology is feasible: adopters are not structurally penalized.	Adoption Rate	positive	medium	relative payoff to adopters (adopter payoff compared to non-adopter payoff)	Unilateral entry feasible: adopters are not structurally penalized (relative payoff to adopters) 0.04
Partial adoption of artificial agents can still improve aggregate outcomes.	Organizational Efficiency	positive	medium	aggregate outcomes (total system welfare / total cost / overall congestion)	Partial adoption of artificial agents can improve aggregate outcomes 0.04
In some parameter regimes, non-adopters may benefit disproportionately from the cooperation induced by adopters (i.e., non-adopters can free-ride on adopter-induced coordination).	Inequality	mixed	medium	distribution of benefits (payoff to non-adopters vs payoff to adopters)	In some regimes, non-adopters may benefit disproportionately (free-riding on adopter-induced coordination) 0.04
Although the asymmetry in who benefits does not preclude beneficial entry, it raises strategic issues for deployment of AI technology in multiagent settings.	Governance And Regulation	mixed	medium	policy/strategic implications for adoption (qualitative)	0.04