AI built to optimize in isolation can undermine its own deployment by changing the environment; researchers should shift from capability-first design to institution-aware, cooperation‑centric systems and adaptive testbeds to preserve human agency and stable equilibria.

Solipsistic Superintelligence is Unlikely to be Cooperative

Rakshit S Trivedi, Natasha Jaques, Logan Cross, Alexander Sasha Vezhnevets, Joel Z Leibo · June 02, 2026

arXiv · theoretical · evidence: n/a · relevance: 7/10

Solipsistic AI design that treats environments as exogenous leads to self-undermining non‑stationarity and non‑cooperative outcomes, so AI should be redesigned around cooperation, institutional primitives, and dynamic evaluation with adaptive counterparties.

AI's central challenge is shifting from capability to coexistence. The dominant paradigm in AI research focuses on developing powerful agents that treat the world as an exogenous and stationary source of feedback. We contend that superintelligence, an extremely capable task solver, born out of such a solipsistic approach to AI design, is unlikely to be cooperative. Deploying AI systems induces endogenous non-stationarity, resulting in a train-test-deploy gap where historical distributions diverge from the deployment context. We refer to this as the self-undermining property of unilateral optimization. Closing this gap requires AI that participates in cooperation: the equilibrium-selection process through which multiple actors navigate their interdependence. We call for a non-solipsistic research paradigm that treats this interdependence as a core design principle rather than approaching cooperation as a task to solve. This entails building dynamic evaluation testbeds involving adaptive counterparties, treating institutions as design primitives, and preserving human agency as a structural feature of the systems we build.

Summary

Main Finding

A “solipsistic” design paradigm—training AIs as unilateral optimizers against a fixed, exogenous environment—systematically misses a core deployment reality: other humans, institutions, and algorithms will adapt in response. That endogenous non‑stationarity produces a train–test–deploy gap and a self‑undermining property of aggressive unilateral optimization. Consequently, even extremely capable (or “superintelligent”) systems built under solipsistic assumptions are unlikely to sustain cooperative, socially beneficial outcomes. Cooperation is not an add‑on capability but an equilibrium‑selection property of multi‑agent systems and must be treated as a design primitive.

Key Points

Central thesis: Cooperation among many humans and many AIs is an equilibrium‑selection process that emerges from interdependent adaptation; it cannot be guaranteed by merely scaling individual agent capabilities or “alignment” on isolated objectives.
Solipsistic assumptions in mainstream ML:
- Exogeneity: environment/data generation is independent of the agent’s policy.
- Stationarity: training/evaluation distributions remain valid at deployment.
- Singleton framing: other agents are treated as passive parts of state rather than strategic, adaptive actors.
Train–test–deploy gap: historical (exogenous) performance J_train can diverge systematically from J_deploy when other actors respond to the deployed policy.
Self‑undermining property: policies that aggressively exploit historical regularities create incentives for other actors to adapt; higher capability can deepen these adaptations and precipitate sharp regime shifts.
Three structured channels of adaptation:
- Behavioral: humans change behavior (skill atrophy, strategic manipulation of inputs).
- Institutional: firms, regulators, and norms adapt rules and workflows.
- Algorithmic: other AIs retrain and co‑evolve (autocurricula, emergent collusion).
Equilibrium selection risk: multi‑agent systems typically admit multiple equilibria; deployment details (timing, scale, interfaces) can push systems into low‑welfare equilibria that are hard to reverse (lock‑in, network effects).
Prediction ≠ participation: modeling others as a harder prediction problem fails because of epistemic horizons, strategic uncertainty, and the fact that acting changes the very distribution to be predicted.
Empirical/illustrative failures noted: reservation algorithms causing phantom bookings and market frictions, diagnostic AIs causing human skill atrophy and narrowing of diagnostic diversity, recommendation/pricing algorithms producing polarization or supra‑competitive behavior even when optimizing their specified objectives.
Proposal: shift to a non‑solipsistic research paradigm that treats interdependence and adaptive counterparties as core design constraints.
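
The train-test-deploy gap and the self-undermining property can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's formal model: the linear response of counterparties, and the names `intensity`, `sensitivity`, `rate`, `j_train`, and `simulate_deploy`, are all hypothetical choices made for illustration.

```python
# Toy model (illustrative assumption, not taken from the paper).
# A policy exploits a historical regularity of strength theta0 with some
# intensity in [0, 1]. Once deployed, other actors adapt, and the regularity
# drifts toward theta0 - sensitivity * intensity.


def j_train(intensity: float, theta0: float = 1.0) -> float:
    """Per-step payoff estimated on fixed historical data (no adaptation)."""
    return theta0 * intensity


def simulate_deploy(
    intensity: float,
    theta0: float = 1.0,
    sensitivity: float = 1.5,
    rate: float = 0.2,
    steps: int = 50,
) -> tuple[float, float]:
    """Return (average deployed payoff, final regularity strength)."""
    theta = theta0
    total = 0.0
    for _ in range(steps):
        total += theta * intensity  # payoff under the current environment
        target = theta0 - sensitivity * intensity  # where adaptation is heading
        theta += rate * (target - theta)  # counterparties adjust gradually
    return total / steps, theta


if __name__ == "__main__":
    for a in (0.3, 1.0):
        j_dep, _ = simulate_deploy(a)
        gap = j_train(a) - j_dep
        print(f"intensity={a:.1f} J_train={j_train(a):.3f} "
              f"J_deploy={j_dep:.3f} gap={gap:.3f}")
```

In this sketch the most aggressive policy scores best on historical data but worst once counterparties respond, and its gap between J_train and J_deploy is the larger of the two. The specific numbers depend entirely on the assumed response parameters.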

Data & Methods

Methods are primarily theoretical, conceptual, and formal:
- Formalization: framing the problem as a transition from single‑agent MDPs (fixed P, R) to multi‑agent Markov games where transition dynamics and effective rewards become policy‑dependent (Pπ, Rπ).
- Definitions: introduced “endogenous non‑stationarity,” “train‑test‑deploy gap,” and the “self‑undermining property.”
- Structured argumentation: identification of adaptation channels (behavioral, institutional, algorithmic) and equilibrium selection mechanisms using game‑theoretic and economic concepts (performative prediction, externalities, tipping points).
- Appendices: the authors point to Appendices A and B for more formal discussion and illustrative models.
Empirical grounding: the paper draws on prior empirical and theoretical literature (e.g., performative prediction, algorithmic collusion, Flash Crash, literature on social norms and institutions) and uses stylized examples and case studies rather than presenting new datasets or experiments.
No large‑scale new empirical dataset or controlled multi‑agent experiments are reported; the contribution is primarily conceptual and methodological.
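
The move from a fixed (P, R) to policy-dependent dynamics can be written out explicitly. The following is a sketch in assumed notation, which may differ from the paper's appendices: the response map \rho (how counterparties' joint policy depends on the deployed policy \pi), the baseline \rho_0 (their historical behavior), and the gap symbol \Delta are introduced here for illustration only.

```latex
% Single-agent view: counterparties are frozen at their historical behavior \rho_0,
% so the kernel P and reward R are fixed.
\begin{align}
  P(s' \mid s, a) &= \sum_{a_{-i}} \rho_0(a_{-i} \mid s)\, P(s' \mid s, a, a_{-i}), &
  R(s, a) &= \sum_{a_{-i}} \rho_0(a_{-i} \mid s)\, R(s, a, a_{-i}), \\
  J_{\mathrm{train}}(\pi) &= \mathbb{E}_{\pi,\, P}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big].
\end{align}

% Markov-game view: counterparties respond to the deployed policy via \rho(\pi),
% so the effective kernel and reward depend on \pi.
\begin{align}
  P^{\pi}(s' \mid s, a) &= \sum_{a_{-i}} \rho(\pi)(a_{-i} \mid s)\, P(s' \mid s, a, a_{-i}), &
  R^{\pi}(s, a) &= \sum_{a_{-i}} \rho(\pi)(a_{-i} \mid s)\, R(s, a, a_{-i}), \\
  J_{\mathrm{deploy}}(\pi) &= \mathbb{E}_{\pi,\, P^{\pi}}\Big[\sum_{t=0}^{\infty} \gamma^{t} R^{\pi}(s_t, a_t)\Big], &
  \Delta(\pi) &= J_{\mathrm{train}}(\pi) - J_{\mathrm{deploy}}(\pi).
\end{align}
```

Under this notation the solipsistic assumptions amount to \rho(\pi) = \rho_0 for every \pi, in which case \Delta(\pi) = 0. Endogenous non-stationarity is the case where \rho(\pi) moves away from \rho_0 as \pi is deployed.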

Implications for AI Economics

Rethink welfare analysis: static analyses that assume exogenous technology shocks are incomplete. The economic outcomes of AI deployment are endogenous: preferences, constraints, and institutions co-evolve with automation, so policy and welfare assessments must model these feedbacks.
Market design & competition policy:
- Algorithmic interaction can produce tacit collusion, supracompetitive pricing, or instability even without explicit communication. Antitrust and market‑design tools must account for adaptive algorithmic strategies and autocurricula.
- Timing and scale of deployment matter for equilibrium selection; regulators should consider phased rollouts and coordination mechanisms to avoid tipping into bad equilibria.
Labor, skills, and human capital:
- Widespread deployment of assistive AIs can cause human skill atrophy and alter labor market signaling. Economic models of productivity and human capital should incorporate skill‑decay and performativity effects.
Institutions as instruments:
- Institutions (rules, legitimacy, participatory procedures) are not externalities to be fixed after deployment; they are design variables that shape equilibrium selection. Economists should study institutional design as part of AI policy (e.g., participatory governance, deliberative institutions, audit regimes).
Dynamic regulation and adaptive policy:
- Static regulation will underperform if it assumes fixed agent behavior. Policy must be dynamic and anticipatory, monitoring endogenous feedbacks and enabling rapid, legitimate adjustments.
Research agenda for AI economics:
- Develop dynamic, multi‑agent models that explicitly include humans, firms, regulators, and adaptive algorithms; study tipping points, lock‑in, and distributional impacts.
- Build and use adaptive testbeds and simulations (economic environments with learning agents) to evaluate deployment pathways and policy interventions.
- Design mechanisms and coordination protocols that help select high‑welfare equilibria (e.g., standards, interface rules, transparency requirements, deployment schedules).
- Quantify externalities and second‑order effects (institutional responses, reputation dynamics, legitimacy erosion) to better forecast macroeconomic consequences of large‑scale AI adoption.
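
The tipping-point and lock-in dynamics named in this agenda can be sketched with a standard stag-hunt coordination game under replicator-style adjustment. This is an illustrative toy, not a model from the paper: the payoffs (4 for mutual cooperation, 3 for the safe action), the step size `eta`, and the function names are assumptions chosen for the example.

```python
# Stag-hunt sketch (hypothetical payoffs): cooperating pays `stag` only when
# matched with another cooperator; the safe action always pays `hare`.
# x is the share of the population currently cooperating.


def simulate_adoption(
    x0: float,
    stag: float = 4.0,
    hare: float = 3.0,
    eta: float = 0.2,
    steps: int = 500,
) -> float:
    """Iterate a discrete replicator-style update and return the final share."""
    x = x0
    for _ in range(steps):
        advantage = stag * x - hare  # expected gain from cooperating
        x += eta * x * (1.0 - x) * advantage
        x = min(1.0, max(0.0, x))  # keep the share inside [0, 1]
    return x


def welfare(x: float, stag: float = 4.0, hare: float = 3.0) -> float:
    """Average payoff when a share x cooperates under random matching."""
    return x * (stag * x) + (1.0 - x) * hare


if __name__ == "__main__":
    for x0 in (0.70, 0.80):
        x_end = simulate_adoption(x0)
        print(f"initial share={x0:.2f} final share={x_end:.3f} "
              f"welfare={welfare(x_end):.3f}")
```

With these assumed payoffs the tipping point sits at a cooperating share of hare/stag = 0.75. Starting just above it, the population converges to the high-welfare equilibrium; starting just below, it locks into the low-welfare one. That is one simple way to see why initial conditions such as deployment timing and scale can matter for equilibrium selection.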
Practical policy recommendations implied by the paper:
- Mandate or incentivize dynamic evaluation against adaptive counterparties before large rollouts.
- Preserve human agency structurally (human oversight, meaningful recourse) to maintain legitimacy and avoid coordination collapse.
- Use phased and coordinated deployment conditional on demonstrated robustness in multi‑agent settings.
- Update competition and liability frameworks to account for algorithmic co‑evolution and equilibrium selection failure modes.
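
Dynamic evaluation against adaptive counterparties, as recommended above, can be contrasted with static evaluation in a small iterated prisoner's dilemma. This is a hedged sketch rather than the authors' testbed: the payoff values are the conventional textbook ones, and `evaluate`, `always_defect`, `always_cooperate`, and `tit_for_tat` are hypothetical names used only for this example.

```python
from typing import Callable, List

# Payoff to the first player for (own move, counterparty move); conventional
# prisoner's dilemma values, assumed here for illustration.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# A policy maps the other side's past moves to an action, "C" or "D".
Policy = Callable[[List[str]], str]


def always_defect(opponent_history: List[str]) -> str:
    return "D"


def always_cooperate(opponent_history: List[str]) -> str:
    # Static counterparty: behaves as in historical logs, never adapts.
    return "C"


def tit_for_tat(opponent_history: List[str]) -> str:
    # Adaptive counterparty: cooperates first, then mirrors the last move.
    return opponent_history[-1] if opponent_history else "C"


def evaluate(agent: Policy, counterparty: Policy, rounds: int = 100) -> float:
    """Average per-round payoff of `agent` when paired with `counterparty`."""
    agent_moves: List[str] = []
    counterparty_moves: List[str] = []
    total = 0
    for _ in range(rounds):
        a = agent(counterparty_moves)
        c = counterparty(agent_moves)
        total += PAYOFF[(a, c)]
        agent_moves.append(a)
        counterparty_moves.append(c)
    return total / rounds


if __name__ == "__main__":
    for name, agent in (("always_defect", always_defect),
                        ("tit_for_tat", tit_for_tat)):
        static_score = evaluate(agent, always_cooperate)
        adaptive_score = evaluate(agent, tit_for_tat)
        print(f"{name}: static={static_score:.2f} adaptive={adaptive_score:.2f}")
```

Against the static counterparty the exploitative policy ranks first. Against the adaptive one the ranking reverses, because exploitation is answered from the second round onward. A static benchmark would therefore select the policy that performs worse once counterparties respond.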

Overall, the paper urges economists and policymakers to move beyond static, exogenous views of AI shocks and to build analytic tools, testbeds, and institutions that explicitly model and shape the multi‑agent, adaptive processes that determine whether powerful AI systems produce cooperative, high‑welfare outcomes.

Assessment

Paper Type: theoretical
Evidence Strength: n/a. The paper is a conceptual/position argument without empirical tests or causal identification; it advances theory and prescriptions rather than providing data-based evidence.
Methods Rigor: n/a. No empirical or formal methodological apparatus is presented; arguments are conceptual and prescriptive rather than derived from a rigorous empirical design or formal model.
Sample: No empirical sample or dataset; the paper presents a conceptual critique and design agenda, illustrated with thought experiments and heuristic arguments rather than observed data.
Themes: human_ai_collab, governance, org_design
Generalizability:
- No empirical validation: claims are not tested across domains or contexts.
- Abstract framing may not map directly onto specific ML architectures, industries, or regulatory settings.
- Does not quantify economic magnitudes or distributional effects, limiting policy applicability.
- Assumes feasibility of designing institution-aware agents and adaptive counterparties, which may be constrained in practice.
- Limited operational guidance for near-term narrow-AI systems and deployments.

Claims (8)

Claim	Category	Direction	Confidence	Outcome	Details
AI's central challenge is shifting from capability to coexistence.	AI Safety and Ethics	positive	high	the primary challenge for AI development (capability vs. coexistence)	0.02
The dominant paradigm in AI research focuses on developing powerful agents that treat the world as an exogenous and stationary source of feedback.	AI Safety and Ethics	negative	high	research paradigm focus (solipsistic/stationary world assumption)	0.02
Superintelligence, an extremely capable task solver, born out of such a solipsistic approach to AI design, is unlikely to be cooperative.	AI Safety and Ethics	negative	high	cooperativeness of superintelligent AI	0.02
Deploying AI systems induces endogenous non-stationarity, resulting in a train-test-deploy gap where historical distributions diverge from the deployment context.	AI Safety and Ethics	negative	high	distributional shift (train-test-deploy gap) induced by AI deployment	0.02
This phenomenon is the self-undermining property of unilateral optimization.	AI Safety and Ethics	negative	high	conceptual identification of unilateral optimization leading to self-undermining effects	0.02
Closing this gap requires AI that participates in cooperation: the equilibrium-selection process through which multiple actors navigate their interdependence.	AI Safety and Ethics	positive	high	ability of AI to close the train-test-deploy gap via cooperative participation	0.02
The paper calls for a non-solipsistic research paradigm that treats interdependence as a core design principle rather than approaching cooperation as a task to solve.	AI Safety and Ethics	positive	high	research paradigm orientation (non-solipsistic vs. solipsistic)	0.02
Addressing these issues entails building dynamic evaluation testbeds involving adaptive counterparties, treating institutions as design primitives, and preserving human agency as a structural feature of the systems we build.	AI Safety and Ethics	positive	high	recommended design and evaluation practices for AI (dynamic testbeds, institutions as primitives, preserved human agency)	0.02