People cooperate readily with a powerful chatbot, but not as completely as with humans: natural-language chat with GPT-5.2 yields high initial cooperation yet a persistent plateau below the near-universal cooperation seen in human pairs, because conversations with the AI prompt explicit conditional rules rather than trust-building social signals.
This paper investigates how natural language communication with an AI agent affects human cooperative behaviour in indefinitely repeated Prisoner's Dilemma games. We conduct a laboratory experiment (n = 126) with two between-subjects treatments varying whether human participants chat with an AI chatbot (GPT-5.2) before every round or only before the first round of each supergame, and benchmark against human-human data from Dvorak and Fehrler (2024) (n = 108). We find four main results. First, cooperation against the AI is high and initially comparable to human-human levels, but unlike in the human-human setting, where cooperation converges to near-complete levels, cooperation against the AI plateaus and never reaches full cooperation. Second, repeated communication, which substantially increases cooperation in human-human interactions, has no detectable effect in the human-AI setting. Third, strategy estimation reveals that human-AI subjects favour Grim Trigger under pre-play communication and remain dispersed under repeated communication, whereas human-human subjects converge to Tit-for-Tat and unconditional cooperation respectively. Fourth, human-AI conversations contain more explicit strategy commitments but fewer emotional and social messages. These results suggest that humans cooperate with AI at high rates but do not develop the trust observed in human-human interactions. Cooperation in the human-AI setting is sustained through conditional rules rather than through the social bonds and mutual understanding that characterise human-human cooperation.
Summary
Main Finding
Humans cooperate with an AI chatbot (GPT-5.2) at high rates in an indefinitely repeated Prisoner’s Dilemma, but the pattern and maintenance of cooperation differ from human–human play: cooperation against the AI plateaus at roughly 82%, short of full cooperation, and repeated natural-language communication has no detectable additional effect. Humans facing the AI adopt more punitive, rule-based strategies (Grim Trigger) and use more explicit commitments but fewer emotional and social messages than in human–human conversations. Overall, cooperation with AI is sustained by conditional rules rather than by the social trust and mutual understanding that characterise human–human cooperation.
Key Points
- Experimental setup
  - Between-subjects lab experiment in which human participants play an indefinitely repeated Prisoner's Dilemma (PD) against an AI chatbot (a GPT-5.2 snapshot).
  - n = 126 human–AI subjects; benchmark human–human dataset from Dvorak & Fehrler (2024), n = 108.
  - Two treatments: (i) Pre-play chat: natural-language messages only before the first round of each supergame; (ii) Repeated chat: messages before every round.
  - Payoffs: T = 37, R = 30, P = 17, S = 0 (normalized g = 7/13, l = 30/13). Continuation probability δ = 0.80. Seven supergames.
  - Perfect monitoring (actions observed); supergame lengths generated as in Dvorak & Fehrler (2024). Subjects were informed that the partner was an AI.
- Four main empirical results
  - Cooperation against the AI is high and initially comparable to human–human play, but unlike human–human interactions (which converge to near-complete cooperation), it plateaus at roughly 82% and never reaches full cooperation.
  - Repeated communication substantially increases cooperation in human–human settings in prior work, but in the human–AI experiment repeated communication produced no detectable additional effect beyond pre-play chat.
  - Strategy estimation (Strategy Frequency Estimation Method) shows humans facing the AI favor Grim Trigger under pre-play communication and remain strategy-dispersed under repeated communication; by contrast, human–human subjects converge to Tit-for-Tat (pre-play) and unconditional cooperation (repeated).
  - NLP analysis of chat content finds more explicit strategy commitments (promises/conditional proposals) but fewer emotional and social messages in human–AI conversations than in human–human conversations.
- Interpretation
  - Humans appear to respond to the AI as a non-human counterpart by relying on algorithmic, rule-based reasoning and punitive strategies (hedging), rather than building social trust through affective engagement.
  - Cooperation in the human–AI setting is sustained differently: through conditional enforcement of rules rather than through social-bonding and forgiveness norms.
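The difference between the punitive and the forgiving strategy named above can be made concrete with a minimal sketch. The function names, the history representation, and the example partner sequence are illustrative assumptions, not taken from the paper; the strategies follow their textbook definitions.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"
# Each past round is stored as (own action, partner action).
History = List[Tuple[str, str]]


def grim_trigger(history: History) -> str:
    """Cooperate until any defection (by either player) occurs, then defect forever."""
    return D if any(D in past_round for past_round in history) else C


def tit_for_tat(history: History) -> str:
    """Cooperate in the first round, then copy the partner's previous action."""
    return C if not history else history[-1][1]


def play(strategy: Callable[[History], str], partner_actions: List[str]) -> List[str]:
    """Return the strategy's actions against a fixed sequence of partner actions."""
    history: History = []
    own_actions: List[str] = []
    for partner in partner_actions:
        action = strategy(history)
        own_actions.append(action)
        history.append((action, partner))
    return own_actions


if __name__ == "__main__":
    partner = list("CDCCC")  # hypothetical partner who defects once, in round 2
    print(play(grim_trigger, partner))  # ['C', 'C', 'D', 'D', 'D']
    print(play(tit_for_tat, partner))   # ['C', 'C', 'D', 'C', 'C']
```

After a single defection, Grim Trigger never returns to cooperation, whereas Tit-for-Tat resumes cooperating as soon as the partner does.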
Data & Methods
- Subjects and platform
  - Human participants recruited via ORSEE at Lancaster Experimental Economics Laboratory (LExEL).
  - Implementation in oTree. Ethics clearance and preregistration reported; replication materials available on OSF (https://osf.io/e4rz8).
- Treatments & game parameters
  - Two between-subjects treatments: pre-play chat vs repeated chat (chat with a pinned GPT-5.2 snapshot; reasoning effort = medium).
  - 7 supergames, continuation probability δ = 0.8 (supergame lengths generated as in Dvorak & Fehrler, 2024), perfect monitoring.
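To see what δ = 0.8 implies for supergame length, the sketch below computes the expected number of rounds, 1 / (1 − δ), and draws lengths under a generic random-termination rule. The rule here is an assumption for illustration; the summary only states that lengths were generated as in Dvorak & Fehrler (2024), and the function names are hypothetical.

```python
import random


def expected_length(delta: float) -> float:
    """Expected number of rounds when each round is followed by another with probability delta."""
    return 1.0 / (1.0 - delta)


def draw_length(delta: float, rng: random.Random) -> int:
    """Draw one supergame length: play round 1, then continue with probability delta each time."""
    rounds = 1
    while rng.random() < delta:
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is reproducible
    lengths = [draw_length(0.8, rng) for _ in range(7)]  # one draw per supergame
    print("expected rounds per supergame:", expected_length(0.8))
    print("simulated lengths:", lengths)
```

With δ = 0.8 a supergame lasts five rounds on average, so under repeated chat subjects would have roughly five chat opportunities per supergame instead of one.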
- Strategy inference
  - Strategy Frequency Estimation Method (Dal Bó & Fréchette, 2011) used to infer the population distribution over canonical strategies (Grim Trigger, Tit-for-Tat, unconditional cooperation/defection, lenient M1BF variants, etc.).
  - Basin of Attraction of Defection (BAD) computed as a baseline measure of strategic uncertainty (π* ≈ 0.4 under the experimental parameters).
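The reported π* can be reproduced with the standard indifference calculation between Grim Trigger and Always Defect in normalized payoffs (mutual cooperation = 1, mutual defection = 0, temptation = 1 + g, sucker payoff = −l). The sketch below assumes this is the calculation behind the figure and uses the normalized g and l exactly as stated in this summary; the function name is hypothetical.

```python
from fractions import Fraction


def size_bad(g: Fraction, l: Fraction, delta: Fraction) -> Fraction:
    """Minimum belief that the partner plays Grim Trigger for which Grim Trigger
    is a best response against a mix of Grim Trigger and Always Defect.

    Derived from the indifference condition
        p / (1 - delta) - (1 - p) * l = p * (1 + g).
    """
    return (1 - delta) * l / (1 - (1 - delta) * (1 + g - l))


if __name__ == "__main__":
    # Normalized parameters as stated in the summary (assumed, not re-derived here).
    g = Fraction(7, 13)
    l = Fraction(30, 13)
    delta = Fraction(4, 5)
    print(size_bad(g, l, delta))  # 2/5
```

A value of 0.4 means a subject should play Grim Trigger rather than Always Defect only if they believe the partner cooperates conditionally with probability above 40%.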
- Conversation analysis
  - Natural language processing (NLP) methods (classification/keyword/topic analysis) to quantify content: explicit commitments, promises, threats, emotional/social language.
  - Comparative content metrics vs human–human benchmark (Dvorak & Fehrler data).
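As a rough illustration of keyword-based content coding, the sketch below tags chat messages as commitment-like or social and reports the share of messages in each category. The keyword patterns, category names, and example messages are hypothetical and are not the paper's coding scheme, which this summary does not describe in detail.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical keyword patterns, for illustration only.
CATEGORIES: Dict[str, List[str]] = {
    "commitment": [r"\bi will\b", r"\bi promise\b", r"\bif you\b", r"\bas long as\b"],
    "social": [r"\bthanks?\b", r"\bsorry\b", r"\bhaha\b", r"\bnice\b"],
}


def tag_message(text: str) -> Set[str]:
    """Return every category whose patterns match the message (case-insensitive)."""
    lowered = text.lower()
    return {
        name
        for name, patterns in CATEGORIES.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    }


def category_shares(messages: List[str]) -> Dict[str, float]:
    """Share of messages that fall into each category (a message may fall into several)."""
    counts: Counter = Counter()
    for message in messages:
        counts.update(tag_message(message))
    total = len(messages)
    return {name: (counts[name] / total if total else 0.0) for name in CATEGORIES}


if __name__ == "__main__":
    chat = ["I promise to pick C", "haha sorry", "ok"]  # made-up messages
    print(category_shares(chat))
```

Computing such shares separately for human–AI and human–human chats would give the kind of comparative content metric described above.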
- Benchmarks & robustness
  - Direct comparison with the Dvorak & Fehrler (2024) human–human results, made possible by the matched design and parameters.
  - Robustness checks across treatments and sessions reported; the key null result (no effect of repeated chat vs pre-play chat in human–AI) is emphasized.
- Limitations (noted by authors)
  - Laboratory setting and single LLM snapshot (GPT-5.2) limit external generalisability.
  - Participants were explicitly told they faced an AI; effects may differ under deception/uncertainty about partner type.
  - Perfect monitoring only; different monitoring/noise regimes might change the role of repeated communication.
Implications for AI Economics
- For models of human–AI strategic interaction
  - Behavioral framing matters: humans facing AIs may default to rule-based, punitive strategies rather than socially forgiving ones. Economic models of repeated human–AI interaction should incorporate shifts in strategy distribution and trust formation.
  - The mere capacity for fluent natural-language communication by an AI does not guarantee the same social mechanisms (affect, trust-building) that sustain human–human cooperation.
- For AI design and deployment in cooperative settings
  - To maximise sustained cooperation in long-run interactions (negotiation, advisory roles, customer service), designers should consider not only accurate commitments but also social-signalling behaviours (empathic language, forgiveness norms) that build trust over time.
  - If AIs are unforgiving or behave mechanically after defections, humans may adopt grim-trigger responses that reduce long-run efficiency despite high short-run cooperation.
- For policy and governance
  - Evaluation of AI systems in social/economic roles should test repeated-interaction dynamics and communication formats. Benchmarks should measure not only initial cooperation but also the convergence and robustness of cooperation over time.
  - Disclosure that a partner is an AI changes human behaviour; regulators and institutions should account for behavioral shifts (algorithm aversion, defensive strategies) when approving AI agents for roles requiring sustained cooperation.
- For empirical research agenda
  - Key next steps: replicate with different LLMs (varying degrees of expressed affect/forgiveness), test alternative monitoring regimes, field studies in real-world repeated interactions, and experiments that vary whether subjects believe the partner is human or AI.
  - Investigate whether altering the AI’s conversational style (more emotional/social messaging, explicit forgiveness) reduces human reliance on punitive strategies and allows cooperation to converge to human–human levels.
Overall, the paper provides controlled evidence that AI chatbots can sustain high cooperation via conditional rules, but that natural-language communication with an AI does not automatically substitute for the social-trust channels central to human–human cooperation. This is an important consideration for economic theory, AI system design, and policy.
Assessment
Claims (12)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Initial cooperation rates against the AI (GPT-5.2) are high and comparable to initial cooperation in human–human pairs. (Team Performance) | null_result | high | initial cooperation rate (cooperation in early rounds / first round of supergames) | n=126; 0.6 |
| Cooperation with the AI plateaus and never reaches the near-complete cooperation levels observed in human–human interactions. (Team Performance) | negative | high | cooperation rate over time and asymptotic/end-state cooperation level | n=126; 0.6 |
| Allowing repeated communication (chat before every round) has no detectable effect on cooperation rates when the partner is an AI. (Team Performance) | null_result | high | effect of chat frequency on cooperation rate (difference in cooperation between chat-frequency treatments) | n=126; 0.6 |
| In the human–human benchmark, repeated communication substantially increases cooperation. (Team Performance) | positive | high | change in cooperation rate associated with repeated communication in human–human pairs | n=108; 0.6 |
| Strategy estimation indicates human–AI subjects tend to favor Grim Trigger when allowed pre-play communication. (Decision Quality) | positive | medium | prevalence/frequency of Grim Trigger strategy classification among subjects | n=126; 0.36 |
| When allowed repeated communication with the AI, human subjects remain behaviorally dispersed and do not converge to a single dominant strategy. (Team Performance) | mixed | medium | strategy convergence / dispersion (distribution of inferred strategies over time) | n=126; 0.36 |
| Human–human subjects converge to Tit-for-Tat under pre-play communication and to unconditional cooperation under repeated communication. (Team Performance) | positive | medium | prevalent strategy type over time in human–human pairs (Tit-for-Tat vs unconditional cooperation) | n=108; 0.36 |
| Human–AI chat logs contain more explicit strategy commitments (stated rules) than human–human chats. (Decision Quality) | positive | medium | frequency/count of explicit strategy-commitment messages in chat logs | n=126; 0.36 |
| Human–AI chats contain fewer emotional and social messages compared with human–human chats. (Decision Quality) | negative | medium | frequency/count of emotional/social message types in chat logs | n=126; 0.36 |
| Cooperation with the AI is sustained mainly through conditional rule-based strategies rather than through trust-building, emotional, and social channels. (Decision Quality) | mixed | medium | mechanism of cooperation (relative contribution of conditional rule-following vs social/emotional trust indicators) | n=126; 0.36 |
| Experimental design: subjects played an indefinitely repeated Prisoner’s Dilemma in supergames with two between-subjects treatments varying chat timing (chat only before first round of each supergame vs chat before every round); the AI partner was GPT-5.2. (Other) | null_result | high | experimental treatment specification (chat-frequency manipulation; AI identity) | n=126; 0.6 |
| Sample sizes reported: human–AI experiment n = 126; human–human benchmark n = 108. (Other) | null_result | high | reported sample sizes | n=126; 0.6 |