Language-model agents that perceive, deliberate and act reproduce human-like adaptive behavior in epidemic–economic simulations, producing divergent health and economic trajectories and enabling quantitative trade-off analysis; the framework is robust across several LLMs but requires empirical calibration before real-world policy use.
Traditional agent-based models (ABMs) often struggle to capture the nuance of adaptive human decision-making during complex crises because they rely on static, predefined rules. Large Language Models (LLMs) offer a transformative alternative: acting as cognitive engines, they equip agents with human-like common-sense reasoning. In this paper, we introduce an LLM-driven multi-agent simulation framework for investigating coupled epidemic–economic dynamics, built around a Perception–Deliberation–Action (PDA) loop. Agents, acting as heterogeneous cognitive entities, use Chain-of-Thought processes to autonomously balance health risks against economic necessities, so that adaptive behaviors emerge endogenously without explicit scripting. Extensive experimental results across diverse LLM backends confirm the framework's robustness, revealing divergent socio-economic trajectories under distinct macroscopic conditions and quantifying the trade-offs between public health and economic stability. The framework thus serves as a high-fidelity computational laboratory for studying complex crisis scenarios, bridging the gap between micro-level cognition and macro-level societal outcomes.
Summary
Main Finding
LLM-driven agents embedded in a Perception–Deliberation–Action (PDA) loop produce endogenous, human-like adaptive behaviors via Chain-of-Thought reasoning. When used in a coupled epidemic–economic simulation, this framework robustly generates divergent macro trajectories and quantitatively captures the trade-offs between public health outcomes and economic stability across a range of macroscopic scenarios and LLM backends — effectively creating a high-fidelity computational laboratory that links micro-level cognition to macro-level societal outcomes.
Key Points
- Framework innovation: Replaces static, rule-based agent decision-making with LLM-powered cognitive agents that perceive environment signals, deliberate using Chain-of-Thought, and act—without hand-coded behavior rules.
- Perception–Deliberation–Action (PDA) loop:
  - Perception: agents observe local epidemic and economic signals (e.g., infection risk, income/work opportunities).
  - Deliberation: agents use LLM Chain-of-Thought to reason about trade-offs (health risk vs. economic necessity).
  - Action: agents select behaviors (e.g., alter social contacts, participate in work, change consumption) that feed back into epidemic and economic dynamics.
- Endogenous adaptation: Behavioral changes emerge from cognitive reasoning rather than parameterized switches, producing context-sensitive, heterogeneous responses.
- Robustness: Experiments with multiple LLM backends (proprietary and open-source) show qualitatively consistent dynamics, indicating that results are stable to the choice of language model.
- Outcomes: The model produces divergent socio-economic trajectories under different macroscopic conditions and quantitatively characterizes the public-health vs. economic-stability trade-off (e.g., differing infection curves and economic activity levels).
- Usefulness: Enables scenario testing for policies and shocks where human judgment and adaptation matter (lockdowns, targeted interventions, information campaigns).
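The PDA loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the paper's implementation: in the actual framework the deliberation step is a Chain-of-Thought LLM call, which is replaced here by a transparent heuristic so the loop runs offline, and all class names, fields, and thresholds are assumptions.

```python
from dataclasses import dataclass
import random

@dataclass
class Observation:
    local_infection_risk: float  # perceived chance of infection if active
    wage_offer: float            # income from working this timestep
    savings: float               # the agent's current buffer

class PDAAgent:
    def __init__(self, risk_tolerance: float, savings: float):
        self.risk_tolerance = risk_tolerance  # heterogeneous across agents
        self.savings = savings

    def perceive(self, env: dict) -> Observation:
        # Perception: agents see local, noisy signals, not global state.
        noise = random.uniform(0.9, 1.1)
        return Observation(
            local_infection_risk=env["infection_prevalence"] * noise,
            wage_offer=env["wage"],
            savings=self.savings,
        )

    def deliberate(self, obs: Observation) -> str:
        # Deliberation: stand-in for the Chain-of-Thought LLM prompt
        # ("given risk X and wage Y, should I work?"). Work only if the
        # perceived risk is within tolerance, or savings are depleted.
        if obs.local_infection_risk <= self.risk_tolerance or obs.savings <= 0:
            return "work"
        return "stay_home"

    def act(self, decision: str, env: dict) -> None:
        # Action: the choice feeds back into both modules.
        if decision == "work":
            self.savings += env["wage"]
            env["contacts"] += 1  # more contacts -> more transmission
        else:
            self.savings -= env["cost_of_living"]

# One timestep for two agents sharing the same environment.
random.seed(0)
env = {"infection_prevalence": 0.02, "wage": 10.0,
       "cost_of_living": 3.0, "contacts": 0}
cautious = PDAAgent(risk_tolerance=0.01, savings=50.0)
desperate = PDAAgent(risk_tolerance=0.01, savings=0.0)
decisions = {agent: agent.deliberate(agent.perceive(env))
             for agent in (cautious, desperate)}
for agent, decision in decisions.items():
    agent.act(decision, env)
```

Even this stub reproduces the qualitative pattern the paper emphasizes: the same signals yield different actions depending on each agent's circumstances (the cautious agent withdraws, the savings-depleted agent works anyway), and each action alters the environment the next perception step will read.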
Data & Methods
- Model architecture:
  - Multi-agent simulation coupling an epidemic module (transmission depends on agent interactions and behaviors) with an economic module (aggregate activity driven by agent-level labor/consumption decisions).
  - Each agent implemented as an LLM-driven cognitive unit running the PDA loop each timestep.
  - Chain-of-Thought prompts/internal reasoning are used to simulate richer, multi-step decision processes.
- Heterogeneity:
  - Agents differ in their perceptions, priorities (health vs. income), and local conditions, producing diverse responses.
- LLM backends:
  - Framework evaluated across several LLMs (multiple proprietary and open-source models) to test sensitivity to language model choice.
- Experiments:
  - Diverse macroscopic scenarios (e.g., varying baseline transmissibility, economic shocks, information environments, policy regimes).
  - Metrics tracked at micro and macro scales: individual behavior patterns, infection prevalence over time, and aggregate economic indicators (changes in activity/employment/consumption).
  - Quantitative analysis of trade-offs between health metrics and economic outcomes across scenarios and model variants.
- Validation & robustness checks:
  - Cross-backend comparisons and scenario sweeps to assess qualitative consistency and sensitivity.
  - (Paper notes the necessity of further empirical calibration; framework primarily demonstrates method and emergent phenomena.)
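As a rough illustration of how scenario sweeps of this kind can quantify the health-economy trade-off, the toy model below couples a discrete-time SIR epidemic to aggregate activity determined by agents' work decisions. It is a deliberately simplified stand-in, not the paper's model: transmission scales with the share of agents working, the only heterogeneity is noisy risk perception, and all parameter values are illustrative assumptions.

```python
import random

def run_scenario(risk_tolerance: float, steps: int = 60, n_agents: int = 500,
                 beta: float = 0.4, gamma: float = 0.1, seed: int = 1):
    """Toy coupled epidemic-economic run: transmission scales with the share
    of agents working, and aggregate activity is that same share.
    Returns (attack_rate, mean_activity)."""
    rng = random.Random(seed)
    s, i, r = 0.99, 0.01, 0.0  # SIR compartments as population fractions
    activity_history = []
    for _ in range(steps):
        # Each agent works iff its noisy perceived risk is within tolerance.
        working = sum(1 for _ in range(n_agents)
                      if i * rng.uniform(0.9, 1.1) <= risk_tolerance)
        activity = working / n_agents
        activity_history.append(activity)
        # Discrete-time SIR step; effective contact rate rises with activity.
        new_inf = beta * activity * s * i
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    attack_rate = i + r  # cumulative share ever infected
    mean_activity = sum(activity_history) / steps
    return attack_rate, mean_activity

# Two behavioral regimes expose the trade-off.
risk_averse = run_scenario(risk_tolerance=0.005)  # agents withdraw early
risk_neutral = run_scenario(risk_tolerance=0.5)   # agents keep working
```

Sweeping `risk_tolerance` (or a policy parameter) and plotting attack rate against mean activity traces out exactly the kind of health-vs-economy frontier the paper quantifies, except that there the per-agent work decision comes from an LLM's Chain-of-Thought deliberation rather than a fixed threshold.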
Implications for AI Economics
- Better behavioral realism: LLM-driven agents can produce richer, context-sensitive adaptive behavior, improving the realism of economic-epidemic simulations where human judgment matters.
- Policy evaluation: The framework enables testing of nuanced interventions (timing, targeting, communication strategies) and forecasts their macro impacts considering endogenous behavioral change.
- Micro–macro linkage: Provides a tractable way to endogenize micro-level cognition and see how cognitive heterogeneity aggregates into systemic outcomes and trade-offs.
- Scenario analysis and risk assessment: Useful for exploring tail events, cascading failures, and distributional consequences when behavior is a key driver.
- Research directions:
  - Empirical calibration and validation against observed behavioral/economic data to increase predictive credibility.
  - Incorporating LLM alignment, bounded rationality constraints, and memory or learning across episodes to better match human cognition.
  - Cost/compute and reproducibility improvements (e.g., distilled/smaller LLMs, surrogate models for large runs).
- Cautions:
  - Dependence on LLM behavior: hallucinations, bias, or misaligned reasoning in language models can propagate into simulated outcomes.
  - Interpretability and auditability: Chain-of-Thought reasoning may be hard to fully verify; careful logging and post-hoc analysis required.
  - Ethical and policy risks: Simulations could be misused without proper transparency; real-world policy recommendations require robust validation.
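On the auditability point, one lightweight safeguard is to log every deliberation as a structured record so reasoning traces can be inspected after a run. A minimal sketch, where the JSONL schema and field names are illustrative assumptions rather than anything specified in the paper:

```python
import io
import json

def log_decision(stream, step: int, agent_id: int, observation: dict,
                 reasoning: str, action: str) -> None:
    """Append one deliberation record as a JSON line for post-hoc audit."""
    stream.write(json.dumps({
        "step": step,
        "agent_id": agent_id,
        "observation": observation,
        "reasoning": reasoning,  # raw CoT text returned by the LLM
        "action": action,
    }) + "\n")

# Example: record one (hypothetical) deliberation and read it back.
buf = io.StringIO()
log_decision(buf, step=3, agent_id=17,
             observation={"infection_risk": 0.04, "wage": 12.0},
             reasoning="Risk is moderate but savings are low; I will work.",
             action="work")
record = json.loads(buf.getvalue())
```

Persisting such records (to a file rather than an in-memory buffer) lets analysts grep for inconsistent reasoning-action pairs or hallucinated justifications after the fact, without re-running the simulation.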
Overall, LLM-driven multi-agent simulations offer a promising step toward endogenizing human-like adaptive decision-making in economic models, enabling more realistic analysis of policy trade-offs during crises — while necessitating careful validation, cost-management, and safeguards.
Assessment
Claims (10)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| LLM-driven agents embedded in a Perception–Deliberation–Action (PDA) loop produce endogenous, human-like adaptive behaviors via Chain-of-Thought reasoning. | Other | positive | medium | agent-level behavioral adaptation patterns / "human-likeness" of decisions (e.g., changes in social contacts, work participation, consumption) | 0.05 |
| When coupled with an epidemic–economic model, the LLM-PDA framework robustly generates divergent macro trajectories across scenarios. | Fiscal And Macroeconomic | mixed | medium | macro trajectories: infection prevalence over time and aggregate economic indicators (activity/employment/consumption) | 0.05 |
| The framework quantitatively captures trade-offs between public-health outcomes and economic stability across macroscopic scenarios and different LLM backends. | Fiscal And Macroeconomic | mixed | medium | trade-off metrics linking public-health measures (infection curves/prevalence) to economic stability indicators (aggregate activity, employment, consumption) | 0.05 |
| The framework replaces static, rule-based agent decision-making with LLM-powered cognitive agents that perceive environment signals, deliberate using Chain-of-Thought, and act—without hand-coded behavior rules. | Other | positive | high | agent decision-making mechanism (presence of LLM/CoT-driven decisions vs. hand-coded rules) | 0.09 |
| Behavioral changes in the simulation emerge endogenously from cognitive reasoning rather than from parameterized switches, producing context-sensitive, heterogeneous responses. | Other | positive | medium | heterogeneity in individual behaviors (context-sensitive changes in contacts, work, consumption) | 0.05 |
| Experiments run with multiple LLM backends (proprietary and open-source) show qualitatively consistent dynamics, indicating framework stability to model choice. | Research Productivity | positive | medium | qualitative consistency of macro dynamics (e.g., similarity in infection/economic trajectories) across different LLM backends | 0.05 |
| Chain-of-Thought prompts/internal reasoning simulate richer, multi-step decision processes in agents compared with conventional single-step decision rules. | Decision Quality | positive | high | complexity/structure of agent decision process (presence of multi-step CoT reasoning vs. single-step rules) | 0.09 |
| The framework enables scenario testing for policies and shocks (e.g., lockdowns, targeted interventions, information campaigns) where human judgment and adaptation matter. | Governance And Regulation | positive | medium | suitability for scenario/policy analysis (ability to simulate policy-induced changes in macro outcomes accounting for endogenous behavior) | 0.05 |
| Further empirical calibration and validation against observed behavioral and economic data are necessary; the framework primarily demonstrates method and emergent phenomena rather than ready predictive deployment. | Research Productivity | null_result | high | level of empirical calibration/validation (current framework not yet empirically calibrated for predictive deployment) | 0.09 |
| Risks: dependence on LLM behavior means hallucinations, bias, or misaligned reasoning can propagate into simulated outcomes; Chain-of-Thought reasoning may be hard to fully verify, posing interpretability/auditability challenges. | Ai Safety And Ethics | negative | high | propagation of LLM-induced errors/bias into simulation outcomes and interpretability/auditability of agent reasoning | 0.09 |