Language-model agents that perceive, deliberate and act reproduce human-like adaptive behavior in epidemic–economic simulations, producing divergent health and economic trajectories and enabling quantitative trade-off analysis; the framework is robust across several LLMs but requires empirical calibration before real-world policy use.
Traditional agent-based models (ABMs) often struggle to capture the nuance of adaptive human decision-making during complex crises because they rely on static, predefined rules. Large Language Models (LLMs) offer a transformative alternative: acting as cognitive engines, they equip agents with human-like common-sense reasoning. In this paper, we introduce an LLM-driven multi-agent simulation framework for investigating coupled epidemic–economic dynamics, built around a Perception–Deliberation–Action (PDA) loop. Agents, acting as heterogeneous cognitive entities, use Chain-of-Thought processes to autonomously balance health risks against economic necessities, so that adaptive behaviors emerge endogenously without explicit scripting. Extensive experimental results across diverse LLM backends confirm the framework's robustness, revealing divergent socio-economic trajectories under distinct macroscopic conditions and quantifying the trade-offs between public health and economic stability. The framework thus serves as a high-fidelity computational laboratory for studying complex crisis scenarios, bridging the gap between micro-level cognition and macro-level societal outcomes.
Summary
Main Finding
LLM-driven agents embedded in a Perception–Deliberation–Action (PDA) loop produce endogenous, human-like adaptive behaviors via Chain-of-Thought reasoning. When used in a coupled epidemic–economic simulation, this framework robustly generates divergent macro trajectories and quantitatively captures the trade-offs between public health outcomes and economic stability across a range of macroscopic scenarios and LLM backends — effectively creating a high-fidelity computational laboratory that links micro-level cognition to macro-level societal outcomes.
Key Points
- Framework innovation: Replaces static, rule-based agent decision-making with LLM-powered cognitive agents that perceive environment signals, deliberate using Chain-of-Thought, and act—without hand-coded behavior rules.
- Perception–Deliberation–Action (PDA) loop:
  - Perception: agents observe local epidemic and economic signals (e.g., infection risk, income/work opportunities).
  - Deliberation: agents use LLM Chain-of-Thought to reason about trade-offs (health risk vs. economic necessity).
  - Action: agents select behaviors (e.g., alter social contacts, participate in work, change consumption) that feed back into epidemic and economic dynamics.
- Endogenous adaptation: Behavioral changes emerge from cognitive reasoning rather than parameterized switches, producing context-sensitive, heterogeneous responses.
- Robustness: Experiments with multiple LLM backends (proprietary and open-source) show qualitatively consistent dynamics, indicating that results are stable to the choice of language model.
- Outcomes: The model produces divergent socio-economic trajectories under different macroscopic conditions and quantitatively characterizes the public-health vs. economic-stability trade-off (e.g., differing infection curves and economic activity levels).
- Usefulness: Enables scenario testing for policies and shocks where human judgment and adaptation matter (lockdowns, targeted interventions, information campaigns).
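The PDA loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the paper's implementation: in the actual framework the deliberation step is a Chain-of-Thought LLM call, which is replaced here by a transparent heuristic so the loop runs offline, and all class names, fields, and thresholds are assumptions.

```python
from dataclasses import dataclass
import random

@dataclass
class Observation:
    local_infection_risk: float  # perceived chance of infection if active
    wage_offer: float            # income from working this timestep
    savings: float               # the agent's current buffer

class PDAAgent:
    def __init__(self, risk_tolerance: float, savings: float):
        self.risk_tolerance = risk_tolerance  # heterogeneous across agents
        self.savings = savings

    def perceive(self, env: dict) -> Observation:
        # Perception: agents see local, noisy signals, not global state.
        noise = random.uniform(0.9, 1.1)
        return Observation(
            local_infection_risk=env["infection_prevalence"] * noise,
            wage_offer=env["wage"],
            savings=self.savings,
        )

    def deliberate(self, obs: Observation) -> str:
        # Deliberation: stand-in for the Chain-of-Thought LLM prompt
        # ("given risk X and wage Y, should I work?"). Work only if the
        # perceived risk is within tolerance, or savings are depleted.
        if obs.local_infection_risk <= self.risk_tolerance or obs.savings <= 0:
            return "work"
        return "stay_home"

    def act(self, decision: str, env: dict) -> None:
        # Action: the choice feeds back into both modules.
        if decision == "work":
            self.savings += env["wage"]
            env["contacts"] += 1  # more contacts -> more transmission
        else:
            self.savings -= env["cost_of_living"]

# One timestep for two agents sharing the same environment.
random.seed(0)
env = {"infection_prevalence": 0.02, "wage": 10.0,
       "cost_of_living": 3.0, "contacts": 0}
cautious = PDAAgent(risk_tolerance=0.01, savings=50.0)
desperate = PDAAgent(risk_tolerance=0.01, savings=0.0)
decisions = {agent: agent.deliberate(agent.perceive(env))
             for agent in (cautious, desperate)}
for agent, decision in decisions.items():
    agent.act(decision, env)
```

Even this stub reproduces the qualitative pattern the paper emphasizes: the same signals yield different actions depending on each agent's circumstances (the cautious agent withdraws, the savings-depleted agent works anyway), and each action alters the environment the next perception step will read.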
Data & Methods
- Model architecture:
  - Multi-agent simulation coupling an epidemic module (transmission depends on agent interactions and behaviors) with an economic module (aggregate activity driven by agent-level labor/consumption decisions).
  - Each agent implemented as an LLM-driven cognitive unit running the PDA loop each timestep.
  - Chain-of-Thought prompts/internal reasoning are used to simulate richer, multi-step decision processes.
- Heterogeneity:
  - Agents differ in their perceptions, priorities (health vs. income), and local conditions, producing diverse responses.
- LLM backends:
  - Framework evaluated across several LLMs (multiple proprietary and open-source models) to test sensitivity to language model choice.
- Experiments:
  - Diverse macroscopic scenarios (e.g., varying baseline transmissibility, economic shocks, information environments, policy regimes).
  - Metrics tracked at micro and macro scales: individual behavior patterns, infection prevalence over time, and aggregate economic indicators (changes in activity/employment/consumption).
  - Quantitative analysis of trade-offs between health metrics and economic outcomes across scenarios and model variants.
- Validation & robustness checks:
  - Cross-backend comparisons and scenario sweeps to assess qualitative consistency and sensitivity.
  - (Paper notes the necessity of further empirical calibration; framework primarily demonstrates method and emergent phenomena.)
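As a rough illustration of how scenario sweeps of this kind can quantify the health-economy trade-off, the toy model below couples a discrete-time SIR epidemic to aggregate activity determined by agents' work decisions. It is a deliberately simplified stand-in, not the paper's model: transmission scales with the share of agents working, the only heterogeneity is noisy risk perception, and all parameter values are illustrative assumptions.

```python
import random

def run_scenario(risk_tolerance: float, steps: int = 60, n_agents: int = 500,
                 beta: float = 0.4, gamma: float = 0.1, seed: int = 1):
    """Toy coupled epidemic-economic run: transmission scales with the share
    of agents working, and aggregate activity is that same share.
    Returns (attack_rate, mean_activity)."""
    rng = random.Random(seed)
    s, i, r = 0.99, 0.01, 0.0  # SIR compartments as population fractions
    activity_history = []
    for _ in range(steps):
        # Each agent works iff its noisy perceived risk is within tolerance.
        working = sum(1 for _ in range(n_agents)
                      if i * rng.uniform(0.9, 1.1) <= risk_tolerance)
        activity = working / n_agents
        activity_history.append(activity)
        # Discrete-time SIR step; effective contact rate rises with activity.
        new_inf = beta * activity * s * i
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    attack_rate = i + r  # cumulative share ever infected
    mean_activity = sum(activity_history) / steps
    return attack_rate, mean_activity

# Two behavioral regimes expose the trade-off.
risk_averse = run_scenario(risk_tolerance=0.005)  # agents withdraw early
risk_neutral = run_scenario(risk_tolerance=0.5)   # agents keep working
```

Sweeping `risk_tolerance` (or a policy parameter) and plotting attack rate against mean activity traces out exactly the kind of health-vs-economy frontier the paper quantifies, except that there the per-agent work decision comes from an LLM's Chain-of-Thought deliberation rather than a fixed threshold.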
Implications for AI Economics
- Better behavioral realism: LLM-driven agents can produce richer, context-sensitive adaptive behavior, improving the realism of economic-epidemic simulations where human judgment matters.
- Policy evaluation: The framework enables testing of nuanced interventions (timing, targeting, communication strategies) and forecasts their macro impacts considering endogenous behavioral change.
- Micro–macro linkage: Provides a tractable way to endogenize micro-level cognition and see how cognitive heterogeneity aggregates into systemic outcomes and trade-offs.
- Scenario analysis and risk assessment: Useful for exploring tail events, cascading failures, and distributional consequences when behavior is a key driver.
- Research directions:
  - Empirical calibration and validation against observed behavioral/economic data to increase predictive credibility.
  - Incorporating LLM alignment, bounded rationality constraints, and memory or learning across episodes to better match human cognition.
  - Cost/compute and reproducibility improvements (e.g., distilled/smaller LLMs, surrogate models for large runs).
- Cautions:
  - Dependence on LLM behavior: hallucinations, bias, or misaligned reasoning in language models can propagate into simulated outcomes.
  - Interpretability and auditability: Chain-of-Thought reasoning may be hard to fully verify; careful logging and post-hoc analysis required.
  - Ethical and policy risks: Simulations could be misused without proper transparency; real-world policy recommendations require robust validation.
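On the auditability point, one lightweight safeguard is to log every deliberation as a structured record so reasoning traces can be inspected after a run. A minimal sketch, where the JSONL schema and field names are illustrative assumptions rather than anything specified in the paper:

```python
import io
import json

def log_decision(stream, step: int, agent_id: int, observation: dict,
                 reasoning: str, action: str) -> None:
    """Append one deliberation record as a JSON line for post-hoc audit."""
    stream.write(json.dumps({
        "step": step,
        "agent_id": agent_id,
        "observation": observation,
        "reasoning": reasoning,  # raw CoT text returned by the LLM
        "action": action,
    }) + "\n")

# Example: record one (hypothetical) deliberation and read it back.
buf = io.StringIO()
log_decision(buf, step=3, agent_id=17,
             observation={"infection_risk": 0.04, "wage": 12.0},
             reasoning="Risk is moderate but savings are low; I will work.",
             action="work")
record = json.loads(buf.getvalue())
```

Persisting such records (to a file rather than an in-memory buffer) lets analysts grep for inconsistent reasoning-action pairs or hallucinated justifications after the fact, without re-running the simulation.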
Overall, LLM-driven multi-agent simulations offer a promising step toward endogenizing human-like adaptive decision-making in economic models, enabling more realistic analysis of policy trade-offs during crises — while necessitating careful validation, cost-management, and safeguards.
Assessment
Claims (10)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| LLM-driven agents embedded in a Perception–Deliberation–Action (PDA) loop produce endogenous, human-like adaptive behaviors via Chain-of-Thought reasoning. | Other | positive | medium | agent-level behavioral adaptation patterns / "human-likeness" of decisions (e.g., changes in social contacts, work participation, consumption) | 0.05 |
| When coupled with an epidemic–economic model, the LLM-PDA framework robustly generates divergent macro trajectories across scenarios. | Fiscal And Macroeconomic | mixed | medium | macro trajectories: infection prevalence over time and aggregate economic indicators (activity/employment/consumption) | 0.05 |
| The framework quantitatively captures trade-offs between public-health outcomes and economic stability across macroscopic scenarios and different LLM backends. | Fiscal And Macroeconomic | mixed | medium | trade-off metrics linking public-health measures (infection curves/prevalence) to economic stability indicators (aggregate activity, employment, consumption) | 0.05 |
| The framework replaces static, rule-based agent decision-making with LLM-powered cognitive agents that perceive environment signals, deliberate using Chain-of-Thought, and act—without hand-coded behavior rules. | Other | positive | high | agent decision-making mechanism (presence of LLM/CoT-driven decisions vs. hand-coded rules) | 0.09 |
| Behavioral changes in the simulation emerge endogenously from cognitive reasoning rather than from parameterized switches, producing context-sensitive, heterogeneous responses. | Other | positive | medium | heterogeneity in individual behaviors (context-sensitive changes in contacts, work, consumption) | 0.05 |
| Experiments run with multiple LLM backends (proprietary and open-source) show qualitatively consistent dynamics, indicating framework stability to model choice. | Research Productivity | positive | medium | qualitative consistency of macro dynamics (e.g., similarity in infection/economic trajectories) across different LLM backends | 0.05 |
| Chain-of-Thought prompts/internal reasoning simulate richer, multi-step decision processes in agents compared with conventional single-step decision rules. | Decision Quality | positive | high | complexity/structure of agent decision process (presence of multi-step CoT reasoning vs. single-step rules) | 0.09 |
| The framework enables scenario testing for policies and shocks (e.g., lockdowns, targeted interventions, information campaigns) where human judgment and adaptation matter. | Governance And Regulation | positive | medium | suitability for scenario/policy analysis (ability to simulate policy-induced changes in macro outcomes accounting for endogenous behavior) | 0.05 |
| Further empirical calibration and validation against observed behavioral and economic data are necessary; the framework primarily demonstrates method and emergent phenomena rather than ready predictive deployment. | Research Productivity | null_result | high | level of empirical calibration/validation (current framework not yet empirically calibrated for predictive deployment) | 0.09 |
| Risks: dependence on LLM behavior means hallucinations, bias, or misaligned reasoning can propagate into simulated outcomes; Chain-of-Thought reasoning may be hard to fully verify, posing interpretability/auditability challenges. | Ai Safety And Ethics | negative | high | propagation of LLM-induced errors/bias into simulation outcomes and interpretability/auditability of agent reasoning | 0.09 |