A reinforcement‑learning agent trained on the FRB/US macro model discovers fiscal strategies that raise simulated US GDP and lower unemployment across 2000–2024 versus standard FRB/US scenarios, but the improvements are earned inside the assumptions of the model and a growth‑focused objective.
Fiscal policy optimization under competing macroeconomic objectives poses significant challenges for policymakers. Although the Federal Reserve's FRB/US model provides sophisticated forecasts, its reliance on predefined scenarios constrains exploration of the full policy space. This research introduces RL-FRB/US, which integrates the FRB/US model with a Proximal Policy Optimization (PPO) reinforcement-learning (RL) agent and an active enhancement of relocation mechanism for fiscal policy optimization. RL-FRB/US demonstrates significant performance improvements over baseline FRB/US simulations in the period 2000–2024. By 2024Q2, it achieved higher real GDP (RL-FRB/US: 23,407 trillion $ vs. FRB/US: 23,218 trillion $), lower unemployment (3.23% vs. 3.96%), and more effective inflation management (PCPI: 317.9 vs. 312.3). During recessions, the model consistently delivered superior counter-cyclical responses, with significantly lower unemployment peaks during major downturns; in the 1982 recession, peak unemployment reached only 9.9% compared with 10.9% in traditional simulations. While RL-FRB/US showed a similar federal budget deficit by 2024 (-1,767 trillion $ vs. -1,758 trillion $), it achieved a substantially lower debt burden (reported as debt-to-GDP: 26,535 trillion $ vs. 30,186 trillion $) through more strategic debt management during expansionary periods. These results indicate that combining reinforcement learning with macroeconomic modeling yields more reliable outputs than the traditional model alone, giving policymakers a powerful decision-support instrument for balancing inflation control, unemployment targets, and fiscal sustainability.
Summary
Main Finding
Integrating a Proximal Policy Optimization (PPO) reinforcement-learning agent with the FRB/US macroeconomic model (the RL-FRB/US) materially improves policy outcomes versus standard scenario-based FRB/US simulations. Over the reported sample, the RL-enhanced model achieved higher real GDP, lower unemployment, better inflation control, and a substantially lower debt-to-GDP ratio while producing similar aggregate budget deficits, suggesting that RL can find improvements in the timing and composition of fiscal policy that traditional scenario searches miss.
Key Points
- Model integration: The RL-FRB/US couples the FRB/US structural macro model with a PPO reinforcement-learning agent that actively adjusts fiscal policy instruments; the agent uses an "active enhancement of relocation mechanism" (reported as a method to reallocate policy actions across time/space to improve outcomes).
- Performance gains (selected end-point comparisons, 2024Q2 unless noted):
  - Real GDP: RL-FRB/US = 23,407 (trillion $) vs. FRB/US = 23,218.
  - Unemployment: RL-FRB/US = 3.23% vs. FRB/US = 3.96%.
  - Price level (PCPI): RL-FRB/US = 317.9 vs. FRB/US = 312.3.
  - Federal budget deficit (cumulative to 2024): similar (RL: -1,767 trillion $ vs. FRB: -1,758 trillion $).
  - Debt-to-GDP: substantially lower under RL (RL: 26,535 trillion $ vs. FRB: 30,186 trillion $), attributed to strategic debt management during expansions.
- Recession performance: RL-FRB/US produced stronger counter-cyclical responses and lower unemployment peaks in downturns (example given: 1982 recession peak unemployment RL 9.9% vs. FRB 10.9%).
- Trade-offs: Improvements in real activity and unemployment were achieved without worsening aggregate deficits, indicating better timing/composition rather than larger fiscal outlays.
- Limitations signaled by authors: reliance on model specification (FRB/US structure), potential issues with interpretability of RL policies, and the need to validate robustness out-of-sample and under alternative shocks.
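The "better timing, not larger outlays" point above can be made concrete with a stylized arithmetic sketch: with interest accruing on outstanding debt, two deficit paths with the same cumulative total produce different final debt-to-GDP ratios. All numbers below (initial debt, growth rate, interest rate, deficit paths) are illustrative assumptions, not figures from the paper:

```python
def debt_to_gdp_path(deficits, gdp0=100.0, growth=0.04, rate=0.03, debt0=50.0):
    """Roll a debt stock forward under a deficit path while GDP grows,
    returning the final debt-to-GDP ratio. All parameters are illustrative."""
    debt, gdp = debt0, gdp0
    for d in deficits:
        debt = debt * (1.0 + rate) + d  # interest accrues on outstanding debt
        gdp *= 1.0 + growth             # nominal GDP grows each period
    return debt / gdp

# Same cumulative deficit (30), different timing:
front_loaded = [10, 10, 10, 0, 0, 0]   # borrow early
back_loaded = [0, 0, 0, 10, 10, 10]    # borrow late

print(round(debt_to_gdp_path(front_loaded), 4))
print(round(debt_to_gdp_path(back_loaded), 4))
```

In this toy setting the back-loaded path ends with a lower debt-to-GDP ratio despite identical cumulative deficits, which is the kind of pure timing effect the summary attributes to the RL agent.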
Data & Methods
- Core macro model: FRB/US — a large-scale, structural U.S. macroeconometric model used by the Federal Reserve Board for forecasting and policy analysis.
- RL algorithm: Proximal Policy Optimization (PPO), a policy-gradient method known for stable training and strong performance in continuous action spaces.
- Integration approach:
  - The RL agent selects fiscal policy actions (e.g., spending, transfers, tax instruments or their timing/composition), which are fed into FRB/US; FRB/US produces macro outcomes; the agent receives a reward and updates its policy.
  - The paper reports using an "active enhancement of relocation mechanism" to allow the agent to reallocate fiscal effort across periods or instruments more flexibly (specific implementation details should be checked in the full text).
- Objectives / reward: Multi-objective balancing of inflation control, minimizing unemployment, maximizing output, and fiscal sustainability (debt metrics). Exact functional form and weighting of objectives should be verified in the paper.
- Sample / evaluation:
  - The reported primary evaluation window is 2000–2024, with backtests and scenario comparisons.
  - Recession examples include earlier historical episodes (1982 cited), implying the model was also tested on or calibrated to pre-2000 downturns.
- Key outcome metrics reported: real GDP, unemployment rate, price index (PCPI), federal budget deficit, and debt-to-GDP.
- Validation/robustness: The summary reports improved outcomes but does not detail statistical uncertainty, sensitivity checks, or alternative shock scenarios — these are important for assessing reliability.
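The agent-environment loop described in the methods bullets above can be sketched with a toy stand-in for the simulator. Everything here is an illustrative assumption, not the paper's actual FRB/US interface or PPO implementation: the `ToyMacroEnv` dynamics, the reward weights in `multi_objective_reward`, and the random policy standing in for PPO's rollout phase.

```python
import random

class ToyMacroEnv:
    """Toy stand-in for a macro simulator (NOT the real FRB/US model).

    State: (output_gap, unemployment, inflation, debt_to_gdp).
    Action: a fiscal impulse in [-1, 1] (stimulus > 0, consolidation < 0).
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.output_gap = 0.0
        self.unemployment = 5.0
        self.inflation = 2.0
        self.debt_to_gdp = 0.8
        return self._obs()

    def _obs(self):
        return (self.output_gap, self.unemployment, self.inflation, self.debt_to_gdp)

    def step(self, action):
        action = max(-1.0, min(1.0, action))
        shock = self.rng.gauss(0.0, 0.3)
        # Stylized dynamics: stimulus closes the output gap but adds debt.
        self.output_gap = 0.7 * self.output_gap + 0.5 * action + shock
        self.unemployment = max(0.0, self.unemployment - 0.4 * self.output_gap)  # Okun-style link
        self.inflation = 2.0 + 0.3 * self.output_gap                             # Phillips-style link
        self.debt_to_gdp += 0.01 * action - 0.005 * self.output_gap
        return self._obs(), multi_objective_reward(self._obs())

def multi_objective_reward(obs, w_u=1.0, w_pi=1.0, w_debt=0.5):
    """Weighted penalty on unemployment gap, inflation gap, and debt (weights assumed)."""
    _, unemployment, inflation, debt_to_gdp = obs
    return -(w_u * (unemployment - 4.0) ** 2
             + w_pi * (inflation - 2.0) ** 2
             + w_debt * debt_to_gdp ** 2)

# Rollout with a random policy standing in for the PPO agent.
env = ToyMacroEnv(seed=42)
obs = env.reset()
total_reward = 0.0
for _ in range(20):
    action = env.rng.uniform(-1.0, 1.0)
    obs, reward = env.step(action)
    total_reward += reward
print(round(total_reward, 2))
```

In the actual system a PPO learner would replace the random policy, updating its parameters from the collected (state, action, reward) trajectories; the paper's exact reward weighting should be taken from the full text.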
Implications for AI Economics
- Policy-space exploration: RL can systematically search high-dimensional fiscal-policy spaces and uncover timing/composition strategies that conventional scenario-based approaches may miss.
- Trade-off management: RL agents can learn dynamic trade-offs across conflicting objectives (inflation vs. unemployment vs. fiscal sustainability), offering quantitatively optimized policy paths rather than ad hoc rules.
- Decision-support tool: Combining structural macro models with RL produces candidate policies that policymakers can use for stress-testing or as starting points for deliberation.
- Debt management insight: The results suggest RL can exploit expansionary periods to optimize debt trajectories (improving debt-to-GDP without increasing cumulative deficits), highlighting the value of timing and composition in fiscal plans.
- Cautions and research priorities:
  - Model risk and interpretability: RL policies may be opaque; policymakers require interpretable rules and clear causal channels before adoption.
  - Robustness and generalization: Performance should be evaluated across alternative model specifications, shock types, parameter uncertainty, and structural breaks to rule out overfitting to FRB/US idiosyncrasies.
  - Constraints and political economy: Real-world constraints (legal, administrative, political feasibility) are not guaranteed to be respected by an unconstrained RL agent; embedding these constraints is necessary for operational use.
  - Transparency and governance: Use of AI in fiscal policy necessitates governance frameworks for validation, accountability, and model auditing.
- Next steps for research/application:
  - Publish reward-function details, constraints, and full algorithmic specification so results can be replicated and stress-tested.
  - Run sensitivity analyses (different objective weights, shock distributions, and alternative macro models).
  - Develop interpretability methods (e.g., policy saliency, counterfactual decomposition) to translate RL recommendations into actionable fiscal rules.
  - Pilot decision-support deployments that keep human-in-the-loop oversight and incorporate political/administrative constraints.
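The sensitivity analysis suggested above (varying objective weights) can be sketched as a simple weight sweep: score candidate policy paths under different weightings and check whether the preferred policy flips. The two paths, the weight grid, and the `policy_score` function below are all hypothetical illustrations, not values from the paper:

```python
from itertools import product

def policy_score(path, w_u, w_pi, w_debt):
    """Score a simulated path (unemployment %, inflation %, debt ratios)
    under one weighting of the three objectives. Targets (4% / 2%) are assumed."""
    return -sum(w_u * (u - 4.0) ** 2 + w_pi * (p - 2.0) ** 2 + w_debt * d ** 2
                for u, p, d in zip(path["unemployment"], path["inflation"], path["debt"]))

# Two stylized candidate policies: A trades higher debt for lower unemployment.
paths = {
    "A": {"unemployment": [4.2, 4.0, 3.8], "inflation": [2.1, 2.3, 2.4], "debt": [0.9, 0.95, 1.0]},
    "B": {"unemployment": [5.0, 4.9, 4.8], "inflation": [2.0, 2.0, 2.1], "debt": [0.7, 0.7, 0.7]},
}

# Sweep objective weights and record which policy wins under each setting.
for w_u, w_debt in product([0.5, 2.0], [0.5, 2.0]):
    scores = {name: policy_score(p, w_u, 1.0, w_debt) for name, p in paths.items()}
    winner = max(scores, key=scores.get)
    print(f"w_u={w_u}, w_debt={w_debt} -> preferred policy: {winner}")
```

In this toy sweep the ranking flips once the debt weight dominates, which is exactly why the reported RL results should be re-run under alternative objective weightings before drawing policy conclusions.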
Notes and caveats
- Reported magnitudes (e.g., GDP and debt in "trillion $") and date-range mentions (2000–2024 vs. examples from 1982) appear inconsistent and should be checked against the original manuscript for units, scaling, and sample definitions.
- The summary reports improvements but does not include statistical significance, confidence intervals, or robustness statistics; consult the full paper for those details before drawing policy conclusions.
Assessment
Claims (9)
| Claim | Category | Direction | Confidence | Outcome | Evidence | Score |
|---|---|---|---|---|---|---|
| This research introduces the RL-FRB/US model, which integrates the FRB/US macroeconomic model and a Proximal Policy Optimization (PPO) reinforcement learning agent with an active enhancement of a relocation mechanism for fiscal policy optimization. | Other | positive | high | Model architecture / method (integration of FRB/US and PPO RL; presence of relocation enhancement) | | 0.06 |
| The RL-FRB/US model demonstrates significant performance improvements over baseline FRB/US simulations in the period 2000–2024. | Fiscal And Macroeconomic | positive | medium | Aggregate performance across multiple macroeconomic outcomes (comparative simulation performance 2000–2024) | Reported simulation performance improvement (2000–2024) | 0.04 |
| By 2024Q2 the RL-FRB/US model achieved higher real GDP: 23,407 trillion $ versus FRB/US model: 23,218 trillion $. | Fiscal And Macroeconomic | positive | medium | Real GDP (trillion $) at 2024Q2 | RL-FRB/US 23,407 vs. FRB/US 23,218 (difference 189) | 0.04 |
| By 2024Q2 the RL-FRB/US model produced lower unemployment: 3.23% versus FRB/US model: 3.96%. | Fiscal And Macroeconomic | positive | medium | Unemployment rate (%) at 2024Q2 | -0.73 percentage points (3.23% vs. 3.96%) | 0.04 |
| By 2024Q2 the RL-FRB/US model produced a PCPI of 317.9 versus FRB/US model: 312.3 (reported as evidence of more effective inflation management). | Fiscal And Macroeconomic | mixed | medium | PCPI (price index) at 2024Q2 | RL-FRB/US: 317.9 vs. FRB/US: 312.3 | 0.04 |
| During recessions the RL-FRB/US model delivered superior counter-cyclical responses, with unemployment peaks significantly reduced; for example, during the 1982 recession peak unemployment reached 9.9% in the RL-FRB/US simulation versus 10.9% in traditional simulations. | Employment | positive | medium | Peak unemployment rate (%) during specified recession (1982 example) | Peak unemployment: RL-FRB/US 9.9% vs. FRB/US 10.9% | 0.04 |
| By 2024 the RL-FRB/US model produced a federal budget deficit similar to the baseline: RL-FRB/US model: -1,767 trillion $ vs. FRB/US model: -1,758 trillion $. | Fiscal And Macroeconomic | null_result | medium | Federal budget deficit (trillion $) for 2024 | RL-FRB/US: -1,767 trillion $ vs. FRB/US: -1,758 trillion $ | 0.04 |
| The RL-FRB/US model achieved substantially lower debt (reported as debt-to-GDP ratios) by 2024: RL-FRB/US model: 26,535 trillion $ vs. FRB/US model: 30,186 trillion $, attributed to more strategic debt management during expansionary periods. | Fiscal And Macroeconomic | positive | medium | Federal debt level / reported debt-to-GDP metric (trillion $) by 2024 | RL-FRB/US: 26,535 trillion $ vs. FRB/US: 30,186 trillion $ | 0.04 |
| Combining reinforcement learning and macroeconomic modeling (RL-FRB/US) produces more reliable outputs than the traditional FRB/US model, providing policymakers with a powerful decision-support tool to balance inflation control, targeted unemployment, and fiscal sustainability. | Decision Quality | positive | low | Overall reliability/usefulness of model outputs for policymaking (qualitative) | | 0.02 |