Teaching a time-series foundation model basic economic rationality improves demand forecasts: Chronos-2 fine-tuned on synthetic, GARP-consistent consumer histories predicts real consumer choices markedly better than the zero-shot model across all tested horizons. The result indicates economic theory can supply structured synthetic data that acts as a powerful forecasting prior.

GARP-EFM: Improving Foundation Models with Revealed Preference Structure

Victor H. Aguiar, Nail Kashaev · March 25, 2026

arXiv · quasi-experimental · medium evidence · relevance 8/10

Fine-tuning a transformer time-series foundation model on synthetic demand histories constrained by revealed-preference (GARP) substantially improves out-of-sample demand forecasts relative to the zero-shot model.

Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior. We show that teaching them basic economic logic improves how they predict demand using an experimental panel. We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents. We exploit Afriat's theorem, which guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint. GARP is a simple condition to check that allows us to generate time series from a large class of utilities efficiently. The fine-tuned model serves as a rationality-constrained forecasting prior: it learns price-quantity relations from GARP-consistent synthetic histories and then uses those relations to predict the choices of real consumers. We find that fine-tuning on GARP-consistent synthetic data substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study. Our results show that economic theory can be used to generate structured synthetic data that improves foundation-model predictions when the theory implies observable patterns in the data.
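For reference, the condition the abstract relies on can be written out. The block below is the standard textbook statement of GARP and of the Afriat inequalities; the notation ($p_t$, $x_t$, $R^0$, $R$, $U_t$, $\lambda_t$) is chosen here for illustration and is not taken from the paper.

```latex
% Data: prices and chosen bundles for periods t = 1, ..., T with K goods.
\[
  \{(p_t, x_t)\}_{t=1}^{T}, \qquad p_t \in \mathbb{R}^{K}_{++}, \quad x_t \in \mathbb{R}^{K}_{+}.
\]
% Direct revealed preference: x_s was affordable when x_t was chosen.
\[
  x_t \, R^0 \, x_s \iff p_t \cdot x_t \ge p_t \cdot x_s ,
  \qquad R = \text{transitive closure of } R^0 .
\]
% GARP: a bundle revealed preferred to x_s is never strictly cheaper at prices p_s.
\[
  x_t \, R \, x_s \;\Longrightarrow\; p_s \cdot x_s \le p_s \cdot x_t
  \qquad \text{for all } s, t .
\]
% Afriat's theorem: GARP holds if and only if there exist numbers U_t and
% \lambda_t > 0 satisfying the Afriat inequalities, which is in turn equivalent
% to the data being rationalizable by a locally non-satiated utility function.
\[
  U_s \le U_t + \lambda_t \, p_t \cdot (x_s - x_t)
  \qquad \text{for all } s, t .
\]
```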

Summary

Main Finding

Fine-tuning a pretrained time-series foundation model (Amazon Chronos-2) on synthetic panels of price–quantity histories that satisfy the Generalized Axiom of Revealed Preference (GARP) meaningfully improves out-of-sample demand forecasts. Relative to zero-shot Chronos-2, the GARP-fine-tuned model reduces bundle prediction error by ~31% at horizon H=1 and by ~17–18% at horizons H∈{5,10,15}.

Key Points

Idea: Use revealed-preference theory (Afriat’s theorem / GARP) to generate large amounts of economically coherent synthetic time series (price–quantity histories) and fine-tune a foundation forecasting model on those histories so it learns price–quantity relations as a structured prior.
Synthetic data are GARP-consistent histories of utility-maximizing agents (no parametric utility specified); Afriat’s theorem guarantees this captures a very large class of utilities.
Fine-tuning strategy: LoRA (low-rank adaptation) on Chronos-2, with the pretrained weights kept frozen and only small low-rank adapter matrices trained, so adaptation is computationally cheap and less prone to overfitting.
Empirical test: an experimental panel (Ahn et al., 2014) with N=154 subjects, T=50 periods, 3 goods/prices, and budget m=100; only 13% of subjects pass GARP in the real data, so most subjects are at best approximately rational.
Performance: GARP 50K (LoRA fine-tuned on 50k synthetic agents) beats zero-shot Chronos-2 across all evaluated horizons and metrics (MASE by good, bundle ℓ2, bundle fitness). Example numbers (average across consumers):
- H=1: bundle ℓ2 20.53 → 14.09 (≈31% reduction)
- H=5: bundle ℓ2 16.97 → 14.04 (≈17% reduction)
- H=10: bundle ℓ2 16.83 → 13.92 (≈17% reduction)
- H=15: bundle ℓ2 17.29 → 14.23 (≈18% reduction)
The synthetic panels were never mixed with real panels; they serve as a prior via fine-tuning before evaluation on real data.
The approach complements (rather than replaces) structural estimation or constraint-based regularization: it injects economic structure through the training distribution rather than penalties.
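The percentage reductions quoted in the performance item above follow directly from the reported bundle ℓ2 values. A minimal check, using only the numbers listed in this summary (the variable and function names are illustrative):

```python
# Bundle l2 errors reported in the summary, per horizon H:
# (zero-shot Chronos-2, GARP 50K fine-tuned model).
bundle_l2 = {
    1: (20.53, 14.09),
    5: (16.97, 14.04),
    10: (16.83, 13.92),
    15: (17.29, 14.23),
}


def pct_reduction(baseline: float, tuned: float) -> float:
    """Relative error reduction of the fine-tuned model, in percent."""
    return 100.0 * (baseline - tuned) / baseline


reductions = {h: pct_reduction(b, t) for h, (b, t) in bundle_l2.items()}

for h, r in reductions.items():
    print(f"H={h}: {r:.1f}% reduction in bundle l2")
```

Rounded to whole percentages, these match the approximate figures quoted for each horizon.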

Data & Methods

Foundation model: Amazon Chronos-2 (T5-style encoder-only transformer, probabilistic forecasting, multivariate mode). Inputs: historical targets (K×L), past_covariates (past prices), future_covariates (known future prices); outputs 21 quantiles, median used for point forecasts.
Fine-tuning: LoRA (Hu et al., 2021) inserted into attention projections; only low-rank matrices trained (~1–2% of parameters).
- LoRA config used: rank r=16, α=32, steps=5000, learning rate=1e-5 with cosine decay, batch size 64, context length 35, prediction length 15, optimizer AdamW, precision bfloat16.
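To make the configuration above concrete: in LoRA a frozen weight matrix W is replaced by W + (α/r)·B·A, where A and B have rank r, so only r·(d_in + d_out) values are trained per adapted projection. The sketch below records the reported hyperparameters and illustrates the adapter scaling, the per-projection parameter count, and a cosine learning-rate schedule. It is not the authors' training code: the class and function names are hypothetical, the model width `D_MODEL` is a made-up value for illustration, and the schedule assumes no warmup and decay to zero, which the summary does not specify.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSetup:
    """Hyperparameters as reported in the summary (container name is hypothetical)."""
    rank: int = 16
    alpha: int = 32
    steps: int = 5000
    base_lr: float = 1e-5
    batch_size: int = 64
    context_length: int = 35
    prediction_length: int = 15

    @property
    def scaling(self) -> float:
        # LoRA scales the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable values in one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def cosine_lr(step: int, cfg: LoraSetup) -> float:
    """Cosine decay from base_lr to 0 (assumption: no warmup, no floor)."""
    progress = min(max(step / cfg.steps, 0.0), 1.0)
    return 0.5 * cfg.base_lr * (1.0 + math.cos(math.pi * progress))


cfg = LoraSetup()
D_MODEL = 768  # hypothetical projection width, not taken from the paper
adapter_params = lora_trainable_params(D_MODEL, D_MODEL, cfg.rank)

print("update scaling alpha/r:", cfg.scaling)
print("trainable values per adapted projection:", adapter_params)
print("learning rate at steps 0, 2500, 5000:",
      [cosine_lr(s, cfg) for s in (0, 2500, 5000)])
```

Note that the context length (35) and prediction length (15) together span the 50 periods available per agent.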
Synthetic data generation:
- Use a GARP sampler (simGarpPriceWealth from a revealedPrefs fork) that draws candidate budget shares from Dirichlet(1,1,1) and accepts proposals only if adding the period preserves GARP. This samples from the GARP polytope and thus from histories rationalizable by some locally non-satiated utility.
- Synthetic price DGP matches empirical experiment: each price ~ LogNormal(log 3, 0.5); budget m=100; T=50 periods per synthetic agent.
- Training set: 50,000 synthetic agents (each with an i.i.d. price path in the main specification); training/validation split 80/20.
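The accept-reject idea described above can be sketched in a few lines of plain Python. This is an illustrative reimplementation under stated assumptions, not the `simGarpPriceWealth` routine itself: the function names are hypothetical, 0.5 is treated as the standard deviation of log prices, and prices are held fixed within a period while budget shares are redrawn until the extended history passes GARP.

```python
import math
import random

EPS = 1e-9  # numerical tolerance for expenditure comparisons


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_garp(prices, bundles):
    """Check GARP for a history of (price vector, bundle) observations."""
    n = len(bundles)
    # cost[i][j] = expenditure needed to buy bundle j at period-i prices.
    cost = [[dot(prices[i], bundles[j]) for j in range(n)] for i in range(n)]
    # Direct revealed preference: bundle j was affordable when i was chosen.
    pref = [[cost[i][i] >= cost[i][j] - EPS for j in range(n)] for i in range(n)]
    # Warshall's algorithm: transitive closure of the direct relation.
    for k in range(n):
        for i in range(n):
            if pref[i][k]:
                for j in range(n):
                    if pref[k][j]:
                        pref[i][j] = True
    # Violation: i is revealed preferred to j, yet bundle i was strictly
    # cheaper than bundle j at period-j prices.
    return not any(
        pref[i][j] and cost[j][j] > cost[j][i] + EPS
        for i in range(n) for j in range(n)
    )


def sample_garp_agent(periods, n_goods=3, budget=100.0, rng=None, max_tries=10_000):
    """Build one synthetic history by accepting only GARP-preserving proposals."""
    rng = rng or random.Random()
    prices, bundles = [], []
    for _ in range(periods):
        p = [rng.lognormvariate(math.log(3.0), 0.5) for _ in range(n_goods)]
        for _ in range(max_tries):
            # Dirichlet(1, ..., 1) budget shares via normalized Gamma(1, 1) draws.
            g = [rng.gammavariate(1.0, 1.0) for _ in range(n_goods)]
            total = sum(g)
            x = [budget * gi / total / pi for gi, pi in zip(g, p)]
            if satisfies_garp(prices + [p], bundles + [x]):
                prices.append(p)
                bundles.append(x)
                break
        else:
            raise RuntimeError("no GARP-consistent proposal found within max_tries")
    return prices, bundles


prices, bundles = sample_garp_agent(8, rng=random.Random(0))
print("periods sampled:", len(bundles), "GARP holds:", satisfies_garp(prices, bundles))
```

Rechecking the whole history on every proposal costs O(t³) per check, which is fine for a sketch with T=50 but would likely need an incremental closure update to generate 50,000 agents quickly.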
Empirical evaluation:
- Real panel: Ahn et al. (2014) experimental portfolio choices, N=154, T=50, 3 goods, budget m=100.
- Train/test protocol: use synthetic-only fine-tuning; for evaluation use each consumer’s first 35 periods as context and forecast H∈{1,5,10,15} holdouts.
- Metrics: MASE per good, bundle ℓ2 (mean per-period Euclidean distance between forecast and realized bundles), and bundle fitness (a normalized ℓ2 score; higher is better).
Baselines: zero-shot Chronos-2 (no fine-tuning) and a random feasible-budget benchmark (uniform Dirichlet draws per holdout).
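The evaluation protocol and two of the metrics above can be sketched as follows. All function names are hypothetical. The MASE shown uses the standard scaling by the in-sample one-step naive forecast error, which is an assumption since the summary does not state the exact variant, and bundle fitness is omitted because its normalization is not given here. The random benchmark draws budget shares from a uniform Dirichlet and converts them to quantities that exhaust the budget.

```python
import math
import random


def split_context_holdout(series, context_len=35, horizon=15):
    """First `context_len` periods are context; the next `horizon` are held out."""
    return series[:context_len], series[context_len:context_len + horizon]


def bundle_l2(forecasts, actuals):
    """Mean per-period Euclidean distance between forecast and realized bundles."""
    dists = [math.dist(f, a) for f, a in zip(forecasts, actuals)]
    return sum(dists) / len(dists)


def mase(forecast, actual, history):
    """Forecast MAE for one good, scaled by the in-sample naive one-step MAE."""
    mae = sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)
    naive = sum(
        abs(history[t] - history[t - 1]) for t in range(1, len(history))
    ) / (len(history) - 1)
    return mae / naive


def random_budget_bundle(prices, budget=100.0, rng=None):
    """Benchmark: uniform Dirichlet budget shares turned into quantities."""
    rng = rng or random.Random()
    g = [rng.gammavariate(1.0, 1.0) for _ in prices]
    total = sum(g)
    return [budget * gi / total / p for gi, p in zip(g, prices)]


context, holdout = split_context_holdout(list(range(50)), context_len=35, horizon=5)
print(len(context), holdout)
print(bundle_l2([[3.0, 4.0, 0.0]], [[0.0, 0.0, 0.0]]))
print(mase([5.0, 5.0], [5.0, 7.0], [1.0, 2.0, 3.0, 4.0]))
```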

Implications for AI Economics

Practical channel for theory → foundation models: Economic theory (here, revealed-preference consistency) can be used to build structured synthetic corpora that shape foundation-model representations and improve forecasting, without estimating a parametric structural model.
Middle ground between structural and ML approaches: This method retains the forecasting power and flexibility of foundation models while embedding qualitative economic constraints (substitution via budget constraints) via training data rather than hard constraints or penalties.
Scalability and generality: GARP-based simulation is computationally efficient and, by Afriat’s theorem, spans a very broad class of utility rationalizations—making it applicable to higher-dimensional goods and longer time series without specifying functional forms.
Interpretability and diagnostics: The fine-tuned model’s predictions can be evaluated for revealed-preference properties (e.g., approximate GARP / Afriat efficiency), offering a route to implicit economic interpretability even when no explicit utility is recovered.
Potential applications: demand forecasting in retail/markets with rich price variation, economic counterfactuals (if combined with appropriate price/intervention DGPs), and improved priors when real data are noisy or only approximately rational.
Caveats & open questions:
- Dependence on synthetic-data design: gains may depend on matching price DGP and budget structure to the real environment; sensitivity to sampler choices, Dirichlet proposals, and synthetic heterogeneity should be studied.
- Approximate rationality: most real agents failed GARP here, yet improvements occurred—understanding when and why a rationality-based prior helps (vs. harms) in more noisy or strategic settings is critical.
- Endogeneity & policy settings: experiment used exogenous prices; extending to market settings with endogenous prices requires combining this approach with causal/economic identification strategies.
- Complementarity with regularization: combining synthetic-data priors with constraint-based regularizers (e.g., Slutsky/CCEI penalties) or partial structural estimation could yield further gains; tuning and stability remain open.
Research directions: explore other economic constraints (e.g., Slutsky symmetry/negative semidefiniteness), mixtures of synthetic and real data during fine-tuning, robustness across product spaces and market settings, and whether synthetic priors can improve counterfactual inference or policy evaluation when integrated with causal frameworks.

Assessment

Paper Type: quasi_experimental
Evidence Strength: medium. The authors report consistent out-of-sample forecasting gains across horizons using a clear intervention (fine-tuning on GARP-consistent synthetic data) and evaluation on a real experimental panel, which provides direct empirical evidence of model improvement; however, the result rests on synthetic training data whose assumptions (utility maximization/GARP) may not hold broadly, details on subject heterogeneity and robustness checks are not provided in the summary, and there is limited field validation.
Methods Rigor: medium. The approach combines a principled economic-theory constraint (Afriat's theorem) with state-of-the-art time-series foundation models and an experimental evaluation, which is methodologically sound; potential weaknesses include reliance on synthetic data generation choices, possible sensitivity to hyperparameters or the fine-tuning regimen, unclear handling of price endogeneity or noisy violations of GARP in real data, and limited information about robustness and external validation.
Sample: Synthetic training data: large sets of price-quantity time series generated from utility-maximizing agents using Afriat's theorem to enforce GARP consistency across a wide class of utilities. Model: Amazon Chronos-2 (transformer-based probabilistic time-series foundation model) fine-tuned on the synthetic histories. Evaluation data: an experimental panel of real consumers' price-quantity histories (Ahn et al., 2014; N=154 subjects, T=50 periods, 3 goods, budget m=100).
Themes: productivity, innovation
Identification: No causal identification in the econometric sense; the paper measures predictive improvement by fine-tuning a transformer-based probabilistic time-series model (Amazon Chronos-2) on synthetic demand histories generated to satisfy Afriat's theorem (GARP-consistent utility-maximizing behavior) and comparing out-of-sample forecasting performance against the zero-shot Chronos-2 baseline on an experimental panel of real consumers.
Generalizability:
- Synthetic-data assumptions may not hold: real consumers may systematically violate GARP (behavioral biases, framing, stochastic choice).
- Evaluation is limited to an experimental panel (potentially small or non-representative) rather than large-scale field/market data.
- The method is tested on one foundation model (Chronos-2); transferability to other architectures or pretraining regimes is untested.
- The focus is on individual demand price-quantity series; results may not extend to aggregate demand, firm-level forecasting, or markets with strategic interactions.
- There is no assessment of robustness to price endogeneity, temporal nonstationarity, or structural breaks common in real economic time series.

Claims (8)

Claim	Category	Direction	Confidence	Outcome	Details
Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior.	Other	mixed	high	ability of pretrained time-series models to forecast and degree to which they incorporate economic behavior	0.48
Teaching them basic economic logic improves how they predict demand using an experimental panel.	Output Quality	positive	high	prediction accuracy of consumer demand	0.48
We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents.	Other	null_result	high	model fine-tuning procedure / training data source	0.8
Afriat's theorem guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint.	Other	null_result	high	logical equivalence between GARP and utility-maximizing demand	0.8
GARP is a simple condition to check that allows us to generate time series from a large class of utilities efficiently.	Other	positive	high	feasibility/efficiency of generating synthetic time series from utility classes	0.48
The fine-tuned model serves as a rationality-constrained forecasting prior: it learns price-quantity relations from GARP-consistent synthetic histories and then uses those relations to predict the choices of real consumers.	Output Quality	positive	high	model's ability to predict real consumer choices (use of learned price-quantity relations)	0.48
Fine-tuning on GARP-consistent synthetic data substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study.	Output Quality	positive	high	forecast prediction accuracy across forecast horizons	substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study 0.48
Economic theory can be used to generate structured synthetic data that improves foundation-model predictions when the theory implies observable patterns in the data.	Output Quality	positive	high	improvement in foundation-model prediction accuracy when using theory-generated synthetic data	0.48