Teaching a time-series foundation model basic economic rationality improves demand forecasts: Chronos-2 fine-tuned on synthetic, GARP-consistent consumer histories predicts real consumer choices markedly better than the zero-shot model across all tested horizons. The result indicates economic theory can supply structured synthetic data that acts as a powerful forecasting prior.
Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior. We show that teaching them basic economic logic improves how they predict demand using an experimental panel. We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents. We exploit Afriat's theorem, which guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint. GARP is a simple condition to check that allows us to generate time series from a large class of utilities efficiently. The fine-tuned model serves as a rationality-constrained forecasting prior: it learns price-quantity relations from GARP-consistent synthetic histories and then uses those relations to predict the choices of real consumers. We find that fine-tuning on GARP-consistent synthetic data substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study. Our results show that economic theory can be used to generate structured synthetic data that improves foundation-model predictions when the theory implies observable patterns in the data.
Summary
Main Finding
Fine-tuning a pretrained time-series foundation model (Amazon Chronos-2) on synthetic panels of price–quantity histories that satisfy the Generalized Axiom of Revealed Preference (GARP) meaningfully improves out-of-sample demand forecasts. Relative to zero-shot Chronos-2, the GARP-fine-tuned model reduces bundle prediction error by ~31% at horizon H=1 and by ~17–18% at horizons H∈{5,10,15}.
Key Points
- Idea: Use revealed-preference theory (Afriat’s theorem / GARP) to generate large amounts of economically coherent synthetic time series (price–quantity histories) and fine-tune a foundation forecasting model on those histories so it learns price–quantity relations as a structured prior.
- Synthetic data are GARP-consistent price–quantity histories generated without specifying a parametric utility; by Afriat’s theorem, every such history is rationalizable by some utility function, so the sampler covers a very large class of utilities.
- Fine-tuning strategy: LoRA (low-rank adaptation) on Chronos-2, which keeps most pretrained weights frozen, so adaptation is computationally cheap and the risk of overfitting is reduced.
- Empirical test: an experimental panel (Ahn et al., 2014) with N=154 subjects, T=50 periods, 3 goods/prices, budget m=100; only 13% of subjects pass GARP in the real data (so subjects are typically only approximately rational).
- Performance: GARP 50K (LoRA fine-tuned on 50k synthetic agents) beats zero-shot Chronos-2 across all evaluated horizons and metrics (MASE by good, bundle ℓ2, bundle fitness). Example numbers (average across consumers):
  - H=1: bundle ℓ2 20.53 → 14.09 (≈31% reduction)
  - H=5: bundle ℓ2 16.97 → 14.04 (≈17% reduction)
  - H=10: bundle ℓ2 16.83 → 13.92 (≈17% reduction)
  - H=15: bundle ℓ2 17.29 → 14.23 (≈18% reduction)
- The synthetic panels were never mixed with real panels; they serve as a prior via fine-tuning before evaluation on real data.
- The approach complements (rather than replaces) structural estimation or constraint-based regularization: it injects economic structure through the training distribution rather than penalties.
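The reductions quoted in the performance bullet follow directly from the reported bundle ℓ2 values. A minimal Python sketch of that arithmetic, with bundle ℓ2 computed as the mean per-period Euclidean distance between actual and forecast bundles; the function names `bundle_l2` and `pct_reduction` are illustrative and not taken from the paper:

```python
import math

def bundle_l2(actual, forecast):
    """Mean per-period Euclidean distance between actual and forecast bundles."""
    dists = [math.dist(a, f) for a, f in zip(actual, forecast)]
    return sum(dists) / len(dists)

def pct_reduction(baseline, tuned):
    """Relative error reduction of the tuned model, in percent."""
    return 100.0 * (baseline - tuned) / baseline

# Reported average bundle l2 per horizon: (zero-shot Chronos-2, GARP 50K).
reported = {1: (20.53, 14.09), 5: (16.97, 14.04),
            10: (16.83, 13.92), 15: (17.29, 14.23)}

reductions = {h: pct_reduction(b, t) for h, (b, t) in reported.items()}
for h, r in reductions.items():
    print(f"H={h}: {r:.1f}% reduction")
```

The gain is largest at H=1 and roughly flat across the longer horizons.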
Data & Methods
- Foundation model: Amazon Chronos-2 (T5-style encoder-only transformer, probabilistic forecasting, multivariate mode). Inputs: historical targets (K×L), past_covariates (past prices), future_covariates (known future prices); outputs 21 quantiles, median used for point forecasts.
- Fine-tuning: LoRA (Hu et al., 2021) inserted into attention projections; only low-rank matrices trained (~1–2% of parameters).
  - LoRA config used: rank r=16, α=32, steps=5000, learning rate=1e-5 with cosine decay, batch size 64, context length 35, prediction length 15, optimizer AdamW, precision bfloat16.
- Synthetic data generation:
  - Use a GARP sampler (simGarpPriceWealth from a revealedPrefs fork) that draws candidate budget shares from Dirichlet(1,1,1) and accepts a proposal only if adding the period preserves GARP. This samples from the GARP polytope and thus from histories rationalizable by some locally non-satiated utility.
  - The synthetic price DGP matches the experiment: each price ~ LogNormal(log 3, 0.5); budget m=100; T=50 periods per synthetic agent.
  - Training set: 50,000 synthetic agents (each with an i.i.d. price path in the main specification); training/validation split 80/20.
- Empirical evaluation:
  - Real panel: Ahn et al. (2014) experimental portfolio choices, N=154, T=50, 3 goods, budget m=100.
  - Train/test protocol: fine-tune on synthetic data only; for evaluation, use each consumer’s first 35 periods as context and forecast the next H∈{1,5,10,15} held-out periods.
  - Metrics: MASE per good, bundle ℓ2 (mean per-period Euclidean distance), bundle fitness (normalized ℓ2, so higher is better).
  - Baselines: zero-shot Chronos-2 (no fine-tuning) and a random feasible-budget benchmark (uniform Dirichlet draws per holdout).
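The accept/reject sampler described under synthetic data generation can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the `simGarpPriceWealth` code itself: the names `satisfies_garp` and `sample_garp_history` are hypothetical, 0.5 is assumed to be the standard deviation of log prices, and Dirichlet(1,1,1) shares are drawn as normalized Gamma(1,1) variates.

```python
import math
import random

def satisfies_garp(prices, bundles, tol=1e-9):
    """Return True if the price/bundle history satisfies GARP."""
    n = len(bundles)
    dot = lambda p, x: sum(pi * xi for pi, xi in zip(p, x))
    # cost[t][s]: cost of bundle s at period-t prices.
    cost = [[dot(prices[t], bundles[s]) for s in range(n)] for t in range(n)]
    # Direct revealed preference: x_t R0 x_s if x_s was affordable at t.
    R = [[cost[t][t] >= cost[t][s] - tol for s in range(n)] for t in range(n)]
    # Transitive closure (Warshall).
    for k in range(n):
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = True
    # Violation: x_t R x_s, yet x_t is strictly cheaper than x_s at period-s prices.
    return not any(R[t][s] and cost[s][s] > cost[s][t] + tol
                   for t in range(n) for s in range(n))

def sample_garp_history(T=50, K=3, m=100.0, max_tries=10000, rng=None):
    """Draw one GARP-consistent history by period-wise rejection sampling."""
    rng = rng or random.Random()
    prices, bundles = [], []
    for _ in range(T):
        p = [rng.lognormvariate(math.log(3), 0.5) for _ in range(K)]
        for _ in range(max_tries):
            g = [rng.gammavariate(1.0, 1.0) for _ in range(K)]
            shares = [gi / sum(g) for gi in g]           # Dirichlet(1,...,1) draw
            x = [m * s / pi for s, pi in zip(shares, p)]  # spends the full budget m
            if satisfies_garp(prices + [p], bundles + [x]):
                prices.append(p)
                bundles.append(x)
                break
        else:
            raise RuntimeError("no GARP-consistent proposal found")
    return prices, bundles

prices, bundles = sample_garp_history(T=6, rng=random.Random(0))
```

The check reruns the full O(T³) closure for every proposal, which is adequate for short histories; generating 50,000 agents with T=50 would call for an incremental or vectorized check.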
Implications for AI Economics
- Practical channel for theory → foundation models: Economic theory (here, revealed-preference consistency) can be used to build structured synthetic corpora that shape foundation-model representations and improve forecasting, without estimating a parametric structural model.
- Middle ground between structural and ML approaches: This method retains the forecasting power and flexibility of foundation models while embedding qualitative economic constraints (substitution via budget constraints) via training data rather than hard constraints or penalties.
- Scalability and generality: GARP-based simulation is computationally efficient and, by Afriat’s theorem, spans a very broad class of utility rationalizations—making it applicable to higher-dimensional goods and longer time series without specifying functional forms.
- Interpretability and diagnostics: The fine-tuned model’s predictions can be evaluated for revealed-preference properties (e.g., approximate GARP / Afriat efficiency), offering a route to implicit economic interpretability even when no explicit utility is recovered.
- Potential applications: demand forecasting in retail/markets with rich price variation, economic counterfactuals (if combined with appropriate price/intervention DGPs), and improved priors when real data are noisy or only approximately rational.
- Caveats & open questions:
  - Dependence on synthetic-data design: gains may depend on matching the price DGP and budget structure to the real environment; sensitivity to sampler choices, Dirichlet proposals, and synthetic heterogeneity should be studied.
  - Approximate rationality: most real agents failed GARP here, yet improvements occurred; understanding when and why a rationality-based prior helps (vs. harms) in noisier or strategic settings is critical.
  - Endogeneity & policy settings: the experiment used exogenous prices; extending to market settings with endogenous prices requires combining this approach with causal/economic identification strategies.
  - Complementarity with regularization: combining synthetic-data priors with constraint-based regularizers (e.g., Slutsky/CCEI penalties) or partial structural estimation could yield further gains; tuning and stability remain open.
- Research directions: explore other economic constraints (e.g., Slutsky symmetry/negative semidefiniteness), mixtures of synthetic and real data during fine-tuning, robustness across product spaces and market settings, and whether synthetic priors can improve counterfactual inference or policy evaluation when integrated with causal frameworks.
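The Afriat-efficiency diagnostic mentioned under interpretability can be sketched as follows. The critical cost efficiency index (CCEI) is the largest e in [0, 1] at which the data pass GARP once every budget is deflated by e; because the relaxed test is monotone in e, bisection suffices. The names `garp_at` and `ccei` are hypothetical, and nothing here is taken from the paper's code.

```python
def garp_at(prices, bundles, e=1.0, tol=1e-9):
    """GARP test with budgets deflated by efficiency level e (e=1 is plain GARP)."""
    n = len(bundles)
    dot = lambda p, x: sum(pi * xi for pi, xi in zip(p, x))
    cost = [[dot(prices[t], bundles[s]) for s in range(n)] for t in range(n)]
    # x_t R0_e x_s if x_s costs no more than e times the period-t outlay.
    R = [[e * cost[t][t] >= cost[t][s] - tol for s in range(n)] for t in range(n)]
    for k in range(n):  # transitive closure (Warshall)
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = True
    return not any(R[t][s] and e * cost[s][s] > cost[s][t] + tol
                   for t in range(n) for s in range(n))

def ccei(prices, bundles, iters=40):
    """Approximate the CCEI by bisection on the efficiency level."""
    if garp_at(prices, bundles, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0  # the test passes at lo and fails at hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if garp_at(prices, bundles, mid):
            lo = mid
        else:
            hi = mid
    return lo

# Two-period example with a GARP violation: each bundle costs 5 at its own
# prices and 4 at the other period's prices, so the index is 4/5.
example_prices = [[1.0, 2.0], [2.0, 1.0]]
example_bundles = [[1.0, 2.0], [2.0, 1.0]]
index = ccei(example_prices, example_bundles)
```

Applied to a forecast path appended to a consumer's observed history, an index near 1 would indicate predictions that are close to rationalizable.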
Assessment
Claims (8)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior. | Other | mixed | high | ability of pretrained time-series models to forecast and degree to which they incorporate economic behavior | 0.48 |
| Teaching them basic economic logic improves how they predict demand using an experimental panel. | Output Quality | positive | high | prediction accuracy of consumer demand | 0.48 |
| We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents. | Other | null_result | high | model fine-tuning procedure / training data source | 0.8 |
| Afriat's theorem guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint. | Other | null_result | high | logical equivalence between GARP and utility-maximizing demand | 0.8 |
| GARP is a simple condition to check that allows us to generate time series from a large class of utilities efficiently. | Other | positive | high | feasibility/efficiency of generating synthetic time series from utility classes | 0.48 |
| The fine-tuned model serves as a rationality-constrained forecasting prior: it learns price-quantity relations from GARP-consistent synthetic histories and then uses those relations to predict the choices of real consumers. | Output Quality | positive | high | model's ability to predict real consumer choices (use of learned price-quantity relations) | 0.48 |
| Fine-tuning on GARP-consistent synthetic data substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study. | Output Quality | positive | high | forecast prediction accuracy across forecast horizons | substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study; 0.48 |
| Economic theory can be used to generate structured synthetic data that improves foundation-model predictions when the theory implies observable patterns in the data. | Output Quality | positive | high | improvement in foundation-model prediction accuracy when using theory-generated synthetic data | 0.48 |