AI that learns from people’s choices predicts risk preferences better than AI fed written instructions, but humans frequently choose or defer to prompts nonetheless — and when prompts and choices disagree, agents often follow the less accurate prompt.
As AI agents become more autonomous, properly aligning their objectives with human preferences becomes increasingly important. We study how effectively an AI agent learns a human principal's preference in choice under risk via stated versus revealed preferences. We conduct an online experiment in which subjects state their preferences through written instructions ("prompts") and reveal them through choices in a series of binary lottery questions ("data"). We find that on average, an AI agent given revealed-preference data predicts subjects' choices more accurately than an AI agent given stated-preference prompts. Further analysis suggests that the gap is driven by subjects' difficulty in translating their own preferences into written instructions. When given a choice between which information source to give to an AI agent, a large portion of subjects fail to select the more informative one. Moreover, when predictions from the two sources conflict, we find that the AI agent aligns more frequently with the prompt, despite its lower accuracy. Overall, these results highlight the revealed preference approach as a powerful mechanism for communicating human preferences to AI agents, but its success depends on careful implementation.
Summary
Main Finding
AI agents given revealed-preference data (past choices) predict human choices under risk more accurately, on average, than agents given stated-preference prompts (written instructions). The performance gap is largely driven by humans’ difficulty writing informative prompts; synthetic prompts generated by an LLM (AutoPrompt-AI) match the performance of the revealed-data agent. Combining both information sources can worsen performance because agents often defer to the (noisier) prompt when the two sources conflict.
Key Points
- Experimental comparison: LLM-based agents that differ in the information they receive
  - Data-AI: given the subject’s earlier choices (revealed preferences).
  - Prompt-AI: given the subject’s free-text prompt (stated preferences).
  - Both-AI: given both the choices and the prompt.
  - AutoPrompt-AI: synthetic prompts generated by an LLM using the same instructions given to subjects.
- Primary metric: out-of-sample match rate, i.e. the share of Part II lottery choices the AI correctly predicts.
- Main empirical findings
  - Data-AI outperforms Prompt-AI on average across question types (easy, hard, behavioral).
  - Heterogeneity: subjects with more behavioral deviations from expected-utility theory are harder to predict; Prompt-AI underperforms Data-AI more for these subjects (up to ~10 percentage points worse for the most biased).
  - AutoPrompt-AI (LLM-generated prompts) achieves performance on par with Data-AI, implying better prompts exist and the main bottleneck is human prompt quality.
  - When subjects choose which agent to delegate to, 59% choose Data-AI. Subjects’ delegation largely reflects perceived performance (85% choose the agent they believe is (weakly) better), but many misestimate performance and 35% end up delegating to the objectively worse agent.
  - Both-AI performs worse than Data-AI and only slightly better than Prompt-AI. In the ~25% of subject-question cases where Data-AI and Prompt-AI conflict, Both-AI tends to follow the Prompt-AI prediction despite its lower accuracy.
- Robustness: main analyses use Claude Opus 4.5; a replication using GPT-5.4 yields qualitatively similar results.
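The match-rate metric and the conflict analysis described in the key points can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `match_rate` and `conflict_summary` and the 13 choices per agent are hypothetical, made up for the example.

```python
from typing import Optional, Sequence


def match_rate(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of questions where the prediction equals the subject's actual choice."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def conflict_summary(
    data_ai: Sequence[str], prompt_ai: Sequence[str], both_ai: Sequence[str]
) -> dict[str, Optional[float]]:
    """Among questions where Data-AI and Prompt-AI disagree,
    report how often Both-AI sides with Prompt-AI."""
    conflicts = [i for i, (d, p) in enumerate(zip(data_ai, prompt_ai)) if d != p]
    if not conflicts:
        return {"conflict_rate": 0.0, "both_follows_prompt": None}
    follows = sum(both_ai[i] == prompt_ai[i] for i in conflicts)
    return {
        "conflict_rate": len(conflicts) / len(data_ai),
        "both_follows_prompt": follows / len(conflicts),
    }


# Made-up choices for one hypothetical subject over 13 Part II questions.
actual = list("ABABBABAABBAB")     # the subject's own choices (ground truth)
data_ai = list("ABABBABAABBAA")    # hypothetical Data-AI predictions
prompt_ai = list("ABBBBAAAABAAA")  # hypothetical Prompt-AI predictions
both_ai = list("ABBBBABAABAAA")    # hypothetical Both-AI predictions

print("Data-AI match rate:", match_rate(data_ai, actual))
print("Prompt-AI match rate:", match_rate(prompt_ai, actual))
print(conflict_summary(data_ai, prompt_ai, both_ai))
```

Averaging `match_rate` over subjects would give the kind of agent-level comparison the summary reports; the conflict summary isolates the subset of questions where the two information sources disagree.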
Data & Methods
- Design
  - Online incentivized experiment (oTree on Prolific), U.S. adults, N = 147 completing subjects (passed comprehension).
  - Three parts:
    - Part I: subjects answer 13 binary lottery choices (mix of easy, hard, behavioral), then write a free-text prompt describing their preferences, then choose whether to delegate to Data-AI or Prompt-AI.
    - Part II: subjects answer a fresh set of 13 structurally similar binary lotteries (not told these would be used to benchmark AI); these choices are the ground truth for match-rate evaluation.
    - Part III: belief elicitation about AI accuracy, control tasks (IQ, risk attitude), demographics and personality.
  - Lottery design: 13 pairs per part, classified into “easy” (dominance or large EV gaps), “hard” (small EV gaps, subtle dominance), and “behavioral” (common-ratio / common-consequence style tasks to elicit Allais-type violations). Stakes and parameters varied between parts to create out-of-sample evaluation.
- Incentives
  - $5 base payment; the bonus is determined by one randomly selected set of decisions (the subject’s own Part II choices, the Data-AI predictions, the Prompt-AI predictions, or the predictions of the agent the subject selected), from which one of the 13 problems is randomly implemented for payment.
  - Additional incentivized belief elicitations and payments for Part III tasks.
- AI implementation
  - Agents instantiated on Claude Opus 4.5 with extended reasoning; system prompts standardized (appendix).
  - Predictions produced for each lottery pair in a standardized format; also replicated on GPT-5.4.
  - AutoPrompt-AI: LLM used to generate prompts from the same instructions and data available to subjects.
- Outcomes & key quantitative results
  - Data-AI mean match rate > Prompt-AI mean match rate (statistically significant; heterogeneity across subjects and question types).
  - Prompt-AI performs comparably to Data-AI for subjects with choices consistent with canonical models, but up to ~10 percentage points worse for the most behaviorally biased subjects.
  - Delegation: 59% choose Data-AI; 85% choose the agent they believe is weakly better; 35% misdelegate relative to true performance.
  - Both-AI underperforms Data-AI; in conflicting cases (~25% of subject-question pairs), Both-AI largely adopts Prompt-AI’s (inferior) prediction.
- Pre-registration and IRB: study pre-registered; ethics approval obtained.
- Limitations noted by authors: lab-style choice-under-risk environment (binary lotteries) may not generalize to richer, high-dimensional real-world tasks; reliance on two frontier LLMs at a point in time; sample of online U.S. adults.
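To make the "large EV gap" versus "small EV gap" distinction in the lottery design concrete, the sketch below computes expected values for a pair of lotteries and labels the pair by its EV gap. It assumes each lottery has two outcomes. The `Lottery` class, the `label_by_ev_gap` helper, the 2.0 threshold, and the payoffs are all hypothetical. The paper's classification also relies on dominance relations, which this sketch does not check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lottery:
    """Two-outcome lottery: pays `high` with probability `p_high`, otherwise `low`."""
    high: float
    low: float
    p_high: float

    def expected_value(self) -> float:
        return self.p_high * self.high + (1.0 - self.p_high) * self.low


def ev_gap(a: Lottery, b: Lottery) -> float:
    """Absolute difference in expected value between the two options."""
    return abs(a.expected_value() - b.expected_value())


def label_by_ev_gap(a: Lottery, b: Lottery, threshold: float = 2.0) -> str:
    """Label a pair by EV gap only; the threshold is an illustrative assumption."""
    return "large EV gap" if ev_gap(a, b) >= threshold else "small EV gap"


# Hypothetical pairs: a risky option against a sure payment of 4.
sure_four = Lottery(high=4.0, low=4.0, p_high=1.0)
risky_small = Lottery(high=10.0, low=0.0, p_high=0.5)  # EV = 5
risky_large = Lottery(high=20.0, low=0.0, p_high=0.5)  # EV = 10

print(label_by_ev_gap(risky_small, sure_four))  # gap of 1
print(label_by_ev_gap(risky_large, sure_four))  # gap of 6
```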
Implications for AI Economics
- Revealed preferences are a strong practical channel for aligning agentic AI with human principals, especially when principals have systematic behavioral deviations from canonical models. Logged past choices are informative and, when used directly, can outperform free-text preference statements.
- Specification hazard: human inability to fully and precisely state preferences (prompt-writing frictions) is a first-order driver of misalignment; interventions that reduce this friction (tooling, templates, assisted prompt generation) can materially improve alignment.
- LLMs as preference mediators: LLMs can both read past choices and generate better prompts (AutoPrompt). This suggests a hybrid operational model: use revealed-choice data to infer structure and have LLMs translate that structure into robust, generalizable instructions for downstream decision-making.
- Delegation and belief distortions matter for adoption and welfare: even when revealed data is more informative, many principals misperceive comparative agent performance and may delegate suboptimally. Policies and interfaces should surface transparent, calibrated performance metrics to aid delegation decisions.
- Combining information sources requires careful conflict-resolution design: naively feeding both stated and revealed preferences to an LLM can reduce performance if the model overweights the noisier stated input. Mechanisms for credibility-weighting, provenance-aware prompts, or model architectures that probabilistically reconcile sources are needed.
- Practical recommendations for designers and policymakers
  - Default to leveraging revealed-preference logs where available; provide clear options for users to opt in and understand trade-offs.
  - Build assistive prompt-writing tools (LLM-assisted or template-based) that translate users’ past choices into high-quality, generalizable instructions.
  - Report out-of-sample predictive accuracy and conflict cases to users when offering delegation choices; consider decision aids that simulate agent behavior under different information regimes.
  - When combining inputs, explicitly model and communicate uncertainty and provenance so agents can downweight noisier stated inputs.
- Research directions
  - Test these findings in richer, high-dimensional tasks (planning, scheduling, procurement) and field settings where preference stakes and dynamics differ.
  - Investigate algorithmic methods for formal reconciliation of stated vs revealed signals (Bayesian models, ensemble approaches, provenance-aware prompting).
  - Study longitudinal effects: whether individuals learn to write better prompts with feedback, and whether alignment improves as agents observe more revealed choices over time.
- Caution: although the results encourage the use of revealed data, ethical and privacy concerns about continuous preference logging, consent, and potential manipulation must be addressed in deployment.
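One simple way to make the credibility-weighting idea concrete is an accuracy-weighted vote in log-odds space. The sketch below is an illustration, not a method from the paper. It assumes a uniform prior over two options, errors that are independent across the two sources, and known per-source accuracies. The names `log_odds` and `reconcile` and the accuracy values 0.8 and 0.7 are hypothetical.

```python
import math


def log_odds(accuracy: float) -> float:
    """Log-odds weight of a source that is correct with probability `accuracy`."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return math.log(accuracy / (1.0 - accuracy))


def reconcile(
    data_pred: str, prompt_pred: str, data_acc: float, prompt_acc: float
) -> tuple[str, float]:
    """Combine two binary predictions ("A" or "B").

    Returns the favoured option and its posterior probability under the
    stated assumptions (uniform prior, independent errors).
    """
    score = 0.0  # positive favours "A", negative favours "B"
    for pred, acc in ((data_pred, data_acc), (prompt_pred, prompt_acc)):
        if pred not in ("A", "B"):
            raise ValueError("predictions must be 'A' or 'B'")
        score += log_odds(acc) if pred == "A" else -log_odds(acc)
    p_a = 1.0 / (1.0 + math.exp(-score))
    return ("A", p_a) if p_a >= 0.5 else ("B", 1.0 - p_a)


# Conflict case with made-up accuracies: the more accurate source wins,
# but the posterior stays well below the agreement case.
print(reconcile("A", "B", data_acc=0.8, prompt_acc=0.7))
print(reconcile("B", "B", data_acc=0.8, prompt_acc=0.7))
```

Under these assumptions a conflict is always resolved in favour of the source with the higher estimated accuracy, which is the opposite of the prompt-following behaviour reported for Both-AI.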
Overall, the paper provides experimental evidence that revealed-preference information is a powerful channel for aligning AI to human principals, but achieving robust real-world gains requires better prompt-assisted interfaces, calibrated delegation support, and principled ways to reconcile conflicting information.
Assessment
Claims (5)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| An AI agent given revealed-preference data predicts subjects' choices more accurately than an AI agent given stated-preference prompts. (Decision Quality) | positive | high | prediction accuracy of AI agent for subjects' choices | 0.48 |
| The gap in predictive accuracy is driven by subjects' difficulty in translating their own preferences into written instructions. (Decision Quality) | negative | medium | degree to which prompt quality explains predictive accuracy gap (i.e., translation fidelity from preference to written instruction) | 0.29 |
| When given a choice between which information source to give to an AI agent, a large portion of subjects fail to select the more informative one. (Decision Quality) | negative | medium | choice by subjects of which information source to provide to the AI (rate of selecting the more informative source) | 0.29 |
| When predictions from the two sources conflict, the AI agent aligns more frequently with the prompt, despite its lower accuracy. (Decision Quality) | negative | high | frequency of AI alignment with prompt versus revealed-preference prediction in conflict cases | 0.48 |
| The revealed preference approach is a powerful mechanism for communicating human preferences to AI agents, but its success depends on careful implementation. (Decision Quality) | positive | medium | effectiveness of revealed-preference communication for aligning AI with human preferences | 0.05 |