AI that learns from people’s choices predicts risk preferences better than AI fed written instructions, but humans frequently choose or defer to prompts nonetheless — and when prompts and choices disagree, agents often follow the less accurate prompt.

Should I State or Should I Show? Aligning AI with Human Preferences

Keaton Ellis, Wanying Huang · March 31, 2026

arXiv · quasi-experimental · medium evidence · 7/10 relevance

In an online experiment, AI agents given subjects' revealed-preference choice data predict individuals' risky choices more accurately than agents given written preference prompts, yet subjects often fail to provide the more informative data and AI systems tend to follow explicit prompts when the two sources conflict.

As AI agents become more autonomous, properly aligning their objectives with human preferences becomes increasingly important. We study how effectively an AI agent learns a human principal's preference in choice under risk via stated versus revealed preferences. We conduct an online experiment in which subjects state their preferences through written instructions ("prompts") and reveal them through choices in a series of binary lottery questions ("data"). We find that on average, an AI agent given revealed-preference data predicts subjects' choices more accurately than an AI agent given stated-preference prompts. Further analysis suggests that the gap is driven by subjects' difficulty in translating their own preferences into written instructions. When given a choice between which information source to give to an AI agent, a large portion of subjects fail to select the more informative one. Moreover, when predictions from the two sources conflict, we find that the AI agent aligns more frequently with the prompt, despite its lower accuracy. Overall, these results highlight the revealed preference approach as a powerful mechanism for communicating human preferences to AI agents, but its success depends on careful implementation.

Summary

Main Finding

AI agents given revealed-preference data (past choices) predict human choices under risk more accurately, on average, than agents given stated-preference prompts (written instructions). The performance gap is largely driven by humans’ difficulty writing informative prompts; synthetic prompts generated by an LLM (AutoPrompt-AI) match the performance of the revealed-data agent. Combining both information sources can worsen performance because agents often defer to the (noisier) prompt when the two sources conflict.

Key Points

Experimental comparison: LLM-based agents that differ in the information they receive about each subject
- Data-AI: given the subject’s earlier choices (revealed preferences).
- Prompt-AI: given the subject’s free-text prompt (stated preferences).
- Both-AI: given both the choices and the prompt.
- AutoPrompt-AI: synthetic prompts generated by an LLM using the same instructions given to subjects.
Primary metric: out-of-sample match rate — share of Part II lottery choices the AI correctly predicts.
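The match rate can be sketched as a simple share of agreeing predictions. The snippet below is a minimal illustration, not the authors' code: the function name `match_rate`, the "A"/"B" labels, and the example choices are all hypothetical.

```python
from typing import Sequence


def match_rate(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of held-out choices that the agent predicted correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predictions and choices must have the same length")
    if not actual:
        raise ValueError("need at least one choice to score")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical data for one subject: 13 Part II choices, coded "A" or "B".
part2_choices = list("ABBABAABBBABA")
data_ai_preds = list("ABBABAABBBABB")    # differs from the subject on 1 question
prompt_ai_preds = list("ABAABABBBBABB")  # differs from the subject on 3 questions

data_ai_rate = match_rate(data_ai_preds, part2_choices)
prompt_ai_rate = match_rate(prompt_ai_preds, part2_choices)
print(f"Data-AI: {data_ai_rate:.3f}, Prompt-AI: {prompt_ai_rate:.3f}")
```

Averaging this per-subject rate across subjects would give the kind of mean match rate the comparison relies on.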
Main empirical findings
- Data-AI outperforms Prompt-AI on average across question types (easy, hard, behavioral).
- Heterogeneity: subjects with more behavioral deviations from expected-utility theory are harder to predict; Prompt-AI underperforms Data-AI more for these subjects (up to ~10 percentage points worse for the most biased).
- AutoPrompt-AI (LLM-generated prompts) achieves performance on par with Data-AI, implying better prompts exist and the main bottleneck is human prompt quality.
- When subjects choose which agent to delegate to, 59% choose Data-AI. Subjects’ delegation largely reflects perceived performance (85% choose the agent they believe is (weakly) better), but many misestimate performance and 35% end up delegating to the objectively worse agent.
- Both-AI performs worse than Data-AI and only slightly better than Prompt-AI. In the ~25% of subject-question cases where Data-AI and Prompt-AI conflict, Both-AI tends to follow the Prompt-AI prediction despite its lower accuracy.
Robustness: main analyses use Claude Opus 4.5; a replication using GPT-5.4 yields qualitatively similar results.

Data & Methods

Design
- Online incentivized experiment (oTree on Prolific), U.S. adults, N = 147 subjects who passed the comprehension check and completed the study.
- Three parts:
  - Part I: subjects answer 13 binary lottery choices (mix of easy, hard, behavioral), then write a free-text prompt describing their preferences, then choose whether to delegate to Data-AI or Prompt-AI.
  - Part II: subjects answer a fresh set of 13 structurally similar binary lotteries (not told these would be used to benchmark AI) — these choices are the ground truth for match-rate evaluation.
  - Part III: belief elicitation about AI accuracy, control tasks (IQ, risk attitude), demographics and personality.
- Lottery design: 13 pairs per part, classified into “easy” (dominance or large EV gaps), “hard” (small EV gaps, subtle dominance), and “behavioral” (common-ratio / common-consequence style tasks to elicit Allais-type violations). Stakes and parameters varied between parts to create out-of-sample evaluation.
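To make the lottery classification concrete, the sketch below computes the two quantities the design refers to: the expected-value (EV) gap between two lotteries and first-order stochastic dominance. It is only an illustration. The `Lottery` representation, the function names, and the example payoffs (a textbook common-ratio pair) are assumptions, and the paper's actual parameters and cutoffs for "easy" versus "hard" are not given here.

```python
from typing import List, Tuple

# A lottery is a list of (payoff, probability) pairs; probabilities sum to 1.
Lottery = List[Tuple[float, float]]


def expected_value(lottery: Lottery) -> float:
    return sum(payoff * prob for payoff, prob in lottery)


def cdf_at(lottery: Lottery, x: float) -> float:
    """Probability that the lottery pays x or less."""
    return sum(prob for payoff, prob in lottery if payoff <= x)


def dominates(a: Lottery, b: Lottery, tol: float = 1e-12) -> bool:
    """True if `a` first-order stochastically dominates `b`."""
    points = sorted({x for x, _ in a} | {x for x, _ in b})
    weakly = all(cdf_at(a, x) <= cdf_at(b, x) + tol for x in points)
    strictly = any(cdf_at(a, x) < cdf_at(b, x) - tol for x in points)
    return weakly and strictly


# Hypothetical dominance pair: same upside, better downside.
better = [(10.0, 0.5), (5.0, 0.5)]
worse = [(10.0, 0.5), (0.0, 0.5)]

# Hypothetical common-ratio pair (textbook Allais-style payoffs, not the paper's).
safe = [(3000.0, 1.0)]
risky = [(4000.0, 0.8), (0.0, 0.2)]
scaled_safe = [(3000.0, 0.25), (0.0, 0.75)]
scaled_risky = [(4000.0, 0.2), (0.0, 0.8)]

print(dominates(better, worse))                          # dominance check
print(expected_value(risky) - expected_value(safe))      # EV gap, original pair
print(expected_value(scaled_risky) - expected_value(scaled_safe))  # EV gap, scaled pair
```

An expected-utility maximizer who picks `safe` over `risky` should also pick `scaled_safe` over `scaled_risky`; switching between the two pairs is the Allais-type violation that the behavioral questions are designed to detect.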
Incentives
- $5 base payment; bonus determined by a randomly selected set among (subject’s Part II choices, Data-AI predictions, Prompt-AI predictions, or the agent selected by subject), with one of the 13 problems randomly implemented for payment.
- Additional incentivized belief elicitations and payments for Part III tasks.
AI implementation
- Agents instantiated on Claude Opus 4.5 with extended reasoning; system prompts standardized (appendix).
- Predictions produced for each lottery pair in a standardized format; also replicated on GPT-5.4.
- AutoPrompt-AI: LLM used to generate prompts from the same instructions and data available to subjects.
Outcomes & key quantitative results
- Data-AI mean match rate > Prompt-AI mean match rate (statistically significant; heterogeneity across subjects and question types).
- Prompt-AI performs comparably to Data-AI for subjects with choices consistent with canonical models, but up to ~10 percentage points worse for the most behaviorally biased subjects.
- Delegation: 59% choose Data-AI; 85% choose the agent they believe is weakly better; 35% misdelegate relative to true performance.
- Both-AI underperforms Data-AI; in conflicting cases (~25% of subject-question pairs), Both-AI largely adopts Prompt-AI’s (inferior) prediction.
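The conflict analysis can be sketched as follows: restrict attention to questions where Data-AI and Prompt-AI disagree, then check which source Both-AI sides with and which source was right. This is a minimal sketch under assumptions. The `Record` type, the field names, and the toy records are hypothetical and do not reproduce the paper's data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Record:
    data_pred: str    # Data-AI prediction for one subject-question pair
    prompt_pred: str  # Prompt-AI prediction
    both_pred: str    # Both-AI prediction
    actual: str       # the subject's own Part II choice


def conflict_summary(records: Iterable[Record]) -> Dict[str, float]:
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    conflicts = [r for r in records if r.data_pred != r.prompt_pred]
    n = len(conflicts)
    if n == 0:
        raise ValueError("no conflicting predictions to analyze")
    return {
        "conflict_share": n / len(records),
        "both_follows_prompt": sum(r.both_pred == r.prompt_pred for r in conflicts) / n,
        "data_correct": sum(r.data_pred == r.actual for r in conflicts) / n,
        "prompt_correct": sum(r.prompt_pred == r.actual for r in conflicts) / n,
    }


# Toy records: 9 agreements and 3 conflicts (hypothetical values).
toy = [Record("A", "A", "A", "A")] * 9 + [
    Record("A", "B", "B", "A"),  # Both-AI follows the prompt; data was right
    Record("B", "A", "A", "B"),  # Both-AI follows the prompt; data was right
    Record("A", "B", "A", "B"),  # Both-AI follows the data; prompt was right
]
summary = conflict_summary(toy)
print(summary)
```

Because the choices are binary, exactly one of the two sources is correct in every conflict, so `data_correct` and `prompt_correct` sum to one.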
Pre-registration and IRB: study pre-registered; ethics approval obtained.
Limitations noted by authors: lab-style choice-under-risk environment (binary lotteries) may not generalize to richer, high-dimensional real-world tasks; reliance on two frontier LLMs at a point in time; sample of online U.S. adults.

Implications for AI Economics

Revealed preferences are a strong practical channel for aligning agentic AI with human principals, especially when principals have systematic behavioral deviations from canonical models. Logged past choices are informative and, when used directly, can outperform free-text preference statements.
Specification hazard: human inability to fully and precisely state preferences (prompt-writing frictions) is a first-order driver of misalignment; interventions that reduce this friction (tooling, templates, assisted prompt generation) can materially improve alignment.
LLMs as preference mediators: LLMs can both read past choices and generate better prompts (AutoPrompt). This suggests a hybrid operational model: use revealed-choice data to infer structure and have LLMs translate that structure into robust, generalizable instructions for downstream decision-making.
Delegation and belief distortions matter for adoption and welfare: even when revealed data is more informative, many principals misperceive comparative agent performance and may delegate suboptimally. Policies and interfaces should surface transparent, calibrated performance metrics to aid delegation decisions.
Combining information sources requires careful conflict-resolution design: naively feeding both stated and revealed preferences to an LLM can reduce performance if the model overweights the noisier stated input. Mechanisms for credibility-weighting, provenance-aware prompts, or model architectures that probabilistically reconcile sources are needed.
Practical recommendations for designers and policymakers
- Default to leveraging revealed-preference logs where available; provide clear options for users to opt in and understand trade-offs.
- Build assistive prompt-writing tools (LLM-assisted or template-based) that translate users’ past choices into high-quality, generalizable instructions.
- Report out-of-sample predictive accuracy and conflict cases to users when offering delegation choices; consider decision aids that simulate agent behavior under different information regimes.
- When combining inputs, explicitly model and communicate uncertainty and provenance so agents can downweight noisier stated inputs.
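One simple way to downweight a noisier source is to weight each prediction by the log-odds of that source's validated accuracy. The sketch below illustrates that idea. It is not a method proposed in the paper: it assumes binary options, equal priors, and conditionally independent sources (a naive-Bayes combination), and the accuracy figures are hypothetical.

```python
import math


def log_odds(accuracy: float) -> float:
    """Evidence weight for a source that is right with probability `accuracy`."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return math.log(accuracy / (1.0 - accuracy))


def reconcile(data_pred: str, prompt_pred: str,
              data_acc: float, prompt_acc: float) -> str:
    """Pick option "A" or "B" by accuracy-weighted voting.

    Assumes equal priors and independent sources; ties resolve to "A".
    """
    score = {"A": 0.0, "B": 0.0}
    score[data_pred] += log_odds(data_acc)
    score[prompt_pred] += log_odds(prompt_acc)
    return max(score, key=score.get)


# Hypothetical validated accuracies: the revealed-data source is more reliable.
choice = reconcile(data_pred="A", prompt_pred="B", data_acc=0.80, prompt_acc=0.70)
print(choice)
```

With only two sources this rule always sides with the more accurate one in a conflict. The weighting matters more once additional sources or per-question accuracy estimates are available.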
Research directions
- Test these findings in richer, high-dimensional tasks (planning, scheduling, procurement) and field settings where preference stakes and dynamics differ.
- Investigate algorithmic methods for formal reconciliation of stated vs revealed signals (Bayesian models, ensemble approaches, provenance-aware prompting).
- Study longitudinal effects: whether individuals learn to write better prompts with feedback, and whether alignment improves as agents observe more revealed choices over time.
Caution: although the results favor using revealed data, deployment must address ethical and privacy concerns about continuous preference logging, consent, and potential manipulation.

Overall, the paper provides experimental evidence that revealed-preference information is a powerful channel for aligning AI to human principals, but achieving robust real-world gains requires better prompt-assisted interfaces, calibrated delegation support, and principled ways to reconcile conflicting information.

Assessment

Paper Type: quasi_experimental
Evidence Strength: medium. The within-experiment comparison of prediction accuracy gives fairly strong internal evidence that revealed-choice data are more informative than written prompts for this task, with direct measurement and out-of-sample testing; however, external validity is limited by a single, simplified decision domain (binary lotteries), an online convenience sample, unspecified sample size and model variants, possible measurement error in prompts, and lack of evidence on real-world consequential decisions or multiple AI architectures.
Methods Rigor: medium. The study uses an experimental design with direct behavioral measures and evaluates model predictions on held-out choice data, which is appropriate for the research question; nonetheless the paper (as summarized) leaves open details about randomization procedures, sample size and power, pre-registration, model training/tuning procedures, robustness across AI architectures, incentive structure (stakes), and potential demand effects. These issues could affect inference and replicability.
Sample: Online adult subjects who (a) wrote free-text instructions describing their preferences over risky lotteries (prompts) and (b) revealed preferences by answering a sequence of binary lottery choice questions (choice data); the dataset includes the written prompts, choice sequences (used as both training and held-out test data), and the AI agents' predictions; precise sample size, recruitment platform, and demographic composition not specified in the summary.
Themes: human_ai_collab, adoption
Identification: Compare predictive accuracy of AI agents given two different information sources (stated-preference prompts vs. revealed-preference choice data) using an online experimental dataset of subjects who both write instructions and make a sequence of binary lottery choices; predictions are evaluated on held-out choices and accuracy differences tested statistically; additional analyses examine subjects' selection of information source and AI alignment when the two sources conflict.
Generalizability
- Findings are limited to a laboratory-style task (binary lottery choices) that may not reflect complex real-world preference signals.
- Online convenience sample (platform and demographics unspecified) may not represent broader populations or decision contexts.
- Single decision domain (risk choices): results may not hold for moral preferences, multi-attribute decisions, or dynamic settings.
- Stakes and incentives in the experiment may be low relative to consequential real-world decisions, affecting behavior and prompt quality.
- Results depend on the specific AI model(s) and prompt-processing methods used; other architectures or fine-tuning approaches may yield different gaps.
- Language proficiency and ability to articulate preferences in writing may drive results, limiting applicability across languages and literacy levels.

Claims (5)

Claim	Category	Direction	Confidence	Outcome	Details
An AI agent given revealed-preference data predicts subjects' choices more accurately than an AI agent given stated-preference prompts.	Decision Quality	positive	high	prediction accuracy of AI agent for subjects' choices	0.48
The gap in predictive accuracy is driven by subjects' difficulty in translating their own preferences into written instructions.	Decision Quality	negative	medium	degree to which prompt quality explains predictive accuracy gap (i.e., translation fidelity from preference to written instruction)	0.29
When given a choice between which information source to give to an AI agent, a large portion of subjects fail to select the more informative one.	Decision Quality	negative	medium	choice by subjects of which information source to provide to the AI (rate of selecting the more informative source)	0.29
When predictions from the two sources conflict, the AI agent aligns more frequently with the prompt, despite its lower accuracy.	Decision Quality	negative	high	frequency of AI alignment with prompt versus revealed-preference prediction in conflict cases	0.48
The revealed preference approach is a powerful mechanism for communicating human preferences to AI agents, but its success depends on careful implementation.	Decision Quality	positive	medium	effectiveness of revealed-preference communication for aligning AI with human preferences	0.05