A simple revealed-preference model shows AI choices can be decomposed into human and machine preference components, and the degree of alignment can be recovered from choice-data; observing human choices makes recovery trivial, but even AI-only data suffices under generic menu variation.

A Revealed Preference Framework for AI Alignment

Elchin Suleymanov · March 29, 2026

arXiv · theoretical · evidence: n/a · relevance: 8/10

The paper proposes the Luce Alignment Model — modeling AI choices as a mixture of human and AI Luce choice rules — and shows the mixture weight (alignment) is generically identifiable from observed choice probabilities in both lab and field settings.

Human decision makers increasingly delegate choices to AI agents, raising a natural question: does the AI implement the human principal's preferences or pursue its own? To study this question using revealed preference techniques, I introduce the Luce Alignment Model, where the AI's choices are a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's. I show that the AI's alignment (similarity of human and AI preferences) can be generically identified in two settings: the laboratory setting, where both human and AI choices are observed, and the field setting, where only AI choices are observed.

Summary

Main Finding

The paper introduces the Luce Alignment Model (LAM): when an AI chooses on behalf of a human principal, its stochastic choices are a mixture of two Luce (multinomial logit) rules — one generated by the human utility u and one by the AI’s intrinsic utility v — mixed with weight α (compliance). Using revealed-preference techniques the author shows that:

In the laboratory setting (both the human’s stochastic choice rule ρH and the AI’s ρAI observed), the alignment (relative shape of u and v) and the compliance parameter α are generically identified whenever ρAI violates the Luce Independence of Irrelevant Alternatives (IIA) property. Closed-form expressions for α (and recovery of u and v up to scale) are derived using algebraic “instability” measures.
In the field setting (only ρAI observed), the underlying pair of utilities is generically identified up to a label swap (u ↔ v). Compliance α is only identified up to reflection about 1/2 (α versus 1−α) unless an exogenous assumption on the side of compliance is imposed. Identification requires sufficiently many alternatives (N ≥ 4) and generic conditions.
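The mixture structure and its link to IIA can be illustrated with a minimal sketch. The utilities, menus, and helper names (`luce`, `lam`, `odds`) below are hypothetical values chosen for illustration, not taken from the paper. The sketch checks whether the odds of `a` over `b` change when `c` is added to the menu: under a single Luce rule they should not, while under a strict mixture of two non-proportional utilities they generally do.

```python
import numpy as np

def luce(weights, menu):
    """Luce rule: choice probabilities on `menu` proportional to `weights`."""
    w = np.array([weights[x] for x in menu], dtype=float)
    return dict(zip(menu, w / w.sum()))

def lam(u, v, alpha, menu):
    """Mixture of two Luce rules: alpha * Luce(u) + (1 - alpha) * Luce(v)."""
    p_human, p_ai = luce(u, menu), luce(v, menu)
    return {x: alpha * p_human[x] + (1 - alpha) * p_ai[x] for x in menu}

def odds(p, x, y):
    """Odds of choosing x over y under choice probabilities p."""
    return p[x] / p[y]

# Hypothetical utilities; v is deliberately not proportional to u.
u = {"a": 1.0, "b": 2.0, "c": 3.0}
v = {"a": 3.0, "b": 1.0, "c": 1.0}
small, full = ("a", "b"), ("a", "b", "c")

# Partial compliance (alpha = 0.5): odds of a over b depend on the menu.
mixed_small, mixed_full = lam(u, v, 0.5, small), lam(u, v, 0.5, full)
iia_gap_mixed = odds(mixed_small, "a", "b") - odds(mixed_full, "a", "b")

# Perfect compliance (alpha = 1): a single Luce rule, so the odds are menu-independent.
comp_small, comp_full = lam(u, v, 1.0, small), lam(u, v, 1.0, full)
iia_gap_compliant = odds(comp_small, "a", "b") - odds(comp_full, "a", "b")

print("IIA gap, alpha=0.5:", iia_gap_mixed)
print("IIA gap, alpha=1.0:", iia_gap_compliant)
```

A nonzero gap in the first case and a (numerically) zero gap in the second is the pattern the IIA-based results rely on.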

Key Points

Model (LAM): ρAI(x,S) = α · u(x)/Σy∈S u(y) + (1−α) · v(x)/Σy∈S v(y). Here u is the human utility, v is the AI’s intrinsic utility, α∈[0,1] is compliance.
Special cases: α=1 (perfect compliance), v ∝ u (perfect alignment), α=0 (autonomous AI). Perfect compliance and perfect alignment produce observationally identical choice behavior, but they have different welfare/interpretation implications.
IIA role: The Luce (single-logit) rule implies IIA. Proposition: ρAI satisfies IIA iff α∈{0,1} or v ∝ u. Thus observed IIA violations in ρAI signal partial compliance (α∈(0,1)) together with misalignment (v not proportional to u).
Instability measures: The paper defines own-instability ∆, cross-instability Γ, and composite-instability Φ for tuples (x,y,S,T). These algebraic quantities capture deviations from IIA and are used to recover α and the two utilities in the lab setting. Composite instability detects differences in u and v.
Identification strategy (laboratory):
Recover u from ρH via standard Luce/IIA normalization (u recovered up to scale).
Use IIA violations and instability measures computed from (ρAI,ρH) to obtain α in closed form.
Given α and ρAI, recover v (up to scale).
Identification strategy (field): Observing only ρAI generically identifies the unordered pair {u,v} (each up to scale) but not which of the two is the human’s utility and which is the AI’s; likewise, α cannot be distinguished from 1−α without extra assumptions. The paper gives a constructive identification argument under genericity and sufficient menu richness (N ≥ 4 alternatives).
Relation to econometric literature: LAM = 2-point mixed multinomial logit (2-MNL / MMNL with binary support). The paper connects revealed-preference identification methods in abstract menu settings to MMNL identifiability results in product-characteristic settings, and offers a constructive alternative to prior identification results in the 2-MNL literature.
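The three laboratory steps can be sketched numerically. This is an illustration derived directly from the model equation, not the paper's own formulas: it does not use the ∆, Γ, Φ notation, and the paper's closed-form expression may be written differently. The idea is that ρAI − α·ρH is proportional to a Luce rule in v, so the ratio of these residuals for x and y must be the same on any two menus S and T; cross-multiplying and using the fact that ρH satisfies IIA leaves an equation that is linear in α. All function names and numerical values are hypothetical.

```python
import numpy as np

def luce(w, menu):
    """Luce probabilities on `menu` from positive weights `w` (indexable by alternative)."""
    w_s = np.array([w[i] for i in menu], dtype=float)
    return dict(zip(menu, w_s / w_s.sum()))

def recover_alpha(rho_ai, rho_h, x, y, S, T):
    """Solve the linear equation in alpha implied by the residual rho_ai - alpha*rho_h
    having menu-independent x/y odds. Requires rho_ai to violate IIA on (x, y, S, T)."""
    a, h = rho_ai, rho_h
    num = a[S][x] * a[T][y] - a[S][y] * a[T][x]
    den = (a[S][x] * h[T][y] + h[S][x] * a[T][y]
           - a[S][y] * h[T][x] - h[S][y] * a[T][x])
    return num / den

def recover_v(rho_ai, rho_h, alpha, menu):
    """Step 3: the residual on one menu is proportional to v; normalize to sum to 1."""
    resid = np.array([rho_ai[menu][z] - alpha * rho_h[menu][z] for z in menu])
    return dict(zip(menu, resid / resid.sum()))

# Synthetic ground truth (hypothetical values) used to generate "observed" data.
u_true, v_true, alpha_true = [1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 2.0, 3.0], 0.7
S, T = (0, 1), (0, 1, 2, 3)
rho_h = {m: luce(u_true, m) for m in (S, T)}
rho_ai = {m: {z: alpha_true * rho_h[m][z] + (1 - alpha_true) * luce(v_true, m)[z]
              for z in m} for m in (S, T)}

u_hat = rho_h[T]                      # Step 1: u up to scale, read off the full menu
alpha_hat = recover_alpha(rho_ai, rho_h, 0, 1, S, T)   # Step 2
v_hat = recover_v(rho_ai, rho_h, alpha_hat, T)         # Step 3

print("alpha_hat:", alpha_hat)
print("v_hat (normalized):", v_hat)
```

With exact choice probabilities the recovered values match the ground truth up to floating-point error; with finite-sample frequencies the denominator in `recover_alpha` can be close to zero, which this sketch does not guard against.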

Data & Methods

Data primitive: stochastic choice functions over menus S drawn from a finite set X (|X| = N ≥ 3). Two observational regimes:
- Laboratory: both ρH and ρAI observed (possibly ρH elicited or synthetically provided).
- Field: only ρAI observed.
Structural assumptions:
- Human (ρH) and autonomous-AI (ρA) follow Luce rules with positive utilities u,v : X → R++.
- AI mixes the two Luce rules with fixed, menu-independent weight α.
- Positivity (all observed choice probabilities > 0).
Analytical approach:
- Revealed-preference algebraic derivations exploiting the Luce representation and IIA implications.
- Definition and algebraic manipulation of instability measures (∆, Γ, Φ) to detect and quantify departures from IIA and to isolate α.
- Constructive identification proofs: closed-form expressions relating observables (ρAI, ρH) to the parameters; generic identification arguments (up to label symmetry) when only ρAI is observed.
- Axiomatic characterization: behavioral conditions on (ρAI,ρH) that are necessary and sufficient for consistency with LAM (paper develops these in the laboratory section).
The paper is primarily theoretical; no empirical application or estimation on field data is provided in the excerpt.

Implications for AI Economics

Practical auditability of alignment: The framework gives a testable, revealed-preference method to detect and quantify misalignment and partial compliance from menu-choice data. In practice, designers, auditors, or regulators can (i) elicit human stochastic choices in lab-style experiments or (ii) analyze AI choice logs for IIA violations and use the model’s constructive formulas to infer misalignment structure.
Value of experimental design / menu variation: Rich menu variation is essential. Detecting IIA violations requires comparing the same pair of alternatives across different menus, so some menus must contain at least 3 alternatives, and the field-setting identification result requires at least 4 alternatives. Observers should design experiments or logging that produce the necessary cross-menu comparisons to reveal IIA violations.
Distinction between compliance and alignment matters for welfare and governance: Observationally identical choice behavior can arise from perfect compliance (AI follows human preferences) or perfect alignment (AI’s intrinsic preferences match human). These have different policy responses: compliance failures call for governance and incentive/control fixes; misalignment calls for model retraining, objective redesign, or different delegation rules.
Limits in observational inference: When only AI choices are observed, analysts can typically recover the shape of the two utilities but cannot label which is the human’s versus the AI’s, and cannot distinguish α from 1−α without further assumptions. This creates an unavoidable ambiguity in field-only audits unless external information (e.g., known low/high compliance priors, audits that can elicit the human’s revealed preferences, or other behavioral markers) is available.
Complement to existing alignment approaches: LAM is complementary to training-time alignment, interpretability, and benchmark testing. It provides a behavioral, choice-based diagnostic that is model-agnostic (does not require access to AI internals) and focuses on the economic welfare-relevant object — which preferences are being implemented.
Empirical guidance: To operationalize the approach, practitioners should:
- Collect AI choice frequencies across many menus and, if possible, elicit or simulate the human’s stochastic choices across the same menus.
- Compute IIA tests and the paper’s instability measures; significant composite instability signals misalignment combined with partial compliance.
- Use the constructive identification formulas to estimate α and recover u,v (up to scale and, in the field case, up to label swap).
Caveats and next steps: The method relies on Luce/Logit structure for both agents and on positivity and menu richness. Real-world choice behavior may deviate from Luce or involve menu-dependent mixing weights, contextual effects, or dynamic/covert influences; extensions to richer noise structures, menu-dependent weights, or continuous coefficient distributions would be valuable next steps.
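As a rough illustration of the first two operational steps, the sketch below turns a choice log into empirical frequencies and computes a simple log-odds gap for one pair of alternatives across two menus. This is a descriptive diagnostic only: it is not the paper's instability measures and not a statistical test, and the log format, function names, and counts are all hypothetical. It also assumes positivity, so every alternative must be chosen at least once on each menu.

```python
import math
from collections import Counter

def choice_frequencies(log):
    """log: iterable of (menu, chosen) pairs. Returns {frozenset(menu): {alt: share}}."""
    counts = {}
    for menu, chosen in log:
        counts.setdefault(frozenset(menu), Counter())[chosen] += 1
    return {m: {x: c[x] / sum(c.values()) for x in m} for m, c in counts.items()}

def log_odds_gap(freqs, x, y, S, T):
    """log odds of x over y on menu S minus the same log odds on menu T.
    Equals zero (up to sampling noise) if choices satisfy IIA.
    Fails with a zero-division or math domain error if x or y was never chosen."""
    f_s, f_t = freqs[frozenset(S)], freqs[frozenset(T)]
    return math.log(f_s[x] / f_s[y]) - math.log(f_t[x] / f_t[y])

# Hypothetical AI choice log: 24 choices from {a, b}, 60 choices from {a, b, c}.
log = ([(("a", "b"), "a")] * 13 + [(("a", "b"), "b")] * 11
       + [(("a", "b", "c"), "a")] * 23 + [(("a", "b", "c"), "b")] * 16
       + [(("a", "b", "c"), "c")] * 21)

freqs = choice_frequencies(log)
gap = log_odds_gap(freqs, "a", "b", ("a", "b"), ("a", "b", "c"))
print("log-odds gap for (a, b):", gap)
```

In practice the gap would need a standard error (for example from a bootstrap over log entries) before it could be read as evidence of an IIA violation.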

Assessment

Paper Type: theoretical
Evidence Strength: n/a. The paper is theoretical and provides identification results and proofs rather than empirical estimates or experimental data.
Methods Rigor: high. The contribution is mathematical: it formulates a clear structural model (mixture of Luce rules) and provides formal identifiability proofs (including generic identifiability statements) under stated assumptions; caveats about the assumptions and required variation are discussed.
Sample: No empirical sample; the analysis is analytical. Two observational regimes are considered conceptually: a laboratory regime where both human and AI choice probability distributions over menus are observed, and a field regime where only AI choice probabilities over varying menus are observed. Identification arguments assume access to choice probabilities across sufficiently rich menus.
Themes: human_ai_collab, governance
Identification: Assumes the observed AI choice probabilities are a convex mixture of two Luce (logit-like) choice rules, one generated by the human principal's preferences and one by the AI's own preferences: p(a|S) = w * p_human(a|S) + (1-w) * p_ai(a|S). Identification of the mixture weight w and the underlying Luce parameters exploits variation in choice sets (menus) and the algebraic structure of the Luce rule to invert choice probabilities into underlying preference weights. In the laboratory setting identification is straightforward because both human and AI choice distributions are observed. In the field setting the paper proves generic identifiability of the component Luce parameters from AI-only choice probabilities, up to swapping the two components (so w is identified only up to w versus 1-w), under nondegeneracy and richness conditions on menu variation.
Generalizability:
- Relies on the Luce choice model (IIA property), which may not hold for many real-world human or AI decision processes.
- Assumes the AI's behavior is well-modeled as a static mixture of two Luce rules; real AI agents may be deterministic, adaptive, strategic, or trained on different objective families.
- Identification requires rich variation in available menus/choice sets and nondegeneracy conditions that may be hard to satisfy in field data.
- Ignores dynamics, learning, repeated interactions, and potential heterogeneity across humans or tasks.
- Does not address finite-sample estimation, measurement noise, or model misspecification in empirical applications.

Claims (5)

Claim	Category	Direction	Confidence	Outcome	Details
Human decision makers increasingly delegate choices to AI agents.	Adoption Rate	positive	high	frequency of delegation of choices to AI agents	0.06
The paper introduces the Luce Alignment Model, where the AI's choices are a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's.	AI Safety and Ethics	positive	high	model specification of AI choice behavior (mixture of Luce rules)	0.2
The AI's alignment (similarity of human and AI preferences) can be generically identified in the laboratory setting, where both human and AI choices are observed.	AI Safety and Ethics	positive	high	identifiability of AI alignment parameter from observed human and AI choices (laboratory setting)	0.2
The AI's alignment (similarity of human and AI preferences) can be generically identified in the field setting, where only AI choices are observed.	AI Safety and Ethics	positive	high	identifiability of AI alignment parameter from observed AI-only choices (field setting)	0.2
The paper studies principal-agent alignment using revealed preference techniques.	AI Safety and Ethics	positive	high	methodological approach (use of revealed preference techniques to study alignment)	0.2