A simple revealed-preference model shows that AI choices can be decomposed into human and machine preference components, and that the degree of alignment can be recovered from choice data. Observing human choices makes recovery trivial, but even AI-only data suffices under generic menu variation.
Human decision makers increasingly delegate choices to AI agents, raising a natural question: does the AI implement the human principal's preferences or pursue its own? To study this question using revealed preference techniques, I introduce the Luce Alignment Model, where the AI's choices are a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's. I show that the AI's alignment (similarity of human and AI preferences) can be generically identified in two settings: the laboratory setting, where both human and AI choices are observed, and the field setting, where only AI choices are observed.
Summary
Main Finding
The paper introduces the Luce Alignment Model (LAM): when an AI chooses on behalf of a human principal, its stochastic choices are a mixture of two Luce (multinomial logit) rules, one generated by the human utility u and one by the AI’s intrinsic utility v, with weight α (compliance) on the human rule. Using revealed-preference techniques, the author shows that:
- In the laboratory setting (both the human’s stochastic choice rule ρH and the AI’s ρAI observed), the alignment (relative shape of u and v) and the compliance parameter α are generically identified whenever ρAI violates the Luce Independence of Irrelevant Alternatives (IIA) property. Closed-form expressions for α (and recovery of u and v up to scale) are derived using algebraic “instability” measures.
- In the field setting (only ρAI observed), the underlying pair of utilities is generically identified up to a label swap (u ↔ v). Compliance α is only identified up to reflection about 1/2 (α versus 1−α) unless an exogenous assumption on the side of compliance is imposed. Identification requires sufficiently many alternatives (N ≥ 4) and generic conditions.
Key Points
- Model (LAM): ρAI(x,S) = α · u(x)/Σy∈S u(y) + (1−α) · v(x)/Σy∈S v(y). Here u is the human utility, v is the AI’s intrinsic utility, α∈[0,1] is compliance.
- Special cases: α=1 (perfect compliance), v ∝ u (perfect alignment), α=0 (autonomous AI). Perfect compliance and perfect alignment produce observationally identical choice behavior, but they have different welfare/interpretation implications.
- IIA role: The Luce (single-logit) rule implies IIA. Proposition: ρAI satisfies IIA iff α∈{0,1} or v ∝ u. Thus observed IIA violations in ρAI signal partial compliance (α∈(0,1)) together with misalignment (v not proportional to u).
- Instability measures: The paper defines own-instability ∆, cross-instability Γ, and composite-instability Φ for tuples (x,y,S,T). These algebraic quantities capture deviations from IIA and are used to recover α and the two utilities in the lab setting. Composite instability detects differences in u and v.
- Identification strategy (laboratory):
  - Recover u from ρH via standard Luce/IIA normalization (u recovered up to scale).
  - Use IIA violations and instability measures computed from (ρAI,ρH) to obtain α in closed form.
  - Given α and ρAI, recover v (up to scale).
- Identification strategy (field): Observing only ρAI generically identifies the unordered pair {u,v}, each up to scale, but not which of the two is the human’s utility and which is the AI’s. For the same reason, α cannot be distinguished from 1−α without extra assumptions. Constructive identification is provided under genericity and sufficient menu richness (≥4 alternatives).
- Relation to econometric literature: LAM = 2-point mixed multinomial logit (2-MNL / MMNL with binary support). The paper connects revealed-preference identification methods in abstract menu settings to MMNL identifiability results in product-characteristic settings, and offers a constructive alternative to prior identification results in the 2-MNL literature.
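The model formula and the IIA proposition above can be illustrated with a minimal Python sketch. The utilities, menus, and function names (`luce`, `lam`, `iia_gap`) are hypothetical and chosen only for illustration. The gap computed here is a plain difference of odds across two menus, not one of the paper's instability measures.

```python
def luce(w, menu):
    """Luce rule: each item's probability is proportional to its weight."""
    total = sum(w[x] for x in menu)
    return {x: w[x] / total for x in menu}

def lam(u, v, alpha, menu):
    """LAM choice probabilities: alpha * Luce(u) + (1 - alpha) * Luce(v)."""
    pu, pv = luce(u, menu), luce(v, menu)
    return {x: alpha * pu[x] + (1 - alpha) * pv[x] for x in menu}

def iia_gap(rho, x, y, S, T):
    """Odds of x over y in menu S minus the same odds in menu T (zero under IIA)."""
    pS, pT = rho(S), rho(T)
    return pS[x] / pS[y] - pT[x] / pT[y]

# Hypothetical utilities and menus, for illustration only.
u = {"a": 1.0, "b": 2.0, "c": 3.0}   # human utility
v = {"a": 3.0, "b": 1.0, "c": 1.0}   # AI's intrinsic utility
S, T = ("a", "b"), ("a", "b", "c")

# Partial compliance with misalignment: the odds of a over b change with the menu.
gap_mixed = iia_gap(lambda m: lam(u, v, 0.6, m), "a", "b", S, T)

# Perfect compliance (alpha = 1): IIA holds.
gap_compliant = iia_gap(lambda m: lam(u, v, 1.0, m), "a", "b", S, T)

# Perfect alignment (v proportional to u): IIA holds for any alpha.
v_aligned = {k: 2.0 * w for k, w in u.items()}
gap_aligned = iia_gap(lambda m: lam(u, v_aligned, 0.6, m), "a", "b", S, T)

print(gap_mixed, gap_compliant, gap_aligned)
```

Only the first gap differs from zero beyond rounding error, which matches the statement that IIA violations require both α∈(0,1) and v not proportional to u.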
Data & Methods
- Data primitive: stochastic choice functions over menus S drawn from a finite set X (|X| = N ≥ 3). Two observational regimes:
  - Laboratory: both ρH and ρAI observed (possibly ρH elicited or synthetically provided).
  - Field: only ρAI observed.
- Structural assumptions:
  - Human (ρH) and autonomous-AI (ρA) follow Luce rules with positive utilities u,v : X → R++.
  - AI mixes the two Luce rules with fixed, menu-independent weight α.
  - Positivity (all observed choice probabilities > 0).
- Analytical approach:
  - Revealed-preference algebraic derivations exploiting the Luce representation and IIA implications.
  - Definition and algebraic manipulation of instability measures (∆, Γ, Φ) to detect and quantify departures from IIA and to isolate α.
  - Constructive identification proofs: closed-form expressions relating observables (ρAI, ρH) to the parameters; generic identification arguments (up to label symmetry) when only ρAI is observed.
  - Axiomatic characterization: behavioral conditions on (ρAI,ρH) that are necessary and sufficient for consistency with LAM (paper develops these in the laboratory section).
- The paper is primarily theoretical; no empirical application or estimation on field data is provided in the excerpt.
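The laboratory regime can be sketched in Python under the structural assumptions listed above. The expression for α used here is not taken from the paper. It is derived directly from the mixture equation: the autonomous rule ρA = (ρAI − α·ρH)/(1−α) must itself satisfy IIA, and because ρH satisfies IIA the quadratic term in α cancels, leaving a linear equation. All names and numbers are hypothetical, and the "observed" rules are simulated from known parameters so the recovery can be checked.

```python
def luce(w, menu):
    """Luce rule: each item's probability is proportional to its weight."""
    total = sum(w[x] for x in menu)
    return {x: w[x] / total for x in menu}

def recover_alpha(rho_ai, rho_h, x, y, S, T):
    """Solve for alpha from one pair (x, y) observed in two menus S and T.

    Imposes IIA on the residual rule rho_ai - alpha * rho_h; since rho_h
    satisfies IIA, the resulting equation is linear in alpha.
    """
    aS, aT, hS, hT = rho_ai(S), rho_ai(T), rho_h(S), rho_h(T)
    num = aS[x] * aT[y] - aT[x] * aS[y]          # IIA violation of rho_ai
    den = (aS[x] * hT[y] + hS[x] * aT[y]
           - aT[x] * hS[y] - hT[x] * aS[y])      # cross term with rho_h
    if abs(den) < 1e-12:
        raise ValueError("no usable IIA violation on this (x, y, S, T)")
    return num / den

def recover_v(rho_ai, rho_h, alpha, X):
    """Residual Luce rule on the full menu X, i.e. v normalized to sum to 1.

    Requires alpha < 1; with alpha = 1 the AI's utility leaves no trace.
    """
    a, h = rho_ai(X), rho_h(X)
    return {x: (a[x] - alpha * h[x]) / (1 - alpha) for x in X}

# Hypothetical ground truth, used only to simulate the observables.
X = ("a", "b", "c")
u_true = {"a": 1.0, "b": 2.0, "c": 3.0}
v_true = {"a": 3.0, "b": 1.0, "c": 1.0}
alpha_true = 0.6

def rho_h(menu):
    return luce(u_true, menu)

def rho_ai(menu):
    pu, pv = luce(u_true, menu), luce(v_true, menu)
    return {x: alpha_true * pu[x] + (1 - alpha_true) * pv[x] for x in menu}

u_hat = rho_h(X)                                  # u up to scale
alpha_hat = recover_alpha(rho_ai, rho_h, "a", "b", ("a", "b"), X)
v_hat = recover_v(rho_ai, rho_h, alpha_hat, X)    # v up to scale
print(alpha_hat, u_hat, v_hat)
```

With noisy empirical frequencies in place of exact probabilities, the denominator can be close to zero, so in practice one would likely combine several (x, y, S, T) tuples rather than rely on a single one.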
Implications for AI Economics
- Practical auditability of alignment: The framework gives a testable, revealed-preference method to detect and quantify misalignment and partial compliance from menu-choice data. In practice, designers, auditors, or regulators can (i) elicit human stochastic choices in lab-style experiments or (ii) analyze AI choice logs for IIA violations and use the model’s constructive formulas to infer misalignment structure.
- Value of experimental design / menu variation: Rich menu variation is essential. IIA violations can only be observed with at least 3 alternatives, and the field-case identification result requires at least 4. Observers should design experiments or logging that produce the necessary cross-menu comparisons to reveal IIA violations.
- Distinction between compliance and alignment matters for welfare and governance: Observationally identical choice behavior can arise from perfect compliance (AI follows human preferences) or perfect alignment (AI’s intrinsic preferences match the human’s). The two cases call for different policy responses: compliance failures call for governance and incentive/control fixes; misalignment calls for model retraining, objective redesign, or different delegation rules.
- Limits in observational inference: When only AI choices are observed, analysts can typically recover the shape of the two utilities but cannot label which is the human’s versus the AI’s, and cannot distinguish α from 1−α without further assumptions. This creates an unavoidable ambiguity in field-only audits unless external information (e.g., known low/high compliance priors, audits that can elicit the human’s revealed preferences, or other behavioral markers) is available.
- Complement to existing alignment approaches: LAM is complementary to training-time alignment, interpretability, and benchmark testing. It provides a behavioral, choice-based diagnostic that is model-agnostic (does not require access to AI internals) and focuses on the economic welfare-relevant object — which preferences are being implemented.
- Empirical guidance: To operationalize the approach, practitioners should:
  - Collect AI choice frequencies across many menus and, if possible, elicit or simulate the human’s stochastic choices across the same menus.
  - Compute IIA tests and the paper’s instability measures; significant composite instability signals misalignment combined with partial compliance.
  - Use the constructive identification formulas to estimate α and recover u,v (up to scale and, in the field case, up to label swap).
- Caveats and next steps: The method relies on Luce/Logit structure for both agents and on positivity and menu richness. Real-world choice behavior may deviate from Luce or involve menu-dependent mixing weights, contextual effects, or dynamic/covert influences; extensions to richer noise structures, menu-dependent weights, or continuous coefficient distributions would be valuable next steps.
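The field-setting ambiguity described above can be checked numerically. The following Python sketch, with hypothetical utilities and compliance weight, shows that the parameter triples (u, v, α) and (v, u, 1−α) generate the same AI choice probabilities on every menu, so AI-only data cannot tell them apart.

```python
from itertools import combinations

def luce(w, menu):
    """Luce rule: each item's probability is proportional to its weight."""
    total = sum(w[x] for x in menu)
    return {x: w[x] / total for x in menu}

def lam(u, v, alpha, menu):
    """LAM choice probabilities: alpha * Luce(u) + (1 - alpha) * Luce(v)."""
    pu, pv = luce(u, menu), luce(v, menu)
    return {x: alpha * pu[x] + (1 - alpha) * pv[x] for x in menu}

# Hypothetical parameters on a four-alternative set, for illustration only.
X = ("a", "b", "c", "d")
u = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
v = {"a": 4.0, "b": 1.0, "c": 2.0, "d": 1.0}
alpha = 0.7

# All menus with at least two alternatives.
menus = [m for k in range(2, len(X) + 1) for m in combinations(X, k)]

# Largest discrepancy between the original and the label-swapped model.
max_diff = max(
    abs(lam(u, v, alpha, m)[x] - lam(v, u, 1 - alpha, m)[x])
    for m in menus
    for x in m
)
print(max_diff)  # zero up to floating-point rounding
```

This is why an external assumption, such as knowing that compliance is above or below 1/2, is needed before a field-only audit can say which recovered utility belongs to the human.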
Assessment
Claims (5)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Human decision makers increasingly delegate choices to AI agents. (Adoption Rate) | positive | high | frequency of delegation of choices to AI agents | 0.06 |
| The paper introduces the Luce Alignment Model, where the AI's choices are a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's. (AI Safety and Ethics) | positive | high | model specification of AI choice behavior (mixture of Luce rules) | 0.2 |
| The AI's alignment (similarity of human and AI preferences) can be generically identified in the laboratory setting, where both human and AI choices are observed. (AI Safety and Ethics) | positive | high | identifiability of AI alignment parameter from observed human and AI choices (laboratory setting) | 0.2 |
| The AI's alignment (similarity of human and AI preferences) can be generically identified in the field setting, where only AI choices are observed. (AI Safety and Ethics) | positive | high | identifiability of AI alignment parameter from observed AI-only choices (field setting) | 0.2 |
| The paper studies principal-agent alignment using revealed preference techniques. (AI Safety and Ethics) | positive | high | methodological approach (use of revealed preference techniques to study alignment) | 0.2 |