The Commonplace

Value alignment is a governance problem, not just an engineering puzzle: misalignment stems from objective-setting, information flows and whose interests count, so durable solutions require institutions and contested decision‑making rather than only model tweaks.

Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem
Travis LaCroix · April 22, 2026
arXiv · theoretical · evidence: n/a · relevance: 7/10
The value alignment problem is best understood as a governance challenge—arising from how objectives are set, information is distributed, and which principals are considered—so alignment requires institutional processes and trade-offs, not only technical fixes.

The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but whether it is aligned enough, for whom, and at what cost. Drawing on the principal-agent framework from economics, this paper reconceptualises misalignment as arising along three interacting axes: objectives, information, and principals. The three-axis framework provides a systematic way of diagnosing why misalignment arises in real-world systems and clarifies that alignment cannot be treated as a single technical property of models but an outcome shaped by how objectives are specified, how information is distributed, and whose interests count in practice. The core contribution of this paper is to show that the three-axis decomposition implies that alignment is fundamentally a problem of governance rather than engineering alone. From this perspective, alignment is inherently pluralistic and context-dependent, and resolving misalignment involves trade-offs among competing values. Because misalignment can occur along each axis -- and affect stakeholders differently -- the structural description shows that alignment cannot be "solved" through technical design alone, but must be managed through ongoing institutional processes that determine how objectives are set, how systems are evaluated, and how affected communities can contest or reshape those decisions.

Summary

Main Finding

Value alignment for AI is primarily a structural governance problem, not a single technical or purely normative one. Using the principal–agent framework, the paper shows misalignment arises along three interacting axes—objectives, information, and principals—and therefore cannot be “solved” by model design alone. Alignment is pluralistic and context-dependent: resolving it requires ongoing institutional processes that negotiate which interests count, how objectives are set and verified, and what trade-offs are acceptable.

Key Points

  • Structural definition: Value alignment problems are instances of a principal–agent problem in human→AI delegation that occur when (a) the AI agent’s objective function is mis‑specified, or (b) there are informational asymmetries between human principals and the AI agent.
  • Three orthogonal but interacting axes:
    • Objectives axis: proxy mis‑specification, reward hacking, perverse incentives and the usual “outer alignment” issues.
    • Information axis: opacity, hidden actions, non‑verifiability, distributional shift, and other informational asymmetries that prevent principals from observing or verifying agent behaviour.
    • Principals axis: plurality and conflict among stakeholders (developers, users, affected communities, regulators, shareholders) so that “aligned” for whom becomes contested.
  • Interaction effects: misalignment along one axis can exacerbate the others (e.g., biased proxies both misstate objectives and conceal harms from affected stakeholders).
  • Human→AI special case: unlike human–human agency, AI agents have no intrinsic values; misalignment often stems from imperfect specification of objectives and institutional gaps rather than from independently motivated agents.
  • Scaling hypothesis for value‑aligned AI: as model generality, deployment scope, and stakeholder diversity increase, informational asymmetries, value conflicts, and power imbalances systematically amplify, making alignment management harder.
  • Practical conclusion: technical methods (RLHF, interpretability, robustness, benchmarks) matter but must be embedded in governance arrangements—contractual design, accountability, participatory processes, dispute and remediation mechanisms—to manage ongoing misalignment and trade‑offs.
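
The interaction among the axes can be made concrete with a small toy sketch. This is not from the paper, which presents no formal model; the action names, principal names, and all numbers below are hypothetical. It assumes an agent that simply picks the action with the highest proxy score, and two principals who each value the actions differently.

```python
# Toy illustration only: every name and number here is an assumption,
# not something taken from the paper.
# Each action has a proxy score (what the agent optimises) and a utility
# for each of two illustrative principals.
ACTIONS = {
    "a": {"proxy": 0.9, "developer": 0.8, "community": 0.2},
    "b": {"proxy": 0.6, "developer": 0.6, "community": 0.7},
    "c": {"proxy": 0.3, "developer": 0.3, "community": 0.9},
}


def best_action(key: str) -> str:
    """Return the action with the highest value under `key`."""
    return max(ACTIONS, key=lambda name: ACTIONS[name][key])


def regret(principal: str, chosen: str) -> float:
    """Utility the principal loses relative to their own preferred action."""
    best = best_action(principal)
    return round(ACTIONS[best][principal] - ACTIONS[chosen][principal], 2)


# Objectives axis: the agent maximises the proxy, not anyone's utility.
agent_choice = best_action("proxy")

# Principals axis: the same choice is costless for one principal and
# costly for another, so "aligned" depends on whose utility is counted.
for principal in ("developer", "community"):
    print(principal, "prefers", best_action(principal),
          "| regret under agent's choice:", regret(principal, agent_choice))
```

In this sketch the proxy happens to track the developer's utility, so a principal who only sees the proxy score (the information axis) would have no signal that the community bears a loss. Changing the proxy column changes who bears the regret, which is the sense in which proxy choice is a governance decision rather than a purely technical one.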

Data & Methods

  • Paper type: conceptual/theoretical analysis and literature synthesis (FAccT 2026 conference paper; arXiv preprint).
  • Core method: apply and extend the principal–agent (agency) framework from economics to human–AI delegation.
    • Uses canonical agency concepts (moral hazard, adverse selection, non‑verifiability) to characterize the information axis.
    • Maps outer alignment / objective specification issues to the objectives axis.
    • Incorporates pluralistic and social‑choice insights to formalize the principals axis.
  • Draws on and situates existing technical approaches (e.g., RLHF, CIRL, incomplete contracting, game‑theoretic methods) but does not present new empirical data or algorithmic experiments.
  • Provides diagnostic examples (e.g., predictive policing) to illustrate interactions among axes and to motivate the scaling hypothesis.

Implications for AI Economics

  • Research agenda:
    • Model multi‑principal delegation problems formally (multiple stakeholders with heterogeneous utilities) rather than single‑principal setups.
    • Extend contract theory and mechanism design to AI agents: design incentives, verification protocols, and enforceable contracts that account for informational asymmetries and pluralistic principals.
    • Incorporate externalities, public goods, and distributional impacts of misalignment into welfare analyses.
    • Study dynamics under the scaling hypothesis: how increases in model capability and scope change market equilibria, investment in safety, and systemic risk.
  • Market and firm incentives:
    • Aligning commercial incentives (shareholder returns, product market competition) with broader stakeholder welfare requires institutional fixes (regulation, standards, liability rules), not only technical fixes.
    • Competitive pressures can create a “race” that under‑invests in governance; economists should model these strategic interactions and potential market failures.
  • Information and verification markets:
    • There is economic value in verification, auditing, and monitoring services (third‑party audits, red teaming, certification); market design and regulation can promote supply of credible information.
    • Information provision (transparency, provenance, model cards) affects contracting costs and the feasible set of incentive schemes.
  • Policy and regulation:
    • Design policy instruments oriented to governance: mandatory disclosure, auditability standards, stakeholder representation in procurement and public contracts, liability rules to internalize harms.
    • Use incomplete contracting insights: require minimum verification rights, contingency clauses, and remediation procedures when proxies fail.
  • Distributional and welfare trade‑offs:
    • “Aligned enough” is inherently political and involves trade‑offs (efficiency vs equity, short‑run performance vs long‑term safety). Economists should quantify these trade‑offs and model how institutional rules allocate alignment costs across agents and stakeholders.
  • Institutional innovations:
    • Develop mechanisms for participatory decision‑making and grievance redress that change who counts as a principal in practice (e.g., community oversight, stakeholder co‑design).
    • Consider insurance, indemnity, and funding instruments to manage residual alignment risk and externalities.
  • Practical modelling implications:
    • When evaluating interventions (technical or policy), include informational frictions and multiple principals in counterfactuals.
    • Cost–benefit and welfare assessments should include governance costs of monitoring, contestation, and ongoing negotiation.
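
As a rough illustration of the last two points, the sketch below compares two hypothetical interventions under different weightings of stakeholders, with governance costs subtracted from the total. The intervention names, stakeholder groups, weights, and figures are all invented for illustration and do not come from the paper.

```python
# Toy welfare accounting: all interventions, stakeholders, and numbers
# are hypothetical assumptions made for illustration.
INTERVENTIONS = {
    "technical_only": {
        "benefit": {"firm": 5.0, "users": 3.0, "affected": 0.0},
        "governance_cost": 0.0,
    },
    "audit_plus_redress": {
        "benefit": {"firm": 4.0, "users": 3.0, "affected": 4.0},
        # Cost of monitoring, contestation, and ongoing negotiation.
        "governance_cost": 2.0,
    },
}


def net_welfare(name: str, weights: dict) -> float:
    """Weighted sum of stakeholder benefits minus governance cost."""
    option = INTERVENTIONS[name]
    gross = sum(weights[s] * b for s, b in option["benefit"].items())
    return gross - option["governance_cost"]


def ranking(weights: dict) -> list:
    """Interventions ordered from highest to lowest net welfare."""
    return sorted(INTERVENTIONS, key=lambda n: net_welfare(n, weights), reverse=True)


# Two ways of deciding who counts as a principal.
firm_only = {"firm": 1.0, "users": 0.0, "affected": 0.0}
equal = {"firm": 1.0, "users": 1.0, "affected": 1.0}

print("firm-only weights:", ranking(firm_only))
print("equal weights:    ", ranking(equal))
```

With these made-up numbers the ranking flips once affected parties are given weight, even after the governance cost is charged. The point of the sketch is only that the weights are an input that institutions set; nothing in the arithmetic determines them.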

Taken together, the paper suggests AI economists should shift more attention from treating alignment as a property of models to analysing the institutional and market mechanisms that determine whose values get encoded, how objective proxies are chosen and monitored, and how information and power imbalances are corrected.

Assessment

  • Paper Type: theoretical
  • Evidence Strength: n/a. This is a conceptual/theoretical paper that offers a diagnostic and normative reframing rather than empirical tests; it provides no causal estimates or statistical evidence to evaluate real-world effects.
  • Methods Rigor: medium. The paper applies a well-established principal–agent framework and gives a clear three-axis decomposition (objectives, information, principals), showing careful theoretical reasoning; however, it lacks formal models, quantitative analysis, or empirical validation that would raise rigor to high.
  • Sample: No empirical sample or dataset; the paper is a conceptual analysis drawing on economic principal–agent theory, qualitative examples from AI systems and governance debates, and literature from ethics and policy.
  • Themes: governance, org_design
  • Generalizability:
    • No empirical testing; implications are unvalidated in specific industries, firms, or countries.
    • Prescriptive claims depend on institutional context and so may not transfer across regulatory regimes or organizational forms.
    • Does not model technical ML constraints or trade-offs quantitatively, limiting operationalizability.
    • Relies on normative assumptions about whose values should count, so recommendations may vary by stakeholder perspective.
    • High-level framework may miss sector-specific mechanisms (e.g., manufacturing vs. online platforms).

Claims (7)

  • The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. [Governance And Regulation]
    • Direction: negative · Confidence: high · Outcome: framing_of_problem_in_literature · Details: 0.12
  • The alignment problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but whether it is aligned enough, for whom, and at what cost. [Governance And Regulation]
    • Direction: positive · Confidence: high · Outcome: interpretation_of_alignment_problem · Details: 0.02
  • Misalignment can be reconceptualised as arising along three interacting axes: objectives, information, and principals (drawing on the principal–agent framework). [Governance And Regulation]
    • Direction: positive · Confidence: high · Outcome: sources_of_misalignment · Details: 0.02
  • The three-axis framework provides a systematic way of diagnosing why misalignment arises in real-world systems and clarifies that alignment cannot be treated as a single technical property of models but an outcome shaped by how objectives are specified, how information is distributed, and whose interests count in practice. [Governance And Regulation]
    • Direction: positive · Confidence: high · Outcome: diagnostic_power_of_framework · Details: 0.02
  • The three-axis decomposition implies that alignment is fundamentally a problem of governance rather than engineering alone. [Governance And Regulation]
    • Direction: positive · Confidence: high · Outcome: primary_domain_responsible_for_alignment · Details: 0.02
  • Alignment is inherently pluralistic and context-dependent, and resolving misalignment involves trade-offs among competing values. [Governance And Regulation]
    • Direction: positive · Confidence: high · Outcome: nature_of_alignment_solutions · Details: 0.02
  • Because misalignment can occur along each axis -- and affect stakeholders differently -- alignment cannot be 'solved' through technical design alone, but must be managed through ongoing institutional processes that determine how objectives are set, how systems are evaluated, and how affected communities can contest or reshape those decisions. [Governance And Regulation]
    • Direction: positive · Confidence: high · Outcome: feasibility_of_technical_only_solutions · Details: 0.02
