An AI ensemble that alternates between a trust-building specialist and a performance-focused specialist improves human decision accuracy more than single-model assistants; a simple, provably near-optimal routing rule decides which specialist to use based on context.
In human-AI decision making, designing AI to complement human expertise is a natural strategy for enhancing collaboration, yet it often comes at the cost of decreased AI performance in areas of human strength. This can inadvertently erode human trust and cause humans to ignore AI advice precisely when it is most needed. Conversely, an aligned AI fosters trust yet risks reinforcing suboptimal human behavior and lowering human-AI team performance. In this paper, we first identify this fundamental tension between performance-boosting (i.e., complementarity) and trust-building (i.e., alignment) as an inherent limitation of the traditional approach of training a single AI model to assist human decision making. To overcome it, we introduce a novel human-centered adaptive AI ensemble that strategically toggles between two specialist AI models (an aligned model and a complementary model) based on contextual cues, using a simple yet provably near-optimal Rational Routing Shortcut mechanism. Theoretical analyses elucidate why the adaptive AI ensemble is effective and when it yields the greatest benefits. Moreover, experiments on both simulated and real-world data show that humans assisted by the adaptive AI ensemble achieve significantly higher decision performance than humans assisted by single AI models trained to optimize either independent AI performance or human-AI team performance.
Summary
Main Finding
Training a single AI to both boost human performance (complementarity) and build human trust (alignment) creates an unavoidable trade-off that can reduce overall team performance. An adaptive AI ensemble that switches between two specialists — an aligned model (trust-building) and a complementary model (performance-boosting) — using a simple, provably near-optimal Rational Routing Shortcut, substantially outperforms any single-model approach. Theory explains when the ensemble is most effective, and experiments on simulated and real-world tasks show significant human-AI team performance gains.
Key Points
- Fundamental tension:
- Complementary models sacrifice accuracy in areas of human strength in order to specialize in corrective suggestions where humans are weak; the resulting visible errors can reduce human trust and lead humans to ignore advice.
- Aligned models foster trust but may reinforce human mistakes, producing suboptimal team outcomes.
- Solution: human-centered adaptive ensemble
- Two specialist models: aligned (trust-preserving) and complementary (performance-correcting).
- A routing mechanism (Rational Routing Shortcut) selects which specialist to present based on contextual cues about the task and human behavior.
- The routing rule is simple to implement and provably near-optimal under realistic assumptions.
- Theoretical contributions:
- Formalization of the complementarity vs. alignment trade-off.
- Guarantees showing when and why routing between specialists improves team performance.
- Characterization of conditions (e.g., heterogeneity in human competence, informativeness of routing cues) under which the ensemble yields the largest gains.
- Empirical results:
- Simulated experiments validate theoretical predictions across varied human-skill distributions and noise levels.
- Real-world human-in-the-loop experiments confirm that the adaptive ensemble raises human decision quality relative to:
- standalone AI optimized for accuracy,
- AI optimized for human-AI team outcomes,
- single-model alignment or complementarity strategies.
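The paper's exact Rational Routing Shortcut is not reproduced here, but its spirit can be sketched as a thresholded rule on contextual cues. In the sketch below, `Context`, `est_human_competence`, `task_difficulty`, and the error-probability proxy are all illustrative assumptions, not the authors' definitions.

```python
from dataclasses import dataclass

@dataclass
class Context:
    """Observable cues about the task and the human decision maker."""
    est_human_competence: float  # proxy in [0, 1], e.g. from past accuracy
    task_difficulty: float       # normalized difficulty in [0, 1]

def route(ctx: Context, threshold: float = 0.5) -> str:
    """Hypothetical routing rule in the spirit of the Rational Routing
    Shortcut: defer to the trust-preserving aligned specialist where the
    human is likely competent, and to the corrective complementary
    specialist where the human is likely to err."""
    # Illustrative proxy for the chance the human errs on this task.
    p_error = (1.0 - ctx.est_human_competence) * ctx.task_difficulty
    return "complementary" if p_error > threshold else "aligned"
```

For example, a skilled user on an easy task (`Context(0.9, 0.2)`) would be routed to the aligned specialist, while a weak user on a hard task (`Context(0.2, 0.9)`) would receive the complementary specialist's corrective advice.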
Data & Methods
- Model setup:
- Formal model of human decision-making with probabilistic behavior that depends on trust and perceived AI reliability.
- Two AI specialists modeled: one optimized for maximizing independent AI accuracy when corrective intervention is needed (complementary), the other optimized to preserve or increase human trust and align with typical human choices (aligned).
- Routing mechanism:
- Rational Routing Shortcut: uses observable contextual features (task characteristics and proxies for human competence/trust) to pick the specialist.
- Analytical proof of near-optimality relative to an omniscient selector under mild informational assumptions.
- Theoretical analysis:
- Comparative statics showing how gains scale with human heterogeneity, cue informativeness, and the severity of the complementarity vs. alignment trade-off.
- Bounds on expected team performance improvements and sensitivity to routing errors.
- Experiments:
- Simulation studies varying human-skill distributions, signal-to-noise ratios, and miscalibration.
- Human-subject or real-world task datasets (details not reproduced here) where participants receive AI advice under different assistance regimes; measured outcomes include accuracy, frequency of following advice, and net team performance.
- Benchmarks: solo AI, aligned-only, complementary-only, single AI optimized for team outcomes, and the adaptive ensemble with Rational Routing.
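The paper's experimental details are not reproduced here, but the intuition for why routing can beat either specialist alone can be shown in a toy simulation. All parameters below (skill distribution, specialist accuracies, cue noise, and the simplifying assumption that the human adopts the shown specialist's recommendation) are our own illustrative assumptions.

```python
import random

random.seed(0)

def simulate(regime: str, n: int = 20000) -> float:
    """Toy simulation (illustrative assumptions, not the paper's setup):
    per-task human accuracy is drawn from a heterogeneous population; the
    'aligned' specialist mirrors the human, the 'complementary' specialist
    is strong where the human is weak but weak where the human is strong,
    and the 'adaptive' regime routes on a noisy competence cue."""
    correct = 0
    for _ in range(n):
        p_human = random.uniform(0.3, 0.95)        # heterogeneous skill
        p_aligned = p_human                        # mirrors the human
        p_comp = 0.9 if p_human < 0.6 else 0.55    # corrective specialist
        cue = p_human + random.gauss(0, 0.1)       # noisy routing cue
        if regime == "aligned":
            p = p_aligned
        elif regime == "complementary":
            p = p_comp
        else:  # adaptive: use the cue to pick the better specialist
            p = p_comp if cue < 0.6 else p_aligned
        correct += random.random() < p
    return correct / n

for regime in ("aligned", "complementary", "adaptive"):
    print(regime, round(simulate(regime), 3))
```

Under these assumptions the adaptive regime dominates both single-specialist regimes, mirroring the paper's qualitative finding; the gap shrinks as cue noise grows, consistent with the sensitivity-to-routing-errors analysis above.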
Implications for AI Economics
- Measurement and procurement:
- Evaluations of AI should prioritize human-AI team outcomes over standalone AI accuracy. Adaptive ensembles can deliver higher social value than single best-performing models.
- Procurement contracts and performance metrics should reward systems that optimize team-level productivity and account for trust dynamics.
- Product design and market differentiation:
- Firms can capture value by offering adaptive decision-support products that tailor behavior to users’ competence and context, increasing adoption and sustained use.
- The ability to route effectively (i.e., obtain informative contextual cues) is an economically valuable capability — a potential competitive moat.
- Labor and task allocation:
- Adaptive ensembles can reshape task division by enabling humans to rely safely on AI in contexts where trust can be fostered, while letting AI correct humans where needed — altering where human labor adds comparative advantage.
- Potential to raise effective worker productivity, but also to shift skill requirements toward tasks where human judgment is most valuable.
- Investment and R&D trade-offs:
- Cost-benefit analysis should compare investing in a single “universal” model vs. building specialist models plus routing infrastructure. Gains are largest when human skill is heterogeneous and routing cues are informative.
- There is option value in modular systems: easier updates of specialists or routing rules as human behavior or environments change.
- Policy and welfare:
- Regulators should consider team-level outcomes and trust effects when assessing safety and efficacy; mis-specified alignment that simply increases trust can have adverse welfare effects by locking in suboptimal human behavior.
- Transparency about when and why the system switches modes may be important for accountability and avoiding perverse incentives.
- Empirical priorities for economists:
- Estimate distributions of human competence across tasks and contexts.
- Quantify the value of routing signals (how informative they must be to justify ensemble costs).
- Measure long-run effects on skill acquisition, labor demand, and welfare when adaptive ensembles are deployed at scale.
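One of the empirical questions above, how informative a routing signal must be to justify ensemble costs, can be illustrated with a stylized two-regime calculation. Every number below (regime probability, specialist accuracies, cue reliability `q`) is an assumption chosen for illustration, not an estimate from the paper.

```python
def ensemble_value(q: float, h: float = 0.5,
                   acc_aligned=(0.85, 0.45), acc_comp=(0.60, 0.80)) -> float:
    """Stylized model: the human is in a 'strong' regime with prob h, a
    'weak' regime otherwise; a routing cue identifies the regime with
    probability q. acc_* = (accuracy in strong regime, in weak regime).
    Route to the aligned specialist when the cue says 'strong', else to
    the complementary specialist; with prob 1-q the cue flips."""
    a_s, a_w = acc_aligned
    c_s, c_w = acc_comp
    strong = q * a_s + (1 - q) * c_s   # correct vs. flipped routing
    weak = q * c_w + (1 - q) * a_w
    return h * strong + (1 - h) * weak

# Best single model under the same assumptions.
best_single = max(0.5 * 0.85 + 0.5 * 0.45, 0.5 * 0.60 + 0.5 * 0.80)

for q in (0.5, 0.7, 0.9, 1.0):
    print(q, round(ensemble_value(q) - best_single, 3))
```

In this stylized setting an uninformative cue (q = 0.5) leaves the ensemble below the best single model, and the gain turns positive only once q clears a break-even threshold, which is the kind of quantity the empirical agenda above would estimate.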
Takeaway: Designing AI assistance as an adaptive ensemble that strategically trades off trust and corrective power can yield substantial team-level welfare gains. For economists, this reframes evaluation, procurement, and policy toward systems and metrics that capture human-AI interaction dynamics rather than standalone model performance.
Assessment
Claims (6)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| There is a fundamental tension between designing AI for complementarity (performance-boosting) and designing AI for alignment (trust-building) when training a single AI model to assist human decision making. (Decision Quality) | mixed | medium | trade-off between human-AI team performance (complementarity) and human trust/alignment | 0.07 |
| Training AI to complement human strengths can decrease AI performance in areas where humans are strong, which can erode human trust and cause humans to ignore AI advice when it is most needed. (Decision Quality) | negative | medium | AI performance on tasks where humans are strong; human trust and reliance on AI | 0.07 |
| Aligned AI (trained to foster trust) can increase human trust but risks reinforcing suboptimal human behavior and lowering human-AI team performance. (Decision Quality) | negative | medium | human trust and human-AI team performance | 0.07 |
| An adaptive AI ensemble that toggles between two specialist models (an aligned model and a complementary model) using a Rational Routing Shortcut mechanism overcomes the complementarity–alignment limitation of single-model approaches. (Decision Quality) | positive | medium | contextual model selection/routing and resulting human-AI team performance | 0.07 |
| The Rational Routing Shortcut mechanism is provably near-optimal for routing between the aligned and complementary specialist models. (Decision Quality) | positive | medium-high | routing optimality (theoretical performance bound) and implied ensemble performance | 0.01 |
| Experiments on simulated and real-world data show that humans assisted by the adaptive AI ensemble achieve significantly higher performance than humans assisted by single AI models trained either for independent AI performance or for human-AI team performance. (Decision Quality) | positive | medium | human decision-making performance / human-AI team performance (improvement when using the adaptive ensemble) | 0.07 |