Personalized AI explanations can boost human–AI team performance, but only when tasks let humans and models play complementary roles; in controlled experiments personalization delivered complementarity in an engineered geography task but not in a standard sentiment-analysis task.
Can personalized AI explanations improve human-AI team performance? Motivated by research on individual differences in cognitive science, we examine whether user characteristics influence the effectiveness of AI explanations in AI-assisted decision making. We study this question through preregistered experiments in two tasks. In a sentiment-analysis task, we find that individual differences in user characteristics shape how users respond to explanations, but these differences do not lead to human-AI complementarity, where the joint performance of humans and AI exceeds that of either alone. Motivated by this limitation, we design a new geography-guessing task in which humans and AI possess complementary strengths. In this setting, we again observe that user characteristics interact with explanation types, and now these effects also contribute to complementarity. These results suggest that tailoring explanations to individual users can improve performance and provide valuable insights into how personalization may enhance human-AI collaboration.
Summary
Main Finding
Personalizing AI explanations to individual users can improve human-AI team performance, but whether personalization yields true human-AI complementarity (the team outperforming either human or AI alone) depends on the task. In two preregistered experiments, user characteristics shaped how people responded to explanations in both a sentiment-analysis task (no observed complementarity) and a deliberately complementary geography-guessing task (personalization contributed to complementarity).
Key Points
- User characteristics drawn from cognitive-science research on individual differences systematically influence how people use and respond to AI explanations.
- Explanations change human behavior in both tasks, but changed behavior leads to superior joint performance only when the task structure affords complementary strengths between humans and AI.
- In the sentiment-analysis task, explanation-induced behavioral differences did not produce complementarity: the human-AI team (humans working with explanations) did not outperform the better of human-alone or AI-alone.
- In the geography-guessing task, which was designed so humans and AI had complementary error profiles, interactions between user characteristics and explanation types did produce complementarity.
- The results imply that explanation personalization can be valuable, but its payoff depends on (a) identifying the right user features to target and (b) selecting tasks or system configurations where human and AI capabilities truly complement each other.
Data & Methods
- Design: Two preregistered randomized experiments comparing explanation types, measuring individual user characteristics, and evaluating human-AI team outcomes.
- Tasks:
- Sentiment-analysis task: a standard text-judgment task used to assess how explanations affect reliance and accuracy.
- Geography-guessing task: engineered so humans and AI have different, complementary strengths, allowing potential gains from coordination.
- Treatment: Participants were randomly assigned to different explanation conditions (vs. baseline/no explanation) while researchers measured pre-task individual-difference variables.
- Outcomes: Performance metrics for humans with explanations, AI alone, and human-AI teams; statistical tests of interaction effects between explanation type and user characteristics; preregistered tests for complementarity (whether team accuracy > max(human-only, AI-only)).
- Analysis: Focus on heterogeneity via interaction terms and subgroup analyses to identify when explanations help or harm team performance. (All analyses were prespecified in the preregistration.)
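The preregistered complementarity criterion above (team accuracy exceeding the better of human-alone or AI-alone) can be sketched in a few lines. This is an illustrative example only, not the paper's analysis code; the accuracy figures and predictions below are hypothetical.

```python
# Illustrative sketch of the preregistered complementarity criterion:
# team accuracy > max(human-only accuracy, AI-only accuracy).
# All data below are hypothetical, for demonstration only.

def accuracy(preds, truth):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def is_complementary(team_acc, human_acc, ai_acc):
    """Complementarity holds only if the team beats the best solo actor."""
    return team_acc > max(human_acc, ai_acc)

# Hypothetical labels and predictions for a 10-item binary task.
truth = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
human = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # 7/10 correct
ai    = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # 8/10 correct
team  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # 9/10 correct

h_acc = accuracy(human, truth)
a_acc = accuracy(ai, truth)
t_acc = accuracy(team, truth)
print(is_complementary(t_acc, h_acc, a_acc))  # True: 0.9 > max(0.7, 0.8)
```

Note that merely improving on the human-alone baseline is not enough under this criterion; the team must also beat the AI alone, which is why explanation-induced behavior change in the sentiment task did not count as complementarity.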
Implications for AI Economics
- Value of personalization: Tailoring explanations to user heterogeneity can raise the productivity of human-AI teams, but returns are task-dependent. Investment in personalization is most valuable when tasks create complementarities between human judgment and model strengths.
- Design and deployment: Firms should assess whether tasks exhibit complementary error patterns before investing in per-user explanation systems; if so, prioritize identifying predictive user features and tailoring explanation style or content.
- Labor and task allocation: Personalized explanations could raise effective human capital by enabling workers to better leverage AI on tasks where complementarity exists, potentially changing optimal task assignment and training strategies.
- Measurement and evaluation: Economic evaluations of AI systems should measure team-level outcomes (not just model accuracy) and include heterogeneity analyses to capture gains from personalization.
- Policy and welfare: Policymakers and organizations should support research into which user traits predict benefit from explanation personalization and consider privacy trade-offs when collecting such traits; maximizing social surplus requires targeting personalization where it yields genuine complementarity rather than only increased reliance.
- Research priorities: Quantify cost–benefit of personalization (development, data collection, privacy compliance) versus gains from complementarity; identify which user characteristics are most actionable; extend to higher-stakes and field settings for external validity.
Assessment
Claims (6)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| We conducted preregistered experiments in two tasks (a sentiment-analysis task and a geography-guessing task) to study whether user characteristics influence the effectiveness of AI explanations. (Research Productivity) | null_result | high | existence and measurement of experimental manipulation (implementation of preregistered studies across two tasks) | 0.6 |
| In the sentiment-analysis task, individual differences in user characteristics shape how users respond to AI explanations. (Decision Quality) | mixed | medium | users' responses to AI explanations (behavioral measures in the sentiment-analysis task, e.g., decision changes, agreement with AI, or task performance as affected by explanations) | 0.36 |
| In the sentiment-analysis task, those individual differences do not produce human–AI complementarity: the joint performance of humans and AI did not exceed that of either alone. (Team Performance) | null_result | medium | human–AI joint performance compared to human-alone and AI-alone performance (e.g., accuracy or task success in sentiment classification) | 0.36 |
| We designed a geography-guessing task in which humans and AI possess complementary strengths. (Task Allocation) | positive | medium | complementarity potential as implied by task design (differences in human vs. AI strengths on geography-guessing items) | 0.36 |
| In the geography-guessing task, user characteristics interact with explanation types, and these interactions contribute to human–AI complementarity (the joint performance exceeds either alone). (Team Performance) | positive | medium | human–AI joint performance (e.g., accuracy or combined decision quality) and interaction effects between user characteristics and explanation type on that performance | 0.36 |
| Tailoring AI explanations to individual users can improve human–AI team performance and provides insights into how personalization may enhance human-AI collaboration. (Team Performance) | positive | medium | human–AI team performance (improvements in task outcomes when explanations are personalized to user characteristics) | 0.36 |