Personalized AI explanations can boost human–AI team performance, but only when tasks let humans and models play complementary roles; in controlled experiments personalization delivered complementarity in an engineered geography task but not in a standard sentiment-analysis task.
Can personalized AI explanations improve human-AI team performance? Motivated by research on individual differences in cognitive science, we examine whether user characteristics influence the effectiveness of AI explanations in AI-assisted decision making. We study this question through preregistered experiments in two tasks. In a sentiment-analysis task, we find that individual differences in user characteristics shape how users respond to explanations, but these differences do not lead to human-AI complementarity, where the joint performance of humans and AI exceeds that of either alone. Motivated by this limitation, we design a new geography-guessing task in which humans and AI possess complementary strengths. In this setting, we again observe that user characteristics interact with explanation types, and now these effects also contribute to complementarity. These results suggest that tailoring explanations to individual users can improve performance and provide valuable insights into how personalization may enhance human-AI collaboration.
Summary
Main Finding
Personalizing AI explanations to individual users can improve human-AI team performance, but whether personalization yields true human-AI complementarity (the team outperforming either human or AI alone) depends on the task. In two preregistered experiments, user characteristics shaped how people responded to explanations in both a sentiment-analysis task (no observed complementarity) and a deliberately complementary geography-guessing task (personalization contributed to complementarity).
Key Points
- User characteristics drawn from cognitive-science research on individual differences systematically influence how people use and respond to AI explanations.
- Explanations change human behavior in both tasks, but changed behavior leads to superior joint performance only when the task structure affords complementary strengths between humans and AI.
- In the sentiment-analysis task, explanation-induced behavioral differences did not produce complementarity: the human-AI team (humans working with explanations) did not outperform the better of human-alone or AI-alone.
- In the geography-guessing task, which was designed so humans and AI had complementary error profiles, interactions between user characteristics and explanation types did produce complementarity.
- The results imply that explanation personalization can be valuable, but its payoff depends on (a) identifying the right user features to target and (b) selecting tasks or system configurations where human and AI capabilities truly complement each other.
Data & Methods
- Design: Two preregistered randomized experiments comparing explanation types, measuring individual user characteristics, and evaluating human-AI team outcomes.
- Tasks:
- Sentiment-analysis task: a standard text-judgment task used to assess how explanations affect reliance and accuracy.
- Geography-guessing task: engineered so humans and AI have different, complementary strengths, allowing potential gains from coordination.
- Treatment: Participants were randomly assigned to different explanation conditions (vs. baseline/no explanation) while researchers measured pre-task individual-difference variables.
- Outcomes: Performance metrics for humans with explanations, AI alone, and human-AI teams; statistical tests of interaction effects between explanation type and user characteristics; preregistered tests for complementarity (whether team accuracy > max(human-only, AI-only)).
- Analysis: Focus on heterogeneity via interaction terms and subgroup analyses to identify when explanations help or harm team performance. (All analyses were prespecified in the preregistration.)
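The preregistered complementarity criterion above (team accuracy exceeding the better of human-alone or AI-alone) can be sketched in a few lines. This is an illustrative example only, not the paper's analysis code; the accuracy figures and predictions below are hypothetical.

```python
# Illustrative sketch of the preregistered complementarity criterion:
# team accuracy > max(human-only accuracy, AI-only accuracy).
# All data below are hypothetical, for demonstration only.

def accuracy(preds, truth):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def is_complementary(team_acc, human_acc, ai_acc):
    """Complementarity holds only if the team beats the best solo actor."""
    return team_acc > max(human_acc, ai_acc)

# Hypothetical labels and predictions for a 10-item binary task.
truth = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
human = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # 7/10 correct
ai    = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # 8/10 correct
team  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # 9/10 correct

h_acc = accuracy(human, truth)
a_acc = accuracy(ai, truth)
t_acc = accuracy(team, truth)
print(is_complementary(t_acc, h_acc, a_acc))  # True: 0.9 > max(0.7, 0.8)
```

Note that merely improving on the human-alone baseline is not enough under this criterion; the team must also beat the AI alone, which is why explanation-induced behavior change in the sentiment task did not count as complementarity.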
Implications for AI Economics
- Value of personalization: Tailoring explanations to user heterogeneity can raise the productivity of human-AI teams, but returns are task-dependent. Investment in personalization is most valuable when tasks create complementarities between human judgment and model strengths.
- Design and deployment: Firms should assess whether tasks exhibit complementary error patterns before investing in per-user explanation systems; if so, prioritize identifying predictive user features and tailoring explanation style or content.
- Labor and task allocation: Personalized explanations could raise effective human capital by enabling workers to better leverage AI on tasks where complementarity exists, potentially changing optimal task assignment and training strategies.
- Measurement and evaluation: Economic evaluations of AI systems should measure team-level outcomes (not just model accuracy) and include heterogeneity analyses to capture gains from personalization.
- Policy and welfare: Policymakers and organizations should support research into which user traits predict benefit from explanation personalization and consider privacy trade-offs when collecting such traits; maximizing social surplus requires targeting personalization where it yields genuine complementarity rather than only increased reliance.
- Research priorities: Quantify cost–benefit of personalization (development, data collection, privacy compliance) versus gains from complementarity; identify which user characteristics are most actionable; extend to higher-stakes and field settings for external validity.
Assessment
Claims (6)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| We conducted preregistered experiments in two tasks (a sentiment-analysis task and a geography-guessing task) to study whether user characteristics influence the effectiveness of AI explanations. (Research Productivity) | null_result | high | existence and measurement of experimental manipulation (implementation of preregistered studies across two tasks) | 0.6 |
| In the sentiment-analysis task, individual differences in user characteristics shape how users respond to AI explanations. (Decision Quality) | mixed | medium | users' responses to AI explanations (behavioral measures in the sentiment-analysis task, e.g., decision changes, agreement with AI, or task performance as affected by explanations) | 0.36 |
| In the sentiment-analysis task, those individual differences do not produce human–AI complementarity: the joint performance of humans and AI did not exceed that of either alone. (Team Performance) | null_result | medium | human–AI joint performance compared to human-alone and AI-alone performance (e.g., accuracy or task success in sentiment classification) | 0.36 |
| We designed a geography-guessing task in which humans and AI possess complementary strengths. (Task Allocation) | positive | medium | complementarity potential as implied by task design (differences in human vs. AI strengths on geography-guessing items) | 0.36 |
| In the geography-guessing task, user characteristics interact with explanation types, and these interactions contribute to human–AI complementarity (the joint performance exceeds either alone). (Team Performance) | positive | medium | human–AI joint performance (e.g., accuracy or combined decision quality) and interaction effects between user characteristics and explanation type on that performance | 0.36 |
| Tailoring AI explanations to individual users can improve human–AI team performance and provides insights into how personalization may enhance human-AI collaboration. (Team Performance) | positive | medium | human–AI team performance (improvements in task outcomes when explanations are personalized to user characteristics) | 0.36 |