Tailored AI coaching measurably improves how people express empathy: a randomized trial finds personalized LLM feedback produces more normatively empathic replies than generic training or no intervention. But identical messages lose perceived authenticity when recipients are told they came from AI, highlighting a trade-off between content quality and attribution.
Empathy is central to human connection, yet people often struggle to express it effectively. In blinded evaluations, large language models (LLMs) generate responses that are often judged more empathic than human-written ones. Yet when a response is attributed to AI, recipients feel less heard and validated than when comparable responses are attributed to a human. To probe and address this gap in empathic communication skill, we built Lend an Ear, an experimental conversation platform in which participants are asked to offer empathic support to an LLM role-playing personal and workplace troubles. From 33,938 messages spanning 2,904 text-based conversations between 968 participants and their LLM conversational partners, we derive a data-driven taxonomy of idiomatic empathic expressions in naturalistic dialogue. Based on a pre-registered randomized experiment, we present evidence that a brief LLM coaching intervention offering personalized feedback on how to effectively communicate empathy significantly boosts alignment of participants' communication patterns with normative empathic communication patterns relative to both a control group and a group that received video-based but non-personalized feedback. Moreover, we find evidence for a silent empathy effect: people feel empathy but systematically fail to express it. Nonetheless, participants reliably identify responses aligned with normative empathic communication criteria as more expressive of empathy. Together, these results advance the scientific understanding of how empathy is expressed and valued and demonstrate a scalable, AI-based intervention for scaffolding and cultivating it.
Summary
Main Finding
A brief, personalized coaching intervention delivered by a large language model significantly improves people’s ability to produce responses that align with normative, idiomatic empathic communication patterns. Although LLM-generated replies are often judged more empathic than human-written replies in blinded tests, simply labeling a reply as AI reduces recipients’ feelings of being heard and validated. The study also documents a “silent empathy” effect: people often feel empathy but fail to express it, and targeted feedback helps close that expression gap.
Key Points
- Blinded evaluations: LLM-generated responses frequently score as more empathic than human-written responses.
- Attribution effect: When identical replies are labeled as coming from AI rather than from a human, recipients report feeling less heard and less validated.
- Lend an Ear platform: An experimental conversation system where participants respond empathically to an LLM role-playing personal and workplace problems.
- Large dataset: 33,938 messages across 2,904 conversations with 968 participants were collected and analyzed to derive a data-driven taxonomy of idiomatic empathic expressions in natural dialogue.
- Data-driven taxonomy: The authors map common idiomatic empathic moves used in naturalistic support conversations (e.g., validation, perspective-taking, emotional labeling, offers of support) to create normative criteria for empathic communication.
- Pre-registered randomized experiment: Participants were randomly assigned to (a) personalized LLM coaching (feedback tailored to their messages), (b) video-based but non-personalized feedback, or (c) control. The personalized LLM coaching produced a statistically significant increase in alignment with normative empathic patterns relative to both other conditions.
- Silent empathy effect: Evidence that people often experience empathic concern but do not express it in ways that align with normative empathic communication; however, participants consistently rate responses that match the normative criteria as more expressive of empathy.
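As a quick sanity check on the corpus figures quoted above, the per-participant and per-conversation averages can be worked out directly. This is only arithmetic on the three reported totals; the variable names are illustrative, and the source does not say how the message count splits between participants and the LLM partner, so the averages below count both sides of each conversation.

```python
# Totals as reported in the summary above.
messages = 33_938
conversations = 2_904
participants = 968

# Derived averages (assumption: messages from both the participant and the
# LLM partner are included in the reported message total).
conversations_per_participant = conversations / participants
messages_per_conversation = messages / conversations
messages_per_participant = messages / participants

print(conversations_per_participant)        # 3.0
print(round(messages_per_conversation, 1))  # 11.7
print(round(messages_per_participant, 1))   # 35.1
```

The exact ratio of 3.0 is consistent with each participant completing three conversations, although the summary does not state that design detail explicitly.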
Data & Methods
- Platform and scenarios: Lend an Ear presented participants with text-based role-play prompts (personal and workplace troubles) and collected free-form empathic responses to an LLM playing the distressed counterpart.
- Sample and corpus: 968 participants; 2,904 conversations; 33,938 messages used to build and analyze communication patterns.
- Taxonomy derivation: Data-driven analysis of message text produced a taxonomy of idiomatic empathic expressions used in naturalistic dialogue (operationalizing moves such as validation, acknowledging feelings, perspective taking, normalizing, and offering help).
- Experimental design: Pre-registered randomized controlled trial with three arms:
  - Personalized LLM coaching: brief, tailored feedback on participants’ recent messages, with guidance on how to communicate empathy more effectively.
  - Video-based non-personalized feedback: generic instructional video material about empathic communication.
  - Control: no feedback.
- Outcome measures: Primary outcomes measured alignment of participants’ subsequent communication with the normative taxonomy (coding/automated measures), plus recipient-rated perceptions of being heard/validated and blinded empathy judgments.
- Statistical inference: Personalized coaching produced a statistically significant increase in alignment with normative empathic patterns relative to both the non-personalized video arm and the control arm (pre-registered analysis).
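The design described above (random assignment to three arms, an alignment score per message, and a between-arm comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the cue phrases in `CUES`, the keyword-matching `alignment_score`, the equal-sized round-robin assignment in `assign_arms`, and the choice of a permutation test are all hypothetical stand-ins. The paper's taxonomy is data-driven and its actual scoring and statistical methods are not specified in this summary.

```python
import random
from statistics import mean

# Hypothetical cue phrases for four empathic moves named in the summary.
# These are placeholders for illustration, NOT the paper's taxonomy.
CUES = {
    "validation": ("makes sense", "understandable"),
    "emotional_labeling": ("sounds frustrating", "you feel", "must be hard"),
    "perspective_taking": ("in your position", "if i were you"),
    "offer_of_support": ("here for you", "can i help"),
}


def alignment_score(message: str) -> float:
    """Fraction of the toy empathic moves whose cue phrases appear in the message."""
    text = message.lower()
    hits = sum(any(cue in text for cue in cues) for cues in CUES.values())
    return hits / len(CUES)


def assign_arms(participant_ids, arms=("llm_coaching", "video", "control"), seed=0):
    """Shuffle participants, then deal them round-robin into near-equal arms."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in mean scores."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


reply = "That sounds frustrating, and it makes sense that you're upset. I'm here for you."
print(alignment_score(reply))  # 0.75: three of the four toy moves are matched

assignment = assign_arms(range(968))  # participant IDs 0..967 are made up
```

In a setup like this, `permutation_test` would be called once per pairwise comparison (coaching vs. video, coaching vs. control) on per-participant mean scores. A permutation test is used here only because it needs no distributional assumptions and no third-party libraries.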
Implications for AI Economics
- Scalable human capital development: LLM-driven, personalized coaching can cheaply scale soft-skill training (empathy expression) that would otherwise require costly human trainers, suggesting a high-return application of AI in workforce development.
- Augmentation vs. substitution in emotional labor: AI can augment workers’ expressive skills (e.g., customer service, therapy-adjacent roles) rather than simply replacing roles; better expression of empathy can improve service quality while preserving human responsibilities that depend on attribution.
- Attribution and market signaling: The attribution effect—reduced perceived empathy when responses are labeled AI—matters for product design, branding, and disclosure policies. Firms must balance transparency with user experience; attribution can alter demand and satisfaction even when content quality is high.
- Product design and platform policy: Embedding LLM coaching tools in platforms (employee onboarding, customer support dashboards, peer-support communities) could raise overall conversational quality. Platforms should measure expressive outcomes, not just informational accuracy.
- Measurement and valuation of soft skills: The taxonomy and measurement approach provide operational metrics to quantify empathic communication in economic analyses (productivity, customer satisfaction, retention), enabling better cost-benefit calculations for deploying coaching interventions.
- Externalities and ethics: Scaling empathic coaching raises issues around authenticity, manipulation, and dependence. Markets and regulators will need to consider disclosure, informed consent, and potential effects on social norms (e.g., if people rely on coached phrasing rather than genuine engagement).
- Research and deployment caveats: Results come from role-play contexts and short-term interventions; economic estimates of benefit require validation in field settings, across diverse populations, and with different LLMs.
Assessment
Claims (13)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| A brief, personalized coaching intervention delivered by a large language model significantly improves participants' alignment with normative, idiomatic empathic communication patterns. | Skill Acquisition | positive | high | alignment with normative empathic patterns (coding/automated alignment metrics) | n=968; 1.0 |
| LLM-generated responses frequently score as more empathic than human-written responses in blinded evaluations. | Output Quality | positive | medium | blinded empathy judgments (perceived empathy ratings) | 0.6 |
| When identical replies are labeled as coming from AI rather than from a human, recipients report feeling less heard and less validated (an attribution effect). | Consumer Welfare | negative | high | recipient-rated feelings of being heard and validated | 1.0 |
| The study documents a 'silent empathy' effect: people often feel empathic concern but fail to express it in ways that align with normative empathic communication; targeted feedback helps close that expression gap. | Skill Acquisition | mixed | medium | gap between experienced empathy and expressed empathic moves (alignment with normative criteria) | n=968; 0.6 |
| The Lend an Ear platform collected a large conversational corpus: 33,938 messages across 2,904 conversations with 968 participants. | Research Productivity | null_result | high | corpus size (number of messages, conversations, participants) | n=33938; 1.0 |
| A data-driven taxonomy was derived mapping common idiomatic empathic moves (e.g., validation, perspective-taking, emotional labeling, offers of support) used in naturalistic support conversations. | Research Productivity | null_result | high | taxonomy of empathic communication moves (categorical coding scheme) | n=33938; 1.0 |
| Personalized LLM coaching produced a statistically significant increase in alignment with the normative empathic taxonomy relative to both the video-based non-personalized feedback and control arms. | Training Effectiveness | positive | high | statistical difference in alignment to normative empathic patterns (primary outcome) | n=968; 1.0 |
| Outcome measures included alignment to the normative taxonomy (coding/automated), recipient-rated perceptions of being heard/validated, and blinded empathy judgments. | Research Productivity | null_result | high | alignment metrics, recipient-rated perceptions, blinded empathy judgments | 1.0 |
| Results are from role-play contexts and short-term interventions; economic estimates of benefit require validation in field settings, across diverse populations, and with different LLM models. | Research Productivity | null_result | high | generalizability/external validity (not directly measured) | 1.0 |
| LLM-driven personalized coaching can cheaply scale soft-skill training (empathy expression) that would otherwise require costly human trainers, suggesting a high-return application of AI in workforce development. | Training Effectiveness | positive | speculative | scalability and cost-effectiveness (extrapolated, not directly measured) | 0.1 |
| Attribution (labeling responses as AI) can alter perceived empathy and therefore matters for product design, branding, and disclosure policy decisions. | Governance And Regulation | negative | medium | recipient-rated perceptions (being heard/validated) and inferred implications for product outcomes | 0.6 |
| Embedding LLM coaching tools in platforms (employee onboarding, customer support, peer-support communities) could raise overall conversational quality by improving expressive outcomes rather than only informational accuracy. | Output Quality | positive | speculative | conversational quality (expressive empathy), extrapolated | 0.1 |
| The taxonomy and measurement approach provide operational metrics to quantify empathic communication for economic analyses (productivity, customer satisfaction, retention). | Research Productivity | positive | medium | operational empathic communication metrics (taxonomy-derived measures) | 0.6 |