Tailored AI coaching measurably improves how people express empathy: a randomized trial finds personalized LLM feedback produces more normatively empathic replies than generic training or no intervention. But identical messages lose perceived authenticity when recipients are told they came from AI, highlighting a trade-off between content quality and attribution.

Practicing with Language Models Cultivates Human Empathic Communication

Aakriti Kumar, Nalin Poungpeth, Diyi Yang, Bruce Lambert, Matthew Groh · March 16, 2026

arXiv · RCT · High evidence · Relevance 7/10

A pre-registered RCT shows that brief, personalized LLM coaching causally increases alignment of participants' responses with a normative taxonomy of empathic expressions compared with generic video training or no feedback, though identical replies labeled as AI are rated as less validating by recipients.

Empathy is central to human connection, yet people often struggle to express it effectively. In blinded evaluations, large language models (LLMs) generate responses that are often judged more empathic than human-written ones. Yet when a response is attributed to AI, recipients feel less heard and validated than when comparable responses are attributed to a human. To probe and address this gap in empathic communication skill, we built Lend an Ear, an experimental conversation platform in which participants are asked to offer empathic support to an LLM role-playing personal and workplace troubles. From 33,938 messages spanning 2,904 text-based conversations between 968 participants and their LLM conversational partners, we derive a data-driven taxonomy of idiomatic empathic expressions in naturalistic dialogue. Based on a pre-registered randomized experiment, we present evidence that a brief LLM coaching intervention offering personalized feedback on how to effectively communicate empathy significantly boosts alignment of participants' communication patterns with normative empathic communication patterns relative to both a control group and a group that received video-based but non-personalized feedback. Moreover, we find evidence for a silent empathy effect that people feel empathy but systematically fail to express it. Nonetheless, participants reliably identify responses aligned with normative empathic communication criteria as more expressive of empathy. Together, these results advance the scientific understanding of how empathy is expressed and valued and demonstrate a scalable, AI-based intervention for scaffolding and cultivating it.

Summary

Main Finding

A brief, personalized coaching intervention delivered by a large language model significantly improves people’s ability to produce responses that align with normative, idiomatic empathic communication patterns. Although LLM-generated replies are often judged more empathic than human-written replies in blinded tests, simply labeling a reply as AI reduces recipients’ feelings of being heard and validated. The study also documents a “silent empathy” effect: people often feel empathy but fail to express it, and targeted feedback helps close that expression gap.

Key Points

Blinded evaluations: LLM-generated responses frequently score as more empathic than human-written responses.
Attribution effect: When identical replies are labeled as coming from AI rather than from a human, recipients report feeling less heard and less validated.
Lend an Ear platform: An experimental conversation system where participants respond empathically to an LLM role-playing personal and workplace problems.
Large dataset: 33,938 messages across 2,904 conversations with 968 participants were collected and analyzed to derive a data-driven taxonomy of idiomatic empathic expressions in natural dialogue.
Data-driven taxonomy: The authors map common idiomatic empathic moves used in naturalistic support conversations (e.g., validation, perspective-taking, emotional labeling, offers of support) to create normative criteria for empathic communication.
Pre-registered randomized experiment: Participants were randomly assigned to (a) personalized LLM coaching (feedback tailored to their messages), (b) video-based but non-personalized feedback, or (c) control. The personalized LLM coaching produced a statistically significant increase in alignment with normative empathic patterns relative to both other conditions.
Silent empathy effect: Evidence that people often experience empathic concern but do not express it in ways that align with normative empathic communication; however, participants consistently rate responses that match the normative criteria as more expressive of empathy.
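To make "alignment with normative criteria" concrete, the sketch below scores a message by the share of taxonomy moves it contains. It is only an illustration: the four move names come from the summary above, but the cue phrases, the `MOVE_CUES` table, and the `detect_moves` and `alignment_score` helpers are hypothetical. Keyword matching stands in for whatever coding procedure the authors actually used, which the summary does not describe.

```python
import re

# Hypothetical cue phrases for four moves named in the summary.
# These patterns are illustrative assumptions, NOT the paper's taxonomy.
MOVE_CUES = {
    "validation": [r"\bmakes sense\b", r"\bunderstandable\b", r"\bvalid\b"],
    "perspective_taking": [r"\bin your (shoes|position)\b", r"\bi can (see|imagine)\b"],
    "emotional_labeling": [
        r"\b(sounds|seems) (really )?(frustrating|stressful|painful|hard)\b"
    ],
    "offer_of_support": [r"\bi'?m here\b", r"\bhow can i help\b", r"\blet me know\b"],
}


def detect_moves(message):
    """Return the set of move names whose cue patterns appear in the message."""
    text = message.lower()
    return {
        move
        for move, patterns in MOVE_CUES.items()
        if any(re.search(pattern, text) for pattern in patterns)
    }


def alignment_score(message):
    """Fraction of the (hypothetical) taxonomy moves present in the message."""
    return len(detect_moves(message)) / len(MOVE_CUES)


if __name__ == "__main__":
    reply = (
        "That sounds really frustrating, and it makes sense you feel that way. "
        "I'm here if you want to talk."
    )
    print(sorted(detect_moves(reply)), alignment_score(reply))
```

A score like this only counts whether a move appears; it says nothing about how well the move is phrased, which is one reason a real coding scheme would need human or model-based annotation.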

Data & Methods

Platform and scenarios: Lend an Ear presented participants with text-based role-play prompts (personal and workplace troubles) and collected free-form empathic responses to an LLM playing the distressed counterpart.
Sample and corpus: 968 participants; 2,904 conversations; 33,938 messages used to build and analyze communication patterns.
Taxonomy derivation: Data-driven analysis of message text produced a taxonomy of idiomatic empathic expressions used in naturalistic dialogue (operationalizing moves such as validation, acknowledging feelings, perspective taking, normalizing, and offering help).
Experimental design: Pre-registered randomized controlled trial with three arms:
Personalized LLM coaching — brief, tailored feedback on participants’ recent messages, instructing how to communicate empathy more effectively.
Video-based non-personalized feedback — generic, instructional video material about empathic communication.
Control — no feedback.
Outcome measures: The primary outcome was the alignment of participants’ subsequent messages with the normative taxonomy (coding/automated measures). Additional measures were recipient-rated perceptions of being heard and validated, and blinded empathy judgments.
Statistical inference: The personalized coaching produced significant increases in alignment to normative empathic patterns compared with both the non-personalized video and control arms (pre-registered analysis).
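The between-arm comparison described above can be illustrated with a small permutation test on per-participant alignment scores. This is a sketch under stated assumptions: the summary does not give the pre-registered model, so the permutation test is a swapped-in technique and not the authors' analysis. The `permutation_test` helper is hypothetical, and the scores in `toy_scores` are made up for illustration; they are not data from the study.

```python
import random
from statistics import mean


def permutation_test(treated, comparison, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean scores."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(comparison)
    pooled = list(treated) + list(comparison)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        # Re-randomize arm labels by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up alignment scores for illustration only; NOT data from the study.
toy_scores = {
    "llm_coaching": [0.75, 0.50, 0.75, 1.00, 0.75, 0.50],
    "video": [0.50, 0.25, 0.50, 0.50, 0.25, 0.50],
    "control": [0.25, 0.50, 0.25, 0.50, 0.25, 0.25],
}

if __name__ == "__main__":
    for arm in ("video", "control"):
        diff, p = permutation_test(toy_scores["llm_coaching"], toy_scores[arm])
        print(f"llm_coaching vs {arm}: mean difference = {diff:.3f}, p ~ {p:.4f}")
```

Because assignment to arms was random, shuffling the labels mimics what the difference in means would look like if coaching had no effect. That is why a permutation test is a natural, assumption-light illustration of a between-arm contrast.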

Implications for AI Economics

Scalable human capital development: LLM-driven, personalized coaching could scale soft-skill training (here, empathy expression) at far lower cost than human trainers, which would make it a high-return application of AI in workforce development. The study does not measure cost or returns directly, so this is an extrapolation.
Augmentation vs. substitution in emotional labor: AI can augment workers’ expressive skills (e.g., customer service, therapy-adjacent roles) rather than simply replacing roles; better expression of empathy can improve service quality while preserving human responsibilities that depend on attribution.
Attribution and market signaling: The attribution effect—reduced perceived empathy when responses are labeled AI—matters for product design, branding, and disclosure policies. Firms must balance transparency with user experience; attribution can alter demand and satisfaction even when content quality is high.
Product design and platform policy: Embedding LLM coaching tools in platforms (employee onboarding, customer support dashboards, peer-support communities) could raise overall conversational quality. Platforms should measure expressive outcomes, not just informational accuracy.
Measurement and valuation of soft skills: The taxonomy and measurement approach provide operational metrics to quantify empathic communication in economic analyses (productivity, customer satisfaction, retention), enabling better cost-benefit calculations for deploying coaching interventions.
Externalities and ethics: Scaling empathic coaching raises issues around authenticity, manipulation, and dependence. Markets and regulators will need to consider disclosure, informed consent, and potential effects on social norms (e.g., if people rely on coached phrasing rather than genuine engagement).
Research and deployment caveats: Results are from role-play contexts and short-term interventions; economic estimates of benefit require validation in field settings, across diverse populations, and with different LLMs.

Assessment

Paper Type: rct
Evidence Strength: high. A pre-registered RCT with nearly 1,000 participants, blinded evaluations, and a large conversational corpus provides strong causal evidence that personalized LLM coaching changes measured empathic expression; caveats remain about external validity (role-play setting, short-term follow-up) and the extent to which measured alignment maps to real-world economic outcomes.
Methods Rigor: high. Rigorous design elements include pre-registration, random assignment, a large annotated corpus used to derive a normative taxonomy, blinded human judgments, and automated coding; potential limitations are reliance on taxonomy-based and automated measures whose construct validity needs further external validation, and limited information on sample representativeness and long-term effects.
Sample: 968 participants produced 2,904 role-play conversations (33,938 messages) on an online platform where an LLM played distressed interlocutors (personal and workplace scenarios); participants were randomized to personalized LLM coaching, generic video feedback, or control; demographic details and the sampling frame are not specified in the summary.
Themes: skills_training, human_ai_collab, productivity, adoption
Identification: Pre-registered randomized controlled trial. Participants were randomly assigned to personalized LLM coaching, non-personalized video feedback, or control, with causal effects identified by between-arm comparisons; outcome assessment included blinded third-party empathy judgments and automated/coded alignment to a pre-specified taxonomy.
Generalizability:
Role-play contexts may not capture the stakes, dynamics, or emotional complexity of real-world interactions (e.g., customer service, therapy, peer support).
Short-term outcomes were measured immediately after the intervention; the durability of behavior change is unknown.
Sample recruitment and demographics are not described; the sample may be an online convenience sample, which would limit population representativeness.
Results depend on the specific LLM(s) and coaching design used; different models or prompts could yield different effects.
Primary outcomes are alignment to a taxonomy and perceived validation, not direct labor-market or productivity metrics.
Cultural and linguistic variation: the taxonomy and norms of empathic expression may not generalize across languages or cultures.

Claims (13)

Claim	Category	Direction	Confidence	Outcome	Details
A brief, personalized coaching intervention delivered by a large language model significantly improves participants' alignment with normative, idiomatic empathic communication patterns.	Skill Acquisition	positive	high	alignment with normative empathic patterns (coding/automated alignment metrics)	n=968 1.0
LLM-generated responses frequently score as more empathic than human-written responses in blinded evaluations.	Output Quality	positive	medium	blinded empathy judgments (perceived empathy ratings)	0.6
When identical replies are labeled as coming from AI rather than from a human, recipients report feeling less heard and less validated (an attribution effect).	Consumer Welfare	negative	high	recipient-rated feelings of being heard and validated	1.0
The study documents a 'silent empathy' effect: people often feel empathic concern but fail to express it in ways that align with normative empathic communication; targeted feedback helps close that expression gap.	Skill Acquisition	mixed	medium	gap between experienced empathy and expressed empathic moves (alignment with normative criteria)	n=968 0.6
The Lend an Ear platform collected a large conversational corpus: 33,938 messages across 2,904 conversations with 968 participants.	Research Productivity	null_result	high	corpus size (number of messages, conversations, participants)	n=33938 1.0
A data-driven taxonomy was derived mapping common idiomatic empathic moves (e.g., validation, perspective-taking, emotional labeling, offers of support) used in naturalistic support conversations.	Research Productivity	null_result	high	taxonomy of empathic communication moves (categorical coding scheme)	n=33938 1.0
Personalized LLM coaching produced a statistically significant increase in alignment with the normative empathic taxonomy relative to both the video-based non-personalized feedback and control arms.	Training Effectiveness	positive	high	statistical difference in alignment to normative empathic patterns (primary outcome)	n=968 1.0
Outcome measures included alignment to the normative taxonomy (coding/automated), recipient-rated perceptions of being heard/validated, and blinded empathy judgments.	Research Productivity	null_result	high	alignment metrics, recipient-rated perceptions, blinded empathy judgments	1.0
Results are from role-play contexts and short-term interventions; economic estimates of benefit require validation in field settings, across diverse populations, and with different LLM models.	Research Productivity	null_result	high	generalizability/external validity (not directly measured)	1.0
LLM-driven personalized coaching can cheaply scale soft-skill training (empathy expression) that would otherwise require costly human trainers, suggesting a high-return application of AI in workforce development.	Training Effectiveness	positive	speculative	scalability and cost-effectiveness (extrapolated, not directly measured)	0.1
Attribution (labeling responses as AI) can alter perceived empathy and therefore matters for product design, branding, and disclosure policy decisions.	Governance And Regulation	negative	medium	recipient-rated perceptions (being heard/validated) and inferred implications for product outcomes	0.6
Embedding LLM coaching tools in platforms (employee onboarding, customer support, peer-support communities) could raise overall conversational quality by improving expressive outcomes rather than only informational accuracy.	Output Quality	positive	speculative	conversational quality (expressive empathy), extrapolated	0.1
The taxonomy and measurement approach provide operational metrics to quantify empathic communication for economic analyses (productivity, customer satisfaction, retention).	Research Productivity	positive	medium	operational empathic communication metrics (taxonomy-derived measures)	0.6