Generative AI boosts customer-service throughput and agent productivity, especially by automating routine queries and drafting replies, but quality failures and trust risks mean it complements rather than replaces humans; hybrid human-AI systems with strong governance produce the best results.
Abstract

The integration of generative artificial intelligence, epitomized by large language models like ChatGPT, is instigating a foundational shift in customer service and strategic communication management. This nano review evaluates its operational effectiveness, synthesizing evidence of its capacity to drive transformative efficiencies through 24/7 automation, hyper-personalization at scale, and robust agent augmentation. Concurrently, the analysis identifies critical, persistent challenges that threaten service quality and erode consumer trust, including the model's propensity for contextual misunderstanding and factual "hallucinations," its inherent lack of genuine empathy and emotional intelligence, and significant integration complexities. The review posits that optimal effectiveness is achieved not through full automation but through a strategically designed hybrid model. This paradigm necessitates transparent AI deployment, seamless human escalation pathways, and continuous oversight to leverage AI's scalability while safeguarding the relational fidelity, nuanced problem-solving, and trust that define superior customer experience.

Keywords: ChatGPT, customer service, conversational AI, service automation, customer experience (CX), AI ethics, human-AI collaboration, service quality, communication management
Summary
Main Finding
Generative AI (e.g., ChatGPT) can materially improve customer service productivity through 24/7 automation, scalable personalization, and agent augmentation—but is not a substitute for humans. Best outcomes arise from hybrid systems that combine AI scalability with human judgment, transparent AI use, clear escalation pathways, and continuous oversight to avoid quality failures and trust erosion.
Key Points
- Benefits
  - 24/7 automation reduces routine handling time and operational costs for simple, repetitive queries.
  - Hyper-personalization at scale can increase the relevance of responses and customer engagement when fed high-quality signals.
  - Agent augmentation (drafting replies, summarizing histories, suggesting actions) raises frontline productivity and can improve response consistency.
- Persistent risks
  - Factual errors and "hallucinations" create misinformation risks and can produce costly service failures.
  - Lack of genuine empathy and emotional intelligence undermines performance on complex or emotionally charged interactions.
  - Integration complexity (data access, context continuity, privacy/security, workflow alignment) raises implementation costs and delays time-to-value.
- Governance & design
  - Full automation trades away quality and trust on complex interactions; hybrid models with human-in-the-loop control are preferable.
  - Transparency about AI use, seamless escalation to humans, and continuous monitoring/feedback loops are essential mitigations.
- Evidence strength & limits
  - Current evidence is promising but early: case studies, pilot deployments, and short-run experiments dominate; long-run causal evidence on labor and welfare effects is limited.
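The governance points above can be made concrete as a routing policy. The sketch below is illustrative only: it assumes a hypothetical upstream intent classifier and a calibrated model confidence score, and the threshold and topic list are placeholders, not values from the review.

```python
# Minimal sketch of a hybrid human-AI escalation policy: serve the AI draft
# only when the model is confident AND the topic is not sensitive; otherwise
# escalate to a human agent. All names and thresholds are assumptions.
from dataclasses import dataclass

SENSITIVE_TOPICS = {"billing_dispute", "bereavement", "legal_complaint"}
CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff; tune per deployment

@dataclass
class DraftReply:
    text: str
    confidence: float  # calibrated model confidence for this draft
    topic: str         # label from a hypothetical intent classifier

def route(draft: DraftReply) -> str:
    """Return 'ai' to send the draft automatically, 'human' to escalate."""
    if draft.topic in SENSITIVE_TOPICS:
        return "human"  # emotionally charged or high-stakes cases stay human
    if draft.confidence < CONFIDENCE_THRESHOLD:
        return "human"  # low confidence correlates with hallucination risk
    return "ai"

print(route(DraftReply("Your parcel ships Monday.", 0.93, "shipping_status")))  # ai
print(route(DraftReply("I am so sorry for your loss.", 0.91, "bereavement")))   # human
```

Routing on both confidence and topic, rather than confidence alone, reflects the review's point that emotional stakes, not just factual risk, should trigger human handling.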
Data & Methods
- Study type: Nano review / synthesis of operational evidence and early empirical work on conversational generative models in customer service contexts.
- Sources synthesized: deployments, pilot studies, vendor reports, experimental A/B tests where available, and published academic/industry analyses of conversational AI performance.
- Evaluation criteria used in synthesis:
  - Accuracy and factuality (hallucination rates)
  - Response relevance and personalization quality
  - Handling time, uptime (24/7 availability), and throughput
  - Agent productivity gains (time saved, draft quality)
  - Customer satisfaction, escalations, and trust indicators
  - Integration costs (engineering, data plumbing, compliance)
- Methodological limitations noted:
  - Heterogeneity across firms, channels (chat, voice, email), and customer segments limits external validity.
  - Many results are short-run, vendor-supplied, or lack randomized controls.
  - Measurement of qualitative outcomes (empathy, trust) is noisy; long-term dynamic effects (behavioral adaptation, labor reallocation) are under-researched.
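As a hedged illustration of how the evaluation criteria above might be operationalized, the snippet below computes a few of them from interaction logs. The log schema (field names, label sources) is an assumption for the sketch, not a standard format.

```python
# Illustrative metric computation over a toy interaction log.
# "hallucinated" would come from human audit or automated fact-checking;
# the schema here is hypothetical.
from statistics import mean

logs = [
    {"handled_by": "ai",    "handling_sec": 40,  "hallucinated": False, "escalated": False, "csat": 5},
    {"handled_by": "ai",    "handling_sec": 55,  "hallucinated": True,  "escalated": True,  "csat": 2},
    {"handled_by": "human", "handling_sec": 310, "hallucinated": False, "escalated": False, "csat": 4},
]

ai_logs = [r for r in logs if r["handled_by"] == "ai"]

metrics = {
    # share of AI-handled contacts with a factual error (bools average to a rate)
    "hallucination_rate": mean(r["hallucinated"] for r in ai_logs),
    # share of AI-handled contacts escalated to a human
    "escalation_rate": mean(r["escalated"] for r in ai_logs),
    # average handling time for AI-handled contacts, in seconds
    "avg_handling_sec_ai": mean(r["handling_sec"] for r in ai_logs),
    # customer satisfaction across all channels
    "avg_csat": mean(r["csat"] for r in logs),
}
print(metrics)
```

Even this toy example shows why vendor-reported speed gains are incomplete: the fast AI channel here also carries the hallucination and escalation, which only surface if those columns are logged and audited.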
Implications for AI Economics
- Productivity vs. quality trade-off
  - Generative AI raises measurable productivity (lower marginal cost per interaction) but introduces quality and trust externalities; optimal deployment balances these.
- Labor reallocation and skill complementarities
  - Expect reallocation from routine frontline tasks toward higher-skill supervision, escalation handling, and CX design. Demand rises for skills in prompt engineering, AI oversight, and emotional/complex problem-solving.
  - Wages may diverge: downward pressure on routine-role wages, and a growing premium for supervisory and relational skills.
- Investment and fixed costs
  - Upfront costs (integration, data engineering, compliance) and recurring governance costs (monitoring, retraining, content moderation) mean smaller firms may face higher relative costs, potentially increasing scale advantages for larger incumbents.
- Measurement & incentive design
  - To avoid perverse incentives, firms must redesign KPIs to capture trust-related externalities (accuracy, escalation rates, repeat contacts) rather than only speed and throughput.
- Consumer surplus and demand effects
  - Improved availability and personalization can increase consumer welfare for routine interactions, but trust failures can reduce long-term demand or increase churn; net welfare depends on governance quality.
- Regulation, liability, and disclosure
  - Policy responses (disclosure requirements, liability for misinformation, auditability) will affect deployment costs and firm strategy. Transparent AI use and human escalation pathways lower regulatory and reputational risk.
- Market structure & competition
  - Superior AI integration and oversight capabilities can create competitive differentiation. However, if quality failures are widespread, the resulting trust erosion could benefit providers with stronger human-AI blends.
- Research priorities for economics
  - High priorities for informing policy and firm strategy: randomized controlled trials on hybrid vs. fully automated routing, long-run studies of labor markets in service sectors, and models quantifying trust externalities and governance costs.
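The fixed-cost point above can be shown with back-of-the-envelope arithmetic: at identical per-interaction savings, a larger firm amortizes integration and governance costs over far more contacts. All figures below are hypothetical placeholders, not estimates from the review.

```python
# Toy break-even calculation for an AI customer-service deployment.
# Every number here is an assumed placeholder for illustration.
FIXED_COST = 250_000.0          # one-off: integration, data engineering, compliance
GOVERNANCE_PER_YEAR = 60_000.0  # recurring: monitoring, retraining, moderation
SAVING_PER_INTERACTION = 0.80   # human cost minus AI marginal cost, in dollars

def first_year_net(interactions_per_year: float) -> float:
    """Net first-year benefit: per-interaction savings minus fixed and governance costs."""
    return SAVING_PER_INTERACTION * interactions_per_year - FIXED_COST - GOVERNANCE_PER_YEAR

# Volume at which savings cover first-year fixed plus governance costs.
break_even = (FIXED_COST + GOVERNANCE_PER_YEAR) / SAVING_PER_INTERACTION

print(f"break-even volume: {break_even:,.0f} interactions/year")
print(f"small firm (100k contacts/yr): net ${first_year_net(100_000):,.0f}")
print(f"large firm (5M contacts/yr):   net ${first_year_net(5_000_000):,.0f}")
```

Under these assumptions the small firm is deep underwater in year one while the large firm clears a multi-million-dollar surplus, which is the mechanism behind the scale-advantage claim.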
Assessment
Claims (19)
| Claim | Direction | Confidence | Outcome | Score |
|---|---|---|---|---|
| Generative AI can materially improve customer service productivity through 24/7 automation, scalable personalization, and agent augmentation, but is not a substitute for humans (Firm Productivity) | mixed | medium | productivity metrics (handling time, agent productivity), uptime/availability, throughput | 0.14 |
| 24/7 automation reduces routine handling time and operational costs for simple, repetitive queries (Task Completion Time) | positive | medium | routine handling time; operational cost per interaction | 0.14 |
| Hyper-personalization at scale can increase relevance of responses and customer engagement when fed high-quality signals (Output Quality) | positive | medium | response relevance; customer engagement (clicks, session length, follow-up contacts) | 0.14 |
| Agent augmentation (drafting replies, summarizing histories, suggesting actions) raises frontline productivity and can improve response consistency (Task Completion Time) | positive | medium | agent productivity (time per case saved), consistency of responses | 0.14 |
| Factual errors and "hallucinations" create misinformation risks and can produce costly service failures (Error Rate) | negative | high | factual accuracy / hallucination rate; incidents of service failure (operational or reputational cost) | 0.24 |
| Lack of genuine empathy and emotional intelligence undermines performance on complex or emotionally charged interactions (Consumer Welfare) | negative | medium | customer satisfaction/trust in emotionally charged interactions; resolution quality for complex cases | 0.14 |
| Integration complexity (data access, context continuity, privacy/security, workflow alignment) raises implementation costs and time-to-value (Organizational Efficiency) | negative | medium | implementation cost; time-to-value (time until measurable benefits) | 0.14 |
| Full automation produces trade-offs unfavorable to complex service quality and trust; hybrid models with human-in-the-loop control are preferable (Output Quality) | mixed | medium | service quality metrics; customer trust; escalation rates | 0.14 |
| Transparency about AI use, seamless escalation to humans, and continuous monitoring/feedback loops are essential mitigations to avoid quality failures and trust erosion (Governance And Regulation) | positive | low | trust indicators; error detection/mitigation rates; successful escalations | 0.07 |
| Current evidence is promising but early: case studies, pilot deployments, and short-run experiments dominate; long-run causal evidence on labor and welfare effects is limited (Research Productivity) | null_result | high | quality and duration of evidence (study types, presence of randomized controls) | 0.24 |
| Generative AI raises measurable productivity (lower marginal cost per interaction) but introduces quality and trust externalities; optimal deployment balances these trade-offs (Firm Productivity) | mixed | medium | marginal cost per interaction; quality/trust metrics (accuracy, escalation, churn) | 0.14 |
| Expect labor reallocation from routine frontline tasks toward higher-skill supervision, escalation handling, and customer experience design; demand for prompt engineering and AI oversight rises (Employment) | mixed | low | employment composition by task/skill; demand for new job categories | 0.07 |
| Wages may diverge: downward pressure on routine-role wages and a premium for supervisory and relational skills (Wages) | mixed | low | wage levels by role (routine vs. supervisory/relational) | 0.07 |
| Upfront integration and recurring governance costs mean smaller firms may face higher relative costs, potentially increasing scale advantages for larger incumbents (Market Structure) | negative | low | relative upfront and ongoing costs; indicators of scale advantages or market concentration | 0.07 |
| Firms must redesign KPIs to capture trust-related externalities (accuracy, escalation rates, repeat contacts) rather than only speed and throughput to avoid perverse incentives (Organizational Efficiency) | positive | low | KPI design adoption; changes in perverse incentive outcomes (accuracy, repeat contacts) | 0.07 |
| Improved availability and personalization can increase consumer welfare for routine interactions, but trust failures can reduce long-term demand or increase churn; net welfare depends on governance quality (Consumer Welfare) | mixed | low | consumer surplus measures; demand/churn rates | 0.07 |
| Policy responses (disclosure requirements, liability for misinformation, auditability) will affect deployment costs and firm strategy; transparent AI use and human escalation pathways lower regulatory and reputational risk (Governance And Regulation) | mixed | low | deployment costs; regulatory risk exposure; incidence of reputational events | 0.07 |
| Superior AI integration and oversight capabilities can create competitive differentiation; if quality failures are widespread, providers with stronger human-AI blends may gain market advantage (Market Structure) | mixed | low | market share; competitive advantage indicators; incidence of quality failures | 0.07 |
| High-priority research includes randomized controlled trials on hybrid vs. automated routing, long-run studies on labor markets in service sectors, and models quantifying trust externalities and governance costs (Research Productivity) | null_result | high | research output (RCTs, long-run studies, models) addressing the specified gaps | 0.24 |