The Commonplace

Chatty LLMs make AI easier to use but also easier to misread: conversational style boosts adoption and short-term engagement while fostering overtrust and anthropomorphism; clear disclosure, uncertainty cues, provenance, and regulatory standards can curb these harms and correct market failures.

Why We Need to Destroy the Illusion of Speaking to A Human: Critical Reflections On Ethics at the Front-End for LLMs
Sarah Diefenbach, Daniel Ullrich · March 17, 2026
arXiv · commentary · evidence: n/a · relevance: 7/10
Conversational LLM interfaces increase usability and adoption but induce misleading mental models and overtrust, so ethical front-end design plus institutional measures are needed to mitigate harms and market failures.

Conversation with chatbots based on Large Language Models (LLMs) such as ChatGPT has become one of the major forms of interaction with Artificial Intelligence (AI) in everyday life. What makes this interaction so convenient is that interacting with LLMs feels so natural, and resembles what we know from real, human conversations. At the same time, this seeming similarity is part of one of the ethical challenges of AI design, since it activates many misleading ideas about AI. We discuss similarities and differences between human-AI-conversations and interpersonal conversation and highlight starting points for more ethical design of AI at the front-end.

Summary

Main Finding

LLM-based chatbots’ conversational naturalness increases usability and adoption but also triggers misleading mental models (e.g., anthropomorphism, overtrust). Ethical front-end design—explicit disclosure of capabilities/limits, uncertainty cues, user controls, and interface affordances—can reduce harms and important market failures in AI-enabled interactions.

Key Points

  • Natural conversational style is a double-edged sword:
    • Pro: lowers friction, raises engagement and productivity.
    • Con: creates the impression the system is human-like, intentional, or reliably knowledgeable.
  • Common misleading beliefs activated by chat-like interfaces:
    • Overtrust in correctness and robustness.
    • Attribution of goals, understanding, or moral agency.
    • Underestimation of hallucination, biases, or privacy risks.
  • Important distinctions from interpersonal conversation:
    • No true beliefs, intentions, or accountability; outputs are probabilistic and can be inconsistent.
    • Different failure modes (hallucination, calibration errors) and opaque training/data provenance.
  • Ethical front-end design principles highlighted:
    • Clear, salient disclosure that the user is interacting with an AI (not a human).
    • Explicit statements of capability limits and typical failure modes.
    • Real-time uncertainty/credibility signals for model outputs.
    • Easy access to provenance, sources, and the option to request citations.
    • Controls for personalization, data retention, and opt-out.
    • Escalation/fallback paths to human assistance where appropriate.
    • Logging, auditability, and user-consent flows for sensitive use.
  • Design is necessary but not sufficient: institutional measures (standards, certification, liability rules) are also important.
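
As a rough illustration of how three of the design principles above (AI disclosure, an uncertainty cue, and provenance) could be combined in a front-end, here is a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the paper: the `ModelReply` type, the `confidence` score and its thresholds, and the wording of the disclosure banner are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelReply:
    """One chatbot answer plus the metadata a front-end could surface (hypothetical type)."""
    text: str
    confidence: float  # hypothetical score in [0, 1]; how it is obtained is out of scope
    sources: list[str] = field(default_factory=list)


def uncertainty_label(confidence: float) -> str:
    """Map a numeric score to a coarse user-facing cue (thresholds are illustrative)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= 0.8:
        return "higher confidence"
    if confidence >= 0.5:
        return "moderate confidence"
    return "low confidence, please verify"


def render_reply(reply: ModelReply) -> str:
    """Wrap the answer in an AI disclosure, an uncertainty cue, and provenance."""
    lines = [
        "[AI-generated response. You are not talking to a human.]",
        reply.text,
        f"Confidence: {uncertainty_label(reply.confidence)}",
    ]
    if reply.sources:
        lines.append("Sources: " + "; ".join(reply.sources))
    else:
        # Saying that no sources exist is itself a provenance signal.
        lines.append("Sources: none provided")
    return "\n".join(lines)
```

Calling `render_reply(ModelReply("...", 0.4))` would place the disclosure line first and flag the answer as low confidence with no sources. Whether cues like these actually improve trust calibration is the open empirical question the paper raises.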

Data & Methods

  • Type of work: conceptual, normative, and design-oriented analysis (no large-scale empirical treatment described).
  • Methods used or recommended:
    • Literature synthesis from human-computer interaction, ethics, and conversational analysis.
    • Comparative analysis of human-AI conversation vs. interpersonal conversation.
    • Design heuristics and prototypical UI interventions.
    • Suggested empirical follow-ups: lab/field experiments, user surveys, A/B tests of disclosure cues, dialog-corpus analyses to quantify user misunderstanding and overtrust.
  • Measurement suggestions for future empirical work:
    • Metrics for calibration (user trust vs. model accuracy), hallucination rate, user comprehension of capability limits, behavioral dependence on system recommendations.
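
The measurement suggestions above can be made concrete with a small Python sketch. It assumes a hypothetical study log in which each trial records the user's stated trust, whether the model's answer was judged correct, and whether the user acted on it. The `Trial` fields and the three metric definitions are illustrative choices, not measures defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One user-model interaction in a hypothetical study log."""
    user_trust: float      # self-reported probability (0 to 1) that the answer is correct
    model_correct: bool    # ground-truth judgment of the answer
    followed_advice: bool  # whether the user acted on the answer


def overtrust_gap(trials: list[Trial]) -> float:
    """Mean stated trust minus observed accuracy; positive values suggest overtrust."""
    if not trials:
        raise ValueError("need at least one trial")
    mean_trust = sum(t.user_trust for t in trials) / len(trials)
    accuracy = sum(t.model_correct for t in trials) / len(trials)
    return mean_trust - accuracy


def error_rate(trials: list[Trial]) -> float:
    """Share of answers judged incorrect (a crude stand-in for a hallucination rate)."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(not t.model_correct for t in trials) / len(trials)


def reliance_on_errors(trials: list[Trial]) -> float:
    """Share of incorrect answers that the user nevertheless followed."""
    wrong = [t for t in trials if not t.model_correct]
    if not wrong:
        return 0.0
    return sum(t.followed_advice for t in wrong) / len(wrong)
```

A gap near zero would indicate that stated trust tracks accuracy on average. It says nothing about per-item calibration, which would need a finer-grained measure.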

Implications for AI Economics

  • Demand and adoption
    • Natural interfaces lower search and transaction costs, increasing demand for AI services and expanding markets.
    • Misleading cues can create short-term surplus (user satisfaction) but long-term welfare losses if overtrust causes harms or misinformation.
  • Market failures and information asymmetries
    • Users often cannot assess model reliability; providers may have weak incentives to disclose limitations (information asymmetry).
    • Concealed failure modes create negative externalities (misinformation, reputational spillovers) that markets may underprice.
  • Product differentiation and competition
    • Firms can compete on front-end design (transparency, trustworthiness) as a quality signal—this can be a socially beneficial axis if consumers value accuracy/safety.
    • Absent regulation, competition might instead favor more persuasive (but less honest) interfaces that increase engagement.
  • Regulation, liability, and certification
    • Economic rationale for disclosure mandates, certification of model properties (e.g., hallucination rates), or liability rules to internalize externalities.
    • Front-end design standards (e.g., mandatory identity disclosure) can be low-cost interventions with high social benefit.
  • Labor and task allocation
    • Easier conversational access to models can substitute for routine cognitive labor but may complement high-skill work; miscalibrated trust affects labor outcomes and supervision costs.
  • Measurement of value and welfare
    • Welfare assessments should account for both productivity gains from natural interfaces and harms from misperception (medical, legal, financial contexts).
    • Cost–benefit analysis of design interventions (UI changes, disclosures) is needed: such interventions are often cheap to implement and may yield outsized economic benefits.
  • Policy recommendations from an economics perspective
    • Mandate salient AI disclosure and provenance signals for decision-critical contexts.
    • Encourage (or require) uncertainty indicators and provenance for high-stakes outputs.
    • Support standardized benchmarks and third-party certification to reduce information asymmetries.
    • Subsidize research and field trials to identify which front-end interventions maximize net social welfare.
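
To illustrate the cost-benefit framing in the list above, a back-of-the-envelope calculation might look like the following Python sketch. The function name, its parameters, and the simple linear model (avoided harm minus added friction minus a fixed implementation cost) are assumptions made for illustration. The paper does not specify a welfare model, and any real analysis would need empirically estimated inputs.

```python
def net_benefit(
    interactions: int,
    harm_prob_before: float,
    harm_prob_after: float,
    cost_per_harm: float,
    friction_cost_per_interaction: float,
    fixed_cost: float,
) -> float:
    """Expected avoided harm minus added friction and implementation cost.

    All inputs are hypothetical; probabilities are per interaction.
    """
    for p in (harm_prob_before, harm_prob_after):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    # Harm avoided because fewer interactions end in a costly, overtrust-driven error.
    avoided_harm = interactions * (harm_prob_before - harm_prob_after) * cost_per_harm
    # Usability cost of the extra disclosure or uncertainty cue, summed over all interactions.
    friction = interactions * friction_cost_per_interaction
    return avoided_harm - friction - fixed_cost
```

The sign of the result depends on whether the intervention reduces harmful reliance by more than it costs in friction and implementation, which is what the proposed field trials would have to establish.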

Brief research agenda: quantify how different disclosure and uncertainty cues change user trust, decision quality, and downstream economic outcomes; estimate welfare gains from design- and policy-based corrections to overtrust and information asymmetry.
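
One simple way to analyze an A/B test of a disclosure cue, as this agenda proposes, is a two-proportion z-test on how often users accept an incorrect answer in each arm. The Python sketch below uses only the standard library. The choice of test, the function name, and the outcome measure are assumptions for illustration, not a method prescribed by the paper.

```python
import math


def two_proportion_z(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    x_a / n_a: count and sample size in arm A (e.g., no disclosure cue)
    x_b / n_b: count and sample size in arm B (e.g., with disclosure cue)
    Returns (z statistic, two-sided p-value) using the pooled standard error.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A positive z here would mean that users in arm A accepted incorrect answers more often than users in arm B. Downstream outcomes such as decision quality or economic effects would require richer designs than this single comparison.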

Assessment

Paper Type: commentary

Evidence Strength: n/a. This is a conceptual, normative, and design-oriented analysis that synthesizes prior literature and reasoning rather than presenting new empirical or causal evidence; claims are plausible but not validated with systematic data or identification strategies.

Methods Rigor: medium. The paper provides a coherent synthesis of human–computer interaction, ethics, and conversational analysis and offers concrete design heuristics and measurement suggestions, but it does not implement empirical protocols, pre-registered tests, or quantitative analyses that would be required for high empirical rigor.

Sample: No empirical sample or original dataset; the work is a literature synthesis and conceptual analysis drawing on HCI, ethics, conversational analysis, and illustrative examples of LLM-based chat interfaces, and it proposes recommended lab/field experiments, surveys, and A/B tests for future work.

Themes: adoption, governance

Generalizability:
  • No empirical validation: recommendations are untested across real-world deployments.
  • Findings may vary across domains (low- vs. high-stakes, e.g., entertainment vs. medical/legal/financial).
  • Cultural and language differences may alter conversational interpretations and trust dynamics.
  • Effects may differ by user population (age, expertise, digital literacy) and task complexity.
  • Different LLM architectures, model sizes, and fine-tuning/data provenance could change failure modes and user responses.
  • Firms' incentives, business models, and regulatory environments vary across jurisdictions, affecting adoption and disclosure practices.
  • UI/UX contexts beyond chat (multimodal, embedded assistants, voice agents) may not map directly to the recommendations.

Claims (15)

1. LLM-based chatbots’ conversational naturalness increases usability and adoption but also triggers misleading mental models (e.g., anthropomorphism, overtrust). [AI Safety and Ethics]
   Direction: mixed · Confidence: medium
   Outcome: usability, adoption (engagement/use rates), and prevalence of misleading mental models such as anthropomorphism and overtrust
   Details: conversational naturalness increases usability/adoption but triggers misleading mental models (anthropomorphism, overtrust) · 0.01

2. Natural conversational style lowers friction and raises engagement and productivity. [Task Completion Time]
   Direction: positive · Confidence: medium
   Outcome: user engagement, task completion speed/productivity, friction (barriers to use)
   Details: natural conversational style lowers friction and raises engagement and productivity · 0.01

3. Natural conversational style creates the impression the system is human-like, intentional, or reliably knowledgeable. [AI Safety and Ethics]
   Direction: negative · Confidence: medium
   Outcome: user beliefs about system humanness, intentionality, and perceived reliability
   Details: natural conversational style increases impression of system humanness/intentionality and perceived reliability · 0.01

4. Chat-like interfaces commonly activate misleading beliefs including overtrust in correctness/robustness, attribution of goals or moral agency, and underestimation of hallucination/bias/privacy risks. [AI Safety and Ethics]
   Direction: negative · Confidence: medium
   Outcome: incidence of overtrust, attribution of agency, and underestimation of model failure modes and privacy risks
   Details: chat-like interfaces commonly activate overtrust, attribution of agency, and underestimation of hallucination/privacy risks · 0.01

5. Conversational AI differs from interpersonal conversation: it has no true beliefs/intentions or accountability and produces probabilistic, sometimes inconsistent outputs with opaque training/data provenance. [AI Safety and Ethics]
   Direction: null_result · Confidence: high
   Outcome: ontological status of AI outputs (beliefs/intentions/accountability) and properties of output generation (probabilistic consistency, provenance transparency)
   Details: conversational AI lacks true beliefs/intentions/accountability and produces probabilistic, sometimes inconsistent outputs with opaque provenance · 0.01

6. Ethical front-end design (explicit disclosure of AI identity, capability limits, uncertainty cues, provenance, user controls, and escalation paths) can reduce harms and important market failures in AI-enabled interactions. [AI Safety and Ethics]
   Direction: positive · Confidence: medium
   Outcome: reduction in harms (e.g., misinformation, overtrust), improvement in user understanding/calibration, mitigation of market failures
   Details: ethical front-end design can reduce harms and market failures (design recommendation) · 0.01

7. Real-time uncertainty/credibility signals and easy access to provenance (citations) should be provided to users to improve trust calibration. [AI Safety and Ethics]
   Direction: positive · Confidence: medium
   Outcome: user trust calibration (alignment of trust with model accuracy), decision quality, and perceived credibility
   Details: real-time uncertainty/credibility signals recommended to improve trust calibration · 0.01

8. Controls for personalization, data retention, opt-out, and escalation to human assistance are important interface affordances to mitigate risks in conversational AI. [AI Safety and Ethics]
   Direction: positive · Confidence: medium
   Outcome: user privacy outcomes, incidence of inappropriate dependence, availability/use of human assistance when needed
   Details: controls for personalization, data retention, opt-out, escalation to humans recommended to mitigate risks · 0.01

9. Design interventions alone are necessary but not sufficient; institutional measures (standards, certification, liability rules) are also important to address harms and market failures. [Governance and Regulation]
   Direction: positive · Confidence: medium
   Outcome: reduction in negative externalities, corrected information asymmetries, and improved social welfare
   Details: design interventions necessary but not sufficient; institutional measures (standards, certification, liability) also important · 0.01

10. Natural conversational interfaces lower search and transaction costs, increasing demand for AI services and expanding markets. [Adoption Rate]
   Direction: positive · Confidence: medium
   Outcome: demand for AI services, market size/transaction volume, search/transaction costs
   Details: natural conversational interfaces lower search/transaction costs, increasing demand and expanding markets (theoretical implication) · 0.01

11. Misleading cues can create short-term surplus (user satisfaction) but long-term welfare losses if overtrust causes harms or misinformation. [Consumer Welfare]
   Direction: mixed · Confidence: medium
   Outcome: short-term user satisfaction vs. long-term welfare (harms from misinformation/overtrust)
   Details: misleading cues can create short-term user satisfaction but long-term welfare losses via overtrust/misinformation · 0.01

12. Firms can compete on front-end design (transparency, trustworthiness) as a socially beneficial quality signal, but absent regulation competition may favor more persuasive (less honest) interfaces. [Market Structure]
   Direction: mixed · Confidence: medium
   Outcome: firm competition strategies, prevalence of transparent vs. persuasive interfaces, consumer welfare
   Details: firms can compete on front-end design; absent regulation competition may favor more persuasive (less honest) interfaces · 0.01

13. There is an economic rationale for disclosure mandates, certification of model properties (e.g., hallucination rates), and liability rules to internalize externalities from conversational AI. [Governance and Regulation]
   Direction: positive · Confidence: medium
   Outcome: degree to which disclosure/certification/liability reduce externalities and improve market outcomes
   Details: economic rationale for disclosure mandates, certification, liability rules to internalize externalities · 0.01

14. Easier conversational access to models can substitute for routine cognitive labor while complementing high-skill work; miscalibrated trust affects labor outcomes and supervision costs. [Task Allocation]
   Direction: mixed · Confidence: medium
   Outcome: labor substitution for routine tasks, complementarity with high-skill tasks, supervision costs, labor outcomes
   Details: conversational access can substitute for routine cognitive labor while complementing high-skill work; miscalibrated trust affects supervision costs · 0.01

15. Future empirical work should measure calibration (user trust vs. model accuracy), hallucination rate, user comprehension of capability limits, and behavioral dependence on system recommendations. [Other]
   Direction: null_result · Confidence: high
   Outcome: calibration metrics, hallucination rates, user comprehension, behavioral dependence
   Details: recommendation to measure calibration, hallucination rate, comprehension, behavioral dependence (future work) · 0.01

Notes