The Commonplace


GenAI and clinical decision making in general practice
Luke Allen · March 09, 2026 · British Journal of General Practice
OpenAlex · review_meta · evidence: low · relevance: 7/10 · DOI · Source · PDF
Generative AI clinical decision support can raise diagnostic quality, speed workflows, and expand access, but substantial risks (hallucinations, bias, liability, deskilling) and uncertain incentive effects mean net health and economic impacts depend on deployment design, payment models, and governance.

The potentially transformative impact, for good and ill, lies in GenAI clinical decision support.

Summary

Main Finding

Generative AI (GenAI) in UK general practice is already widely used for transcription and administrative tasks, and is increasingly being experimented with for clinical decision support. While GenAI could materially improve diagnostic accuracy, guideline tailoring, and efficiency, there is scant real‑world evidence on how clinicians actually use these tools for decision making, and substantial risks (hallucination, bias, liability, skill degradation, digital exclusion) that must be managed. The net economic and clinical impact is therefore uncertain and requires targeted evaluation and governance.

Key Points

  • Current uptake and uses
    • GP use of AI rose >25% in the past two years, driven largely by ambient voice transcription/scribes.
    • Most clinicians use general-purpose chatbots (ChatGPT, Gemini, Claude, Copilot) rather than medically specialised agents.
    • Common applications: scribing, administrative automation, learning support; a minority use AI for clinical decision support, though anecdotal use is growing (sense‑checking plans, differential diagnosis, interpreting results, second opinions).
  • Benefits (promise)
    • Potential to access and apply the totality of evidence, tailor guidelines to multimorbidity, and integrate records to spot patterns earlier—thus reducing unwarranted variation and missed diagnoses.
    • Integration with genomic, wearable, and exposomic data could be transformative.
    • May materially reduce administrative burden and clinical errors if properly implemented.
  • Harms and limitations (peril)
    • Hallucinations and lack of true clinical understanding; transcription errors (e.g., lithium → sodium) are consequential.
    • Unclear guidance/regulation, medicolegal liability, privacy/consent concerns, bias in training data, model drift, and digital exclusion.
    • Automation bias and emerging evidence that clinician+AI teams can underperform compared with the best of either alone.
    • Risk of skill degradation and changing nature of the doctor–patient relationship; possible long‑term redundancy for higher‑order clinical tasks.
  • Knowledge gaps
    • Virtually no qualitative data on real-world prompts, the timing of use, or how clinicians appraise and apply GenAI outputs.
    • NHS lacks a clear picture of GP needs for decision‑support AI despite policy ambitions to make AI “every doctor’s trusted assistant.”
  • Recommended professional response
    • Improve AI literacy among clinicians (prompting, appraisal).
    • Proactively evaluate triadic (doctor–patient–AI) consultation models, mitigate limitations, and maximise patient benefit.
    • Treat GenAI as a tool with known flaws requiring expertise to apply.

Data & Methods

  • Document type: editorial / commentary synthesising literature, recent surveys, policy documents, and author experience—no new primary empirical data.
  • Sources cited include:
    • Surveys and reports (e.g., RCGP Voice survey, Nuffield Trust, BMJ Health Care Inform survey of GPs).
    • Recent literature on AI in primary care, automation bias, and clinician trustworthiness.
    • Anecdotal clinical examples supplied by the author (illustrative transcription error).
  • Limitations of evidence base noted by the author: lack of qualitative and observational data on how GenAI is used in real consultations; limited clinical trial data on GenAI decision support.

Implications for AI Economics

  • Economic channels affected
    • Productivity and time savings: ambient scribing and administrative automation can reduce non‑clinical workload, freeing GP time (value depends on accuracy and integration costs).
    • Diagnostic efficiency and error costs: improved detection could lower avoidable harms and downstream costs; conversely, hallucinations/automation bias could impose new error costs.
    • Labor demand and skill composition: short‑term augmentation may raise GP productivity; long‑term deskilling could alter demand for tasks and workforce training needs.
    • Market formation and pricing: demand for specialised medical AI agents, integration services, and data platforms will create new markets; vendors with EHR access may capture large rents.
    • Liability and insurance costs: ambiguity over responsibility for AI-supported decisions may raise malpractice premiums or require new indemnity schemes.
    • Equity and distributional effects: adoption skewed by clinician demographics and practice affluence risks widening inequalities in care access and outcomes.
    • Public goods and data value: linking AI to national EHRs, genomics, and wearables raises questions about data governance, monetisation, and social returns.
  • Policy and financing implications
    • Need for investment in AI literacy and clinician training (public funding or conditional reimbursement).
    • Regulatory costs: certification, post‑market surveillance (model drift), and auditing will be necessary and costly.
    • Reimbursement models should clarify whether and how AI use is compensated (e.g., per‑use, bundled, outcome‑based).
    • Procurement and competition policy to avoid vendor lock‑in and ensure interoperability.
    • Malpractice / standard‑of‑care norms may shift if GenAI becomes expected practice—affecting liability economics.
  • Priority research and evaluation agenda for health economists
    • Real‑world usage studies (surveys + qualitative) to map who uses GenAI, for what, when, and how outputs are applied.
    • Randomised trials and implementation studies comparing clinician care with/without GenAI decision support on clinical outcomes, errors, and workflow.
    • Cost‑effectiveness and budget‑impact analyses (including downstream costs/savings from avoided harms or introduced errors).
    • Workforce modelling to assess long‑run impacts on GP supply, task allocation, and training needs.
    • Equity analyses quantifying differential uptake and outcomes by socioeconomic status and practice characteristics.
    • Regulatory impact assessments and studies of liability/insurance market responses.
    • Market and competition analyses around data access, interoperability, and vendor concentration.
  • Practical recommendations for economists and policymakers
    • Fund mixed‑methods evaluation (qualitative + quantitative) now to inform procurement and regulation.
    • Incorporate heterogeneity and learning effects in economic models (automation bias, skill decay, model drift).
    • Design procurement and reimbursement policies that incentivise validated clinical benefit, transparency, and equitable access.
    • Require post‑deployment monitoring and independent audits to detect harm and model degradation.
    • Consider public investment in medically‑trained or NHS‑controlled AI agents to reduce private‑market externalities and ensure alignment with national health priorities.
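The cost-effectiveness and budget-impact analyses called for above rest on a simple core calculation: the incremental cost-effectiveness ratio (ICER). A minimal sketch in Python, with all cost and QALY figures invented purely for illustration (they are not estimates from the paper):

```python
# Minimal cost-effectiveness sketch: incremental cost-effectiveness ratio (ICER)
# comparing usual GP care against GP care with GenAI decision support.
# All figures below are illustrative assumptions, not data from the paper.

def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost per unit of health effect (e.g., per QALY gained)."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical per-patient annual figures:
usual_cost, usual_qalys = 1200.0, 0.80  # usual care
ai_cost, ai_qalys = 1150.0, 0.82        # with GenAI support (licence cost offset by saved admin time)

ratio = icer(ai_cost, ai_qalys, usual_cost, usual_qalys)
# A negative ratio here means the new strategy is both cheaper and more
# effective ("dominant"); a positive ratio is compared against a
# willingness-to-pay threshold per QALY.
print(f"ICER: {ratio:.0f} per QALY gained")
```

In practice the error costs flagged in this paper (hallucinations, automation bias) would enter as downstream costs and QALY losses on the GenAI arm, which is exactly why the budget-impact side of the analysis matters.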

Short takeaway: GenAI offers large potential productivity and quality gains in primary care but also creates novel risk and distributional problems. Robust, timely economic evaluation, governance, and clinician upskilling are essential to ensure net social benefit.

Assessment

  • Paper Type: review_meta
  • Evidence Strength: low — The paper is a narrative synthesis of early evaluations (retrospective validations, vignettes, small pilots) and conceptual argumentation rather than new causal empirical evidence; few large-scale RCTs or long-term observational studies exist to demonstrate net clinical or economic effects.
  • Methods Rigor: medium — The piece systematically outlines relevant data sources, evaluation methods, and metrics and cites standard approaches (RCTs, stepped-wedge, difference-in-differences, economic evaluation), but it does not present a reproducible systematic review or original empirical analysis and relies on selective synthesis of emerging studies.
  • Sample: Synthesis of the emerging literature and practice evidence: retrospective EHR-based validations, diagnostic imaging benchmarks, simulated patient vignettes/standardized case sets, small prospective pilots and clinician A/B tests, limited RCTs, claims and billing data for utilization studies, and vendor-proprietary model evaluations; much of the underlying data are institution- or vendor-specific and not publicly available.
  • Themes: productivity, human_ai_collab, labor_markets, adoption, governance
  • Generalizability:
    • Evidence concentrated in high-income health systems and particular specialties (e.g., radiology, dermatology) limits transferability to low-resource settings and other clinical domains.
    • Results depend heavily on deployment design and local workflows, so findings from one health system or UI cannot be readily generalized.
    • Vendor-proprietary models and data lead to selection bias in available evaluations and constrain independent replication.
    • Short-term pilots and vignette studies do not capture long-term effects such as deskilling, induced demand, or market-structure changes.
    • Heterogeneity in payment systems (fee-for-service vs. capitation) and regulatory regimes across jurisdictions limits applicability of economic conclusions.
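The assessment above names difference-in-differences among the standard evaluation methods. A minimal two-period DiD sketch in Python for a hypothetical GenAI scribe rollout, with all practice-level figures invented for illustration:

```python
# Difference-in-differences sketch for evaluating a GenAI scribe rollout:
# compare the before/after change in mean consultation time at adopting
# practices with the change at non-adopting practices over the same period.
# All numbers below are invented for illustration.

def did(treat_pre, treat_post, control_pre, control_post):
    """Two-period DiD estimate: (treated change) minus (control change)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_pre)) - (mean(control_post) - mean(control_pre))

# Hypothetical mean consultation minutes per patient, by practice:
treat_pre  = [12.1, 11.8, 12.4]   # adopting practices, before rollout
treat_post = [10.9, 10.6, 11.3]   # adopting practices, after rollout
ctrl_pre   = [12.0, 12.3, 11.9]   # comparison practices, same periods
ctrl_post  = [11.9, 12.1, 11.8]

effect = did(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(f"Estimated effect: {effect:+.2f} minutes per consultation")
```

The design's key assumption, parallel trends between adopting and non-adopting practices, is exactly what selective adoption (by practice affluence or clinician demographics, as noted above) threatens, so covariate adjustment or matching would be needed in any real analysis.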

Claims (18)

Each claim is listed with its outcome category, direction, confidence (numeric weight), and the outcome measures it bears on.

  • Generative AI clinical decision support (GenAI CDS) can improve diagnostic and treatment suggestions through synthesis of patient data and medical knowledge, reducing missed diagnoses and standardizing care where evidence is clear.
    Decision Quality · positive · medium (0.07) · Measures: diagnostic accuracy; guideline concordance; missed-diagnoses rate; treatment quality
  • GenAI CDS can save clinician time (faster charting, literature summarization, guideline retrieval), potentially increasing capacity and access.
    Task Completion Time · positive · medium (0.07) · Measures: clinician time per patient; documentation time; time-to-task completion
  • GenAI CDS can extend access to expertise in low-resource settings by supporting non-specialists or overburdened clinicians.
    Consumer Welfare · positive · medium (0.07) · Measures: access to specialist-level recommendations; capacity (patients served); referral rates
  • GenAI models enable personalization (tailored care pathways and risk predictions) by integrating multimodal data (notes, imaging, labs).
    Output Quality · positive · medium (0.07) · Measures: individualized risk predictions; guideline-concordant personalized care; predictive accuracy
  • GenAI CDS systems hallucinate and can produce incorrect but plausible recommendations, which can cause patient harm if trusted unchecked.
    AI Safety and Ethics · negative · high (0.12) · Measures: adverse events; erroneous recommendations; clinician reliance/misuse leading to harm
  • GenAI CDS can amplify bias and inequities if training data underrepresent groups or reflect historical disparities.
    AI Safety and Ethics · negative · high (0.12) · Measures: performance disparities across demographic subgroups; differential error rates; inequitable outcomes
  • Liability and accountability gaps exist for AI-suggested errors: it is unclear whether vendors, hospitals, or clinicians are responsible for harms resulting from GenAI CDS recommendations.
    Governance and Regulation · negative · medium (0.07) · Measures: existence of legal/liability/accountability clarity; number of resolved liability cases (not provided)
  • Overreliance on GenAI CDS may lead to deskilling of clinicians, eroding judgment over time and increasing systemic vulnerability.
    Skill Obsolescence · negative · low (0.04) · Measures: clinician diagnostic skill over time; reliance/override rates; error rates when AI unavailable
  • GenAI CDS creates data privacy and security risks because of high-value medical data and use of external cloud services.
    AI Safety and Ethics · negative · high (0.12) · Measures: data breaches; unauthorized access incidents; compliance violations
  • GenAI can reduce clinician time per case (productivity gains) but may increase utilization (more tests/treatments) if it lowers thresholds for intervention or aligns with revenue incentives.
    Task Completion Time · mixed · medium (0.07) · Measures: clinician time per case; test ordering rates; treatment utilization rates; per-patient spending
  • Task-based labor effects: GenAI will substitute routine tasks (documentation, triage) and complement complex decision-making; net employment effects are ambiguous and vary by role.
    Employment · mixed · medium (0.07) · Measures: employment levels by role; hours worked; task composition; wages
  • Proprietary models trained on large clinical datasets can create high entry barriers, concentrating market power among a few platform firms and increasing prices for hospitals.
    Market Structure · negative · medium (0.07) · Measures: market concentration metrics (HHI); vendor pricing; hospital switching costs
  • Reimbursement models (fee-for-service vs. capitation) will influence whether cost savings from GenAI are realized or offset by increased service volume.
    Firm Revenue · mixed · high (0.12) · Measures: total spending; per-patient cost; service volume under different payment models
  • Safety and net benefit of GenAI CDS hinge on deployment details: user interface, real-time feedback, uncertainty quantification, calibration, and how recommendations are presented (strong vs. suggestive).
    Decision Quality · mixed · high (0.12) · Measures: acceptance/override rates; error rates; calibration metrics; clinician trust
  • Human factors (training, trust calibration, workflows) determine whether clinicians accept, override, or ignore GenAI suggestions.
    Decision Quality · mixed · high (0.12) · Measures: override/acceptance rates; clinician-reported trust and cognitive load; adherence to recommendations
  • There are few large-scale randomized controlled trials (RCTs) showing direct patient outcome improvements from GenAI CDS; high-quality real-world and longitudinal studies are limited but essential.
    Research Productivity · null result · high (0.12) · Measures: number of large-scale RCTs reporting patient outcome improvements; availability of longitudinal outcome data
  • Limited public datasets and vendor lock-in constrain independent reproducible evaluations and audits of current generative models in healthcare.
    Research Productivity · negative · high (0.12) · Measures: availability of public datasets; reproducibility of model evaluations; number of independent audits
  • If deployed without mitigation, GenAI CDS risks widening disparities by performing worse on underrepresented groups or being unequally distributed across resource-rich versus resource-poor settings.
    Inequality · negative · high (0.12) · Measures: differences in performance/outcomes across demographic and socioeconomic groups; distribution of deployments by facility resource level

Notes