The Commonplace

AI can raise radiology accuracy and throughput, but the benefits are conditional on real-world integration: poorly designed deployments risk automation bias, deskilling, and workflow disruption that can erode clinical and economic gains.

Human-AI interaction and collaboration in radiology: from conceptual frameworks to responsible implementation.
B. Koçak, Renato Cuocolo · Fetched March 18, 2026 · Diagnostic and Interventional Radiology
Tags: semantic_scholar, review_meta · Evidence: low · Relevance: 7/10 · Links: DOI, Source PDF
AI holds clear potential to improve radiologic accuracy and efficiency, but realized clinical and economic value depends critically on how tools are integrated with radiologists, monitored over time, and governed—evidence on patient outcomes, costs, and long-run workforce effects remains sparse.

Artificial intelligence (AI) is entering routine radiology practice, but most studies evaluate algorithms in isolation rather than their interaction with radiologists in clinical workflows. This narrative review summarizes current knowledge on human-AI interaction in radiology and highlights practical risks and opportunities for clinical teams. First, simple conceptual models of human-AI collaboration are described, such as diagnostic complementarity, which explain when radiologists and AI can achieve synergistic performance exceeding that of either alone. Then, AI tool integration strategies along the imaging pathway are reviewed, from acquisition and triage to interpretation, reporting, and teaching, outlining common interaction models and physician-in-the-loop workflows. Cognitive and professional effects of AI integration are also discussed, including automation bias, algorithmic aversion, deskilling, workload management, and burnout, with specific vulnerabilities for trainees. Furthermore, key elements of responsible implementation are summarized, such as liability and oversight implications, continuous monitoring for performance drift, usable explanations, basic AI literacy, and co-design with radiology teams. Finally, emerging systems are introduced, including vision-language models and adaptive learning loops. This review aims to provide a clear and accessible overview to help the radiology community recognize where human-AI collaboration can add value, where it can cause harm, and which questions future studies must address.

Summary

Main Finding

Human–AI collaboration in radiology is most effective when designed around diagnostic complementarity and physician-in-the-loop workflows: AI augments radiologists by handling high-volume, repetitive, and pattern-recognition tasks while humans retain contextual judgment, oversight, and final responsibility. Properly structured collaborations can improve diagnostic accuracy, timeliness, and efficiency, but they require governance, continuous monitoring, explainability, and workforce adaptation to avoid harms (automation bias, algorithmic aversion, deskilling).

Key Points

  • Diagnostic complementarity and human–AI symbiosis: Combining imperfect, partially non-overlapping error sets of humans and AI can yield team performance superior to either alone; clear role separation and interaction design are critical.
  • Spectrum of autonomy: Interaction models range from concurrent decision support and second-reader workflows to triage/prioritization and high-confidence auto-finalization; fully autonomous clinical reporting remains conceptual and not currently viable.
  • Empirical examples:
    • Mammography simulations and prospective deployments show AI can reduce human workload (e.g., supporting-reader workflows cut second reads) and modestly increase cancer detection (additional ~0.7–1.6 cancers per 1,000 screens in one prospective study).
    • A multi-reader prostate MRI study (61 readers, 360 exams) found AUC improved from 0.882 to 0.916 with AI assistance (sensitivity and specificity gains).
    • AI-based triage/prioritization studies report large reductions in turnaround times for critical findings (examples: mean reporting delays reduced from ~11.2 to 2.7 days in a simulation; CT pulmonary angiography turnaround from 59.9 to 47.6 minutes in deployment).
  • Risks and cognitive impacts: automation bias (overreliance), algorithmic aversion (underuse), cognitive offloading leading to deskilling, context-dependent workload changes, and disproportionate vulnerabilities for trainees.
  • Design and implementation requirements: configurable, workflow-embedded tools; explainability and usable outputs; explicit manual oversight/arbitration mechanisms; closed-loop learning/active learning for continuous model refinement; AI literacy and co-design with radiology teams.
  • Governance and safety: need for formal liability frameworks, continuous post-market surveillance for performance drift, explainability standards, and prospective multi-institutional evaluation focused on team performance, equity, and long-term training outcomes.
  • Emerging tech: vision–language models and adaptive learning loops (active learning) offer draft-reporting and summarization potential but need robust evaluation and guardrails.
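The diagnostic-complementarity idea in the first bullet can be made concrete with a toy simulation. The sketch below is illustrative only (the sensitivities, prevalence, and "either reader flags it" arbitration rule are assumptions, not figures from the review): two imperfect readers whose errors are independent yield a team sensitivity above either reader alone, at the cost of more false positives.

```python
import random

random.seed(0)

N = 100_000               # simulated screening exams
PREVALENCE = 0.01         # fraction with disease

# Hypothetical, independent error profiles (illustrative numbers only).
SENS_HUMAN, SPEC_HUMAN = 0.80, 0.95
SENS_AI,    SPEC_AI    = 0.85, 0.92

def read(truth: bool, sens: float, spec: float) -> bool:
    """One reader's call: positive with prob. sens if diseased, 1-spec if not."""
    p_positive = sens if truth else 1.0 - spec
    return random.random() < p_positive

tp = fn = 0
for _ in range(N):
    truth = random.random() < PREVALENCE
    if not truth:
        continue                     # sensitivity only uses diseased cases
    human = read(truth, SENS_HUMAN, SPEC_HUMAN)
    ai = read(truth, SENS_AI, SPEC_AI)
    team = human or ai               # "either flags it" arbitration rule
    tp += team
    fn += not team

team_sens = tp / (tp + fn)
# With independent errors, expected team sensitivity is
# 1 - (1 - 0.80) * (1 - 0.85) = 0.97 — above either reader alone.
# The same OR-rule lowers team specificity, so the gain is not free.
print(f"team sensitivity ~ {team_sens:.3f}")
```

If reader and AI errors were perfectly correlated instead, the team would match the better reader and no complementarity gain would appear, which is why the review stresses non-overlapping error sets.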

Data & Methods

  • Paper type: narrative (non-systematic) review based on targeted topic-driven literature searches, expert knowledge, and reference chaining.
  • Evidence synthesized: conceptual models, qualitative field studies, simulation and retrospective reader studies, prospective clinical implementations, and multi-reader diagnostic studies. Representative empirical items described include:
    • Large-scale mammography simulation (Ng et al.) using >280,000 exams across sites and vendors assessing “AI as supporting reader”.
    • Prospective mammography implementation where AI as an additional reader increased cancer detection modestly with minimal recall increase.
    • Multi-center prostate MRI diagnostic study (Prostate Imaging–Cancer AI Consortium): 61 readers across 53 centers, 360 examinations—AUC improvement with AI assistance.
    • Reader-study simulations showing human–AI team gains (e.g., Lee et al., 30 radiologists evaluating chest radiographs).
    • Triage/prioritization simulations and deployments showing substantial turnaround time reductions for urgent findings.
    • Qualitative cross-regional field study (Zając et al.) across Denmark and Kenya on radiologists’ visions for triage and workload distribution.
  • Limitations acknowledged: narrative review methodology (not systematic), dependence on selected studies that vary in design/quality, and the relative scarcity of prospective, multi-site randomized assessments of human–AI team performance and long-term workforce effects.
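The triage/prioritization workflows cited above amount to reordering the reading worklist by an AI urgency score. A minimal sketch, assuming a hypothetical `(accession, score)` feed rather than any vendor or PACS API:

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical worklist entry; field names are illustrative, not a vendor schema.
@dataclass(order=True)
class Study:
    priority: float                       # lower value pops first
    accession: str = field(compare=False)

def build_worklist(studies):
    """Yield accession numbers in reading order, most urgent first.

    `studies` is an iterable of (accession, ai_urgency_score) with scores
    in [0, 1]; scores are negated so the highest-urgency exam pops first.
    """
    heap = [Study(priority=-score, accession=acc) for acc, score in studies]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap).accession

order = list(build_worklist([
    ("CT-104", 0.12),    # routine follow-up
    ("CTPA-317", 0.94),  # suspected pulmonary embolism
    ("XR-550", 0.41),
]))
print(order)  # urgent CTPA surfaces first
```

This is the mechanism behind the turnaround-time reductions reported for critical findings: the exam's position in the queue, not the interpretation itself, is what the AI changes.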

Implications for AI Economics

  • Labor augmentation vs substitution: The paper supports an augmentative model where AI increases radiologist productivity and throughput rather than wholesale substitution. Economic outcomes depend on the degree of task automation, local practice patterns, and regulatory constraints.
  • Productivity and capacity effects:
    • Short-term: AI triage and automation of manual tasks can raise throughput, shorten turnaround times, and reduce backlog—potentially increasing revenue per radiologist or enabling reallocation of radiologist time to higher-value activities.
    • Medium/long-term: Persistent efficiency gains may compress per-case prices (if supply expands) or enable greater volume of imaging (demand elasticity), affecting imaging service revenues.
  • Human capital and training costs:
    • Risk of deskilling and the need for retraining entail investment in AI literacy, new curricula, and supervised practice. These are recurrent costs for hospitals, training programs, and payers.
    • Trainee vulnerability implies potential long-term impacts on the supply and skill composition of the radiology workforce, with implications for wage dynamics and credentialing requirements.
  • Implementation and maintenance costs:
    • Beyond license fees, institutions face integration (PACS/EHR), customization, governance, continuous monitoring, data labeling/active learning, and liability/insurance costs—raising the total cost of ownership and favoring larger institutions with scale to amortize these expenses.
    • Ongoing model maintenance (drift monitoring, retraining) creates recurring operational expenditures and staffing demands (data engineers, AI safety officers).
  • Market structure and competition:
    • Economies of scale and network effects (better models with more data/labels) may concentrate market power among a few large vendors, increasing risks of vendor lock-in and raising bargaining leverage for incumbents.
    • Differentiation will favor vendors who provide robust workflow integration, physician-in-the-loop tooling, explainability, and regulatory compliance.
  • Reimbursement and regulatory uncertainty:
    • Economic viability depends on reimbursement models—whether payers reimburse AI-assisted reads, triage services, or only radiologist interpretation. Clear coding/reimbursement pathways (e.g., add-on billing for AI-assisted workflows) will shape adoption speed.
    • Liability allocation (who is responsible for errors in physician-in-the-loop vs autonomous systems) influences malpractice insurance and risk premiums—uncertainty can slow adoption and increase costs.
  • Value and downstream healthcare economics:
    • Improved detection (e.g., additional cancers detected) can raise downstream treatment costs but also potentially improve outcomes; cost-effectiveness depends on true-positive vs false-positive tradeoffs, overdiagnosis risks, and downstream care pathways.
    • Efficiency gains could enable reallocation of radiologist time to value-enhancing activities (multidisciplinary care, interventional procedures), changing revenue mixes across departments.
  • Distributional and equity concerns:
    • Resource-rich centers are more able to absorb implementation and governance costs, potentially widening performance and outcome gaps between institutions and regions—raising equity issues in access to advanced imaging diagnostics.
  • Investment implications:
    • Investors should value firms that prioritize human-in-the-loop designs, active-learning features that lower labeling costs, strong integration/API capabilities, explainability, and compliance-ready offerings.
    • Prospective evaluation data (multi-site trials demonstrating team-level clinical and economic benefits) will materially affect adoption risk premiums and valuations.
  • Metrics and policy needs:
    • Policymakers and payers should incentivize prospective, team-focused evaluations (not just model accuracy), fund training and surveillance infrastructure, and clarify reimbursement and liability frameworks to align economic incentives with safe adoption.

Summary recommendation for economists and decision-makers: evaluate AI investments not solely on model performance but on total cost of ownership (integration, governance, retraining), measurable improvements in team-level productivity/outcomes, and distributional effects across institutions and workforce cohorts. Prioritize procurement of systems designed for physician-in-the-loop use, with transparent monitoring and active-learning capabilities that reduce long-run labeling costs and performance drift.
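The total-cost-of-ownership point lends itself to back-of-the-envelope arithmetic. All figures below are hypothetical: the sketch only shows that a cheaper license without active learning can cost more over a multi-year horizon once recurring labeling is summed.

```python
def tco(license_per_year: int, integration_once: int,
        annual_monitoring: int, annual_labeling: int, years: int) -> int:
    """Sum up-front and recurring costs over the evaluation horizon."""
    recurring = license_per_year + annual_monitoring + annual_labeling
    return integration_once + recurring * years

# Vendor A: cheaper license, no active learning (labeling stays expensive).
a = tco(license_per_year=60_000, integration_once=150_000,
        annual_monitoring=30_000, annual_labeling=80_000, years=5)

# Vendor B: pricier license, active learning halves the labeling cost.
b = tco(license_per_year=90_000, integration_once=150_000,
        annual_monitoring=30_000, annual_labeling=40_000, years=5)

print(a, b)  # the license price alone is a misleading comparison
```

Under these made-up numbers, Vendor B is cheaper over five years despite the higher sticker price, which is the procurement logic the summary recommendation argues for.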

Assessment

  • Paper Type: review_meta
  • Evidence Strength: low — The review synthesizes largely laboratory evaluations, reader studies, simulations, qualitative/usability reports, and a small number of observational deployments; there are few randomized or longitudinal real-world studies measuring patient outcomes or economic impacts, and many studies evaluate standalone algorithm accuracy rather than clinician–AI joint performance, limiting causal inference and external validity.
  • Methods Rigor: medium — The article provides a coherent conceptual framework and a broad synthesis of interdisciplinary literature (clinical reader studies, usability, organizational analyses), but it is a narrative review rather than a systematic review or meta-analysis, lacks pre-registered inclusion criteria and quantitative pooling, and is subject to selection and publication biases.
  • Sample: The evidence base comprises heterogeneous study types: laboratory algorithm accuracy evaluations across imaging modalities, controlled reader studies and simulations (small-to-moderate sample sizes), observational deployment reports from a few health systems, and qualitative/usability studies with clinicians; data are typically short-term, focus on diagnostic performance or workflow metrics, and rarely include patient-level outcomes, long-run workforce measures, or comprehensive cost data.
  • Themes: human_ai_collab, productivity, labor_markets, skills_training, governance, adoption
  • Generalizability:
    • Predominance of lab/reader studies limits generalizability to routine clinical workflows.
    • Few randomized or longitudinal real-world evaluations — limited evidence on patient outcomes or long-term economic effects.
    • Evidence concentrated in high-resource health systems and specialty radiology settings — may not generalize to low-resource or non-hospital contexts.
    • Heterogeneity in AI models, tasks, integration designs, and regulatory environments reduces transferability across deployments.
    • Small sample sizes and publication/selection bias toward positive demonstrations.

Claims (23)

Format: claim text (outcome area · direction · confidence) · Outcomes: measures · [value]

  • AI in radiology has clear potential to improve diagnostic performance and workflow efficiency. (Decision Quality · positive · medium) · Outcomes: diagnostic accuracy (sensitivity/specificity), workflow efficiency (throughput, time-to-diagnosis, time-on-task) · [0.07]
  • Real clinical value depends critically on how AI tools interact with radiologists in practice (integration design and human-AI interaction). (Decision Quality · mixed · medium) · Outcomes: clinician-AI joint diagnostic performance, patient-relevant outcomes, workflow metrics · [0.07]
  • Human-AI collaboration can produce synergistic gains (diagnostic complementarity) when errors are uncorrelated and tasks are allocated to leverage comparative strengths. (Decision Quality · positive · medium) · Outcomes: combined diagnostic accuracy (aggregate sensitivity/specificity), reduction in missed diagnoses · [0.07]
  • Human-AI collaboration can also generate harms, including automation bias, deskilling, and workflow disruption. (Error Rate · negative · medium) · Outcomes: rates of over-reliance on AI, diagnostic error rates attributable to automation bias, measures of clinician skill over time, workflow error/throughput metrics · [0.07]
  • Many published studies focus on standalone algorithm accuracy rather than clinician–AI joint performance in routine workflows. (Research Productivity · negative · high) · Outcomes: proportion of studies reporting standalone algorithm metrics versus those reporting clinician+AI workflow outcomes · [0.12]
  • There are limited randomized controlled trials or longitudinal evaluations; few studies measure patient-relevant outcomes or economic impacts. (Research Productivity · negative · high) · Outcomes: number of RCTs/longitudinal studies, frequency of patient outcome and economic outcome reporting · [0.12]
  • Automation bias can increase undue reliance on AI, while algorithmic aversion can drive underuse of helpful tools. (Adoption Rate · mixed · medium) · Outcomes: rates of clinician acceptance/use of AI recommendations, error rates when following vs. overriding AI · [0.07]
  • There is a risk of deskilling, especially for trainees receiving reduced diagnostic practice when AI automates routine tasks. (Skill Obsolescence · negative · low) · Outcomes: trainee diagnostic performance over time, case exposure counts, measures of retained clinical skill · [0.04]
  • Integration points for AI across the imaging pathway include acquisition (image quality/protocol selection), triage (prioritization), interpretation/reporting (detection, quantification, report pre-population), and post-interpretation (teaching, QA, model improvement loops). (Organizational Efficiency · positive · medium) · Outcomes: site-level implementation metrics by workflow stage (e.g., reduced repeat scans, prioritized read times, report completion time) · [0.07]
  • Triage and automation can shorten time-to-diagnosis, increase throughput, and reduce time spent on repetitive tasks. (Task Completion Time · positive · medium) · Outcomes: time-to-diagnosis, studies-per-hour per radiologist, time spent on repetitive tasks · [0.07]
  • Tools that improve detection or quantification may reduce downstream costs from missed diagnoses or unnecessary follow-ups, improving cost-effectiveness in some scenarios. (Consumer Welfare · positive · low) · Outcomes: downstream healthcare utilization (additional tests, treatments), cost per diagnosis, cost-effectiveness ratios · [0.04]
  • Economic outcomes depend on complementarity versus substitution: AI that augments radiologists can raise output per worker; AI that substitutes tasks may reduce demand for certain diagnostic activities. (Firm Productivity · mixed · medium) · Outcomes: radiologist productivity metrics, employment levels/demand for diagnostic activities · [0.07]
  • Up-front implementation costs commonly include procurement, integration with PACS/EMR, UI/UX development, regulatory compliance, and staff training; recurring costs include monitoring, data labeling, software updates, and cybersecurity. (Firm Revenue · negative · medium_high) · Outcomes: implementation capital expenditures, annual operating expenditures · [0.01]
  • Hidden costs can arise from increased liability exposure, workflow redesign burden, and potential productivity loss during transition periods. (Organizational Efficiency · negative · medium) · Outcomes: measures of productivity during rollout, documented workflow redesign time/costs, liability incidents/concerns · [0.07]
  • Changes in workload composition can reduce routine burdens but may shift cognitive load to follow-up decisions and managing AI outputs. (Worker Satisfaction · mixed · medium) · Outcomes: time allocation across task types, subjective cognitive workload scores, frequency of follow-up decision tasks · [0.07]
  • The net effect of AI on clinician burnout is ambiguous: tools can remove tedious tasks but may introduce new cognitive, administrative, and liability stresses. (Worker Satisfaction · mixed · medium) · Outcomes: burnout survey scores, task satisfaction, administrative burden metrics · [0.07]
  • Responsible implementation requires legal/liability clarity, continuous monitoring for performance drift and distributional shifts, usable explanations, baseline AI literacy for clinicians, and co-design with frontline radiology teams. (Governance And Regulation · positive · medium) · Outcomes: successful deployment metrics, monitoring alerts for drift, clinician comprehension/usability scores · [0.07]
  • Emerging technologies such as vision-language models and adaptive learning loops may expand functionality but raise governance and safety challenges. (AI Safety And Ethics · mixed · low) · Outcomes: model capability metrics (multimodal performance), incidence of safety/governance incidents · [0.04]
  • Explainability, trust, and demonstrated real-world effectiveness are key demand-side frictions; small-scale laboratory gains rarely translate into broad clinical uptake without workflow fit. (Adoption Rate · negative · medium) · Outcomes: adoption rates, clinician trust/acceptance measures, implementation success rates · [0.07]
  • Unclear liability frameworks increase perceived and real costs and can slow adoption by hospitals and insurers. (Adoption Rate · negative · medium_high) · Outcomes: time-to-adoption, procurement decisions citing liability concerns, insurance/coverage decisions · [0.01]
  • Fee-for-service payment structures may not reward efficiency gains from AI; value-based payment or shared-savings models are better aligned to incentivize adoption that reduces total cost and improves outcomes. (Governance And Regulation · positive · medium_high) · Outcomes: reimbursement levels, adoption under different payment models, cost savings realized · [0.01]
  • Research priorities include rigorous real-world trials assessing patient outcomes, cost-effectiveness, and labor impacts; comparative studies of integration strategies; measurement of long-run workforce effects; and development of standard metrics and monitoring frameworks. (Research Productivity · positive · high) · Outcomes: number and quality of real-world trials, existence of standardized monitoring frameworks, availability of long-term workforce impact studies · [0.12]
  • Overall, economic benefits from AI in radiology are plausible but conditional on human-AI interaction design, governance, workforce effects, and payment structures; net value is not determined by algorithmic accuracy alone. (Firm Productivity · mixed · medium) · Outcomes: net economic value/ROI, clinical outcomes, adoption and sustainability metrics · [0.07]
