
Short, introductory AI courses for clinicians improve confidence, attitudes, and basic skills, but no evaluated program has demonstrated effects on clinical workflows or patient outcomes; most programs are brief, academic, and aimed at students or early‑career clinicians, limiting their likely productivity payoff.

Assessing the effectiveness of artificial intelligence education and training for healthcare workers: a systematic review
Leanna Woods, Kayley Lyons, Anton van der Vegt, Quita Olsen, Wenyao Huang, Johnson S. Khor, Nancy Xu, Clair Sullivan · March 10, 2026 · BMC Medical Education
OpenAlex · review/meta · evidence: medium · relevance: 7/10 · DOI · Source · PDF
A systematic review of 27 evaluated AI training programs for healthcare workers finds consistent short‑term gains in learner satisfaction, attitudes, knowledge/skills, and self‑reported behavior (Kirkpatrick‑Barr levels 1–3) but no evidence of organizational or patient‑level impacts (level 4).

Artificial intelligence (AI) is increasingly integrated into healthcare, yet upskilling the health workforce remains a challenge. We addressed the research question: What evidence exists on the effectiveness of AI education and training programs in improving AI literacy among healthcare workers? Following PRISMA guidelines and PROSPERO registration, five databases (PubMed, Scopus, CINAHL, Embase, ERIC) were searched on 20 August 2024, focusing on studies with an intervention of AI training or education for the healthcare workforce, in any study design that reported an evaluation. Twenty‑seven studies were included. Programs improved AI literacy outcomes mapped to levels 1–3 of the Kirkpatrick-Barr training evaluation hierarchy, including improved learner reactions, shifts in attitudes and perceptions, enhanced knowledge and skills, and behavior changes. No program mapped to level 4, which covers the results of training, including organizational change and patient benefit. Programs were short in length (44%), delivered in academic settings (56%), to doctors (44%) or medical students (44%), at entry-to-practice level (56%). Most taught an introduction to AI (67%), with technical AI skills less frequent. These programs are a promising start but often lack sufficient depth to build advanced competencies. Improving AI literacy in healthcare will require appropriate course design, an evolving understanding of this rapidly changing area, and evaluating learning effectiveness. As the adoption of AI accelerates across healthcare, health systems may seek to standardise and assess the efficacy of these courses.

Summary

Main Finding

A systematic review of 27 evaluated AI education/training programs for the healthcare workforce found consistent short‑term improvements in AI literacy mapped to Kirkpatrick‑Barr levels 1–3 (learner reaction, attitudes/perceptions, knowledge/skills, behavior change). No program demonstrated level‑4 outcomes (organizational change or patient benefit). Most programs were introductory, short, and delivered in academic settings, suggesting early-stage progress but insufficient depth to build advanced clinical AI competencies.

Key Points

  • Scope and search: PRISMA‑guided review, PROSPERO‑registered; five databases searched (PubMed, Scopus, CINAHL, Embase, ERIC) on 20 Aug 2024. Inclusion: any evaluated AI training/education intervention for the healthcare workforce.
  • Studies included: 27 evaluated programs.
  • Outcomes: Improvements reported at Kirkpatrick‑Barr levels 1–3 (satisfaction/reaction; shifts in attitudes/perceptions; gains in knowledge/skills; behavior change). No evaluated program reported level‑4 outcomes (organizational impact or patient benefit).
  • Program characteristics (see the count check after this list):
    • Short duration: 44% categorized as short courses.
    • Setting: 56% delivered in academic settings.
    • Participants: 44% targeted doctors, 44% targeted medical students (overlap possible).
    • Career stage: 56% at entry‑to‑practice level.
    • Content: 67% taught introductory AI; technical/advanced AI skills were less frequent.
  • Overall interpretation: Programs produce measurable early learning gains but frequently lack depth, duration, and design features to produce advanced competencies or measurable system/patient impacts.
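
Given n = 27, the headline percentages each correspond to roughly 12–18 programs. A quick back‑of‑envelope check, assuming the percentages were rounded from whole‑number counts:

```python
# Back-of-envelope: convert the review's rounded percentages into study counts (n = 27).
N = 27
shares = [
    ("short courses", 44),
    ("academic settings", 56),
    ("targeted doctors", 44),
    ("targeted medical students", 44),
    ("entry-to-practice level", 56),
    ("introductory AI content", 67),
]
for label, pct in shares:
    count = round(N * pct / 100)  # 44% -> 12, 56% -> 15, 67% -> 18
    print(f"{label}: ~{count} of {N} programs ({pct}%)")
```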

Data & Methods

  • Review design: Systematic review following PRISMA guidelines; protocol registered in PROSPERO.
  • Databases searched: PubMed, Scopus, CINAHL, Embase, ERIC; search date 20 Aug 2024.
  • Eligibility: Any study design reporting an evaluation of an AI education or training intervention aimed at the healthcare workforce.
  • Included studies: 27; extracted data on program length, setting, target audience, content focus, and reported evaluation outcomes mapped to the Kirkpatrick‑Barr training evaluation hierarchy (levels 1–4).
  • Outcome mapping: Evaluations typically used learner surveys, knowledge/skill tests, or self‑reported behavior change measures to classify outcomes into Kirkpatrick‑Barr levels 1–3; no studies provided evidence of organizational change or patient‑level benefits (level 4).
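
To make the outcome mapping concrete, here is a minimal sketch of how evaluation measures could be binned into Kirkpatrick‑Barr levels. The keyword rules and measure names are illustrative assumptions, not the review's actual coding scheme:

```python
# Illustrative sketch: classify an evaluation measure into a Kirkpatrick-Barr level.
# The keyword rules below are hypothetical, not the review's published coding scheme.
KIRKPATRICK_BARR = {
    1: "reaction/satisfaction",
    2: "attitudes, perceptions, knowledge, skills",
    3: "behavior change",
    4: "organizational change / patient benefit",
}

def classify(measure: str) -> int:
    m = measure.lower()
    if any(k in m for k in ("satisfaction", "reaction")):
        return 1
    if any(k in m for k in ("attitude", "perception", "knowledge", "skill", "test")):
        return 2
    if any(k in m for k in ("behavior", "behaviour", "practice change")):
        return 3
    if any(k in m for k in ("organizational", "patient outcome", "error rate", "throughput")):
        return 4
    raise ValueError(f"unmapped measure: {measure}")

# The outcome types the review reports all land at levels 1-3.
for measure in ("learner satisfaction survey", "knowledge test",
                "self-reported behavior change"):
    level = classify(measure)
    print(f"{measure} -> level {level} ({KIRKPATRICK_BARR[level]})")
```

In the review's data, every reported measure lands at levels 1–3; nothing in the included studies would trigger the level‑4 branch.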

Implications for AI Economics

  • Human capital and productivity
    • Short, introductory programs likely yield modest increases in individual AI literacy but may not generate the deeper competencies required to materially change clinical task allocation or workflow productivity.
    • For economic models of AI adoption, current training investments may produce limited short‑run returns; deeper, longer, and practice‑embedded training could be necessary to unlock larger productivity gains.
  • Labor market effects
    • Predominant focus on entry‑level trainees (students/early practitioners) may increase future workforce supply of basic AI literacy but leaves current mid‑career clinicians undertrained—potentially slowing adoption or creating heterogeneous skill premiums.
    • Limited advanced technical training reduces the near‑term supply of clinician‑AI integrators (roles that bridge clinical and technical domains), which could keep wages/premia for such specialists high.
  • Policy, accreditation, and standardization
    • The absence of level‑4 evidence (organizational/patient outcomes) hampers the cost‑benefit and return‑on‑investment analyses needed by health systems and payers to justify larger upskilling expenditures (a toy ROI sketch after this list illustrates the gap).
    • Standardized curricula, competency frameworks, and validated assessment tools would improve comparability and enable economic evaluation across programs.
  • Market and scaling considerations
    • Growing demand for AI education creates market opportunities (commercial courses, continuing medical education, institutional programs), but heterogeneous quality and outcome measurement limit purchasers’ ability to identify high‑value offerings.
    • Economies of scale (online, modular, credentialed programs) could lower per‑learner costs, but evidence is required that scaled programs achieve higher‑order outcomes.
  • Research and evaluation priorities for economic analysis
    • Conduct cost‑effectiveness and cost‑benefit studies linking training to measurable organizational outcomes (workflow changes, error rates, throughput, patient outcomes).
    • Evaluate training targeted at mid‑career clinicians and interdisciplinary teams to estimate marginal returns relative to student/entry‑level programs.
    • Model long‑term impacts on labor demand, task reallocation, wage effects, and healthcare spending under alternative upskilling investments.
  • Practical recommendations for stakeholders
    • Fund and prioritize longer, practice‑embedded programs that teach beyond introductory concepts (technical literacy, model interpretation, human‑AI teaming), and include robust outcome measurement tied to organizational and patient metrics.
    • Develop standardized competency frameworks and assessments to enable credentialing and rigorous economic evaluation.
    • Health systems should pilot and evaluate scaled upskilling programs with embedded economic evaluation to guide investment decisions.
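
To illustrate the cost‑benefit gap flagged above, here is a toy ROI model for an upskilling program. Every parameter value and the linear benefit assumption are hypothetical, chosen only to show where level‑4 evidence would enter the calculation:

```python
# Toy ROI model for an AI upskilling program. All parameters are illustrative
# assumptions; the review provides no level-4 data with which to calibrate them.
def training_roi(n_clinicians: int,
                 cost_per_learner: float,          # direct course cost per learner
                 hours_off_clinic: float,          # training time per learner
                 hourly_wage: float,               # opportunity cost of that time
                 annual_productivity_gain: float,  # $/clinician/year; needs level-4 evidence
                 years_of_benefit: float) -> float:
    """Return simple ROI = (benefit - cost) / cost, ignoring discounting."""
    cost = n_clinicians * (cost_per_learner + hours_off_clinic * hourly_wage)
    benefit = n_clinicians * annual_productivity_gain * years_of_benefit
    return (benefit - cost) / cost

# Example: 100 clinicians, a $500 course, 8 hours off clinic at $100/hour, and a
# *hypothetical* $400/year productivity gain lasting 3 years -> ROI ~ -0.08 (net loss).
print(training_roi(100, 500.0, 8.0, 100.0, 400.0, 3.0))
```

The binding unknown is annual_productivity_gain: absent level‑4 evidence it cannot be estimated from the literature, which is precisely why ROI analyses of current programs remain speculative.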


Assessment

  • Paper Type: review/meta
  • Evidence Strength: medium — The review finds consistent short‑term improvements across 27 evaluated programs, but the underlying evidence is largely limited to short courses, self‑report or knowledge tests, heterogeneous study designs, and short follow‑up; no studies demonstrate organizational or patient‑level (Kirkpatrick‑Barr level 4) effects, reducing confidence in real‑world impact.
  • Methods Rigor: high — The review follows PRISMA guidelines, was PROSPERO‑registered, and searched five major databases with prespecified inclusion criteria and outcome mapping to the Kirkpatrick‑Barr hierarchy. However, the included primary studies are heterogeneous and mostly low intensity, and the review does not report pooled causal estimates or a strong risk‑of‑bias synthesis for level‑4 outcomes.
  • Sample: 27 evaluated AI education/training programs for the healthcare workforce (search through 20 Aug 2024 across PubMed, Scopus, CINAHL, Embase, ERIC). Programs were mostly short courses (44%), delivered in academic settings (56%), with participants primarily doctors (44%) and medical students (44%, with overlap); 56% targeted the entry‑to‑practice career stage and 67% taught introductory AI. Outcome data consisted of learner surveys, knowledge/skill tests, and self‑reported behavior change mapped to Kirkpatrick‑Barr levels 1–3.
  • Themes: skills_training, human_ai_collab
  • Generalizability:
    • Limited to the healthcare workforce — findings may not generalize to other sectors.
    • Most programs were short and academic — limited applicability to workplace and continuing‑professional‑development contexts.
    • Participants skewed toward students and early‑career clinicians — less applicable to mid‑career or senior clinicians.
    • Predominantly introductory content — results do not reflect effects of advanced/technical training.
    • Short follow‑up and reliance on self‑reported measures — uncertain persistence of gains.
    • Geographic and institutional diversity not specified — potential regional/contextual limits.

Claims (13)

Each entry lists the claim, then topic · direction · confidence (numeric score), and the outcome measured (with sample size and key details).

1. A systematic review of 27 evaluated AI education/training programs for the healthcare workforce was conducted following PRISMA guidance and a PROSPERO‑registered protocol.
   • Training Effectiveness · positive · high confidence (0.24)
   • Outcome: number of evaluated AI education/training programs included in the review · n=27

2. Included studies (n=27) reported improvements in learner outcomes mapped to Kirkpatrick‑Barr levels 1–3 (learner reaction/satisfaction; attitudes/perceptions; knowledge/skills; behavior change).
   • Skill Acquisition · positive · medium confidence (0.14)
   • Outcome: Kirkpatrick‑Barr levels 1–3 (satisfaction/reaction, attitudes/perceptions, knowledge/skills, behavior change) · n=27

3. No evaluated program reported Kirkpatrick‑Barr level‑4 outcomes (organizational change or patient outcomes).
   • Training Effectiveness · negative · high confidence (0.24)
   • Outcome: Kirkpatrick‑Barr level‑4 outcomes (organizational impact, patient outcomes) · n=27

4. Most programs were introductory in content: 67% of included programs taught introductory AI concepts rather than advanced/technical AI skills.
   • Training Effectiveness · mixed · high confidence (0.24)
   • Outcome: program content focus (introductory vs advanced/technical AI skills) · n=27 · 67%

5. A plurality of programs were short in duration: 44% of programs were categorized as short courses.
   • Training Effectiveness · mixed · high confidence (0.24)
   • Outcome: program duration (short vs longer formats) · n=27 · 44%

6. Most programs were delivered in academic settings: 56% of evaluated programs reported an academic setting.
   • Training Effectiveness · mixed · high confidence (0.24)
   • Outcome: program delivery setting (academic vs non‑academic) · n=27 · 56%

7. Participant targeting: 44% of programs targeted doctors and 44% targeted medical students (with possible overlap), and 56% targeted entry‑to‑practice career stages.
   • Training Effectiveness · mixed · high confidence (0.24)
   • Outcome: target audience (doctors, medical students) and career‑stage distribution (entry‑to‑practice) · n=27 · 44% doctors; 44% medical students; 56% entry‑to‑practice

8. Evaluations reporting outcomes predominantly relied on learner surveys, knowledge/skill tests, or self‑reported behavior change measures.
   • Training Effectiveness · null_result · high confidence (0.24)
   • Outcome: evaluation methods (surveys, tests, self‑reported behavior change) · n=27

9. Because most programs were short, introductory, and assessed only short‑term learner outcomes, they likely produce modest increases in individual AI literacy but are insufficient to build advanced clinical AI competencies that would change clinical task allocation or productivity.
   • Skill Acquisition · negative · medium confidence (0.14)
   • Outcome: individual AI literacy gains and capacity to generate advanced clinical AI competencies/productivity changes · n=27

10. The predominant focus on entry‑level trainees suggests future workforce increases in basic AI literacy but leaves current mid‑career clinicians undertrained, potentially slowing adoption and creating heterogeneous skill premiums.
    • Skill Acquisition · mixed · medium confidence (0.14)
    • Outcome: future workforce AI literacy distribution and potential labor market effects (adoption speed, skill premiums) · n=27

11. The absence of level‑4 evidence (organizational/patient outcomes) limits the ability of health systems and payers to conduct cost‑benefit or return‑on‑investment analyses for AI upskilling investments.
    • Training Effectiveness · negative · medium confidence (0.14)
    • Outcome: availability of evidence linking training to organizational/patient outcomes for economic evaluation · n=27

12. Heterogeneous program design and outcome measurement limit purchasers' ability to identify high‑value AI education offerings, creating both a market opportunity and a risk.
    • Market Structure · mixed · medium confidence (0.14)
    • Outcome: heterogeneity of program design and outcome measurement; market implications for purchasers · n=27

13. Recommended priorities include funding longer, practice‑embedded programs, developing standardized competency frameworks and validated assessments, and conducting studies that link training to organizational and patient outcomes (to enable level‑4 evidence and economic evaluation).
    • Training Effectiveness · positive · speculative (0.02)
    • Outcome: program design improvements and the generation of level‑4 (organizational/patient) outcome evidence
