
AI systems detect gastrointestinal bleeding lesions with high accuracy in retrospective studies and can speed clinician review, but there is scant prospective evidence that these diagnostic gains translate into better patient outcomes or cost-effectiveness, leaving economic value uncertain.

How Do AI-Assisted Diagnostic Tools Impact Clinical Decision-Making And Patient Outcomes In Acute Gastrointestinal Bleeding Cases? : A Comprehensive Systematic Review

Gafrinda Kautsari, Luluk Aflakah, R. Nurlaili · Fetched March 15, 2026 · International journal of medical science and health research

Source: semantic_scholar · Type: review_meta · Evidence: low · Relevance: 7/10

AI diagnostics for acute gastrointestinal bleeding show high retrospective accuracy and possible reading-time reductions, but evidence is insufficient that these gains improve patient outcomes or are cost-effective in practice.

Introduction: Acute gastrointestinal bleeding (GIB) is a life-threatening condition requiring prompt diagnosis and intervention. Despite advancements in endoscopy, challenges like diagnostic variability and resource allocation persist. Artificial intelligence (AI)-assisted tools have shown promise in improving detection rates and workflow efficiency in gastroenterology, but their impact on clinical decision-making and patient outcomes in acute GIB remains underexplored.
Methods: This systematic review adhered to PRISMA 2020 guidelines. Eligible studies included those evaluating AI-based diagnostic tools in acute GIB cases, with outcomes such as diagnostic accuracy, clinical decision-making changes, or patient outcomes. Databases like PubMed, Sagepub, and Google Scholar were searched using Boolean MeSH keywords. Data extraction focused on study design, AI tool characteristics, diagnostic performance, and clinical impact.
Results: Among 40 included studies, AI demonstrated high diagnostic accuracy, with sensitivities and specificities exceeding 90% in lesion detection. For instance, convolutional neural networks achieved 95.4% accuracy in identifying ulcers and hemorrhages. However, only one study reported a modest improvement in predicting endoscopic intervention needs (AUC: 0.68). While AI reduced reading time by 30% in some studies, its impact on patient outcomes (e.g., mortality, rebleeding) was rarely addressed. Most evidence came from retrospective studies or meta-analyses, with limited prospective or randomized controlled trials.
Discussion: AI enhances diagnostic accuracy and workflow efficiency but lacks robust evidence linking it to improved patient outcomes in acute GIB. Key limitations include methodological heterogeneity, scarce safety data, and a focus on non-acute settings. Prospective studies are needed to evaluate AI's real-world clinical impact.
Conclusion: AI shows potential as an adjunct tool in acute GIB management but requires further validation to confirm its clinical utility. Future research should prioritize patient-centered outcomes and standardized reporting.

Summary

Main Finding

AI-assisted diagnostic tools for acute gastrointestinal bleeding (GIB) substantially improve image/lesion detection accuracy and reading efficiency in endoscopic settings (reported sensitivities and specificities frequently >90%; e.g., CNNs with ~95.4% accuracy for ulcers/hemorrhages), but there is little robust evidence that these diagnostic gains translate into improved patient-centered outcomes (mortality, rebleeding, length of stay). Most evidence is retrospective or diagnostic accuracy studies; prospective randomized evidence in acute GIB is scarce and economic/clinical outcome data are lacking.

Key Points

Scope and evidence base
- Systematic review included 40 studies (search across PubMed, Sagepub, Semantic Scholar, Google Scholar).
- Predominantly retrospective diagnostic studies and meta-analyses; only five RCTs were identified overall, none focused on acute bleeding.
Diagnostic performance and workflow
- High reported diagnostic performance: sensitivities/specificities often >90%.
- Example: convolutional neural networks reported ~95.4% accuracy for detecting ulcers/hemorrhages in capsule endoscopy.
- Some studies reported reading-time reductions (~30%) with AI assistance.
Clinical decision-making and outcomes
- Very limited evidence that AI alters high-level clinical decisions in acute GIB. Only one study reported modest improvement in predicting need for endoscopic intervention (AUC = 0.68).
- Patient-centered outcomes (mortality, rebleeding, length of stay) were rarely addressed.
Limitations and risks
- Methodological heterogeneity (inputs: still images vs. video; tasks: detection vs. risk stratification) complicates synthesis.
- Sparse safety data and limited reporting on false-positive rates and downstream consequences (e.g., unnecessary interventions).
- Most tools studied as adjuncts, not as primary decision drivers—real-world clinical impact remains uncertain.
Recommendations from authors
- Need for prospective RCTs and implementation studies focusing on patient-centered outcomes and standardized reporting.

Data & Methods

Review protocol
- Followed PRISMA 2020 guidelines.
- Used explicit inclusion criteria: adult acute GIB, AI diagnostic tools with clinical implementation, comparison with standard assessment, report of diagnostic/decision/patient outcomes, study designs including RCTs and cohorts (retrospective studies with >10 patients included).
Search and screening (numbers reported)
- Initial hits: PubMed 170; Sagepub 5,054; Semantic Scholar 250; Google Scholar 42.
- Duplicates removed: 126; records excluded by automation: 972.
- Records screened: 4,418; records excluded at screening: 842.
- Reports sought for retrieval: 3,576 (implied by the counts above); reports not retrieved: 1,770, a large attrition; assessed for eligibility: 1,806; excluded for wrong design: 1,766; studies included: 40.
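The screening counts above can be checked for internal consistency with a few subtractions. The sketch below uses only the numbers listed in this summary; the variable names are illustrative, and the intermediate "reports sought" figure is derived by subtraction rather than taken from the review itself.

```python
# PRISMA flow check using the counts listed in this summary.
identified = {
    "PubMed": 170,
    "Sagepub": 5054,
    "Semantic Scholar": 250,
    "Google Scholar": 42,
}

total_identified = sum(identified.values())
removed_before_screening = 126 + 972   # duplicates + records excluded by automation
screened = total_identified - removed_before_screening
sought = screened - 842                # derived here: screened minus excluded at screening
assessed = sought - 1770               # minus reports not retrieved
included = assessed - 1766             # minus reports excluded for wrong design

print(total_identified, screened, sought, assessed, included)
```

Each derived stage matches the reported figure (4,418 screened, 1,806 assessed, 40 included), so the flow is arithmetically consistent. It also shows that roughly half of the reports sought were never retrieved.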
Data extraction focus
- Extracted study design, AI model details (type, inputs, task), patient population, performance metrics (sensitivity, specificity, accuracy, PPV, NPV), comparisons vs. human readers, and any clinical decision or outcome impacts.
Quality appraisal
- Used JBI critical appraisal elements to assess bias across temporal precedence, selection/allocation, confounding, measurement reliability, follow-up, and statistical validity.
Typical outcomes reported in included studies
- Diagnostic accuracy metrics (sens/spec/accuracy), occasional workflow measures (reading time), rare predictive AUCs for intervention need, almost no consistent reporting of patient outcomes or health-economic endpoints.
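The extracted metrics are linked: sensitivity and specificity describe the tool, while PPV, NPV, and accuracy also depend on how common the lesion is in the population being read. The sketch below illustrates this with hypothetical inputs (a 90%/90% tool at three assumed prevalences); the function name and the prevalence values are assumptions for illustration, not figures from the review.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, accuracy) for a binary test at a given prevalence.

    All quantities are proportions in [0, 1]; cell values are population fractions.
    """
    tp = sensitivity * prevalence                # true positives
    fn = (1 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1 - prevalence)          # true negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = tp + tn
    return ppv, npv, accuracy


# Hypothetical prevalences: a curated dataset, an enriched cohort, a screening-like setting.
for prevalence in (0.50, 0.10, 0.02):
    ppv, npv, acc = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}  accuracy={acc:.2f}")
```

Under these assumptions the same tool has a PPV of about 0.90 at 50% prevalence but about 0.50 at 10% prevalence. This is one reason accuracy figures from balanced, curated image sets may overstate how often a positive flag is correct in routine practice, and why sparse false-positive reporting matters.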

Implications for AI Economics

Potential economic benefits
- Efficiency gains: reported reading-time reductions (~30%) suggest potential clinician time savings and higher throughput per endoscopist, which could translate into capacity gains and cost savings per case.
- Triage/value capture: improved detection could enable better triage (prioritizing urgent endoscopy), potentially reducing delays and optimizing resource allocation (staffing, endoscopy slots).
- Market and product implications: demand for integrated AI software for endoscopy devices, image-management systems, and peripheral analytics services; opportunities for vendors to bundle AI with cloud/IT services and maintenance contracts.
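A rough sense of what a ~30% reading-time reduction could mean for capacity can be sketched as follows. The 30% figure comes from the review; the annual volume and minutes per read are hypothetical placeholders, and the calculation assumes reading time is the binding constraint, which may not hold in a real endoscopy service.

```python
def throughput_gain(time_reduction):
    """Relative increase in reads per unit time when each read takes
    (1 - time_reduction) of its former duration."""
    return 1.0 / (1.0 - time_reduction) - 1.0


def hours_saved(reads_per_year, minutes_per_read, time_reduction):
    """Clinician hours freed per year at a fixed reading volume."""
    return reads_per_year * minutes_per_read * time_reduction / 60.0


reduction = 0.30  # reported in some included studies
print(f"throughput gain: {throughput_gain(reduction):.1%}")

# Hypothetical service: 1,000 reads per year at 40 minutes each.
print(f"hours saved per year: {hours_saved(1000, 40, reduction):.0f}")
```

A 30% cut in time per read corresponds to about 43% more reads per hour, not 30%, if nothing else limits throughput. Whether the freed hours become cost savings depends on how that time is actually redeployed.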
Key uncertainties and economic risks
- Lack of demonstrated impact on hard patient outcomes weakens the case for payer reimbursement and hospital capital investment—uncertain cost-effectiveness.
- False positives and diagnostic heterogeneity risk downstream costs (unnecessary interventions, longer stays, liability), reducing net economic benefit.
- Implementation costs: integration with hospital IT/EHR/endoscopy systems, clinician training, validation, regulatory compliance, ongoing maintenance, and data governance—these can be substantial and vary by health system.
- Adoption barriers: clinician acceptance, workflow changes, interoperability, and liability/regulatory clarity will affect uptake speed and ROI.
Labor and workforce effects
- Potential productivity gains for specialists and redistribution of tasks (e.g., fewer routine reads by senior endoscopists), with possible deskilling risks or reallocation of labor toward oversight/implementation roles.
Policy, reimbursement, and investment signals
- Payers and hospitals are unlikely to fund widespread deployment without prospective evidence on patient outcomes and formal cost-effectiveness analyses (cost per QALY, cost per avoided rebleed/admission).
- Regulators and HTA bodies will likely ask for prospective clinical and safety data; value-based reimbursement models may require outcome-linked pricing or conditional coverage.
Recommendations for economic research and decision-makers
- Prioritize prospective clinical trials and implementation studies that include formal health-economic evaluations: capture costs (technology, integration, training), time savings, downstream resource use (procedures, LOS, ICU), and effects on mortality/rebleeding/QALYs.
- Standardize outcome sets for economic evaluations in acute GIB to enable cross-study cost-effectiveness comparisons.
- Conduct scenario and sensitivity analyses that model false-positive rates, variable implementation costs, and heterogeneous health-system capacity.
- Consider pilot rollouts with embedded health-economics endpoints and staged reimbursement tied to demonstrable outcome/improvement thresholds.
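The kind of scenario analysis recommended above can be sketched with an incremental cost-effectiveness ratio (ICER, incremental cost divided by incremental QALYs) evaluated across a range of false-positive rates. Every number below is a hypothetical placeholder, and the cost model is a deliberately simple assumption. In particular, `delta_qaly` stands in for the health gain that the review says has not yet been demonstrated.

```python
def incremental_cost_per_patient(ai_cost, time_saving_value,
                                 false_positive_rate, cost_per_false_positive):
    """Net added cost per patient: technology cost, minus the value of clinician
    time saved, plus the expected downstream cost of false positives."""
    return ai_cost - time_saving_value + false_positive_rate * cost_per_false_positive


def icer(delta_cost, delta_qaly):
    """Incremental cost per QALY gained; None when there is no health gain,
    because the ratio is not meaningful in that case."""
    if delta_qaly <= 0:
        return None
    return delta_cost / delta_qaly


# One-way sensitivity analysis over the false-positive rate (all inputs hypothetical).
for fpr in (0.02, 0.05, 0.10):
    delta_cost = incremental_cost_per_patient(
        ai_cost=50.0,                    # assumed per-patient technology cost
        time_saving_value=20.0,          # assumed value of reading time saved
        false_positive_rate=fpr,
        cost_per_false_positive=1000.0,  # assumed cost of an unnecessary work-up
    )
    ratio = icer(delta_cost, delta_qaly=0.01)  # assumed QALY gain; unproven in the review
    print(f"FPR={fpr:.2f}  incremental cost={delta_cost:.0f}  ICER={ratio:.0f}")
```

In this toy setup the ICER rises steeply with the false-positive rate, and it is undefined whenever the assumed QALY gain is zero. That second case is the core evidentiary gap: without outcome data, the denominator cannot be estimated.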
Short guidance for investors and hospital leaders
- Investors: prioritize AI companies that can demonstrate prospective, clinically meaningful improvements and have implementation pathways (regulatory clearance, interoperability).
- Hospital leaders: evaluate AI adoption via pilot studies with local ROI models; require outcome measurement, integration cost estimates, and clinician workflow impact before scaling.

Summary takeaway: AI diagnostic tools for acute GIB show promising technical performance and workflow efficiencies, but from an economics and policy perspective the evidence is presently insufficient to justify wide-scale investment or reimbursement. Prospective trials that tie diagnostic gains to patient outcomes and formal cost-effectiveness are needed first.

Assessment

Paper Type: review_meta
Evidence Strength: low. Strong diagnostic accuracy is reported, but almost entirely from retrospective/curated datasets; there is little prospective or randomized evidence linking improved detection to patient-centered outcomes or costs, so causal/economic claims are unproven.
Methods Rigor: medium. The review follows PRISMA-2020 with systematic search and standardized data extraction, but primary studies are heterogeneous, mostly retrospective, often single-center or curated, and few report clinical outcomes or economic data, limiting inferential strength.
Sample: Systematic review of 40 studies of AI diagnostic tools for acute gastrointestinal bleeding, dominated by retrospective analyses of image and record datasets (including CNN-based image classifiers), reporting diagnostic metrics (sensitivity, specificity, AUC, accuracy) and some workflow measures (reading time); only a few prospective studies and virtually no randomized trials or full health-economic evaluations.
Themes: productivity, adoption
Generalizability
- Predominance of retrospective, curated image datasets limits applicability to real-world acute-care workflows.
- Heterogeneous patient populations, settings, and model types (single-center studies common).
- Limited or absent prospective validation and external validation on diverse populations.
- Rapid evolution of AI models means older studies may not reflect current performance.
- Sparse reporting of implementation context (hardware, integration, clinician mix) reduces transferability.

Claims (12)

Claim	Category	Direction	Confidence	Outcome	Details
This systematic review adhered to PRISMA 2020 guidelines.	Research Productivity	null_result	high	methodological reporting standard (PRISMA adherence)	n=40 0.12
Among 40 included studies, AI demonstrated high diagnostic accuracy, with sensitivities and specificities exceeding 90% in lesion detection.	Output Quality	positive	medium	diagnostic performance (sensitivity and specificity for lesion detection)	n=40 sensitivities and specificities > 90% 0.07
Convolutional neural networks achieved 95.4% accuracy in identifying ulcers and hemorrhages.	Output Quality	positive	high	accuracy of CNN in identifying ulcers and hemorrhages	accuracy = 95.4% 0.12
Only one study reported a modest improvement in predicting endoscopic intervention needs (AUC: 0.68).	Decision Quality	mixed	high	prediction of need for endoscopic intervention (AUC)	AUC = 0.68 0.12
AI reduced reading time by 30% in some studies.	Task Completion Time	positive	medium	reading time / workflow efficiency	reading time reduced by 30% 0.07
The impact of AI on patient outcomes (e.g., mortality, rebleeding) was rarely addressed.	Consumer Welfare	null_result	high	patient outcomes (mortality, rebleeding)	0.12
Most evidence came from retrospective studies or meta-analyses, with limited prospective or randomized controlled trials.	Research Productivity	null_result	high	study design distribution (retrospective vs prospective/RCT)	n=40 majority retrospective or meta-analyses; few prospective/RCTs 0.12
AI enhances diagnostic accuracy and workflow efficiency but lacks robust evidence linking it to improved patient outcomes in acute GIB.	Output Quality	mixed	medium	diagnostic accuracy, workflow efficiency, and patient outcomes	n=40 improves diagnostic accuracy and workflow efficiency; limited evidence on patient outcomes 0.07
Key limitations in the literature include methodological heterogeneity, scarce safety data, and a focus on non-acute settings.	Research Productivity	null_result	medium	quality and applicability of evidence (heterogeneity, safety reporting, setting)	n=40 key limitations: methodological heterogeneity, scarce safety data, non-acute focus 0.07
AI-assisted tools have shown promise in improving detection rates and workflow efficiency in gastroenterology.	Output Quality	positive	medium	detection rates and workflow efficiency	0.07
Prospective studies are needed to evaluate AI's real-world clinical impact in acute GIB.	Research Productivity	null_result	speculative	need for prospective evaluation of clinical impact (recommendation)	recommendation: need prospective studies to evaluate real-world clinical impact 0.01
AI shows potential as an adjunct tool in acute GIB management but requires further validation to confirm its clinical utility.	Consumer Welfare	mixed	medium	overall clinical utility in acute GIB management	AI has potential as adjunct tool but requires further validation for clinical utility 0.07