The Commonplace

AI systems detect gastrointestinal bleeding lesions with high accuracy in retrospective studies and can speed clinician review, but there is scant prospective evidence that these diagnostic gains translate into better patient outcomes or cost-effectiveness, leaving economic value uncertain.

How Do AI-Assisted Diagnostic Tools Impact Clinical Decision-Making And Patient Outcomes In Acute Gastrointestinal Bleeding Cases? : A Comprehensive Systematic Review
Gafrinda Kautsari, Luluk Aflakah, R. Nurlaili · Fetched March 15, 2026 · International journal of medical science and health research
semantic_scholar · review_meta · low evidence · 7/10 relevance
AI diagnostics for acute gastrointestinal bleeding show high retrospective accuracy and possible reading-time reductions, but evidence is insufficient that these gains improve patient outcomes or are cost-effective in practice.

Introduction: Acute gastrointestinal bleeding (GIB) is a life-threatening condition requiring prompt diagnosis and intervention. Despite advancements in endoscopy, challenges like diagnostic variability and resource allocation persist. Artificial intelligence (AI)-assisted tools have shown promise in improving detection rates and workflow efficiency in gastroenterology, but their impact on clinical decision-making and patient outcomes in acute GIB remains underexplored.
Methods: This systematic review adhered to PRISMA 2020 guidelines. Eligible studies included those evaluating AI-based diagnostic tools in acute GIB cases, with outcomes such as diagnostic accuracy, clinical decision-making changes, or patient outcomes. Databases like PubMed, Sagepub, and Google Scholar were searched using Boolean MeSH keywords. Data extraction focused on study design, AI tool characteristics, diagnostic performance, and clinical impact.
Results: Among 40 included studies, AI demonstrated high diagnostic accuracy, with sensitivities and specificities exceeding 90% in lesion detection. For instance, convolutional neural networks achieved 95.4% accuracy in identifying ulcers and hemorrhages. However, only one study reported a modest improvement in predicting endoscopic intervention needs (AUC: 0.68). While AI reduced reading time by 30% in some studies, its impact on patient outcomes (e.g., mortality, rebleeding) was rarely addressed. Most evidence came from retrospective studies or meta-analyses, with limited prospective or randomized controlled trials.
Discussion: AI enhances diagnostic accuracy and workflow efficiency but lacks robust evidence linking it to improved patient outcomes in acute GIB. Key limitations include methodological heterogeneity, scarce safety data, and a focus on non-acute settings. Prospective studies are needed to evaluate AI's real-world clinical impact.
Conclusion: AI shows potential as an adjunct tool in acute GIB management but requires further validation to confirm its clinical utility. Future research should prioritize patient-centered outcomes and standardized reporting.

Summary

Main Finding

AI-based diagnostic tools for acute gastrointestinal bleeding (GIB) report high lesion-detection accuracy (sensitivities and specificities often >90%; for example, CNNs reached 95.4% accuracy) and, in some studies, about 30% shorter reading times. However, there is little high-quality evidence that these diagnostic gains translate into improved patient-centered outcomes (mortality, rebleeding) or clear cost-effectiveness. Most evidence is retrospective; prospective trials and economic evaluations are lacking.

Key Points

  • Scope and evidence base

    • Systematic review of 40 studies following PRISMA 2020.
    • Eligible studies evaluated AI diagnostics in acute GIB with outcomes including diagnostic accuracy, changes in clinical decision-making, and patient outcomes.
    • Majority of studies were retrospective analyses or meta-analyses; few prospective studies or randomized controlled trials.
  • Diagnostic performance and workflow effects

    • High reported diagnostic accuracy: sensitivities and specificities frequently >90%.
    • Example: convolutional neural networks (CNNs) reported 95.4% accuracy for identifying ulcers and hemorrhages.
    • Some studies reported workflow gains (e.g., up to ~30% reduction in reading/interpretation time).
  • Limited impact on clinical decision-making and outcomes

    • Only one study showed a modest improvement in predicting need for endoscopic intervention (AUC = 0.68).
    • Few studies assessed hard patient outcomes (mortality, rebleeding, length of stay); therefore linkage from detection to outcomes is weak or unproven.
  • Methodological and safety gaps

    • Heterogeneity in study designs, populations, AI models, and outcome measures.
    • Sparse reporting of safety, false-positive harms (unnecessary procedures), and implementation challenges.
    • Predominant focus on non-acute or controlled datasets rather than real-world acute care workflows.
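
The false-positive concern above can be made concrete with Bayes' rule. The following is a minimal Python sketch and is not taken from the review: the function name `predictive_values` and the prevalence values are illustrative assumptions, and the 90% sensitivity and specificity simply mirror the threshold the review says studies often exceed.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    # Expected fraction of all patients falling in each confusion-matrix cell.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    ppv = tp / (tp + fp)  # share of AI-flagged cases that truly have a lesion
    npv = tn / (tn + fn)  # share of AI-cleared cases that truly have no lesion
    return ppv, npv


# Hypothetical inputs: 90% sensitivity and specificity at three assumed prevalences.
for prevalence in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumed inputs, the positive predictive value is 0.90 at 50% prevalence but falls to roughly 0.32 at 5% prevalence. A tool that looks strong on a lesion-enriched dataset could therefore still generate many unnecessary follow-up procedures in a lower-prevalence population.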

Data & Methods

  • Review methodology

    • PRISMA 2020–compliant systematic review.
    • Databases searched included PubMed, Sagepub, and Google Scholar using Boolean MeSH keywords relevant to AI and gastrointestinal bleeding.
    • Inclusion criteria: studies of AI diagnostic tools applied to acute GIB cases with outcomes on diagnostic accuracy, clinical decision change, or patient outcomes.
    • Data extraction fields: study design, AI tool characteristics (model type, training/validation details), diagnostic performance metrics (sensitivity, specificity, AUC, accuracy), workflow measures (reading time), and any reported clinical impacts or harms.
  • Aggregate findings

    • 40 studies included.
    • Diagnostic metrics commonly reported; performance often strong in retrospective image/record datasets.
    • Very limited prospective, randomized, or health-economic data.
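
For readers less familiar with the extracted metrics, the sketch below shows one way sensitivity, specificity, accuracy, and AUC can be computed from reference labels and model scores. It is an illustrative Python example on made-up data; the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions and do not come from the review or its included studies.

```python
from itertools import product


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties count as half a win (the Mann-Whitney formulation of AUC).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = bleeding lesion present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
print(metrics)
print(f"AUC = {auc(y_true, scores):.3f}")
```

Read this way, an AUC of 0.68 means the model ranks a randomly chosen patient who needed intervention above a randomly chosen patient who did not about 68% of the time, where 50% would be chance.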

Implications for AI Economics

  • Potential economic benefits

    • Efficiency gains: reported reductions in reading/interpretation time (~30%) could lower clinician labor costs or increase throughput in endoscopy/radiology workflows.
    • Diagnostic improvements: higher lesion detection could reduce missed diagnoses, potentially avoiding downstream costs from complications if linked to better outcomes.
    • Resource allocation: AI triage tools could prioritize endoscopy resources (urgent vs non-urgent), improving utilization.
  • Sources of economic uncertainty and risks

    • Uncertain effect on patient-centered outcomes (mortality, rebleeding, length of stay) makes cost-effectiveness indeterminate.
    • False positives: increased detection sensitivity could raise unnecessary endoscopies/interventions, adding costs and harms.
    • Implementation costs: software acquisition, integration with electronic health records, hardware, training, validation in local populations, maintenance, and regulatory/compliance costs.
    • Generalizability: performance mostly from retrospective/curated datasets may not hold in varied real-world acute-care settings; poor generalizability can reduce expected economic benefits.
  • Recommended economic evaluations and data needs

    • Conduct prospective studies and randomized trials that collect clinical outcomes and resource-use data (procedures, length of stay, readmissions, adverse events).
    • Build decision-analytic models (e.g., Markov or microsimulation) to estimate the incremental cost-effectiveness ratio (ICER, cost per QALY) under plausible effect sizes and implementation costs.
    • Model inputs to collect/estimate:
      • Diagnostic sensitivity/specificity in real-world acute settings
      • Effect of improved detection on rates of interventions, complications, rebleeding, and mortality
      • Unit costs: endoscopy, hospitalization per day, clinician time, AI deployment and maintenance
      • Downstream costs of false positives (unnecessary procedures, complications)
      • Uptake and workflow adoption rates
    • Include scenario and threshold analyses to identify when AI becomes cost-effective (e.g., minimum improvement in adverse-event reduction required given implementation cost).
  • Policy and adoption considerations

    • Prioritize trials that measure both clinical outcomes and resource use to enable full economic evaluation.
    • Use standardized reporting (CONSORT-AI, TRIPOD-AI) and collect safety/adverse event data to inform payer coverage and regulatory decisions.
    • Health systems should pilot deployments with embedded economic evaluation, monitoring throughput, clinician time, patient outcomes, and downstream costs before scaling.
    • Reimbursement and liability frameworks should account for changes in clinical decision-making and allocation of responsibilities.
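
The ICER and threshold analysis recommended above can be sketched in a few lines. This is a simplified Python illustration, not a model from the review: the function names, the per-patient incremental cost of 200, the QALY gain of 0.01, and the willingness-to-pay value of 50,000 per QALY are all hypothetical placeholders chosen only to show the arithmetic.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if delta_qaly <= 0:
        # With no QALY gain the ratio is undefined or not interpretable this way.
        raise ValueError("ICER here assumes a positive QALY gain")
    return delta_cost / delta_qaly


def min_qaly_gain(delta_cost, willingness_to_pay):
    """Threshold analysis: smallest QALY gain at which ICER <= willingness to pay."""
    return delta_cost / willingness_to_pay


# Hypothetical per-patient values (assumptions, not data from the review).
extra_cost_per_patient = 200.0      # AI licence, integration, training, maintenance
qaly_gain_per_patient = 0.01        # assumed benefit from earlier or better detection
willingness_to_pay = 50_000.0       # assumed payer threshold per QALY

ratio = icer(extra_cost_per_patient, qaly_gain_per_patient)
threshold = min_qaly_gain(extra_cost_per_patient, willingness_to_pay)

print(f"ICER: {ratio:,.0f} per QALY")
print(f"Cost-effective at this threshold: {ratio <= willingness_to_pay}")
print(f"Minimum QALY gain needed per patient: {threshold:.4f}")
```

The point of the sketch is that the QALY gain is the input the current literature cannot supply. Without prospective outcome data, only the threshold calculation (how large a benefit would be needed) can be done with any confidence.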

Bottom line: AI for acute GIB shows promising diagnostic and efficiency gains that could be economically valuable, but the lack of prospective outcome and resource-use data prevents robust cost-effectiveness conclusions. Future work should combine randomized or prospective clinical studies with formal health-economic modeling and standardized outcome reporting.

Assessment

Paper Type: review_meta

Evidence Strength: low. Strong diagnostic accuracy is reported, but almost entirely from retrospective/curated datasets; there is little prospective or randomized evidence linking improved detection to patient-centered outcomes or costs, so causal/economic claims are unproven.

Methods Rigor: medium. The review follows PRISMA-2020 with systematic search and standardized data extraction, but primary studies are heterogeneous, mostly retrospective, often single-center or curated, and few report clinical outcomes or economic data, limiting inferential strength.

Sample: Systematic review of 40 studies of AI diagnostic tools for acute gastrointestinal bleeding, dominated by retrospective analyses of image and record datasets (including CNN-based image classifiers), reporting diagnostic metrics (sensitivity, specificity, AUC, accuracy) and some workflow measures (reading time); only a few prospective studies and virtually no randomized trials or full health-economic evaluations.

Themes: productivity, adoption

Generalizability:

  • Predominance of retrospective, curated image datasets limits applicability to real-world acute-care workflows
  • Heterogeneous patient populations, settings, and model types (single-center studies common)
  • Limited or absent prospective validation and external validation on diverse populations
  • Rapid evolution of AI models means older studies may not reflect current performance
  • Sparse reporting of implementation context (hardware, integration, clinician mix) reduces transferability

Claims (12)

| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| This systematic review adhered to PRISMA 2020 guidelines. [Research Productivity] | null_result | high | methodological reporting standard (PRISMA adherence) | n=40; 0.12 |
| Among 40 included studies, AI demonstrated high diagnostic accuracy, with sensitivities and specificities exceeding 90% in lesion detection. [Output Quality] | positive | medium | diagnostic performance (sensitivity and specificity for lesion detection) | n=40; sensitivities and specificities > 90%; 0.07 |
| Convolutional neural networks achieved 95.4% accuracy in identifying ulcers and hemorrhages. [Output Quality] | positive | high | accuracy of CNN in identifying ulcers and hemorrhages | accuracy = 95.4%; 0.12 |
| Only one study reported a modest improvement in predicting endoscopic intervention needs (AUC: 0.68). [Decision Quality] | mixed | high | prediction of need for endoscopic intervention (AUC) | AUC = 0.68; 0.12 |
| AI reduced reading time by 30% in some studies. [Task Completion Time] | positive | medium | reading time / workflow efficiency | reading time reduced by 30%; 0.07 |
| The impact of AI on patient outcomes (e.g., mortality, rebleeding) was rarely addressed. [Consumer Welfare] | null_result | high | patient outcomes (mortality, rebleeding) | 0.12 |
| Most evidence came from retrospective studies or meta-analyses, with limited prospective or randomized controlled trials. [Research Productivity] | null_result | high | study design distribution (retrospective vs prospective/RCT) | n=40; majority retrospective or meta-analyses, few prospective/RCTs; 0.12 |
| AI enhances diagnostic accuracy and workflow efficiency but lacks robust evidence linking it to improved patient outcomes in acute GIB. [Output Quality] | mixed | medium | diagnostic accuracy, workflow efficiency, and patient outcomes | n=40; improves diagnostic accuracy and workflow efficiency, limited evidence on patient outcomes; 0.07 |
| Key limitations in the literature include methodological heterogeneity, scarce safety data, and a focus on non-acute settings. [Research Productivity] | null_result | medium | quality and applicability of evidence (heterogeneity, safety reporting, setting) | n=40; key limitations: methodological heterogeneity, scarce safety data, non-acute focus; 0.07 |
| AI-assisted tools have shown promise in improving detection rates and workflow efficiency in gastroenterology. [Output Quality] | positive | medium | detection rates and workflow efficiency | 0.07 |
| Prospective studies are needed to evaluate AI's real-world clinical impact in acute GIB. [Research Productivity] | null_result | speculative | need for prospective evaluation of clinical impact (recommendation) | recommendation: need prospective studies to evaluate real-world clinical impact; 0.01 |
| AI shows potential as an adjunct tool in acute GIB management but requires further validation to confirm its clinical utility. [Consumer Welfare] | mixed | medium | overall clinical utility in acute GIB management | AI has potential as adjunct tool but requires further validation for clinical utility; 0.07 |

Notes