The Commonplace

AI projects accounted for 15.9% of NIH funding in 2025 and attracted a 13.4% funding premium, but most work remains in R&D with only 14.7% clinically deployed and scant focus on health disparities (5.7%).

An Analysis of Artificial Intelligence Adoption in NIH-Funded Research
Navapat Nananukul, Mayank Kejriwal · April 08, 2026
arXiv · descriptive · medium evidence · 7/10 relevance
In the 2025 NIH portfolio, 15.9% of projects were classified as AI-related and received a 13.4% funding premium, yet 79% remained in research/development, only 14.7% engaged in clinical deployment, and just 5.7% addressed health disparities.

Understanding the landscape of artificial intelligence (AI) and machine learning (ML) adoption across the National Institutes of Health (NIH) portfolio is critical for research funding strategy, institutional planning, and health policy. The advent of large language models (LLMs) has fundamentally transformed research landscape analysis, enabling researchers to perform large-scale semantic extraction from thousands of unstructured research documents. In this paper, we illustrate a human-in-the-loop research methodology for LLMs to automatically classify and summarize research descriptions at scale. Using our methodology, we present a comprehensive analysis of 58,746 NIH-funded biomedical research projects from 2025. We show that: (1) AI constitutes 15.9% of the NIH portfolio with a 13.4% funding premium, concentrated in discovery, prediction, and data integration across disease domains; (2) a critical research-to-deployment gap exists, with 79% of AI projects remaining in research/development stages while only 14.7% engage in clinical deployment or implementation; and (3) health disparities research is severely underrepresented at just 5.7% of AI-funded work despite its importance to NIH's equity mission. These findings establish a framework for evidence-based policy interventions to align the NIH AI portfolio with health equity goals and strategic research priorities.

Summary

Main Finding

Using a human-in-the-loop LLM pipeline to analyze 58,746 NIH-funded projects (FY2025), the authors find that AI/ML is a substantive element in 15.9% of projects (9,363). AI projects receive a measurable funding premium (13.4% higher funding than non-AI projects), but AI work is heavily concentrated in a few domains: cancer, aging, and mental health together account for 50.1% of AI funding. A large research-to-deployment gap exists: 79.0% of AI projects remain at research/development stages while only 14.7% engage in clinical deployment/implementation. Health disparities research is severely underrepresented, comprising only 5.7% of AI-funded work. University collaboration in NIH AI projects is concentrated and unequal: the network has 79 universities and 191 edges (largest connected component: 48 nodes, 158 edges), and its modular communities are linked by a small number of bridging institutions.
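As a quick arithmetic check on the headline figures, the sketch below recomputes the AI share from the two reported counts and shows one plausible way a funding premium could be defined. The premium formula (relative difference in mean award size) is an assumption, since the summary does not state the exact definition, and the award amounts are hypothetical values chosen only to illustrate the formula.

```python
# Counts reported in the summary.
TOTAL_PROJECTS = 58_746
AI_PROJECTS = 9_363


def share_pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def funding_premium_pct(mean_ai: float, mean_non_ai: float) -> float:
    """Assumed definition: relative difference in mean award size, in percent."""
    return 100.0 * (mean_ai - mean_non_ai) / mean_non_ai


ai_share = share_pct(AI_PROJECTS, TOTAL_PROJECTS)
print(f"AI share: {ai_share:.1f}%")

# Hypothetical mean award sizes (not from the paper), used only as an example.
example_premium = funding_premium_pct(mean_ai=567_000.0, mean_non_ai=500_000.0)
print(f"Example premium: {example_premium:.1f}%")
```

The recomputed share rounds to the reported 15.9%, so the two counts and the percentage are mutually consistent.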

Key Points

  • Portfolio prevalence and funding
    • 58,746 NIH projects analyzed (FY2025 snapshot).
    • 9,363 projects labeled AI-relevant → 15.9% of the portfolio.
    • AI projects receive a 13.4% funding premium vs non-AI projects.
  • Concentration and equity
    • Cancer, aging, and mental health together receive 50.1% of AI funding.
    • Health disparities / minority & rural health: only 5.7% of AI-funded work.
  • Translation gap
    • 79.0% of AI projects focused on research/development.
    • 14.7% involve clinical implementation/deployment.
    • Translation gaps are especially acute in health disparities research.
  • Collaboration structure
    • University collaboration network: 79 universities, 191 weighted edges.
    • Largest connected component: 48 nodes, 158 edges; six communities detected via Louvain modularity.
    • Network shows heavy-tailed connectivity, concentrated betweenness (few hubs), modular communities bridged by key institutions.
  • Methodological innovation
    • Two-step LLM pipeline (GPT-4o-mini): conservative zero-shot screening, then fixed-schema rubric coding for AI-positive projects.
    • Human-in-the-loop for prompt/rubric refinement, recoding rules, and validation.
    • Outputs are analysis-ready JSON/CSV with reproducible Python scripts.
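The collaboration-structure bullets above describe a weighted university network. A minimal sketch of how such a network could be built from project records follows, assuming each project lists its participating universities and that an edge's weight is the number of projects two universities share. The project data and university names are hypothetical, and the sketch uses only the standard library; it covers edge construction, weighted degree, and the largest connected component, and leaves Louvain community detection to a graph library.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical project -> participating universities (not data from the paper).
projects = {
    "P1": ["Univ A", "Univ B"],
    "P2": ["Univ A", "Univ B", "Univ C"],
    "P3": ["Univ D", "Univ E"],
    "P4": ["Univ F"],  # single-institution project: contributes no edge
}


def build_edges(project_orgs):
    """Count co-occurrences of each unordered university pair across projects."""
    edges = Counter()
    for orgs in project_orgs.values():
        for u, v in combinations(sorted(set(orgs)), 2):
            edges[(u, v)] += 1
    return edges


def weighted_degree(edges):
    """Sum of incident edge weights per node (collaboration volume)."""
    degree = Counter()
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    return degree


def largest_component(edges):
    """Return the node set of the largest connected component (iterative DFS)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


edges = build_edges(projects)
degree = weighted_degree(edges)
lcc = largest_component(edges)
```

In this construction, universities that never co-occur with another institution do not appear in the network at all, which is one reason a network's node count can be smaller than the number of funded institutions.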

Data & Methods

  • Data source
    • NIH RePORTER project records; FY2025 snapshot; deduplicated to 58,746 projects.
    • Fields used: title, abstract, project terms, funding amount, administering institute/center, funding mechanism, organization name/type.
  • Two-step LLM pipeline
    • Step 1 — AI screening: zero-shot GPT-4o-mini classification on title + abstract + terms; conservative threshold to flag substantive AI/ML work (vs generic computational/statistical work). Result: 9,363 AI-positive projects.
    • Step 2 — Rubric coding: AI-positive projects processed with a controlled JSON rubric returning structured fields (e.g., what_ai_used_for, ai_contribution, ai_role, primary_focus_areas, application_domain, type_of_aiml, data_type, theme_note).
    • Prompt constraints: exact keys, closed vocabularies, JSON-only output; free-text only when domain = Other.
  • Human-in-the-loop process
    • Investigators defined research questions, validated outputs, refined recoding logic, and reran analyses iteratively.
    • Implementation packaged as reproducible Python scripts (Python 3.10+); inputs/outputs in JSONL/CSV; two modes for rerun (full LLM passes or analysis-only).
  • Network analysis
    • Constructed undirected, weighted university collaboration network where edge weight = number of co-occurrences in projects.
    • Metrics: weighted degree (collaboration volume), betweenness centrality, clustering coefficient, assortativity, modularity, robustness under hub removal.
    • Community detection via Louvain modularity optimization (reported on largest connected component).
  • Reproducibility & scope
    • Code/data pipeline organized into staged scripts with requirements.txt; 249 output files (tables and figures).
    • Snapshot is cross-sectional (FY2025) and dependent on project abstracts/metadata quality and LLM classification.
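The prompt constraints listed above (exact keys, closed vocabularies, JSON-only output) lend themselves to an automated check on each model response. The sketch below is one way such a check might look, not the authors' implementation. The key names come from the rubric description above; the vocabulary in `APPLICATION_DOMAINS` is hypothetical, and treating `theme_note` as the free-text field is an assumed reading of "free-text only when domain = Other".

```python
import json

# Keys named in the summary's rubric description.
REQUIRED_KEYS = {
    "what_ai_used_for", "ai_contribution", "ai_role", "primary_focus_areas",
    "application_domain", "type_of_aiml", "data_type", "theme_note",
}

# Hypothetical closed vocabulary; the paper's actual label set is not given here.
APPLICATION_DOMAINS = {"Cancer", "Aging", "Mental Health", "Other"}


def validate_record(raw: str) -> list[str]:
    """Return a list of problems found in one JSON-only model response."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]

    problems = []
    keys = set(record)
    if keys != REQUIRED_KEYS:
        missing = sorted(REQUIRED_KEYS - keys)
        extra = sorted(keys - REQUIRED_KEYS)
        problems.append(f"key mismatch: missing={missing}, extra={extra}")

    domain = record.get("application_domain")
    if not isinstance(domain, str) or domain not in APPLICATION_DOMAINS:
        problems.append(f"application_domain outside vocabulary: {domain!r}")

    # Assumption: theme_note is the free-text field and must be empty
    # unless the domain is "Other".
    if domain != "Other" and record.get("theme_note"):
        problems.append("free text supplied although domain is not Other")
    return problems


# A well-formed record and a malformed one, both hypothetical.
good = json.dumps({**{key: "" for key in REQUIRED_KEYS}, "application_domain": "Cancer"})
bad = json.dumps({"application_domain": "Cancer", "theme_note": "extra prose"})
```

Responses that fail validation could be re-queried or routed to a human reviewer, which fits the human-in-the-loop process described above.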

Implications for AI Economics

  • Public funding allocation and social returns
    • The 13.4% funding premium on AI projects may reflect higher expected social or private returns, higher project costs, or stronger competition for AI-relevant work. Policymakers should assess whether this premium aligns with marginal social benefits across disease areas.
    • Heavy concentration in a few domains suggests potential over-allocation to areas with existing infrastructure and private/commercial interest, risking underinvestment where social returns (e.g., equity gains) might be larger but less monetizable.
  • Market failure & equity rationale for targeted intervention
    • The low share of health-disparities-focused AI work (5.7%) indicates a likely market failure: private incentives and data/infrastructure constraints lead to under-provision of socially valuable AI research. This justifies targeted public interventions (earmarked grants, infrastructure subsidies) to correct distributional gaps.
  • Translation and diffusion economics
    • Large research-to-deployment gap (79% R&D vs 14.7% deployment) points to underfunding or misaligned incentives for implementation science and commercialization/operationalization. From an economic perspective, this reduces realized social returns from AI investments and slows diffusion of beneficial technologies.
    • Policies that reduce frictions for deployment (implementation grants, regulatory support, reimbursement pathways) can increase realized welfare from existing R&D.
  • Institutional capacity and network effects
    • Collaboration inequality (few hubs, modular communities) can create network bottlenecks limiting knowledge diffusion to smaller institutions and underserved regions. Strengthening collaboration capacity and bridging institutions can accelerate technology spillovers and broaden benefits.
  • Cost-effective monitoring and policy feedback
    • The LLM-enabled, reproducible portfolio-monitoring pipeline demonstrates a low-cost method for ongoing evaluation of public R&D portfolios. Regular, automated monitoring can enable responsive reallocation and targeted program design based on near-real-time evidence.
  • Policy levers suggested (economic framing)
    • Rebalance portfolio incentives: create funding streams that prioritize underrepresented domains with high social returns (health disparities, rural health, emerging infectious threats).
    • Increase translational funding: dedicated implementation/scale-up grants, milestone-contingent funding, or blended public–private mechanisms to share deployment risk.
    • Build infrastructure: invest in data generation/curation, interoperable platforms, and community partnerships to reduce fixed-cost barriers for equity-oriented AI.
    • Incentivize diverse collaborations: seed collaboration grants or network-building funds to reduce centralization and encourage knowledge spillovers.
    • Workforce and capacity-building: fund training programs that combine AI methods with implementation science and equity-focused practice.
  • Measurement caveats relevant to economic interpretation
    • Cross-sectional snapshot (FY2025) — dynamics over time (entry/exit, maturation) are not captured here.
    • Classification depends on LLM decisions (GPT-4o-mini) and the conservative screening rubric; measurement error or systematic bias in labeling could affect prevalence and funding-premium estimates.
    • Funding premium is an average comparison and may be confounded by project scope, multi-year awards, or differing cost structures across fields.
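The last caveat, that an average funding premium may be confounded by cost structure, can be illustrated by comparing a pooled premium with premiums computed within each funding mechanism. The sketch below uses hypothetical award records (not NIH data) and assumes the premium is the percent difference in mean award size. If AI projects are over-represented in larger mechanisms, the pooled premium can far exceed any within-mechanism premium.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (funding mechanism, is_ai, award in USD). Not NIH data.
awards = [
    ("R01", True, 520_000), ("R01", False, 500_000), ("R01", False, 500_000),
    ("U01", True, 1_050_000), ("U01", True, 1_000_000), ("U01", False, 1_000_000),
    ("R21", False, 200_000), ("R21", False, 200_000), ("R21", True, 200_000),
]


def premium_pct(rows):
    """Percent difference between mean AI and mean non-AI award size.

    Each group of rows must contain at least one AI and one non-AI award.
    """
    ai = [amount for _, is_ai, amount in rows if is_ai]
    non_ai = [amount for _, is_ai, amount in rows if not is_ai]
    return 100.0 * (mean(ai) - mean(non_ai)) / mean(non_ai)


def premium_by_mechanism(rows):
    """Compute the same premium separately within each funding mechanism."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return {mechanism: premium_pct(group) for mechanism, group in groups.items()}


pooled = premium_pct(awards)
stratified = premium_by_mechanism(awards)

print(f"Pooled premium: {pooled:.1f}%")
for mechanism, value in sorted(stratified.items()):
    print(f"{mechanism}: {value:.1f}%")
```

In this toy data the pooled premium is driven mostly by AI projects clustering in the larger mechanism, which is the kind of composition effect the caveat warns about.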

Overall, the paper provides actionable, portfolio-level evidence for economists and policymakers: AI is a significant and relatively well-funded component of NIH research, but concentration, translation gaps, and underinvestment in equity-relevant areas point to market and institutional frictions that public policy can address to increase social returns and equitable impact from AI in biomedical research.

Assessment

Paper Type: descriptive

Evidence Strength: medium. Large, near-comprehensive sample of NIH project descriptions (58,746) supports descriptive claims about portfolio composition and funding patterns, but results depend on automated LLM classification and project-description metadata which can mislabel AI content or deployment status and do not establish causal relationships.

Methods Rigor: medium. The human-in-the-loop LLM approach enables scalable semantic extraction and the sample size is strong, but the summary provides no detail here on validation metrics (precision/recall), sampling/frame exclusions, handling of ambiguous descriptions, or robustness checks across LLM prompts/models, raising the risk of misclassification and reproducibility concerns.

Sample: 58,746 NIH-funded biomedical research project descriptions from the 2025 portfolio (unstructured project descriptions and metadata), analyzed with an LLM-based human-in-the-loop classifier to identify AI-related projects, deployment stage, funding amounts and thematic categories.

Themes: adoption, innovation

Generalizability

  • Limited to the NIH biomedical research portfolio in a single year (2025); findings may not generalize to other funders, countries, or other years.
  • Relies on project descriptions and metadata which may not fully reflect actual methods, deployment status, or outcomes (description vs. practice gap).
  • Definitions of 'AI' and 'deployment' are sensitive to classifier rules and prompts; evolving terminology may change classification over time.
  • Potential misclassification or bias from LLMs, and unknown validation/accuracy metrics limit confidence in fine-grained categories (e.g., health disparities coding).
  • Does not measure economic outcomes (productivity, wages, firm performance), so extrapolation to broader AI economic impacts is indirect.
  • Funding premium may reflect selection or reporting differences rather than causal valuation of AI by NIH reviewers.

Claims (8)

  • We present a comprehensive analysis of 58,746 NIH-funded biomedical research projects from 2025.
    • Other · Direction: positive · Confidence: high
    • Outcome: number of NIH-funded projects analyzed
    • Details: n=58746 · 58,746 · 0.18
  • AI constitutes 15.9% of the NIH portfolio.
    • Adoption Rate · Direction: positive · Confidence: high
    • Outcome: share of NIH projects that are AI-related
    • Details: n=58746 · 15.9% · 0.18
  • AI projects receive a 13.4% funding premium.
    • Adoption Rate · Direction: positive · Confidence: high
    • Outcome: relative funding amount for AI projects (premium vs non-AI)
    • Details: n=58746 · 13.4% funding premium · 0.18
  • AI research is concentrated in discovery, prediction, and data integration across disease domains.
    • Research Productivity · Direction: positive · Confidence: high
    • Outcome: topic distribution of AI projects (domains: discovery, prediction, data integration)
    • Details: n=58746 · 0.18
  • A critical research-to-deployment gap exists: 79% of AI projects remain in research/development stages while only 14.7% engage in clinical deployment or implementation.
    • Adoption Rate · Direction: negative · Confidence: high
    • Outcome: stage of project (research/development vs clinical deployment/implementation)
    • Details: n=58746 · 79% research/development; 14.7% clinical deployment/implementation · 0.18
  • Health disparities research is severely underrepresented at just 5.7% of AI-funded work.
    • Inequality · Direction: negative · Confidence: high
    • Outcome: share of AI-funded projects focused on health disparities
    • Details: n=58746 · 5.7% of AI-funded work · 0.18
  • We illustrate a human-in-the-loop research methodology for LLMs to automatically classify and summarize research descriptions at scale.
    • Research Productivity · Direction: positive · Confidence: high
    • Outcome: ability to classify and summarize research descriptions at scale
    • Details: n=58746 · 0.18
  • These findings establish a framework for evidence-based policy interventions to align the NIH AI portfolio with health equity goals and strategic research priorities.
    • Governance And Regulation · Direction: positive · Confidence: medium
    • Outcome: use of analysis to support policy interventions
    • Details: n=58746 · 0.02

Notes