AI in research proposals delivers only modest immediate publication gains concentrated among top projects, but substantially reorganizes how science is done: AI-enabled projects allocate more to people, form larger teams, and broaden ideation and experimental activities, signaling potential future productivity gains as models improve.

Artificial Intelligence in Science: Returns, Reallocation, and Reorganization

Moh Hosseinioun, Brian Uzzi, Henrik Barslund Fosse · March 30, 2026

arXiv · correlational · medium evidence · 7/10 relevance

AI usage in research proposals is associated with modest short-run gains concentrated in the top-performing projects, while more conspicuously reshaping research organization—shifting budgets toward human capital, enlarging teams, and expanding ideation and experimentation tasks.

Investment in artificial intelligence (AI) has grown rapidly, yet its returns to scientific research remain poorly understood. We study how AI reshapes the production of science using a comprehensive dataset of research proposals submitted to a large international funding agency, including both funded and unfunded projects. Combining keyword extraction with large language model classification, we identify the presence, type, and functional role of AI within each proposal and link these measures to detailed budget allocations, team structure, and subsequent publication outcomes. We find that, in the short run, AI adoption is associated with modest improvements in scientific outcomes concentrated in the upper tail. Instead, its primary effects arise in the organization of research: AI-enabled projects reallocate resources toward human capital, involve larger teams, and undertake a broader set of tasks. These patterns are consistent with a reorganization of the scientific production process rather than immediate efficiency gains, in line with theories of general-purpose technologies. Task-level analyses further show that activities expanded in AI-enabled projects, particularly ideation and experimentation, are increasingly compatible with large language model capabilities, suggesting potential for future productivity gains as these technologies mature.

Summary

Main Finding

AI adoption in scientific research is associated with modest short-run improvements concentrated in the upper tail of outcomes, but its primary effect is organizational: AI-enabled proposals reallocate resources toward human capital (salaries), involve larger teams, and expand the set of tasks undertaken. These patterns imply a reorganization and learning phase consistent with general-purpose-technology (GPT) theory, with greater potential for future productivity gains where tasks (ideation, experimentation) are highly compatible with large language model (LLM) capabilities.

Key Points

Adoption pattern
- Rapid diffusion of “modern AI” (neural nets, deep learning, generative models) after BERT/AlphaFold/GPT-3 and another surge after ChatGPT (2023).
- Modern AI increasingly used for ideation, experimentation, model/tool development, and automation — not only routine data analysis.
Inputs & organization
- AI-enabled projects tend to be longer in duration but request similar total budgets to non-AI projects.
- No systematic difference in funding probability for AI proposals after controls.
- Resource reallocation: higher share of budgets to human capital (driven by salaries), lower shares to equipment and operational/material costs.
- AI projects involve larger teams and more tasks (greater task scope/complexity).
Scientific outcomes
- Modest short-run gains overall; measurable advantages appear mainly in the upper tail (e.g., maximum journal impact factor, highest citation counts, publication counts for some projects).
- Publications from AI-enabled projects tend to have more authors (consistent with larger teams).
Task-level compatibility with LLMs
- Mapping tasks to Anthropic LLM-exposure scores shows ideation and experimentation tasks are more LLM-exposed, suggesting where future productivity gains are most likely.
- Current pattern is expansion of activities (AI adds tasks) rather than substitution of experimental work with computation (contrast with AlphaFold).
Interpretation
- Results are consistent with general-purpose-technology (GPT) theory: early-stage adoption induces reorganization and investment in complementary human capital before broad, measurable productivity gains emerge.

Data & Methods

Data
- Comprehensive corpus of research proposals (funded and unfunded) submitted over roughly a decade to a large international health/biomed funding agency.
- Linked funded proposals to subsequent publication records for outcome measurement.
AI identification & classification
- Multi-stage pipeline: curated regex/dictionary extraction of candidate AI terms supplemented with LLM-inferred algorithm names; each keyword–sentence pair classified by LLMs on:
  - Architectural class: modern AI (deep learning, generative), statistical ML, analytics, or domain-specific algorithms.
  - Functional role/use case: ideation, data collection, processing/analysis, experimentation, validation, automation, application, education, benchmark/background, etc.
- Passing/incidental mentions excluded; measures are proposal-level, non-mutually-exclusive.
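
The first stage of the pipeline described above (dictionary/regex extraction of keyword–sentence pairs) might look roughly like the following. This is a minimal sketch, not the authors' implementation: the term dictionary `AI_TERMS`, the helper names, and the naive sentence splitter are all assumptions made for illustration, and the LLM classification step is left out.

```python
import re

# Hypothetical term dictionary; the paper's curated list is not reproduced here.
AI_TERMS = {
    "modern_ai": ["deep learning", "neural network", "transformer", "generative model"],
    "statistical_ml": ["random forest", "support vector machine", "logistic regression"],
}

# One compiled pattern per candidate class (word boundaries, optional plural "s").
PATTERNS = {
    label: re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")s?\b", re.IGNORECASE
    )
    for label, terms in AI_TERMS.items()
}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_pairs(text):
    """Return (keyword, candidate_class, sentence) triples for later classification."""
    pairs = []
    for sentence in split_sentences(text):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(sentence):
                pairs.append((match.group(1).lower(), label, sentence))
    return pairs


# Illustrative proposal text (made up for this example).
proposal = (
    "We will train a neural network on imaging data. "
    "Baseline comparisons use a random forest. "
    "Recruitment follows standard clinical protocols."
)
pairs = extract_pairs(proposal)
for keyword, label, sentence in pairs:
    print(f"{label}: {keyword!r} in {sentence!r}")
```

Each extracted pair keeps its full sentence so that a downstream classifier can judge the functional role from context and discard passing mentions.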
Task extraction & LLM exposure
- Proposal text mapped to task taxonomy derived from O*NET; per-proposal task counts and task composition constructed.
- Tasks mapped to external Anthropic task-level LLM-exposure scores; aggregated to measure project-level LLM exposure.
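
One plausible way to aggregate task-level exposure into a project-level measure is a count-weighted mean, sketched below. The aggregation rule, the task labels, and the scores in `TASK_EXPOSURE` are assumptions for illustration; the summary does not state which aggregation the authors use, and the numbers are not the published exposure scores.

```python
from collections import Counter

# Hypothetical task-level exposure scores in [0, 1]; not the values used in the paper.
TASK_EXPOSURE = {
    "ideation": 0.8,
    "experimentation": 0.6,
    "data_collection": 0.3,
}


def project_exposure(tasks, scores=TASK_EXPOSURE):
    """Count-weighted mean exposure over the tasks that have a score.

    Returns None when no task in the proposal can be scored.
    """
    counts = Counter(t for t in tasks if t in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[t] * n for t, n in counts.items()) / total


# Tasks extracted from one (made-up) proposal; "fieldwork" has no score and is skipped.
tasks = ["ideation", "ideation", "experimentation", "data_collection", "fieldwork"]
exposure = project_exposure(tasks)
print(exposure)
```

Unscored tasks are dropped rather than treated as zero exposure, so the measure reflects only the tasks the taxonomy can actually rate.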
Empirical strategy
- Descriptive comparisons, semantic matching (match each AI proposal to top semantically similar non-AI proposals), and regression analyses.
- Regressions control for applicant demographics (age, gender, prior experience), project length, team size, textual similarity, year and domain fixed effects.
- Logistic regressions for funding likelihood; outcome regressions include funding status to separate selection from association.
- Robustness: alternative AI intensity measures (keyword counts), matched-sample analyses, cross-algorithm checks reported in SI.
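
The semantic matching step (pairing each AI proposal with its most similar non-AI proposals) can be sketched with cosine similarity over text embeddings. This is an illustration under stated assumptions: the three-dimensional vectors, the proposal ids, and the choice of `k` are invented, and the summary does not specify the embedding model or similarity measure the authors use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_matches(ai_vec, controls, k=2):
    """Return ids of the k non-AI proposals most similar to ai_vec."""
    ranked = sorted(
        controls, key=lambda pid: cosine(ai_vec, controls[pid]), reverse=True
    )
    return ranked[:k]


# Toy embeddings for non-AI proposals (hypothetical values).
controls = {
    "non_ai_1": [0.9, 0.1, 0.0],
    "non_ai_2": [0.0, 1.0, 0.0],
    "non_ai_3": [0.7, 0.3, 0.1],
}
ai_proposal = [1.0, 0.2, 0.0]

matches = top_matches(ai_proposal, controls, k=2)
print(matches)
```

Comparing outcomes only within such matched sets holds topic roughly constant, which is the motivation for matching before running the regressions.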
Limitations noted by authors
- Observational design; potential residual confounding.
- Single funding agency in a health/biomed context — external validity may be limited.
- Short-run horizon; longer-term effects may differ.

Implications for AI Economics

Reallocation vs. immediate productivity gains
- AI adoption appears to shift the allocation of research inputs (toward human capital and larger teams) rather than generating immediate, uniform productivity improvements. Economically, returns to AI in science manifest partly through organizational capital and task-composition changes rather than pure factor productivity boosts in the short run.
Complementarities and investment in human capital
- Salaries and team expansion imply strong complementarities between AI tools and skilled labor; returns depend on investments in personnel (training, interdisciplinary skills) and organizational routines that realize AI’s potential.
Latency consistent with general-purpose-technology diffusion
- The observed latency and reorganization echo historical general-purpose-technology patterns: adoption requires complementary investments and time before productivity gains are realized. Evaluations of AI ROI must allow for this dynamic.
Heterogeneous returns and upper-tail concentration
- Gains concentrate in an upper tail of projects, suggesting winner-take-all or skewed returns. This has implications for inequality across labs/institutions and for allocation of public/private funding toward frontier capabilities.
Policy and funding design
- Funders should consider supporting reorganization costs (training, personnel, longer projects) and access to frontier models/infrastructure to broaden benefits and avoid reinforcing advantages for well-resourced groups.
- Evaluation metrics may need to incorporate organizational and task-based changes (e.g., team composition, task exposure) and longer horizons.
Research agendas for AI economics
- Need causal identification of AI’s long-run impact on scientific productivity; study effects across domains and institutional settings.
- Investigate distributional consequences (who captures gains), the role of access to frontier models and compute, and the economics of complementary investments (training, data, workflows).
- Track how task automation vs. augmentation evolves as LLMs and domain models mature, and how that changes returns and costs across sectors.

Limitations to keep in mind: single-domain (health/biomed) sample, observational design, and short-run outcome window. The paper’s contributions are strongest in showing how AI changes resource allocation and task composition even before large, broad-based productivity gains appear.

Assessment

Paper Type: correlational

Evidence Strength: medium. Uses a large, detailed observational dataset (both funded and unfunded proposals) and careful text classification to establish robust associations, but lacks a clear causal identification strategy (no random assignment, instrument, or natural experiment), so observed relationships may reflect selection, omitted variables, or reverse causality.

Methods Rigor: high. Combines systematic keyword extraction with large-language-model classification, links proposal-level AI indicators to granular budget and team data and downstream publication outcomes, and performs task-level analyses; however, potential measurement error in AI classification and limited strategies to rule out confounding remain.

Sample: Comprehensive dataset of research proposals submitted to a large international funding agency (includes both funded and unfunded projects), with linked proposal budgets, team composition, stated tasks/activities, and subsequent publication outcomes; AI presence/type/functional role coded via keyword methods and LLM classification; time window and exact sample size not specified in the abstract.

Themes: innovation, org_design, productivity, human_ai_collab

Generalizability
- Single funding agency; may not represent other funders, countries, or institutional contexts.
- Proposals reflect intended plans, not necessarily realized project activities.
- Short-run publication outcomes may miss longer-run productivity or breakthrough effects.
- Results may vary across scientific fields; heterogeneity by discipline may limit generality.
- ML/LLM-based classification may misclassify AI involvement, introducing measurement error.
- Applicants self-select to mention AI; selection bias may affect comparisons.

Claims (6)

Claim	Category	Direction	Confidence	Outcome	Details
In the short run, AI adoption is associated with modest improvements in scientific outcomes concentrated in the upper tail.	Research Productivity	positive	high	subsequent publication outcomes (scientific outcomes)	0.3
AI-enabled projects reallocate resources toward human capital (i.e., shift budget allocations toward labor / human capital).	Labor Share	positive	high	budget allocation share toward human capital (labor share)	0.3
AI-enabled projects involve larger teams.	Team Performance	positive	high	team size / team structure	0.3
AI-enabled projects undertake a broader set of tasks.	Task Allocation	positive	high	breadth/variety of tasks undertaken in projects	0.3
These patterns are consistent with a reorganization of the scientific production process rather than immediate efficiency gains, in line with theories of general-purpose technologies.	Organizational Efficiency	mixed	high	organizational reorganization vs efficiency gains (qualitative interpretation)	0.05
Task-level analyses show that activities expanded in AI-enabled projects, particularly ideation and experimentation, are increasingly compatible with large language model capabilities, suggesting potential for future productivity gains as these technologies mature.	Task Allocation	positive	high	frequency/expansion of specific task categories (ideation, experimentation) and their compatibility with LLM capabilities	0.3