
AI has been remaking science for six decades: successive generations of tools have outsourced bottlenecks to machines, and an emergent ecosystem of agents, preprint platforms, codebases and citation infrastructure is speeding discovery, changing who can do science and altering incentives for credit.

A Brief History of AI for Scientific Discovery: Open Research, Metrics, and Autonomous Agents
Surasak Phetmanee · March 23, 2026 · Preprints.org
OpenAlex · review_meta · Evidence: n/a · Relevance: 7/10 · DOI · Source · PDF
A sixty-year historical synthesis argues that AI has repeatedly automated successive scientific bottlenecks, producing an interconnected ecosystem of agents, platforms, and metrics that accelerates discovery while reshaping access, incentives, and what counts as scientific contribution.

The history of artificial intelligence for scientific discovery is not a two-year story about chatbots learning to write papers. It is a sixty-year story about science repeatedly handing its bottlenecks to machines: first inference, then search, then measurement, and finally the full workflow, only to discover that each delegation solves one problem and exposes a harder one underneath. This paper traces that history from DENDRAL (1965) through the construction of open scholarly infrastructure (arXiv, Google Scholar, ORCID), the oracle breakthroughs of AlphaFold, and the current era of LLM-driven autonomous research agents. It follows three interlocking threads: AI as research instrument, AI for research infrastructure, and the reshaping of scholarly profiles and incentives by machine-readable metrics. The central tension throughout is between automation and augmentation: between building systems that replace human researchers and building tools that amplify human creativity and judgement. The paper argues that the most consequential development is not any single tool but the emergence of an interconnected ecosystem in which AI agents, preprint platforms, open-source codebases, and citation infrastructure form a feedback loop that is fundamentally restructuring who can do science, how fast discoveries propagate, and what counts as a valid scientific contribution.

Summary

Main Finding

The paper argues that the history of AI for scientific discovery is a 60‑year, cumulative process in which science repeatedly delegates its core bottlenecks to machines—first inference, then search, then measurement, then much of the research workflow—each delegation solving a problem while revealing a harder one underneath. The most consequential outcome is not any single tool (e.g., AlphaFold or chatbots) but the emergence of an interconnected ecosystem—AI agents, preprint servers, open codebases, and machine‑readable citation infrastructure—that forms a feedback loop fundamentally reshaping who can do science, how fast discoveries propagate, and what counts as a valid scientific contribution.

Key Points

  • Long arc: Traces progress from early expert systems (DENDRAL, 1965) through open scholarly infrastructure (arXiv, Google Scholar, ORCID), oracle breakthroughs (AlphaFold), to contemporary LLM‑driven autonomous research agents.
  • Successive delegation: Each technical advance removes one bottleneck (e.g., pattern recognition, combinatorial search, measurement automation) but exposes deeper challenges (integration, validation, incentive alignment).
  • Three interlocking threads:
    • AI as research instrument (tools that extend human capabilities in hypothesis formation, modeling, measurement).
    • AI for research infrastructure (platforms, search and indexing, identifiers, programmatic APIs that change discovery and reuse).
    • Machine‑readable metrics reshaping scholarly profiles and incentives (automated citation metrics, reproducibility tests, computational assessments); a minimal sketch of one such metric follows this list.
  • Central tension: automation versus augmentation—systems that aim to replace human researchers versus systems that amplify human creativity and judgment—drives political, economic, and organizational choices.
  • Emergent ecosystem effects: Interactions among agents, preprint platforms, open source, and citation infrastructure create positive feedback loops that accelerate dissemination but also concentrate influence and change reward structures.
  • Practical consequences: changes in who can do research (lower barriers in some domains, greater capital and platform dependence in others), faster propagation of results, shifts in what is recognized as a legitimate contribution (code, datasets, agent‑generated artifacts), and new risks around validation, reproducibility, and perverse incentives.
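
To make the metrics point concrete, here is a minimal sketch of the kind of profile statistic that machine-readable citation infrastructure computes automatically. The citation counts are hypothetical; only the standard h-index definition (the largest h such that at least h papers each have at least h citations) is assumed.

```python
# Minimal sketch: computing an h-index from machine-readable citation counts.
# The numbers below are hypothetical; only the standard definition is assumed.

def h_index(citation_counts: list[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

if __name__ == "__main__":
    profile = [42, 18, 7, 5, 5, 2, 0]  # hypothetical per-paper citation counts
    print(h_index(profile))  # -> 5
```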

Data & Methods

  • Historical and conceptual analysis: a chronological tracing of milestones in AI for science from 1965 to the present.
  • Case studies and exemplars: focused examination of systems and moments such as DENDRAL, the construction of open scholarly infrastructure (arXiv, Google Scholar, ORCID), AlphaFold, and recent LLM‑based research agents.
  • Synthesis of literature and platform histories: draws on published histories, technical papers, platform documentation, and examples of deployment to illustrate dynamics and feedback loops.
  • Institutional and incentive analysis: qualitative assessment of how machine‑readable metrics and infrastructure reshape scholarly incentives and organizational behavior.
  • Illustrative evidence: where empirical material appears, it is likely drawn from platform usage, citation flows, and demonstration projects; the paper centers on interpretive synthesis rather than novel large‑scale econometric estimation. (A hedged sketch of pulling such citation-flow data follows this list.)
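
Concretely, "citation flows" of the sort such a synthesis draws on can be pulled from open infrastructure. Below is a hedged sketch against the OpenAlex REST API (the indexing service this digest itself links to); the work ID is illustrative, not taken from the paper, and the `requests` library is assumed.

```python
# Hedged sketch: listing works that cite a given paper via the open OpenAlex API.
# The work ID is illustrative; substitute any OpenAlex work ID of interest.
import requests

WORK_ID = "W2741809807"  # hypothetical example work

resp = requests.get(
    "https://api.openalex.org/works",
    params={"filter": f"cites:{WORK_ID}", "per-page": 5},  # one small page of citing works
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

print("total citing works:", data["meta"]["count"])
for work in data["results"]:
    print(work["publication_year"], "-", work["display_name"])
```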

Implications for AI Economics

  • Productivity and growth
    • Potential for large productivity gains in scientific R&D through automation of routine tasks and acceleration of discovery cycles.
    • Gains may be uneven across fields: domains with well‑structured data and clear evaluation metrics (e.g., structural biology) may capture early benefits.
  • Labor markets and skill demand
    • Shifts in demand toward skills that complement AI (designing experiments, causal reasoning, stewardship, domain validation).
    • Potential dislocation of tasks performed by junior researchers and technicians; rising returns to those who control platforms, data, and agent development.
  • Market structure and rents
    • Platform effects and winner‑take‑all dynamics: control over preprint, indexing, agent platforms, and large curated datasets can concentrate market power and scientific influence.
    • Proprietary vs open infrastructure choices will shape who captures rents from scientific automation.
  • Measurement and incentives
    • Machine‑readable metrics can speed evaluation but risk incentivizing machine‑friendly outputs over substantive scientific progress (gaming, superficial writeups, reproducibility erosion).
    • Funding, hiring, and promotion systems will need redesign to value human‑AI hybrid contributions and rigorous validation work.
  • Public goods, externalities, and regulation
    • Scientific knowledge has public‑good features; privatization or gated access to powerful agents could reduce broad social returns.
    • Externalities include faster spread of low‑quality or unvalidated findings and reduced diversity of research agendas.
    • Policy responses: invest in open infrastructure, require transparency and reproducibility standards, monitor platform concentration, and consider antitrust or data‑access interventions where appropriate.
  • Research agenda for economists
    • Quantify productivity effects across fields and over time (a synthetic-data sketch of one such design follows this list).
    • Study distributional impacts on researchers, institutions, and countries.
    • Model platform competition, data ownership, and returns to complementary human skills.
    • Evaluate policy interventions (open data mandates, metric design, platform regulation) for preserving public value and equitable access.
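
To illustrate the first agenda item above, here is a sketch, on entirely synthetic data, of a two-way fixed-effects difference-in-differences design an economist might use to estimate the effect of AI-tool adoption on field-level research output. Nothing here is estimated in the paper; the specification, the adoption dates, and every number are illustrative.

```python
# Illustrative sketch (synthetic data): two-way fixed-effects diff-in-diff of
# field-level research output on AI-tool adoption. All numbers are made up.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for f_idx in range(20):                        # 20 hypothetical fields
    adopt_year = 2019 if f_idx < 10 else 9999  # half the fields adopt AI tools in 2019
    for year in range(2015, 2025):
        treated = int(year >= adopt_year)
        # log output = field effect + common trend + 10% adoption bump + noise
        log_output = (0.05 * f_idx + 0.02 * (year - 2015)
                      + 0.10 * treated + rng.normal(0, 0.05))
        rows.append({"field": f"field_{f_idx}", "year": year,
                     "treated": treated, "log_output": log_output})
df = pd.DataFrame(rows)

# C(field) and C(year) absorb field levels and common shocks; "treated" picks up
# the adoption effect. Standard errors are clustered by field.
model = smf.ols("log_output ~ treated + C(field) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["field"]}
)
print(f"adoption effect: {model.params['treated']:.3f} "
      f"(SE {model.bse['treated']:.3f})")  # should recover ~0.10
```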


Assessment

Paper Type: review_meta
Evidence Strength: n/a — This is a historical and conceptual synthesis rather than an empirical, causal study; it does not present new causal identification or statistical estimates.
Methods Rigor: medium — The paper traces a long history and interlocking themes with illustrative case studies (DENDRAL, arXiv, AlphaFold, LLM agents) and argues coherently, but it appears to be a narrative review without a preregistered systematic-review protocol, formal inclusion criteria, or quantitative meta-analysis.
Sample: A narrative historical review drawing on landmark case studies and infrastructure examples in scientific AI (DENDRAL, preprint servers like arXiv, discovery systems such as AlphaFold, scholarly infrastructure like Google Scholar and ORCID, and recent LLM-driven research agents); it synthesizes literature and examples across multiple decades rather than new primary data.
Themes: innovation, productivity, human_ai_collab, org_design, adoption
Generalizability:
  • Qualitative, narrative synthesis rather than a representative empirical sample limits generalizability to measurable economic outcomes.
  • Focus on high-profile successes may overstate typical effects (selection toward notable breakthroughs).
  • Primarily centered on STEM and computational sciences; less applicable to humanities and social sciences.
  • Likely English-language and open-source centric; regional and institutional heterogeneity is underexamined.
  • Does not quantify impacts on wages, employment, or firm-level productivity, limiting economic generalizability.

Claims (8)

  • The history of artificial intelligence for scientific discovery is not a two-year story about chatbots learning to write papers; it is a sixty-year story beginning with DENDRAL (1965).
    Outcome: Research Productivity · Direction: null_result · Confidence: high · 0.4
    Details: historical scope and timeline of AI for scientific discovery
  • Science has repeatedly delegated its bottlenecks to machines—first inference, then search, then measurement, then the full workflow—and each delegation solves one problem while exposing a harder one underneath.
    Outcome: Automation Exposure · Direction: mixed · Confidence: high · 0.24
    Details: pattern of delegation and emergent bottlenecks in research workflows
  • Three interlocking threads characterize AI for science: (1) AI as research instrument, (2) AI for research infrastructure, and (3) the reshaping of scholarly profiles and incentives by machine-readable metrics.
    Outcome: Other · Direction: null_result · Confidence: high · 0.12
    Details: conceptual decomposition of AI-for-science developments
  • AlphaFold represents an 'oracle' breakthrough in AI for scientific discovery.
    Outcome: Research Productivity · Direction: positive · Confidence: high · 0.24
    Details: impact of AlphaFold on a scientific subtask (protein structure prediction)
  • The central tension in AI for science is between automation (building systems that replace human researchers) and augmentation (tools that amplify human creativity and judgement).
    Outcome: Automation Exposure · Direction: mixed · Confidence: high · 0.12
    Details: relationship between automation and augmentation in research practice
  • The most consequential development is not any single tool but the emergence of an interconnected ecosystem—AI agents, preprint platforms, open source codebases, and citation infrastructure—that forms a feedback loop.
    Outcome: Adoption Rate · Direction: mixed · Confidence: high · 0.04
    Details: emergence of an interconnected scientific infrastructure ecosystem
  • That interconnected ecosystem is fundamentally restructuring who can do science (access), how fast discoveries propagate, and what counts as a valid scientific contribution.
    Outcome: Research Productivity · Direction: mixed · Confidence: high · 0.04
    Details: access to scientific practice, speed of discovery dissemination, and norms of scientific contribution
  • Machine-readable metrics and open scholarly infrastructure are reshaping scholarly profiles and incentives.
    Outcome: Governance And Regulation · Direction: mixed · Confidence: high · 0.24
    Details: changes in scholarly incentives and profile construction due to machine-readable metrics

Notes