The Commonplace

A publicly available library of 2,193 high-resolution 3D ant scans links morphology to genomes and slashes data-acquisition barriers for biological AI; the open resource shifts value toward model development, compute, and services while enabling new biodiversity and trait-analytics markets.

High-throughput phenomics of global ant biodiversity
Julian Katzke, F. Javier García, Jacob J. Relle, Fumika Azuma, Tomáš Faragó, Lazzat Aibekova, Alexandre Casadei-Ferreira, Shubham Gautam, Adrian Richter, Evropi Toulkeridou, Sabine Bremer, Elias Hamann, Jenny Hein, Janes Odar, Chandan Sarkar, Jacobus J. Boomsma, Rodrigo M. Feitosa, Lukas Schrader, Guojie Zhang, Sándor Csősz, Minsoo Dong, Olívia Evangelista, Georg Fischer, Brian L. Fisher, Jaime A. Florez-Fernandez, Serge Aron, Abel Bernadou, Martín Bollazzi, Raphaël Boulay, Sylvia Cremer, Heike Feldhaar, Susanne Foitzik, Erik T. Frank, Jürgen Gadau, Daniele Giannetti, Stéphane De Greef, Heikki Helanterä, Ana Ješovnik, Andrew V. Suarez, Bálint Markó, David R. Nash, Jérôme Orivel, Jes Søe Pedersen, Frédéric Petitclerc, Stephen A. Rehner, Helen Sindre, András Tartally, Kazuki Tsuji, Irène Villalta, Herbert C. Wagner, Fede García, Kiko Gómez, Donató A. Grasso, Benoit Guénard, Peter G. Hawkes, R. Roy Johnson, Roberto A. Keller, Rasmus Stenbak Larsen, Timothy A. Linksvayer, Cong Liu, Arthur Matte, Masako Ogasawara, Hao Ran, Juanita Rodríguez, Enrico Schifani, Ted R. Schultz, Jonathan Z. Shik, Jeffrey Sosa‐Calvo, Chao Tong, Leonardo Tozetto, Seonwoo Yoon, Masashi Yoshimura, Jie Zhao, Tilo Baumbach, Evan P. Economo, Thomas van de Kamp · March 05, 2026 · Nature Methods
Source: OpenAlex · Paper type: descriptive · Evidence strength: n/a · Relevance: 7/10
Antscan provides an open, standardized library of 2,193 whole-body 3D ant scans across 792 species, linked to genomic data, to enable automated morphometrics, comparative phenomics, and scalable ML development.

The big data era in biology is underway, but the study of organismal form has been slow to capitalize on advances in imaging and computation. Imaging approaches can digitize whole organisms, but low throughput has limited the effort to document morphological diversity. Here, within the open science initiative 'Antscan', we applied high-throughput synchrotron X-ray microtomography to capture phenotypes across a diverse and ecologically dominant insect group: ants. At https://www.antscan.info, we provide 2,193 whole-body three-dimensional ant datasets from 212 genera and 792 species to broadly cover the ant phylogeny with a global scope, also pairing phenomic data with genome sequencing projects. Scans acquired with standardized parameters facilitate automated analysis, and free access to data can broaden the audience and incentivize methods development. Antscan presents a scalable approach to create libraries of diverse anatomies, heralding an era of studies on the evolution, structure and function of organismal phenotypes.

Summary

Main Finding

The Antscan project applied high-throughput synchrotron X-ray microtomography to produce and openly publish a large, standardized library of whole-body 3D ant phenotypes: 2,193 scans covering 212 genera and 792 species, linked to ongoing genome sequencing efforts. The dataset is intended to enable automated, scalable analysis of organismal form and to accelerate methods development across morphology, evolution, and functional studies.

Key Points

  • Scale and coverage:
    • 2,193 whole-body 3D ant datasets.
    • Taxonomic breadth: 212 genera, 792 species; global sampling to broadly cover ant phylogeny.
  • Data quality and standardization:
    • Scans acquired with standardized parameters to facilitate automated/replicable analysis and benchmarking.
  • Open access and linkage:
    • Data freely available at https://www.antscan.info.
    • Phenomic data paired with genome sequencing projects (multimodal phenome–genome resources).
  • Methodological contribution:
    • Demonstrates high-throughput application of synchrotron X-ray microtomography for whole-organism digitization.
    • Positions the resource as a community scaffold to incentivize algorithms and tools for 3D morphometrics and comparative phenomics.
  • Intended impact:
    • Scalable approach for libraries of diverse anatomies to support evolutionary, structural and functional studies.

Data & Methods

  • Imaging modality: Synchrotron X-ray microtomography (high-resolution 3D imaging).
  • Throughput: Optimized, standardized scanning pipeline to digitize whole organisms at scale (enabling hundreds to thousands of scans).
  • Dataset contents:
    • Whole-body 3D volumes/meshes of ants (2,193 samples).
    • Metadata: taxonomic labels, collection/locality and links to genome projects where available.
  • Accessibility:
    • Public repository/portal with downloadable data and associated metadata.
  • Automation-ready design:
    • Standardized acquisition parameters and metadata format intended to support automated segmentation, landmarking, feature extraction, and benchmarking for computer-vision/ML methods on biological 3D data.
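
As a rough illustration of what the automation-ready design above is meant to support, the following Python sketch runs a naive threshold segmentation on a 3D volume and extracts a few simple shape features. It is a toy under stated assumptions, not the Antscan pipeline: the volume is synthetic (a bright sphere on a noisy background), and the function names, threshold, and voxel size are hypothetical. Real scans would need a proper file loader and a far more careful segmentation.

```python
import numpy as np


def make_synthetic_volume(shape=(32, 32, 32), radius=8, seed=0):
    """Build a toy 3D volume: a bright sphere on a dim, noisy background.

    Stand-in for a real scan; no Antscan file format is assumed here.
    """
    rng = np.random.default_rng(seed)
    zz, yy, xx = np.indices(shape)
    center = [(s - 1) / 2 for s in shape]
    dist2 = (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    volume = rng.normal(loc=0.1, scale=0.02, size=shape)
    volume[dist2 <= radius ** 2] += 0.8  # "specimen" voxels are much brighter
    return volume


def extract_features(volume, threshold=0.5, voxel_size=1.0):
    """Threshold the volume and return simple morphometric features.

    threshold and voxel_size are hypothetical parameters for illustration.
    """
    mask = volume > threshold          # naive global-threshold segmentation
    coords = np.argwhere(mask)         # (N, 3) indices of foreground voxels
    if coords.size == 0:
        raise ValueError("no voxels above threshold")
    mins = coords.min(axis=0)
    maxs = coords.max(axis=0)
    voxel_count = int(mask.sum())
    return {
        "voxel_count": voxel_count,
        "volume": float(voxel_count * voxel_size ** 3),
        "bbox_extent": tuple(((maxs - mins + 1) * voxel_size).tolist()),
        "centroid": tuple((coords.mean(axis=0) * voxel_size).tolist()),
    }


features = extract_features(make_synthetic_volume())
print(features)
```

Because every scan would share acquisition parameters, a single threshold and voxel size could in principle be reused across specimens, which is the kind of batch processing standardized data makes easier.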

Implications for AI Economics

  • Lowering data acquisition costs and barriers to entry:
    • Open, standardized 3D phenomic datasets reduce the need for individual labs/companies to finance expensive scanning campaigns, democratizing access for academic groups and startups.
    • Public availability acts as a public good that can accelerate innovation without duplicative investment.
  • Value of multimodal training data:
    • Paired phenome–genome data increases the commercial and scientific value of the dataset for models predicting phenotype from genotype (and vice versa), enabling higher-value downstream applications (e.g., trait prediction, evolutionary simulators).
  • Market creation and commercial opportunity:
    • Enables new product lines and services (automated taxonomic ID, biodiversity monitoring tools, conservation prioritization analytics, e-commerce for specimen digitization) that can be developed on top of the open dataset.
    • Startups can differentiate on model performance, UI/UX, integration, or proprietary downstream annotations rather than on raw data collection.
  • Effects on innovation competition and returns to compute:
    • Standardized, high-quality data concentrates competition on modeling, compute, and algorithmic innovation. Firms with greater compute/GPU resources and ML expertise may capture disproportionate returns by training large 3D biological foundation models.
    • Benchmarks enabled by the dataset can accelerate iterative improvement and public comparison of approaches, lowering uncertainty for investors and funders.
  • Labor and specialization shifts:
    • Automation-ready 3D data promotes development of ML tools that can replace manual morphometric work (landmarking, measurements), potentially reallocating labor from routine annotation to higher-level curation, model development, and interpretation.
    • Could reshape skill demand toward computational morphology, ML engineering, and data curation.
  • Public good vs. proprietary models:
    • Open data raises the possibility of widely available high-performing models (public research, open-source foundation models) that reduce entry costs; but entities that combine open data with proprietary compute, annotations, or service platforms may still capture commercial rents.
  • Infrastructure and compute costs:
    • Although data collection costs are reduced for downstream users, processing 3D volumetric data requires substantial storage, GPU/TPU compute, and expert pipelines—creating demand for cloud compute services and managed ML platforms.
  • Policy and ecosystem effects:
    • Open, linked phenomic–genomic datasets could inform policy and conservation markets (e.g., biodiversity credits, ecosystem service valuation), by improving monitoring and trait-based risk assessment models.
    • Incentivizes public funding and standards for other taxa, amplifying cumulative returns to open biological data.
  • Risks and externalities:
    • Concentration of modeling capability around well-funded actors could still create inequality in capture of downstream economic gains despite open data.
    • Ethical and regulatory considerations for commercial uses (e.g., bioprospecting) may affect market dynamics.
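
To make the infrastructure and compute point above concrete, here is a back-of-the-envelope storage estimate in Python. Only the scan count (2,193) comes from the source. The volume dimensions and bit depth are assumptions chosen purely for illustration, since this summary does not state the actual voxel counts, and the function names are hypothetical.

```python
def volume_bytes(shape, bytes_per_voxel):
    """Raw, uncompressed size of one 3D volume in bytes."""
    n_voxels = 1
    for dim in shape:
        n_voxels *= dim
    return n_voxels * bytes_per_voxel


def library_terabytes(n_scans, shape, bytes_per_voxel):
    """Raw size of a whole scan library in terabytes (1 TB = 1e12 bytes)."""
    return n_scans * volume_bytes(shape, bytes_per_voxel) / 1e12


N_SCANS = 2193                      # from the paper
ASSUMED_SHAPE = (2000, 2000, 2000)  # hypothetical voxel grid per scan
ASSUMED_BYTES_PER_VOXEL = 2         # hypothetical 16-bit grayscale

per_scan_gb = volume_bytes(ASSUMED_SHAPE, ASSUMED_BYTES_PER_VOXEL) / 1e9
total_tb = library_terabytes(N_SCANS, ASSUMED_SHAPE, ASSUMED_BYTES_PER_VOXEL)
print(f"per scan: {per_scan_gb:.1f} GB, library: {total_tb:.3f} TB")
```

Under these assumed dimensions each scan would occupy 16 GB and the full library about 35 TB before compression. The real figures depend on the actual voxel counts, bit depth, and file format.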

Summary statement: Antscan is an economically potent open dataset. By sharply reducing data-acquisition frictions for high-quality 3D biological phenotypes and pairing them with genomes, it shifts value toward modeling and compute and seeds new markets for automated biodiversity and trait analytics. It also moves the locus of competition and labor toward ML infrastructure and service layers, while raising the question of who captures the resulting rents.

Assessment

Paper Type: descriptive
Evidence Strength: n/a. This paper presents and documents an open dataset and imaging pipeline rather than testing causal hypotheses or estimating effects; it does not provide empirical causal evidence.
Methods Rigor: high. Large-scale, standardized synchrotron X-ray microtomography pipeline with clearly reported acquisition parameters, comprehensive metadata linkage (taxonomic labels, locality, genome links), and public release that enables reproducible downstream analysis and benchmarking.
Sample: 2,193 whole-body high-resolution 3D scans (volumes/meshes) of ants covering 212 genera and 792 species with associated metadata (taxonomic IDs, collection/locality) and links to ongoing genome sequencing projects; globally sampled but drawn from available collections and sequenced specimens.
Themes: innovation, labor_markets
Generalizability:
  • Taxonomic scope limited to ants (Formicidae); morphological patterns and scanning results may not generalize to other taxa.
  • Specimen set may reflect collection and museum biases (geographic, temporal, species abundance), limiting representativeness.
  • Imaging captures ex vivo preserved morphology; does not represent live physiology, behavior, or soft-tissue dynamics.
  • Methods and data format optimized for synchrotron microCT outputs; transferability to other imaging modalities (e.g., desktop microCT, photogrammetry) may be imperfect.
  • Using the dataset at scale requires substantial storage and compute resources, constraining accessibility for some users.
  • Dataset documents morphology and links to genomes but does not itself measure economic outcomes or adoption behaviors.

Claims (17)

Claim | Category | Direction | Confidence | Outcome | Details
The Antscan project produced 2,193 whole-body 3D ant datasets (scans). | Other | positive | high | number of whole-body 3D ant scans (2,193) | n=2193; 0.03
The dataset covers taxonomic breadth of 212 genera and 792 species. | Other | positive | high | taxonomic coverage (genera and species counts) | 0.03
Sampling is global and broadly covers ant phylogeny. | Other | positive | medium | geographic/phylogenetic coverage of sampled specimens | 0.02
Scans were acquired with standardized parameters to facilitate automated and replicable analysis and benchmarking. | Other | positive | high | use of standardized scanning parameters and metadata format | 0.03
Imaging modality used is synchrotron X-ray microtomography (high-resolution 3D imaging). | Other | positive | high | imaging modality applied | 0.03
The project demonstrated a high-throughput application of synchrotron X-ray microtomography for whole-organism digitization at scale. | Research Productivity | positive | high | throughput of whole-organism digitization (number of scans produced using the pipeline) | n=2193; 0.03
The scanning pipeline was optimized and standardized to enable digitizing hundreds to thousands of specimens. | Research Productivity | positive | high | pipeline throughput/scale (hundreds–thousands of specimens) | n=2193; 0.03
The dataset includes metadata such as taxonomic labels, collection/locality data, and links to genome projects where available. | Research Productivity | positive | high | presence and type of metadata fields associated with scans | 0.03
All data are openly available at https://www.antscan.info. | Adoption Rate | positive | high | data accessibility (public availability and repository URL) | 0.03
Phenomic (3D scans) data are linked/paired to ongoing genome sequencing projects to create multimodal phenome–genome resources. | Research Productivity | positive | medium | existence/extent of links between scan records and genome sequencing projects | 0.02
The dataset and its standardization are intended to support automated segmentation, landmarking, feature extraction, and benchmarking for computer-vision and ML methods on biological 3D data. | Research Productivity | positive | medium | design features intended to enable automated ML workflows (standardized parameters and metadata) | 0.02
Open, standardized 3D phenomic datasets reduce the need for individual labs/companies to finance expensive scanning campaigns and democratize access for academic groups and startups. | Adoption Rate | positive | low | reduction in data-acquisition costs/barriers for downstream users (projected) | 0.01
Paired phenome–genome data increases the scientific and commercial value of the dataset for models predicting phenotype from genotype and vice versa. | Research Productivity | positive | low | value for phenotype–genotype predictive modeling (projected) | 0.01
Standardized, high-quality data will concentrate competition on modeling, compute, and algorithmic innovation, favoring actors with greater compute resources. | Market Structure | neutral | low | distribution of competitive advantage in modeling/compute (projected) | 0.01
Processing and using 3D volumetric data requires substantial storage and GPU/TPU compute, creating demand for cloud compute services and managed ML platforms. | Firm Productivity | positive | medium | computational and storage resource demand for processing the dataset (projected) | 0.02
Open, linked phenomic–genomic datasets could inform policy and conservation markets (e.g., biodiversity credits) by improving monitoring and trait-based risk assessment models. | Governance And Regulation | positive | low | potential influence on policy and conservation market analytics (projected) | 0.01
There are risks that concentration of modeling capability around well-funded actors could create inequality in capture of downstream economic gains despite open data. | Inequality | negative | low | risk of unequal economic capture from downstream applications (projected) | 0.01

Notes