A publicly available library of 2,193 high-resolution 3D ant scans links morphology to genomes and slashes data-acquisition barriers for biological AI; the open resource shifts value toward model development, compute, and services while enabling new biodiversity and trait-analytics markets.

High-throughput phenomics of global ant biodiversity

Julian Katzke, F. Javier García, Jacob J. Relle, Fumika Azuma, Tomáš Faragó, Lazzat Aibekova, Alexandre Casadei-Ferreira, Shubham Gautam, Adrian Richter, Evropi Toulkeridou, Sabine Bremer, Elias Hamann, Jenny Hein, Janes Odar, Chandan Sarkar, Jacobus J. Boomsma, Rodrigo M. Feitosa, Lukas Schrader, Guojie Zhang, Sándor Csősz, Minsoo Dong, Olívia Evangelista, Georg Fischer, Brian L. Fisher, Jaime A. Florez-Fernandez, Serge Aron, Abel Bernadou, Martín Bollazzi, Raphaël Boulay, Sylvia Cremer, Heike Feldhaar, Susanne Foitzik, Erik T. Frank, Jürgen Gadau, Daniele Giannetti, Stéphane De Greef, Heikki Helanterä, Ana Ješovnik, Andrew V. Suarez, Bálint Markó, David R. Nash, Jérôme Orivel, Jes Søe Pedersen, Frédéric Petitclerc, Stephen A. Rehner, Helen Sindre, András Tartally, Kazuki Tsuji, Irène Villalta, Herbert C. Wagner, Fede García, Kiko Gómez, Donató A. Grasso, Benoit Guénard, Peter G. Hawkes, R. Roy Johnson, Roberto A. Keller, Rasmus Stenbak Larsen, Timothy A. Linksvayer, Cong Liu, Arthur Matte, Masako Ogasawara, Hao Ran, Juanita Rodríguez, Enrico Schifani, Ted R. Schultz, Jonathan Z. Shik, Jeffrey Sosa‐Calvo, Chao Tong, Leonardo Tozetto, Seonwoo Yoon, Masashi Yoshimura, Jie Zhao, Tilo Baumbach, Evan P. Economo, Thomas van de Kamp · March 05, 2026 · Nature Methods


Antscan provides an open, standardized library of 2,193 whole-body 3D ant scans across 792 species, linked to genomic data, to enable automated morphometrics, comparative phenomics, and scalable ML development.

The big data era in biology is underway, but the study of organismal form has been slow to capitalize on advances in imaging and computation. Imaging approaches can digitize whole organisms, but low throughput has limited the effort to document morphological diversity. Here, within the open science initiative 'Antscan', we applied high-throughput synchrotron X-ray microtomography to capture phenotypes across a diverse and ecologically dominant insect group: ants. At https://www.antscan.info, we provide 2,193 whole-body three-dimensional ant datasets from 212 genera and 792 species to broadly cover the ant phylogeny with a global scope, also pairing phenomic data with genome sequencing projects. Scans acquired with standardized parameters facilitate automated analysis, and free access to data can broaden the audience and incentivize methods development. Antscan presents a scalable approach to create libraries of diverse anatomies, heralding an era of studies on the evolution, structure and function of organismal phenotypes.

Summary

Main Finding

The Antscan initiative created a large, standardized, openly accessible 3D phenomics resource for ants using high-throughput synchrotron X‑ray microtomography: 2,193 whole‑body tomograms (792 species, 212 genera, 14/16 subfamilies), paired where possible with genomic data, processed for comparability and automated analysis, and released under CC BY 4.0 for community use.

Key Points

Scale and scope
- 2,193 whole‑body 3D datasets (plus 32 non‑ant outgroups).
- At least 792 species represented across 212 genera; the sampled genera together contain >90% of described ant species.
- Specimen composition: 1,671 workers, 291 queens, 220 males.
- 186 species (585 scans) are associated with genomic data; 157 of those scans come from the same nest series as the sequenced specimens.
Imaging approach and standardization
- High‑throughput synchrotron micro‑CT (KIT Light Source) with robotic sample exchange for rapid, high‑flux scans.
- Three synchrotron magnifications (pixel sizes 1.22, 2.44, 6.1 µm) plus some lab micro‑CT scans (8.4 µm) for very large or iodine‑stained specimens.
- Use of phase contrast and phase‑retrieval; "blended volumes" combine standard reconstructions and phase‑retrieved reconstructions to capture both exoskeleton and soft tissue contrast.
- Standardized acquisition and reconstruction parameters within magnification groups to ensure comparable gray‑value properties across datasets.
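As a rough illustration of the "blended volumes" idea above, the sketch below mixes a standard reconstruction with a phase-retrieved one, voxel by voxel. The linear weighting, the `alpha` parameter, and the function name `blend_voxels` are assumptions made for illustration; this summary does not state how Antscan actually combines the two reconstructions.

```python
from typing import Sequence


def blend_voxels(standard: Sequence[float],
                 phase_retrieved: Sequence[float],
                 alpha: float = 0.5) -> list[float]:
    """Linearly mix two reconstructions of the same (flattened) volume.

    alpha is a hypothetical mixing weight: 0.0 keeps only the standard
    reconstruction, 1.0 keeps only the phase-retrieved one.
    """
    if len(standard) != len(phase_retrieved):
        raise ValueError("reconstructions must have the same number of voxels")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * s + alpha * p
            for s, p in zip(standard, phase_retrieved)]


# Toy gray values, not real scan data.
standard = [0.0, 2.0, 4.0]
phase = [4.0, 2.0, 0.0]
blended = blend_voxels(standard, phase, alpha=0.25)
print(blended)
```

For real tomograms an array library would replace the list comprehension; the point here is only that each output voxel carries information from both reconstructions.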
Processing and automation
- GPU‑based tomographic reconstruction from ~3,000 projections per scan.
- Automatic merging of multi‑height scans, conversion from 32‑bit to 8‑bit TIFF stacks.
- Automated background cropping and crude segmentation using a neural network (Biomedisa) to reduce file size and prepare data for downstream ML.
- Interactive online segmentation/processing via Biomedisa.
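The bit-depth conversion and background cropping steps listed above can be sketched on a toy slice. This is a minimal sketch under stated assumptions: the gray-value window (`lo`, `hi`) and the fixed threshold are hypothetical stand-ins, and Antscan's cropping relies on a neural network (Biomedisa) rather than the simple threshold used here.

```python
def to_uint8(values, lo, hi):
    """Map float gray values from the window [lo, hi] to integers 0..255.

    Values outside the window are clipped. The window is an assumption;
    the summary does not report how Antscan chooses it.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((v - lo) * scale))) for v in values]


def bounding_box(image, threshold):
    """Return (row_start, row_stop, col_start, col_stop) around pixels > threshold."""
    rows = [r for r, row in enumerate(image) if any(v > threshold for v in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > threshold for row in image)]
    if not rows:
        return None  # nothing above threshold: the slice is all background
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop(image, box):
    """Cut the image down to the given bounding box."""
    r0, r1, c0, c1 = box
    return [row[c0:c1] for row in image[r0:r1]]


# Toy data only.
converted = to_uint8([-1.0, 0.0, 1.0, 5.0, 7.0], lo=0.0, hi=5.0)
slice_2d = [
    [0, 0, 0, 0],
    [0, 9, 7, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 0],
]
box = bounding_box(slice_2d, threshold=5)
cropped = crop(slice_2d, box)
```

Going from 32-bit to 8-bit quarters the bytes per voxel, and cropping removes empty background; both reduce file size at the cost of gray-value precision.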
Data access, provenance and sustainability
- All tomograms, 3D meshes and metadata released under CC BY 4.0 via Biomedisa (interactive portal) and mirrored in KIT’s RADAR4KIT repository; each scan has a DOI and specimen identifiers.
- Rich metadata include taxonomy, ecology, locality, specimen provenance, and links to genomes.
Quality and caveats
- Ethanol‑preserved specimens were used to avoid destructive staining; a subset (132) were iodine‑stained for lab micro‑CT (these deviate from standard imaging and have different gray‑value characteristics).
- Some specimens show soft‑tissue shrinkage or decay (from prior DNA extraction, handling, storage), and occasional truncation of appendages when outside field of view.
- Conversion to 8‑bit and the inclusion of nonstandard stained scans introduce heterogeneity that downstream models must handle.

Data & Methods

Sampling
- Vouchered, ethanol‑preserved specimens from museums and personal collections worldwide.
- Phylogenetically broad sampling strategy: represent species‑poor clades, rare taxa, and multiple representatives of hyperdiverse genera (e.g., Camponotus, Pheidole, Strumigenys).
Imaging hardware & acquisition
- Synchrotron micro‑CT at two KIT beamlines, robotic sample exchange, rotary stage, high‑speed camera/detector.
- ~3,000 X‑ray projections per scan; phase contrast exploited for soft tissue visualization.
- For largest specimens or those stained with iodine, laboratory micro‑CT was used.
Reconstruction & preprocessing
- GPU‑based tomographic reconstruction; phase‑retrieval applied for enhanced soft‑tissue contrast; primary datasets are blended volumes from standard recon and phase‑retrieved recon.
- Automatic merging of height‑step scans where necessary.
- Original tomograms saved as 32‑bit; processed into 8‑bit TIFF stacks, with background cropping.
- Automated segmentation: Biomedisa neural network used to produce crude segmentations and 3D surface meshes; Biomedisa portal enables further semi‑automatic segmentation and sharing.
Data publishing & metadata
- Interactive repository: https://biomedisa.info/antscan (previews, interactive 3D models, download).
- Long‑term mirror: RADAR4KIT (KIT) with DOIs for scans and links to specimen IDs.
- Metadata fields include taxonomy, collector/curator credits, locality, habitat, ecological parameters, and genome associations.
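One way such metadata could be used downstream is to select a comparable subset of scans before analysis. The sketch below is hypothetical throughout: the `ScanRecord` field names, the scan identifiers, and the example records are invented for illustration and do not reflect the actual Antscan metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRecord:
    """One scan's metadata; all field names here are hypothetical."""
    scan_id: str
    genus: str
    caste: str              # e.g. "worker", "queen", "male"
    pixel_size_um: float    # 1.22, 2.44, 6.1 (synchrotron) or 8.4 (lab micro-CT)
    iodine_stained: bool
    has_genome_link: bool


def comparable_subset(records, pixel_size_um):
    """Keep unstained scans from a single magnification group."""
    return [r for r in records
            if not r.iodine_stained and r.pixel_size_um == pixel_size_um]


# Invented placeholder records, for illustration only.
records = [
    ScanRecord("scan-001", "Camponotus", "worker", 2.44, False, True),
    ScanRecord("scan-002", "Pheidole", "queen", 2.44, False, False),
    ScanRecord("scan-003", "Strumigenys", "worker", 1.22, False, False),
    ScanRecord("scan-004", "Camponotus", "worker", 8.4, True, False),
]

subset = comparable_subset(records, pixel_size_um=2.44)
print([r.scan_id for r in subset])
```

Filtering by magnification group and staining status matters because gray values are only standardized within a group, as noted in the imaging section.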
Numbers & coverage recap
- 2,193 ant tomograms, 32 outgroup wasps.
- 212 genera (out of 343), 14/16 extant ant subfamilies.
- 186 species tied to genome projects.
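The coverage figures above translate into simple proportions. The short calculation below uses only numbers reported in this summary; note that the species share is relative to the 792 sampled species, not to all described ants.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


genus_coverage = percent(212, 343)       # sampled genera out of 343
subfamily_coverage = percent(14, 16)     # extant subfamilies represented
genome_species_share = percent(186, 792) # sampled species tied to genome projects
genome_scan_share = percent(585, 2193)   # scans belonging to those species

print(genus_coverage, subfamily_coverage, genome_species_share, genome_scan_share)
```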

Implications for AI Economics

Data as a public good and lowered acquisition costs
- Antscan is a large, curated, open 3D dataset that substantially reduces data‑collection costs for researchers and companies developing 3D/computer‑vision models for small organisms. The CC BY 4.0 license permits both academic and commercial reuse, which raises the dataset’s value as a free input to downstream work.
Training and benchmarking resource for 3D ML
- Standardized imaging parameters across large subsets create an unusually clean benchmark for volumetric segmentation, object recognition, morphometrics, and multi‑modal models combining genotype and phenotype. This supports reproducible benchmarking of algorithms and can accelerate progress in 3D vision models.
Economies of scale, infrastructure, and cost structure
- Demonstrates that centralizing high‑throughput, high‑capex infrastructure (synchrotron + robotics + GPU recon) can produce datasets at scale with lower per‑specimen marginal cost than bespoke lab scans. This has implications for how institutions and funders should invest—centralized facilities can enable broad downstream economic activity (tooling, services, analytics).
- However, high storage and compute needs for 3D volumetric data (GPU recon, segmentation, model training) create ongoing operational costs. Cloud/GPU providers and data‑hosting services can capture economic value by offering processing pipelines, hosting, and model‑training services tailored to such datasets.
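A back-of-envelope storage estimate shows why bit depth matters for the cost structure described above. The volume dimensions below (1000 × 1000 × 1000 voxels) are a hypothetical placeholder, not a figure from the paper; only the 32-bit and 8-bit depths and the scan count of 2,193 come from this summary.

```python
def volume_bytes(nx: int, ny: int, nz: int, bits_per_voxel: int) -> int:
    """Uncompressed size in bytes of an nx * ny * nz volume."""
    return nx * ny * nz * bits_per_voxel // 8


# Hypothetical volume dimensions, chosen only to make the arithmetic easy.
NX = NY = NZ = 1000
N_SCANS = 2193  # number of ant tomograms reported in the summary

raw_32bit = volume_bytes(NX, NY, NZ, 32)
stack_8bit = volume_bytes(NX, NY, NZ, 8)
library_8bit_tb = N_SCANS * stack_8bit / 1e12  # decimal terabytes

print(raw_32bit, stack_8bit, library_8bit_tb)
```

Under these assumed dimensions the 8-bit stack is a quarter of the 32-bit size before any cropping or compression; real volumes differ in size, so the totals are illustrative only.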
Labor, automation, and market reallocation
- Automated acquisition and neural segmentation reduce manual curation and annotation labor. This can displace routine digitization jobs but also creates demand for higher‑skill roles (pipeline engineering, model development, biological annotation for edge cases). Markets for semi‑automatic human‑in‑the‑loop annotation tools (like Biomedisa) may expand.
Standards, interoperability, and data governance
- Standardization (consistent imaging parameters, DOIs, rich metadata, specimen provenance) increases data interoperability and lowers transaction/friction costs for multi‑institutional research and commercial applications. Clear provenance and crediting mechanisms reduce intellectual‑property uncertainty and increase reuse.
Heterogeneity, bias and model risk
- Remaining heterogeneity (iodine‑stained vs. unstained, 8‑bit conversion, truncation, preservation artifacts) creates dataset shift risks for models trained on Antscan when applied to other datasets/scanners. Economists and product managers should anticipate costs for domain adaptation, calibration, or additional labeling.
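One inexpensive way to screen for the gray-value heterogeneity mentioned above is to compare intensity histograms between scans before training. The sketch below uses histogram intersection as the similarity score; this metric and the function names are illustrative choices, not something the paper prescribes.

```python
from collections import Counter


def gray_histogram(values):
    """Normalized histogram of integer gray levels (fractions sum to 1)."""
    if not values:
        raise ValueError("need at least one voxel")
    counts = Counter(values)
    total = len(values)
    return {level: n / total for level, n in counts.items()}


def histogram_intersection(a, b):
    """Overlap of two gray-value distributions: 1.0 identical, 0.0 disjoint."""
    pa, pb = gray_histogram(a), gray_histogram(b)
    return sum(min(pa[level], pb.get(level, 0.0)) for level in pa)


# Toy 8-bit gray values, not real scan data.
reference = [0, 0, 10, 10]
candidate = [0, 10, 10, 20]
overlap = histogram_intersection(reference, candidate)
print(overlap)
```

A low overlap between, say, an iodine-stained lab scan and an unstained synchrotron scan would flag the pair for normalization or separate handling before model training.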
Opportunities for new products and research
- Commercial opportunities: biodiversity analytics platforms, automated trait extraction services, conservation assessment tools, agritech/biomimetics model licensing, and integrated genome–phenotype analytics.
- Research/economic value: linking genomes to 3D morphology enables new causal and predictive studies (e.g., trait evolution, ecosystem service valuation), which can inform policy, conservation investment decisions, and biotech R&D prioritization.
Policy and funding implications
- Antscan provides a model for public investment in shared scientific infrastructure that creates downstream private and public economic benefits. Funders should consider lifecycle costs (scanning, storage, compute, curation) and support accessible processing portals to maximize social returns.
Commercialization & licensing
- CC BY 4.0 explicitly allows commercial use, lowering barriers for startups and incumbents to build commercial services on top of Antscan; this contrasts with restricted licenses that limit market formation.
Summary takeaway for AI economists
- Antscan is a high‑quality, large, standardized open 3D dataset that reduces data acquisition frictions, enables benchmarkable ML development, and shifts cost structure toward compute and storage. It exemplifies how centralized high‑throughput scientific infrastructure plus open publishing can catalyze downstream markets and research, while introducing needs for investments in compute, domain adaptation, and sustainable data hosting.


Assessment

Paper Type: descriptive

Evidence Strength: n/a. This paper presents and documents an open dataset and imaging pipeline rather than testing causal hypotheses or estimating effects; it does not provide empirical causal evidence.

Methods Rigor: high. Large-scale, standardized synchrotron X-ray microtomography pipeline with clearly reported acquisition parameters, comprehensive metadata linkage (taxonomic labels, locality, genome links), and public release that enables reproducible downstream analysis and benchmarking.

Sample: 2,193 whole-body high-resolution 3D scans (volumes/meshes) of ants covering 212 genera and 792 species with associated metadata (taxonomic IDs, collection/locality) and links to ongoing genome sequencing projects; globally sampled but drawn from available collections and sequenced specimens.

Themes: innovation, labor_markets

Generalizability
- Taxonomic scope limited to ants (Formicidae); morphological patterns and scanning results may not generalize to other taxa.
- Specimen set may reflect collection and museum biases (geographic, temporal, species abundance), limiting representativeness.
- Imaging captures ex vivo preserved morphology; does not represent live physiology, behavior, or soft-tissue dynamics.
- Methods and data format optimized for synchrotron microCT outputs; transferability to other imaging modalities (e.g., desktop microCT, photogrammetry) may be imperfect.
- Using the dataset at scale requires substantial storage and compute resources, constraining accessibility for some users.
- Dataset documents morphology and links to genomes but does not itself measure economic outcomes or adoption behaviors.

Claims (17)

Claim	Category	Direction	Outcome	Reading fidelity	Study strength	Details
The Antscan project produced 2,193 whole-body 3D ant datasets (scans).	Other	positive	number of whole-body 3D ant scans (2,193)	high	n/a	n=2193 0.03
The dataset covers taxonomic breadth of 212 genera and 792 species.	Other	positive	taxonomic coverage (genera and species counts)	high	n/a	not reported 0.03
Sampling is global and broadly covers ant phylogeny.	Other	positive	geographic/phylogenetic coverage of sampled specimens	medium	n/a	not reported 0.02
Scans were acquired with standardized parameters to facilitate automated and replicable analysis and benchmarking.	Other	positive	use of standardized scanning parameters and metadata format	high	n/a	not reported 0.03
Imaging modality used is synchrotron X-ray microtomography (high-resolution 3D imaging).	Other	positive	imaging modality applied	high	n/a	not reported 0.03
The project demonstrated a high-throughput application of synchrotron X-ray microtomography for whole-organism digitization at scale.	Research Productivity	positive	throughput of whole-organism digitization (number of scans produced using the pipeline)	high	n/a	n=2193 0.03
The scanning pipeline was optimized and standardized to enable digitizing hundreds to thousands of specimens.	Research Productivity	positive	pipeline throughput/scale (hundreds–thousands of specimens)	high	n/a	n=2193 0.03
The dataset includes metadata such as taxonomic labels, collection/locality data, and links to genome projects where available.	Research Productivity	positive	presence and type of metadata fields associated with scans	high	n/a	not reported 0.03
All data are openly available at https://www.antscan.info.	Adoption Rate	positive	data accessibility (public availability and repository URL)	high	n/a	not reported 0.03
Phenomic (3D scans) data are linked/paired to ongoing genome sequencing projects to create multimodal phenome–genome resources.	Research Productivity	positive	existence/extent of links between scan records and genome sequencing projects	medium	n/a	not reported 0.02
The dataset and its standardization are intended to support automated segmentation, landmarking, feature extraction, and benchmarking for computer-vision and ML methods on biological 3D data.	Research Productivity	positive	design features intended to enable automated ML workflows (standardized parameters and metadata)	medium	n/a	not reported 0.02
Open, standardized 3D phenomic datasets reduce the need for individual labs/companies to finance expensive scanning campaigns and democratize access for academic groups and startups.	Adoption Rate	positive	reduction in data-acquisition costs/barriers for downstream users (projected)	low	n/a	not reported 0.01
Paired phenome–genome data increases the scientific and commercial value of the dataset for models predicting phenotype from genotype and vice versa.	Research Productivity	positive	value for phenotype–genotype predictive modeling (projected)	low	n/a	not reported 0.01
Standardized, high-quality data will concentrate competition on modeling, compute, and algorithmic innovation, favoring actors with greater compute resources.	Market Structure	neutral	distribution of competitive advantage in modeling/compute (projected)	low	n/a	not reported 0.01
Processing and using 3D volumetric data requires substantial storage and GPU/TPU compute, creating demand for cloud compute services and managed ML platforms.	Firm Productivity	positive	computational and storage resource demand for processing the dataset (projected)	medium	n/a	not reported 0.02
Open, linked phenomic–genomic datasets could inform policy and conservation markets (e.g., biodiversity credits) by improving monitoring and trait-based risk assessment models.	Governance And Regulation	positive	potential influence on policy and conservation market analytics (projected)	low	n/a	not reported 0.01
There are risks that concentration of modeling capability around well-funded actors could create inequality in capture of downstream economic gains despite open data.	Inequality	negative	risk of unequal economic capture from downstream applications (projected)	low	n/a	not reported 0.01