The Commonplace

AI tools are rapidly pushing biomedicine toward faster, cheaper discovery: ensemble and deep‑learning models deliver parsimonious diagnostics, cross‑study cross‑omics prediction, and prognostic signatures that can cut reliance on animal models and shorten R&D timelines. If these models survive larger prospective validation and regulatory review, economic value will concentrate in curated data assets, validated models, and supporting validation/compliance services, reshaping firm incentives, skills demand, and infrastructure investment.

Editorial: Integrating machine learning and AI in biological research: unraveling complexities and driving advancements
Bindu Nanduri, Inimary Toby-Ogundeji · March 10, 2026 · Frontiers in Bioinformatics
OpenAlex · review/meta · evidence: medium · relevance: 7/10 · DOI · Source · PDF
AI/ML methods produce compact, high‑performance diagnostic and prognostic models and enable non‑linear cross‑omics prediction, accelerating biomedical research and suggesting lower R&D costs and shifted value toward data and validated models—subject to larger validation studies and regulatory oversight.

The integration of Machine Learning (ML) and Artificial Intelligence (AI) is rapidly transforming biological research, providing sophisticated tools to analyze complex data, enhance precision, and navigate ethical considerations. This editorial summarizes five critical areas where AI is driving advancement, from foundational ethical shifts to deep prognostic insights in oncology.

Manju V et al. discussed the foundational role of AI in ethical biomedical research. AI's role transcends mere computational efficiency; it is a chief facilitator of humane and efficacious science through the "3Rs" of animal-based research: Replacement, Reduction, and Refinement. The paper describes how traditional animal models carry inherent limitations, including translational gaps, regulatory issues, and ethical controversies. AI provides the analytical power needed for predictions, simulations, and validations that minimize reliance on animal subjects. By processing massive, complex datasets, machine and deep learning algorithms can simulate human biology, forecast therapy outcomes, and discover candidate drugs, thereby supporting Replacement and promoting Reduction through optimized experimental designs. This transition, however, necessitates strict validation requirements and ethical controls to ensure the reliability and integrity of the resulting models.

Carreira et al. focused on precision diagnostics for polymicrobial diseases. An immediate challenge in biomedicine is the accurate classification of polymicrobial diseases caused by microbial community imbalance (dysbiosis), where 16S rRNA gene sequence data are high-dimensional and heterogeneous.
To address this, the curated pipeline EPheClass was developed, using ensemble-based ML models, including k-nearest neighbours (kNN), Random Forest (RF), Support Vector Machines (SVM), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), for binary phenotype classification. The methodology emphasizes rigorous procedures for reliability and reproducibility, unlike earlier studies criticized for insufficient sample sizes or lack of proper validation. Key data processing steps include the Centred Log-Ratio (CLR) transformation for compositional data and Recursive Feature Elimination (RFE) for feature selection. The approach prioritizes model parsimony, delivering high predictive performance with a dramatically reduced number of features. For instance, using the Dynamic Ensemble Selection-Performance (DES-P) technique, EPheClass achieved an Area Under the Curve (AUC) of 0.973 for diagnosing periodontal disease (PD) in saliva samples using just 13 features. The pipeline's versatility was confirmed by successfully diagnosing Inflammatory Bowel Disease (IBD) with 26 features and classifying antibiotic exposure (DA) with 22 features, demonstrating generalization across phenotypes and sample types.

A third study aimed to unravel cross-omics interactions, specifically predicting miRNA expression from mRNA. The authors addressed the scarcity of publicly available paired datasets containing both miRNA and mRNA expression profiles, evaluating seven paired datasets related to viral infections: West Nile Virus (WNV) and Human Immunodeficiency Virus (HIV). Overall, both deep neural networks (DNNs) and LASSO models achieved strong correlations at the level of individual samples. However, DNNs proved superior in capturing predictive changes relevant to differential expression analysis (DEA).
Specifically, cross-study validation using HIV datasets yielded strong correlations for log-fold changes (log2FCs) derived from DEA (R = 0.59), demonstrating the model's ability to generalize to independent data of the same tissue type. Furthermore, data augmentation, specifically adding Gaussian noise, consistently improved the performance of the neural networks, helping mitigate the challenge of small sample sizes. Conversely, linear LASSO models, despite strong sample-level performance, struggled to translate this accuracy into meaningful correlations for DEA log2FCs, suggesting that the non-linear capacity of DNNs is better suited to complex cross-omics relationships.

A fourth study presented a powerful computational framework for Lung Adenocarcinoma (LUAD) prognosis. The framework integrated multi-omics data (transcriptomic, DNA methylation, and somatic mutation data) with 10 clustering algorithms to identify three robust molecular subtypes (CS1, CS2, and CS3) associated with distinct clinical prognoses, CS3 having the best prognosis. Leveraging 10 ML algorithms in 101 unique combinations, the researchers constructed the PIGRS (Lasso + GBM ensemble) prognostic model based on 15 immune-associated programmed cell death genes (PIRGs). PIGRS demonstrated strong prognostic efficacy across multiple cohorts, outperforming almost all previously published LUAD prognostic models. The model linked high PIGRS scores to increased genomic instability, including higher Tumor Mutational Burden (TMB) and intra-tumor heterogeneity (MATH scores), and suggested a relationship with immune escape. Subsequent experimental validation showed that knockdown of PSME3 significantly inhibited LUAD cell proliferation, migration, and invasion, and promoted apoptosis, likely by affecting the PI3K/AKT/Bcl-2 signaling pathway.

A final study focused on innate immune cell barrier-related genes to inform prognosis for pancreatic cancer (PC).
Using 14 machine learning algorithms, the CDRG-RSF model (a Random Survival Forest trained on risk genes) was established as the most robust prognostic tool, achieving excellent long-term predictive performance with 3-year and 5-year AUCs exceeding 0.7 in validation cohorts. High-risk PC patients exhibited elevated TMB and reduced infiltration of anti-tumor cytotoxic cells, specifically NK and CD8+ T cells. The model offered actionable therapeutic insights: high-risk patients showed resistance to Erlotinib and Oxaliplatin but increased sensitivity to 5-Fluorouracil. Five key prognostic genes were identified, including UBASH3B, a novel marker that exhibited a significant negative correlation with NK cell activation and appeared to mediate immune signaling and drug resistance, positioning it as a potential target for personalized therapy.

Taken together, the convergence of ML/AI and biological research provides scientists with the algorithmic lenses necessary to filter complex, high-dimensional biological data into clinically actionable knowledge, moving the field rapidly toward precision medicine. These advances promise a future where precision medicine, agriculture, and environmental science are informed by highly validated, robust, and reproducible computational frameworks, pushing the boundaries of discovery while upholding the highest standards of scientific ethics and rigor. This collaboration is not an incremental step but a fundamental leap toward solving the most challenging biological puzzles.
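The EPheClass recipe described above (CLR transform for compositional counts, RFE for parsimony, then an ensemble classifier) can be sketched in a few lines. This is an illustrative toy, not the published pipeline: the synthetic dataset, the Random Forest stand-in, and all hyperparameters are assumptions.

```python
# Sketch of a CLR + RFE + ensemble workflow in the spirit of EPheClass.
# Data, estimator choice, and hyperparameters are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

def clr_transform(counts, pseudocount=1.0):
    """Centred log-ratio transform for compositional count data."""
    x = counts + pseudocount                      # avoid log(0)
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X_counts = rng.poisson(lam=20, size=(60, 40))     # 60 samples, 40 taxa
y = rng.integers(0, 2, size=60)                   # binary phenotype

X = clr_transform(X_counts)                       # rows now sum to zero

# Recursive Feature Elimination down to a parsimonious feature set
selector = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
               n_features_to_select=13)
X_small = selector.fit_transform(X, y)

scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X_small, y, cv=5, scoring="roc_auc")
print(X_small.shape, scores.mean())
```

With random labels the cross-validated AUC hovers near chance; the point of the sketch is the preprocessing order (transform, then select, then evaluate), which is what supports the parsimony claims above.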

Summary

Main Finding

The editorial synthesizes five recent studies showing that ML/AI methods—ranging from ensemble classifiers and DNNs to multi‑omics integrative models and survival forests—are enabling more accurate, reproducible, and biologically actionable results across biomedical domains (diagnostics, cross‑omics prediction, and cancer prognosis). These advances also support ethical goals (3Rs) by reducing reliance on animal models.
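The cross-omics result above turns on recovering differential-expression signal rather than per-sample fits. A minimal sketch of that evaluation, on synthetic data rather than the study's cohorts, computes per-gene log2 fold-changes from observed and predicted expression and correlates them:

```python
# Sketch of the cross-study DEA evaluation: Pearson correlation between
# log2 fold-changes from observed vs predicted expression. All data here
# are synthetic; the editorial reports R = 0.59 on the HIV datasets.
import numpy as np
from scipy.stats import pearsonr

def log2fc(expr, is_case, pseudocount=1.0):
    """Per-gene log2 fold-change of case means over control means."""
    case = expr[is_case].mean(axis=0) + pseudocount
    ctrl = expr[~is_case].mean(axis=0) + pseudocount
    return np.log2(case / ctrl)

rng = np.random.default_rng(0)
n_samples, n_genes = 40, 300
is_case = np.arange(n_samples) < 20               # first 20 samples = cases
true_shift = rng.normal(0, 1, n_genes)            # per-gene effect size

observed = rng.normal(5, 1, (n_samples, n_genes))
observed[is_case] += true_shift                   # inject DE signal in cases
predicted = observed + rng.normal(0, 1.0, observed.shape)  # noisy predictions

r, _ = pearsonr(log2fc(observed, is_case), log2fc(predicted, is_case))
print(round(r, 2))
```

The design choice this illustrates: a model can fit individual samples well yet destroy the between-group contrast, so correlating log2FCs is a stricter test than sample-level correlation, which is exactly why the DNNs looked better than LASSO under it.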

Key Points

  • AI as an ethical and practical enabler: AI can support Replacement, Reduction, and Refinement of animal experiments by simulating biology and optimizing experiment design, but requires strict validation and ethical oversight.
  • High‑performance, parsimonious diagnostic pipelines: EPheClass (ensemble ML + preprocessing) achieved AUC = 0.973 for periodontal disease using only 13 microbial features; demonstrated generalization to IBD and antibiotic exposure classification.
  • Cross‑omics prediction: Deep neural networks outperformed linear LASSO for predicting miRNA from mRNA when the goal is to capture differential expression (DEA) changes; DNNs generalized across studies (log2FC correlation R ≈ 0.59) and benefited from Gaussian noise augmentation to mitigate small sample sizes.
  • Multi‑omics prognostic modeling in LUAD: Integration of transcriptomics, methylation, and mutation data with extensive clustering and ML combinations produced a prognostic ensemble (PIGRS: Lasso + GBM on 15 PIRGs) that outperformed most prior LUAD models; high PIGRS associated with genomic instability and immune escape; experimental knockdown of PSME3 validated functional relevance.
  • Pancreatic cancer prognostics and therapeutic insights: A Random Survival Forest model (CDRG‑RSF) based on innate immune barrier genes gave robust 3‑ and 5‑year AUCs > 0.7; identified UBASH3B as a prognostic/therapeutic marker linked to NK cell suppression and drug resistance.
  • Reproducibility and rigor emphasized: successful pipelines used appropriate compositional transforms (CLR), feature selection (RFE), ensemble/DES strategies, multi‑cohort validation, and experimental follow‑up.
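The dynamic-selection step named in the EPheClass bullet above can be sketched by hand: for each query point, keep only the pool members whose accuracy on the query's nearest validation neighbours beats a random classifier (0.5 for binary tasks), then vote. Libraries such as DESlib provide production implementations of DES-P; this toy version, with an assumed synthetic dataset and pool, is illustrative only.

```python
# Hand-rolled sketch of the DES-P idea: per-query competence filtering
# against a random-classifier baseline, then majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5,
                                              random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)

# A diverse pool: shallow randomized trees plus one linear model
pool = [DecisionTreeClassifier(max_depth=3, max_features=3,
                               random_state=s).fit(X_tr, y_tr)
        for s in range(10)]
pool.append(LogisticRegression(max_iter=1000).fit(X_tr, y_tr))

nn = NearestNeighbors(n_neighbors=7).fit(X_val)   # local competence region

def desp_predict(x):
    """Keep pool members whose local accuracy beats 0.5, then vote."""
    _, idx = nn.kneighbors(x.reshape(1, -1))
    region_X, region_y = X_val[idx[0]], y_val[idx[0]]
    votes = [clf.predict(x.reshape(1, -1))[0] for clf in pool
             if clf.score(region_X, region_y) > 0.5]
    if not votes:                     # no competent member: use whole pool
        votes = [clf.predict(x.reshape(1, -1))[0] for clf in pool]
    return int(round(float(np.mean(votes))))

pred = np.array([desp_predict(x) for x in X_te])
print((pred == y_te).mean())
```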

Data & Methods

  • Data types: 16S rRNA compositional microbiome data; paired miRNA–mRNA expression datasets (WNV, HIV); multi‑omics cancer cohorts (transcriptome, DNA methylation, somatic mutations); survival/clinical outcome cohorts.
  • Preprocessing and feature engineering: Centered Log‑Ratio (CLR) transform for compositional data; Recursive Feature Elimination (RFE) to obtain parsimonious feature sets.
  • Models and algorithms:
    • Ensembles: k‑nearest neighbours, Random Forests, SVM, XGBoost, Multilayer Perceptron; Dynamic Ensemble Selection‑Performance (DES‑P) used in EPheClass.
    • Regularized linear models: LASSO.
    • Deep learning: Deep Neural Networks (DNNs) for cross‑omics prediction; Gaussian noise augmentation to improve generalization.
    • Multi‑omics integration: 10 clustering algorithms to define molecular subtypes; 10 ML algorithms combined into 101 unique pipelines to discover prognostic signatures.
    • Prognostic models: PIGRS (Lasso + GBM ensemble) on 15 PIRGs; CDRG‑RSF (Random Survival Forest on risk genes).
  • Validation: cross‑study validation, multi‑cohort external validation, AUC and correlation metrics (e.g., AUC = 0.973; DEA log2FC correlation R = 0.59), survival AUCs > 0.7 at 3–5 years; experimental knockdown assays for functional confirmation (PSME3, UBASH3B).
  • Limitations noted: small/paired dataset scarcity for cross‑omics work, need for rigorous external validation and ethical controls.
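The Gaussian-noise augmentation listed above, used to offset the small paired datasets, can be sketched as follows. The MLP stands in for the study's DNNs, and the dataset sizes, noise scale, and replication count are illustrative assumptions:

```python
# Minimal sketch of Gaussian-noise data augmentation for a small
# regression training set (illustrative stand-in for the DNN setting).
import numpy as np
from sklearn.neural_network import MLPRegressor

def augment_gaussian(X, y, n_copies=4, sigma=0.1, seed=0):
    """Stack n_copies noisy replicas of X on top of the original data."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        X_parts.append(X + rng.normal(0.0, sigma, size=X.shape))
        y_parts.append(y)             # targets stay unchanged
    return np.vstack(X_parts), np.concatenate(y_parts)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 50))         # 30 samples of mRNA-like features
y = X[:, :5].sum(axis=1)              # toy miRNA-like target

X_big, y_big = augment_gaussian(X, y)                     # 150 samples
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                     random_state=0).fit(X_big, y_big)
print(X_big.shape)
```

The choice to perturb inputs while keeping targets fixed acts as a regularizer, which matches the editorial's observation that augmentation helped the networks generalize from small cohorts.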

Implications for AI Economics

  • R&D cost and productivity
    • Reduced preclinical costs: validated in silico models that substitute or reduce animal studies (3Rs) can materially lower costs and shorten timelines in drug discovery and development.
    • Faster biomarker discovery and patient stratification reduce expensive late‑stage trial failures, improving R&D ROI.
  • Market and investment signals
    • High performance of compact models (e.g., AUC 0.973 with 13 features) increases commercial viability of targeted diagnostic products and point‑of‑care tools, attracting investment in translational AI startups.
    • Demand for integrated multi‑omics platforms and validated ML pipelines creates market opportunities for data curation, model‑as‑a‑service, and regulatory‑compliant software.
  • Labor and skill composition
    • Growing need for computational biologists, ML engineers, and data curators shifts hiring and training investments toward cross‑disciplinary skill sets; potential displacement of routine wet‑lab tasks but expansion of higher‑value experimental validation roles.
  • Data and infrastructure economics
    • Value of high‑quality paired datasets is high—data sharing and standardized cohorts lower replication costs and facilitate model generalization; incentives (grants, data marketplaces) may be justified.
    • Computational resource spending (cloud/GPU) increases but can be offset by reductions in physical lab costs; cost–benefit depends on model complexity and validation demands.
  • Regulatory and liability costs
    • Strong validation requirements and clinical/regulatory pathways will shape timelines and compliance costs; models used to replace animal studies face high evidentiary standards.
    • Liability and reimbursement frameworks for AI‑driven diagnostics/prognostics will influence adoption and pricing strategies.
  • Pricing and payer considerations
    • Precision diagnostics and prognostic tools that stratify therapy can change payer math: better targeting reduces ineffective treatment costs but may raise short‑term spending on targeted therapies; health‑economic evaluations will be critical.
  • Innovation and diffusion dynamics
    • Demonstrated generalizability (cross‑study validation, small‑feature sets) accelerates diffusion across healthcare providers and geographies; loss of performance outside validation cohorts remains an adoption risk.
  • Policy and public good
    • Public support for dataset generation and model validation (to realize societal benefits like fewer animal experiments and better treatments) can produce positive externalities; appropriate regulation and transparency standards will influence market structure.

Concluding note: The editorial illustrates that validated, parsimonious AI models in biology do not just improve scientific outcomes—they reshape economic incentives along R&D, labor, data infrastructure, and regulatory dimensions. For economists and investors, the key levers are data quality/access, validation pathways, and alignment of reimbursement/regulatory frameworks with AI‑driven clinical value.

Assessment

  • Paper Type: review_meta
  • Evidence Strength: medium — Synthesizes multiple empirical studies that show strong predictive performance, external validation, and some experimental (functional) follow‑up, but most findings are predictive (not causal) and rely on relatively small, disease‑specific cohorts, data augmentation, and heterogeneous study designs that limit definitive inference about broad general effects.
  • Methods Rigor: medium — Methods across studies are generally sound (CLR for compositional data, RFE for parsimony, cross‑validation, external cohort testing, functional knockdown experiments), but common concerns remain: limited sample sizes, potential overfitting despite ensemble approaches, reliance on data augmentation for DNNs, heterogeneity across platforms/cohorts, and few randomized or prospective validations.
  • Sample: Multiple biomedical datasets spanning 16S rRNA microbiome (saliva) studies for periodontal disease/IBD/antibiotic exposure, paired miRNA–mRNA viral infection cohorts (WNV, HIV; seven datasets), and multi‑omics cancer cohorts (transcriptome, DNA methylation, somatic mutation calls, immune gene sets) with multiple external validation cohorts; experimental cell/functional assays for selected biomarkers (e.g., PSME3 knockdown).
  • Themes: productivity, innovation, labor_markets, adoption, governance
  • Generalizability:
    • Small and disease‑specific cohorts (periodontal disease, IBD, LUAD, pancreatic cancer) limit transferability across conditions.
    • Cross‑study heterogeneity (platforms, cohorts, batch effects) may reduce out‑of‑sample performance.
    • Use of data augmentation and complex ensembles may overstate robustness in truly independent clinical deployments.
    • Clinical deployment requires prospective, randomized, and regulatory validation beyond retrospective/external cohort testing.
    • Findings from cell‑line functional assays may not fully generalize to human in vivo biology.

Claims (18)

Each entry lists the claim, its outcome area, direction, and confidence (with weight), followed by the measured outcome and any quantitative details.

  • An AI‑powered pipeline (EPheClass) produced a parsimonious saliva microbiome classifier for periodontal disease with AUC = 0.973 using 13 features. Output Quality · positive · high (0.24). Outcome: classification AUC for periodontal disease (saliva). Details: AUC = 0.973.
  • The same EPheClass approach produced successful parsimonious classifiers for IBD (26 features) and antibiotic exposure (22 features). Output Quality · positive · medium (0.14). Outcome: classification performance (AUC/accuracy) for IBD and antibiotic exposure.
  • Applying centred log‑ratio (CLR) transformation and RFE to compositional microbiome data improves model parsimony and supports reproducibility in diagnostic classifiers. Output Quality · positive · medium (0.14). Outcome: number of features (parsimony) and classifier performance (AUC/reproducibility).
  • Dynamic Ensemble Selection‑Performance (DES‑P) produced parsimonious, high‑accuracy classifiers within the EPheClass pipeline. Output Quality · positive · medium (0.14). Outcome: classifier accuracy/AUC and model parsimony.
  • Deep neural networks (DNNs) better captured cross‑study differential expression (DEA) signals when predicting miRNA from mRNA than sparse linear models (LASSO); for HIV the cross‑study log2 fold‑change (log2FC) correlation was approximately R ≈ 0.59 for the DNN approach. Output Quality · positive · high (0.24). Outcome: cross‑study correlation of predicted vs observed log2FC (DEA signal recovery). Details: n = 7; R ≈ 0.59.
  • Both DNNs and LASSO correlated well at the individual‑sample level, but linear models (LASSO) struggled to recover cross‑study DEA log2FCs despite good sample‑level fits. Output Quality · mixed · medium (0.14). Outcome: individual sample prediction correlation vs. cross‑study DEA log2FC recovery. Details: n = 7.
  • Data augmentation with Gaussian noise improved DNN performance for small sample cross‑omics training sets. Output Quality · positive · medium (0.14). Outcome: DNN predictive performance metrics (sample correlation, DEA log2FC correlation) after augmentation.
  • Multi‑omics integration and consensus clustering (10 methods) in lung adenocarcinoma (LUAD) identified three molecular subtypes (CS1–CS3) with distinct prognoses. Output Quality · positive · medium (0.14). Outcome: molecular subtype membership and associated survival/prognosis differences.
  • PIGRS prognostic model (LASSO + Gradient Boosting Machine ensemble using 15 programmed‑cell‑death immune genes) outperformed most published LUAD prognostic models. Output Quality · positive · medium (0.14). Outcome: prognostic performance (e.g., survival AUC, concordance) relative to published LUAD models.
  • High PIGRS scores associate with genomic instability (higher tumor mutational burden and MATH heterogeneity scores) and immune‑escape signatures. Other · negative · medium (0.14). Outcome: tumor mutational burden (TMB), MATH score, immune‑escape signature measures.
  • Experimental knockdown of PSME3 reduced proliferation and invasion and increased apoptosis in LUAD cells, implicating the PI3K/AKT/Bcl‑2 pathway as a mediator. Other · positive · high (0.24). Outcome: cell proliferation, invasion, apoptosis; downstream pathway activity (PI3K/AKT/Bcl‑2).
  • A Random Survival Forest built on curated cancer‑death‑related genes (CDRG‑RSF) achieved the best long‑term prognostic performance among 14 tested ML algorithms for pancreatic cancer, with 3‑ and 5‑year AUCs > 0.7. Output Quality · positive · high (0.24). Outcome: 3‑ and 5‑year survival AUC (prognostic accuracy). Details: 3‑ and 5‑year AUCs > 0.7.
  • Patients classified as high‑risk by CDRG‑RSF had higher TMB, lower NK and CD8+ T cell infiltration, and model‑predicted resistance to Erlotinib and Oxaliplatin but sensitivity to 5‑fluorouracil. Other · mixed · medium (0.14). Outcome: TMB, NK/CD8+ T cell infiltration estimates, predicted drug sensitivity/resistance.
  • CDRG‑RSF identified five prognostic genes including UBASH3B, which is associated with reduced NK activation and may mediate drug resistance—making it a candidate therapeutic target. Other · positive · medium (0.14). Outcome: prognostic significance of genes; association with NK activation and predicted drug response.
  • AI/ML methods can reduce reliance on animal models by simulating biology, optimizing experiments, and prioritizing candidate drugs—supporting the 3Rs (Replacement, Reduction, Refinement)—but this is contingent on rigorous validation and ethical oversight. Research Productivity · positive · medium (0.14). Outcome: potential reduction in animal use / improved ethical compliance (qualitative).
  • Widespread adoption of validated predictive models and curated multi‑omics datasets will shift R&D costs and productivity in biotech/pharma—reducing marginal costs of experiments, shortening timelines, and increasing returns to high‑quality data and models. Firm Productivity · positive · low (0.07). Outcome: R&D marginal cost, development timelines, ROI (conceptual/economic).
  • Concentration of curated datasets and restrictive IP can create monopolistic rents and underprovision of public‑good datasets, implying policy interventions (data sharing incentives/standards) may be required. Market Structure · negative · low (0.07). Outcome: market concentration / data access (conceptual).
  • Techniques validated in these biomedical studies (compositional transforms, parsimonious ensemble pipelines, augmentation for small samples) are transferable to other biological domains such as agriculture and environmental monitoring. Adoption Rate · positive · low (0.07). Outcome: method transferability / performance in non‑medical biological applications (speculative).
