A machine‑learning system is reported to raise retained food after harvest by 3.42% on Indian farms at no extra cost and claims near‑perfect prediction (R²=0.999); however, opaque data provenance and absent out‑of‑sample or field validation make the result fragile and potentially misleading.

AI in food inequality: Leveraging artificial intelligence to predict food waste in agriculture and post-harvesting

Akshit Erukulla, Swanish Baweja, Amol Sriprasadh, A. Paul, Aaron Sethi · Fetched March 12, 2026 · STEM Fellowship Journal

Source: Semantic Scholar · Paper type: correlational · Evidence: low · Relevance: 7/10 · Summary only (PDF pending)

Using gradient boosting on proprietary Indian farm data, the authors report a near‑perfect predictive model (R² = 0.999) whose ML‑guided practice recommendations increase retained post‑harvest food by 3.42% at no extra cost versus modern methods, but key methodological details and external validation are missing.

Food disparity is an international problem, driven by inefficiencies in poorly developed food distribution and agricultural infrastructure. An analysis of FAO and Kaggle datasets identifies post-harvest losses as an intervention point, with global median losses of 19.8%. India, a major producer of most agricultural food commodities, has relatively low post-harvest losses (3.2%) yet suffers from chronic hunger, as is clear from its ranking of 111 out of 125 on the Global Hunger Index [9][13]. This paradox of high production but poor consumer supply outcomes calls for a critical analysis of India. This study applied gradient boosting regression, a machine learning (ML) method, to Indian farm data, including variables such as pesticide use, fertilizer use, farm size, crop type, harvest date, and climatic conditions. The optimal model achieved an R² of 0.999 in predicting the best farming practice based on local conditions. The optimization model increased post-harvest food retention by 3.42% over modern methods, bringing more food into the supply chain at no extra cost. These findings offer actionable recommendations for future agricultural policy and practical solutions for regions facing analogous food security concerns.

Summary

Main Finding

Using gradient-boosting regression on Indian farm-level data, the study identifies locally optimized farming and post‑harvest practices that (a) increase retained food entering the supply chain by 3.42% relative to modern methods at no extra cost, and (b) can be predicted extremely accurately by the ML model (reported R² = 0.999). The work situates these gains against a global context of high post‑harvest losses (FAO/Kaggle median 19.8%) and India’s paradox of low reported post‑harvest loss (3.2%) yet poor food‑security outcomes (Global Hunger Index rank 111/125).

Key Points

Global median post‑harvest losses are around 19.8% (FAO & Kaggle datasets); India’s reported post‑harvest loss is relatively low (3.2%) despite high rates of hunger.
The paper frames post‑harvest loss reduction as a high‑leverage intervention point for improving food availability.
Features used in modeling include pesticide/fertilizer use, farm size, crop type, harvest date, and climatic variables.
The chosen ML technique is gradient boosting regression; the “optimal” model reportedly achieved R² = 0.999 for predicting best local farming practice.
The optimization module (applied recommendations) is reported to increase food retention after harvest by 3.42% relative to modern methods, without increasing cost.
Authors argue the results yield practical, low‑cost policy recommendations and interventions that can be applied to regions with similar food‑security profiles.
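The two headline percentages can be set side by side with a few lines of arithmetic. The sketch below assumes, purely for illustration, that the 3.42% gain is measured against the same baseline as India's reported 3.2% post-harvest loss; the summary does not confirm this, and the paper's "modern methods" baseline may be defined differently. The function and constant names are hypothetical.

```python
# Illustrative arithmetic only; the shared-baseline assumption is NOT from the paper.

REPORTED_LOSS = 0.032   # India's reported post-harvest loss (3.2%)
REPORTED_GAIN = 0.0342  # reported increase in retained food (3.42%)


def retained_after_gain(loss: float, gain: float, relative: bool) -> float:
    """Return the retained share of harvest after applying the gain.

    relative=True  treats the gain as a proportional increase in retained food.
    relative=False treats the gain as percentage points of total harvest.
    """
    retained = 1.0 - loss
    return retained * (1.0 + gain) if relative else retained + gain


if __name__ == "__main__":
    for relative in (True, False):
        share = retained_after_gain(REPORTED_LOSS, REPORTED_GAIN, relative)
        label = "relative" if relative else "percentage-point"
        print(f"{label:>16}: retained share = {share:.4f}")
```

Under this assumed shared baseline, both readings give a retained share slightly above 1.0, that is, more than the whole harvest. That suggests the 3.42% figure is measured against some other baseline, which is one reason the exact definition of "retained food" matters.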

Data & Methods

Data sources: FAO and Kaggle datasets referenced for global context; proprietary/field Indian farm dataset for modeling (variables listed above). The paper does not report (or the summary omits) sample size and full provenance of the Indian dataset.
Modeling approach: gradient boosting regression to predict “best farming practice” conditional on local inputs (farm attributes and weather/climate).
Performance: reported R² = 0.999 for the optimal model; optimization yields a 3.42% improvement in retained food post‑harvest vs. modern methods.
Claimed cost implication: improved retention enters the supply chain “at no extra cost.”
Missing/unclear methodological details (from the summary): training/test split, cross‑validation scheme, hyperparameter tuning, treatment of confounders or endogeneity, exact definition/measurement of outcome (how “retained food” is measured), and whether results were validated out‑of‑sample or in field trials.
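The missing validation details listed above matter because an in-sample fit says little about predictive accuracy. As a minimal sketch (not the authors' pipeline), the following pure-Python example fits least-squares gradient boosting with decision stumps on synthetic data and reports R² separately on a training split and a held-out split. The data generator, the feature names (`fert`, `rain`), the split sizes, and the hyperparameters are all hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = mean(left), mean(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_gbr(X, y, n_rounds=50, learning_rate=0.1):
    """Least-squares gradient boosting: each stump fits the current residuals."""
    base = mean(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for row, p in zip(X, pred)]
    return base, stumps, learning_rate


def predict(model, X):
    base, stumps, lr = model
    return [base + lr * sum(lm if row[j] <= thr else rm
                            for j, thr, lm, rm in stumps)
            for row in X]


def make_synthetic(n, seed):
    """Hypothetical data: outcome depends on two inputs plus noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        fert, rain = rng.uniform(0, 1), rng.uniform(0, 1)
        X.append([fert, rain])
        y.append(2.0 * fert + (1.0 if rain > 0.5 else 0.0) + rng.gauss(0, 0.3))
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic(100, seed=0)
    X_train, y_train = X[:75], y[:75]   # used for fitting
    X_test, y_test = X[75:], y[75:]     # never seen during fitting
    model = fit_gbr(X_train, y_train)
    print("train R2:", round(r2_score(y_train, predict(model, X_train)), 3))
    print("test  R2:", round(r2_score(y_test, predict(model, X_test)), 3))
```

Reporting both numbers, and ideally repeating the split several times (cross-validation), would show how much of a reported fit survives on unseen farms.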

Implications for AI Economics

Targeting inefficiencies: ML recommendations that modestly raise post‑harvest retention can meaningfully increase effective supply without expanding production—potentially a high return on relatively small operational changes.
Resource allocation: Findings suggest public and private investments might yield larger welfare gains if shifted toward distribution, storage, and locally tailored post‑harvest practices rather than only boosting aggregate production.
Cost‑effectiveness: The reported “no extra cost” improvement implies favorable cost‑benefit for policy adoption, but the claim depends on robust measurement and true accounting for implementation/transaction costs.
Scaling and adoption: Practical impact depends on adoption rates, extension services, local capacity to implement ML recommendations, and farmer incentives; AI tools must be integrated with delivery mechanisms (training, equipment, supply‑chain contracts).
Equity and distributional effects: Gains in aggregate retained food do not automatically resolve access and affordability problems; policy design must consider market dynamics, price effects, and marginalized groups.
Model risk and external validity: Extremely high predictive performance (R² = 0.999) raises concerns about overfitting, data leakage, or measurement artifacts. Economic policy built on such models should demand transparency, out‑of‑sample validation, and field trials.
Research priorities for AI economists: rigorous cost‑effectiveness analysis, randomized/controlled field validation of ML-guided interventions, studies of adoption frictions, and exploration of how improved retention affects local markets and welfare.
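The data-leakage concern raised in the list above can be made concrete with a toy example. The sketch below is an illustration on synthetic numbers, not a reconstruction of the paper's data: it fits a one-variable least-squares line and compares held-out R² for a legitimate predictor against a "leaked" feature that is essentially the outcome re-measured. All names (`practice_input`, `retained`, `leaked`) and all coefficients are hypothetical.

```python
import random


def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def held_out_r2(feature, target, n_train):
    """Fit on the first n_train rows, score on the remaining rows."""
    a, b = ols_fit(feature[:n_train], target[:n_train])
    preds = [a + b * f for f in feature[n_train:]]
    return r2(target[n_train:], preds)


def leakage_demo(seed, n=200, n_train=150):
    rng = random.Random(seed)
    # Hypothetical legitimate predictor, known before harvest.
    practice_input = [rng.uniform(0, 1) for _ in range(n)]
    # Hypothetical outcome: retained share, only weakly driven by the input.
    retained = [0.9 + 0.05 * x + rng.gauss(0, 0.02) for x in practice_input]
    # Leaked feature: recorded after harvest, nearly a copy of the outcome.
    leaked = [r + rng.gauss(0, 0.0005) for r in retained]
    return (held_out_r2(practice_input, retained, n_train),
            held_out_r2(leaked, retained, n_train))


if __name__ == "__main__":
    legit, leaky = leakage_demo(seed=1)
    print("held-out R2, legitimate feature:", round(legit, 3))
    print("held-out R2, leaked feature:    ", round(leaky, 3))
```

With these made-up numbers the leaked feature pushes held-out R² close to 1 while the legitimate predictor explains noticeably less of the variance. A held-out split alone would therefore not catch this kind of leakage, which is why the feature definitions and their timing relative to the outcome need to be disclosed.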

Notes for readers: The summary reflects the paper’s reported results, but the unusually high model fit and the absence of some methodological details in the available summary call for careful scrutiny (replication, robustness checks, and field validation) before these findings are used to shape large‑scale policy or investment decisions.

Assessment

Paper Type: correlational
Evidence Strength: low. The study presents predictive ML results and a reported improvement in retained food, but provides no causal identification (no randomization or quasi‑experimental design), omits key methodological details and sample provenance, and reports an implausibly high R² (0.999) suggesting overfitting or data leakage; the claimed impact (3.42% increase at no extra cost) is not validated out‑of‑sample or in field trials.
Methods Rigor: low. Essential methodological information is missing or unclear (sample size and sampling frame, definition/measurement of outcome, train/test split or cross‑validation, hyperparameter tuning, treatment of confounders, measures to prevent data leakage), and there is no reported out‑of‑sample or randomized validation to support the optimization/impact claim.
Sample: Proprietary Indian farm‑level dataset (provenance, geographic/temporal scope, and sample size not reported) with features such as pesticide/fertilizer use, farm size, crop type, harvest date, and climatic variables; FAO and Kaggle global datasets are cited for contextual benchmarks on post‑harvest losses.
Themes: productivity, adoption
Generalizability:
Unclear geographic/temporal coverage of the proprietary sample limits external validity beyond the observed farms
Unknown sample selection and potential sampling bias (non‑representative farms, aggregator/provider effects)
Results may not generalize across crops, storage infrastructures, or supply‑chain institutions
Potential overfitting/data leakage undermines out‑of‑sample predictiveness
'No extra cost' claim may not hold when implementation, transaction, or adoption costs are fully accounted for
Behavioral and institutional constraints (adoption rates, extension services) not modeled, limiting scalability

Claims (14)

Claim	Category	Direction	Outcome	Confidence & Evidence	Details
Locally optimized farming and post-harvest practices increase retained food entering the supply chain by 3.42% relative to modern methods at no extra cost.	Consumer Welfare	positive	retained food entering the supply chain (percent increase)	Reading fidelity: medium; study strength: low	3.42%; 0.09
The ML model can predict the best local farming practice extremely accurately, reported R² = 0.999.	Other	positive	model predictive performance (R²)	Reading fidelity: medium; study strength: low	R² = 0.999; 0.09
Global median post-harvest losses are around 19.8% (FAO & Kaggle datasets).	Consumer Welfare	negative	post-harvest loss (percent, global median)	Reading fidelity: high; study strength: low	19.8%; 0.15
India’s reported post-harvest loss is relatively low (3.2%) despite poor food-security outcomes (Global Hunger Index rank 111/125).	Consumer Welfare	mixed	post-harvest loss (percent) and Global Hunger Index rank	Reading fidelity: high; study strength: low	3.2% / rank 111/125; 0.15
Features used in modeling include pesticide/fertilizer use, farm size, crop type, harvest date, and climatic variables.	Other	null_result	predictor variables used in the ML model (feature list)	Reading fidelity: high; study strength: low	not reported; 0.15
The chosen ML technique is gradient boosting regression.	Other	null_result	modeling technique used	Reading fidelity: high; study strength: low	not reported; 0.15
Data sources used are FAO and Kaggle datasets for global context and a proprietary/field Indian farm dataset for modeling.	Other	null_result	data provenance/source	Reading fidelity: high; study strength: low	not reported; 0.15
The paper does not report (or the summary omits) the sample size and full provenance of the Indian farm dataset.	Research Productivity	null_result	reporting completeness for dataset (sample size/provenance)	Reading fidelity: high; study strength: low	not reported; 0.15
Key methodological details are missing or not reported: training/test split, cross-validation scheme, hyperparameter tuning, treatment of confounders/endogeneity, exact definition/measurement of the outcome, and whether results were validated out-of-sample or in field trials.	Research Productivity	null_result	methodological reporting completeness	Reading fidelity: high; study strength: low	not reported; 0.15
The optimization recommendations can be implemented without increasing cost ('no extra cost'), implying favorable cost-effectiveness for adoption.	Adoption Rate	positive	implementation cost implication (claimed no additional cost)	Reading fidelity: medium; study strength: low	no extra cost (claimed); 0.09
The authors argue the results yield practical, low-cost policy recommendations and interventions that can be applied to regions with similar food-security profiles.	Governance And Regulation	positive	policy applicability / feasibility (qualitative claim)	Reading fidelity: medium; study strength: low	not reported; 0.09
The paper frames post-harvest loss reduction as a high-leverage intervention point for improving food availability.	Consumer Welfare	positive	policy priority framing (conceptual claim)	Reading fidelity: medium; study strength: low	not reported; 0.09
The authors recommend further research priorities for AI economists: rigorous cost-effectiveness analysis, randomized/controlled field validation of ML-guided interventions, studies of adoption frictions, and exploration of market/welfare effects.	Research Productivity	positive	recommended research agenda (qualitative)	Reading fidelity: medium; study strength: low	not reported; 0.09
Extremely high reported model performance (R² = 0.999) raises concerns about overfitting, data leakage, or measurement artifacts and the need for transparency, out-of-sample validation, and field trials.	Research Productivity	negative	model robustness / external validity concerns (qualitative)	Reading fidelity: medium; study strength: low	not reported; 0.09

Entities

Gradient Boosting Regression (method)
Proprietary Indian farm-level dataset (field data) (dataset)
Post-harvest retained food entering the supply chain (retention) (outcome)
Post-harvest loss (outcome)
India (Indian farming population) (population)
Farmers (smallholder/field-level farmers) (population)
Optimization module for locally tailored recommendations (method)
Model performance: R-squared = 0.999 (outcome)
Modern farming/post-harvest methods (baseline) (method)
FAO datasets (Food and Agriculture Organization) (dataset)
Food availability and food-security outcomes (outcome)
Kaggle datasets (platform-hosted data) (dataset)
Global Hunger Index (GHI) (institution)
Global median post-harvest loss = 19.8% (FAO/Kaggle) (outcome)
No additional implementation cost (claimed) (outcome)