An LLM-powered forecasting tool cut cafeteria forecast errors by roughly 30% for new dishes while keeping staff in control via override features; qualitative sessions show humans still crucial for unusual or high-uncertainty situations.

Schnitzel-Prediction: Designing Human-AI Collaboration For Cafeteria Demand Forecasting

Justus Cappel, Timo Strohmann, Mara Burger, Marleen Voß, Jan vom Brocke · June 14, 2026 · Journal of the Association for Information Systems

Source: OpenAlex · Paper type: descriptive · Evidence strength: medium · Relevance: 7/10 · Full text: usable (extracted)

Structured author observations

Linked only from stored provider relations; the raw author line above is never matched by name.

OpenAlex

Latest observation: July 23, 2026

Cappel, Justus exact ORCID
Strohmann, Timo exact ORCID
Burger, Mara provider ID
Voss, Marleen exact ORCID
vom Brocke, Jan provider ID

An LLM-enhanced collaborative forecasting system developed in a nine-month ADR reduced forecast error for novel cafeteria menu items by about 30% versus naïve baselines while retaining human override controls, though human judgment remained essential for high-uncertainty events.

Citation observations

Cumulative provider counts captured on specific dates; providers are never combined.

0 cumulative citations

OpenAlex · Observed July 22, 2026

Cafeteria demand planning requires both algorithmic pattern recognition and human expertise, yet current systems treat these separately, which generates significant food waste. This paper reports on a 9-month action design research (ADR) project at a German financial services firm. Using a practice-driven abductive approach, we developed a collaborative forecasting system that leverages semantic processing using large language models (LLMs) to solve the “cold-start” problem for novel menu items while preserving human agency via override mechanisms. Our evaluation combines algorithmic benchmarking, reducing forecast errors by 30% over naive baselines, with two think-aloud sessions showing that human judgment remains critical for high-uncertainty events. We distill our findings into a meta-design and four design principles (DPs), grounded in kernel theories, for systems where human contextual intelligence and algorithmic recognition must coexist. We contribute to the discourse on human-AI collaboration and sustainable IS by providing a rigorous blueprint for designing synergistic, trustworthy, and diagnostic operational planning tools.

Summary

Main Finding

A nine-month action design research project produced a human–AI cafeteria demand forecasting system that combines tree‑based machine learning (XGBoost) for pattern recognition with semantic processing via large language models (LLMs) to solve cold-starts for novel menu items, while preserving human agency through override and feedback mechanisms. The hybrid system reduced forecast errors by ~30% versus naive baselines and retained critical human judgment for high‑uncertainty events (shown in two think‑aloud sessions). The authors distill a meta‑design and four design principles (DPs) grounded in kernel theories for operational planning systems that seek synergy between algorithmic recognition and human contextual intelligence.

Key Points

Problem: Cafeteria demand forecasting requires both large‑scale pattern recognition and situated human contextual knowledge; prior approaches treated these separately, causing persistent food waste (~20% in many settings).
Methodology: Practice‑driven Action Design Research (ADR) over three iterative BIE cycles (build–intervene–evaluate) with practitioners at a German financial services association.
Core artifact: A collaborative forecasting system combining:
- XGBoost ensembles for demand prediction (feature engineering: temporal lags, calendar markers, contextual features),
- LLM‑based semantic processing/embeddings to map novel menu descriptions to historical analogues (cold‑start handling),
- User interface mechanisms for transparency, calibrated uncertainty, and human overrides/feedback.
Performance: Algorithmic benchmarking shows ~30% reduction in forecast errors compared to naive baselines; initial quick POC achieved ~65% accuracy on validation before refinements.
Human role: Two think‑aloud sessions indicate human planners remain essential for exceptional/high‑uncertainty events (e.g., one‑offs, construction, meetings). System design preserves agency and facilitates trust.
Contributions: Meta‑design and four empirically derived DPs (grounded in human–AI collaboration, explainability/trust, and sustainable IS literatures) for designing operational planning tools that integrate human expertise and AI.
Limitations noted: Single organization case, nine‑month horizon, context specificity (communal catering).
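
The cold-start idea in the core artifact (mapping a new dish to historical analogues through embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say which embedding model, similarity measure, or aggregation rule the paper uses. Cosine similarity, the similarity-weighted average, the function names, and the three-dimensional toy vectors (standing in for real LLM embeddings) are all assumptions made for illustration, and the demand figures are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cold_start_forecast(new_vec, history, k=2):
    """Similarity-weighted mean demand of the k most similar historical dishes.

    `history` maps dish name -> (embedding, mean portions sold) and must be
    non-empty. Returns (forecast, names of the analogues used).
    """
    scored = sorted(
        ((cosine(new_vec, emb), name, demand)
         for name, (emb, demand) in history.items()),
        reverse=True,
    )[:k]
    weights = [max(score, 0.0) for score, _, _ in scored]
    total = sum(weights)
    if total == 0:
        # No usable analogue: fall back to the plain mean over all dishes.
        return sum(d for _, d in history.values()) / len(history), []
    forecast = sum(w * d for w, (_, _, d) in zip(weights, scored)) / total
    return forecast, [name for _, name, _ in scored]

# Hypothetical data: toy vectors instead of real embeddings, invented demand.
HISTORY = {
    "Wiener Schnitzel": ([0.9, 0.1, 0.0], 120.0),
    "Chicken Schnitzel": ([0.8, 0.2, 0.1], 100.0),
    "Lentil Curry": ([0.0, 0.2, 0.9], 60.0),
}

forecast, analogues = cold_start_forecast([0.85, 0.15, 0.05], HISTORY, k=2)
print(analogues, round(forecast, 1))
```

In a setup like the one described, such an analogue-based estimate would more plausibly feed the demand model as a feature or prior than replace it, and the planner's override would still take precedence.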

Data & Methods

Case: Large regional financial services association cafeteria (200–400 portions/day; wide demand variability, holidays/bridge days, events).
Dataset: Operational data covering demand, pricing, and menu information from 2022 onward (historical transaction/consumption records and contextual markers).
ADR process: Three iterative cycles from October 2024–June 2025:
- Alpha 1: problem scoping and simple statistical POC (65% validation accuracy).
- Alpha 2: prototype with XGBoost, richer features, added contextual variables.
- Final: integrated system with LLM semantic processing for cold‑start items, UI for overrides, and feedback loops.
Algorithms & techniques:
- Primary forecasting: Gradient boosting (XGBoost) chosen for nonlinear patterns, mixed data handling, and interpretability (feature importance).
- Cold‑start: LLM semantic embeddings / natural language processing of menu text to link new dishes to historical analogues.
- UX elements: Explainability cues, uncertainty/calibration display, and manual override/feedback capture.
Evaluation:
- Algorithmic benchmarking against naive baselines (reported ~30% error reduction).
- Qualitative evaluation: Two think‑aloud user sessions to probe interaction, trust, and when humans override model output.
- Operational assessment: Early indications of reduced mismatch between production and demand; formal cost/waste accounting left for future work.
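
As a hedged illustration of what "error reduction versus a naive baseline" means computationally, the sketch below compares a model against a seasonal-naive forecast (same weekday, previous week). The summary does not state which error metric or which naive baseline the paper uses, so mean absolute error and the five-day seasonal lag are assumptions, and all numbers are invented rather than taken from the study.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def seasonal_naive(series, season=5):
    """Forecast each day with the value observed `season` days earlier.

    The result lines up with series[season:].
    """
    return series[:-season]

def error_reduction(actual, model_pred, baseline_pred):
    """Relative MAE reduction of the model over the baseline (0.3 = 30%)."""
    base = mae(actual, baseline_pred)
    return (base - mae(actual, model_pred)) / base

# Two working weeks of daily portions (invented numbers).
portions = [300, 280, 350, 320, 220, 310, 270, 360, 300, 240]
actual = portions[5:]                          # second week is the test window
baseline = seasonal_naive(portions, season=5)  # first week reused as forecast
model = [307, 277, 353, 314, 226]              # hypothetical model output

reduction = error_reduction(actual, model, baseline)
print(f"baseline MAE={mae(actual, baseline):.1f}, "
      f"model MAE={mae(actual, model):.1f}, reduction={reduction:.1%}")
```

Because the size of the reported gain depends on how strong the baseline is, rerunning the same comparison against a stronger reference model would show how much of the improvement survives.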

Implications for AI Economics

Direct economic gains: Improved forecasting accuracy reduces overproduction and food waste, yielding procurement cost savings and lower disposal costs. A reported ~30% error reduction implies meaningful reductions in variable food costs and waste‑related expenses (exact monetary gains require local costing).
Value of hybrid investments: The study highlights that combining ML with LLM semantic capabilities and human‑in‑the‑loop interfaces can deliver higher practical value than automation‑only or human‑only approaches. Economic returns depend on balancing spending on model development, LLM/embedding infrastructure, and investment in usable interfaces and governance.
Adoption and ROI depend on trust & agency: Economic benefits materialize only if frontline planners adopt the system. Design features that preserve control (overrides), provide calibrated uncertainty, and enable feedback accelerate adoption and thus the realization of savings.
Labour and human capital effects: The approach augments experienced planners rather than replacing them—reducing cognitive load and burnout risk while preserving tacit knowledge. Firms should account for potential reallocation of labor (less time forecasting, more time on exception handling and quality control).
Scalability & markets: Similar hybrid systems can be productized for broader communal catering, hospitals, universities, and corporate cafeterias. There is potential for subscription/enterprise software markets combining domain ML models with LLM‑based cold‑start modules.
Measurement recommendations for economic evaluation:
- Track reduction in meals wasted (units) and convert to procurement and disposal cost savings.
- Estimate emissions/externality reductions for sustainability valuation (CO2e per kg food avoided).
- Calculate payback period: compare development/operational costs (models, LLM API or hosting, integration, UX) vs. monthly waste/cost savings.
- Consider A/B or randomized rollout to estimate causal impact on waste and costs.
Policy/externalities: Reduced food waste aligns with sustainability goals and may yield regulatory or reputational benefits (and possibly incentives), further improving the economic case.
Research agenda for AI economics:
- Formal cost‑benefit and sensitivity analyses across settings (small vs large cafeterias).
- Comparative studies of investment allocation (model accuracy vs explainability/UI) on adoption and economic returns.
- Market analysis for LLM‑augmented forecasting tools and pricing strategies that internalize sustainability externalities.

Limitations and next steps: single‑case ADR evidence—broader trials, randomized evaluations, and full economic accounting (waste volumes → € savings; emissions valuation) are needed to quantify generalizable economic impacts and inform deployment decisions.
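
The measurement recommendations above (waste units converted to cost savings, payback period, emissions avoided) reduce to simple arithmetic, sketched here. Every figure is a placeholder, not a result from the paper, which leaves formal cost and waste accounting to future work. The function names and the cost breakdown are assumptions chosen for illustration.

```python
def monthly_savings(meals_avoided, food_cost_per_meal, disposal_cost_per_meal):
    """Procurement plus disposal cost saved per month from avoided waste."""
    return meals_avoided * (food_cost_per_meal + disposal_cost_per_meal)

def payback_months(upfront_cost, monthly_saving, monthly_running_cost):
    """Months until net savings cover the upfront cost; None if never."""
    net = monthly_saving - monthly_running_cost
    if net <= 0:
        return None
    return upfront_cost / net

def co2e_avoided_kg(meals_avoided, kg_food_per_meal, co2e_per_kg_food):
    """Emissions avoided, in kg CO2e, for a given number of avoided meals."""
    return meals_avoided * kg_food_per_meal * co2e_per_kg_food

# Placeholder inputs: replace with local costing before drawing conclusions.
saving = monthly_savings(meals_avoided=400, food_cost_per_meal=3.0,
                         disposal_cost_per_meal=0.5)
months = payback_months(upfront_cost=12000.0, monthly_saving=saving,
                        monthly_running_cost=400.0)
print(f"monthly saving: {saving:.0f}, payback: {months} months")
```

Returning `None` when running costs exceed savings makes the "no payback" case explicit instead of producing a negative or infinite horizon.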

Assessment

Paper Type: descriptive
Evidence Strength: medium. The paper provides a mixed-methods evaluation: quantitative algorithmic benchmarking showing a ~30% reduction in forecasting error versus naïve baselines, plus qualitative think-aloud sessions that illuminate human-AI interaction. However, evidence is limited to a single-site ADR project, lacks randomized or quasi-experimental identification, uses small qualitative samples, and does not measure downstream economic outcomes (e.g., realized waste or cost savings) at scale.
Methods Rigor: medium. The action design research approach and combined quantitative/qualitative evaluation are appropriate for design-oriented contributions; benchmarking against baselines is a sensible quantitative test. Rigor is constrained by limited detail (unclear baseline strength, model specifications, and sample sizes), absence of controlled experiments or causal inference, small qualitative sample (two think-aloud sessions), and single-organization deployment limiting robustness checks.
Sample: Nine-month action design research project at a German financial-services firm's cafeteria; used the firm's historical cafeteria demand/sales data and newly introduced menu items to develop and test an LLM-based semantic forecasting component, benchmarked against naïve forecasting baselines, and evaluated through two think-aloud sessions with human planners; no large-scale randomized or multi-site data reported.
Themes: human_ai_collab, productivity
Generalizability:
- Single-site case study at one German financial firm limits external validity.
- Specific to corporate cafeteria/food-service context and novelty of menu items.
- Small qualitative sample (two think-aloud sessions) limits behavioral generalization.
- Model performance may depend on the particular LLM, tuning, and data preprocessing used.
- Benchmarks use naïve baselines; gains may shrink versus stronger forecasting models.
- Nine-month horizon may not capture longer-term seasonality or organizational adaptation.
- Design principles and override workflows may not transfer across different organizational cultures or operational processes.

Claims (8)

Claim	Category	Direction	Outcome	Confidence & Evidence	Details
Cafeteria demand planning requires both algorithmic pattern recognition and human expertise, yet current systems treat these separately, which generates significant food waste.	Organizational Efficiency	negative	food waste	Reading fidelity high Study strength speculative	not reported 0.03
This paper reports on a 9-month action design research (ADR) project at a German financial services firm.	Other	null_result	study duration and setting	Reading fidelity high Study strength high	not reported 0.3
We developed a collaborative forecasting system that leverages semantic processing using large language models (LLMs) to solve the 'cold-start' problem for novel menu items while preserving human agency via override mechanisms.	Task Allocation	null_result	resolution of cold-start forecasting for novel menu items; preservation of human agency via overrides	Reading fidelity high Study strength medium	not reported 0.18
Algorithmic benchmarking reduced forecast errors by 30% over naive baselines.	Error Rate	positive	forecast error	Reading fidelity high Study strength medium	30% reduction in forecast errors over naive baselines 0.18
Two think-aloud sessions show that human judgment remains critical for high-uncertainty events.	Decision Quality	positive	importance/role of human judgment in handling high-uncertainty forecasting events	Reading fidelity high Study strength medium	n=2 0.18
We distill our findings into a meta-design and four design principles (DPs), grounded in kernel theories, for systems where human contextual intelligence and algorithmic recognition must coexist.	Other	null_result	design principles and meta-design artifact	Reading fidelity high Study strength high	not reported 0.3
The paper provides a rigorous blueprint for designing synergistic, trustworthy, and diagnostic operational planning tools, contributing to the discourse on human-AI collaboration and sustainable information systems (IS).	Organizational Efficiency	positive	guidance/blueprint for operational planning tool design	Reading fidelity high Study strength medium	not reported 0.18
The system preserves human agency via override mechanisms.	Worker Satisfaction	positive	preservation of human agency (ability to override algorithmic forecasts)	Reading fidelity high Study strength medium	not reported 0.18