Generative AI promises scalable, individualized feedback that could cut costs and boost learning at scale, but real benefits depend on reliable pedagogical alignment, careful evaluation, and governance; without quality controls and equitable access, AI feedback risks misleading students and widening educational inequalities.

The Future of Feedback: How Can AI Help Transform Feedback to Be More Engaging, Effective, and Scalable?

Jennifer Meyer, Olaf Köller, Thorben Jansen, Johanna Fleckenstein, Michael W. Asher, Sarah Bichler, Laura Brandl, Jasmin Breitwieser, Kai S. Cortina, Mutlu Cukurova, Martin Daumiller, Hannah Deininger, Frank Fischer, Dragan Gašević, Jeanine Grütter, Anna Hilz, Ioana Jivet, Jelena Jovanović, Rene F. Kizilcec, Livia Kuklick, Marlit Annalena Lindner, Anastasiya Lipnevich, Ute Mertens, Detmar Meurers, Kou Murayama, Tanya Nazaretsky, Knut Neumann, Ernesto Panadero, Maciej Pankiewicz, Zachary A. Pardos, Chris Piech, Hannah Pünjer, Nikol Rummel, Marlene Steinbach, Olga Viberg, Naomi Winstone · March 12, 2026

Source: arXiv · Paper type: descriptive · Evidence strength: n/a · Relevance: 7/10

Experts find that generative-AI feedback can deliver scalable, personalized, real-time learning support, but its net educational value depends on feedback quality, pedagogical alignment, teacher integration, and governance to manage equity and privacy risks.

With digital learning environments becoming more prevalent, the ease with which generative AI enables the scalable production of real-time, automated feedback holds the potential to reshape learning and teaching experiences. This meeting report synthesizes the interdisciplinary perspectives of 50 scholars from educational psychology, computer science, science education, and the learning sciences on the use of generative AI for feedback and its promises and risks in educational practice. We highlight points of convergence in the scholarship, identify areas of debate and unresolved challenges, and outline open questions and future directions for research and educational practice that emerged from structured small-group activities designed to bridge disciplinary barriers.

Summary

Main Finding

A multidisciplinary group of 50 scholars concluded that generative AI–driven feedback in digital learning environments holds considerable promise for personalized, real-time feedback at scale but also poses substantive risks and unresolved challenges. The net educational value depends on how well AI feedback is aligned with pedagogical goals, evaluated for quality, integrated with human teaching, and governed to manage equity, privacy, and incentives.

Key Points

Promises
- Scalability: generative AI can produce real-time, individualized feedback at scale, potentially reducing per-student feedback costs and increasing feedback frequency.
- Personalization: models can tailor explanations, scaffolding, and practice to learners’ current states and preferences.
- Timeliness & engagement: immediate feedback may sustain momentum and improve formative assessment cycles.
- New modalities: AI can generate diverse feedback formats (text, hints, worked examples, formative prompts) adaptable to content and learner needs.
Risks and challenges
- Quality & validity: AI feedback can be incorrect, misleading, or misaligned with learning objectives; assessing feedback quality is nontrivial.
- Pedagogical fit: automated feedback may not capture nuance that expert teachers use (motivation, socio-emotional cues, complex reasoning).
- Over-reliance and gaming: learners may over-rely on AI, which reduces effortful learning, or may game systems to obtain desirable responses.
- Equity & access: differential access to high-quality systems can exacerbate educational inequalities; bias in training data can harm marginalized groups.
- Privacy & data governance: extensive learner data needed to personalize feedback raises privacy and consent concerns.
- Teacher roles & labor impacts: effects on teacher workload, required skills, and labor demand are unclear, with potential for both complementarity and substitution.
Areas of debate / unresolved questions
- How to measure whether AI feedback produces durable learning gains vs. transient performance improvements.
- Optimal mixes of automated and human feedback across contexts and learner ages.
- Standards for evaluating feedback fidelity, interpretability, and fairness.
- Institutional and incentive structures needed to promote safe, effective adoption.

Data & Methods

Nature of the evidence: the document is a structured meeting report synthesizing expert perspectives, not an empirical study. No primary experimental or observational data are presented.
Participants & approach:
- Interdisciplinary workshop: 50 scholars from educational psychology, computer science, science education, and learning sciences.
- Structured small-group activities designed to surface cross-disciplinary views and identify consensus, debates, and open questions.
- Synthesis approach: qualitative thematic extraction of convergent points, tensions, and future research/practice directions.
Limitations:
- Expert-opinion synthesis lacks systematic empirical estimates, causal identification, or quantitative cost/benefit analysis.
- Potential contributor selection bias and disciplinary framing may shape identified priorities and risks.

Implications for AI Economics

Productivity & cost structures
- Potential to lower marginal costs of high-quality feedback and increase throughput of instruction delivery; important to model per-student cost declines and fixed vs. variable cost shifts.
- Need economic estimates of scalability benefits (e.g., cost per hour of effective feedback), accounting for implementation, monitoring, and content adaptation costs.
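
The fixed vs. variable cost reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `FeedbackCostModel` and `break_even_students` are hypothetical, every figure is made up and does not come from the report, and the model assumes a constant marginal cost per student, which real deployments may not satisfy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackCostModel:
    """Hypothetical cost structure for one way of delivering feedback."""
    fixed_cost: float     # up-front costs: implementation, monitoring, content adaptation
    marginal_cost: float  # assumed constant cost of serving one additional student

    def cost_per_student(self, n_students: int) -> float:
        """Average cost per student: fixed cost spread over n, plus marginal cost."""
        if n_students <= 0:
            raise ValueError("n_students must be positive")
        return self.fixed_cost / n_students + self.marginal_cost


def break_even_students(high_fixed: FeedbackCostModel,
                        low_fixed: FeedbackCostModel) -> Optional[float]:
    """Enrollment above which `high_fixed` is cheaper per student than `low_fixed`.

    Returns None when `high_fixed` has no marginal-cost advantage,
    so it never becomes cheaper by scaling up.
    """
    saving_per_student = low_fixed.marginal_cost - high_fixed.marginal_cost
    if saving_per_student <= 0:
        return None
    extra_fixed = high_fixed.fixed_cost - low_fixed.fixed_cost
    return max(extra_fixed, 0.0) / saving_per_student


# Illustrative (invented) figures: an AI system with high fixed cost and low
# marginal cost versus teacher-written feedback with no fixed cost.
ai = FeedbackCostModel(fixed_cost=50_000.0, marginal_cost=2.0)
human = FeedbackCostModel(fixed_cost=0.0, marginal_cost=40.0)

print(ai.cost_per_student(1_000))                  # 52.0
print(ai.cost_per_student(10_000))                 # 7.0
print(round(break_even_students(ai, human), 1))    # 1315.8
```

Under these invented figures the AI option is more expensive per student at 1,000 students and cheaper at 10,000, which is the kind of scale dependence a cost analysis would need to capture. A fuller model would also have to adjust for feedback quality, since cheaper feedback is not necessarily effective feedback.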
Labor market effects
- Complementarity vs. substitution: AI feedback may augment teacher productivity (allowing focus on higher-order tasks) or substitute for some routine feedback tasks, altering demand for different teacher skills.
- Re-skilling and task reallocation: economic models should consider transition costs, training needs, and effects on wages and employment in education sectors.
Distributional impacts & inequality
- Differential adoption across regions and institutions could widen human-capital gaps. Economists should analyze access constraints, pricing models, and public provision options.
- Assess how algorithmic bias and differential effectiveness by subgroup could change long-run earnings inequality via human capital pathways.
Incentives, quality assurance, and market design
- Incentive problems: providers may optimize for engagement or test performance rather than learning; regulators or purchasers must design contracts and metrics that align provider incentives with educational outcomes.
- Market concentration risk: capabilities and data advantages could lead to platform dominance; antitrust and open-data considerations matter for competition and innovation.
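
Market concentration of the kind described above is commonly summarized with the Herfindahl-Hirschman index (HHI), the sum of squared market shares. The Python sketch below is a minimal illustration: the function name `herfindahl_index` and the example shares are hypothetical, and no market data of this kind appears in the report.

```python
from typing import Sequence


def herfindahl_index(market_shares: Sequence[float]) -> float:
    """Return the HHI on a 0-1 scale: the sum of squared market shares.

    Shares may be given in any consistent unit (fractions, percentages,
    revenue); they are normalized to sum to 1 before squaring.
    """
    if any(share < 0 for share in market_shares):
        raise ValueError("market shares must be non-negative")
    total = sum(market_shares)
    if total <= 0:
        raise ValueError("market shares must sum to a positive value")
    return sum((share / total) ** 2 for share in market_shares)


# Invented example: four equally sized vendors versus one dominant platform.
even_market = [25, 25, 25, 25]
dominated_market = [70, 10, 10, 10]

print(herfindahl_index(even_market))                  # 0.25
print(round(herfindahl_index(dominated_market), 2))   # 0.52
```

The index equals 1/N for N equally sized firms and approaches 1 under a single dominant provider, so a rising value over time would be one signal of the platform dominance the bullet above warns about.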
Research & evaluation priorities for economists
- Rigorous impact evaluation: randomized controlled trials and long-run follow-ups to estimate causal effects on learning, attainment, and downstream earnings.
- Cost-effectiveness and ROI studies comparing AI-enabled feedback to alternative interventions (tutoring, teacher professional development).
- Structural and equilibrium models: simulate adoption dynamics, labor reallocation, pricing, and distributional consequences across schooling systems.
- Measurement innovations: develop validated metrics for feedback quality, learning durability, and noncognitive outcomes affected by feedback.
- Policy experiments: evaluate subsidy, procurement, and regulation strategies (privacy rules, transparency mandates, quality standards) to guide public-sector adoption.
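
To make the impact-evaluation and cost-effectiveness priorities above concrete, the Python sketch below computes a standardized effect size (Cohen's d with a pooled standard deviation) from two groups of test scores and then a cost per standard deviation of learning gain. It is only an illustration: the function names, scores, and cost figure are hypothetical, and an actual evaluation would also need confidence intervals, clustering adjustments, and long-run follow-up.

```python
import math
import statistics
from typing import Sequence


def cohens_d(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Standardized mean difference between two groups, using the pooled SD."""
    n_t, n_c = len(treatment), len(control)
    if n_t < 2 or n_c < 2:
        raise ValueError("each group needs at least two observations")
    var_t = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    pooled_sd = math.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


def cost_per_sd_gain(cost_per_student: float, effect_size: float) -> float:
    """Cost of raising one student's outcome by one standard deviation."""
    if effect_size <= 0:
        raise ValueError("effect size must be positive to compute a cost ratio")
    return cost_per_student / effect_size


# Invented scores for a tiny treatment group (AI feedback) and control group.
ai_feedback_scores = [2, 4, 6]
control_scores = [1, 3, 5]

d = cohens_d(ai_feedback_scores, control_scores)
print(d)                          # 0.5
print(cost_per_sd_gain(10.0, d))  # 20.0
```

Computing the same ratio for alternatives such as tutoring or teacher professional development would put the interventions on a common scale, provided the outcome measures are comparable.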

Actionable research next steps for AI economists: combine field RCTs of AI-feedback deployments with microdata on teacher time use and costs; build models of adoption under heterogeneous school budgets; and estimate long-run returns to AI-augmented learning to inform procurement and regulatory choices.

Assessment

Paper Type: descriptive

Evidence Strength: n/a. This is a structured expert-opinion synthesis without primary experimental or observational data and therefore provides no empirical or causal estimates to evaluate evidence strength.

Methods Rigor: medium. The report draws on a multidisciplinary, structured workshop (50 scholars) and uses qualitative thematic synthesis, which supports breadth and informed judgment; however, it lacks systematic literature review procedures, pre-registered protocols, empirical data, and formal bias mitigation, limiting reproducibility and inferential rigor.

Sample: A purposive, interdisciplinary workshop of about 50 scholars from educational psychology, computer science, science education, and the learning sciences who participated in structured small-group activities; synthesis is qualitative and based on participant discussions and consensus-building rather than empirical sampling or data collection.

Themes: skills_training, productivity, labor_markets, inequality, adoption, governance, human_ai_collab

Generalizability:
- Not empirical: conclusions reflect expert judgment rather than representative data or causal estimates.
- Selection bias: participant composition and disciplinary framing may shape priorities and risks identified.
- Context bias: perspectives may reflect high-income, well-resourced education systems and current-generation models.
- Technology evolution: findings are contingent on rapidly changing generative-AI capabilities and deployment practices.
- Stakeholder coverage: limited direct input from frontline teachers, students, school administrators, and low-resource contexts.

Claims (15)

Claim	Category	Direction	Confidence	Outcome	Details
Generative AI can produce real-time, individualized feedback at scale, potentially reducing per-student feedback costs and increasing feedback frequency.	Consumer Welfare	positive	medium	per-student feedback cost; feedback frequency; scalability of feedback delivery	n=50 0.02
Large language and generative models can tailor explanations, scaffolding, and practice to learners' current states and preferences (personalization).	Skill Acquisition	positive	medium	degree of personalization (alignment of feedback to learner state/preferences); quality of tailored explanations/scaffolds	n=50 0.02
Immediate AI-generated feedback may sustain learner momentum and improve formative assessment cycles (timeliness & engagement).	Skill Acquisition	positive	medium	learner engagement; tempo of formative assessment cycles; short-term task completion rates	n=50 0.02
Generative AI can enable new feedback modalities (text, hints, worked examples, formative prompts) adaptable to content and learner needs.	Innovation Output	positive	medium	variety of feedback modalities produced; adaptability of modality to content/learner needs	n=50 0.02
AI-generated feedback can be incorrect, misleading, or misaligned with learning objectives; assessing feedback quality is nontrivial.	Output Quality	negative	high	feedback factual correctness; alignment with stated learning objectives; rate of misleading/incorrect feedback	n=50 0.03
Automated feedback may not capture pedagogical nuances expert teachers use (motivation, socio-emotional cues, complex reasoning), limiting pedagogical fit.	Output Quality	negative	high	coverage of socio-emotional and complex-reasoning cues in feedback; correspondence with teacher judgments	n=50 0.03
Learners may over-rely on AI feedback or game systems to obtain desirable responses, reducing effortful learning.	Skill Acquisition	negative	medium	learner reliance on AI (usage patterns); changes in effortful learning behaviors; incidence of gaming behaviors	n=50 0.02
Differential access to high-quality AI feedback systems and bias in training data can exacerbate educational inequalities and harm marginalized groups.	Inequality	negative	medium	access disparities; differential effectiveness by subgroup; measures of algorithmic bias impacting learning outcomes	n=50 0.02
Extensive learner data needed to personalize AI feedback raises privacy and data-governance concerns (consent, storage, usage).	Governance And Regulation	negative	high	volume/type of learner data collected; privacy risk indicators; compliance with consent/data-governance standards	n=50 0.03
The net educational value of AI-generated feedback depends on alignment with pedagogical goals, quality evaluation, integration with human teaching, and governance to manage equity, privacy, and incentives.	Skill Acquisition	mixed	high	net educational value (composite of learning outcomes, equity metrics, privacy compliance, teacher integration measures)	n=50 0.03
AI feedback may either augment teacher productivity (complementarity) or substitute for routine teacher feedback tasks (substitution), with unclear net labor impacts.	Employment	mixed	medium	teacher time allocation; demand for teacher skills; employment levels in education; productivity measures	n=50 0.02
Adoption of AI feedback could lower marginal costs of delivering high-quality feedback and change fixed vs. variable cost structures for instruction delivery.	Firm Productivity	positive	medium	marginal cost per unit of feedback; changes in fixed/variable cost composition	n=50 0.02
Provider incentives may be misaligned (e.g., optimizing for engagement or test performance instead of durable learning), requiring contracts, regulation, or purchaser design to align incentives.	Governance And Regulation	negative	high	provider optimization metrics (engagement/test performance) vs. durable learning outcomes; presence/absence of aligned procurement/regulatory mechanisms	n=50 0.03
Capabilities and data advantages for certain vendors could lead to market concentration and platform dominance in AI-driven educational feedback.	Market Structure	negative	medium	market concentration measures (market share, Herfindahl index); entry barriers; data-asset concentration	n=50 0.02
Rigorous research priorities include randomized controlled trials with long-run follow-ups, cost-effectiveness studies, structural adoption models, and validated metrics for feedback quality and learning durability.	Research Productivity	positive	high	existence and quality of RCTs and long-run studies; availability of validated metrics for feedback quality and learning durability	n=50 0.03