The Commonplace

AI-generated feedback drafts led teaching assistants to give personalized feedback more often and at greater length (up 10.8 percentage points and ~40 characters) while preserving student-rated usefulness and human control.

AI Assistance for Discretionary Work: Increasing Feedback Provision in Higher Education
Romina Mahinpei, Victoria Dean, Ruth Fong, Lydia T. Liu, Manoel Horta Ribeiro · June 02, 2026
arXiv · RCT · medium evidence · relevance 7/10
In a randomized field experiment in a 300-level ML course, providing TAs with AI-generated editable feedback drafts increased the probability of leaving feedback by 10.8 percentage points and raised feedback length by ~40 characters without reducing student-rated usefulness or changing time per character.

AI systems increasingly shape human workflows by generating intermediate artifacts that users can adopt, revise, or ignore. While prior work has shown that AI assistance can improve the efficiency and accuracy of required tasks, less is known about whether it can increase participation in discretionary but beneficial work that users often intend to perform but frequently skip. We study this question in the context of personalized feedback provision in higher education, a pedagogically valuable but often optional practice. We conduct a mixed-methods study combining a randomized field experiment and qualitative interviews in a 300-level machine learning course with n=11 teaching assistants (TAs) and n=88 students. Student submissions were randomly assigned to either (1) a treatment condition where TAs received AI-assisted feedback drafts after grading or (2) a control condition without drafts. TAs remained fully in control and could use, edit, or ignore drafts at their discretion. We find that AI-assisted feedback significantly increases feedback provision (+10.8 percentage points, SE=1.1, p<0.001) and feedback length (+39.8 chars, SE=3.45, p<0.001) without negatively affecting student usefulness ratings or reducing time per character. Qualitative findings suggest that AI-assisted drafts function as editable scaffolds that lower barriers to initiating feedback rather than reducing overall effort. Our findings highlight AI's promise for discretionary but beneficial tasks: increasing work that might otherwise go undone while preserving human control over final outcomes.

Summary

Main Finding

AI-generated feedback drafts shown to teaching assistants (TAs) increased discretionary feedback provision in a real university course: +10.81 percentage points in the probability a student submission received feedback (SE = 1.10, p < 0.001) and +39.79 characters in feedback length (SE = 3.45, p < 0.001), without changing time spent per character (difference 0.29 s/char, SE = 0.35, p = 0.41) or reducing student-rated usefulness (difference −0.01, SE = 0.06, p = 0.88). Qualitative interviews suggest the mechanism is lowered initiation costs (drafts as editable scaffolds), not substitution of human effort.
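As a quick consistency check on these figures, the sketch below recomputes two-sided p-values from each reported estimate and standard error using a normal approximation. The normal reference distribution is an assumption: the summary does not say whether the authors used normal, t, or cluster-adjusted inference, so small discrepancies are expected (this gives about 0.87 for usefulness versus the reported 0.88, likely from rounding of the inputs).

```python
import math

# Point estimates and standard errors as reported in the summary.
# Units differ by outcome: percentage points, characters,
# seconds per character, and rating points.
REPORTED = {
    "feedback provision (pp)": (10.81, 1.10),
    "feedback length (chars)": (39.79, 3.45),
    "time per character (s/char)": (0.29, 0.35),
    "usefulness rating": (-0.01, 0.06),
}


def two_sided_p(estimate: float, se: float) -> tuple[float, float]:
    """Return (z, p) for H0: effect = 0 under a normal approximation."""
    z = estimate / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


for name, (est, se) in REPORTED.items():
    z, p = two_sided_p(est, se)
    print(f"{name:30s} z = {z:6.2f}  p = {p:.3g}")
```

The provision and length effects sit roughly 10 and 11.5 standard errors from zero, which is why they clear p < 0.001 by a wide margin.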

Authors / source: Mahinpei et al., “AI Assistance for Discretionary Work: Increasing Feedback Provision in Higher Education,” arXiv:2606.03095v1 (June 2, 2026).

Key Points

  • Experimental design: randomized field experiment at the question-submission level in a 300-level machine learning course.
  • Sample: 11 TAs and 88 students; student submissions randomized to treatment (AI-assisted draft shown after grading) or control (no draft).
  • Intervention: LLM-backed Chrome extension (o4-mini reported for generation) that surfaced a personalized feedback draft after the TA graded; TAs could use, edit, or ignore the draft (full human control).
  • Primary quantitative outcomes:
    • Feedback provision (binary): +10.81 pp (SE 1.10, p < 0.001).
    • Feedback length: +39.79 characters (SE 3.45, p < 0.001).
    • Time per character: no significant change (difference 0.29 s/char, SE 0.35, p = 0.41).
    • Student usefulness rating: no detectable difference (−0.01, SE 0.06, p = 0.88).
  • Mechanism from interviews: drafts act as personalized starting points that lower the barrier to initiating feedback (for example, by supporting verification and helping with wording and tone), rather than replacing human judgment or effort.
  • Behavioral pattern: TAs treated drafts as editable artifacts; AI increased frequency and length of discretionary feedback while preserving student experience and human oversight.
  • Contributions highlighted by authors:
    • AI can increase performance of optional but socially valuable work.
    • The main effect operates via task initiation, not per-unit effort reduction.
    • Human–AI collaboration takes the form of TAs adapting (editing) AI-generated intermediate artifacts.
    • Design tradeoff: preserving human oversight limits how much AI can reduce total human effort.
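The Key Points describe randomization at the question-submission level. A minimal sketch of what that means, assuming independent coin-flip (Bernoulli) assignment per (student, question) submission; the `assign_arms` helper, student IDs, question list, and seed are hypothetical, and the authors' actual procedure (for example blocked or stratified assignment) is not described in this summary. Because assignment is per submission, the same student and the same TA can appear in both arms.

```python
import random
from collections import Counter


def assign_arms(submission_ids, seed=0, p_treat=0.5):
    """Assign each submission independently to 'treatment' or 'control'.

    Hypothetical helper: Bernoulli assignment per (student, question)
    submission, seeded so the assignment can be reproduced and audited.
    """
    rng = random.Random(seed)
    return {
        sid: ("treatment" if rng.random() < p_treat else "control")
        for sid in submission_ids
    }


students = [f"student_{i:02d}" for i in range(88)]  # 88 students, as in the study
questions = ["q1", "q2", "q3"]                      # hypothetical question list
submissions = [(s, q) for s in students for q in questions]
arms = assign_arms(submissions, seed=42)
print(Counter(arms.values()))
```

Seeding a dedicated `random.Random` instance, rather than the global generator, keeps the assignment reproducible from the seed alone.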

Data & Methods

  • Mixed-methods approach:
    • Randomized field experiment: question-level randomization of student submissions to treatment vs control during the semester.
    • Behavioral logs recording whether feedback was sent, its length, and timestamps.
    • TA and student surveys measuring perceived usefulness and TA perceptions of drafts.
    • Semi-structured qualitative interviews with TAs and students to probe mechanisms and experiences.
  • Tools & integration:
    • Lightweight LLM-backed Chrome extension integrated into the grading platform (Gradescope) that generates drafts after grading.
    • Model selection and prompt iteration involved instructors and TAs (formative study with multiple models).
  • Statistical inference:
    • Reported point estimates with standard errors and p-values for primary outcomes (above).
    • The analysis interprets the absence of a change in time per character as evidence that AI lowered fixed initiation costs rather than marginal effort.
  • Limitations noted by authors:
    • Single-course setting (300-level ML) and limited samples of TAs (n=11) and students (n=88) raise external-validity concerns.
    • Short-term effects observed during a single semester; longer-run behavioral adaptation not measured.
    • Use of a specific LLM and integration; results may vary with different models, prompt designs, or platform contexts.
    • Potential risks (hallucinations, pedagogical appropriateness) mitigated by keeping human oversight, but not eliminated.
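The statistical-inference bullets read an unchanged time per character as evidence that drafts lowered a fixed initiation cost rather than marginal effort. A toy hurdle model makes that logic concrete. Everything here is a hypothetical illustration, not the authors' model: a TA writes feedback only if a submission's perceived value covers a perceived initiation cost plus writing time, and only writing time is logged. The function name, parameter values, and value distribution are all made up.

```python
from statistics import mean


def simulate(values, initiation_cost, sec_per_char=0.3, length=200):
    """Toy hurdle model of discretionary feedback (hypothetical, not the paper's).

    A TA writes `length` characters for a submission iff its perceived value
    is at least the perceived initiation cost plus the writing time. Only the
    writing time is logged, so initiation_cost changes *whether* feedback is
    written but not the measured seconds per character.
    Returns (provision_rate, mean logged seconds per character).
    """
    writing_time = sec_per_char * length
    written = [v for v in values if v >= initiation_cost + writing_time]
    rate = len(written) / len(values)
    # Logged time per character for each piece written (identical by construction).
    per_char = [writing_time / length for _ in written]
    return rate, (mean(per_char) if per_char else float("nan"))


# Hypothetical perceived values (seconds-equivalent) for 100 submissions.
values = [float(v) for v in range(0, 200, 2)]
control = simulate(values, initiation_cost=90.0)
treated = simulate(values, initiation_cost=60.0)  # draft lowers initiation cost only
print("control:", control)
print("treated:", treated)
```

With these made-up inputs, lowering only `initiation_cost` raises the provision rate from 0.25 to 0.40 while logged seconds per character stay at 0.3: an extensive-margin shift with no intensive-margin change.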

Implications for AI Economics

  • Labor supply for discretionary tasks:
    • AI can increase the supply of uncompensated or low-incentive “invisible” or discretionary labor (mentoring, feedback, documentation) by lowering fixed initiation costs. This is distinct from pure substitution of labor hours: it can unlock tasks that would otherwise be skipped.
    • Modeling implication: production functions should separate (a) the probability that a discretionary task is attempted from (b) effort conditional on an attempt. In this study, AI shifts the extensive margin (attempt probability) more than the intensive margin (effort per unit).
  • Complementarity vs substitution:
    • Evidence of complementarity: TAs retained control and used AI outputs as inputs, suggesting AI augments human capability rather than replaces it in this setting. Economically, AI acts as a productivity-enhancing capital good for initiating discretionary tasks.
    • But preserving oversight constrains cost savings; where oversight is relaxed, substitution effects could be larger.
  • Incentives, contracts, and platform design:
    • Organizations (universities, firms, open-source projects) could deploy AI assistants to increase performance of socially valuable but underprovided activities. Design choices (editable drafts, human-in-the-loop) matter for worker acceptance and outcome quality.
    • For labor contracting and compensation, AI may change how managers measure contributions (feedback becomes more visible), which could affect recognition, pay, and the allocation of time across tasks.
  • Welfare and scaling:
    • Potential welfare gains from increased provision of high-value but under-supplied activities (improved student learning, better documentation, mentoring).
    • Scaling considerations: widespread adoption could shift the equilibrium allocation of time. If AI lowers the barrier to providing unpaid helpful work, aggregate quality could rise without proportional increases in TA labor hours; long-run general equilibrium effects on wages, hiring, and role definitions remain open questions.
  • Measurement and policy:
    • Economic evaluations should measure both extensive and intensive margins of activity and capture student/recipient welfare, not only time saved.
    • Regulatory/policy considerations include transparency (disclosure of AI assistance), liability for errors/hallucinations, and training/standards for safe use.
  • Research priorities for AI economics:
    • Quantify general equilibrium effects when AI shifts previously invisible labor into observable outputs (e.g., credit allocation, compensation dynamics).
    • Study heterogeneity: tasks with different initiation costs, observability, and accountability constraints may respond differently.
    • Long-term dynamics: persistence of effects, changes in intrinsic motivation, and potential crowding-out or upskilling of human workers.
    • Cost–benefit analysis comparing fully automated vs human-in-the-loop designs across safety, accuracy, and labor-market outcomes.
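The extensive/intensive split in the modeling bullet can be made exact. Writing expected output per submission as the attempt probability p times mean output given an attempt μ, any change decomposes as p1·μ1 − p0·μ0 = (p1 − p0)·μ0 + p1·(μ1 − μ0). The inputs below are hypothetical: the summary does not state whether the +39.79-character estimate is unconditional or conditional on feedback being given, so the reported numbers cannot be plugged in directly.

```python
def decompose_change(p0, p1, mu0, mu1):
    """Split the change in expected output E[Y] = p * mu into two margins.

    p0, p1: probability the task is attempted (control, treatment)
    mu0, mu1: mean output conditional on an attempt (control, treatment)
    Uses the exact identity p1*mu1 - p0*mu0 = (p1 - p0)*mu0 + p1*(mu1 - mu0).
    """
    extensive = (p1 - p0) * mu0   # more tasks attempted, at the old conditional length
    intensive = p1 * (mu1 - mu0)  # longer output among attempted tasks
    total = p1 * mu1 - p0 * mu0
    return extensive, intensive, total


# Hypothetical inputs: a 10.8 pp rise in attempt probability and a
# 10-character rise in conditional length.
ext, inten, tot = decompose_change(p0=0.40, p1=0.508, mu0=150.0, mu1=160.0)
print(f"extensive={ext:.2f} intensive={inten:.2f} total={tot:.2f}")
```

The weighting is a choice: using treatment-group μ1 for the extensive term and control-group p0 for the intensive term is an equally valid identity that splits the same total differently, so reported shares should state which convention was used.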

Overall takeaway for AI economists: this study provides causal evidence that LLM-based assistance can increase the incidence of valuable discretionary work by lowering initiation costs while preserving human control. That mechanism has distinct implications for modeling AI’s impact on labor supply, task composition, and welfare beyond the classic productivity/substitution framing.

Assessment

Paper Type: RCT
Evidence Strength: medium. Internal causal identification is strong because of randomized assignment at the submission level, producing precise, statistically significant effects; however, the small number of TAs (n=11), potential contamination or spillovers (TAs seeing both treated and control submissions), the single-course setting, and short-term outcomes limit confidence in external validity and the breadth of inference.
Methods Rigor: medium. The study uses a randomized field experiment with quantitative outcome measures and complementary qualitative interviews, and it reports effect sizes and standard errors, which is good practice; but the summary gives limited detail about clustering adjustment for TA-level dependence, pre-registration and power calculations, balance checks, handling of multiple submissions per student, and robustness checks, and the small TA sample constrains inference about heterogeneity.
Sample: Field experiment in a single 300-level machine learning course with 88 students and 11 teaching assistants; randomization occurred at the student-submission level; quantitative outcomes include whether feedback was provided, feedback length (characters), student usefulness ratings, and time per character; semi-structured qualitative interviews were conducted with TAs to probe mechanisms.
Themes: human_ai_collab, skills_training
Identification: Randomized controlled trial. Student submissions were randomly assigned (submission-level randomization) either to a treatment arm, where TAs received AI-generated editable feedback drafts after grading, or to a control arm with no drafts; TAs retained full discretion to use, edit, or ignore drafts. Outcomes (feedback provision, length, student usefulness ratings, time per character) were compared across randomized groups.
Generalizability:
  • Single course at a single institution; results may not generalize to other subjects, institutions, or non-academic workplaces.
  • Small number of TAs (n=11) limits inference about instructor heterogeneity and possible clustering effects.
  • TAs are likely tech-savvy (machine learning course), so uptake of AI drafts may be higher than in other populations.
  • Outcomes are short-term (feedback provision and length); effects on student learning, long-run behavior, or productivity outside education are unclear.
  • Potential novelty or Hawthorne effects from introducing AI assistance may attenuate over time.
  • Results depend on the specific AI model and prompt design used; different models or lower-quality drafts could change effects.
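The concern about TA-level dependence can be sized with the Moulton approximation for equal cluster sizes, under which clustered standard errors are inflated by a factor of sqrt(1 + (m − 1)·ρx·ρe), where m is observations per cluster (here, submissions per TA), ρx the within-cluster correlation of the treatment indicator, and ρe the within-cluster correlation of outcome residuals. The cluster size and correlations below are hypothetical, not taken from the paper. The sketch shows why randomizing submissions within TAs (ρx near zero) largely avoids the classic clustering penalty; it does not address the separate issue that treatment effects may vary across only 11 TAs.

```python
import math


def moulton_factor(cluster_size: int, rho_x: float, rho_e: float) -> float:
    """Approximate SE inflation from clustering with equal cluster sizes.

    cluster_size: observations per cluster (hypothetical submissions per TA)
    rho_x: within-cluster correlation of the regressor (treatment indicator)
    rho_e: within-cluster correlation of the outcome residuals
    """
    return math.sqrt(1 + (cluster_size - 1) * rho_x * rho_e)


m, rho_e = 50, 0.2  # hypothetical values, not from the paper
# Treatment assigned to whole TAs: the indicator is constant within a cluster.
print(round(moulton_factor(m, rho_x=1.0, rho_e=rho_e), 2))
# Treatment randomized per submission within each TA: rho_x is roughly zero.
print(round(moulton_factor(m, rho_x=0.0, rho_e=rho_e), 2))
```

Under these made-up values, TA-level assignment would inflate standard errors by about 3.29×, while within-TA randomization leaves them essentially unchanged.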

Claims (7)

  • AI-assisted feedback significantly increases feedback provision by 10.8 percentage points.
    Category: Task Allocation · Direction: positive · Confidence: high
    Outcome: feedback provision (whether feedback was provided)
    Details: n=88; 10.8 percentage points; 1.0
  • AI-assisted feedback increases feedback length by 39.8 characters.
    Category: Output Quality · Direction: positive · Confidence: high
    Outcome: feedback length (number of characters)
    Details: n=88; 39.8 chars; 1.0
  • AI-assisted feedback does not negatively affect student usefulness ratings.
    Category: Output Quality · Direction: null_result · Confidence: high
    Outcome: student usefulness ratings of feedback
    Details: n=88; 0.6
  • AI-assisted feedback does not reduce time per character (effort per unit of feedback is unchanged).
    Category: Task Completion Time · Direction: null_result · Confidence: high
    Outcome: time per character (effort per unit of feedback)
    Details: n=88; 0.6
  • Qualitative findings indicate AI-assisted drafts function as editable scaffolds that lower barriers to initiating feedback rather than reducing overall effort.
    Category: Task Allocation · Direction: positive · Confidence: high
    Outcome: perceived barriers to initiating feedback / perceived TA effort
    Details: n=11; 0.6
  • TAs remained fully in control and could use, edit, or ignore AI-generated drafts at their discretion.
    Category: Other · Direction: positive · Confidence: high
    Outcome: degree of human control over AI-generated artifacts (procedural/design feature)
    Details: n=11; 1.0
  • AI assistance shows promise for increasing discretionary but beneficial work (tasks users intend to do but often skip) while preserving human control over final outcomes.
    Category: Task Allocation · Direction: positive · Confidence: medium
    Outcome: participation in discretionary beneficial tasks (feedback provision) and preservation of human control
    Details: n=88; 0.36

Notes