AI-generated feedback drafts prompt teaching assistants to give substantially more and longer personalized feedback—up 10.8 percentage points and ~40 characters—while preserving usefulness and human control.
AI systems increasingly shape human workflows by generating intermediate artifacts that users can adopt, revise, or ignore. While prior work has shown that AI assistance can improve the efficiency and accuracy of required tasks, less is known about whether it can increase participation in discretionary but beneficial work that users often intend to perform but frequently skip. We study this question in the context of personalized feedback provision in higher education, a pedagogically valuable but often optional practice. We conduct a mixed-methods study combining a randomized field experiment and qualitative interviews in a 300-level machine learning course with n=11 teaching assistants (TAs) and n=88 students. Student submissions were randomly assigned to either (1) a treatment condition where TAs received AI-assisted feedback drafts after grading or (2) a control condition without drafts. TAs remained fully in control and could use, edit, or ignore drafts at their discretion. We find that AI-assisted feedback significantly increases feedback provision (+10.8 percentage points, SE=1.1, p<0.001) and feedback length (+39.8 chars, SE=3.45, p<0.001) without negatively affecting student usefulness ratings or reducing time per character. Qualitative findings suggest that AI-assisted drafts function as editable scaffolds that lower barriers to initiating feedback rather than reducing overall effort. Our findings highlight AI's promise for discretionary but beneficial tasks: increasing work that might otherwise go undone while preserving human control over final outcomes.
Summary
Main Finding
AI-generated feedback drafts shown to teaching assistants (TAs) increased discretionary feedback provision in a real university course: +10.81 percentage points in the probability that a student submission received feedback (SE = 1.10, p < 0.001) and +39.79 characters in feedback length (SE = 3.45, p < 0.001). There was no significant change in time spent per character (estimated difference 0.29 s/char, SE = 0.35, p = 0.41) and no reduction in student-rated usefulness (difference −0.01, SE = 0.06, p = 0.88). Qualitative interviews indicate that the mechanism is lowered initiation costs (editable scaffolds), not substitution of human effort.
Authors / source: Mahinpei et al., “AI Assistance for Discretionary Work: Increasing Feedback Provision in Higher Education,” arXiv:2606.03095v1 (June 2, 2026).
Key Points
- Experimental design: randomized field experiment at the question-submission level in a 300-level machine learning course.
- Sample: 11 TAs and 88 students; student submissions randomized to treatment (AI-assisted draft shown after grading) or control (no draft).
- Intervention: LLM-backed Chrome extension (o4-mini reported for generation) that surfaced a personalized feedback draft after the TA graded; TAs could use, edit, or ignore the draft (full human control).
- Primary quantitative outcomes:
  - Feedback provision (binary): +10.81 pp (SE 1.10, p < 0.001).
  - Feedback length: +39.79 characters (SE 3.45, p < 0.001).
  - Time per character: no significant change (0.29 s/char, SE 0.35, p = 0.41).
  - Student usefulness rating: no detectable difference (−0.01, SE 0.06, p = 0.88).
- Mechanism from interviews: drafts act as personalized starting points that reduce the barrier to initiating feedback (for example, by helping with verification, wording, and tone), rather than replacing human judgment or effort.
- Behavioral pattern: TAs treated drafts as editable artifacts; AI increased frequency and length of discretionary feedback while preserving student experience and human oversight.
- Contributions highlighted by authors:
  - AI can increase performance of optional but socially valuable work.
  - The main effect operates via task initiation, not per-unit effort reduction.
  - Human–AI collaboration observed as adaptation to intermediate artifacts.
  - Design tradeoff: preserving oversight limits how much AI can reduce total human effort.
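As a rough consistency check on the quantitative outcomes listed above, the following minimal sketch recomputes z-statistics, two-sided p-values, and 95% intervals from the reported point estimates and standard errors. It assumes a normal (Wald) approximation, which this summary does not confirm the authors used; the function name `wald_summary` and the dictionary labels are illustrative only.

```python
import math

def wald_summary(estimate, se, z_crit=1.96):
    """Return (z, two-sided p, 95% CI) under a normal approximation (an assumption)."""
    z = estimate / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return z, p, ci

# Point estimates and standard errors as reported in the summary above.
reported = {
    "feedback provision (pp)": (10.81, 1.10),
    "feedback length (chars)": (39.79, 3.45),
    "time per character (s/char)": (0.29, 0.35),
    "student usefulness rating": (-0.01, 0.06),
}

results = {name: wald_summary(est, se) for name, (est, se) in reported.items()}

for name, (z, p, (lo, hi)) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.3f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Under this approximation the two treatment effects have p-values far below 0.001, and the time-per-character p-value comes out near the reported 0.41. The usefulness p-value lands close to the reported 0.88; a small gap is plausible given that the published estimate is rounded and the authors' exact test is not stated here.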
Data & Methods
- Mixed-methods approach:
  - Randomized field experiment: question-level randomization of student submissions to treatment vs control during the semester.
  - Behavioral logs recording whether feedback was sent, content length, timestamps.
  - TA and student surveys measuring perceived usefulness and TA perceptions of drafts.
  - Semi-structured qualitative interviews with TAs and students to probe mechanisms and experiences.
- Tools & integration:
  - Lightweight LLM Chrome extension integrated into the grading platform (Gradescope), generating drafts after grading.
  - Model selection and prompt iteration involved instructors and TAs (formative study with multiple models).
- Statistical inference:
  - Reported point estimates with standard errors and p-values for primary outcomes (above).
  - Analysis interprets no change in time-per-character as evidence that AI lowered initiation fixed costs rather than reducing marginal effort.
- Limitations noted by authors:
  - Single course setting (300-level ML) with a limited sample of TAs (n=11) and students (n=88), which raises external validity concerns.
  - Short-term effects observed during a single semester; longer-run behavioral adaptation not measured.
  - Use of a specific LLM and integration; results may vary with different models, prompt designs, or platform contexts.
  - Potential risks (hallucinations, pedagogical appropriateness) mitigated by keeping human oversight, but not eliminated.
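The question-level randomization and the provision outcome described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: the number of questions per student, the base provision rate, the lift, and the helper names `assign_conditions` and `diff_in_means` are all hypothetical, and the simple unpooled standard error ignores any clustering by TA or student that the actual study may have accounted for.

```python
import math
import random
from statistics import mean, variance

def assign_conditions(submission_ids, seed=0, p_treat=0.5):
    """Independently assign each question-level submission to an arm."""
    rng = random.Random(seed)
    return {sid: ("treatment" if rng.random() < p_treat else "control")
            for sid in submission_ids}

def diff_in_means(treated, control):
    """Difference in means with an unpooled (Neyman) standard error."""
    est = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return est, se

# Hypothetical layout: 88 students x 10 questions (question count is made up).
submission_ids = [(s, q) for s in range(88) for q in range(10)]
arms = assign_conditions(submission_ids, seed=42)

# Simulated binary outcome: was feedback sent? Rates are illustrative only.
outcome_rng = random.Random(7)
base_rate, lift = 0.30, 0.10
sent = {
    sid: int(outcome_rng.random()
             < base_rate + (lift if arms[sid] == "treatment" else 0.0))
    for sid in submission_ids
}

treated = [sent[sid] for sid in submission_ids if arms[sid] == "treatment"]
control = [sent[sid] for sid in submission_ids if arms[sid] == "control"]
est, se = diff_in_means(treated, control)
print(f"estimated effect on provision: {100 * est:.1f} pp (SE {100 * se:.1f} pp)")
```

Randomizing at the submission level means the same TA grades both treated and control submissions, so TA-specific habits are balanced across arms by design rather than by statistical adjustment.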
Implications for AI Economics
- Labor supply for discretionary tasks:
  - AI can increase the supply of uncompensated or low-incentive “invisible” or discretionary labor (mentoring, feedback, documentation) by lowering fixed initiation costs. This is distinct from pure substitution of labor hours: it can unlock tasks that would otherwise be skipped.
  - Modeling implication: production functions should separate (a) the probability that a discretionary task is attempted from (b) effort conditional on attempt. In this study, AI shifts the extensive margin (attempt probability) more than the intensive margin (effort per unit).
- Complementarity vs substitution:
  - Evidence of complementarity: TAs retained control and used AI outputs as inputs, suggesting AI augments human capability rather than replaces it in this setting. Economically, AI acts as a productivity-enhancing capital good for initiating discretionary tasks.
  - But preserving oversight constrains cost savings; where oversight is relaxed, substitution effects could be larger.
- Incentives, contracts, and platform design:
  - Organizations (universities, firms, open-source projects) could deploy AI assistants to increase performance of socially valuable but underprovided activities. Design choices (editable drafts, human-in-the-loop) matter for worker acceptance and outcome quality.
  - For labor contracting and compensation, AI may alter how managers measure contributions (more visible feedback), which could affect recognition, pay, and allocation of time across tasks.
- Welfare and scaling:
  - Potential welfare gains from increased provision of high-value but under-supplied activities (improved student learning, better documentation, mentoring).
  - Scaling considerations: widespread adoption could change the equilibrium allocation of time. If AI reduces the barrier to providing unpaid helpful work, aggregate quality could rise without proportional increases in TA labor hours, but long-run general equilibrium effects on wages, hiring, and role definitions remain open questions.
- Measurement and policy:
  - Economic evaluations should measure both extensive and intensive margins of activity and capture student/recipient welfare, not only time saved.
  - Regulatory/policy considerations include transparency (disclosure of AI assistance), liability for errors/hallucinations, and training/standards for safe use.
- Research priorities for AI economics:
  - Quantify general equilibrium effects when AI shifts previously invisible labor into observable outputs (e.g., credit allocation, compensation dynamics).
  - Study heterogeneity: tasks with different initiation costs, observability, and accountability constraints may respond differently.
  - Long-term dynamics: persistence of effects, changes in intrinsic motivation, and potential crowding-out or upskilling of human workers.
  - Cost–benefit analysis comparing fully automated vs human-in-the-loop designs across safety, accuracy, and labor-market outcomes.
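The extensive/intensive distinction used in the implications above can be written as a simple decomposition. The notation is illustrative and not taken from the paper: $p_g$ is the probability that a submission in group $g \in \{0, 1\}$ (control, treatment) receives feedback, $m_g$ is the mean feedback length conditional on feedback being given, and $Y_g$ is feedback length with zero recorded when no feedback is sent.

```latex
\begin{align}
\mathbb{E}[Y_g] &= p_g \, m_g, \\
\mathbb{E}[Y_1] - \mathbb{E}[Y_0]
  &= \underbrace{(p_1 - p_0)\, m_1}_{\text{extensive margin}}
   + \underbrace{p_0 \,(m_1 - m_0)}_{\text{intensive margin}}.
\end{align}
```

The summary reports $p_1 - p_0 \approx 0.108$ but does not state $p_0$, $m_0$, or $m_1$, nor whether the +39.79-character estimate counts submissions without feedback as zero length, so the split between the two terms cannot be computed from the figures given here.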
Overall takeaway for AI economists: this study provides causal evidence that LLM-based assistance can increase the incidence of valuable discretionary work by lowering initiation costs while preserving human control. That mechanism has distinct implications for modeling AI’s impact on labor supply, task composition, and welfare beyond the classic productivity/substitution framing.
Assessment
Claims (7)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| AI-assisted feedback significantly increases feedback provision by 10.8 percentage points. [Task Allocation] | positive | high | feedback provision (whether feedback was provided) | n=88; 10.8 percentage points; 1.0 |
| AI-assisted feedback increases feedback length by 39.8 characters. [Output Quality] | positive | high | feedback length (number of characters) | n=88; 39.8 chars; 1.0 |
| AI-assisted feedback does not negatively affect student usefulness ratings. [Output Quality] | null_result | high | student usefulness ratings of feedback | n=88; 0.6 |
| AI-assisted feedback does not significantly change time per character (effort per unit of feedback). [Task Completion Time] | null_result | high | time per character (effort per unit of feedback) | n=88; 0.6 |
| Qualitative findings indicate AI-assisted drafts function as editable scaffolds that lower barriers to initiating feedback rather than reducing overall effort. [Task Allocation] | positive | high | perceived barriers to initiating feedback / perceived TA effort | n=11; 0.6 |
| TAs remained fully in control and could use, edit, or ignore AI-generated drafts at their discretion. [Other] | positive | high | degree of human control over AI-generated artifacts (procedural/design feature) | n=11; 1.0 |
| AI assistance shows promise for increasing discretionary but beneficial work (tasks users intend but often skip) while preserving human control over final outcomes. [Task Allocation] | positive | medium | participation in discretionary beneficial tasks (feedback provision) and preservation of human control | n=88; 0.36 |