The Commonplace

An AI tutor that anticipates teamwork breakdowns improves pair-programming outcomes: in a 26-dyad lab study, forecast-driven prompts raised debugging success and cut task time while increasing joint visual attention and joint mental effort.

ProPACT: A Proactive AI-Driven Adaptive Collaborative Tutor for Pair Programming
Anahita Golrang, Kshitij Sharma, Olga Viberg · May 04, 2026
arXiv · quasi-experimental · medium evidence · 7/10 relevance
ProPACT, a proactive AI tutor that forecasts imminent suboptimal collaboration states and delivers minimally intrusive scaffolds, increased debugging success, efficiency, and feedback uptake in pair programming and produced short-term gains in joint visual attention and joint mental effort in a 26-dyad within-subject study.

Effective pair programming depends on coordination of attention, cognitive effort, and joint regulation over time, yet most adaptive learning systems remain individual-centric and reactive. This paper introduces ProPACT, a proactive AI-driven adaptive collaborative tutor that treats collaboration itself as the object of instruction. ProPACT constructs a multimodal dyadic learner model based on Joint Visual Attention (JVA), Joint Mental Effort (JME), and individual mental effort, and employs an XGBoost-based forecasting model to predict emerging suboptimal collaboration states up to 30 seconds in advance. These predictions drive a hierarchical adaptive policy that delivers minimally intrusive scaffolds while fading support during productive collaboration. A within-subject study with 26 pair-programming dyads shows that proactive feedback significantly improves debugging success, task efficiency, feedback uptake, and post-intervention gains in JVA and JME, demonstrating the potential of forecast-driven dyadic adaptivity for real-time collaborative learning regulation.

Summary

Main Finding

ProPACT is a multimodal, forecast-driven adaptive tutor for pair programming that treats collaboration as the object of instruction. It uses dual eye-tracking and pupillometry to model Joint Visual Attention (JVA), Joint Mental Effort (JME), and individual mental effort (ME); an XGBoost predictor then forecasts suboptimal dyadic states up to 30 seconds ahead and triggers a hierarchical, minimally intrusive scaffold policy. In a within-subject lab study with 26 pairs, ProPACT significantly improved debugging success, reduced debugging time, increased feedback uptake, and produced immediate gains in JVA and JME (all reported effects p < .0001).

Key Points

  • Novelty
    • Treats collaboration (dyadic alignment of attention and cognitive effort) as the instructional target rather than only individuals.
    • Uses short-horizon forecasting (up to 30s) to enable proactive, anticipatory interventions instead of reactive alerts.
  • Dyadic modeling
    • Multimodal signals: Joint Visual Attention (JVA) from dual eye-tracking, individual mental effort (ME) via pupillometry (Index of Pupillary Activity), and Joint Mental Effort (JME) computed by synchrony measures (cross-recurrence quantification).
    • Each signal is discretized into High / Average / Low relative to the individual resting baseline (±2 SD thresholds).
  • Forecasting & policy
    • An XGBoost model forecasts categorical JVA, JME, and ME states up to 30 seconds ahead.
    • Predictions mapped to a desired collaboration-state matrix and a top-down hierarchical policy that prioritizes minimal intrusion and escalates only as needed.
  • Feedback modalities (staged from least to most directive)
    • Dual text selection (always-on mutual awareness).
    • Gaze-awareness visualization (triggered when JVA low).
    • Dialogue prompt (triggered when JME low).
    • GitHub Copilot autocomplete activation (triggered on forecasted individual extreme ME patterns).
    • Task-based hint (last-resort when both collaborators show sustained extreme ME).
  • Empirical results (within-subject, N = 26 dyads)
    • No order effects detected.
    • Debugging success: significantly higher with ProPACT vs control (t[49.96] = -13.51, p < .0001).
    • Debugging time: significantly lower with ProPACT (t[44.70] = 4.39, p < .0001).
    • Feedback uptake: significantly higher with ProPACT (t[49.81] = -17.69, p < .0001).
    • Process measures: JVA increased after feedback (t = 12.76, p < .0001); JME increased after feedback (t = 19.33, p < .0001).
  • Limitations called out by authors
    • Lab setting, short tasks, and a CS-student sample limit external validity.
    • Reliance on specialized hardware (dual eye-tracking, pupillometry) limits scalability.
    • Short-term outcomes only — no long-term learning, retention, or transfer data.
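The condition comparisons are described as paired t-tests, yet the reported statistics carry fractional degrees of freedom (t[49.96], t[44.70]). A plain paired test on 26 dyads has 25 df; fractional df arise from approximations such as Welch–Satterthwaite for unequal variances (or Satterthwaite df in mixed models). The standard-library sketch below computes both a paired t and a Welch t so the difference is concrete. The sign convention (control minus ProPACT) reproduces the negative t for debugging success; the bug counts are invented for illustration and are not the study's data.

```python
import math
import statistics


def paired_t(control, treated):
    """Paired t statistic and df (n - 1) for matched observations."""
    if len(control) != len(treated) or len(control) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [c - t for c, t in zip(control, treated)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


def welch_t(control, treated):
    """Welch's unequal-variance t statistic and Welch–Satterthwaite df."""
    va = statistics.variance(control) / len(control)
    vb = statistics.variance(treated) / len(treated)
    t = (statistics.mean(control) - statistics.mean(treated)) / math.sqrt(va + vb)
    # Welch–Satterthwaite approximation: generally a non-integer df.
    df = (va + vb) ** 2 / (va ** 2 / (len(control) - 1) + vb ** 2 / (len(treated) - 1))
    return t, df


# Invented bug counts for four dyads (illustration only).
control = [1, 2, 1, 2]
propact = [3, 3, 4, 3]
print(paired_t(control, propact))  # t ≈ -3.66 with 3 df
print(welch_t(control, propact))   # t ≈ -4.58 with df ≈ 5.88
```

The paired test uses within-dyad differences, which is the natural match for a within-subject design; the fractional df only appear once the comparison is treated as two groups with separate variances.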

Data & Methods

  • Participants & design
    • 26 pairs of undergraduate or master's CS/engineering students; within-subject comparison of a control condition (no feedback) vs ProPACT feedback; the debugging tasks contained logical bugs.
  • Sensing & features
    • Dual eye-tracking → gaze mapped to a persistent code-aligned grid; JVA computed as cosine similarity of gaze distributions aggregated over 30-second windows.
    • Pupillometry → Index of Pupillary Activity (IPA) computed over non-overlapping 10-second windows as proxy for ME.
    • JME computed from individual ME signals via cross-recurrence quantification to measure synchrony.
    • Normalization / discretization: each signal compared to individual resting baseline with ±2SD thresholds producing High / Average / Low categories.
  • Forecasting & trigger logic
    • Predictor: XGBoost forecasting model outputs categorical forecasts for JVA, JME, ME up to 30 seconds ahead.
    • Hierarchical policy: a rule-based decision mapping forecasted states to feedback types (do-nothing if desired state; minimal cues first; escalate to Copilot or hint for sustained extremes).
    • The paper's trigger table maps forecast conditions to scaffolds (e.g., JVA = Low → gaze-awareness visualization; JME = Low → dialogue prompt; both MEs = High → task-based hint).
  • Evaluation metrics & analysis
    • Primary outcome: debugging success (number of bugs fixed).
    • Secondary outcomes: debugging time on task (efficiency) and feedback uptake (code edits made after a feedback event and before the next trigger).
    • Process measures: pre- and post-feedback JVA and JME (averaged 2 minutes before and after feedback).
    • Statistical tests: paired t-tests for condition comparisons and pre/post process measures; tests for homoscedasticity (Breusch–Pagan) and normality (Shapiro–Wilk); order effects tested and non-significant.
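The JVA measure above (cosine similarity of the two partners' gaze distributions over a code-aligned grid within one 30-second window) can be sketched as follows. This is a minimal illustration, assuming fixations have already been mapped to integer grid-cell indices and are weighted by count; the function names are hypothetical, and the authors may weight by dwell time instead.

```python
import math
from collections import Counter


def gaze_histogram(fixations, n_cells):
    """Count fixations per grid cell; each fixation is a cell index in [0, n_cells)."""
    counts = Counter(fixations)
    return [counts[i] for i in range(n_cells)]


def joint_visual_attention(fix_a, fix_b, n_cells):
    """Cosine similarity of two partners' gaze histograms for one window (0 = disjoint, 1 = identical)."""
    ha = gaze_histogram(fix_a, n_cells)
    hb = gaze_histogram(fix_b, n_cells)
    dot = sum(x * y for x, y in zip(ha, hb))
    norm = math.sqrt(sum(x * x for x in ha)) * math.sqrt(sum(y * y for y in hb))
    return dot / norm if norm else 0.0  # no fixations from one partner: treat as no overlap


# Partner A mostly looks at cell 0, partner B mostly at cell 2.
print(joint_visual_attention([0, 0, 1, 2], [0, 2, 2, 3], n_cells=4))  # about 0.667
```

Cosine similarity ignores how many fixations each partner made and compares only where attention was distributed, which suits dyads whose members fixate at different rates.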
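JME is derived from the two partners' ME series using cross-recurrence quantification (CRQA). CRQA yields several measures (recurrence rate, determinism, diagonal profiles), and the summary does not say which one ProPACT uses. The sketch below therefore shows two simple ones under stated assumptions: z-normalized 1-D series with no phase-space embedding, and a hypothetical recurrence radius of 0.5. The overall cross-recurrence rate counts all (i, j) pairs of matching states; the lag-0 diagonal rate counts only time-aligned matches, which is closer to a synchrony reading.

```python
import statistics


def zscore(xs):
    """Standardize a series; a constant series maps to all zeros."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd if sd else 0.0 for x in xs]


def cross_recurrence_rate(x, y, radius=0.5):
    """Fraction of all (i, j) pairs where z-scored x[i] and y[j] lie within `radius`."""
    xs, ys = zscore(x), zscore(y)
    hits = sum(1 for a in xs for b in ys if abs(a - b) <= radius)
    return hits / (len(xs) * len(ys))


def diagonal_recurrence(x, y, lag=0, radius=0.5):
    """Recurrence rate along one diagonal: compares x[i] with y[i + lag] only."""
    xs, ys = zscore(x), zscore(y)
    pairs = [(xs[i], ys[i + lag]) for i in range(len(xs)) if 0 <= i + lag < len(ys)]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if abs(a - b) <= radius) / len(pairs)


me_a = [1, 2, 3, 4]  # hypothetical per-window ME values for partner A
me_b = [1, 2, 3, 4]  # partner B, perfectly in step
print(cross_recurrence_rate(me_a, me_b))  # 0.25: only the time-aligned pairs match
print(diagonal_recurrence(me_a, me_b))    # 1.0: fully synchronized at lag 0
```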
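The High / Average / Low discretization compares each window's value with the participant's resting baseline. A minimal sketch, assuming the thresholds are the baseline mean ± 2 standard deviations and that a value exactly on a threshold counts as Average (the summary does not specify the boundary rule); `make_discretizer` is a hypothetical name.

```python
import statistics


def make_discretizer(baseline, k=2.0):
    """Build a classifier of values relative to a resting baseline (mean ± k SD)."""
    mu = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # needs at least two baseline samples
    low, high = mu - k * sd, mu + k * sd

    def discretize(value):
        if value > high:
            return "High"
        if value < low:
            return "Low"
        return "Average"

    return discretize


# Hypothetical resting-baseline values for one participant's signal.
classify = make_discretizer([1, 2, 3, 4, 5])
print([classify(v) for v in (7, 3, -1)])  # ['High', 'Average', 'Low']
```

Building one discretizer per participant keeps the thresholds individual, which is what allows a pupil-based ME signal to be compared across people with different resting pupil behavior.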
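The summary does not list the forecaster's input features, so the following is only a hypothetical framing of the 30-second-ahead task. Each training example pairs a short history of discretized windows with the category observed a fixed number of windows later; a gradient-boosted classifier such as XGBoost would then be fit on those pairs (not shown, to keep the sketch dependency-free). The 10-second window step, the history length, the signal names (ME_A, ME_B for the two partners), and the ordinal encoding are all assumptions.

```python
CODES = {"Low": 0, "Average": 1, "High": 2}  # assumed ordinal encoding
SIGNALS = ("JVA", "JME", "ME_A", "ME_B")     # hypothetical per-window signal names


def make_forecast_examples(windows, target="JVA", history=3, horizon=3):
    """Pair each run of `history` past windows with the `target` label `horizon` windows later.

    windows: list of dicts mapping every name in SIGNALS to "Low" / "Average" / "High".
    With an assumed 10-second window step, horizon=3 is a 30-second look-ahead.
    """
    features, labels = [], []
    for t in range(history - 1, len(windows) - horizon):
        past = windows[t - history + 1 : t + 1]
        features.append([CODES[w[s]] for w in past for s in SIGNALS])
        labels.append(windows[t + horizon][target])
    return features, labels


levels = ["Low", "Average", "High", "Low", "Average", "High", "Low", "Average"]
windows = [{"JVA": lv, "JME": "Average", "ME_A": "Average", "ME_B": "High"} for lv in levels]
X, y = make_forecast_examples(windows)
print(len(X), len(X[0]), y)  # 3 examples of 12 features; labels ['High', 'Low', 'Average']
```

Only windows that have both a full history and a label `horizon` steps ahead become examples, so the first and last few windows of a session are dropped.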
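The hierarchical policy and trigger table can be read as an ordered rule list applied to each forecast. The sketch below is an assumption-laden illustration, not the authors' policy. It checks the least intrusive scaffolds first (gaze awareness, then dialogue prompt) and treats "extreme" ME as High or Low, following the Key Points wording (the trigger table example mentions High only). It escalates to a task hint only after both partners' ME has been forecast as extreme for a hypothetical `sustain` consecutive forecasts, and otherwise falls back to Copilot activation for an extreme individual ME. Dual text selection is always on, so it is not a trigger; returning None stands in for "do nothing" when the forecast matches the desired state. All names are hypothetical.

```python
from collections import deque

EXTREME = {"High", "Low"}  # assumed meaning of "extreme" ME


class ScaffoldPolicy:
    """Rule-based mapping from a categorical forecast to one scaffold (or None)."""

    def __init__(self, sustain=3):
        self.sustain = sustain
        self.both_extreme = deque(maxlen=sustain)  # recent "both ME extreme" flags

    def decide(self, forecast):
        """forecast: dict with 'JVA', 'JME', 'ME_A', 'ME_B' set to Low/Average/High."""
        a_ext = forecast["ME_A"] in EXTREME
        b_ext = forecast["ME_B"] in EXTREME
        self.both_extreme.append(a_ext and b_ext)

        if forecast["JVA"] == "Low":
            return "gaze_awareness"        # least intrusive joint-attention cue
        if forecast["JME"] == "Low":
            return "dialogue_prompt"
        if len(self.both_extreme) == self.sustain and all(self.both_extreme):
            return "task_hint"             # last resort: sustained extremes for both
        if a_ext or b_ext:
            return "copilot_autocomplete"
        return None                        # desired state: fade support


policy = ScaffoldPolicy(sustain=2)
overloaded = {"JVA": "High", "JME": "High", "ME_A": "High", "ME_B": "High"}
print(policy.decide(overloaded))  # copilot_autocomplete (extremes not yet sustained)
print(policy.decide(overloaded))  # task_hint
```

Ordering the checks from cheapest to most directive is one way to encode "minimal intrusion first"; if the paper resolves simultaneous triggers differently, only the order of the `if` statements would need to change.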
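The secondary and process measures reduce to time-window bookkeeping. A minimal sketch, assuming timestamps in seconds, that uptake is the number of code edits strictly after a feedback event and before the next one (the summary does not say whether uptake is a count or a yes/no per event), and that pre/post means use the 2-minute windows on either side of the event. `uptake_counts` and `pre_post_means` are hypothetical names.

```python
import statistics
from bisect import bisect_left, bisect_right


def uptake_counts(feedback_times, edit_times):
    """Edits strictly after each feedback event and before the next event."""
    edits = sorted(edit_times)
    fb = sorted(feedback_times)
    counts = []
    for i, t in enumerate(fb):
        end = fb[i + 1] if i + 1 < len(fb) else float("inf")
        counts.append(bisect_left(edits, end) - bisect_right(edits, t))
    return counts


def pre_post_means(samples, event_time, window=120.0):
    """Mean signal over [t - window, t) and (t, t + window]; None if a side is empty.

    samples: iterable of (timestamp_s, value), e.g. per-window JVA or JME scores.
    """
    pre = [v for ts, v in samples if event_time - window <= ts < event_time]
    post = [v for ts, v in samples if event_time < ts <= event_time + window]
    return (statistics.mean(pre) if pre else None,
            statistics.mean(post) if post else None)


print(uptake_counts([10, 50], [5, 12, 20, 50, 60, 70]))  # [2, 2]
print(pre_post_means([(0, 0.2), (60, 0.4), (130, 0.8), (200, 0.6)], 120))  # about (0.3, 0.7)
```

The per-event pre/post pairs produced this way are what a paired pre/post comparison of JVA and JME would consume.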

Implications for AI Economics

  • Productivity and labor value
    • Short-run productivity gains: ProPACT reduced debugging time and increased bug-fixing success in the lab. If transferable, such systems could raise developer productivity per pair-hour, changing effective labor supply and output in software teams.
    • Complementarity with human skill: By scaffolding coordination and reducing time spent on coordination breakdowns, these systems act as productivity complements rather than substitutes for core coding skills—potentially increasing demand for higher-level design and integration work.
  • Markets for collaborative AI tools
    • Integration with platforms (e.g., Copilot) suggests a business opportunity for embedding dyadic forecasting/adaptation into IDEs and team tooling marketplaces.
    • A hybrid model (lightweight awareness + selective AI code-assist) could be a differentiated commercial product against plain autocomplete tools.
  • Capital, costs, and scaling barriers
    • Current reliance on dual eye-tracking and pupillometry creates upfront hardware and deployment costs that constrain adoption. Cost–benefit is favorable only if productivity gains persist and scale.
    • The authors propose moving toward scalable signals (webcam gaze estimation, keystroke/interaction logs). Economically, reducing sensor costs or using existing infrastructure (webcams) is key to wide-market viability.
  • Human capital and training dynamics
    • Short-term scaffolds that increase performance may reduce immediate training time but could also risk attenuating the learning of certain coordination skills if scaffolds are overused (moral hazard in skill acquisition). Longitudinal effects on developer human capital and wage trajectories need study.
  • Distributional & access concerns
    • Early adopters (well-resourced firms, big tech) may capture disproportionate productivity gains, widening productivity and wage gaps with smaller firms or developers lacking access to advanced tooling.
    • Privacy, surveillance, and trust issues (eye-tracking, pupillometry) may hinder adoption or require additional compensation and regulation, affecting deployment costs and firm decisions.
  • Policy and procurement
    • Firms or educational institutions investing in proactive collaborative AI should weigh costs of sensors, user acceptability, and training benefits. Public procurement or subsidized pilots could accelerate adoption in education, but standards for transparency and user consent will matter economically.
  • Competitive dynamics & complementarities
    • Tools that combine dyadic forecasting with AI code assistance could outcompete standalone autocomplete tools by improving team-level outcomes.
    • Complementary services—analytics, privacy-preserving sensing, instructor dashboards—could form new markets.
  • Research & evidence needs for economic decisions
    • Key missing pieces for economic modeling: external validity in real-world teams, longitudinal impact on learning and productivity, replacement vs augmentation effects on tasks, and user acceptance costs. These are necessary to estimate ROI and labor-market impacts reliably.

Overall assessment for AI-economics stakeholders: ProPACT demonstrates promising lab-scale evidence that forecast-driven dyadic adaptivity can raise team performance and coordination. The main economic obstacles are sensing costs, generalizability to real-world settings, and potential effects on skill accumulation and distribution of gains. If scalable sensing alternatives or cloud/IDE integrations succeed, these systems could become valuable productivity-enhancing complements in software development toolchains, with nontrivial implications for firm-level productivity, labor demand composition, and markets for collaborative AI products.

Assessment

Paper Type: quasi_experimental
Evidence Strength: medium. The study reports objective improvements (debugging success, efficiency, uptake, and joint-attention/effort gains) from a controlled within-subject intervention using real-time forecasting, which supports causal interpretation; however, the sample is small (26 dyads), the study was conducted in a laboratory/learning setting, and the design description lacks detail on randomization/counterbalancing and on long-term or field validation, limiting confidence and external validity.
Methods Rigor: medium. Strengths include multimodal dyadic measurement (JVA, JME, individual effort), an XGBoost forecasting model predicting 30 seconds ahead, and a hierarchical adaptive policy evaluated experimentally; weaknesses include small N, potential overfitting of the forecasting model, limited reporting on model validation, an undescribed counterbalancing procedure (order effects were tested and reported as non-significant), and unclear pre-registration or robustness checks.
Sample: Controlled lab study of 26 pair-programming dyads (presumably 52 participants) performing debugging tasks; multimodal data collected (joint visual attention via gaze metrics, joint and individual mental-effort proxies), with interventions delivered in real time based on XGBoost forecasts up to 30 seconds ahead.
Themes: human_ai_collab, skills_training, productivity
Identification: within-subject experimental comparison in which each dyad experienced both the proactive-feedback condition driven by the ProPACT forecasting model and a no-feedback control condition (within-dyad contrasts control for stable dyad-level confounders). Causal claims rest on the temporal alignment of interventions with subsequent changes in objective task outcomes and multimodal measures, though details on counterbalancing/randomization of condition order and on blinding are not provided.
Generalizability:
  • Small, convenience lab sample (26 dyads): limited statistical power and external validity.
  • Educational/pedagogical debugging tasks may not reflect professional software-development settings.
  • Short-term, within-session effects: unclear persistence or transfer to novel tasks.
  • Dyadic interactions only: unclear applicability to larger teams or different collaboration structures.
  • Relies on multimodal sensors/metrics (gaze, effort proxies) that may not be available or reliable in field settings.
  • Potential cultural/sample homogeneity (not reported), limiting demographic generalizability.

Claims (8)

  • ProPACT is a proactive AI-driven adaptive collaborative tutor that treats collaboration itself as the object of instruction.
    • Direction: positive · Confidence: high · Outcome: collaboration quality (Team Performance) · Details: 0.24
  • ProPACT constructs a multimodal dyadic learner model based on Joint Visual Attention (JVA), Joint Mental Effort (JME), and individual mental effort.
    • Direction: positive · Confidence: high · Outcome: JVA and JME measurements (Team Performance) · Details: 0.48
  • ProPACT employs an XGBoost-based forecasting model to predict emerging suboptimal collaboration states up to 30 seconds in advance.
    • Direction: positive · Confidence: high · Outcome: prediction of emerging suboptimal collaboration states, horizon up to 30 seconds (Team Performance) · Details: n=26; 0.48
  • ProPACT uses a hierarchical adaptive policy that delivers minimally intrusive scaffolds while fading support during productive collaboration.
    • Direction: positive · Confidence: high · Outcome: scaffolding intrusiveness and adaptive fading of support (Training Effectiveness) · Details: 0.24
  • In a within-subject study with 26 pair-programming dyads, proactive feedback significantly improves debugging success.
    • Direction: positive · Confidence: high · Outcome: debugging success (Output Quality) · Details: n=26; 0.48
  • Proactive feedback significantly improves task efficiency.
    • Direction: positive · Confidence: high · Outcome: task efficiency, e.g., time to complete debugging tasks (Task Completion Time) · Details: n=26; 0.48
  • Proactive feedback significantly improves feedback uptake.
    • Direction: positive · Confidence: high · Outcome: feedback uptake, i.e., adoption of scaffolded suggestions (Training Effectiveness) · Details: n=26; 0.48
  • Proactive feedback produces post-intervention gains in Joint Visual Attention (JVA) and Joint Mental Effort (JME).
    • Direction: positive · Confidence: high · Outcome: JVA and JME (Team Performance) · Details: n=26; 0.48

Notes