Generative AI boosts average knowledge‑worker performance, but the gains are uneven: those who can elicit, filter and verify model outputs reap outsized benefits, while others gain little or even fall behind. In the study, a simple workflow scaffold reduced this new form of performance inequality, and the authors recommend pairing it with brief training in AI Interaction Competence (AIC).

Generative AI and the Productivity Divide: Human-AI Complementarities in Education

Lihi Idan, Bharat Anand · May 18, 2026

arXiv · RCT · Medium evidence · Relevance 8/10

In an RCT, access to an LLM raised average task performance among early-career knowledge-worker analogs, but gains were concentrated among participants with high AI Interaction Competence (AIC), while a simple scaffolding intervention reduced outcome variance.

Generative Artificial Intelligence (GenAI) is transforming how firms create, process, and apply knowledge, yet little is known about the heterogeneity of its productivity effects across users. We report results from a randomized controlled experiment in which participants (analogs of early-career knowledge workers) were assigned to self-study a technical domain using either traditional resources or large-language-model (LLM) assistance. On average, GenAI access significantly increased task performance, but the distribution of gains was highly uneven. Improvements were not predicted by GPA or prior knowledge, but by AI Interaction Competence (AIC): the ability to elicit, filter, and verify model outputs. High-AIC participants realized outsized gains; low-AIC participants saw limited or even negative marginal returns. A scaffolding intervention (conceptual maps) reduced outcome variance, indicating that standardized workflows can mitigate inequality in AI-mediated performance. We interpret these findings through the lens of human-AI complementarities: GenAI raises mean productivity while introducing a new axis of capability inequality. Managerially, firms should pair GenAI access with short AIC micro-training and simple standard operating procedures to capture value consistently and avoid uneven adoption outcomes.

Summary

Main Finding

Access to generative AI (LLMs) raises mean task performance for early‑career knowledge‑work tasks, but benefits are highly heterogeneous: gains are driven by individuals’ AI Interaction Competence (AIC) — the skill to prompt, filter, and verify model outputs. High‑AIC users realize large improvements; low‑AIC users see little or even negative marginal returns. A simple process scaffold (conceptual maps) reduces between‑user variance, suggesting lightweight workflows and micro‑training can make GenAI adoption more uniformly productive.

Key Points

AI Interaction Competence (AIC) introduced as a new form of human capital: ability to formulate goal‑oriented prompts, verify outputs, and iterate effectively with LLMs. AIC, not GPA or baseline domain knowledge, predicts who captures GenAI gains.
Randomized controlled experiment: LLM access increased average post‑test performance, but distribution of gains was uneven. High‑AIC participants had outsized gains; low‑AIC participants experienced limited or negative returns.
Managerial/process levers matter: a simple scaffolding intervention (conceptual roadmaps and recommended sequencing) reduced outcome variance without lowering mean performance. Other managerial variants (more time, peer collaboration) were tested as organizational levers.
Raises an equity dimension to GenAI adoption: the technology shifts the productivity frontier upward while introducing a new axis of capability inequality across workers.
Practical recommendation: pair GenAI deployment with short AIC micro‑trainings and simple standard operating procedures to increase and stabilize value capture across employees.

Data & Methods

Design: Randomized controlled trial simulating an early‑career knowledge‑work learning task (self‑study + application).
Sample: N = 179 participants recruited at Texas A&M (primarily engineering students; mix of undergrad, masters, PhD).
Pre‑intervention profiling: demographics, GPA, self‑assessed ML/LLM knowledge and AIC, 15‑item baseline multiple‑choice exam (general ML + LLM‑specific items).
Randomization:
- Primary: Baseline resources (no LLM) vs LLM condition (restricted to free ChatGPT).
- Secondary (within LLM novices): four subarms — baseline LLM, increased time (4 hrs/day), scaffolding (conceptual roadmap + sequencing), peer collaboration.
Intervention: self‑directed study on LLMs for three consecutive days (minimum 3 hrs/day; 4 hrs/day for the time arm). Participants were allowed a single one‑page cheat sheet for the post‑test.
Outcomes:
- Primary: post‑intervention exam (28 multiple‑choice items focused on LLM knowledge, plus three open numerical tie‑breakers), scores normalized to [0,1].
- Secondary: engagement/attrition and revealed resource preferences.
Analysis: treatment effects estimated controlling for baseline performance and covariates; heterogeneity explored by prior knowledge, AIC, and other moderators.
Key empirical findings (reported qualitatively in paper):
- LLM access → higher mean performance.
- Gains not predicted by traditional markers (e.g., GPA, baseline knowledge) but by measured AIC.
- High‑AIC participants: large positive treatment effects. Low‑AIC participants: small or negative effects.
- Scaffolding intervention reduced variance in outcomes among LLM users, improving consistency of gains.
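
The heterogeneity pattern described above can be illustrated with a small simulation. This is a minimal sketch, not the paper's estimator: it uses a median split on AIC and a plain difference in means rather than the covariate-adjusted analysis the study reports, and the data-generating process, the effect sizes, and the names `simulate_participant` and `subgroup_effects` are all hypothetical.

```python
import random
import statistics


def simulate_participant(rng, use_llm):
    """Return one synthetic record. The data-generating process is invented."""
    aic = rng.random()                   # hypothetical AIC score in [0, 1]
    base = 0.5 + rng.gauss(0, 0.05)      # score in the absence of any LLM effect
    # Assumed effect: negative for low AIC, positive for high AIC.
    effect = (0.4 * aic - 0.15) if use_llm else 0.0
    score = min(1.0, max(0.0, base + effect))  # keep scores on the [0, 1] scale
    return {"llm": use_llm, "aic": aic, "score": score}


def mean_score(rows):
    return statistics.mean(r["score"] for r in rows)


def subgroup_effects(rows, cutoff=0.5):
    """Difference in mean score (LLM minus control) within each AIC group."""
    groups = {
        "low_aic": [r for r in rows if r["aic"] < cutoff],
        "high_aic": [r for r in rows if r["aic"] >= cutoff],
    }
    effects = {}
    for label, group in groups.items():
        treated = [r for r in group if r["llm"]]
        control = [r for r in group if not r["llm"]]
        effects[label] = mean_score(treated) - mean_score(control)
    return effects


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = [simulate_participant(rng, use_llm=rng.random() < 0.5) for _ in range(2000)]

effects = subgroup_effects(rows)
overall = (mean_score([r for r in rows if r["llm"]])
           - mean_score([r for r in rows if not r["llm"]]))
print("subgroup effects:", effects)
print("overall effect:", overall)
```

Under these assumed parameters the overall effect is positive even though the low-AIC subgroup effect is negative, which is the qualitative shape of the reported finding. Because AIC is measured rather than randomized, a split like this describes an association with the moderator, not a causal effect of AIC itself.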

Implications for AI Economics

Human–AI complementarities matter for productivity measurement. Aggregate estimates of AI’s effect on productivity should account for heterogeneity in AIC; simple averages overstate value for populations with low AIC and understate distributional effects.
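
A short worked example makes the averaging point concrete. It is a sketch under stated assumptions: the subgroup effects and population shares below are hypothetical numbers chosen for illustration, not estimates from the paper, and `average_effect` is an illustrative helper.

```python
def average_effect(shares, effects):
    """Population-average effect as the share-weighted mean of subgroup effects."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[group] * effects[group] for group in shares)


# Hypothetical subgroup effects on a [0, 1] score scale (not taken from the paper).
effects = {"low_aic": -0.02, "high_aic": 0.20}

# Hypothetical AIC composition of a study sample versus a target workforce.
study_mix = {"low_aic": 0.4, "high_aic": 0.6}
target_mix = {"low_aic": 0.8, "high_aic": 0.2}

study_avg = average_effect(study_mix, effects)
target_avg = average_effect(target_mix, effects)
print("study average effect:", study_avg)
print("target average effect:", target_avg)
```

With the same subgroup effects, the average falls sharply when the population contains more low-AIC users, so carrying a study's simple average over to a lower-AIC population would overstate the expected gain.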
Skill‑biased technical change revisited: GenAI does not simply substitute routine tasks — it amplifies returns to a new, interactional skill set (AIC). This can widen within‑firm and across‑worker productivity dispersion unless firms actively build complementary capabilities.
Organizational adoption strategy: firms can increase ROI and reduce uneven adoption outcomes by investing in low‑cost interventions (AIC micro‑training, conceptual scaffolds, standardized workflows) rather than only providing tool access.
Labor market consequences: differential AIC endowments could affect wage dispersion, task allocation, and promotion paths. Measuring and credentialing AIC may become important for hiring and training policies.
Policy and measurement recommendations:
- When modelling AI’s macroeconomic impact, include parameters for the distribution of interaction skills and the cost/effectiveness of upskilling interventions.
- Encourage development and evaluation of scalable AIC training modules; assess impacts on both mean productivity and variance.
Research directions:
- External validity: replicate in professional populations and diverse occupations to quantify real‑world magnitudes and persistence of effects.
- Longitudinal dynamics: how quickly does AIC develop with experience, and do short micro‑trainings have durable effects?
- Team and organizational complementarities: how do team structures, monitoring, and incentives interact with AIC heterogeneity to shape firm‑level productivity?
- Measurement: develop validated instruments to observe/score AIC (beyond self‑reports) for use in economics and management studies.

Short takeaway: Generative AI raises average productivity but creates a new skill frontier (AIC). To realize consistent gains and limit widening productivity disparities, firms and policymakers should treat AIC as a target for inexpensive, scalable training and process design rather than assume access alone suffices.

Assessment

Paper Type: rct
Evidence Strength: medium. Internal validity for the average causal effect of GenAI is strong due to randomization and an experimental protocol; the scaffolding intervention appears causal if randomized. However, the central heterogeneity claim relies on an observational moderator (AIC) rather than a randomized assignment, and external validity is limited by the lab/short-term analog setting, single domain/LLM, and sample composition.
Methods Rigor: medium. The study uses a randomized controlled design and clear outcome measures, which is a rigorous approach for causal inference on mean effects; but potential concerns include unknown sample size and representativeness, the validity/reliability of the novel AIC measure, short-term tasks rather than workplace productivity, and limited discussion (in the abstract) of balance checks, pre-registration, or robustness analyses.
Sample: Experimental sample of participants recruited as analogs of early-career knowledge workers (likely students or online workers) who were randomized to self-study a technical domain using either traditional resources or LLM assistance; outcomes are task performance measures in that technical domain; AIC measured from interaction behaviors and/or pretests; a subset received a randomized scaffolding (concept maps) intervention.
Themes: productivity, skills_training, human_ai_collab, inequality, org_design
Identification: Randomized assignment of participants (analogs of early-career knowledge workers) to either LLM-assisted self-study or traditional resources; causal effect of GenAI access estimated by comparing task performance across randomized groups; a separate scaffolding (concept-map) intervention was randomized to test mitigation of variance; heterogeneity by AI Interaction Competence (AIC) is analyzed observationally (AIC measured, not randomized).
Generalizability:
- Lab/short-term experimental tasks may not reflect long-run workplace productivity
- Participants are analogs (e.g., students or crowdworkers), not incumbent professionals
- Single technical domain and specific LLM/version limit applicability across tasks and models
- Measured AIC may be context- and task-specific; external validity to team and firm settings unclear
- Outcomes capture individual task performance, not downstream organization-level outcomes (productivity, profits, turnover)

Claims (7)

Claim	Category	Direction	Confidence	Outcome	Details
We conducted a randomized controlled experiment in which participants (analogs of early-career knowledge workers) were assigned to self-study a technical domain using either traditional resources or large-language-model (LLM) assistance.	Other	null_result	high	experimental assignment / study design (treatment vs control)	1.0
On average, GenAI access significantly increased task performance.	Developer Productivity	positive	high	task performance (overall)	0.6
The distribution of gains from GenAI access was highly uneven across users.	Inequality	mixed	high	distribution (variance) of performance gains	0.6
Improvements were not predicted by GPA or prior knowledge, but were predicted by AI Interaction Competence (AIC): the ability to elicit, filter, and verify model outputs.	Developer Productivity	positive	high	task performance improvements (predicted by AIC vs GPA/prior knowledge)	0.6
High-AIC participants realized outsized gains from GenAI access; low-AIC participants saw limited or even negative marginal returns.	Developer Productivity	mixed	high	treatment effect on task performance by AIC subgroup	0.6
A scaffolding intervention (conceptual maps) reduced outcome variance, indicating that standardized workflows can mitigate inequality in AI-mediated performance.	Inequality	positive	high	variance (dispersion) of task performance outcomes	0.6
Managerially, firms should pair GenAI access with short AIC micro-training and simple standard operating procedures (SOPs) to capture value consistently and avoid uneven adoption outcomes.	Training Effectiveness	positive	high	consistency of value capture / adoption outcomes (proposed effect of training and SOPs)	0.1