Generative AI boosts average knowledge-worker performance, but the gains are uneven: workers who can elicit, filter, and verify model outputs reap outsized benefits, while others gain little or even fall behind. Brief training in AI Interaction Competence (AIC) and simple workflow scaffolds can reduce this new form of performance inequality.
Generative Artificial Intelligence (GenAI) is transforming how firms create, process, and apply knowledge, yet little is known about the heterogeneity of its productivity effects across users. We report results from a randomized controlled experiment in which participants, analogs of early-career knowledge workers, were assigned to self-study a technical domain using either traditional resources or large language model (LLM) assistance. On average, GenAI access significantly increased task performance, but the distribution of gains was highly uneven. Improvements were predicted not by GPA or prior knowledge but by AI Interaction Competence (AIC): the ability to elicit, filter, and verify model outputs. High-AIC participants realized outsized gains; low-AIC participants saw limited or even negative marginal returns. A scaffolding intervention (conceptual maps) reduced outcome variance, indicating that standardized workflows can mitigate inequality in AI-mediated performance. We interpret these findings through the lens of human-AI complementarities: GenAI raises mean productivity while introducing a new axis of capability inequality. Managerially, firms should pair GenAI access with short AIC micro-training and simple standard operating procedures to capture value consistently and avoid uneven adoption outcomes.
Summary
Main Finding
Access to generative AI (LLMs) raises mean task performance for early‑career knowledge‑work tasks, but benefits are highly heterogeneous: gains are driven by individuals’ AI Interaction Competence (AIC) — the skill to prompt, filter, and verify model outputs. High‑AIC users realize large improvements; low‑AIC users see little or even negative marginal returns. A simple process scaffold (conceptual maps) reduces between‑user variance, suggesting lightweight workflows and micro‑training can make GenAI adoption more uniformly productive.
Key Points
- AI Interaction Competence (AIC) introduced as a new form of human capital: ability to formulate goal‑oriented prompts, verify outputs, and iterate effectively with LLMs. AIC, not GPA or baseline domain knowledge, predicts who captures GenAI gains.
- Randomized controlled experiment: LLM access increased average post‑test performance, but distribution of gains was uneven. High‑AIC participants had outsized gains; low‑AIC participants experienced limited or negative returns.
- Managerial/process levers matter: a simple scaffolding intervention (conceptual roadmaps and recommended sequencing) reduced outcome variance without lowering mean performance. Other managerial variants (more time, peer collaboration) were tested as organizational levers.
- Adds an equity dimension to GenAI adoption: the technology shifts the productivity frontier upward while introducing a new axis of capability inequality across workers.
- Practical recommendation: pair GenAI deployment with short AIC micro‑trainings and simple standard operating procedures to increase and stabilize value capture across employees.
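The variance-reduction point above can be made concrete with a minimal dispersion comparison. This is a sketch under stated assumptions, not the paper's analysis: the function name `compare_dispersion` and both score lists are hypothetical, and the numbers are invented to show two arms with the same mean but different spread.

```python
from statistics import mean, variance


def compare_dispersion(unscaffolded, scaffolded):
    """Return each arm's mean and sample variance, plus the variance ratio."""
    var_u = variance(unscaffolded)  # sample variance (n - 1 denominator)
    var_s = variance(scaffolded)
    return {
        "mean_unscaffolded": mean(unscaffolded),
        "mean_scaffolded": mean(scaffolded),
        "var_unscaffolded": var_u,
        "var_scaffolded": var_s,
        # A ratio below 1 means the scaffolded arm is less dispersed.
        "variance_ratio": var_s / var_u,
    }


# Hypothetical normalized scores in [0, 1]; NOT data from the study.
unscaffolded = [0.2, 0.4, 0.6, 0.8]
scaffolded = [0.4, 0.5, 0.5, 0.6]

result = compare_dispersion(unscaffolded, scaffolded)
print(result)
```

With real data, a formal test for equality of variances would be a more careful choice than a raw ratio; the ratio is used here only to keep the illustration short.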
Data & Methods
- Design: Randomized controlled trial simulating an early‑career knowledge‑work learning task (self‑study + application).
- Sample: N = 179 participants recruited at Texas A&M (primarily engineering students; mix of undergrad, masters, PhD).
- Pre‑intervention profiling: demographics, GPA, self‑assessed ML/LLM knowledge and AIC, 15‑item baseline multiple‑choice exam (general ML + LLM‑specific items).
- Randomization:
  - Primary: Baseline resources (no LLM) vs LLM condition (restricted to free ChatGPT).
  - Secondary (within LLM novices): four subarms: baseline LLM, increased time (4 hrs/day), scaffolding (conceptual roadmap + sequencing), peer collaboration.
- Intervention: self-directed study of LLMs for three consecutive days (minimum 3 hrs/day; 4 hrs/day in the increased-time arm). Participants were allowed a single one-page cheat sheet for the post-test.
- Outcomes:
  - Primary: post-intervention exam (28 multiple-choice items focused on LLM knowledge, plus three open numerical tie-breakers), scores normalized to [0,1].
  - Secondary: engagement/attrition and revealed resource preferences.
- Analysis: treatment effects estimated controlling for baseline performance and covariates; heterogeneity explored by prior knowledge, AIC, and other moderators.
- Key empirical findings (reported qualitatively in paper):
  - LLM access → higher mean performance.
  - Gains not predicted by traditional markers (e.g., GPA, baseline knowledge) but by measured AIC.
  - High-AIC participants: large positive treatment effects. Low-AIC participants: small or negative effects.
  - Scaffolding intervention reduced variance in outcomes among LLM users, improving consistency of gains.
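The heterogeneity analysis described above can be sketched as a subgroup difference in means. This is a deliberately simplified illustration, not the paper's estimator: it does not control for baseline performance or covariates, it splits AIC at the sample median (an assumption), and the record fields `aic`, `llm`, and `post` as well as all values are hypothetical.

```python
from statistics import mean, median


def mean_difference(rows):
    """Mean post-test score of LLM users minus that of the control group."""
    treated = [r["post"] for r in rows if r["llm"]]
    control = [r["post"] for r in rows if not r["llm"]]
    return mean(treated) - mean(control)


def subgroup_effects(records):
    """Pooled effect plus effects within low- and high-AIC groups."""
    cutoff = median(r["aic"] for r in records)  # assumed median split
    low = [r for r in records if r["aic"] <= cutoff]
    high = [r for r in records if r["aic"] > cutoff]
    return {
        "pooled": mean_difference(records),
        "low_aic": mean_difference(low),
        "high_aic": mean_difference(high),
    }


# Hypothetical participants; NOT data from the study.
records = [
    {"aic": 0.2, "llm": True, "post": 0.50},
    {"aic": 0.3, "llm": True, "post": 0.48},
    {"aic": 0.2, "llm": False, "post": 0.50},
    {"aic": 0.3, "llm": False, "post": 0.52},
    {"aic": 0.7, "llm": True, "post": 0.80},
    {"aic": 0.8, "llm": True, "post": 0.84},
    {"aic": 0.7, "llm": False, "post": 0.55},
    {"aic": 0.8, "llm": False, "post": 0.57},
]

effects = subgroup_effects(records)
print(effects)
```

In this toy data the pooled effect is positive even though the low-AIC subgroup effect is slightly negative, which is the pattern the findings describe qualitatively. A regression with a treatment-by-AIC interaction and baseline controls would be closer to the analysis the paper reports.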
Implications for AI Economics
- Human-AI complementarities matter for productivity measurement. Aggregate estimates of AI's effect on productivity should account for heterogeneity in AIC: a simple average overstates the value for low-AIC populations and hides how unevenly the gains are distributed.
- Skill-biased technical change revisited: GenAI does not simply substitute for routine tasks; it amplifies returns to a new, interactional skill set (AIC). This can widen productivity dispersion within firms and across workers unless firms actively build complementary capabilities.
- Organizational adoption strategy: firms can increase ROI and reduce uneven adoption outcomes by investing in low‑cost interventions (AIC micro‑training, conceptual scaffolds, standardized workflows) rather than only providing tool access.
- Labor market consequences: differential AIC endowments could affect wage dispersion, task allocation, and promotion paths. Measuring and credentialing AIC may become important for hiring and training policies.
- Policy and measurement recommendations:
  - When modelling AI's macroeconomic impact, include parameters for the distribution of interaction skills and the cost/effectiveness of upskilling interventions.
  - Encourage development and evaluation of scalable AIC training modules; assess impacts on both mean productivity and variance.
- Research directions:
  - External validity: replicate in professional populations and diverse occupations to quantify real-world magnitudes and persistence of effects.
  - Longitudinal dynamics: how quickly does AIC develop with experience, and do short micro-trainings have durable effects?
  - Team and organizational complementarities: how do team structures, monitoring, and incentives interact with AIC heterogeneity to shape firm-level productivity?
  - Measurement: develop validated instruments to observe/score AIC (beyond self-reports) for use in economics and management studies.
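The modelling recommendation above, including a parameter for the distribution of interaction skills, can be illustrated with a share-weighted average. This is a minimal sketch under strong assumptions: there are only two skill groups, upskilled workers are assumed to obtain the full high-AIC effect, and the function names, shares, and effect sizes are all hypothetical rather than estimates from the paper.

```python
def population_effect(shares, effects):
    """Aggregate effect as the share-weighted mean of subgroup effects."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[group] * effects[group] for group in shares)


def upskill(shares, fraction_moved):
    """Move a fraction of the population from low to high AIC."""
    moved = min(fraction_moved, shares["low_aic"])
    return {
        "low_aic": shares["low_aic"] - moved,
        "high_aic": shares["high_aic"] + moved,
    }


# Hypothetical subgroup effects and skill distribution; NOT study estimates.
effects = {"low_aic": -0.02, "high_aic": 0.26}
shares = {"low_aic": 0.7, "high_aic": 0.3}

before = population_effect(shares, effects)               # about 0.064
after = population_effect(upskill(shares, 0.2), effects)  # about 0.12

print(f"before training: {before:.3f}, after training: {after:.3f}")
```

The same subgroup effects imply different aggregate effects depending on the skill mix, which is why a single average estimated in one population may not transfer to another.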
Short takeaway: Generative AI raises average productivity but creates a new skill frontier (AIC). To realize consistent gains and limit widening productivity disparities, firms and policymakers should treat AIC as a target for inexpensive, scalable training and process design rather than assume access alone suffices.
Assessment
Claims (7)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We conducted a randomized controlled experiment in which participants, analogs of early-career knowledge workers, were assigned to self-study a technical domain using either traditional resources or large-language-model (LLM) assistance. | Other | null_result | high | experimental assignment / study design (treatment vs control) | 1.0 |
| On average, GenAI access significantly increased task performance. | Developer Productivity | positive | high | task performance (overall) | 0.6 |
| The distribution of gains from GenAI access was highly uneven across users. | Inequality | mixed | high | distribution (variance) of performance gains | 0.6 |
| Improvements were not predicted by GPA or prior knowledge, but were predicted by AI Interaction Competence (AIC): the ability to elicit, filter, and verify model outputs. | Developer Productivity | positive | high | task performance improvements (predicted by AIC vs GPA/prior knowledge) | 0.6 |
| High-AIC participants realized outsized gains from GenAI access; low-AIC participants saw limited or even negative marginal returns. | Developer Productivity | mixed | high | treatment effect on task performance by AIC subgroup | 0.6 |
| A scaffolding intervention (conceptual maps) reduced outcome variance, indicating that standardized workflows can mitigate inequality in AI-mediated performance. | Inequality | positive | high | variance (dispersion) of task performance outcomes | 0.6 |
| Managerially, firms should pair GenAI access with short AIC micro-training and simple standard operating procedures (SOPs) to capture value consistently and avoid uneven adoption outcomes. | Training Effectiveness | positive | high | consistency of value capture / adoption outcomes (proposed effect of training and SOPs) | 0.1 |