A ten-minute onboarding session substantially raised voluntary use of an LLM and produced measurable performance gains on a law-school exam: training increased usage from 26% to 41% and improved scores by 0.27 grade points, whereas untrained access delivered no score benefit.
Can targeted user training unlock the productive potential of generative artificial intelligence (GenAI) in professional settings? We investigate this question using a randomized study involving 164 law students completing an issue-spotting examination. Participants were assigned to one of three conditions: no GenAI access, optional access to a large language model (LLM), or optional access accompanied by an approximately ten-minute training intervention. Training significantly increased LLM adoption, raising the usage rate from 26% to 41%, and improved examination performance. Students with trained access scored 0.27 grade points higher than those with untrained access (p = 0.027), equivalent to roughly one-third of a letter grade. By contrast, access to an LLM without training did not improve performance and was associated with shorter answers relative to no access. Using principal stratification, we decompose the overall effect into adoption and effectiveness channels. Point estimates are consistent with training operating primarily by expanding the scope of GenAI use rather than by enhancing effectiveness among existing users, though confidence intervals are wide. Overall, our findings provide evidence that complementary investments in user training are critical for realizing GenAI productivity gains in knowledge-intensive fields where concerns about reliability may inhibit adoption.
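To fix ideas, the adoption-versus-effectiveness decomposition can be written out under a monotonicity assumption (training induces LLM use but never discourages it). The stratum shares below are implied by the reported usage rates; the back-of-envelope figure at the end is an illustration of the adoption channel, not an estimate from the paper.

```latex
% ITT contrast of trained access (Group 3) vs untrained access (Group 2),
% decomposed over principal strata under monotonicity:
\tau \;=\; \pi_a \Delta_a \;+\; \pi_i \Delta_i \;+\; \pi_n \Delta_n,
\qquad
\pi_a \approx 0.26, \quad
\pi_i \approx 0.41 - 0.26 = 0.15, \quad
\pi_n \approx 1 - 0.41 = 0.59.
```

Here $\pi_a$, $\pi_i$, $\pi_n$ are the always-user, induced-user, and never-user shares and the $\Delta$ terms are within-stratum training effects. If training neither affects non-users directly ($\Delta_n \approx 0$) nor changes always-users' effectiveness ($\Delta_a \approx 0$), the implied induced-user effect is $\Delta_i \approx 0.27 / 0.15 \approx 1.8$ grade points; an effect that large on a 15% stratum is one reason the mechanism estimates come with wide confidence intervals.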
Summary
Main Finding
A brief, targeted training intervention substantially increased voluntary adoption of a generative LLM and improved objective performance on a law issue‑spotting exam. Training raised reported LLM use from 26% to 41% and yielded a 0.27 grade‑point improvement versus untrained LLM access (p = 0.027, roughly one‑third of a letter grade). LLM access without training did not improve scores (and was associated with shorter answers). Principal‑stratification estimates are consistent with training working mainly by inducing additional users (extensive margin) rather than materially improving effectiveness among those who would have used the LLM anyway, though confidence intervals are wide.
Key Points
- Design: Randomized controlled trial with three arms:
- Group 1: No GenAI access (Westlaw only).
- Group 2: Optional LLM access (DeepSeek), no guidance.
- Group 3: Optional LLM access + ~9.5‑minute training video + 5‑question quiz.
- Sample: 213 signed up; 164 completed the study and were analyzed (group sizes in analysis: 49, 57, 58).
- Task: Open‑book, timed (75 min) issue‑spotting examination in contract law (identification + analysis of major and sub‑issues).
- Adoption effect: Self‑reported LLM use rose from 26% (Group 2) to 41% (Group 3).
- Performance effect (a code sketch of these contrasts follows this list):
- Trained access (Group 3) vs untrained access (Group 2): +0.27 grade points, p = 0.027.
- Untrained LLM access vs no access: no statistically significant improvement; answers tended to be shorter.
- Mechanism evidence: Principal stratification suggests training primarily expands who chooses to use GenAI (induced users) rather than substantially increasing the per‑user effectiveness of GenAI use. Estimates are consistent with this mechanism but imprecise.
- Training content emphasized prompting strategies, decomposing tasks, multiple prompts/iterations, giving feedback, and strong cautions about hallucinations and the need for human verification.
- Pre‑registration: hypotheses were pre‑registered; analyses that were not pre‑registered are flagged as such.
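As a concrete reading of the headline contrasts, the sketch below re-derives the adoption comparison from counts implied by the reported rates and analysis group sizes (15/57 ≈ 26%, 24/58 ≈ 41%) and shows the shape of the score test on placeholder data; none of this is the study's raw data or code.

```python
# Illustrative re-computation of the headline contrasts. Usage counts are
# reconstructed from the reported rates and group sizes; the score arrays
# are placeholders, not the study's grades.
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

# Adoption: trained access (Group 3, n=58) vs untrained access (Group 2, n=57).
users = np.array([24, 15])            # trained, untrained LLM users
nobs = np.array([58, 57])
z, p_adopt = proportions_ztest(users, nobs)
print(f"usage rates: {users / nobs}, z = {z:.2f}, p = {p_adopt:.3f}")

# Scores: a Welch t-test of the kind behind the +0.27 grade-point contrast.
rng = np.random.default_rng(0)
g3 = rng.normal(2.97, 0.6, 58)        # hypothetical trained-access grades
g2 = rng.normal(2.70, 0.6, 57)        # hypothetical untrained-access grades
t, p_score = stats.ttest_ind(g3, g2, equal_var=False)
print(f"score gap = {g3.mean() - g2.mean():.2f}, t = {t:.2f}, p = {p_score:.3f}")
```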
Data & Methods
- Randomization: between‑subjects assignment to three conditions to isolate the marginal effect of training while holding LLM access constant between Groups 2 and 3.
- Participants: LL.B. and J.D. students enrolled in contract law at the University of Hong Kong; invited ~2 months before the official exam; compensation and gamification elements (gift card plus feedback).
- Interventions:
- DeepSeek LLM made available to Groups 2 and 3.
- Training (Group 3 only): a 9m30s instructional video demonstrating best practices and warning about reliability, followed by a short quiz.
- Outcome measures:
- Primary: exam score (graded against rubric listing major issues and sub‑issues).
- Secondary: self‑reported DeepSeek use (binary), answer length, and post‑exam survey items on prior training and attitudes.
- Estimation:
- Mean comparisons across groups for adoption and scores (p‑values reported).
- Principal stratification used to decompose effects into (a) an adoption/extensive margin and (b) an effectiveness/intensive margin among would‑be users (a code sketch follows this list).
- Attrition and protocol:
- 164 of 213 completed; groups were run in separate classrooms, without prior disclosure of the differing LLM access, to minimize selection and spillovers.
- Limitations noted by authors: student sample (not practicing lawyers), single task context (issue‑spotting), self‑reported usage, short training and one LLM, and wide confidence intervals in mechanistic decomposition.
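The decomposition described above can be approximated with a simple method-of-moments calculation. Below is a minimal sketch, assuming monotonicity (training never discourages use) and no direct training effect on never-users; the variable names and synthetic data are mine, and the paper's estimator may differ in detail.

```python
# Method-of-moments sketch of the principal-stratification decomposition under
# monotonicity and an exclusion restriction for never-users. Inputs: scores y
# and usage indicators d for Group 2 (untrained access) and Group 3 (trained).
import numpy as np

def decompose(y2, d2, y3, d3):
    p2, p3 = d2.mean(), d3.mean()            # usage rates (~0.26 and ~0.41)
    pi_a, pi_i, pi_n = p2, p3 - p2, 1 - p3   # always / induced / never shares
    mu_n = y3[d3 == 0].mean()                # never-users are G3's non-users
    mu_a0 = y2[d2 == 1].mean()               # always-users, observed untrained
    # G2 non-users mix never-users with induced users (not yet using):
    mu_i0 = (y2[d2 == 0].mean() * (1 - p2) - mu_n * pi_n) / pi_i
    # G3 users mix always-users and induced users; assuming training does not
    # change always-users' effectiveness identifies the induced-user mean:
    mu_i1 = (y3[d3 == 1].mean() * p3 - mu_a0 * pi_a) / pi_i
    tau = y3.mean() - y2.mean()              # overall effect of adding training
    adoption_channel = pi_i * (mu_i1 - mu_i0)
    return {"tau": tau, "adoption_channel": adoption_channel,
            "shares": (pi_a, pi_i, pi_n)}

# Example on synthetic data shaped like the study (not the real scores):
rng = np.random.default_rng(1)
d2 = rng.random(57) < 0.26
y2 = rng.normal(2.7, 0.6, 57)
d3 = rng.random(58) < 0.41
y3 = rng.normal(2.8, 0.6, 58)
print(decompose(y2, d2, y3, d3))
```

Because the induced-user stratum is only about 15% of the sample, the stratum means divide by small cell counts, which is consistent with the wide confidence intervals the authors report for the mechanism estimates.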
Implications for AI Economics
- Complementary investments matter: Technology access alone may not translate into productivity gains in judgment‑intensive, error‑sensitive tasks. Small, low‑cost training can materially raise adoption and realized gains.
- Adoption frictions are central: Training appears to lower psychological or perceived risk costs (k_T in the authors’ model), expanding the set of users and use‑cases. This highlights the importance of non‑pecuniary frictions in diffusion models for high‑skill occupations (an adoption-condition sketch follows this list).
- Distributional effects & heterogeneity:
- A theoretical model and some empirical patterns align with the idea that lower‑ability users tend to gain more from effective GenAI use. Training, however, may shift who adopts rather than raising per‑user effectiveness; depending on how training is targeted, this can produce leveling effects or alter occupational task allocation.
- Organizational policy and human capital:
- Firms and professional organizations should treat user training, guidance, and verification protocols as complements to GenAI deployment—investments in instruction, workflows, and oversight may unlock productivity that raw access does not.
- Measurement and evaluation of AI productivity:
- Lab/field differences matter: controlled studies showing GenAI capabilities are necessary but insufficient for estimating real‑world productivity gains; adoption and interaction patterns conditional on instruction, incentives, and accountability are critical mediators.
- Research agenda:
- Broader field experiments with practitioners, longer or iterative training, integration of retrieval‑augmented systems, and varied task settings will be needed to quantify persistence, external validity, and the balance between extensive‑ and intensive‑margin effects.
- Caution for regulators and firms:
- Rapid rollout of GenAI without investment in user literacy and verification can yield non‑trivial risks in high‑stakes professions (hallucinations, malpractice, liability). Policy should encourage training, transparent disclosure, and monitoring as part of safe adoption.
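On the adoption-friction point above: only k_T appears in this summary of the authors' model, so the surrounding notation (expected benefit b_i, verification effort c_i) is assumed here for illustration. A minimal sketch of an adoption condition consistent with the extensive-margin finding:

```latex
% Agent i adopts the LLM iff the expected benefit net of verification effort
% clears the perceived risk/learning cost k_T, which training lowers:
\text{adopt}_i \;=\; \mathbf{1}\left\{\, \mathbb{E}[\,b_i\,] - c_i \;\ge\; k_T \,\right\},
\qquad k_T^{\text{trained}} \;<\; k_T^{\text{untrained}}.
```

Lowering k_T expands the set of adopters without changing b_i, which matches the observed pattern of training moving the extensive margin rather than per-user effectiveness.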
Assessment
Claims (13)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| A brief, targeted training increased voluntary LLM use from 26% (optional access without training) to 41% (optional access with training). | Adoption Rate | positive | high | LLM adoption (whether the student used the LLM) | n=164; increase from 26% to 41% (15 percentage points); score 0.6 |
| Training improved exam scores by 0.27 grade points relative to optional access without training (p = 0.027). | Output Quality | positive | high | Exam score (grade points) on a law-school issue-spotting exam | n=164; +0.27 grade points (p = 0.027); score 0.6 |
| Providing optional LLM access without training did not increase average exam scores versus no LLM access. | Output Quality | null_result | medium | Exam score (grade points) | n=164; no significant average difference versus no access; score 0.36 |
| Optional LLM access without training was associated with shorter written answers compared with no LLM access. | Output Quality | negative | medium | Answer length (measured length of exam answers) | n=164; shorter written answers in the untrained optional-access group (directional); score 0.36 |
| Principal stratification analysis suggests the training’s effect on scores operated primarily by expanding the set of LLM users (an adoption channel) rather than substantially improving per-user productivity among those who would already use the LLM. | Adoption Rate | mixed | low | Mechanism components: adoption rate and per-user effectiveness (score conditional on usage) | n=164; decomposition indicates a larger contribution from the adoption margin than from within-user productivity (imprecise); score 0.18 |
| Some mechanism-specific estimates are imprecise due to the sample size; confidence intervals for those estimates are wide. | Other | mixed | high | Precision of mechanism estimates (confidence-interval width for the adoption vs productivity decomposition) | n=164; mechanism-specific estimates imprecise, with wide confidence intervals; score 0.6 |
| The intervention consisted of roughly a ten-minute training focused on how to use the LLM effectively. | Other | null_result | high | Intervention duration/content (training implementation) | n=164; approximately 10-minute targeted training intervention (implementation detail); score 0.6 |
| The study used a randomized controlled design with three arms: no LLM access, optional LLM access, and optional LLM access plus brief training. | Other | null_result | high | Study design (randomization and arm definitions) | n=164; randomized controlled design with three arms (no access; optional access; optional access + training); score 0.6 |
| The primary outcomes analyzed were LLM adoption (use), exam score (grade points), and answer length. | Other | null_result | high | Adoption; exam score; answer length | n=164; primary outcomes: LLM adoption (use), exam score, answer length; score 0.6 |
| The observed score improvement of 0.27 grade points corresponds roughly to one-third of a letter grade. | Output Quality | positive | medium | Exam score (grade points; interpreted as a fraction of a letter grade) | n=164; +0.27 grade points (~one-third of a letter grade); score 0.36 |
| Analyses were conducted as intent-to-treat comparisons across arms, with hypothesis tests reported (including p-values) and principal stratification used for mechanism decomposition. | Other | null_result | high | Analysis methods (ITT, hypothesis tests, principal stratification) | n=164; analyses conducted as intent-to-treat with hypothesis tests and principal stratification; score 0.6 |
| Results and implications are limited by the sample and context: evidence comes from law students on a single issue-spotting exam using one brief training intervention, so generalizability to experienced professionals, other tasks, or other models is untested. | Other | mixed | high | Generalizability/applicability to other populations and tasks | n=164; limited generalizability (law students, single exam, one brief training); score 0.6 |
| Policy and managerial implication: investing in short, targeted onboarding/training for GenAI tools (rather than only providing access) may deliver measurable performance gains and increase voluntary adoption. | Adoption Rate | positive | speculative | Organizational adoption and productivity (extrapolated from student trial outcomes) | n=164; short targeted onboarding may increase adoption and deliver measurable performance gains (extrapolation); score 0.06 |