A ten-minute onboarding session substantially raised voluntary use of an LLM and produced measurable performance gains on a law-school exam: training increased usage from 26% to 41% and improved scores by 0.27 grade points, whereas untrained access delivered no score benefit.
Can targeted user training unlock the productive potential of generative artificial intelligence (GenAI) in professional settings? We investigate this question using a randomized study involving 164 law students completing an issue-spotting examination. Participants were assigned to one of three conditions: no GenAI access, optional access to a large language model (LLM), or optional access accompanied by an approximately ten-minute training intervention. Training significantly increased LLM adoption, raising the usage rate from 26% to 41%, and improved examination performance. Students with trained access scored 0.27 grade points higher than those with untrained access (p = 0.027), equivalent to roughly one-third of a letter grade. By contrast, access to an LLM without training did not improve performance and was associated with shorter answers relative to no access. Using principal stratification, we decompose the overall effect into adoption and effectiveness channels. Point estimates are consistent with training operating primarily by expanding the scope of GenAI use rather than by enhancing effectiveness among existing users, though confidence intervals are wide. Overall, our findings provide evidence that complementary investments in user training are critical for realizing GenAI productivity gains in knowledge-intensive fields where concerns about reliability may inhibit adoption.
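To fix ideas, the adoption-versus-effectiveness decomposition can be written out under a monotonicity assumption (training induces LLM use but never discourages it). The stratum shares below are implied by the reported usage rates; the back-of-envelope figure at the end is an illustration of the adoption channel, not an estimate from the paper.

```latex
% ITT contrast of trained access (Group 3) vs untrained access (Group 2),
% decomposed over principal strata under monotonicity:
\tau \;=\; \pi_a \Delta_a \;+\; \pi_i \Delta_i \;+\; \pi_n \Delta_n,
\qquad
\pi_a \approx 0.26, \quad
\pi_i \approx 0.41 - 0.26 = 0.15, \quad
\pi_n \approx 1 - 0.41 = 0.59.
```

Here $\pi_a$, $\pi_i$, $\pi_n$ are the always-user, induced-user, and never-user shares and the $\Delta$ terms are within-stratum training effects. If training neither affects non-users directly ($\Delta_n \approx 0$) nor changes always-users' effectiveness ($\Delta_a \approx 0$), the implied induced-user effect is $\Delta_i \approx 0.27 / 0.15 \approx 1.8$ grade points; an effect that large on a 15% stratum is one reason the mechanism estimates come with wide confidence intervals.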
Summary
Main Finding
A brief, targeted training intervention substantially increased voluntary adoption of a generative LLM and improved objective performance on a law issue‑spotting exam. Training raised reported LLM use from 26% to 41% and yielded a 0.27 grade‑point improvement versus untrained LLM access (p = 0.027, roughly one‑third of a letter grade). LLM access without training did not improve scores (and was associated with shorter answers). Principal‑stratification estimates are consistent with training working mainly by inducing additional users (extensive margin) rather than materially improving effectiveness among those who would have used the LLM anyway, though confidence intervals are wide.
Key Points
- Design: Randomized controlled trial with three arms:
- Group 1: No GenAI access (Westlaw only).
- Group 2: Optional LLM access (DeepSeek), no guidance.
- Group 3: Optional LLM access + ~9.5‑minute training video + 5‑question quiz.
- Sample: 213 signed up; 164 completed the study and were analyzed (group sizes in analysis: 49, 57, 58).
- Task: Open‑book, timed (75 min) issue‑spotting examination in contract law (identification + analysis of major and sub‑issues).
- Adoption effect: Self‑reported LLM use rose from 26% (Group 2) to 41% (Group 3).
- Performance effect (a code sketch of these contrasts follows this list):
- Trained access (Group 3) vs untrained access (Group 2): +0.27 grade points, p = 0.027.
- Untrained LLM access vs no access: no statistically significant improvement; answers tended to be shorter.
- Mechanism evidence: Principal stratification suggests training primarily expands who chooses to use GenAI (induced users) rather than substantially increasing the per‑user effectiveness of GenAI use. Estimates are consistent with this mechanism but imprecise.
- Training content emphasized prompting strategies, decomposing tasks, multiple prompts/iterations, giving feedback, and strong cautions about hallucinations and the need for human verification.
- Pre‑registration: hypotheses were pre‑registered; analyses that were not pre‑registered are flagged as such.
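As a concrete reading of the headline contrasts, the sketch below re-derives the adoption comparison from counts implied by the reported rates and analysis group sizes (15/57 ≈ 26%, 24/58 ≈ 41%) and shows the shape of the score test on placeholder data; none of this is the study's raw data or code.

```python
# Illustrative re-computation of the headline contrasts. Usage counts are
# reconstructed from the reported rates and group sizes; the score arrays
# are placeholders, not the study's grades.
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

# Adoption: trained access (Group 3, n=58) vs untrained access (Group 2, n=57).
users = np.array([24, 15])            # trained, untrained LLM users
nobs = np.array([58, 57])
z, p_adopt = proportions_ztest(users, nobs)
print(f"usage rates: {users / nobs}, z = {z:.2f}, p = {p_adopt:.3f}")

# Scores: a Welch t-test of the kind behind the +0.27 grade-point contrast.
rng = np.random.default_rng(0)
g3 = rng.normal(2.97, 0.6, 58)        # hypothetical trained-access grades
g2 = rng.normal(2.70, 0.6, 57)        # hypothetical untrained-access grades
t, p_score = stats.ttest_ind(g3, g2, equal_var=False)
print(f"score gap = {g3.mean() - g2.mean():.2f}, t = {t:.2f}, p = {p_score:.3f}")
```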
Data & Methods
- Randomization: between‑subjects assignment to three conditions to isolate the marginal effect of training while holding LLM access constant between Groups 2 and 3.
- Participants: LL.B. and J.D. students enrolled in contract law at the University of Hong Kong; invited ~2 months before the official exam; compensation and gamification elements (gift card plus feedback).
- Interventions:
- DeepSeek LLM made available to Groups 2 and 3.
- Training (Group 3 only): a 9m30s instructional video demonstrating best practices and warning about reliability, followed by a short quiz.
- Outcome measures:
- Primary: exam score (graded against rubric listing major issues and sub‑issues).
- Secondary: self‑reported DeepSeek use (binary), answer length, and post‑exam survey items on prior training and attitudes.
- Estimation:
- Mean comparisons across groups for adoption and scores (p‑values reported).
- Principal stratification used to decompose effects into (a) an adoption/extensive margin and (b) an effectiveness/intensive margin among would‑be users (a code sketch follows this list).
- Attrition and protocol:
- 164 of 213 completed; groups were run in separate classrooms, without prior disclosure of the differing LLM access, to minimize selection and spillovers.
- Limitations noted by authors: student sample (not practicing lawyers), single task context (issue‑spotting), self‑reported usage, short training and one LLM, and wide confidence intervals in mechanistic decomposition.
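The decomposition described above can be approximated with a simple method-of-moments calculation. Below is a minimal sketch, assuming monotonicity (training never discourages use) and no direct training effect on never-users; the variable names and synthetic data are mine, and the paper's estimator may differ in detail.

```python
# Method-of-moments sketch of the principal-stratification decomposition under
# monotonicity and an exclusion restriction for never-users. Inputs: scores y
# and usage indicators d for Group 2 (untrained access) and Group 3 (trained).
import numpy as np

def decompose(y2, d2, y3, d3):
    p2, p3 = d2.mean(), d3.mean()            # usage rates (~0.26 and ~0.41)
    pi_a, pi_i, pi_n = p2, p3 - p2, 1 - p3   # always / induced / never shares
    mu_n = y3[d3 == 0].mean()                # never-users are G3's non-users
    mu_a0 = y2[d2 == 1].mean()               # always-users, observed untrained
    # G2 non-users mix never-users with induced users (not yet using):
    mu_i0 = (y2[d2 == 0].mean() * (1 - p2) - mu_n * pi_n) / pi_i
    # G3 users mix always-users and induced users; assuming training does not
    # change always-users' effectiveness identifies the induced-user mean:
    mu_i1 = (y3[d3 == 1].mean() * p3 - mu_a0 * pi_a) / pi_i
    tau = y3.mean() - y2.mean()              # overall effect of adding training
    adoption_channel = pi_i * (mu_i1 - mu_i0)
    return {"tau": tau, "adoption_channel": adoption_channel,
            "shares": (pi_a, pi_i, pi_n)}

# Example on synthetic data shaped like the study (not the real scores):
rng = np.random.default_rng(1)
d2 = rng.random(57) < 0.26
y2 = rng.normal(2.7, 0.6, 57)
d3 = rng.random(58) < 0.41
y3 = rng.normal(2.8, 0.6, 58)
print(decompose(y2, d2, y3, d3))
```

Because the induced-user stratum is only about 15% of the sample, the stratum means divide by small cell counts, which is consistent with the wide confidence intervals the authors report for the mechanism estimates.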
Implications for AI Economics
- Complementary investments matter: Technology access alone may not translate into productivity gains in judgment‑intensive, error‑sensitive tasks. Small, low‑cost training can materially raise adoption and realized gains.
- Adoption frictions are central: Training appears to lower psychological or perceived risk costs (k_T in the authors’ model), expanding the set of users and use‑cases. This highlights the importance of non‑pecuniary frictions in diffusion models for high‑skill occupations (an adoption-condition sketch follows this list).
- Distributional effects & heterogeneity:
- A theoretical model and some empirical patterns align with the idea that lower‑ability users tend to gain more from effective GenAI use. Training, however, may shift who adopts rather than raising per‑user effectiveness; depending on how training is targeted, this can produce leveling effects or alter occupational task allocation.
- Organizational policy and human capital:
- Firms and professional organizations should treat user training, guidance, and verification protocols as complements to GenAI deployment—investments in instruction, workflows, and oversight may unlock productivity that raw access does not.
- Measurement and evaluation of AI productivity:
- Lab/field differences matter: controlled studies showing GenAI capabilities are necessary but insufficient for estimating real‑world productivity gains; adoption and interaction patterns conditional on instruction, incentives, and accountability are critical mediators.
- Research agenda:
- Broader field experiments with practitioners, longer or iterative training, integration of retrieval‑augmented systems, and varied task settings will be needed to quantify persistence, external validity, and the balance between extensive‑ and intensive‑margin effects.
- Caution for regulators and firms:
- Rapid rollout of GenAI without investment in user literacy and verification can yield non‑trivial risks in high‑stakes professions (hallucinations, malpractice, liability). Policy should encourage training, transparent disclosure, and monitoring as part of safe adoption.
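On the adoption-friction point above: only k_T appears in this summary of the authors' model, so the surrounding notation (expected benefit b_i, verification effort c_i) is assumed here for illustration. A minimal sketch of an adoption condition consistent with the extensive-margin finding:

```latex
% Agent i adopts the LLM iff the expected benefit net of verification effort
% clears the perceived risk/learning cost k_T, which training lowers:
\text{adopt}_i \;=\; \mathbf{1}\left\{\, \mathbb{E}[\,b_i\,] - c_i \;\ge\; k_T \,\right\},
\qquad k_T^{\text{trained}} \;<\; k_T^{\text{untrained}}.
```

Lowering k_T expands the set of adopters without changing b_i, which matches the observed pattern of training moving the extensive margin rather than per-user effectiveness.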
Assessment
Claims (13)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| A brief, targeted training increased voluntary LLM use from 26% (optional access without training) to 41% (optional access with training). | Adoption Rate | positive | high | LLM adoption (whether the student used the LLM) | n=164; increase from 26% to 41% (15 percentage points); score 0.6 |
| Training improved exam scores by 0.27 grade points relative to optional access without training (p = 0.027). | Output Quality | positive | high | Exam score (grade points) on a law-school issue-spotting exam | n=164; +0.27 grade points (p = 0.027); score 0.6 |
| Providing optional LLM access without training did not increase average exam scores versus no LLM access. | Output Quality | null_result | medium | Exam score (grade points) | n=164; no significant average difference versus no access; score 0.36 |
| Optional LLM access without training was associated with shorter written answers compared with no LLM access. | Output Quality | negative | medium | Answer length (measured length of exam answers) | n=164; shorter written answers in the untrained optional-access group (directional); score 0.36 |
| Principal stratification analysis suggests the training’s effect on scores operated primarily by expanding the set of LLM users (an adoption channel) rather than substantially improving per-user productivity among those who would already use the LLM. | Adoption Rate | mixed | low | Mechanism components: adoption rate and per-user effectiveness (score conditional on usage) | n=164; decomposition indicates a larger contribution from the adoption margin than from within-user productivity (imprecise); score 0.18 |
| Some mechanism-specific estimates are imprecise due to the sample size; confidence intervals for those estimates are wide. | Other | mixed | high | Precision of mechanism estimates (confidence-interval width for the adoption vs productivity decomposition) | n=164; mechanism-specific estimates imprecise, with wide confidence intervals; score 0.6 |
| The intervention consisted of roughly a ten-minute training focused on how to use the LLM effectively. | Other | null_result | high | Intervention duration/content (training implementation) | n=164; approximately 10-minute targeted training intervention (implementation detail); score 0.6 |
| The study used a randomized controlled design with three arms: no LLM access, optional LLM access, and optional LLM access plus brief training. | Other | null_result | high | Study design (randomization and arm definitions) | n=164; randomized controlled design with three arms (no access; optional access; optional access + training); score 0.6 |
| The primary outcomes analyzed were LLM adoption (use), exam score (grade points), and answer length. | Other | null_result | high | Adoption; exam score; answer length | n=164; primary outcomes: LLM adoption (use), exam score, answer length; score 0.6 |
| The observed score improvement of 0.27 grade points corresponds roughly to one-third of a letter grade. | Output Quality | positive | medium | Exam score (grade points; interpreted as a fraction of a letter grade) | n=164; +0.27 grade points (~one-third of a letter grade); score 0.36 |
| Analyses were conducted as intent-to-treat comparisons across arms, with hypothesis tests reported (including p-values) and principal stratification used for mechanism decomposition. | Other | null_result | high | Analysis methods (ITT, hypothesis tests, principal stratification) | n=164; analyses conducted as intent-to-treat with hypothesis tests and principal stratification; score 0.6 |
| Results and implications are limited by the sample and context: evidence comes from law students on a single issue-spotting exam using one brief training intervention, so generalizability to experienced professionals, other tasks, or other models is untested. | Other | mixed | high | Generalizability/applicability to other populations and tasks | n=164; limited generalizability (law students, single exam, one brief training); score 0.6 |
| Policy and managerial implication: investing in short, targeted onboarding/training for GenAI tools (rather than only providing access) may deliver measurable performance gains and increase voluntary adoption. | Adoption Rate | positive | speculative | Organizational adoption and productivity (extrapolated from student trial outcomes) | n=164; short targeted onboarding may increase adoption and deliver measurable performance gains (extrapolation); score 0.06 |