An AI career coach modestly boosts short-term goal progress compared with no support, but not versus structured written reflection; its chief added value appears to be raising users' sense of accountability, not deeper changes in goal alignment.
Helping people identify and pursue personally meaningful career goals at scale remains a key challenge in applied psychology. Career coaching can improve goal quality and attainment, but its cost and limited availability restrict access. Large language model (LLM)-based chatbots offer a scalable alternative, yet the psychological mechanisms by which they might support goal pursuit remain untested. Here we report a preregistered three-arm randomised controlled trial (N = 517) comparing an AI career coach ("Leon," powered by Claude Sonnet), a structured written questionnaire covering closely matched reflective topics, and a no-support control on goal progress at a two-week follow-up. The AI chatbot produced significantly higher goal progress than the control (d = 0.33, p = .016). Compared with the written-reflection condition, the AI did not significantly improve overall goal progress, but it increased perceived social accountability. In the preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]), whereas self-concordance did not. These findings suggest that AI-assisted goal setting can improve short-term goal progress, and that its clearest added value over structured self-reflection lies in increasing felt accountability.
Summary
Main Finding
An LLM-powered AI career coach ("Leon," Claude Sonnet) produced small but significant improvements in short-term goal progress versus no support (d = 0.33, p = .016) in a preregistered three-arm RCT (N = 517 randomised; N = 323 in the analytic sample). The AI did not significantly outperform a matched structured written-reflection questionnaire on overall progress, but it did increase perceived social accountability, and accountability statistically mediated the AI-versus-questionnaire effect on progress (indirect effect = 0.15, 95% CI [0.04, 0.31]). Self-concordance did not mediate the effect.
Key Points
- Design: preregistered, between-subjects RCT with three arms:
  - G1: No-support control (participants enter goals directly).
  - G2: Structured written questionnaire (matched reflective topics; enforced minimum engagement).
  - G3: AI chatbot coaching session (Leon; interactive conversation across the same thematic phases).
- Sample and timeline:
  - 517 employed adults (UK/US, age 18–50) were randomised; the analytic sample of 323 completed both sessions and attention checks.
  - Two sessions ~14 days apart; primary outcome measured at the 2-week follow-up.
- Primary outcome:
  - Goal progress (9 items, α = .86) at two weeks.
  - AI vs Control: M = 3.45 (AI) vs M = 3.02 (control); t = 2.42, p = .016, d = 0.33.
  - Questionnaire vs Control: directionally positive but non-significant (d = 0.24, p = .076).
  - AI vs Questionnaire: no significant difference in overall progress (d = 0.08, p = .54).
- Mediators:
  - Perceived accountability (T1) was substantially higher in the AI condition than in both the questionnaire (d = 0.43, p = .002) and control (d = 0.68, p < .001) conditions.
  - In the AI vs Questionnaire contrast, perceived accountability mediated the effect on progress (indirect = 0.15, 95% CI [0.04, 0.31]); self-concordance did not mediate.
  - Across conditions, accountability predicted higher progress (b = 0.17, p = .004); self-concordance did not.
- Exploratory findings:
  - Goals produced via the AI were coded (LLM-assisted) as substantially more specific than those in the questionnaire or control conditions.
  - Goal specificity had a modest bivariate association with progress (r = .14); exploratory mediation suggested specificity may contribute to AI benefits, but it was not a robust independent predictor in adjusted models.
  - Both active arms (AI and questionnaire) were rated higher on perceived structured reflection than control; structured reflection appears to be a shared active ingredient in the active vs no-support contrasts.
  - User satisfaction (Net Promoter Score, NPS) was higher for the AI than for the questionnaire or control.
- Limitations noted by authors:
  - Short follow-up (2 weeks); the durability of effects is unknown.
  - Attrition: the analytic sample is ≈62.5% of those randomised (323 of 517).
  - The active control was matched on topics but is not a single-ingredient manipulation: the AI's conversational features bundle several differences (interactivity, contingency, follow-up prompts).
  - Preprint; not peer-reviewed.
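The headline figures above can be cross-checked with simple arithmetic. The sketch below recomputes the retention rate from the two sample sizes and backs out the pooled standard deviation implied by the reported means and effect size. It assumes that d was computed as the mean difference divided by a pooled SD, which the summary does not state, so the implied SD is indicative only.

```python
# Back-of-envelope checks; every input is a figure reported in the summary.
n_randomised = 517
n_analytic = 323

retention = n_analytic / n_randomised   # share of randomised participants analysed
attrition = 1 - retention

mean_ai, mean_control = 3.45, 3.02
d_reported = 0.33

mean_diff = mean_ai - mean_control
# ASSUMPTION: d = mean difference / pooled SD, so pooled SD ~= difference / d.
implied_pooled_sd = mean_diff / d_reported

print(f"retention = {retention:.1%}, attrition = {attrition:.1%}")
print(f"mean difference = {mean_diff:.2f}, implied pooled SD = {implied_pooled_sd:.2f}")
```

Under that assumption, the retention rate matches the ≈62.5% quoted in the limitations, and the implied pooled SD comes out at roughly 1.3 scale points.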
Data & Methods
- Recruitment: Prolific; eligibility: UK/US, employed, 18–50, English fluency.
- Randomisation: between-subjects to three arms; preregistered analysis plan available.
- Interventions:
  - Questionnaire: five open-ended prompts (career background, energizing activities, priorities, constraints, should-vs-want check) with enforced minimum engagement.
  - AI chatbot: guided conversational coaching across the same four phases; mean session ~22 minutes, ~49 messages.
- Measures:
  - T1: accountability, self-concordance (per goal), commitment, manipulation checks (perceived interactivity, perceived structured reflection), NPS.
  - T2 (~14 days): primary outcome goal progress (3 items per goal × 3 goals = 9 items), T2 versions of mediators and commitment, NPS.
  - Post hoc: goal specificity coded via LLM-based rubric (time-frame and measurability).
- Statistical analysis:
  - Planned pairwise Welch t-tests for main contrasts.
  - Two-mediator (accountability, self-concordance) parallel mediation with 5,000 bootstrap samples for indirect effects.
  - ANCOVA robustness checks with demographic and AI-familiarity covariates.
  - Exploratory clustered goal-level models for specificity analyses.
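To make the planned contrasts and the bootstrap mediation concrete, here is a minimal standard-library sketch, not the authors' code. It runs on synthetic data invented for illustration, simplifies the two-mediator model to a single mediator, and resamples within each arm; all function names are hypothetical. The p-value is omitted because it needs the t distribution (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Standardised mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled)


def indirect_effect(rows):
    """a*b for rows of (x, m, y) with binary x (1 = AI, 0 = comparison).

    a = arm difference in the mediator; b = slope of y on m holding x fixed,
    obtained by centring m and y within each arm (Frisch-Waugh).
    """
    groups = {0: [], 1: []}
    for x, m, y in rows:
        groups[x].append((m, y))
    mean_m = {g: statistics.mean(m for m, _ in v) for g, v in groups.items()}
    mean_y = {g: statistics.mean(y for _, y in v) for g, v in groups.items()}
    a = mean_m[1] - mean_m[0]
    sxy = sum((m - mean_m[x]) * (y - mean_y[x]) for x, m, y in rows)
    sxx = sum((m - mean_m[x]) ** 2 for x, m, _ in rows)
    return a * (sxy / sxx)


def bootstrap_ci(rows, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; resampling within arms is an assumption."""
    rng = random.Random(seed)
    arms = [[r for r in rows if r[0] == g] for g in (0, 1)]
    estimates = sorted(
        indirect_effect([r for arm in arms for r in rng.choices(arm, k=len(arm))])
        for _ in range(n_boot)
    )
    return estimates[int(n_boot * alpha / 2)], estimates[int(n_boot * (1 - alpha / 2)) - 1]


# Synthetic data (NOT the study's): accountability m, goal progress y.
data_rng = random.Random(42)
rows = []
for i in range(200):
    x = i % 2
    m = 3.0 + 0.5 * x + data_rng.gauss(0, 1)
    y = 2.0 + 0.3 * m + data_rng.gauss(0, 1)
    rows.append((x, m, y))

ai = [y for x, _, y in rows if x == 1]
comparison = [y for x, _, y in rows if x == 0]
t_stat, df = welch_t(ai, comparison)
d = cohens_d(ai, comparison)
point = indirect_effect(rows)
lo, hi = bootstrap_ci(rows, n_boot=1000)  # fewer draws than the study's 5,000, for speed
print(f"t = {t_stat:.2f}, df = {df:.1f}, d = {d:.2f}")
print(f"indirect = {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

An indirect effect is conventionally read as significant when the bootstrap interval excludes zero, which is the reading applied to the reported 95% CI [0.04, 0.31].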
Implications for AI Economics
- Evidence of scalable impact: a modern LLM-based chatbot produced measurable short-term improvements in goal progress relative to no support. The effect size (d ≈ 0.33) is comparable to typical behavioural goal-setting interventions and smaller than meta-analytic estimates for professional coaching (g ≈ 0.59), suggesting AI can deliver partial but meaningful gains at much lower marginal cost.
- Mechanism matters for product/design economics:
  - The AI’s added value over structured written reflection appears driven primarily by social-accountability features (contingent conversational interaction, perceived follow-up), not by increased autonomous motivation (self-concordance).
  - For practitioners and firms, product features that amplify perceived accountability (e.g., interactive follow-ups, scheduling check-ins, explicit evaluator stance) may yield more behavioral impact than mere static reflection prompts.
- Cost-effectiveness and substitution/complementarity:
  - Given negligible marginal delivery cost, AI coaching could be highly cost-effective for large-scale employee development, public employment services, or consumer wellbeing offerings, especially for low-intensity, short-horizon goals.
  - However, effect sizes are smaller than those reported for professional human coaching. Firms should consider hybrid models: AI for scalable baseline support and triage, with human coaches reserved for high-value or high-complexity cases (complementarity).
- Labor-market and welfare considerations:
  - If sustained and scaled, AI-driven improvements in goal pursuit could influence human capital accumulation, career transitions, and productivity. Even small per-person gains aggregated across workforces could be economically meaningful.
  - The observation that the chatbot surfaced more non-career goals and produced more specific goals suggests LLM coaching could change the composition and specificity of worker aspirations, potentially affecting training uptake, upskilling choices, or propensity to pursue promotions/reskilling.
- Evaluation and policy implications:
  - Economic deployment requires longer-run evidence linking AI-assisted goal-setting to objective labor-market outcomes (training completion, promotions, earnings), heterogeneous effects across worker types, and robustness across LLMs.
  - Regulators and organizations should weigh data privacy, confidentiality, informed consent, and potential misalignment risks when deploying LLM coaches at scale.
- Research/evaluation priorities for economics-oriented follow-ups:
  - Randomized trials with longer horizons (months to years) measuring objective labor-market outcomes and retention.
  - Cost-benefit analyses comparing AI coaching, structured digital interventions, and human coaching (including ROI for employers).
  - Experiments decomposing accountability features (e.g., explicit follow-up scheduling, social reporting, identifiable coach persona) to quantify which design elements drive economic value.
  - Heterogeneity analyses by worker skill level, occupation, and baseline motivation to understand substitution vs complementarity with traditional training/coaching.
Overall, the study provides experimental evidence that conversational LLMs can modestly improve short-run goal pursuit via perceived social accountability, supporting cautious optimism about economically scalable AI coaching—while underscoring the need for longer-run, outcome-oriented economic evaluations.
Assessment
Claims (8)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| We conducted a preregistered three-arm randomized controlled trial (RCT) comparing an AI career coach ('Leon,' powered by Claude Sonnet), a matched structured written questionnaire, and a no-support control. [Training Effectiveness] | null_result | high | trial design / allocation and follow-up measurement of goal-related outcomes at two weeks | n=517; 0.6 |
| The AI chatbot produced significantly higher goal progress than the no-support control at two-week follow-up. [Training Effectiveness] | positive | high | goal progress (self-reported goal progress at two-week follow-up) | n=517; d = 0.33, p = .016; 0.6 |
| Compared with the matched written-reflection questionnaire, the AI did not significantly improve overall goal progress. [Training Effectiveness] | null_result | high | goal progress (self-reported goal progress at two-week follow-up) | n=517; non-significant vs written-reflection; 0.6 |
| The AI increased perceived social accountability relative to the written-reflection questionnaire. [Worker Satisfaction] | positive | medium | perceived social accountability (self-report) | n=517; 0.36 |
| In a preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]). [Training Effectiveness] | positive | high | goal progress (mediated by perceived social accountability) | n=517; indirect effect = 0.15, 95% CI [0.04, 0.31]; 0.6 |
| Self-concordance did not mediate the AI-over-questionnaire effect on goal progress. [Training Effectiveness] | null_result | high | goal progress (mediator tested: self-concordance, self-report) | n=517; no mediation detected; 0.6 |
| AI-assisted goal setting can improve short-term (two-week) goal progress. [Training Effectiveness] | positive | medium | short-term goal progress (self-reported at two weeks) | n=517; short-term (two-week) improvement (see d = 0.33 vs control); 0.36 |
| The clearest added value of AI over structured self-reflection lies in increasing felt accountability. [Worker Satisfaction] | positive | medium | perceived social accountability and resulting goal progress | n=517; 0.36 |