The Commonplace

An AI career coach modestly boosts short-term goal progress compared with no support, but not versus structured written reflection; its chief added value appears to be raising users' sense of accountability, not deeper changes in goal alignment.

AI-Assisted Goal Setting Improves Goal Progress Through Social Accountability
Michel Schimpf, Julian Voigt, Thomas Bohné · March 18, 2026
arXiv · RCT · medium evidence · relevance 7/10
A preregistered RCT (N=517) found that an LLM-based AI career coach modestly increased self-reported goal progress versus no support. It did not significantly outperform a matched written-reflection exercise overall, but its effect relative to that exercise operated through increased perceived social accountability rather than greater self-concordance.

Helping people identify and pursue personally meaningful career goals at scale remains a key challenge in applied psychology. Career coaching can improve goal quality and attainment, but its cost and limited availability restrict access. Large language model (LLM)-based chatbots offer a scalable alternative, yet the psychological mechanisms by which they might support goal pursuit remain untested. Here we report a preregistered three-arm randomised controlled trial (N = 517) comparing an AI career coach ("Leon," powered by Claude Sonnet), a structured written questionnaire covering closely matched reflective topics, and a no-support control on goal progress at a two-week follow-up. The AI chatbot produced significantly higher goal progress than the control (d = 0.33, p = .016). Compared with the written-reflection condition, the AI did not significantly improve overall goal progress, but it increased perceived social accountability. In the preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]), whereas self-concordance did not. These findings suggest that AI-assisted goal setting can improve short-term goal progress, and that its clearest added value over structured self-reflection lies in increasing felt accountability.

Summary

Main Finding

An LLM-powered AI career coach ("Leon," Claude Sonnet) produced small but significant improvements in short-term goal progress versus no support (d = 0.33, p = .016) in a preregistered three-arm RCT (517 randomised; 323 in the analytic sample). The AI did not significantly outperform a matched structured written-reflection questionnaire on overall progress, but it did increase perceived social accountability, and accountability statistically mediated the AI-versus-questionnaire effect on progress (indirect effect = 0.15, 95% CI [0.04, 0.31]). Self-concordance did not mediate the effect.

Key Points

  • Design: preregistered, between-subjects RCT with three arms:
    • G1: No-support control (enter goals directly).
    • G2: Structured written questionnaire (matched reflective topics; enforced minimum engagement).
    • G3: AI chatbot coaching session (Leon; interactive conversation across same thematic phases).
  • Sample and timeline:
    • 517 employed adults (UK/US, aged 18–50) randomized; analytic sample of 323 who completed both sessions and passed attention checks.
    • Two sessions ~14 days apart; primary outcome measured at 2-week follow-up.
  • Primary outcome:
    • Goal progress (9 items, α = .86) at two weeks.
    • AI vs Control: M(AI) = 3.45 vs M(Control) = 3.02; t = 2.42, p = .016, d = 0.33.
    • Questionnaire vs Control: non-significant trend (d = 0.24, p = .076).
    • AI vs Questionnaire: no significant difference in overall progress (d = 0.08, p = .54).
  • Mediators:
    • Perceived accountability (T1) was substantially higher in the AI condition than both questionnaire (d = 0.43, p = .002) and control (d = 0.68, p < .001).
    • In the AI vs Questionnaire contrast, perceived accountability mediated the effect on progress (indirect = 0.15, 95% CI [0.04, 0.31]); self-concordance did not mediate.
    • Across conditions, accountability predicted higher progress (b = 0.17, p = .004); self-concordance did not.
  • Exploratory findings:
    • Goals produced via the AI were coded (LLM-assisted) as substantially more specific than in questionnaire or control conditions.
    • Goal specificity had a modest bivariate association with progress (r = .14); exploratory mediation suggested specificity may contribute to AI benefits but was not a robust independent predictor in adjusted models.
    • Both active arms (AI and questionnaire) were rated higher on perceived structured reflection than control; structured reflection appears to be a shared active ingredient for active vs no-support contrasts.
    • User satisfaction (Net Promoter Score, NPS) was higher for the AI than for the questionnaire and control conditions.
  • Limitations noted by authors:
    • Short follow-up (2 weeks) — unknown durability of effects.
    • Attrition: analytic sample ≈62.5% of randomized.
    • Active control matched on topics but not a single-ingredient manipulation; AI’s conversational features bundled multiple changes (interactivity, contingency, follow-up prompts).
    • Preprint; not peer-reviewed.
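The reported means, d, t, and sample sizes can be cross-checked with a little arithmetic. This is a minimal sketch under two assumptions that are not stated in the paper summary: that d uses a pooled SD, and that the 323 analysed participants split roughly evenly across the three arms (per-arm sizes are not reported here).

```python
import math

# Values reported in the Key Points above.
mean_ai, mean_control = 3.45, 3.02
d_reported = 0.33
t_reported = 2.42
n_randomised, n_analytic = 517, 323

# Assumption: d = (mean_ai - mean_control) / pooled_sd, so the pooled SD is implied.
implied_pooled_sd = (mean_ai - mean_control) / d_reported

# Two-sample t ≈ d * sqrt(n1 * n2 / (n1 + n2)). With equal arms of size n (assumed),
# that is d * sqrt(n / 2), so n ≈ 2 * (t / d) ** 2 per arm.
implied_n_per_arm = 2 * (t_reported / d_reported) ** 2

# Share of randomised participants retained in the analytic sample.
retention = n_analytic / n_randomised

print(f"implied pooled SD  ≈ {implied_pooled_sd:.2f}")
print(f"implied n per arm  ≈ {implied_n_per_arm:.0f} (even split of {n_analytic}: {n_analytic / 3:.0f})")
print(f"analytic retention ≈ {retention:.1%}")
```

Under these assumptions the reported t and d imply roughly 108 analysed participants per arm, in line with an even three-way split of 323, and retention comes out at about 62.5%. Because d and t are rounded, treat this as a consistency check rather than a recovered figure.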

Data & Methods

  • Recruitment: Prolific; eligibility: UK/US, employed, 18–50, English fluency.
  • Randomisation: between-subjects to three arms; preregistered analysis plan available.
  • Interventions:
    • Questionnaire: five open-ended prompts (career background, energizing activities, priorities, constraints, should-vs-want check) with enforced minimum engagement.
    • AI chatbot: guided conversational coaching across the same four phases; mean session ~22 minutes, ~49 messages.
  • Measures:
    • T1: accountability, self-concordance (per goal), commitment, manipulation checks (perceived interactivity, perceived structured reflection), NPS.
    • T2 (~14 days): primary outcome, goal progress (3 items per goal × 3 goals = 9 items); T2 versions of the mediators and commitment; NPS.
    • Post hoc: goal specificity coded via LLM-based rubric (time-frame and measurability).
  • Statistical analysis:
    • Planned pairwise Welch t-tests for main contrasts.
    • Two-mediator (accountability, self-concordance) parallel mediation with 5,000 bootstrap samples for indirect effects.
    • ANCOVA robustness checks with demographic and AI-familiarity covariates.
    • Exploratory clustered goal-level models for specificity analyses.
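The parallel-mediation step can be illustrated as a percentile bootstrap of the product-of-coefficients indirect effects a1·b1 (via accountability) and a2·b2 (via self-concordance). This is a minimal pure-Python sketch, not the authors' analysis code: the helper names (`ols`, `indirect_effects`, `bootstrap_ci`, `simulate`) are hypothetical, covariates are omitted, and the data are synthetic, not the study's.

```python
import random

def ols(y, cols):
    """OLS coefficients for y ~ 1 + cols, solving the normal equations by Gauss-Jordan elimination."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))  # partial pivoting
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[i][k] / A[i][i] for i in range(k)]

def indirect_effects(x, m1, m2, y):
    """Indirect effects (a1*b1, a2*b2) for a parallel two-mediator model."""
    a1 = ols(m1, [x])[1]                # X -> M1
    a2 = ols(m2, [x])[1]                # X -> M2
    _, _, b1, b2 = ols(y, [x, m1, m2])  # M1, M2 -> Y, controlling for X (direct path)
    return a1 * b1, a2 * b2

def bootstrap_ci(x, m1, m2, y, n_boot=5000, seed=0, level=0.95):
    """Percentile bootstrap CIs for both indirect effects, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(y)
    draws = ([], [])
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ie = indirect_effects([x[i] for i in idx], [m1[i] for i in idx],
                              [m2[i] for i in idx], [y[i] for i in idx])
        draws[0].append(ie[0])
        draws[1].append(ie[1])
    tail = (1 - level) / 2
    cis = []
    for d in draws:
        d.sort()
        cis.append((d[int(tail * (n_boot - 1))], d[int((1 - tail) * (n_boot - 1))]))
    return cis[0], cis[1]

def simulate(n=300, seed=1):
    """Synthetic data only: X -> M1 -> Y, while M2 is unrelated to X and Y."""
    rng = random.Random(seed)
    x = [float(i % 2) for i in range(n)]           # hypothetical coding: 0 = questionnaire, 1 = AI
    m1 = [0.8 * xi + rng.gauss(0, 1) for xi in x]  # "accountability": true a1 = 0.8
    m2 = [rng.gauss(0, 1) for _ in x]              # "self-concordance": true a2 = 0
    y = [0.5 * a + rng.gauss(0, 1) for a in m1]    # true b1 = 0.5, b2 = 0
    return x, m1, m2, y

if __name__ == "__main__":
    x, m1, m2, y = simulate()
    (lo1, hi1), (lo2, hi2) = bootstrap_ci(x, m1, m2, y, n_boot=1000)
    print(f"indirect via M1: 95% CI [{lo1:.2f}, {hi1:.2f}]")
    print(f"indirect via M2: 95% CI [{lo2:.2f}, {hi2:.2f}]")
```

Resampling whole participants keeps each person's X, M1, M2, and Y together, so the bootstrap reflects the joint sampling variability of the a and b paths. The `n_boot=5000` default mirrors the 5,000 resamples described above; the demo uses fewer for speed. An indirect effect is read as "mediating" when its interval excludes zero, as with the reported [0.04, 0.31] for accountability.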

Implications for AI Economics

  • Evidence of scalable impact: a modern LLM-based chatbot produced measurable short-term improvements in goal progress relative to no support. The effect size (d ≈ 0.33) is comparable to typical behavioural goal-setting interventions and smaller than meta-analytic estimates for professional coaching (g ≈ 0.59), suggesting AI can deliver partial but meaningful gains at much lower marginal cost.
  • Mechanism matters for product/design economics:
    • The AI’s added value over structured written reflection appears driven primarily by social-accountability features (contingent conversational interaction, perceived follow-up), not by increased autonomous motivation (self-concordance).
    • For practitioners and firms, product features that amplify perceived accountability (e.g., interactive follow-ups, scheduling check-ins, explicit evaluator stance) may yield more behavioral impact than static reflection prompts alone.
  • Cost-effectiveness and substitution/complementarity:
    • Given negligible marginal delivery cost, AI coaching could be highly cost-effective for large-scale employee development, public employment services, or consumer wellbeing offerings—especially for low-intensity, short-horizon goals.
    • However, effect sizes are smaller than those reported for professional human coaching. Firms should consider hybrid models: AI for scalable baseline support and triage, with human coaches reserved for high-value or high-complexity cases (complementarity).
  • Labor-market and welfare considerations:
    • If sustained and scaled, AI-driven improvements in goal pursuit could influence human capital accumulation, career transitions, and productivity. Even small per-person gains aggregated across workforces could be economically meaningful.
    • The observation that the chatbot surfaced more non-career goals and produced more specific goals suggests LLM coaching could change the composition and specificity of worker aspirations—potentially affecting training uptake, upskilling choices, or propensity to pursue promotions/reskilling.
  • Evaluation and policy implications:
    • Economic deployment requires longer-run evidence linking AI-assisted goal-setting to objective labor-market outcomes (training completion, promotions, earnings), heterogeneous effects across worker types, and robustness across LLMs.
    • Regulators and organizations should weigh data privacy, confidentiality, informed consent, and potential misalignment risks when deploying LLM coaches at scale.
  • Research/evaluation priorities for economics-oriented follow-ups:
    • Randomized trials with longer horizons (months to years) measuring objective labor-market outcomes and retention.
    • Cost-benefit analyses comparing AI coaching, structured digital interventions, and human coaching (including ROI for employers).
    • Experiments decomposing accountability features (e.g., explicit follow-up scheduling, social reporting, identifiable coach persona) to quantify which design elements drive economic value.
    • Heterogeneity analyses by worker skill level, occupation, and baseline motivation to understand substitution vs complementarity with traditional training/coaching.
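One way to read d ≈ 0.33 next to the coaching benchmark g ≈ 0.59 is the common-language effect size (probability of superiority), CL = Φ(d/√2): the chance that a randomly chosen treated person scores higher than a randomly chosen comparison person. This sketch assumes normally distributed outcomes with equal variances and treats d and g as interchangeable; neither conversion appears in the paper.

```python
import math

def common_language_effect_size(d: float) -> float:
    """P(random treated score > random comparison score) = Phi(d / sqrt(2)).

    Since Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2)).
    Assumes normal outcomes with equal variances in both groups.
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))

# Effect sizes quoted in this summary (d and g treated as comparable here).
benchmarks = {
    "AI vs control (d)": 0.33,
    "AI vs questionnaire (d)": 0.08,
    "professional coaching meta-analysis (g)": 0.59,
}

for label, es in benchmarks.items():
    print(f"{label}: {es:.2f} -> CL ≈ {common_language_effect_size(es):.1%}")
```

Under these assumptions, d = 0.33 corresponds to roughly a 59% chance that an AI-arm participant reports more progress than a control participant, versus roughly 66% for g = 0.59 and about 52% for the non-significant AI-versus-questionnaire contrast (d = 0.08).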

Overall, the study provides experimental evidence that conversational LLMs can modestly improve short-run goal pursuit via perceived social accountability, supporting cautious optimism about economically scalable AI coaching—while underscoring the need for longer-run, outcome-oriented economic evaluations.

Assessment

Paper Type: rct

Evidence Strength: medium. Randomized design with a sizeable sample (517 randomised; 323 analysed) supports causal inference for the short-term effect, but the outcome is self-reported goal progress, the follow-up window is only two weeks, only one LLM/persona was tested, and demand/expectation effects or sample selection (an online Prolific sample) may limit confidence in broader causal claims.

Methods Rigor: medium. High-quality features include preregistration, randomization, an active control (matched written reflection), and mediation testing; however, reliance on self-report outcomes, brief follow-up, limited external validity (single model/persona; online sample of unclear representativeness), and potential unmeasured conversational/placebo effects moderate the rigor rating.

Sample: Preregistered randomized trial with N = 517 employed adults (UK/US, aged 18–50, recruited via Prolific) allocated to one of three arms (AI chatbot "Leon" powered by Claude Sonnet; matched structured written-reflection questionnaire; no-support control); 323 completed both sessions and attention checks. The primary outcome was self-reported goal progress at a two-week follow-up; mediators included perceived social accountability and self-concordance.

Themes: human_ai_collab, skills_training

Identification: Preregistered three-arm randomized controlled trial. Participants were randomly assigned to an AI career coach (Claude Sonnet "Leon"), a matched structured written-reflection questionnaire, or a no-support control; causal claims about the effect of the AI coach on short-term goal progress rest on random assignment. A mediation analysis tested whether perceived social accountability (and self-concordance) explained AI vs questionnaire differences.
Generalizability:
  • Short follow-up (2 weeks): unclear persistence of effects.
  • Outcome is self-reported goal progress rather than objective career or productivity outcomes.
  • Single LLM/persona tested: results may not generalize to other models or chatbot designs.
  • Online Prolific sample of employed UK/US adults aged 18–50; likely a WEIRD convenience sample.
  • Unclear which types of career goals were studied: possible domain-specific limits.
  • Limited to English-speaking UK/US participants; may not generalize to other regions, cultures, or languages.

Claims (8)

  • We conducted a preregistered three-arm randomized controlled trial (RCT) comparing an AI career coach ('Leon,' powered by Claude Sonnet), a matched structured written questionnaire, and a no-support control.
    • Category: Training Effectiveness · Direction: null_result · Confidence: high (0.6) · n = 517
    • Outcome: trial design / allocation and follow-up measurement of goal-related outcomes at two weeks
  • The AI chatbot produced significantly higher goal progress than the no-support control at two-week follow-up.
    • Category: Training Effectiveness · Direction: positive · Confidence: high (0.6) · n = 517
    • Outcome: goal progress (self-reported goal progress at two-week follow-up)
    • Details: d = 0.33, p = .016
  • Compared with the matched written-reflection questionnaire, the AI did not significantly improve overall goal progress.
    • Category: Training Effectiveness · Direction: null_result · Confidence: high (0.6) · n = 517
    • Outcome: goal progress (self-reported goal progress at two-week follow-up)
    • Details: non-significant vs written-reflection
  • The AI increased perceived social accountability relative to the written-reflection questionnaire.
    • Category: Worker Satisfaction · Direction: positive · Confidence: medium (0.36) · n = 517
    • Outcome: perceived social accountability (self-report)
  • In a preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]).
    • Category: Training Effectiveness · Direction: positive · Confidence: high (0.6) · n = 517
    • Outcome: goal progress (mediated by perceived social accountability)
    • Details: indirect effect = 0.15, 95% CI [0.04, 0.31]
  • Self-concordance did not mediate the AI-over-questionnaire effect on goal progress.
    • Category: Training Effectiveness · Direction: null_result · Confidence: high (0.6) · n = 517
    • Outcome: goal progress (mediator tested: self-concordance, self-report)
    • Details: no mediation detected
  • AI-assisted goal setting can improve short-term (two-week) goal progress.
    • Category: Training Effectiveness · Direction: positive · Confidence: medium (0.36) · n = 517
    • Outcome: short-term goal progress (self-reported at two weeks)
    • Details: short-term (two-week) improvement (see d = 0.33 vs control)
  • The clearest added value of AI over structured self-reflection lies in increasing felt accountability.
    • Category: Worker Satisfaction · Direction: positive · Confidence: medium (0.36) · n = 517
    • Outcome: perceived social accountability and resulting goal progress

Notes