AI-Assisted Goal Setting Improves Goal Progress Through Social Accountability

Helping people identify and pursue personally meaningful career goals at scale remains a key challenge in applied psychology. Career coaching can improve goal quality and attainment, but its cost and limited availability restrict access. Large language model (LLM)-based chatbots offer a scalable alternative, yet the psychological mechanisms by which they might support goal pursuit remain untested. Here we report a preregistered three-arm randomised controlled trial (N = 517) comparing an AI career coach ("Leon," powered by Claude Sonnet), a structured written questionnaire covering closely matched reflective topics, and a no-support control on goal progress at a two-week follow-up. The AI chatbot produced significantly higher goal progress than the control (d = 0.33, p = .016). Compared with the written-reflection condition, the AI did not significantly improve overall goal progress, but it increased perceived social accountability. In the preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]), whereas self-concordance did not. These findings suggest that AI-assisted goal setting can improve short-term goal progress, and that its clearest added value over structured self-reflection lies in increasing felt accountability.

Summary

Main Finding

An LLM-powered AI career coach ("Leon," Claude Sonnet) produced small but significant improvements in short-term goal progress versus no support (d = 0.33, p = .016) in a preregistered three-arm RCT (N_randomised = 517; N_analytic = 323). The AI did not significantly outperform a matched structured written-reflection questionnaire on overall progress, but it did increase perceived social accountability, and accountability statistically mediated the AI-versus-questionnaire effect on progress (indirect effect = 0.15, 95% CI [0.04, 0.31]). Self-concordance did not mediate effects.

Key Points

Design: preregistered, between-subjects RCT with three arms:
- G1: No-support control (enter goals directly).
- G2: Structured written questionnaire (matched reflective topics; enforced minimum engagement).
- G3: AI chatbot coaching session (Leon; interactive conversation across same thematic phases).
Sample and timeline:
- 517 employed adults (UK/US, age 18–50) randomized; analytic sample of 323 completed both sessions and attention checks.
- Two sessions ~14 days apart; primary outcome measured at 2-week follow-up.
Primary outcome:
- Goal progress (9 items, α = .86) at two weeks.
- AI vs Control: M_AI = 3.45 vs M_Control = 3.02; t = 2.42, p = .016, d = 0.33.
- Questionnaire vs Control: trend but non-significant (d = 0.24, p = .076).
- AI vs Questionnaire: no significant difference in overall progress (d = 0.08, p = .54).
Mediators:
- Perceived accountability (T1) was substantially higher in the AI condition than both questionnaire (d = 0.43, p = .002) and control (d = 0.68, p < .001).
- In the AI vs Questionnaire contrast, perceived accountability mediated the effect on progress (indirect = 0.15, 95% CI [0.04, 0.31]); self-concordance did not mediate.
- Across conditions, accountability predicted higher progress (b = 0.17, p = .004); self-concordance did not.
Exploratory findings:
- Goals produced via the AI were coded (LLM-assisted) as substantially more specific than in questionnaire or control conditions.
- Goal specificity had a modest bivariate association with progress (r = .14); exploratory mediation suggested specificity may contribute to AI benefits but was not a robust independent predictor in adjusted models.
- Both active arms (AI and questionnaire) were rated higher on perceived structured reflection than control; structured reflection appears to be a shared active ingredient for active vs no-support contrasts.
- Higher user satisfaction (NPS) for AI vs questionnaire and control.
Limitations noted by authors:
- Short follow-up (2 weeks) — unknown durability of effects.
- Attrition: analytic sample ≈62.5% of randomized.
- Active control matched on topics but not a single-ingredient manipulation; AI’s conversational features bundled multiple changes (interactivity, contingency, follow-up prompts).
- Preprint; not peer-reviewed.
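
As a quick consistency check on the figures above, the short sketch below recomputes the retention rate and backs out the pooled standard deviation implied by the AI-versus-control contrast. It assumes d was computed as the mean difference divided by a pooled SD, which the summary does not state explicitly, and it works from rounded values, so the result is only approximate.

```python
# Figures taken from the summary above.
n_randomised = 517
n_analytic = 323
mean_ai, mean_control = 3.45, 3.02
reported_d = 0.33

# Share of randomised participants retained in the analytic sample.
retention = n_analytic / n_randomised

# ASSUMPTION: d = (mean difference) / pooled SD, so the pooled SD can be
# recovered approximately from the rounded summary statistics.
implied_pooled_sd = (mean_ai - mean_control) / reported_d

print(f"retention = {retention:.1%}")
print(f"implied pooled SD = {implied_pooled_sd:.2f}")
```

Under that assumption, retention comes out at about 62.5% and the implied pooled SD at roughly 1.3 scale points.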

Data & Methods

Recruitment: Prolific; eligibility: UK/US, employed, 18–50, English fluency.
Randomisation: between-subjects to three arms; preregistered analysis plan available.
Interventions:
- Questionnaire: five open-ended prompts (career background, energizing activities, priorities, constraints, should-vs-want check) with enforced minimum engagement.
- AI chatbot: guided conversational coaching across the same four phases; mean session ~22 minutes, ~49 messages.
Measures:
- T1: accountability, self-concordance (per goal), commitment, manipulation checks (perceived interactivity, perceived structured reflection), NPS.
- T2 (~14 days): primary outcome goal progress (3 items per goal × 3 goals = 9 items), T2 versions of mediators and commitment, NPS.
- Post hoc: goal specificity coded via LLM-based rubric (time-frame and measurability).
Statistical analysis:
- Planned pairwise Welch t-tests for main contrasts.
- Two-mediator (accountability, self-concordance) parallel mediation with 5,000 bootstrap samples for indirect effects.
- ANCOVA robustness checks with demographic and AI-familiarity covariates.
- Exploratory clustered goal-level models for specificity analyses.
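
The analysis steps listed above can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the synthetic data generator, and the choice of a percentile bootstrap interval are all assumptions made for illustration, and the simulated effect sizes are arbitrary rather than fitted to the study.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Standardised mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    ss = (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(ss / (na + nb - 2))


def ols(rows, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan).

    Raises ZeroDivisionError if the design matrix is singular.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def indirect_effects(x, m1, m2, y):
    """a*b indirect effects for two parallel mediators of treatment x on y."""
    design = [[1.0, xi] for xi in x]
    a1 = ols(design, m1)[1]  # path X -> M1
    a2 = ols(design, m2)[1]  # path X -> M2
    # b paths come from Y regressed on X and both mediators together.
    _, _, b1, b2 = ols([[1.0, xi, u, v] for xi, u, v in zip(x, m1, m2)], y)
    return a1 * b1, a2 * b2


def bootstrap_ci(x, m1, m2, y, n_boot=5000, seed=0):
    """Percentile 95% CIs for both indirect effects from case resampling."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(indirect_effects([x[i] for i in idx], [m1[i] for i in idx],
                                      [m2[i] for i in idx], [y[i] for i in idx]))
    cis = []
    for j in range(2):
        s = sorted(d[j] for d in draws)
        cis.append((s[int(0.025 * n_boot)], s[int(0.975 * n_boot) - 1]))
    return cis


def simulate(n=400, seed=1):
    """Synthetic data (NOT the study's): x raises m1, and only m1 drives y."""
    rng = random.Random(seed)
    x = [float(i % 2) for i in range(n)]
    m1 = [0.8 * xi + rng.gauss(0, 1) for xi in x]  # accountability-like mediator
    m2 = [rng.gauss(0, 1) for _ in x]              # self-concordance-like mediator
    y = [0.5 * u + rng.gauss(0, 1) for u in m1]
    return x, m1, m2, y
```

Calling `bootstrap_ci(*simulate())` returns one (lower, upper) pair per mediator. In this synthetic setup the first interval would typically exclude zero and the second would typically include it, which mirrors the qualitative pattern reported in the study but not its numbers. The pure-Python bootstrap is slow at 5,000 resamples; a smaller `n_boot` is enough to see the idea.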

Implications for AI Economics

Evidence of scalable impact: a modern LLM-based chatbot produced measurable short-term improvements in goal progress relative to no support. The effect size (d ≈ 0.33) is comparable to typical behavioural goal-setting interventions and smaller than meta-analytic estimates for professional coaching (g ≈ 0.59), suggesting AI can deliver partial but meaningful gains at much lower marginal cost.
Mechanism matters for product/design economics:
- The AI’s added value over structured written reflection appears driven primarily by social-accountability features (contingent conversational interaction, perceived follow-up), not by increased autonomous motivation (self-concordance).
- For practitioners and firms, product features that amplify perceived accountability (e.g., interactive follow-ups, scheduling check-ins, explicit evaluator stance) may yield more behavioral impact than mere static reflection prompts.
Cost-effectiveness and substitution/complementarity:
- Given negligible marginal delivery cost, AI coaching could be highly cost-effective for large-scale employee development, public employment services, or consumer wellbeing offerings—especially for low-intensity, short-horizon goals.
- However, effect sizes are smaller than those reported for professional human coaching. Firms should consider hybrid models: AI for scalable baseline support and triage, with human coaches reserved for high-value or high-complexity cases (complementarity).
Labor-market and welfare considerations:
- If sustained and scaled, AI-driven improvements in goal pursuit could influence human capital accumulation, career transitions, and productivity. Even small per-person gains aggregated across workforces could be economically meaningful.
- The observation that the chatbot surfaced more non-career goals and produced more specific goals suggests LLM coaching could change the composition and specificity of worker aspirations—potentially affecting training uptake, upskilling choices, or propensity to pursue promotions/reskilling.
Evaluation and policy implications:
- Economic deployment requires longer-run evidence linking AI-assisted goal-setting to objective labor-market outcomes (training completion, promotions, earnings), heterogeneous effects across worker types, and robustness across LLMs.
- Regulators and organizations should weigh data privacy, confidentiality, informed consent, and potential misalignment risks when deploying LLM coaches at scale.
Research/evaluation priorities for economics-oriented follow-ups:
- Randomized trials with longer horizons (months to years) measuring objective labor-market outcomes and retention.
- Cost-benefit analyses comparing AI coaching, structured digital interventions, and human coaching (including ROI for employers).
- Experiments decomposing accountability features (e.g., explicit follow-up scheduling, social reporting, identifiable coach persona) to quantify which design elements drive economic value.
- Heterogeneity analyses by worker skill level, occupation, and baseline motivation to understand substitution vs complementarity with traditional training/coaching.

Overall, the study provides experimental evidence that conversational LLMs can modestly improve short-run goal pursuit via perceived social accountability, supporting cautious optimism about economically scalable AI coaching—while underscoring the need for longer-run, outcome-oriented economic evaluations.

Assessment

Paper Type: RCT
Evidence Strength: medium. Randomized design with a sizeable sample (N=517) supports causal inference for the short-term effect, but outcomes are self-reported goal progress at a two-week follow-up, the follow-up window is short, only one LLM/persona was tested, and demand/expectation effects or sample selection (likely an online convenience sample) may limit confidence in broader causal claims.
Methods Rigor: medium. High-quality features include preregistration, randomization, an active control (matched written reflection), and mediation testing; however, reliance on self-report outcomes, brief follow-up, limited external validity (single model/persona and unspecified sample representativeness), and potential unmeasured conversational/placebo effects moderate the rigor rating.
Sample: Preregistered randomized trial with N = 517 adult participants allocated to one of three arms (AI chatbot "Leon" powered by Claude Sonnet; matched structured written-reflection questionnaire; no-support control); primary outcome was self-reported goal progress measured at a two-week follow-up; mediators included perceived social accountability and self-concordance. (Demographic details and recruitment mode not provided in excerpt.)
Themes: human_ai_collab, skills_training
Identification: Preregistered three-arm randomized controlled trial: participants were randomly assigned to an AI career coach (Claude Sonnet "Leon"), a matched structured written-reflection questionnaire, or a no-support control; causal claims about the effect of the AI coach on short-term goal progress rest on random assignment. A mediation analysis tested whether perceived social accountability (and self-concordance) explained AI vs questionnaire differences.
Generalizability:
- Short follow-up (2 weeks); unclear persistence of effects.
- Outcome is self-reported goal progress rather than objective career or productivity outcomes.
- Single LLM/persona tested; results may not generalize to other models or chatbot designs.
- Sample likely an online convenience/WEIRD sample (demographics not specified).
- Unclear which types of career goals were studied; domain-specific limits.
- Cultural and language context not specified; may not generalize across regions.

Claims (8)

Claim	Category	Direction	Confidence	Outcome	Details
We conducted a preregistered three-arm randomized controlled trial (RCT) comparing an AI career coach ('Leon,' powered by Claude Sonnet), a matched structured written questionnaire, and a no-support control.	Training Effectiveness	null_result	high	trial design / allocation and follow-up measurement of goal-related outcomes at two weeks	n=517 0.6
The AI chatbot produced significantly higher goal progress than the no-support control at two-week follow-up.	Training Effectiveness	positive	high	goal progress (self-reported goal progress at two-week follow-up)	n=517 d = 0.33, p = .016 0.6
Compared with the matched written-reflection questionnaire, the AI did not significantly improve overall goal progress.	Training Effectiveness	null_result	high	goal progress (self-reported goal progress at two-week follow-up)	n=517 non-significant vs written-reflection 0.6
The AI increased perceived social accountability relative to the written-reflection questionnaire.	Worker Satisfaction	positive	medium	perceived social accountability (self-report)	n=517 0.36
In a preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]).	Training Effectiveness	positive	high	goal progress (mediated by perceived social accountability)	n=517 indirect effect = 0.15, 95% CI [0.04, 0.31] 0.6
Self-concordance did not mediate the AI-over-questionnaire effect on goal progress.	Training Effectiveness	null_result	high	goal progress (mediator tested: self-concordance, self-report)	n=517 no mediation detected 0.6
AI-assisted goal setting can improve short-term (two-week) goal progress.	Training Effectiveness	positive	medium	short-term goal progress (self-reported at two weeks)	n=517 short-term (two-week) improvement (see d = 0.33 vs control) 0.36
The clearest added value of AI over structured self-reflection lies in increasing felt accountability.	Worker Satisfaction	positive	medium	perceived social accountability and resulting goal progress	n=517 0.36

An AI career coach modestly boosts short-term goal progress compared with no support, but not versus structured written reflection; its chief added value appears to be raising users' sense of accountability, not deeper changes in goal alignment.