Evidence that AI helps women’s careers is thin and fragmentary: most studies audit bias or offer short-term skills support, while few measure long-term retention or advancement or report robust governance and accountability practices.
Abstract
Artificial intelligence (AI) is increasingly integrated into career guidance and organisational decision systems, yet empirical evidence on applications designed to support women’s career development remains limited. Following the PRISMA extension for scoping reviews (PRISMA-ScR) and a preregistered protocol, we searched seven databases (plus backward and forward citation searching) and synthesised 13 empirical studies published between 2018 and 2025. Using inductive thematic analysis, we identified three functional domains: (1) bias mitigation and representation (e.g. auditing gendered language and platform-level disparities), (2) skills development and empowerment (e.g. AI-supported learning and writing interventions) and (3) career pathways and retention (e.g. matching and attrition-risk modelling). The evidence base was concentrated in system-facing applications that detect or shape inequities within recruitment, evaluation and exposure systems; fewer studies evaluated individual-facing developmental support, and sustained career outcomes were rarely measured. Formal theory use was limited, with only a small minority of studies explicitly drawing on established frameworks; reporting on ethics, transparency and governance was inconsistent. We suggest that research prioritises longitudinal and theory-informed evaluations, including intersectionality-informed analyses, and assesses downstream impacts on women’s career trajectories alongside robust governance and accountability practices.
Summary
Main Finding
A scoping review of 13 empirical studies (2018–2025) finds AI applications aimed at supporting women’s career development cluster into three functional domains—(1) bias mitigation & representation, (2) skills development & empowerment, and (3) career pathways & retention. Evidence is concentrated on system-facing tools (e.g., auditing language, ad-delivery, screening, exposure models) with few sustained individual-level outcome evaluations. Formal theory, intersectional analyses and transparent ethics/governance reporting are limited, and long-term career impacts are rarely measured.
Key Points
- Evidence base: 13 empirical studies identified via a preregistered PRISMA-ScR search across seven databases (search period 2010–Mar 2025; synthesis focused on 2018–2025).
- Three functional domains:
  - Bias mitigation & representation: NLP and auditing tools that detect gendered language, platform-level exposure disparities, and biased visuals.
  - Skills development & empowerment: GenAI/NLP-based writing and learning interventions that improved short-term skills/confidence in small samples.
  - Career pathways & retention: ML/HR-analytics models for job matching, returnship alignment, and attrition risk forecasting.
- Modality and focus: Most studies are system-facing (detecting or shaping inequities in recruitment, evaluation, and exposure). Fewer studies evaluate individual-facing developmental supports; almost none track sustained labor-market outcomes (wages, promotions, long-run retention).
- Methods & rigor: Heterogeneous methods (field experiments, predictive modeling, qualitative narratives, quasi-experiments); limited use of formal theoretical frameworks (some use Systems Theory Framework and Social Cognitive Career Theory).
- Ethics & governance: Reporting on transparency, fairness metrics, accountability, and governance is inconsistent across studies.
- Geographic spread: Studies come from multiple regions (USA, Europe, India, and the Middle East, including Saudi Arabia and the UAE), and results are often context-specific (e.g., ad-delivery algorithms privileging men because of cost-optimization).
- Representative findings: algorithmic ad delivery prioritized men for STEM ads; NLP revealed persistent gendered descriptors in evaluations and letters of recommendation; ChatGPT-based training improved writing fluency/confidence among teachers.
Data & Methods
- Review design: PRISMA-ScR guided scoping review with preregistered protocol (OSF link provided in paper).
- Search & selection:
  - Databases searched: PubMed, Scopus, Web of Science, APA PsycInfo, APA PsycArticles, Psychology & Behavioral Sciences Collection, Google Scholar.
  - Initial de-duplication and screening: 702 unique records screened; 36 full-text assessed (24 from databases + 12 via snowballing); 13 studies included.
  - Two-reviewer blinded screening using Rayyan; conflicts adjudicated by a third reviewer when needed.
- Data extraction: standardized charting (authors, year, country, population, AI type, theoretical framework, career outcomes, ethical/technical challenges). Dual extraction with cross-checking on 15% of entries.
- Synthesis: Inductive thematic analysis (Braun & Clarke; Saldaña) to generate thematic domains and map studies to system-facing vs individual-facing, proximal vs sustained outcomes, theory use, and ethics reporting.
- AI methods observed in included studies: supervised ML (decision trees, logistic regression), NLP and text-mining, sentiment/emotion analysis, ML ad-delivery systems, GenAI (LLMs, image-based), KNN classifiers, HR analytics.
- Outcome measurement scope:
  - Proximal outcomes: detection of bias, short-term skill/confidence gains, model prediction accuracy (e.g., resume-to-role alignment).
  - Sustained outcomes: sparse. Some attrition-risk forecasts and retrospective cohort analyses, but limited causal or long-term follow-up.
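The selection counts reported above can be cross-checked with a few lines of arithmetic. The sketch below only derives totals that follow directly from the stated figures; the dictionary keys and the function name `summarise_flow` are illustrative and do not come from the paper.

```python
# Counts as reported in the search-and-selection summary above.
FLOW = {
    "records_screened": 702,
    "full_text_from_databases": 24,
    "full_text_from_snowballing": 12,
    "studies_included": 13,
}


def summarise_flow(flow):
    """Derive the totals implied by the reported PRISMA-ScR counts."""
    full_text = flow["full_text_from_databases"] + flow["full_text_from_snowballing"]
    return {
        "full_text_assessed": full_text,
        "excluded_at_full_text": full_text - flow["studies_included"],
        "full_text_inclusion_rate": flow["studies_included"] / full_text,
    }


summary = summarise_flow(FLOW)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The derived full-text total matches the 36 reported, and the remainder after inclusion gives the number of reports excluded at the full-text stage.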
Implications for AI Economics
- Distributional impacts and market efficiency:
  - AI can reduce frictions in access to guidance (scale, personalization) and potentially increase labor-market participation and human capital accumulation among women, improving allocative efficiency.
  - However, platform-level optimization objectives (e.g., profit or engagement maximization) can generate negative distributional externalities (unequal exposure to job ads or recruitment funnels) that exacerbate gender gaps rather than correct them.
- Labor supply, retention, and human-capital returns:
  - Tools that improve short-term skills/confidence may raise female labor supply or persistence in male-dominated fields, but absent long-run evidence, the effect on wages, promotions, and returns to training is unknown.
  - Attrition-risk models and targeted retention interventions have potential welfare gains if used to inform equitable HR policies; misuse (e.g., surveillance, punitive measures) could reduce worker welfare and increase turnover.
- Incentive design and objective functions:
  - Economists and platform designers should re-specify algorithmic objectives to internalize equity considerations (e.g., multi-objective optimization combining engagement/revenue with exposure parity or downstream diversity targets).
  - Regulatory or subsidy mechanisms can shift private incentives (e.g., require fairness constraints, mandate transparency, or subsidize equity-oriented features on recruitment platforms).
- Measurement and causal inference needs:
  - To quantify economic value and distributional effects, randomized controlled trials, quasi-experimental designs, and longitudinal administrative datasets are needed to estimate causal impacts on wages, promotions, labor-force participation, job-match quality, and retention.
  - Key metrics for economists: treatment effects on earnings, promotion hazard rates, job-match surplus, retention probabilities, return-on-investment of AI coaching, externalities on aggregate labor-market sorting, and cost-effectiveness compared to human-delivered interventions.
- General equilibrium and long-run dynamics:
  - Widespread adoption of AI-mediated career tools could shift occupational composition and network effects (e.g., who gets visibility), altering wage structures and returns to skills; dynamic/GE models should assess feedback loops (e.g., firms’ hiring behavior adjusting to AI-screened pools).
- Data, governance, and public policy:
  - Poor transparency and limited governance heighten risks of biased outcomes; policy tools (algorithmic audits, disclosure requirements, data-access regimes for independent evaluation) are necessary to ensure accountability and calibrate private incentives.
  - Intersectionality matters: economic impacts differ across race, class, geography; data collection and reporting should enable intersectional stratification to avoid masking heterogeneity in returns.
- Research & policy priorities for AI economics:
  - Fund longitudinal RCTs and quasi-experimental studies measuring wage and promotion outcomes.
  - Evaluate trade-offs between efficiency gains and equity (cost-benefit analyses incorporating distributional weights).
  - Develop and test incentive-aligned algorithmic objectives and platform-level constraints that balance profitability with exposure parity.
  - Create standardized reporting (transparency, fairness metrics, governance) to enable cross-study meta-analysis and policy evaluation.
  - Encourage public-private data partnerships with privacy safeguards to enable external auditing and robust causal research.
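As a rough illustration of the treatment-effect measurement called for in the list above, the sketch below computes a difference-in-means estimate with a normal-approximation confidence interval, as one might for a simple randomized trial. The outcome data are made up for illustration (1 = promoted within a hypothetical follow-up window), and the function name `difference_in_means` is an assumption, not something taken from the reviewed studies.

```python
import math
from statistics import mean, variance


def difference_in_means(treated, control):
    """Return the difference-in-means effect estimate and its standard error."""
    effect = mean(treated) - mean(control)
    # Unpooled standard error: sample variances divided by group sizes.
    std_error = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return effect, std_error


# Hypothetical, made-up outcomes: 1 = promoted during follow-up, 0 = not.
treated = [1, 1, 0, 1, 0, 1, 1, 0]  # offered the AI career tool
control = [0, 1, 0, 0, 1, 0, 0, 0]  # not offered the tool

effect, std_error = difference_in_means(treated, control)
# A 95% interval under a normal approximation; rough at samples this small.
low, high = effect - 1.96 * std_error, effect + 1.96 * std_error
print(f"effect = {effect:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With samples this small the interval is wide, which mirrors the review's point that short-term, small-sample evaluations cannot establish effects on promotions or wages.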
In short, AI applications have meaningful potential to lower barriers and scale career supports for women, but economic benefits depend on objective design, governance, and careful measurement of long-run labor-market outcomes; absent those, algorithmic systems risk perpetuating or amplifying existing gendered inequities.
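The role of objective design can be made concrete with a toy model. The sketch below is a minimal illustration, not a description of any real platform: it assumes a fixed ad budget, a hypothetical higher cost per impression for women than for men, and a simple grid search over budget shares. The function names and all numbers are assumptions. With no parity term the allocator maximizes reach and spends everything on the cheaper audience; adding a penalty on the exposure gap moves it toward parity.

```python
def best_allocation(budget, cost_men, cost_women, parity_weight, steps=100):
    """Grid-search the budget share spent on women that maximises
    total impressions minus a weighted penalty on the exposure gap."""
    best_score, best_plan = None, None
    for i in range(steps + 1):
        share_to_women = i / steps
        imp_women = budget * share_to_women / cost_women
        imp_men = budget * (1 - share_to_women) / cost_men
        # Multi-objective score: reach minus penalty on unequal exposure.
        score = (imp_women + imp_men) - parity_weight * abs(imp_women - imp_men)
        if best_score is None or score > best_score:
            best_score, best_plan = score, (imp_women, imp_men)
    return best_plan


def women_exposure_share(plan):
    """Fraction of all impressions that reach women."""
    imp_women, imp_men = plan
    return imp_women / (imp_women + imp_men)


# Hypothetical costs: impressions to women are 50% more expensive.
reach_only = best_allocation(300, cost_men=1.0, cost_women=1.5, parity_weight=0.0)
with_parity = best_allocation(300, cost_men=1.0, cost_women=1.5, parity_weight=0.5)

print("reach-only share to women:", women_exposure_share(reach_only))
print("parity-weighted share to women:", women_exposure_share(with_parity))
```

The parity-weighted plan delivers fewer total impressions than the reach-only plan, which is the efficiency-equity trade-off that cost-benefit analyses with distributional weights would need to price.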
Assessment
Claims (9)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Artificial intelligence (AI) is increasingly integrated into career guidance and organisational decision systems. [Adoption Rate] | positive | high | integration/adoption of AI into career guidance and organisational decision systems | 0.24 |
| Empirical evidence on applications designed to support women’s career development remains limited. [Research Productivity] | negative | high | availability/quantity of empirical evidence on AI for women's career development | n=13; 0.24 |
| We searched seven databases (plus backward and forward citation searching) and synthesised 13 empirical studies published between 2018 and 2025. [Research Productivity] | null_result | high | number of empirical studies identified and synthesized | n=13; 13 empirical studies; 0.4 |
| Using inductive thematic analysis, we identified three functional domains: (1) bias mitigation and representation, (2) skills development and empowerment and (3) career pathways and retention. [Innovation Output] | positive | high | categorisation of AI applications into functional domains | n=13; 0.24 |
| The evidence base was concentrated in system-facing applications that detect or shape inequities within recruitment, evaluation and exposure systems. [Adoption Rate] | neutral | high | focus of existing empirical studies (system-facing vs individual-facing applications) | n=13; 0.24 |
| Fewer studies evaluated individual-facing developmental support, and sustained career outcomes were rarely measured. [Employment] | negative | high | number of studies evaluating individual-facing developmental support and measurement of sustained career outcomes | n=13; 0.24 |
| Formal theory use was limited, with only a small minority of studies explicitly drawing on established frameworks. [Research Productivity] | negative | high | use of formal theoretical frameworks in studies | n=13; small minority; 0.24 |
| Reporting on ethics, transparency and governance was inconsistent. [Governance And Regulation] | negative | high | consistency of reporting on ethics, transparency and governance in the literature | n=13; 0.24 |
| Research should prioritise longitudinal and theory-informed evaluations, including intersectionality-informed analyses, and assess downstream impacts on women’s career trajectories alongside robust governance and accountability practices. [Governance And Regulation] | positive | high | recommended research priorities (longitudinal/theory-informed studies, intersectional analyses, governance/accountability assessments) | 0.04 |