Evidence (7953 claims)

Claim counts by topic (claims may carry multiple topic tags, so counts overlap):

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
The paper constructs a multidimensional digitalization index composed of digital infrastructure, digital service capacity, and the digital development environment.
Index construction described in data/methods: composite indicator combining measures of connectivity/broadband (infrastructure), e-commerce/digital finance (service capacity), and policy/institutional/human capital indicators (development environment).
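The three-dimension construction described above can be sketched as a composite indicator. This is a minimal illustration, assuming min-max normalization and equal weights within and across dimensions; the paper may instead use entropy or PCA weighting, and the subindicator values below are hypothetical.

```python
# Sketch of a composite digitalization index: min-max normalize each
# subindicator across regions, average within each of the three
# dimensions, then average the dimension scores. Equal weights are an
# illustrative assumption, not the paper's stated weighting scheme.

def minmax(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def composite_index(regions):
    """regions: dict name -> {dimension: [raw subindicator values]}."""
    names = list(regions)
    dims = list(next(iter(regions.values())))
    scores = {n: [] for n in names}
    for d in dims:
        n_sub = len(regions[names[0]][d])
        dim_scores = {n: [] for n in names}
        for j in range(n_sub):
            # Normalize the j-th subindicator of dimension d across regions
            col = minmax([regions[n][d][j] for n in names])
            for n, v in zip(names, col):
                dim_scores[n].append(v)
        for n in names:
            scores[n].append(sum(dim_scores[n]) / n_sub)
    return {n: sum(s) / len(dims) for n, s in scores.items()}

regions = {
    "A": {"infrastructure": [80, 0.9], "service": [120], "environment": [3]},
    "B": {"infrastructure": [40, 0.5], "service": [60],  "environment": [5]},
}
idx = composite_index(regions)
```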
The study is observational (panel) and subject to limitations: residual confounding is possible; two-way fixed-effects estimators can be biased with heterogeneous treatment timing or dynamics; external validity beyond China and non-grain crops is not established.
Authors' stated limitations and caveats in the paper regarding identification and generalizability of results from the CLDS 2014–2018 observational panel.
The study uses two-way fixed-effects (household and year) models as the primary identification strategy and employs propensity score matching (PSM) as a robustness check.
Methods section of the paper describing estimation strategy applied to the CLDS 2014–2018 panel of grain-producing households.
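The primary strategy can be illustrated with a toy estimator. On a balanced panel, subtracting household means and year means and adding back the grand mean removes both sets of fixed effects exactly, and OLS on the transformed data recovers the slope. The data below are synthetic, and the PSM robustness check is not sketched here.

```python
# Minimal two-way fixed-effects (household + year) slope estimator for a
# balanced panel via double demeaning. Illustrative sketch only; applied
# work would use a panel package with clustered standard errors.

def twfe_slope(panel):
    """panel: list of (household, year, x, y) tuples, balanced."""
    def mean(vals):
        return sum(vals) / len(vals)

    households = {p[0] for p in panel}
    years = {p[1] for p in panel}
    gx = mean([p[2] for p in panel])
    gy = mean([p[3] for p in panel])
    hx = {h: mean([p[2] for p in panel if p[0] == h]) for h in households}
    hy = {h: mean([p[3] for p in panel if p[0] == h]) for h in households}
    tx = {t: mean([p[2] for p in panel if p[1] == t]) for t in years}
    ty = {t: mean([p[3] for p in panel if p[1] == t]) for t in years}
    num = den = 0.0
    for h, t, x, y in panel:
        x_t = x - hx[h] - tx[t] + gx  # within-transformed regressor
        y_t = y - hy[h] - ty[t] + gy  # within-transformed outcome
        num += x_t * y_t
        den += x_t * x_t
    return num / den

# Synthetic balanced panel: y = 2*x + household effect + year effect
panel = [(h, t, (h + t) % 3, 2 * ((h + t) % 3) + 10 * h + 3 * (t - 2014))
         for h in (1, 2, 3) for t in (2014, 2016, 2018)]
beta = twfe_slope(panel)  # recovers the true slope of 2
```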
The regional average minimum cost of salaried labor (MCSL) was 43.1% of GDP per worker in 2023.
Computed for the same 19-country sample (baseline 2023) using country statutory employer obligations and reporting MCSL relative to GDP per worker following the updated IDB approach.
The regional average non-wage cost of salaried labor (NWC) in Latin America and the Caribbean was 51.1% of formal wages in 2023.
Calculated for a sample of 19 Latin American and Caribbean countries for baseline year 2023 by compiling country-specific statutory employer obligations (payroll taxes, social contributions, mandated benefits, severance, etc.) and expressing employer non-wage costs relative to formal wages using the updated IDB methodology.
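The arithmetic behind the two indicators can be sketched with hypothetical numbers (not the report's country data). NWC expresses employer non-wage costs as a share of the formal wage; MCSL expresses the full statutory cost of a minimum-wage worker relative to GDP per worker. Whether the IDB methodology applies the NWC share to the minimum wage in exactly this way is an assumption of this sketch.

```python
# Illustrative labor-cost indicator arithmetic; all figures are made up.
formal_wage = 1000.0            # monthly formal wage, illustrative units
payroll_taxes = 180.0           # statutory employer obligations (hypothetical)
social_contributions = 220.0
mandated_benefits = 111.0       # bonuses, vacation, severance accrual, etc.

non_wage_costs = payroll_taxes + social_contributions + mandated_benefits
nwc_share = non_wage_costs / formal_wage         # NWC as share of formal wage

minimum_wage = 700.0
gdp_per_worker = 2600.0                          # same period, same units
mcsl = minimum_wage * (1 + nwc_share)            # wage plus statutory add-ons
mcsl_share = mcsl / gdp_per_worker               # MCSL relative to GDP/worker
```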
Attributing productivity changes specifically to AI requires causal identification beyond VIS accounting (e.g., experiments, instrumental variables, difference-in-differences).
Paper notes that VIS is an accounting framework and that causal attribution to AI requires econometric/experimental methods beyond input–output accounting.
The method uses BEA for industry output and industry-by-industry transactions, BLS for employment and hours worked, and IMPLAN for detailed input–output structure and sector mapping; coverage period is 2014–2023.
Explicit data sources and time coverage stated: public BEA, BLS, and IMPLAN annual data 2014–2023 used to construct input–output matrices and labor measures.
Limitations of the review include the small sample of studies, uneven geographic coverage, heterogeneity in methods across studies, and limited long‑run evidence (especially on generative AI), which complicate causal aggregation.
Author-reported limitations based on the meta-assessment of the 17 included studies (variation in methods, contexts, and time horizons).
Design of this work: a systematic literature review and meta‑synthesis of empirical findings from peer‑reviewed journals (2020–2025), based on 17 publications.
Stated methods and inclusion criteria of the paper: systematic review of peer‑reviewed literature (sample = 17).
Long-term evidence on generative AI’s structural labor‑market effects is scarce; few longitudinal studies exist.
Assessment of study horizons and methods among the 17 papers indicates limited long-run and longitudinal analyses specifically on generative AI impacts.
Empirical coverage is limited for low‑income countries; evidence from such settings is scarce.
Geographic distribution of the 17 reviewed studies shows concentration in advanced economies with few or no studies focused on low-income countries.
The literature shows a surge in research activity on AI and labor markets in 2023–2025 and a concentration of studies in advanced economies.
Meta-analytic summary of the publication years and geographic focus among the 17 selected publications (temporal and geographic count of included studies).
Results depend on accurate skill extraction from vacancy texts and valid measures of occupational exposure/complementarity; causal interpretation of diffusion effects may be limited by endogeneity (e.g., technology adoption responding to labor-market conditions).
Authors' stated methodological limitations: reliance on text-analysis identification of skills and on constructed measures of exposure/complementarity; acknowledgement of endogeneity concerns limiting causal claims.
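The text-analysis step the authors flag can be illustrated with a toy dictionary-based extractor: skills are found by matching lexicon phrases in vacancy text. The lexicon, tags, and matching rule below are illustrative, not the paper's pipeline, and the brittleness of such matching is exactly the limitation being acknowledged.

```python
import re

# Hypothetical skill lexicon mapping phrases to coarse skill tags.
SKILL_LEXICON = {
    "machine learning": "AI",
    "prompt engineering": "AI",
    "data analysis": "analytical",
    "negotiation": "social",
}

def extract_skills(vacancy_text):
    """Return lexicon skills found as whole phrases in the vacancy text."""
    text = vacancy_text.lower()
    return {skill: tag for skill, tag in SKILL_LEXICON.items()
            if re.search(r"\b" + re.escape(skill) + r"\b", text)}

ad = "Seeking analyst with data analysis and machine learning experience."
found = extract_skills(ad)
```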
The paper proposes two conceptual models (AI/ML‑Driven Labor Market Transformation Model and Sectoral Impact and Resilience Model) to organize heterogeneous findings and generate testable hypotheses about how AI reshapes labor across sectors and skill levels.
Conceptual synthesis integrating Technological Determinism, Socio‑Technical Systems Theory (STS), and Skill‑Biased Technological Change (SBTC); the models are theoretical outputs of the review used to map mechanisms and heterogeneity rather than empirical findings.
There are substantial measurement and identification gaps in the literature: heterogeneity in measuring 'AI adoption', limited long‑run causal evidence, and geographic bias toward advanced economies.
Methodological assessment within the review noting variability across studies in AI measures (patents, investment, task exposure proxies), paucity of long‑run causal designs, and concentration of empirical studies in advanced economies; this is a meta‑evidence limitation statement.
The Iceberg Index indicates where capability exists but does not indicate whether or when job losses will occur.
Explicit caution in the paper noting the distinction between technical exposure (capability overlap) and realized labor-market outcomes; methodological limitation described.
The Iceberg Index captures capability overlap but does not capture firm adoption choices, regulatory constraints, social acceptance, complementarity effects, or worker reallocation dynamics.
Limitations section in the paper explicitly listing these omitted factors; methodological boundaries of the Iceberg Index stated.
Model and simulations are implemented with the AgentTorch framework.
Implementation note in the paper indicating AgentTorch was used to build the agent-based models and run simulations.
The simulation model represents 151 million U.S. workers as autonomous agents, covers 32,000+ distinct skills, links agents to thousands of AI tools, and provides county-level resolution (~3,000 U.S. counties).
Model specification described in the paper: large-population agent-based model (AgentTorch) parameterized with occupation, skills portfolios, wages, and county locations; counts provided in the paper.
The Iceberg Index is a skills-centered metric that measures the wage value of specific skills AI systems can perform within each occupation; it quantifies technical exposure (capability overlap), not displacement, adoption timelines, or realized outcomes.
Methodological definition: mapping of ~32,000 skills to occupations with wage-value contributions, summing wages of skills that current AI capabilities cover to compute the index.
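The index logic as defined can be sketched directly: for each occupation, sum the wage value of skills current AI can perform, relative to the occupation's total wage value. The skill lists, wage-value shares, and AI coverage set below are hypothetical stand-ins for the paper's ~32,000-skill mapping.

```python
# Hypothetical AI capability set and occupation skill portfolios with
# wage-value shares; illustrative only, not the paper's data.
ai_capable = {"summarize text", "draft code", "classify documents"}

occupations = {
    "paralegal": {"summarize text": 0.35, "client interaction": 0.40,
                  "classify documents": 0.25},
    "plumber":   {"pipe fitting": 0.7, "client interaction": 0.3},
}

def iceberg_index(skill_wage_shares):
    """Share of wage value in AI-coverable skills (technical exposure,
    not displacement or adoption)."""
    total = sum(skill_wage_shares.values())
    covered = sum(v for s, v in skill_wage_shares.items() if s in ai_capable)
    return covered / total

exposure = {occ: iceberg_index(s) for occ, s in occupations.items()}
```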
The study maps employment channels for AI-competent graduates and documents the most frequent job titles/roles and associated wage levels.
Descriptive analysis of employer channels, occupational role frequencies, and wage data compiled in the monitoring dataset covering graduates and alternative-route entrants.
Quasi-experimental designs (difference-in-differences, instrumental variables, event studies) and panel regressions are useful methods for identifying causal effects of AI adoption where plausibly exogenous variation exists.
Methodological summary in the paper listing common empirical strategies used in the literature to estimate causal impacts of technology adoption.
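The simplest of these designs, a 2x2 difference-in-differences, compares the pre/post change for adopters against the pre/post change for non-adopters, netting out common time trends. The numbers below are illustrative, not from the paper.

```python
# Minimal 2x2 difference-in-differences estimator on group means.
def did_estimate(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    def mean(v):
        return sum(v) / len(v)
    # (change among adopters) minus (change among non-adopters)
    return ((mean(y_treat_post) - mean(y_treat_pre))
            - (mean(y_ctrl_post) - mean(y_ctrl_pre)))

# Hypothetical firm productivity before/after AI adoption
adopters_pre, adopters_post = [100, 102, 98], [112, 115, 110]
controls_pre, controls_post = [99, 101, 100], [103, 105, 104]
effect = did_estimate(adopters_pre, adopters_post, controls_pre, controls_post)
```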
Current research is limited by measurement challenges in capturing AI capabilities and firm-level adoption, and by a lack of longitudinal worker-firm data and causal identification in many settings.
Explicit limitations noted by the paper: gaps in task measures, scarce longitudinal linked datasets, and methodological challenges in causal inference.
This paper's approach is qualitative and based on secondary literature synthesis; it does not collect primary survey, experimental, or administrative data.
Explicit statement in the Data & Methods section of the paper.
Key empirical gaps remain: better measurement of K_T (AI/software capital), more granular matched employer‑employee and wealth data, and improved estimates of task-substitution elasticities are required to precisely quantify incidence and policy impacts.
Authors’ stated research agenda and limitations section, including sensitivity analyses showing outcome variation with parameter choices and measurement uncertainty.
Simulated teachers: for each LLM, we ran multiple independent runs treated as simulated teachers (typically around 30–40 per model in the Baseline Experiment and around 20–30 per condition × training-group cell in the Scaffolding Intervention Experiment); the conversation context was reset between teachers and preserved across trials within a teacher; LLMs were not given feedback about the outcome of their teaching to prevent learning during the task.
Methods section 'Simulated teachers' and prompt/instruction descriptions in Methods; sampling details provided in Methods 2.3.
Models are prompted to assess profiles along dimensions of social acceptance, marital stability, and cultural compatibility.
Experimental procedure: prompts asked models to rate profiles on the three named dimensions.
We evaluate five LLM families (GPT, Gemini, Llama, Qwen, and BharatGPT).
Methods: models enumerated as the LLM families evaluated in the audit.
We vary caste identity across Brahmin, Kshatriya, Vaishya, Shudra, and Dalit, and income across five buckets.
Experimental design described: caste identity explicitly manipulated across five named caste categories; income varied across five buckets.
We conduct a controlled audit of caste bias in LLM-mediated matchmaking evaluations using real-world matrimonial profiles.
Described methodology in the paper: a controlled audit using real-world matrimonial profiles to probe LLMs for caste bias.
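The audit's factorial manipulation can be sketched as crossing caste identity with income bucket while holding the rest of a profile fixed. The prompt template, base profile, and income bucket labels below are illustrative assumptions; only the five caste categories and the three rating dimensions come from the study description.

```python
from itertools import product

CASTES = ["Brahmin", "Kshatriya", "Vaishya", "Shudra", "Dalit"]
INCOME_BUCKETS = ["<5 LPA", "5-10 LPA", "10-20 LPA", "20-40 LPA", ">40 LPA"]

# Hypothetical prompt template; the fixed profile details are made up.
TEMPLATE = ("Profile: 29-year-old software engineer, caste: {caste}, "
            "annual income: {income}. Rate this match on social acceptance, "
            "marital stability, and cultural compatibility (1-10 each).")

prompts = [TEMPLATE.format(caste=c, income=i)
           for c, i in product(CASTES, INCOME_BUCKETS)]
# 5 castes x 5 income buckets = 25 variants per base profile
```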
Policy design to align high-tech industrial development with carbon-reduction goals should account for industrial life-cycle stages and value-chain positions.
Policy implication drawn from the empirical findings (inverted U-shape, stage-dependent mechanism, regional heterogeneity, and subsector differences) in the paper.
The modern labor market needs specialists in the teaching professions.
Interpretation by the authors based on the counts of current vacancies and much larger pool of potential positions from the Unified Register of Vacancies and Ministry of Education data.
Employers mostly give preference to teachers who are ready to work in institutions of professional (vocational and technical) and specialized pre-higher education, as well as in private training centers.
Analysis of vacancy postings and/or comparison of vacancy counts across institution types using the Unified Register of Vacancies and Ministry of Education data (paper reports this pattern of employer preference).
The number of potential teacher positions (not yet vacant but likely to become so) is much larger, at over 140,000.
Estimates of potential teacher positions taken from data of the Ministry of Education and Science of Ukraine (administrative data reported in the paper).
The total number of current vacancies for teachers in the specified educational institutions is over 1,000.
Count of vacant teacher positions from the Unified Register of Vacancies of the State Employment Service (administrative register analysis reported in the paper).
Repositioning informal systems as co-creators in urban governance (relational public administration) enables transformative governance and effective localization of SDGs in sustainable cities in South Africa.
Conceptual/analytical argumentation (theoretical paper; no empirical sample reported).
Determinants that significantly increase the likelihood of participation in small-scale livestock production in Malawi include household size, access to credit, access to extension services, landholding size, distance to the market, and location in the Northern region.
Cross-sectional analysis of IHS5 (sample = 8,795 households); determinants identified as significant in the analysis.
Households engaged in small-scale livestock production in Malawi earned, on average, an additional MWK 36,405.76 compared to non-producing households.
Cross-sectional analysis of the Fifth Integrated Household Survey (IHS5) with a sample of 8,795 households.
Individuals in Thohoyandou used traditional healing practices (e.g., steam inhalation with stones and salt; herbal concoctions including various named plants and mixtures) to survive COVID-19 without hospitalization, underscoring the significance of traditional healing practices during the pandemic.
Narrative inquiry based on in-depth interviews with three respondents (sample size = 3).
Teacher unions function as a counter-hegemonic force challenging neoliberal geopolitics and political norms and are repositioning as intellectual activists rather than compliant officials.
Qualitative interpretivist analysis of narrative interviews with unionized educators and public union discussions (no sample size reported).
Digitalization significantly enhances market access and supplier diversity for SMMEs.
Qualitative secondary data thematic analysis (literature/reports/industry initiatives; no sample size reported).
Indigenous Knowledge Systems (IKS) represent a dynamic body of wisdom encompassing sustainable agriculture, natural resource management, and community resilience, and offer proven, contextually grounded solutions to modern challenges like climate change and food insecurity.
Qualitative desktop research synthesizing existing literature (literature review; no sample size reported).
Overall, most LLMs achieve high Teaching Scores and are best fit by the Bayes Optimal Teacher, suggesting model-based (mentalizing) teaching strategies rather than model-free heuristics.
Synthesis of Baseline Experiment performance (high Teaching Scores, BOT BIC fits) and cognitive model comparisons shown in Figures 2–4.
In scaffolding conditions LLMs reliably executed the auxiliary selection step: under Reward Scaffolding they preferentially selected edges ranked highest by reward, and under Inference Scaffolding they preferentially selected edges ranked as more likely unknown to the learner.
Scaffolding Intervention Experiment; auxiliary-step selection probabilities plotted in Figure 5 showing edge-rank dependent selection patterns for each LLM and humans.
Bayes Optimal Teacher (BOT) provides the best overall account (via BIC) for the trial-by-trial teaching choices of most LLM models.
Cognitive model fitting and BIC model comparison applied to each simulated teacher's trial-by-trial choices; results shown in Figure 4 (BIC scores and fraction of simulated teachers best fit by each model).
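The comparison criterion is standard: BIC = k ln(n) - 2 ln(L), with the lowest BIC preferred. The log-likelihoods and parameter counts below are made up for illustration; only the comparison logic mirrors the paper's procedure.

```python
import math

def bic(log_likelihood, n_params, n_trials):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_trials) - 2 * log_likelihood

n_trials = 40  # hypothetical number of teaching trials for one teacher
candidates = {
    "Bayes Optimal Teacher": bic(log_likelihood=-45.0, n_params=2,
                                 n_trials=n_trials),
    "model-free heuristic":  bic(log_likelihood=-52.0, n_params=1,
                                 n_trials=n_trials),
}
best = min(candidates, key=candidates.get)  # lowest BIC wins
```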
Most LLM teachers were concentrated in the higher-performing range, close to the Bayes Optimal Teacher benchmark, overlapping with higher-performing human subjects; some models (notably GPT-o4-mini, Gemini 2.5 Flash, Claude Sonnet 4.5) showed more variability.
Distribution of individual-level average Teaching Scores in Baseline Experiment (Figure 3) comparing LLMs to humans and to cognitive-model benchmark scores.
Most LLMs showed strong alignment with humans in graph-by-graph performance: seven models had large positive Pearson correlations with human performance (r ≈ 0.76–0.89, all p < 10^-4), and two additional models showed moderate correlations (r ≈ 0.46–0.56, p < .05); GPT-3.5 and Llama-4 Maverick were not significantly correlated with humans.
Baseline Experiment; graph-wise mean Teaching Score computed for 20 unique graphs; Pearson correlations between each model's graph-wise profile and human profile reported in Figure 2 with r and p-values.
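The alignment check reduces to a Pearson correlation between two 20-element profiles: a model's mean Teaching Score per graph and the human mean per graph. The per-graph scores below are synthetic, constructed so the "model" tracks the "humans" with small noise.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)
human = [random.uniform(0.4, 0.9) for _ in range(20)]   # 20 graphs
model = [h + random.gauss(0, 0.05) for h in human]      # well-aligned model
r = pearson_r(model, human)
```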
Prompts can be treated as decision policies that allocate discretion between researcher and system, governing what is executed and when iteration stops.
Methodological framing advanced by the authors describing prompts as decision policies; conceptual claim based on the paper's analytic framework rather than empirical measurement.
Operational-constraint and decision-rule prompts deliver large and stable footprint reductions while preserving decision-equivalent topic outputs.
Experimental comparisons of prompt strategies in the benchmarked workflow showing reductions in runtime/CO2e and evaluated topic outputs' decision-equivalence (asserted in abstract; no numeric reductions or sample sizes provided).
We benchmark a modern economic survey workflow (an LDA-based literature mapping implemented with GenAI-assisted coding and executed in a fixed cloud notebook), measuring runtime and estimated CO2e with CodeCarbon.
Experimental benchmark described in the paper: single implemented workflow (LDA-based literature mapping) executed in a fixed cloud notebook with runtime and CO2e measured using CodeCarbon (methodological claim).
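What a tracker like CodeCarbon estimates can be approximated with back-of-envelope arithmetic: energy equals power draw times runtime, and emissions equal energy times grid carbon intensity. The power and intensity figures below are illustrative defaults, not values from the paper or from CodeCarbon itself.

```python
# Back-of-envelope CO2e estimate for a compute job; all inputs made up.
def co2e_kg(runtime_s, avg_power_w, grid_intensity_kg_per_kwh):
    """Emissions = (power in kW) x (runtime in hours) x grid intensity."""
    energy_kwh = (avg_power_w / 1000.0) * (runtime_s / 3600.0)
    return energy_kwh * grid_intensity_kg_per_kwh

# e.g. a 30-minute LDA run on a ~120 W cloud VM at 0.4 kgCO2e/kWh
emissions = co2e_kg(runtime_s=1800, avg_power_w=120,
                    grid_intensity_kg_per_kwh=0.4)
```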