Evidence (14055 claims)
Adoption (8570 claims)
Productivity (7631 claims)
Governance (6869 claims)
Human-AI Collaboration (6491 claims)
Org Design (4175 claims)
Innovation (4114 claims)
Labor Markets (3566 claims)
Skills & Training (2966 claims)
Inequality (2066 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
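Raw direction counts are easier to compare across outcomes as shares. The Python sketch below is a minimal illustration. It uses four rows copied from the matrix and takes the reported Total as the denominator. For several rows (for example Other and Governance & Regulation), Total exceeds the sum of the four direction columns, so the shares need not add to 100%.

```python
# Direction counts (Positive, Negative, Mixed, Null, Total) copied from the
# Evidence Matrix. Total is used as reported; for some rows it exceeds the
# sum of the four direction columns.
ROWS = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "AI Safety & Ethics": (218, 277, 65, 33, 599),
    "Inequality Measures": (44, 122, 49, 6, 221),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def direction_shares(positive, negative, mixed, null, total):
    """Share of the reported total falling in each direction category."""
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
    }


for outcome, counts in ROWS.items():
    shares = direction_shares(*counts)
    print(f"{outcome}: {shares['positive']:.1%} positive, {shares['negative']:.1%} negative")
```

On these rows, Firm Productivity findings are mostly positive. Job Displacement and Inequality Measures lean negative.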
Two years after they contributed preferences to the PRISM dataset, we re-recruited 530 participants from 52 countries to evaluate personalised and non-personalised language models in blinded multi-turn conversations (a large-scale within-subject experiment).
Study methodology reported in paper: within-subject experiment, re-recruitment of 530 participants from 52 countries, blinded multi-turn conversations comparing models.
Personalisation is a standard feature of conversational AI systems used by millions; yet, the efficacy of personalisation methods is often evaluated in academic research using simulated users rather than real people.
Contextual claim from the authors' introduction, based on their reading of the literature and field observation; no experiment is reported for this claim.
The study includes Natural Language Processing (NLP) analysis of 5 million consumer contacts.
Methodological statement in the paper specifying the NLP data volume.
The study includes surveys of 800 marketers.
Methodological statement in the paper specifying the survey sample size.
The study includes AI adoption audits from 120 organizations.
Methodological statement in the paper specifying the audits sample size.
LLM-generated solutions contain roughly the same number of ideas as participant-generated solutions.
Comparative analysis of idea counts within solutions reported in the paper; phrased as 'roughly the same number of ideas' (no numeric effect size provided in the abstract).
The findings are consolidated via the AI Engineering Integration Framework and the Skills Transition Risk Matrix, which provide guidelines for strategically harnessing AI while safeguarding the Engineering profession.
Paper reports development of two conceptual/practical tools (framework and matrix) as outputs of the study; no validation details provided in abstract.
Case studies were performed covering five major industries.
Paper's reported methodology (the number of industries covered is stated in the abstract).
A Delphi study was conducted with 40 global experts.
Paper's reported methodology (Delphi sample explicitly stated in abstract).
A comprehensive mixed-methods study was conducted, incorporating a survey of 320 organizations.
Paper's reported methodology (survey sample explicitly stated in abstract).
AwareLLM was evaluated in a user study with 20 participants, compared to a standard LLM assistant across multiple tasks.
Experimental methods statement in paper; explicitly reports a user study and sample size.
Using an agent-based simulation of a multi-SKU convenience store environment, the study evaluates deployment efficiency, inventory responsiveness, and managerial cognitive reallocation.
Methodological claim: the paper reports an agent-based simulation experiment in a multi-SKU convenience store context; details such as number of simulations, parameter settings, or statistical results are not provided in the excerpt.
The dominant paradigm for AI agents is an "on-the-fly" loop in which agents synthesize plans and execute actions within seconds or minutes in response to user prompts.
Statement in paper presenting a characterization of current AI agent design; conceptual/observational claim with no empirical data or sample reported.
Persistent data gaps—especially concerning worker-level outcomes, informal labor, and non-Anglophone markets—warrant urgent research investment.
Authors' assessment based on scope of included studies and acknowledged limitations in observation windows and geographic/labor-form coverage.
Following PRISMA 2020 guidelines, we systematically searched six academic databases (Scopus, Web of Science, EconLit, SSRN, IEEE Xplore, Google Scholar) for empirical studies documenting observed—not predicted—labor market changes since 2020; from 1,847 initial records, 94 studies meeting inclusion criteria were retained for qualitative synthesis and 42 for quantitative data extraction.
Methods: systematic literature search following PRISMA 2020 across six named databases; initial records = 1,847; retained = 94 for qualitative synthesis, 42 for quantitative extraction.
We used Atlas.ti to thematically analyse twelve semi-structured interviews with SME owners and managers, conducted in early 2025; the analysis yielded 19 codes grouped into six categories.
Methods statement in the paper describing qualitative sample and analysis procedures.
We examine the interplay between AI adoption, social capital formation, workforce dynamics, and sustainable development in Eastern Macedonia and Thrace (EMT), one of the EU's least developed regions.
Study context and scope as stated in the paper; empirical work conducted in EMT.
Research has concentrated on advanced urban economies, leaving underexplored the implications of AI for peripheral small and medium-sized enterprises (SMEs) that operate with weak human capital, thin digital infrastructure, and constrained social capital.
Statement in the paper contrasting existing research focus (advanced urban economies) with a lack of attention to peripheral SMEs; no empirical sample size for this bibliographic claim reported in the excerpt.
Once functional deployment and operational investment are controlled for, worker-task use is not associated with employment declines.
Multivariate regression results reported in the paper using BTOS AI supplement data showing the coefficient on worker-task use becomes statistically indistinguishable from zero after controlling for functional deployment and operational investment; exact model details and sample size not provided in excerpt.
This study conducts an empirical analysis using data on industrial robots from the International Federation of Robotics (IFR) and panel data from 14 sub-sectors of China's manufacturing industry.
Statement in paper describing data and methods: use of IFR robot data combined with panel data covering 14 manufacturing sub-sectors (panel regression framework implied).
The location of the Pareto frontier depends only on population characteristics, utility functions and the fairness score, but not on the technical design of the algorithm — the findings hold for pre-processing, in-processing, and post-processing approaches alike.
Theoretical proof/argument demonstrating that the Pareto frontier characterization is a function of distributions, utilities and fairness metric, independent of algorithmic implementation approach (pre-, in-, post-processing).
Under the Brier score specifically, with type-independent inflation cost, the second-best welfare equals the first-best welfare (welfare equivalence).
Analytical result/proof specialized to the Brier score and the assumption of type-independent inflation costs; comparative welfare analysis in the model.
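For binary events, the Brier score is the mean squared difference between the forecast probability and the 0/1 outcome. It is also strictly proper: a forecaster minimizes the expected score by reporting their true belief, so shading a report away from it, for example by inflating it, can only raise the expected score. The Python sketch below illustrates both facts. It does not reproduce the paper's inflation-cost model or its welfare comparison.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally long, non-empty sequences")
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


def expected_brier(report, belief):
    """Expected Brier score of reporting `report` when the event has probability `belief`."""
    # With probability `belief` the outcome is 1, otherwise it is 0.
    return belief * (1 - report) ** 2 + (1 - belief) * report ** 2


# Properness check: over a grid of possible reports, the expected score is lowest at the true belief.
belief = 0.3
grid = [i / 100 for i in range(101)]
best_report = min(grid, key=lambda r: expected_brier(r, belief))
print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]), best_report)
```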
Return forecasts are translated into long–short portfolios to assess economic performance.
Stated evaluation approach: conversion of predicted returns into long–short portfolios for economic/performance assessment.
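The excerpt does not say how forecasts are mapped to positions. The sketch below assumes one common construction. Each period, stocks are sorted by predicted return. The strategy goes long the top quantile and short the bottom quantile, with equal weights, and the return is the spread in realized returns between the two legs. The function name and the default 10% quantile are illustrative.

```python
import numpy as np


def long_short_return(predicted, realized, quantile=0.1):
    """Equal-weighted long-short return for a single period.

    Longs the `quantile` share of assets with the highest predicted returns,
    shorts the same share with the lowest, and returns the realized spread.
    """
    if not 0 < quantile <= 0.5:
        raise ValueError("quantile must be in (0, 0.5] so the legs do not overlap")
    predicted = np.asarray(predicted, dtype=float)
    realized = np.asarray(realized, dtype=float)
    n_leg = max(1, int(len(predicted) * quantile))  # assets per leg
    order = np.argsort(predicted)                   # ascending by forecast
    long_leg, short_leg = order[-n_leg:], order[:n_leg]
    return realized[long_leg].mean() - realized[short_leg].mean()
```

Applying the function period by period produces a return series. The mean return or the Sharpe ratio of that series then serves as the economic performance measure.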
The analysis is based on 30 market, liquidity, valuation, profitability, technical and risk factors and compares linear models, tree-based machine learning and deep learning architectures (including GRU, LSTM and Transformer) within a rolling-window forecasting framework.
Description of empirical design: use of 30 factor variables and explicit listing of model families (linear, tree-based, GRU, LSTM, Transformer) and use of a rolling-window forecasting setup.
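In a rolling-window design, each model is re-fit on only the most recent `window` observations before it forecasts the next one, so no future data enter training. The sketch below uses ordinary least squares as a stand-in for the linear model family. The tree-based and deep models (GRU, LSTM, Transformer) would slot into the same loop. Two details are illustrative assumptions: the window length, and the premise that `X[t]` is known before `y[t]` is realized.

```python
import numpy as np


def rolling_window_forecasts(X, y, window):
    """One-step-ahead OLS forecasts re-estimated on a rolling window.

    For each t >= window, fit y ~ 1 + X on rows t-window .. t-1 only,
    then predict y[t] from X[t] (assumed available before y[t] is realized).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # accept 1-D or 2-D predictors
    preds = []
    for t in range(window, len(y)):
        X_train = np.column_stack([np.ones(window), X[t - window:t]])
        beta, *_ = np.linalg.lstsq(X_train, y[t - window:t], rcond=None)
        preds.append(np.concatenate(([1.0], X[t])) @ beta)
    return np.array(preds)
```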
We introduce the weighted evaluation index (WEI), a finance-specific performance metric that integrates prediction accuracy with market adaptability.
Methodological contribution stated in the paper: introduction of a new performance metric called WEI described as integrating accuracy and market adaptability.
We introduce the Diff-RMSE method for nonlinear factor identification.
Methodological contribution stated in the paper: introduction of a new method named 'Diff-RMSE' for identifying nonlinear factors.
The study uses A-share market data from 2013 to 2024 with equity and firm-characteristic data available from databases such as RESSET and CSMAR for more than 5,000 listed firms.
Empirical dataset description in the paper: time period 2013–2024, sources named (RESSET, CSMAR), and statement 'more than 5,000 listed firms'.
The synthesis covers research and practitioner guidance from the years 2023–2025.
Methods statement specifying the temporal scope of sources used for the synthesis.
This paper synthesizes recent research and practitioner guidance (2023–2025) to develop a practical model for designing human–AI collaboration in the financial reporting function (controllership).
Methods section declaration describing scope and approach (literature/practitioner guidance synthesis covering 2023–2025).
We conducted a controlled experiment comparing traditional task-splitting methods with AI-assisted approaches using GitLab Duo.
Methodological statement in the paper reporting a controlled experiment using GitLab Duo; sample size not stated in the provided summary.
We audited 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central.
Direct data collection and audit described in the paper: dataset of 111,000,000 references from 2,500,000 papers across the four named preprint/repository sources.
Future research should test these findings across different institutional contexts, particularly European economies.
Paper's stated limitations and suggestions for future research.
The analysis employs fixed-effects models, U-tests, bootstrap mediation, and patent text similarity analysis.
Methods statement listing econometric and text-analytic techniques used in the paper.
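Bootstrap mediation, one of the listed techniques, can be illustrated compactly. The sketch below makes two simplifying assumptions: a single mediator, and plain OLS without the paper's fixed effects. The indirect effect is a·b. Here a is the slope of the mediator M on the explanatory variable X, and b is the coefficient on M in a regression of the outcome Y on X and M. A percentile interval comes from resampling observations with replacement. Function names and the number of draws are illustrative.

```python
import numpy as np


def indirect_effect(x, m, y):
    """a*b estimate: a from OLS of m on x, b = coefficient on m in OLS of y on x and m."""
    ones = np.ones_like(x)
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a * b


def bootstrap_mediation(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap confidence interval for the indirect effect."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observations with replacement
        draws[i] = indirect_effect(x[idx], m[idx], y[idx])
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return indirect_effect(x, m, y), (lower, upper)
```

Mediation is supported when the interval excludes zero. With panel data, the two regressions would also include the fixed effects, and resampling would usually be done by firm rather than by observation.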
The study uses a sample of 25,204 firm-year observations from Chinese A-share manufacturing companies (2010–2023).
Paper statement of sample and period; descriptive sample construction (firm-year observations = 25,204).
The empirical analysis is based on Chinese A-share listed firms observed from 2012 to 2024 and uses a difference-in-differences (DID) identification strategy.
Study description in the paper's methods/abstract specifying sample period (2012–2024), population (Chinese A-share listed firms), and methodology (DID).
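The excerpt names the identification strategy but not the specification. The sketch below assumes the canonical two-group, two-period version. It compares the before-after change for treated firms with the change for control firms, and under parallel trends that difference is the treatment effect. Applied work on listed firms usually estimates the regression analogue, with firm and year fixed effects and controls, which this minimal Python sketch omits. The sample values are hypothetical.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: (treated change) minus (control change)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values for illustration only.
effect = did_estimate([1.0, 1.2], [2.0, 2.2], [1.0, 1.1], [1.4, 1.5])
```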
We validate the framework empirically on five benchmarks (MATH, MMLU, TriviaQA, SimpleQA, LiveCodeBench) across eight models from five providers.
Empirical experiments reported in the paper using five named datasets and eight models from five providers (experimental evaluation / benchmarking).
For k-model cascades, first-order conditions imply a single shadow price that equalizes marginal quality-per-cost across stage boundaries.
Analytical derivation of first-order conditions for k-stage cascades within the decision-theoretic constrained-optimization framework presented in the paper.
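The condition can be written out under assumed notation, which is not the paper's. Let t_1 through t_{k-1} be the thresholds at the k-1 stage boundaries, Q(t) expected quality, C(t) expected cost, B the budget, and μ the multiplier. Suppose the problem is differentiable, the optimum is interior, and the budget binds. Then the Lagrangian yields one condition per boundary, and every condition shares the same μ:

```latex
\mathcal{L}(t,\mu) = Q(t) - \mu\,\bigl(C(t) - B\bigr),
\qquad
\frac{\partial Q}{\partial t_i} = \mu\,\frac{\partial C}{\partial t_i}
\;\Longrightarrow\;
\frac{\partial Q / \partial t_i}{\partial C / \partial t_i} = \mu
\quad (i = 1, \dots, k-1)
```

Because μ does not depend on i, each boundary buys the same marginal quality per unit of cost. If one boundary offered more, shifting spend toward it would raise quality at the same cost. By the envelope theorem, μ also equals dQ*/dB, the quality gained from one more unit of budget.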
Given a pool of k models, the frontier achievable by deterministic two-model threshold cascades is the pointwise envelope over choose(k,2) pairwise cascades, with switching points where the optimal pair changes.
Theoretical characterization/derivation in the paper (mathematical result about deterministic two-model threshold cascades and combinatorial envelope over pairwise cascades).
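The Python sketch below illustrates the envelope construction. It assumes that each pairwise cascade's frontier has already been computed as a list of (cost, quality) points, one per deterministic threshold. At a given budget, each pair offers its best affordable point, and the envelope keeps the best pair. Switching points are the budgets at which the winning pair changes. The model names and frontier numbers are a hypothetical toy example, and models are assumed to be listed cheapest first.

```python
from itertools import combinations

import numpy as np


def pairwise_envelope(models, pair_frontiers, budgets):
    """Upper envelope over all choose(k, 2) two-model cascade frontiers.

    pair_frontiers[(cheap, strong)] = (costs, qualities), costs sorted ascending.
    Returns the envelope quality at each budget and the winning pair
    (None where no pair is affordable).
    """
    budgets = np.asarray(budgets, dtype=float)
    pairs = list(combinations(models, 2))  # every pair, in the given model order
    curves = []
    for pair in pairs:
        costs, quals = (np.asarray(v, dtype=float) for v in pair_frontiers[pair])
        best_so_far = np.maximum.accumulate(quals)               # best quality at cost <= costs[j]
        idx = np.searchsorted(costs, budgets, side="right") - 1  # last affordable point
        curves.append(np.where(idx >= 0, best_so_far[np.clip(idx, 0, None)], -np.inf))
    stacked = np.vstack(curves)
    envelope = stacked.max(axis=0)
    winners = [pairs[i] if np.isfinite(q) else None
               for i, q in zip(stacked.argmax(axis=0), envelope)]
    return envelope, winners


def switching_points(budgets, winners):
    """Budgets at which the optimal pair differs from the previous budget's."""
    return [(b, w) for b, prev, w in zip(budgets[1:], winners, winners[1:]) if w != prev]


# Hypothetical per-pair frontiers for models a (cheapest), b, c (strongest).
FRONTIERS = {
    ("a", "b"): ([1.0, 2.0, 3.0], [0.60, 0.70, 0.73]),
    ("a", "c"): ([1.0, 3.5, 6.0], [0.60, 0.78, 0.85]),
    ("b", "c"): ([2.0, 4.0, 7.0], [0.72, 0.81, 0.86]),
}
BUDGETS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
envelope, winners = pairwise_envelope(["a", "b", "c"], FRONTIERS, BUDGETS)
```

Even in this toy case, the winning pair changes several times as the budget grows. That is why the envelope has to consider all pairs rather than a single fixed cascade.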
Reciprocal shadow prices link the budget-constrained and quality-constrained formulations of the cascade optimization.
Analytical derivation in the decision-theoretic framework using constrained optimization and duality presented in the paper.
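Write Q(t) and C(t) for expected quality and expected cost as functions of the cascade thresholds t, with B a budget and q a quality target (notation assumed here, not taken from the paper). The reciprocity follows from comparing the two stationarity conditions at a common optimum t*, assuming both constraints bind there and the gradients are nonzero:

```latex
\begin{aligned}
&\max_t\, Q(t)\ \text{s.t.}\ C(t) \le B: && \nabla Q(t^\star) = \mu\, \nabla C(t^\star),\\
&\min_t\, C(t)\ \text{s.t.}\ Q(t) \ge q: && \nabla C(t^\star) = \nu\, \nabla Q(t^\star),\\
&\Longrightarrow\ \mu\nu = 1,\ \text{i.e.}\ \nu = \frac{1}{\mu}, && \mu = \frac{dQ^\star}{dB},\quad \nu = \frac{dC^\star}{dq}.
\end{aligned}
```

In words: if one more unit of budget buys μ units of quality, then one more unit of required quality costs 1/μ units of budget.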
For a two-model cascade, the cost-quality frontier is piecewise concave on decreasing-benefit regions of the confidence support.
Theoretical development in a decision-theoretic framework using constrained optimization and duality; proven properties for the two-model case reported in the paper (analytical result).
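The two-model frontier can also be traced empirically. The Python sketch below assumes a common cost model that the excerpt does not confirm. The cheap model answers every query, the strong model is called only when the cheap model's confidence falls below the threshold, and quality is accuracy on a labelled evaluation set. Sweeping the threshold over the observed confidences yields candidate (cost, quality) points. A small filter then drops dominated points. Function names are illustrative.

```python
import numpy as np


def cascade_points(conf, cheap_correct, strong_correct, cheap_cost, strong_cost):
    """(threshold, cost, quality) for each threshold of a two-model cascade.

    A query is deferred to the strong model when the cheap model's confidence
    is strictly below the threshold; thresholds run over the observed
    confidences plus +inf (defer everything).
    """
    conf = np.asarray(conf, dtype=float)
    cheap_correct = np.asarray(cheap_correct, dtype=float)
    strong_correct = np.asarray(strong_correct, dtype=float)
    points = []
    for tau in np.append(np.unique(conf), np.inf):
        defer = conf < tau
        cost = cheap_cost + strong_cost * defer.mean()  # cheap model always runs
        quality = np.where(defer, strong_correct, cheap_correct).mean()
        points.append((float(tau), float(cost), float(quality)))
    return points


def pareto_filter(points):
    """Sort by cost and keep only points that raise quality over all cheaper ones."""
    frontier, best_quality = [], float("-inf")
    for tau, cost, quality in sorted(points, key=lambda p: (p[1], -p[2])):
        if quality > best_quality:
            frontier.append((tau, cost, quality))
            best_quality = quality
    return frontier
```

Regions where deferring more queries adds progressively less accuracy per unit of cost correspond to the decreasing-benefit regions mentioned in the claim.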
These results are robust to alternative model specifications, including different lag lengths and forecast horizons.
Robustness checks reported in the paper: re-estimation of TVP-VAR with alternative lag lengths and forecast horizons producing consistent qualitative results.
The emergence of generative AI is not associated with a uniform increase in financial connectedness.
Empirical TVP-VAR analysis comparing connectedness measures before and after the emergence of generative AI (paper compares connectedness over the sample period and reports no uniform increase).
This study uses daily data from January 2021 to December 2025 to analyze spillover dynamics among AI-related equities, cryptocurrencies, and traditional financial assets within a time-varying parameter vector autoregression (TVP-VAR) framework.
Statement of data frequency and sample period plus description of methodology (TVP-VAR) in the paper; empirical analysis applied to specified asset groups.
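In TVP-VAR studies, connectedness is typically summarized with Diebold–Yilmaz-style measures, computed at each date from a generalized forecast-error variance decomposition (FEVD). The excerpt does not say whether the paper uses exactly these definitions. The Python sketch below computes the measures for one date, taking the FEVD matrix as given and omitting the TVP-VAR estimation itself.

```python
import numpy as np


def connectedness(fevd):
    """Total, directional and net connectedness (in %) from one FEVD matrix.

    fevd[i, j] is the share of variable i's forecast-error variance attributed
    to shocks in variable j. Rows are normalized to sum to one, as is usual
    for the generalized FEVD.
    """
    theta = np.asarray(fevd, dtype=float)
    theta = theta / theta.sum(axis=1, keepdims=True)
    spill = theta - np.diag(np.diag(theta))  # keep only cross-variable shares
    n = theta.shape[0]
    from_others = 100 * spill.sum(axis=1)    # received by each variable
    to_others = 100 * spill.sum(axis=0)      # transmitted by each variable
    total = 100 * spill.sum() / n            # total connectedness index
    return total, to_others, from_others, to_others - from_others
```

Repeating the calculation on each date's FEVD produces a time series. Its level before and after the arrival of generative AI can then be compared across asset groups.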
Under standard smoothness and finite variance conditions, SGD is minimax optimal for finding stationary points measured by the ℓ2 norm of the gradient, thereby fundamentally precluding any complexity gains for sign-based methods in standard settings.
Theoretical statement based on prior minimax optimality results for SGD under standard smoothness and finite-variance assumptions (as cited/used in the paper). No new experiment; relies on worst-case lower-bound theory.
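The standard complexity bounds behind this claim can be stated directly. Consider a function F with initial suboptimality Δ, an L-Lipschitz gradient, and unbiased stochastic gradients with variance at most σ². The number T of stochastic gradient queries needed to reach a point with E‖∇F(x)‖₂ ≤ ε satisfies the bounds below (constants omitted). The first bound is attained by SGD with a suitably tuned step size. The second holds for any algorithm in the worst case.

```latex
T_{\mathrm{SGD}}(\varepsilon) = O\!\left(\frac{\Delta L \sigma^2}{\varepsilon^4} + \frac{\Delta L}{\varepsilon^2}\right),
\qquad
T_{\mathrm{any}}(\varepsilon) = \Omega\!\left(\frac{\Delta L \sigma^2}{\varepsilon^4}\right)
```

Because the leading ε⁻⁴ terms match, no method, sign-based or otherwise, can improve on SGD's worst-case rate under these assumptions. Any gains must come from additional structure outside the standard setting.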
The boundaries (critical thresholds) separating the tax regimes are derived from the workers' budget constraint.
Analytic derivation in the paper showing that the workers' budget constraint yields critical values of τ_ai and τ_f that determine transitions between the three regimes.
The model features quadratic self-amplification in both AI capability (λ A^2) and financial capital (γ_F K_f^2), coupled through investment flows.
Model specification and equations in the paper showing terms λ A^2 for AI capability growth and γ_F K_f^2 for financial capital growth, with explicit investment flow terms linking AI and financial capital.
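To see why quadratic self-amplification matters, isolate the AI-capability term and drop the coupling and any other terms. This is a simplification for illustration only, not the paper's full system. With A(0) = A_0 > 0 and λ > 0, separation of variables gives a finite-time blow-up:

```latex
\frac{dA}{dt} = \lambda A^2
\;\Longrightarrow\;
\int_{A_0}^{A(t)} \frac{dA}{A^2} = \lambda t
\;\Longrightarrow\;
A(t) = \frac{A_0}{1 - \lambda A_0 t},
\qquad
t^\star = \frac{1}{\lambda A_0}.
```

The same calculation applies to K_f with γ_F in place of λ. A linear growth term would produce only exponential growth, whereas this solution diverges at the finite time t*.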
The study uses a panel dataset of 35,347 firm-year observations from 2010 to 2023.
Reported sample description in the paper: panel dataset covering 2010–2023 with 35,347 firm-year observations.
AI-assisted decision-making paradigms do not have a significant direct effect on task performance.
Experimental study of 59 pre-service teachers using a two-factor mixed design (between-subjects: AI-assisted decision-making paradigms; within-subjects: human-AI consistency). Data analyzed with Bayesian cumulative link mixed model and structural equation modeling; authors report no significant direct effect.
In the U.S., no single 'AI Act' has passed (as of 2026).
Stated in the paper as a factual legal/policy status; this is verifiable via legislative records and is presented without an underlying sample (paper cites status as of 2026).
The authors ran a within-subjects study comparing authoring AD from scratch against editing AI drafts of varying quality.
Explicit methodological statement in the paper (within-subjects study design); sample size not reported in the excerpt.