Evidence (8625 claims)
| Topic | Claims |
|---|---|
| Adoption | 8625 |
| Productivity | 7686 |
| Governance | 6917 |
| Human-AI Collaboration | 6574 |
| Org Design | 4189 |
| Innovation | 4131 |
| Labor Markets | 3588 |
| Skills & Training | 2985 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Adoption
The study uses 5 million job postings from Beijing covering 2018–2024 as its primary data source.
Stated dataset scope and size in the paper's description of data.
We construct a neighborhood-level GenAI Exposure Index by aggregating task-level assessments from five leading large language models.
Methodological construction described in the paper: task-level GenAI suitability assessments from five LLMs applied to tasks in 5 million Beijing job postings (2018–2024), aggregated to the neighborhood level.
Decision-makers (DMs) are similarly ambiguity-seeking and ambiguity-generated insensitive (a-insensitive) regardless of whether the analyst is human or a machine learning (ML) model.
Incentivized laboratory experiment in which participants' ambiguity attitudes were measured for forecasts attributed to human and ML analysts; comparison of ambiguity-seeking and a-insensitivity across analyst type reported in the paper (sample size not reported in abstract).
The same bias was not observed when imagining help from another human participant.
Empirical comparison reported in the abstract: predictions about receiving help from another human did not show the same faster-than-reality bias as predictions about AI assistance (from the same preregistered study, N = 1237).
Actual completion times between independent completion and AI-assisted completion did not differ.
Empirical result reported in the abstract comparing measured completion times for independent vs. AI-assisted task completion in the preregistered study (N = 1237).
We conducted a preregistered large-scale behavioral study (N = 1237) to characterize mismatches between expectations and reality, with a focus on simple cognitive tasks.
Authors report study design and sample size in the abstract: preregistered behavioral experiment with N = 1237 participants.
Identification strategy exploits import lumpiness in product categories linked to automation technologies (including robots) to disentangle adoption effects from selection into adoption.
Methodological claim: use of import 'lumpiness' in automation-related product categories as a plausibly exogenous source of adoption variation within a difference-in-differences framework.
We integrate datasets on trade activities, firm, and worker characteristics for the population of Italian importing firms from 2011 to 2019.
Data integration described in abstract; population-level administrative datasets on trade, firm, and worker characteristics for Italian importing firms covering years 2011–2019.
The study examines the impact of AI technologies on Uzbekistan's labor market transformation in the context of implementing the national strategy 'Digital Uzbekistan - 2030' and the Strategy for the Development of AI Technologies until 2030.
Framing and scope statement in the paper; analysis based on national strategy documents, statistical data, industry reviews, and regulatory legal documents.
The system was evaluated on a real 64-GPU A100 testbed emulating three wind-powered sites with Azure production traces.
Experimental evaluation described in abstract: 64-GPU A100 testbed, emulation of three sites, use of Azure production traces.
The paper reports experimental comparisons against accelerated baselines.
Statement in experimental section that comparisons to accelerated baselines were performed; specific baselines and results are in the paper.
The paper examines the legal implications of overusing export controls.
Statement of the paper's analytic scope and structure (description of content).
We sample 50 benchmark games from a 2,000-game generated pool and evaluate nine frontier and open-weight LLMs in a head-to-head tournament with over 36,000 matches.
Empirical setup reported in the paper's abstract: 50 sampled games, 2,000-game pool, nine LLMs, >36,000 head-to-head matches.
We interviewed 24 product-focused individuals at a large technology firm about how AI has impacted their own work, their work within their product team, and their professional interactions.
Qualitative semi-structured interviews with 24 product-focused employees at a single large technology firm; sample size = 24.
This scoping review adhered to the PRISMA-ScR guidelines and encompassed 29 peer-reviewed empirical studies published from 2020 to 2025.
Methods statement in the paper (explicit methodological description).
The paper identifies five major research gaps and proposes future research directions in intelligent international marketing.
Author-reported outcome of the paper's systematic review and content analysis (2010–2025); descriptive claim about the paper's contributions.
Prior productivity does not predict AI use.
Analysis linking prior productivity measures to reported AI adoption in the Census Bureau survey data; finding of no predictive relationship reported.
The analysis uses a mandatory, purpose-designed Census Bureau survey of approximately 28,500 establishments.
Census Bureau mandatory survey specifically designed for this study; sample size stated as approximately 28,500 establishments.
Identification of effects uses within-firm variation with firm and city-by-year fixed effects.
Identification strategy reported in abstract: within-firm variation under firm and city-by-year fixed effects.
The study measures four skill-category demand shares and their within-category importance from job-description text.
Methodological statement in abstract: measurement of four skill-category demand shares and within-category importance via job-description text.
AI exposure is decomposed into displacement and augmentation components based on task routineness.
Methodological claim in abstract: decomposition of exposure into displacement and augmentation using a routineness criterion for tasks.
The authors construct firm-by-year potential AI exposure via semantic matching between AI patent texts and detailed occupation task descriptions.
Method description in abstract: semantic matching of AI patent texts to occupation task descriptions to build firm-by-year exposure.
The study uses approximately 67 million online job postings from two major Chinese recruitment platforms (2019–2024).
Statement in paper abstract describing dataset size and source (job postings from two major Chinese recruitment platforms over 2019–2024).
The study extends the Technology Acceptance Model (TAM), Dynamic Capabilities Theory, and the Technology-Organisation-Environment (TOE) framework into the qualitative, emerging-economy entrepreneurial context.
Authors' stated theoretical contribution based on mapping thematic results to TAM, Dynamic Capabilities, and TOE frameworks within analysis and discussion sections.
This study employed an interpretivist, qualitative research design using sixteen in-depth semi-structured interviews with entrepreneurs across fintech, edtech, health-tech, logistics, retail, and SaaS in Delhi/NCR, India, and used Braun & Clarke's (2006) six-phase thematic analysis framework.
Explicit methodological description in the paper: interpretivist qualitative design; n=16 in-depth semi-structured interviews across specified sectors in Delhi/NCR; thematic analysis following Braun & Clarke (2006).
The study uses a qualitative approach based on 17 expert interviews with employees at startups.
Methods statement in paper specifying qualitative study design and sample size of 17 interviews.
Process-related insights into how GenAI transforms startups are limited.
Authors' literature positioning / gap statement in paper (no empirical metric provided).
The paper's findings are based on three pre-registered user studies with a combined sample size of N = 2691.
Statement in the paper's abstract reporting three pre-registered user studies and combined N = 2691.
Agentic payments are distinct from traditional automated systems because they emphasise autonomy, contextual reasoning and adaptability.
Conceptual distinction asserted in the abstract (comparative analysis between agentic payments and traditional automated systems).
The paper examines operational logic, defining features and emerging use cases of agentic payments across retail, e-commerce and decentralised finance.
Stated scope in the abstract; analysis and case-study-driven review across specified sectors (retail, e-commerce, DeFi). No sample sizes reported.
Agentic payments refer to transactions initiated and completed by AI agents without direct human intervention.
Explicit definitional statement in the abstract (conceptual definition provided by the authors).
All [the listed orchestration frameworks] follow the same pattern: an external orchestrator above the LLM, injecting instructions and routing decisions every turn.
Author assertion based on architectural analysis of the listed frameworks (observation of orchestration pattern in the named projects).
The paper draws on empirical studies from 2024–2026.
Methodological statement in the paper specifying the time window of empirical studies used in the analysis.
This inverse scaling does not appear on single-threshold metrics common in LLM forecasting benchmarks.
Comparative evaluation reported in the paper showing that single-threshold (binary) scoring metrics do not exhibit the inverse-scaling pattern observed with tail-inclusive distributional metrics (specific metrics and calculations not given in excerpt).
Domain knowledge does not reliably rescue calibration.
Experiments reported in the paper where domain-knowledge interventions (procedures or prompts incorporating domain knowledge) were applied and did not consistently improve forecast calibration (details not provided in excerpt).
Using large language models, we measure the AIO level of Chinese listed companies from 2010 to 2023.
Authors report constructing firm-level measures of artificial intelligence orientation (AIO) by applying large language models to corporate texts/disclosures for Chinese listed companies over the 2010–2023 period.
This study provides the first cross-class synthesis covering raw materials, work-in-process, and finished goods within a unified evaluative framework, positioning machine learning and deep reinforcement learning methods alongside classical policy families and quantifying the boundary conditions for each approach.
Author-stated theoretical contribution and scope of the review (coverage of raw materials, WIP, finished goods and methods).
A random-effects model estimated by restricted maximum likelihood was applied to pool percentage cost-reduction effect sizes across 18 studies admissible to quantitative synthesis.
Methods reported in the paper: random-effects meta-analysis using REML across 18 studies eligible for quantitative pooling.
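The pooling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it applies the standard fixed-point iteration for the REML estimate of the between-study variance, and the function name `reml_random_effects` and all input values are hypothetical.

```python
def reml_random_effects(effects, variances, tol=1e-10, max_iter=1000):
    """Pool study effect sizes under a random-effects model.

    effects   -- per-study effect sizes (e.g. percentage cost reductions)
    variances -- per-study sampling variances
    Returns (pooled_effect, standard_error, tau2), where tau2 is the
    REML estimate of the between-study variance.
    """
    tau2 = 0.0
    for _ in range(max_iter):
        # Inverse-variance weights given the current tau2.
        w = [1.0 / (v + tau2) for v in variances]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, effects)) / sw
        # REML fixed-point update, truncated at zero.
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi)
                  for wi, yi, vi in zip(w, effects, variances))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sw)
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    # Final pooled estimate with the converged tau2.
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    se = sw ** -0.5
    return mu, se, tau2


# Hypothetical inputs: three studies with unit sampling variance.
pooled, se, tau2 = reml_random_effects([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
```

With equal sampling variances the estimate has a closed form: the sample variance of the effects minus the common sampling variance. For the hypothetical inputs above, that gives a between-study variance of 3 and a pooled effect of 2.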
A systematic review and meta-analytic synthesis of 31 peer-reviewed studies published between 2004 and 2025 was conducted following the PRISMA 2020 protocol.
Study methods reported in the paper: systematic review following PRISMA 2020; sample of 31 peer-reviewed studies dated 2004–2025.
Across 660 trials with Claude Code, code cleanliness does not change the agent's pass rate.
Empirical evaluation: 660 trials run using Claude Code on the minimal-pair repos with hidden tests; reported comparison of pass rates between clean and messy repo variants showing no change.
We conduct extensive experiments on public datasets, in simulated auction environments, and through large-scale online deployment on Taobao.
Statement of experimental methodology describing the types of evaluations performed (public datasets, simulated auctions, and online deployment).
Reported empirical values are transformed through transparent indicators such as relative growth, CAGR, growth multipliers, stock-flow ratios, concentration ratios, and HHI.
Methodological description and application in the paper listing these specific indicators used to summarize public data on AI investment, adoption, robots, compute, and labour-market reallocation.
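Several of the indicators named above have standard textbook definitions, sketched here for orientation. The function names and sample numbers are hypothetical, and the paper's exact operationalisation (for example, whether shares are expressed as fractions or percentages) is not given in this summary.

```python
def relative_growth(start, end):
    """Change between two levels as a fraction of the starting level."""
    return (end - start) / start


def growth_multiplier(start, end):
    """Ratio of the ending level to the starting level."""
    return end / start


def cagr(start, end, years):
    """Compound annual growth rate over `years` periods."""
    return (end / start) ** (1.0 / years) - 1.0


def concentration_ratio(shares, k):
    """Combined share of the k largest units (shares as fractions of 1)."""
    return sum(sorted(shares, reverse=True)[:k])


def hhi(shares):
    """Herfindahl-Hirschman Index: sum of squared shares (fractions of 1)."""
    return sum(s * s for s in shares)


# Hypothetical series: a level that rises from 100 to 400 over two years.
multiplier = growth_multiplier(100.0, 400.0)   # 4.0
growth = relative_growth(100.0, 400.0)         # 3.0, i.e. +300%
annual = cagr(100.0, 400.0, 2)                 # 1.0, i.e. 100% per year

# Hypothetical market with three units holding 50%, 30% and 20%.
shares = [0.5, 0.3, 0.2]
cr2 = concentration_ratio(shares, 2)           # about 0.8
index = hhi(shares)                            # about 0.38
```

With shares expressed as fractions, HHI ranges from near 0 (many small units) to 1 (a single unit); multiplying by 10,000 gives the percentage-point convention used by some regulators.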
The study uses a conceptual-empirical quantitative diagnostic design rather than a causal econometric model.
Explicit methodological statement in the paper describing the design choice and rejecting causal econometric modeling in favor of diagnostics using public institutional data and transparent indicators.
The agentic economy is not yet a completed global order, but its transition pressure is measurable enough to require a distinct economic vocabulary, reproducible diagnostics, and future sector-level measurement.
Synthesis of diagnostic indicators (AI investment/adoption trends, robot stock, compute-energy coupling, labour reallocation measures) showing measurable transition pressures; conclusion drawn from the conceptual-empirical diagnostic.
Following PRISMA 2020 guidelines, searches across Google Scholar, Web of Science, Scopus, ScienceDirect, and CNKI yielded 1,562 initial records, of which 21 studies published between 2019 and 2026 met inclusion criteria.
Methodological description of the systematic literature review reported in the paper: initial records = 1,562; included studies = 21; publication years 2019–2026.
Small and medium-sized enterprises (SMEs) constitute over 98.5% of businesses in many economies, including China.
Descriptive statistic reported in the paper's background/intro; source of the statistic not specified within the summary provided.
This study analyzes developments through April 2026.
Explicit timeframe statement in the paper's summary/introduction.
The main results remain robust across the reported robustness checks.
Robustness checks reported by the authors (unspecified in abstract) that do not overturn the main findings.
China's 14th Five Year Plan (FYP) is used as a quasi-natural experiment / strategic policy shock to study effects of AI washing.
Research design leverages the FYP announcement as an exogenous policy shock in a difference-in-differences framework (design claim; no sample size in abstract).
AI washing is identified as the residual between AI narrative intensity and patent output.
Constructed a firm-level AI washing proxy by regressing AI narrative intensity on patent output and using the residual; described as the study's measurement approach (no sample size reported in the abstract).
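The residual-based measure described above can be illustrated with a one-regressor ordinary least squares fit. This is a sketch under strong simplifying assumptions: the paper's actual specification (controls, fixed effects, variable scaling) is not given in this summary, and the function name and firm-level numbers below are hypothetical.

```python
def ai_washing_residuals(narrative, patents):
    """Residuals from regressing AI narrative intensity on patent output.

    A positive residual means a firm talks about AI more than its patent
    output would predict under the fitted line; a negative one means less.
    """
    n = len(narrative)
    mean_x = sum(patents) / n
    mean_y = sum(narrative) / n
    sxx = sum((x - mean_x) ** 2 for x in patents)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(patents, narrative))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(patents, narrative)]


# Hypothetical data for four firms.
patents = [1.0, 2.0, 3.0, 4.0]
narrative = [2.0, 4.0, 9.0, 8.0]
residuals = ai_washing_residuals(narrative, patents)
# The third firm has the largest positive residual (about 2.1), so it
# would score highest on this illustrative proxy.
```

Because OLS residuals sum to zero when an intercept is included, the proxy is relative: it ranks firms against each other rather than against an absolute threshold.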