Evidence (13870 claims)
- Adoption: 8467 claims
- Productivity: 7558 claims
- Governance: 6805 claims
- Human-AI Collaboration: 6363 claims
- Org Design: 4132 claims
- Innovation: 4065 claims
- Labor Markets: 3526 claims
- Skills & Training: 2945 claims
- Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 196 | 98 | 892 | 1984 |
| Governance & Regulation | 817 | 394 | 188 | 121 | 1544 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 627 | 233 | 123 | 96 | 1088 |
| Research Productivity | 411 | 123 | 56 | 332 | 933 |
| Output Quality | 467 | 178 | 59 | 47 | 751 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 167 | 122 | 24 | 496 |
| Task Allocation | 207 | 64 | 71 | 32 | 379 |
| Skill Acquisition | 165 | 59 | 60 | 17 | 301 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 52 | 107 | 13 | 279 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 150 | 48 | 26 | 3 | 227 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 63 | 20 | 12 | 184 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 93 | 21 | 13 | 19 | 148 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Creative Output | 31 | 17 | 7 | 3 | 59 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
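The four direction columns do not always add up to Total. For Other, 749 + 196 + 98 + 892 = 1935 against a listed 1984; for Governance & Regulation the gap is 24; rows such as Output Quality add up exactly. A plausible reading, which is an assumption and not something the matrix states, is that the remaining claims carry a direction label that has no column here. A short Python check over a few rows copied from the table (`—` treated as zero; the helper name `unlisted_claims` is illustrative):

```python
# Rows copied from the Evidence Matrix: (Positive, Negative, Mixed, Null, Total).
# None stands for a "—" cell and is counted as zero.
ROWS = {
    "Other": (749, 196, 98, 892, 1984),
    "Governance & Regulation": (817, 394, 188, 121, 1544),
    "Output Quality": (467, 178, 59, 47, 751),
    "Labor Share of Income": (17, 17, 17, None, 51),
    "Worker Turnover": (11, 12, None, 3, 26),
}


def unlisted_claims(positive, negative, mixed, null, total):
    """Return Total minus the sum of the four listed direction columns."""
    listed = sum(count or 0 for count in (positive, negative, mixed, null))
    return total - listed


for outcome, row in ROWS.items():
    print(f"{outcome}: total={row[4]}, not in listed columns={unlisted_claims(*row)}")
```

Extending `ROWS` to every line of the matrix shows which outcome categories have claims outside the four listed directions.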
Nineteen studies met the eligibility criteria and were analyzed using qualitative thematic synthesis.
Reported result of the screening/eligibility process in the review: final included sample = 19 peer-reviewed articles; analysis method stated as qualitative thematic synthesis.
We conducted a systematic review guided by PRISMA 2020, searching Scopus and Web of Science (Title/Abstract/Keywords) for English-language journal articles published between 2015 and 2025.
Methods reported in the paper: PRISMA 2020-guided systematic review; databases searched explicitly named (Scopus, Web of Science); query fields (Title/Abstract/Keywords); language and date restrictions stated (English, 2015–2025).
The study used a mixed-method approach, combining qualitative and quantitative analysis of multiple case studies involving AI applications such as computer vision, robotics, and predictive analytics.
Authors report study design as mixed-method (qualitative + quantitative) applied to multiple case studies examining AI applications (computer vision, robotics, predictive analytics). No numeric sample size reported in the summary.
Drawing on the classic theory of games of incomplete information, the paper analyses the complex interactions among job seekers, recruitment platforms, and enterprises.
Methodological description in abstract stating the use of incomplete-information game theory to model interactions among stakeholders.
The paper takes mainstream recruitment algorithms as its core research object and systematically investigates the specific, multidimensional manifestations of group prejudice in algorithmic screening and the internal mechanisms that generate it.
Methodological claim in the paper describing the study's scope and analytic focus (systematic investigation of manifestations and internal mechanisms); no empirical detail provided in abstract.
Existing academic research focuses primarily on macrolevel governance paths for algorithmic discrimination; the microlevel game logic of job seekers and the construction of systematic adaptation strategies have received comparatively little in-depth attention.
Paper's literature review/positioning statement claiming a gap in the literature (macro focus vs. microlevel adaptation under-explored); no systematic literature-mapping statistics provided in abstract.
This paper treats the supply chain innovation and application pilots as a quasi-natural experiment and uses a difference-in-differences (DID) method to identify the causal effects of supply chain digitalization.
Methodological description in the paper: sample of A-share listed companies (Shanghai and Shenzhen) 2013–2022; DID estimation using policy pilots as exogenous variation.
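The identification idea can be illustrated with the canonical two-group, two-period difference-in-differences: the change in the outcome for pilot firms minus the change for non-pilot firms over the same window. A minimal Python sketch on hypothetical numbers; the data, the `did_estimate` name, and the binary treated/post coding are illustrative assumptions, and the paper's firm-year specification on the 2013–2022 panel (presumably a regression with fixed effects and controls) is not reproduced:

```python
from statistics import mean

# Hypothetical firm-year observations: (treated, post, outcome).
# "treated" marks firms covered by the pilot; "post" marks years after it began.
OBS = [
    (1, 0, 10.0), (1, 0, 12.0), (1, 1, 16.0), (1, 1, 18.0),
    (0, 0, 9.0), (0, 0, 11.0), (0, 1, 12.0), (0, 1, 14.0),
]


def did_estimate(obs):
    """Canonical 2x2 difference-in-differences computed from four group means."""
    def cell(treated, post):
        return mean(y for t, p, y in obs if t == treated and p == post)

    treated_change = cell(1, 1) - cell(1, 0)  # before/after change, pilot firms
    control_change = cell(0, 1) - cell(0, 0)  # before/after change, comparison firms
    return treated_change - control_change


print(did_estimate(OBS))
```

Here the pilot firms' mean rises by 6 and the comparison firms' by 3, so the estimate is 3. The identifying assumption is parallel trends: absent the pilot, both groups would have changed by the same amount.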
Future research should prioritize longitudinal and comparative studies to bridge the gap between experimental promise and practical application.
Authors' stated research agenda/recommendation in the review's conclusion.
Findings were synthesized narratively due to methodological heterogeneity.
Methods/results statement in the review explaining narrative synthesis choice because of heterogeneity among included studies.
Risk of bias was assessed using the ROBINS-I tool.
Methods statement in the review specifying ROBINS-I for risk-of-bias assessment.
The review followed PRISMA guidelines.
Methods statement in the paper indicating PRISMA adherence.
After screening, 10 studies met the inclusion criteria.
PRISMA-style screening result reported in the review (records screened and included).
A comprehensive search across Scopus, Web of Science, IEEE Xplore, and ScienceDirect yielded 260 records.
Systematic search following PRISMA guidelines reported in the paper; databases searched explicitly listed.
Prior research often treats AI presence as binary, framing it either as a hidden tool or a visible teammate.
Literature-summary claim asserted by the authors (literature review / conceptual critique). No quantitative evidence reported in the abstract.
The LLM fallacy is situated within existing literature on automation bias, cognitive offloading, and human–AI collaboration, but is distinguished as a form of attributional distortion specific to AI-mediated workflows.
Conceptual positioning and literature synthesis in the paper; claim is analytic rather than empirically tested in the abstract.
Less attention has been given to how LLM usage reshapes users' perceptions of their own capabilities.
Literature gap claim from the paper's review of prior research on model reliability, hallucination, and trust calibration; no quantitative synthesis or meta-analysis reported.
The system evaluation was performed in a deployed multi-tenant enterprise application across three conditions: manual operation, unconstrained AI with safety layers disabled, and full bounded autonomy.
Method description in the paper's evaluation section: deployment context (multi-tenant enterprise app), three experimental conditions, and 25 scenario trials spanning seven failure families.
The review covers studies of AI applications in financial auditing from the 2020–2025 period.
Stated scope/timeframe of literature included in the review.
Article selection was conducted using the Scopus (Q1–Q4) and Sinta (1–2) databases based on predefined inclusion and exclusion criteria, resulting in a final sample of 15 articles.
Stated data sources and selection procedure in the Methods section; final sample size explicitly reported as 15.
This study employs a Systematic Literature Review (SLR) method following the PRISMA 2020 protocol.
Stated methodology in the paper: explicit use of SLR and PRISMA 2020 protocol.
The study analyzes Chinese A-share listed companies in core digital economy industries from 2015 to 2024 using a panel fixed‑effects regression model.
Study design and methods statement describing the sample frame (A-share listed firms in core digital economy industries, 2015–2024) and the use of panel fixed‑effects regression.
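With firm fixed effects, the slope is identified from variation within each firm over time: subtract each firm's own means from x and y, then run OLS on the demeaned data. A minimal Python sketch for a single regressor, using a hypothetical panel built so that y = 2x plus a firm-specific intercept; the data and the `within_slope` name are illustrative, and the paper's actual regressors, controls, and any year effects are not reproduced:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (firm, year, x, y) rows built as y = 2 * x + firm intercept
# (intercept 5 for firm "A", -1 for firm "B"), so the true slope is 2.
PANEL = [
    ("A", 2015, 1.0, 7.0), ("A", 2016, 2.0, 9.0), ("A", 2017, 4.0, 13.0),
    ("B", 2015, 3.0, 5.0), ("B", 2016, 5.0, 9.0), ("B", 2017, 6.0, 11.0),
]


def within_slope(panel):
    """One-regressor slope with firm fixed effects (within estimator)."""
    by_firm = defaultdict(list)
    for firm, _year, x, y in panel:
        by_firm[firm].append((x, y))
    num = den = 0.0
    for pairs in by_firm.values():
        x_bar = mean(x for x, _ in pairs)  # firm's own mean of x
        y_bar = mean(y for _, y in pairs)  # firm's own mean of y
        for x, y in pairs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den


print(round(within_slope(PANEL), 6))  # 2.0
```

Pooled OLS on the same six rows gives a slope of 0.8, because firm B combines higher x with a lower intercept; demeaning within firms removes that between-firm confounding.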
We conducted a year-long longitudinal study of AI use in a high-stakes workplace among cancer specialists.
Methodological statement in the paper indicating a year-long longitudinal empirical study with cancer specialists (no sample size or detailed methods reported in abstract).
A total of 160 peer-reviewed articles met the inclusion criteria for the review.
Direct numerical summary reported in the abstract (number of articles meeting inclusion criteria).
This study conducted a systematic review of articles published in Web of Science and Scopus up to December 2025, following established methodological guidelines.
Explicit statement in abstract describing the study method (systematic review), data sources (Web of Science and Scopus), and time cutoff (December 2025).
We evaluate AIBuildAI on MLE-Bench, a benchmark of realistic Kaggle-style AI development tasks spanning visual, textual, time-series and tabular modalities.
Evaluation methodology described in paper (benchmark selection and task modalities).
We surveyed 860 Microsoft developers to understand where they want AI support, and where they want it to stay out.
Primary empirical method reported in the paper (survey) with sample size explicitly stated as 860 Microsoft developers.
Developers spend roughly one-tenth of their workday writing code.
Statement reported in the paper (abstract). No sample-size or measurement method for this specific statistic provided in the abstract.
We examine 12 tasks across two practical settings: an AI consultancy providing solutions to business problems and an AI software team developing software products.
Description of experimental design and sample reported in the paper (method section): 12 tasks, two practical settings.
We tested 9 frontier models on BankerToolBench (BTB).
Abstract states that nine frontier models were evaluated using the benchmark.
Completing a BTB task takes bankers up to 21 hours, underscoring the economic stakes of successfully delegating this work to AI.
Reported time-to-complete statistic in abstract (claimed maximum of 21 hours per task); implies measurement of human task completion time by bankers.
BTB requires agents to execute senior banker requests by navigating data rooms, using industry tools (market data platform, SEC filings database), and generating multi-file deliverables including Excel financial models, PowerPoint pitch decks, and PDF/Word reports.
Benchmark design specifications reported in abstract describing the tasks and artifact types agents must produce.
BankerToolBench (BTB) is an open-source benchmark of end-to-end analytical workflows routinely performed by junior investment bankers.
Paper describes BTB design and explicitly states it is open-source and targets end-to-end workflows for junior investment bankers.
We collaborated with 502 investment bankers from leading firms to develop an ecologically valid benchmark grounded in representative work environments.
Reported collaboration/sample size stated in abstract: 502 investment bankers involved in benchmark development.
We ran a longitudinal 20-month empirical study (July 2024–February 2026) that chronicles the system's evolution.
Explicit statement of study duration and dates in the paper's abstract.
The global onset of Industry 4.0 and Artificial Intelligence (AI) necessitates a re-evaluation of employment forecasts for Nagpur's medium enterprises.
Interpretive/prescriptive claim based on the paper's framing of technological change (Industry 4.0/AI) and implications for employment forecasting; no empirical sample size or quantitative backing provided in the excerpt.
Medium-scale industries in zones like Butibori and Hingna have traditionally been labor-intensive.
Descriptive statement in the paper about the nature of current industries in Nagpur/MIDC; no sample size or quantitative data reported in the excerpt.
The full model, including all 11 analytical tabs, is made publicly available to facilitate replication and independent sensitivity testing.
Paper states that the full model and all 11 analytical tabs are publicly available.
A sensitivity analysis shows that the high-skill capture rate and the pace of friction decay are the two parameters with the greatest influence on the aggregate result.
Paper reports results of a sensitivity analysis identifying parameter importance; explicitly names high-skill capture rate and friction decay pace as most influential.
AI coverage scores are sourced from Massenkoff and McCrory (2026) and mapped to NAICS industries using employment-weighted averages derived from BLS Occupational Employment and Wage Statistics data for 2023.
Citation to Massenkoff and McCrory (2026) for theoretical LLM task coverage across SOC groups and explicit statement that mapping used employment-weighted averages from BLS OES 2023.
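The mapping step is a weighted mean: an industry's coverage score is the sum over occupations of coverage times that occupation's employment in the industry, divided by the industry's total employment across those occupations. A minimal Python sketch; the SOC major-group codes are real, but every coverage score and employment count is a hypothetical placeholder, not a value from Massenkoff and McCrory (2026) or the BLS OES 2023 data, and `industry_coverage` is an illustrative name:

```python
# Hypothetical placeholders: theoretical LLM task coverage (0-1) by SOC major
# group, and employment of each group inside one NAICS industry.
COVERAGE = {"13-0000": 0.60, "43-0000": 0.50, "51-0000": 0.10}
EMPLOYMENT = {"13-0000": 20_000, "43-0000": 30_000, "51-0000": 50_000}


def industry_coverage(coverage, employment):
    """Employment-weighted mean coverage over occupations present in both tables."""
    shared = [soc for soc in coverage if soc in employment]
    total_emp = sum(employment[soc] for soc in shared)
    weighted = sum(coverage[soc] * employment[soc] for soc in shared)
    return weighted / total_emp


print(round(industry_coverage(COVERAGE, EMPLOYMENT), 4))  # 0.32
```

With these placeholders the result is (0.60 × 20,000 + 0.50 × 30,000 + 0.10 × 50,000) / 100,000 = 0.32. Occupations missing from either table are skipped rather than treated as zero coverage, which is a design choice of this sketch.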
The core formula multiplies six inputs: base GDP, labor share, AI coverage, productivity gain percentage, adjusted adoption rate, and a skill-weighted capture rate.
Model specification in the paper describing the multiplicative core formula and listing the six inputs.
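Written out, the core formula is a plain product of the six inputs. A minimal Python sketch, assuming shares and rates enter as fractions between 0 and 1 and that the productivity gain is given in percent (hence the division by 100); the function name `core_impact` and the example values are hypothetical, not taken from the paper's published model:

```python
from math import prod


def core_impact(base_gdp, labor_share, ai_coverage, productivity_gain_pct,
                adjusted_adoption_rate, skill_weighted_capture_rate):
    """Product of the six inputs. Shares and rates are fractions in [0, 1];
    the productivity gain is a percentage and is divided by 100 (an assumption
    of this sketch)."""
    return prod((
        base_gdp,
        labor_share,
        ai_coverage,
        productivity_gain_pct / 100,
        adjusted_adoption_rate,
        skill_weighted_capture_rate,
    ))


# Hypothetical inputs, not values from the paper.
print(core_impact(1000, 0.5, 0.4, 20, 0.5, 0.5))
```

With base GDP of 1000, a labor share of 0.5, coverage of 0.4, a 20% gain, and adoption and capture rates of 0.5 each, the product is 10. Because every input enters multiplicatively, scaling any single input by a factor k scales the result by k; in this sketch, differences in influence such as those reported in the sensitivity analysis could only come from how widely each parameter is varied or from how composite inputs (for example, an adoption rate adjusted for friction decay) are constructed upstream.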
The baselines are implemented as prompts, representing the realistic deployment alternative to a governed framework.
Methodological statement in paper describing how baselines were implemented (as prompts); presented as representing realistic alternative deployment.
We benchmark three systems on an 11-case balanced prior authorization appeal evaluation set.
Methodological statement in paper describing evaluation; sample size explicitly stated as 11 cases.
A motivation–resistance theoretical framework is used to study AI knowledge stickiness: 'motivation' captures within-city diffusion potential, and 'resistance' captures frictions that prevent knowledge transfer across cities and induce local lock-in.
Conceptual/theoretical contribution presented in the paper defining the motivation–resistance framework and interpretable constructs (motivation and resistance) for explaining stickiness.
The study uses a city-year panel of AI patent applications combined with urban statistics for the years 2014–2023 and estimates relationships using a two-way fixed-effects model.
Methodological description in the paper specifying data sources (AI patent applications, urban statistics), temporal coverage (2014–2023) and econometric approach (two-way fixed-effects).
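In a balanced panel, city and year fixed effects can be swept out by double demeaning: subtract the city mean and the year mean, add back the grand mean (for both x and y), and run OLS on the result. A minimal Python sketch with one regressor and a hypothetical two-city, three-year panel constructed as y = 1.5x + city effect + year effect; the data and function names are illustrative, and an unbalanced city-year panel would need a dummy-variable or iterative estimator instead:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical balanced panel (city, year, x, y) built as
# y = 1.5 * x + city effect (C1: 1, C2: 3) + year effect (2014: 0, 2015: 2, 2016: 5).
PANEL = [
    ("C1", 2014, 1.0, 2.5), ("C1", 2015, 2.0, 6.0), ("C1", 2016, 6.0, 15.0),
    ("C2", 2014, 4.0, 9.0), ("C2", 2015, 3.0, 9.5), ("C2", 2016, 5.0, 15.5),
]


def group_means(panel, key_index, value_index):
    """Mean of one column within each group defined by another column."""
    groups = defaultdict(list)
    for row in panel:
        groups[row[key_index]].append(row[value_index])
    return {key: mean(values) for key, values in groups.items()}


def two_way_fe_slope(panel):
    """Slope with city and year fixed effects; valid only for a balanced panel."""
    x_city, y_city = group_means(panel, 0, 2), group_means(panel, 0, 3)
    x_year, y_year = group_means(panel, 1, 2), group_means(panel, 1, 3)
    x_all = mean(row[2] for row in panel)
    y_all = mean(row[3] for row in panel)
    num = den = 0.0
    for city, year, x, y in panel:
        # Double demeaning: remove city and year means, add back the grand mean.
        x_dd = x - x_city[city] - x_year[year] + x_all
        y_dd = y - y_city[city] - y_year[year] + y_all
        num += x_dd * y_dd
        den += x_dd * x_dd
    return num / den


print(round(two_way_fe_slope(PANEL), 6))  # 1.5
```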
The two case firms demonstrated contrasting approaches to implementing AI in recruitment.
Findings and case descriptions comparing the two firms' AI recruitment strategies and levels of implementation (n = 2 firms; interviews with 22 participants).
The research contributes by shifting focus to under-researched non-Western workplace settings, particularly technologically advancing Middle Eastern economies like Qatar.
Paper's stated contribution and scope: focus on Qatari organisations and Middle Eastern context.
Four key themes emerged from the data: (1) process optimisation through AI integration, (2) subjectivity in AI-powered recruitment, (3) recruitment strategies in the age of AI, and (4) strategic investments in AI.
Findings: thematic analysis identified these four themes from interview data (n = 22) across the two case firms.
Thematic analysis was used to identify patterns and relationships within the interview data.
Methods: analysis section reporting use of thematic analysis framework.
Data were collected through semi-structured interviews with twenty-two participants across various organisational roles and hierarchical levels.
Methods: semi-structured interviews reported with total participants n = 22 across roles/levels.
The research investigated two prominent Qatari firms with contrasting AI recruitment implementation approaches.
Methods / case selection: two firms were selected and contrasted on their AI recruitment approaches (number of firms = 2).