Evidence (14055 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
We analyzed over 1.5M assets and 128K agents in EvoMap.
Descriptive dataset statement in the paper reporting the scope of the empirical analysis (assets and agents counts).
We conducted a global large-scale randomized field experiment, delivering customized LLM-generated feedback for over 31,000 arXiv preprints across 150 fields and more than 45,000 researchers from 133 geographic regions.
Statement in paper describing experimental design and scale: randomized field experiment; sample described as >31,000 preprints, >45,000 researchers, 150 fields, 133 regions.
The study uses 5 million job postings from Beijing covering 2018–2024 as its primary data source.
Stated dataset scope and size in the paper's description of data.
We construct a neighborhood-level GenAI Exposure Index by aggregating task-level assessments from five leading large language models.
Methodological construction described in the paper: task-level GenAI suitability assessments from five LLMs applied to tasks in 5 million Beijing job postings (2018–2024), aggregated to the neighborhood level.
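The aggregation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `neighborhood_exposure`, the 0 to 1 score scale, the simple mean across the five models, the equal weighting of postings, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def neighborhood_exposure(postings, model_scores):
    """Aggregate task-level model scores to a neighborhood-level index.

    postings: list of (neighborhood, [task, ...]) pairs, one per job posting.
    model_scores: dict mapping task -> list of scores, one per LLM
                  (assumed to lie between 0 and 1).
    """
    per_neighborhood = defaultdict(list)
    for neighborhood, tasks in postings:
        # Average across models for each task, then across tasks in the posting.
        scored = [mean(model_scores[t]) for t in tasks if t in model_scores]
        if scored:
            per_neighborhood[neighborhood].append(mean(scored))
    # Average posting-level scores within each neighborhood (equal weights assumed).
    return {n: mean(v) for n, v in per_neighborhood.items()}


# Hypothetical toy data: two tasks scored by five models, three postings.
model_scores = {
    "draft reports": [0.8, 0.9, 0.7, 0.8, 0.8],
    "repair pipes": [0.1, 0.0, 0.1, 0.2, 0.1],
}
postings = [
    ("A", ["draft reports"]),
    ("A", ["draft reports", "repair pipes"]),
    ("B", ["repair pipes"]),
]
index = neighborhood_exposure(postings, model_scores)
```

Other weighting schemes (for example, weighting postings by vacancy counts) would slot into the final averaging step; the source does not say which the paper uses.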
Decision-makers (DMs) show similar levels of ambiguity seeking and ambiguity-generated insensitivity (a-insensitivity) regardless of whether the analyst is a human or a machine learning (ML) model.
Incentivized laboratory experiment in which participants' ambiguity attitudes were measured for forecasts attributed to human and ML analysts; comparison of ambiguity-seeking and a-insensitivity across analyst type reported in the paper (sample size not reported in abstract).
There is a significant deficiency in India-centric qualitative investigations on human-AI collaboration in the IT sector.
Authors' review of peer-reviewed literature and secondary data concluding a gap in India-focused qualitative studies (literature gap analysis). No numeric count provided.
The same bias was not observed when imagining help from another human participant.
Empirical comparison reported in the abstract: predictions about receiving help from another human did not show the same faster-than-reality bias as predictions about AI assistance (from the same preregistered study, N = 1237).
Actual completion times did not differ between independent and AI-assisted completion.
Empirical result reported in the abstract comparing measured completion times for independent vs. AI-assisted task completion in the preregistered study (N = 1237).
We conducted a preregistered large-scale behavioral study (N = 1237) to characterize mismatches between expectations and reality, with a focus on simple cognitive tasks.
Authors report study design and sample size in the abstract: preregistered behavioral experiment with N = 1237 participants.
Identification strategy exploits import lumpiness in product categories linked to automation technologies (including robots) to disentangle adoption effects from selection into adoption.
Methodological claim: use of import 'lumpiness' in automation-related product categories as a plausibly exogenous source of adoption variation within a difference-in-differences framework.
We integrate datasets on trade activities, firm, and worker characteristics for the population of Italian importing firms from 2011 to 2019.
Data integration described in abstract; population-level administrative datasets on trade, firm, and worker characteristics for Italian importing firms covering years 2011–2019.
The study examines the impact of AI technologies on Uzbekistan's labor market transformation in the context of implementing the national strategy 'Digital Uzbekistan - 2030' and the Strategy for the Development of AI Technologies until 2030.
Framing and scope statement in the paper; analysis based on national strategy documents, statistical data, industry reviews, and regulatory legal documents.
The degree of persuasiveness for LLM-based narrative explanations did not meaningfully impact decision accuracy over a simple AI prediction alone.
Large-scale human behavioral experiment comparing decision accuracy with AI prediction alone versus AI prediction plus narrative explanations of varying persuasiveness (method described in paper).
The system was evaluated on a real 64-GPU A100 testbed emulating three wind-powered sites with Azure production traces.
Experimental evaluation described in abstract: 64-GPU A100 testbed, emulation of three sites, use of Azure production traces.
The paper includes comparisons against accelerated baselines (reported experimental comparisons).
Statement in experimental section that comparisons to accelerated baselines were performed; specific baselines and results are in the paper.
The paper examines the legal implications of overusing export controls.
Statement of the paper's analytic scope and structure (description of content).
AI infrastructure decisions involve trade-offs across physical resource systems including energy, land, water, and labor.
Descriptive claim in the abstract and framing sections; supported by cited prior work on the economic, physical, and moral limits of AI development and by illustrative regional cases.
The evidence is used illustratively rather than as a full causal test.
Explicit methodological statement in the abstract describing the role of the evidence (coded comments and cases) as illustrative.
The article interprets stakeholder and regional positions as different ways of prioritizing the triad's frontiers.
Analysis of the coded public comments and illustrative regional cases used to map stakeholder/regional positions onto the Progress/Sustainability/Equity triad.
The article draws on a previously coded dataset of 10,068 public comments submitted to the 2025 U.S. AI Action Plan.
Empirical resource used in the paper; dataset size explicitly reported as 10,068 coded public comments.
We sample 50 benchmark games from a 2,000-game generated pool and evaluate nine frontier and open-weight LLMs in a head-to-head tournament with over 36,000 matches.
Empirical setup reported in the paper's abstract: 50 sampled games, 2,000-game pool, nine LLMs, >36,000 head-to-head matches.
We interviewed 24 product-focused individuals at a large technology firm about how AI has impacted their own work, their work within their product team, and their professional interactions.
Qualitative semi-structured interviews with 24 product-focused employees at a single large technology firm; sample size = 24.
This study is a systematic literature review conducted following PRISMA 2020 guidelines synthesizing peer-reviewed studies published between 2019 and 2025 identified via searches in Scopus, Web of Science and Google Scholar.
Author-stated methodology in the paper: PRISMA 2020 systematic literature review covering 2019–2025 with database searches in Scopus, Web of Science, and Google Scholar.
This scoping review adhered to the PRISMA-ScR guidelines and encompassed 29 peer-reviewed empirical studies published from 2020 to 2025.
Methods statement in the paper (explicit methodological description).
AI capability is conceptualized/measured as having sub-dimensions including technical infrastructure and management.
Measurement/model description in paper: AI capability broken into sub-dimensions (technical infrastructure, management); supported by survey instrument and measurement model using PLS-SEM on 251 firms.
The mixed-method approach, combining partial least squares–structural equation modeling (PLS-SEM) and fuzzy-set qualitative comparative analysis (fsQCA), was used for analyzing the survey data of 251 firms.
Methods statement in paper: authors report using a mixed-method approach (PLS-SEM and fsQCA) on survey data; sample size explicitly stated as 251 firms.
The paper identifies five major research gaps and proposes future research directions in intelligent international marketing.
Author-reported outcome of the paper's systematic review and content analysis (2010–2025); descriptive claim about the paper's contributions.
Prior productivity does not predict AI use.
Analysis linking prior productivity measures to reported AI adoption in the Census Bureau survey data; finding of no predictive relationship reported.
The analysis uses a mandatory, purpose-designed Census Bureau survey of approximately 28,500 establishments.
Census Bureau mandatory survey specifically designed for this study; sample size stated as approximately 28,500 establishments.
Large language models are routinely used as automated evaluators (to review code, moderate content, or score outputs), often with many items passing through one conversation.
Background/introductory claim in the paper describing common practice; not an experimental result but contextual motivation.
Position of biased turns does not matter: five biased turns placed anywhere in a 50-turn history produce the same shift.
Follow-up experiment manipulating the positions of biased turns within 50-turn histories and observing equivalent bias magnitudes.
Bias does not grow with context length: 5 prior turns and 50 produce the same shift (Spearman |r| < 0.01; OLS slope p = 0.80).
Correlation and OLS analysis of bias magnitude versus context-length (number of prior turns) reported in the experiments.
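The two statistics named above, a Spearman rank correlation and an OLS slope of bias shift on context length, can be computed as in the sketch below. It is a plain-Python illustration with hypothetical data, not the paper's analysis code; the function names are assumptions, and the slope's p-value (which needs a t-distribution) is not computed here.

```python
def ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: bias shift measured at several context lengths.
context_lengths = [5, 10, 20, 30, 40, 50]
bias_shift = [0.30, 0.32, 0.29, 0.31, 0.30, 0.31]
rho = spearman(context_lengths, bias_shift)
slope = ols_slope(context_lengths, bias_shift)
```

A slope close to zero together with a small rank correlation is the pattern the claim describes: the shift does not trend with the number of prior turns.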
We conducted 75,898 API calls to 11 models from 4 providers (OpenAI, Anthropic, Google, and four open-source models).
Descriptive statement of the experimental scope reported in the paper: total number of API calls and models/providers tested.
When execution is standardized on a cheaper Gemini Flash scaffold (separating planning from execution), a pooled 32-game planner bakeoff is consistent with near-equality (p ≈ 0.821).
Empirical experiment: 32-game planner-only comparison where execution was standardized; reported p-value ≈ 0.821 indicating no significant difference among planners.
We study this setting in a timed multi-phase Risk environment with explicit victory targets and repeated planning and execution cycles.
Methodological description of the experimental environment used in the paper (timed multi-phase Risk environment with explicit victory targets and repeated cycles).
Identification of effects uses within-firm variation with firm and city-by-year fixed effects.
Identification strategy reported in abstract: within-firm variation under firm and city-by-year fixed effects.
The study measures four skill-category demand shares and their within-category importance from job-description text.
Methodological statement in abstract: measurement of four skill-category demand shares and within-category importance via job-description text.
AI exposure is decomposed into displacement and augmentation components based on task routineness.
Methodological claim in abstract: decomposition of exposure into displacement and augmentation using a routineness criterion for tasks.
The authors construct firm-by-year potential AI exposure via semantic matching between AI patent texts and detailed occupation task descriptions.
Method description in abstract: semantic matching of AI patent texts to occupation task descriptions to build firm-by-year exposure.
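To illustrate the general idea of matching patent text to task descriptions, the sketch below scores each task against a set of patent texts and combines the scores into a firm-year exposure value. The source does not specify the matching model, so everything here is an assumption: term-frequency cosine similarity stands in for the semantic matcher, the max-over-patents rule and share weighting are illustrative choices, and the names `cosine` and `firm_year_exposure` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between term-frequency vectors of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def firm_year_exposure(task_shares, patents):
    """Share-weighted exposure for one firm in one year.

    task_shares: dict mapping task description -> share of the firm's
                 postings in that year involving the task (assumed to sum to 1).
    patents: list of AI patent texts.
    Each task is scored by its best match among the patents (an assumption).
    """
    return sum(
        share * max((cosine(task, p) for p in patents), default=0.0)
        for task, share in task_shares.items()
    )


# Hypothetical toy inputs.
patents = ["automated classification of text documents"]
task_shares = {
    "classification of text documents": 0.5,
    "repair water pipes": 0.5,
}
exposure = firm_year_exposure(task_shares, patents)
```

In practice a sentence-embedding model would likely replace the term-frequency vectors; only the `cosine` function would need to change.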
The study uses approximately 67 million online job postings from two major Chinese recruitment platforms (2019–2024).
Statement in paper abstract describing dataset size and source (job postings from two major Chinese recruitment platforms over 2019–2024).
The study extends the Technology Acceptance Model (TAM), Dynamic Capabilities Theory, and the Technology-Organisation-Environment (TOE) framework into the qualitative, emerging-economy entrepreneurial context.
Authors' stated theoretical contribution based on mapping thematic results to TAM, Dynamic Capabilities, and TOE frameworks within analysis and discussion sections.
This study employed an interpretivist, qualitative research design using sixteen in-depth semi-structured interviews with entrepreneurs across fintech, edtech, health-tech, logistics, retail, and SaaS in Delhi/NCR, India, and used Braun & Clarke's (2006) six-phase thematic analysis framework.
Explicit methodological description in the paper: interpretivist qualitative design; n=16 in-depth semi-structured interviews across specified sectors in Delhi/NCR; thematic analysis following Braun & Clarke (2006).
The study uses a qualitative approach based on 17 expert interviews with employees at startups.
Methods statement in paper specifying qualitative study design and sample size of 17 interviews.
Process-related insights into how GenAI transforms startups are limited.
Authors' literature positioning / gap statement in paper (no empirical metric provided).
The paper's findings are based on three pre-registered user studies with a combined sample size of N = 2691.
Statement in the paper's abstract reporting three pre-registered user studies and combined N = 2691.
Light AI users perform similarly to matched users who do not use AI.
Same controlled logical reasoning experiment with on-demand AI assistance comparing light AI users to matched non-users (sample size not stated in abstract).
We map that space through six interconnected elements: sociotechnical context, decision-making frameworks, human decision participants, AI capabilities, interaction, and holistic evaluation.
The paper's proposed analytical/framework contribution listing six elements (descriptive of the authors' mapping work).
Most current work treats human-AI combination as an engineering problem and concentrates on interpretability, trust calibration, or interface design.
Authors' characterization of the existing literature and dominant research foci (qualitative literature assessment; no quantitative breakdown provided).
We call this persistent shortfall the 'synergy gap.'
Terminology/definition introduced by the authors in the paper (conceptual claim, not an empirical finding).
Agentic payments are distinct from traditional automated systems because they emphasise autonomy, contextual reasoning and adaptability.
Conceptual distinction asserted in the abstract (comparative analysis between agentic payments and traditional automated systems).