Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance (active filter) | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
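The direction counts can be converted into shares for comparison across outcomes. The following is a minimal sketch, not part of the site: the function name `direction_shares` is hypothetical, and the three rows are copied from the table above. Note that in some rows the four direction columns sum to less than the Total column (for example, Governance & Regulation: 1623 versus 1654); the page does not explain the difference, so the sketch only reports it.

```python
# Rows copied from the table above: (positive, negative, mixed, null, total).
ROWS = {
    "Governance & Regulation": (886, 414, 197, 126, 1654),
    "Wages & Compensation": (78, 37, 25, 6, 146),
    "Job Displacement": (12, 83, 23, 1, 119),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(pos, neg, mixed, null, total):
    """Return (shares of the four direction columns, gap to the Total column).

    Shares are taken over the sum of the four direction columns, because
    the Total column is not always equal to that sum.
    """
    counts = (pos, neg, mixed, null)
    directed = sum(counts)
    shares = {name: count / directed for name, count in zip(DIRECTIONS, counts)}
    return shares, total - directed


for outcome, counts in ROWS.items():
    shares, gap = direction_shares(*counts)
    summary = ", ".join(f"{name} {share:.0%}" for name, share in shares.items())
    print(f"{outcome}: {summary} (unaccounted in Total: {gap})")
```

Computing shares over the four direction columns rather than over Total keeps them summing to one even for rows where the two differ.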
Governance
Under the Brier score specifically, with type-independent inflation cost, the second-best welfare equals the first-best welfare (welfare equivalence).
Analytical result/proof specialized to the Brier score and the assumption of type-independent inflation costs; comparative welfare analysis in the model.
The synthesis covers research and practitioner guidance from the years 2023–2025.
Methods statement specifying the temporal scope of sources used for the synthesis.
This paper synthesizes recent research and practitioner guidance (2023–2025) to develop a practical model for designing human–AI collaboration in the financial reporting function (controllership).
Methods section declaration describing scope and approach (literature/practitioner guidance synthesis covering 2023–2025).
The empirical analysis is based on Chinese A-share listed firms observed from 2012 to 2024 and uses a difference-in-differences (DID) identification strategy.
Study description in the paper's methods/abstract specifying sample period (2012–2024), population (Chinese A-share listed firms), and methodology (DID).
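For readers unfamiliar with the method, the basic two-group, two-period difference-in-differences calculation can be sketched as follows. All numbers are hypothetical and are not taken from the paper, and `did_estimate` is an illustrative name; the paper's actual specification (controls, fixed effects, staggered timing) is not described in the excerpt.

```python
# Hypothetical group means of an outcome before and after a policy change.
# These values are illustrative only and do not come from the paper.
means = {
    ("treated", "pre"): 10.0,
    ("treated", "post"): 14.0,
    ("control", "pre"): 9.0,
    ("control", "post"): 11.0,
}


def did_estimate(m):
    """Change in the treated group minus change in the control group."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change


print(did_estimate(means))
```

The control group's change stands in for what would have happened to the treated group without the policy, which is the parallel-trends assumption the method rests on.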
These results are robust to alternative model specifications, including different lag lengths and forecast horizons.
Robustness checks reported in the paper: re-estimation of TVP-VAR with alternative lag lengths and forecast horizons producing consistent qualitative results.
The emergence of generative AI is not associated with a uniform increase in financial connectedness.
Empirical TVP-VAR analysis comparing connectedness measures before and after the emergence of generative AI (paper compares connectedness over the sample period and reports no uniform increase).
This study uses daily data from January 2021 to December 2025 to analyze spillover dynamics among AI-related equities, cryptocurrencies, and traditional financial assets within a time-varying parameter vector autoregression (TVP-VAR) framework.
Statement of data frequency and sample period plus description of methodology (TVP-VAR) in the paper; empirical analysis applied to specified asset groups.
The boundaries (critical thresholds) separating the tax regimes are derived from the workers' budget constraint.
Analytic derivation in the paper showing that constraints coming from the workers' budget constraint produce critical values of τ_ai and τ_f that determine transitions between the three regimes.
The model features quadratic self-amplification in both AI capability (λ A^2) and financial capital (γ_F K_f^2), coupled through investment flows.
Model specification and equations in the paper showing terms λ A^2 for AI capability growth and γ_F K_f^2 for financial capital growth, with explicit investment flow terms linking AI and financial capital.
In the U.S., no single 'AI Act' has passed (as of 2026).
Stated in the paper as a factual legal/policy status; this is verifiable via legislative records and is presented without an underlying sample (paper cites status as of 2026).
This paper focuses on five research questions about the historical pathways, leverage points, trajectory differences, alternative projects, and socio-technical programmes related to current dominant generative AI tools and possible AGI-adjacent development.
Explicit listing of the five research questions in the paper's introduction/aims; statement of scope and focus.
Online-safety regulation under the UK Online Safety Act and the EU Digital Services Act increasingly treats scalar metrics as compliance evidence.
Statement in paper's introduction / motivation; cites policy trend (UK Online Safety Act and EU Digital Services Act) as motivating context (policy texts referenced in paper).
Prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions.
Conceptual critique presented by the authors; no quantitative validation presented for this claim within the excerpt.
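The point about unordered routing decisions can be illustrated with a small sketch. It assumes, without confirmation from the excerpt, that a handoff F1 is computed over the set of agents a task was routed to; the function `set_f1` and the agent names are hypothetical.

```python
def set_f1(predicted, reference):
    """F1 over unordered sets of handoff targets (an assumed form of HF1)."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical routing traces: same agents, different order.
reference = ["triage", "billing", "refund"]
reordered = ["refund", "triage", "billing"]

print(set_f1(reference, reference))
print(set_f1(reordered, reference))
```

Under this assumed definition both traces receive the same score, so a set-based metric cannot tell a correct handoff sequence from a scrambled one.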
For over a century, the electric grid has relied on a single statistical assumption: load diversity, the principle that the uncorrelated demands of millions of small consumers produce a smooth, predictable aggregate.
Statement and historical framing presented by the paper as background context; no empirical time series or citations provided in the excerpt.
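The load-diversity principle can be illustrated with a short simulation. This is a sketch under stated assumptions, not the paper's analysis: each consumer's demand is drawn independently from a uniform distribution (a hypothetical choice), and `aggregate_cv` is an illustrative name. For independent demands, the coefficient of variation of the total shrinks roughly in proportion to one over the square root of the number of consumers.

```python
import random
import statistics


def aggregate_cv(n_consumers, n_periods=200, seed=0):
    """Coefficient of variation of total demand across simulated periods.

    Each consumer's demand per period is an independent draw from
    uniform(0, 2); this distribution is an assumption for illustration.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.uniform(0.0, 2.0) for _ in range(n_consumers))
        for _ in range(n_periods)
    ]
    return statistics.pstdev(totals) / statistics.fmean(totals)


for n in (10, 100, 1000):
    print(n, round(aggregate_cv(n), 4))
```

The smoothing depends on the demands being uncorrelated; if many consumers moved together, their variations would add rather than cancel and the aggregate would not flatten in this way.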
The study constructs a tripartite evolutionary game framework composed of government regulators, leading computing power incumbents, and downstream AI innovators to analyze strategic interactions and derive evolutionarily stable strategies.
Methodological claim documented in the paper describing the model structure and analytic approach (method: formal model specification and ESS derivation).
The study employs a System GMM estimator to address potential endogeneity and uses Fixed Effects (FE) and Random Effects (RE) models for robustness checks.
Methodological statement in the paper describing the econometric approach; verifiable from the methods section (no sample size or instrumentation details provided in the supplied text).
AI learns from both explicit knowledge (papers, documentation, structured databases) and implicit knowledge (reasoning patterns, debugging processes, intermediate steps).
Stated as a conceptual premise in the position paper; no empirical methods, sample, or quantitative data reported.
The paper evaluates 'Spec Kit' and 'TDAD' as instantiations of the SGM via a four-month pilot study.
Empirical pilot evaluation reported in the paper; duration specified as four months. Sample size or number of teams/participants in pilot not specified in the summary.
The paper identifies two amplifying mechanisms for PRP: the code review bottleneck and the context window constraint.
Theoretical argumentation in the paper naming two mechanisms that amplify the PRP phenomenon (qualitative explanation).
The paper formally defines PRP with three moderating variables: task abstraction, codebase maturity, and developer experience.
Theoretical/formal definition presented in the paper identifying three moderators; claim is descriptive of the paper's conceptual model.
This paper conducted a multivocal literature review of 67 sources spanning 2022–2026.
Statement of method in the paper describing the literature review (count of sources = 67).
Telemetry across 10,000+ developers shows flat delivery metrics (no improvement in delivery outcomes) despite changes in PR and review behavior.
Observational telemetry across >10,000 developers reported in the paper; described result is no meaningful change in delivery metrics (e.g., delivery throughput, lead time) despite increases in PRs and longer reviews.
The medium of exchange of the traditional economy is mainly the fiat currency of each country or region, and when cross-border transactions occur, they need to be settled according to the exchange rate.
Author's descriptive statement based on general observation of monetary systems; no empirical sample or study data provided in the excerpt.
Determining how much value individual data contributions bring to the network remains an open problem.
Literature gap claim in paper (review of existing approaches and statement of open problem; no empirical sample).
The review uses a collection of qualitative and quantitative approaches (i.e., it synthesizes both qualitative and quantitative studies).
Explicit methodological description in the abstract indicating mixed-methods literature synthesis.
A collection of qualitative and quantitative approaches reveals predictors of technological integration, including organisational preparedness, economic factors, policies, and human capital.
Statement about the review's synthesized findings from multiple qualitative and quantitative studies identifying these predictors; method = mixed-methods literature synthesis.
The primary technologies covered in this review are Electronic Health Records (EHR), telemedicine, artificial intelligence (AI), and the Internet of Things (IoT).
Explicit topical scope statement in the paper (description of review subjects); based on the paper's own selection of topics for review.
We introduce a public benchmark dataset of 11,500 user queries to support our study and future research of generative search.
Authors constructed and released a public benchmark dataset containing 11,500 real-user queries (dataset release described in the paper).
The literature review employs the PRISMA model to screen, identify, and synthesize available literature on AI, Machine Learning and Deep Learning in promoting managerial productivity and task efficiency.
Methodological statement in the paper's abstract (explicitly states use of PRISMA for screening and synthesis).
The Iterated Prisoner's Dilemma (IPD) is an effective scenario for probing cooperative behavior and the influence of visual inputs on VLM decision-making.
Methodological choice described in the paper: experiments were structured around repeated IPD games to operationalize cooperative vs. selfish decisions under visual priming conditions.
The paper introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition.
Methodological contribution: introduction/definition of specific operational metrics as stated in the paper.
The paper formulates a geo-distributed inference placement model with feasibility masks and migration frictions.
Methodological/modeling contribution described in the paper; specifies modeling components (feasibility masks, migration frictions).
The paper distinguishes physical electricity transmission from digital relocation of electricity-consuming computation.
Conceptual/analytic distinction explicitly stated as a contribution in the paper.
We develop an energy-geography framework for geo-distributed AI inference that models a three-layer architecture of clients, service nodes, and compute nodes, and formulates inference placement as a constrained optimization problem over electricity prices, marginal carbon intensity, power usage effectiveness, compute capacity, network latency, and migration frictions.
Methodological contribution described in the paper: formulation of a modeling/optimization framework and specification of variables considered.
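A heavily simplified sketch of placement as constrained optimization is given below. It is not the paper's model: it collapses the three-layer architecture into a flat list of candidate nodes, treats latency and capacity as hard feasibility filters, and adds a fixed migration cost when the workload moves. All names (`Node`, `placement_cost`, `choose_node`), the carbon price, and the node values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    price_per_kwh: float      # electricity price
    carbon_g_per_kwh: float   # marginal carbon intensity
    pue: float                # power usage effectiveness
    latency_ms: float         # network latency to the client
    free_capacity: float      # available compute units


def placement_cost(node, energy_kwh, current, carbon_price_per_g, migration_cost):
    """Energy plus priced carbon, scaled by PUE, plus a friction if we move."""
    facility_kwh = energy_kwh * node.pue
    unit_cost = node.price_per_kwh + carbon_price_per_g * node.carbon_g_per_kwh
    friction = 0.0 if node.name == current else migration_cost
    return facility_kwh * unit_cost + friction


def choose_node(nodes, demand, energy_kwh, max_latency_ms, current,
                carbon_price_per_g=0.0001, migration_cost=0.5):
    """Pick the cheapest node among those passing the feasibility filters."""
    feasible = [
        n for n in nodes
        if n.latency_ms <= max_latency_ms and n.free_capacity >= demand
    ]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda n: placement_cost(
            n, energy_kwh, current, carbon_price_per_g, migration_cost
        ),
    )


# Hypothetical candidate nodes.
NODES = [
    Node("local", 0.20, 400.0, 1.5, 10.0, 100.0),
    Node("hydro", 0.05, 20.0, 1.1, 60.0, 100.0),
    Node("far", 0.03, 10.0, 1.1, 200.0, 100.0),
]

big = choose_node(NODES, demand=10, energy_kwh=10.0, max_latency_ms=80.0, current="local")
small = choose_node(NODES, demand=10, energy_kwh=1.0, max_latency_ms=80.0, current="local")
print(big.name, small.name)
```

With these illustrative numbers the larger job relocates because its energy savings exceed the migration cost, while the smaller job stays put, and the cheapest node is never chosen because it fails the latency filter.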
Inference workloads can sometimes be executed away from the user-facing service location, provided that latency, state locality, capacity, and regulatory constraints remain acceptable.
Conceptual claim and modeling premise stated in the paper; used as an assumption motivating the relocation/placement model rather than an empirical finding.
The paper traces near-term evolutionary trajectories for digital proto-life through three narratives: Lamarck (self-modifying coding agents), Remora (resource-seeking companion chatbots), and Mycelium (DAO-LLC trading bots).
Methodological statement in the abstract: exploratory scenario method with three specified narrative scenarios; descriptive rather than empirical.
Fears of AI automation do not primarily increase support for traditional interventions such as unemployment benefits and training programs.
Comparative analysis of policy preference responses in the 2024 OECD 'Risks that Matter' survey as reported in the paper.
The paper develops a typology of enterprise applications by their sensitivity to AI-induced shifts in make-or-buy economics.
Paper's stated contribution (conceptual typology based on analysis of application categories and AI sensitivity).
This paper adopts a conceptual research approach, combining transaction cost economics and the resource-based view with an assessment of current AI capabilities, to systematically re-evaluate the factors underlying the make-or-buy decision.
Paper's stated methodology and theoretical framing (methodological claim about the paper itself).
At this stage, AI adoption in Israel does not result in widespread layoffs; its primary impact lies in restructuring the labor market through a slowdown in recruitment, changes in job composition, and the emergence of new AI-related roles.
Empirical claim reported in the paper; the excerpt does not specify datasets, time periods, or sample sizes supporting this observation.
We run over 1,100 games with over 16,000 private conversations totaling 15.2 million tokens and over 150,000 player actions.
Dataset and experimental log statistics reported in the paper.
We run AI-only games and conduct a user study pitting human players against AI opponents.
Method statement in the paper describing experiments with both AI-only and human-vs-AI games.
Players have asymmetric objectives and negotiations are non-binding, allowing alliances to form and break as players' short-term interests align and diverge.
Specification of game mechanics and rules in the paper (design features of C2C).
We introduce Cooperate to Compete (C2C), a multi-agent environment where players can engage in private negotiations while competing to be the first to achieve their secret objective.
Description of a newly developed environment (paper introduces the game and its rules/design).
We develop an analytical model in which a firm jointly chooses AI deployment and cybersecurity investment under this governance-capability gap.
Methodological claim: the paper presents an analytical (theoretical) model describing joint choice of deployment and cybersecurity investment.
Foundational research on AI identity is the central conclusion of this report.
Authors' stated conclusion of the paper.
We define AI Identity as the continuous relationship between what an AI agent is declared to be and what it is observed to do, bounded by the confidence that those two things correspond at any given moment.
Conceptual definition presented by the authors (conceptual/terminological contribution rather than empirical evidence).
The sign reversal is a structural consequence of the reviewer effort collapse under log-concave quality distributions; this is proved analytically.
Formal analytical proofs in the paper that use the assumption of log-concave quality distributions to show the mechanism producing the sign reversal.
We formalize the distinction between compensatory and non-compensatory decision regimes and define a pre-execution legitimacy boundary.
Theoretical formalization presented in the paper (definitions and conceptual framework). No empirical evidence or sample size provided.
Most existing approaches implicitly assume that once a decision is produced, it is eligible for execution.
Author assertion / conceptual critique of existing approaches presented in the paper (no empirical test reported).