Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you're looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics, each of which can be used to filter the claims below. Governance is the active filter.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | 0 | 54 |
| Worker Turnover | 15 | 15 | 0 | 3 | 33 |
| Industry | 0 | 0 | 0 | 1 | 1 |
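Row totals range from 1 to 2131, so raw counts are hard to compare across categories. The following minimal sketch normalizes a few rows copied from the table to shares of each row's reported Total. It also computes a crude net score: positive minus negative, divided by Total. The net score is an illustrative choice and is not a metric the Explorer or Syntheses are known to use. Shares are taken over the reported Total because in several rows the four direction columns sum to less than it (Other: 2065 vs 2131), so the shares need not add to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeRow:
    name: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int  # the reported Total column, which can exceed the four directions' sum

    def shares(self) -> dict:
        """Each direction's count as a fraction of the reported total."""
        counts = {"positive": self.positive, "negative": self.negative,
                  "mixed": self.mixed, "null": self.null}
        return {k: v / self.total for k, v in counts.items()}

    def net_positive(self) -> float:
        """(positive - negative) / total: a crude balance score between -1 and 1."""
        return (self.positive - self.negative) / self.total


# Four rows copied from the outcome table.
ROWS = [
    OutcomeRow("Output Quality", 503, 196, 61, 53, 813),
    OutcomeRow("Firm Productivity", 455, 58, 92, 20, 631),
    OutcomeRow("Job Displacement", 12, 83, 23, 1, 119),
    OutcomeRow("Skill Obsolescence", 5, 50, 6, 1, 62),
]

if __name__ == "__main__":
    # Print rows from most negative to most positive balance.
    for row in sorted(ROWS, key=OutcomeRow.net_positive):
        s = row.shares()
        print(f"{row.name:<20} positive={s['positive']:.2f} "
              f"negative={s['negative']:.2f} net={row.net_positive():+.2f}")
```

Sorting by the net score puts Skill Obsolescence and Job Displacement at the negative end and Firm Productivity at the positive end. It says nothing about study quality or effect size.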
Governance
Most existing approaches to AI safety, risk management, and governance focus on post-hoc validation, probabilistic risk estimation, or certification of model behavior.
Author statement summarizing the literature / prior work in AI safety and governance (conceptual claim in the paper's introduction). No empirical survey or sample size reported.
We develop a formal model in which institutions choose the scale of automation, the degree of codification, and safeguards on iterative use.
Methodological statement: the paper presents a formal/theoretical model specifying institutional choice variables (model description rather than empirical result).
We compare LLM-guided bidding against truthful and heuristic strategies using the Vickrey-Clarke-Groves (VCG) mechanism as a benchmark for incentive-compatible, dominant-strategy truthfulness.
Methodological claim describing the comparative experimental design: simulations use VCG as benchmark and include comparisons to truthful and heuristic bidding strategies. No sample size or detailed experimental parameters are provided in the excerpt.
When the theoretical assumptions guaranteeing truthfulness hold, LLM bidders recover near-equilibrium outcomes consistent with VCG predictions.
Simulation experiments comparing LLM-guided bidding to the VCG benchmark and to truthful/heuristic strategies under conditions where VCG assumptions are satisfied. The paper reports that LLM outcomes were close to the VCG-predicted equilibrium. No numeric sample size or quantitative effect sizes reported in the provided text.
We investigate the use of Large Language Models (LLMs) as bidding agents in repeated 6G spectrum auctions with budget constraints in vehicular networks.
Descriptive statement of the study design: the paper reports simulation/experimental evaluation where each user equipment (UE) is modeled as a rational player in repeated spectrum auctions; comparison against truthful and heuristic strategies under Vickrey-Clarke-Groves (VCG) benchmark. No numeric sample size reported in the provided text.
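The VCG benchmark is easiest to see in the simplest spectrum setting: k identical channels and bidders who each want at most one. There, VCG gives the channels to the k highest bids. It charges each winner the welfare loss its presence imposes on everyone else, which works out to the (k+1)-th highest bid. The sketch below covers only that unit-demand special case. It ignores the budget constraints and repeated rounds the paper studies, and the function names and bids are hypothetical.

```python
def top_k_welfare(values, k):
    """Total value when k identical items go to the k highest unit-demand bids."""
    return sum(sorted(values, reverse=True)[:k])


def vcg_unit_demand(bids, k):
    """VCG outcome for k identical items and unit-demand bidders.

    bids maps bidder id -> reported value. Returns (winners, payments), where each
    winner pays the welfare loss its presence imposes on the other bidders.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Welfare-maximizing allocation: highest bids win; ties broken by id for determinism.
    ranked = sorted(bids, key=lambda b: (-bids[b], b))
    winners = ranked[:k]
    total = top_k_welfare(bids.values(), k)
    payments = {}
    for w in winners:
        others = [v for b, v in bids.items() if b != w]
        # Others' best welfare without w, minus others' welfare in the chosen allocation.
        payments[w] = top_k_welfare(others, k) - (total - bids[w])
    return winners, payments


if __name__ == "__main__":
    # Hypothetical reports from four user equipments (UEs) competing for two channels.
    winners, payments = vcg_unit_demand({"ue1": 9.0, "ue2": 7.0, "ue3": 4.0, "ue4": 2.0}, k=2)
    print(winners, payments)  # ['ue1', 'ue2'] {'ue1': 4.0, 'ue2': 4.0}
```

With these payments, bidding one's true value is a dominant strategy, which is what makes VCG a truthfulness benchmark. That guarantee relies on quasilinear utilities. Binding budget constraints violate that assumption, so truthful bidding need no longer be dominant.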
The study is based on a repeated cross-sectional survey of licensed employees at a non-university research institution.
Author statement in the abstract describing the method: a repeated cross-sectional survey of licensed employees at the research institution studied.
The paper provides a natural definition of benchmark hacking in this strategic context by comparing a player's equilibrium effort allocation to that of a single-agent baseline scenario.
Conceptual/theoretical definition introduced in the model comparing equilibrium effort allocations to a single-agent (non-competitive) baseline.
We study this question using 10,659 matched human-agent pairs from Moltbook, a social media platform where each autonomous agent is publicly linked to its owner's Twitter/X account.
Descriptive statement of the study dataset reported in the paper: dataset of 10,659 matched human-agent pairs from Moltbook with public linkage to owner's Twitter/X account.
The paper proposes a conceptual framework linking AI adoption to employability and role transformation, mediated by skill adaptation, continuous learning, and organizational readiness.
Author-proposed conceptual framework presented in the review paper (theoretical linkage based on literature synthesis).
This study takes food delivery riders as its research subject and analyzes the dilemma of determining labor relations under AIGC (AI-generated content).
Methodological statement in the paper specifying the chosen subject of analysis (food delivery riders); this is an explicit description of the paper's scope rather than an empirical finding.
The paper develops an interdisciplinary conceptual framework that integrates insights from economics, management theory, and digital governance to characterize algorithmic enterprises.
Methodological claim about the paper's approach; stated in abstract as the paper's contribution (conceptual framework built from interdisciplinary literature).
Future research should strengthen cross-national comparisons, longitudinal tracking, and interdisciplinary collaboration to support development of a technology governance framework that balances efficiency with equity.
Author recommendation based on identified research gaps in the literature review (prescriptive/recommendation).
Existing research has clear gaps: limited evidence from developing-country contexts, insufficient attention to within-occupation heterogeneity, incomplete accounts of psychological mechanisms underlying AI anxiety, and a shortage of rigorous evaluations of reskilling policy effectiveness.
Author's assessment based on the reviewed literature identifying thematic gaps and methodological limitations (critical literature review).
This study leverages the establishment of National New-Generation Artificial Intelligence Innovation and Development Pilot Zones as a quasi-natural experiment and employs a multi-period DID model on A-share listed manufacturing firms from 2010 to 2023.
Methodological description provided in the paper: policy rollout as quasi-natural experiment; multi-period difference-in-differences estimation; sample frame specified as A-share listed manufacturing firms on the Shanghai and Shenzhen Stock Exchanges, 2010–2023.
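A multi-period (staggered) DID of this kind is commonly estimated as a two-way fixed-effects (TWFE) regression, y_it = α_i + λ_t + β·D_it + ε_it. Here D_it turns on once firm i is located in a designated pilot zone. The sketch below fits that regression with NumPy least squares on a tiny synthetic panel. The simulated data, the absence of controls and clustered standard errors, and the constant treatment effect are illustrative assumptions, not the paper's specification.

```python
import numpy as np


def twfe_did(y: np.ndarray, treated: np.ndarray) -> float:
    """Two-way fixed-effects estimate of beta in y_it = a_i + l_t + beta * D_it + e_it.

    y and treated have shape (n_units, n_periods); treated is 0/1 and may switch on
    at different periods for different units (staggered adoption).
    """
    n_units, n_periods = y.shape
    unit_idx = np.repeat(np.arange(n_units), n_periods)  # matches row-major ravel()
    time_idx = np.tile(np.arange(n_periods), n_units)
    unit_dummies = np.eye(n_units)[unit_idx]
    # Drop period 0 so the time dummies are not collinear with the unit dummies.
    time_dummies = np.eye(n_periods)[time_idx][:, 1:]
    X = np.column_stack([treated.ravel(), unit_dummies, time_dummies])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return float(coef[0])


def simulate_panel(seed: int = 0, beta: float = 2.5):
    """Hypothetical staggered panel: 10 never-treated units, 10 treated from t=3, 10 from t=5."""
    rng = np.random.default_rng(seed)
    n_periods = 8
    adopt = np.array([np.inf] * 10 + [3] * 10 + [5] * 10)
    treated = (np.arange(n_periods)[None, :] >= adopt[:, None]).astype(float)
    unit_fe = rng.normal(size=(len(adopt), 1))
    time_fe = rng.normal(size=(1, n_periods))
    y = unit_fe + time_fe + beta * treated  # no noise and a homogeneous effect
    return y, treated


if __name__ == "__main__":
    y, treated = simulate_panel()
    print(round(twfe_did(y, treated), 6))  # 2.5
```

Because the synthetic effect is identical for every cohort, TWFE recovers it exactly. When effects differ across adoption cohorts or over time, plain TWFE can be biased. This is why heterogeneity-robust staggered-DID estimators are often reported alongside it.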
The First Fundamental Theorem of Welfare Economics assumes that welfare-bearing agents are autonomous and implicitly relies on a binary distinction between autonomy and instrumentality.
Explicit statement in the paper's introduction/abstract describing the theorem's assumptions; conceptual/theoretical textual analysis (no empirical sample).
This paper was generated by AI, using https://github.com/chenandrewy/ralph-wiggum-asset-pricing/.
Author statement in the abstract declaring the paper was generated by AI and providing a GitHub link.
This review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Methodological statement in the paper's abstract indicating PRISMA adherence; no further protocol details or study counts provided in the abstract.
The paper foregrounds industrial firms' own digital agency as a less understood aspect in the literature on digitalization and governance.
Authors' positioning of their contribution and literature review claim in the paper (qualitative/theoretical claim).
The analysis is limited to OECD economies and monthly aggregate data, which constrains generalizability.
Study design: monthly panel of 38 OECD economies from 2000–2024 as stated in paper; author-reported limitation.
Digital trade alone has no statistically significant effect on CO2 emissions (β = −0.030).
Fixed-effects econometric specification on the monthly panel of 38 OECD economies (2000–2024); the coefficient is reported but is not statistically significant.
We evaluate 20 state-of-the-art LLMs on their ability to predict empirically supported causal directions.
Experimental evaluation: 20 LLMs tested on the benchmark (10,490 triplets, including 1,056 contested instances) to predict empirically verified causal signs.
From 10,490 causal triplets (treatment-outcome pairs with empirically verified effect directions) derived from top-tier economics and finance journals, we identify 1,056 ideology-contested instances.
Construction/extension of the EconCausal benchmark by selecting 10,490 causal triplets from top-tier economics and finance journals and labeling 1,056 as ideology-contested (intervention- vs market-oriented divergence).
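Scoring this kind of evaluation comes down to comparing each model's predicted effect sign with the verified sign. Accuracy is reported for the full set and separately for the ideology-contested subset. The following is a minimal sketch. The record layout and field names are hypothetical and are not the benchmark's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    treatment: str
    outcome: str
    verified_sign: int   # +1 or -1, the direction reported in the source paper
    predicted_sign: int  # +1 or -1, parsed from the model's answer
    contested: bool      # True for ideology-contested instances


def sign_accuracy(items: list) -> float:
    """Share of triplets whose predicted sign matches the verified sign."""
    if not items:
        raise ValueError("no triplets to score")
    return sum(t.predicted_sign == t.verified_sign for t in items) / len(items)


def accuracy_report(items: list) -> dict:
    """Accuracy on all triplets and separately on contested and uncontested subsets."""
    contested = [t for t in items if t.contested]
    uncontested = [t for t in items if not t.contested]
    report = {"all": sign_accuracy(items)}
    if contested:
        report["contested"] = sign_accuracy(contested)
    if uncontested:
        report["uncontested"] = sign_accuracy(uncontested)
    return report


if __name__ == "__main__":
    # Placeholder triplets; T*/Y* stand in for real treatment and outcome names.
    demo = [
        Triplet("T1", "Y1", 1, 1, True),
        Triplet("T2", "Y2", -1, 1, True),
        Triplet("T3", "Y3", 1, 1, False),
        Triplet("T4", "Y4", -1, -1, False),
    ]
    print(accuracy_report(demo))
```

Running the same report per model, for each of the evaluated LLMs, makes gaps between contested and uncontested accuracy directly comparable.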
The governance of open-weight artificial intelligence (AI) models has been framed as a binary choice: openness as risk, restriction as safety.
Literature and policy framing review presented in the paper (conceptual/argumentative analysis).
This is an exploratory and qualitative state-of-practice study grounded in over 30 interviews across four stakeholder groups (large enterprises, small/medium firms, AI developers, and CAD/CAM/CAE vendors).
Methodological statement in the paper describing study design and sample composition.
Key breakthroughs needed include integration with traditional engineering tools and data types, robust verification frameworks, and improved spatial and physical reasoning.
Interviewee-identified requirements compiled from over 30 interviews; stakeholders repeatedly pinpoint integration, verification, and spatial/physical reasoning as priority technical advances.
We conduct a controlled experiment where AI agents trade in a prediction market after receiving private signals, measuring information aggregation by the log error of the last price.
Statement of experimental design and measurement approach in the paper: laboratory-style controlled experiment, private signals given to agents, log error of last price used to quantify aggregation.
Allowing strategic prompting does not affect information aggregation.
Experimental manipulation that included strategic prompting of AI agents prior to trading; aggregation measured by log error of last price; observed no effect.
Changing the initial price does not affect information aggregation.
Experimental condition varying the initial market price and measuring resulting aggregation performance (log error of last price); reported no effect.
Changing the duration of the market does not affect information aggregation.
Experimental manipulation of market duration in the trading experiment; measured aggregation (log error of last price) across durations and found no effect.
Allowing cheap talk communication does not affect information aggregation.
Experimental condition comparing markets with and without cheap talk communication; aggregation measured by log error of the last price; reported no effect.
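To make the metric concrete, suppose the traded asset pays 1 if an event occurs. Suppose also that each agent privately observes an independent binary signal that is correct with probability q. Full information aggregation would then put the price at the Bayesian posterior given all signals. The log error of the last price measures how far the market ended from that benchmark. The excerpts do not state the paper's signal structure or its exact reference price. Both the binary-signal setup and the use of the pooled posterior as the benchmark are therefore assumptions of this sketch.

```python
import math


def pooled_posterior(signals: list, q: float, prior: float = 0.5) -> float:
    """P(event | all private signals), each signal independently correct with prob. q.

    A signal of 1 points to the event, 0 points against it.
    """
    if not 0.5 < q < 1.0 or not 0.0 < prior < 1.0:
        raise ValueError("need 0.5 < q < 1 and 0 < prior < 1")
    ones = sum(signals)
    zeros = len(signals) - ones
    # Each 1 multiplies the odds by q/(1-q); each 0 divides them by the same factor.
    log_odds = math.log(prior / (1 - prior)) + (ones - zeros) * math.log(q / (1 - q))
    return 1.0 / (1.0 + math.exp(-log_odds))


def log_error(last_price: float, benchmark: float) -> float:
    """Absolute log distance between the final price and the benchmark price."""
    if last_price <= 0 or benchmark <= 0:
        raise ValueError("prices must be positive")
    return abs(math.log(last_price) - math.log(benchmark))


if __name__ == "__main__":
    # Hypothetical market: three traders, signal accuracy 0.7, last traded price 0.55.
    benchmark = pooled_posterior([1, 1, 0], q=0.7)
    print(round(benchmark, 3), round(log_error(0.55, benchmark), 3))
```

Consider a manipulation reported to "not affect information aggregation", such as strategic prompting, initial price, duration, or cheap talk. In these terms, it would leave the distribution of this error across markets essentially unchanged.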
The study analyzes AI policies issued by provincial-level governments in China using a policy instrument framework and fuzzy-set qualitative comparative analysis (fsQCA).
Methods statement in the paper describing dataset (provincial-level AI policy documents), theoretical framing (policy instrument framework), and analytic method (fsQCA).
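In fsQCA, whether a configuration of conditions is sufficient for an outcome is usually judged with Ragin's consistency and coverage measures:

- consistency = Σ min(x_i, y_i) / Σ x_i
- coverage = Σ min(x_i, y_i) / Σ y_i

Here x_i is case i's membership in the configuration (the minimum over its conditions) and y_i is its membership in the outcome. The following is a minimal sketch of those two formulas. The condition names (supply-side and demand-side instruments, a common split in policy-instrument frameworks) and all membership scores are hypothetical, not taken from the provincial policy data.

```python
def fuzzy_and(*conditions: list) -> list:
    """Membership in a conjunction of conditions: the case-wise minimum."""
    return [min(values) for values in zip(*conditions)]


def fuzzy_not(condition: list) -> list:
    """Membership in the negated condition."""
    return [1.0 - m for m in condition]


def _overlap(x: list, y: list) -> float:
    return sum(min(a, b) for a, b in zip(x, y))


def consistency(x: list, y: list) -> float:
    """Degree to which configuration X is a subset of (sufficient for) outcome Y."""
    return _overlap(x, y) / sum(x)


def coverage(x: list, y: list) -> float:
    """Share of outcome membership Y accounted for by configuration X."""
    return _overlap(x, y) / sum(y)


if __name__ == "__main__":
    # Hypothetical calibrated memberships for four provinces.
    supply_side = [0.9, 0.8, 0.3, 0.6]
    demand_side = [0.7, 0.6, 0.2, 0.9]
    outcome = [0.8, 0.5, 0.4, 0.6]
    config = fuzzy_and(supply_side, demand_side)
    print(round(consistency(config, outcome), 3), round(coverage(config, outcome), 3))
```

Configurations whose consistency clears a chosen cutoff (0.8 is a common choice) are treated as sufficient. Coverage then indicates how much of the outcome they account for.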
Five major themes emerged from the review: (1) Machine Learning for Credit Risk Assessment and Financial Inclusion; (2) Deep Learning and Neural Networks for Market Prediction and Volatility Forecasting; (3) Natural Language Processing and Sentiment Analysis for Decision Support; (4) AI-Based Fraud Detection and Operational Risk Management; and (5) Explainable AI, Regulatory Technology, and Governance Frameworks.
Thematic synthesis of the 64 retained studies reported in results; explicit listing of five themes in the paper's Results section.
We conducted a scoping review across four major databases (including SciSpace, Google Scholar, and arXiv) covering publications from 2019 to 2025 and retained 64 unique studies after deduplication and screening.
Methods section: Arksey and O'Malley framework (enhanced by Levac et al.), explicit database search (SciSpace, Google Scholar, arXiv), timeframe stated (2019–2025), and reported final sample of 64 studies after deduplication and screening.
This study proposes a framework for evaluating platform ecosystems by their long-term effects on human capital formation and institutional resilience.
Methodological contribution claimed by the paper (development of an evaluative framework); presented as part of the paper's contributions rather than an empirical finding.
The empirical analysis covers MENA economies over the period 2010–2023.
Paper explicitly states the temporal and geographic scope: MENA economies, 2010–2023.
The study employs a dynamic panel data approach using the System Generalized Method of Moments (System GMM) estimator to address endogeneity, unobserved heterogeneity, and persistence effects.
Methods statement in the paper describing the use of System GMM for panel data covering MENA economies over 2010–2023.
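For orientation, a generic one-lag dynamic panel and the moment conditions behind the Blundell-Bond System GMM estimator are shown below. The regressors x_it, the single lag, and the instrument sets are generic placeholders rather than the paper's specification. The difference-equation moments assume no serial correlation in ε_it. The extra level-equation moments additionally assume that Δy_{i,t-1} is uncorrelated with the unit effect η_i.

```latex
\begin{aligned}
y_{it} &= \rho\, y_{i,t-1} + \mathbf{x}_{it}'\boldsymbol{\beta} + \eta_i + \varepsilon_{it} \\
\Delta y_{it} &= \rho\, \Delta y_{i,t-1} + \Delta\mathbf{x}_{it}'\boldsymbol{\beta} + \Delta\varepsilon_{it} \\
\mathbb{E}\left[ y_{i,t-s}\, \Delta\varepsilon_{it} \right] &= 0, \quad s \ge 2 \quad \text{(difference equation)} \\
\mathbb{E}\left[ \Delta y_{i,t-1} \left( \eta_i + \varepsilon_{it} \right) \right] &= 0 \quad \text{(level equation)}
\end{aligned}
```

System GMM stacks both sets of moments. The usual diagnostics are Hansen tests of overidentifying restrictions and the Arellano-Bond test for second-order serial correlation in the differenced residuals.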
Four propositions formalize the gradient, cascade compounding, delegation-depth effects, and extension sufficiency, establishing boundary conditions for the framework's valid operating envelope.
Theoretical/formal propositions presented in the paper that articulate limits and conditions for the framework's applicability.
The framework is analytically assessed for transferability across four decision system architectures.
Paper reports an analytic (cross-architecture) assessment comparing framework applicability across four named decision system architectures.
A formal welfare framework, analogous to the Nordhaus optimal patent life, characterises the trade-offs and yields testable predictions.
Proposal of a formal theoretical framework by the authors (analogy to Nordhaus); presented as a modeling approach rather than as an implemented empirical model in the excerpt.
CRediT contributions, funding acknowledgements and AI disclosure statements illustrate the annulus lifecycle.
Empirical examples/case illustrations cited by the authors to demonstrate how different metadata types move through the annulus; no systematic empirical analysis or sample size provided in the excerpt.
By analogy with the efficient market hypothesis, the width of the innovation annulus measures production inefficiency, set by the interplay of friction and demand.
Theoretical analogy and conceptual mapping presented in the paper; no empirical calibration or measurement of 'width' reported in the excerpt.
The innovation annulus is a permanent, functional feature of the ecosystem, not a pathology to eliminate.
Normative/descriptive assertion by the authors based on their theoretical framing; no empirical longitudinal evidence provided in the excerpt.
We introduce the innovation annulus: the zone between freely available structured data and the advancing frontier of commercially refined knowledge products.
Definition/construct introduced by the authors as part of their conceptual framework; no empirical validation shown in the excerpt.
The real tension in scholarly knowledge infrastructure lies between the persistent cost of producing and refining structured metadata under deep technological friction, and the differentiated demands distinct communities place on data quality, focus and granularity.
Theoretical/analytical argument in the paper; presented as the central descriptive diagnosis rather than supported by empirical measurement in the excerpt.
The outreach casenotes used in the study are fairly short and heavily redacted.
Descriptive statement about the dataset of street outreach casenotes provided by the nonprofit partner used in the audit (direct observation by authors).
LLM zero-shot classification does not introduce additional textual biases beyond the algorithmic biases already present in tabular classification.
Authors' assessment/audit comparing zero-shot LLM classification using casenote text against tabular-only classification, concluding no additional textual bias introduced. (Details and sample size not provided in abstract.)
We conducted an in-the-wild evaluation with over 2,200 individuals from heterogeneous organisations and roles in 116 countries, via log analysis, surveys, and 20 interviews.
Reported evaluation methods and sample in the paper's abstract: log analysis, surveys, and 20 interviews with over 2,200 participants across 116 countries.
We measure processes of polarization and integration in global AI research over three decades using large-scale scientific publication data.
Methodological claim describing the study: the analysis spans three decades and uses large-scale publication data and network comparisons to randomized baselines.
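Comparing an observed network statistic with a randomized baseline separates genuine clustering from what group sizes alone would produce. The excerpt does not specify the paper's measures. This sketch therefore assumes one simple choice: the share of collaboration edges linking two nodes in the same group (say, the same region). That share is compared with its distribution when group labels are shuffled across nodes. A positive z-score indicates more within-group collaboration than chance (polarization); a negative one indicates less (integration). Function names and data are hypothetical.

```python
import random
import statistics


def within_group_share(edges: list, group: dict) -> float:
    """Fraction of edges whose two endpoints carry the same group label."""
    return sum(group[u] == group[v] for u, v in edges) / len(edges)


def shuffled_baseline(edges: list, group: dict, n_perm: int = 1000, seed: int = 0) -> list:
    """Within-group shares after randomly permuting labels across nodes.

    Permuting keeps group sizes and the network fixed, so the baseline reflects
    what group sizes alone would produce.
    """
    rng = random.Random(seed)
    nodes = list(group)
    labels = [group[n] for n in nodes]
    shares = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        shares.append(within_group_share(edges, dict(zip(nodes, labels))))
    return shares


def polarization_z(edges: list, group: dict, n_perm: int = 1000, seed: int = 0) -> float:
    """z-score of the observed within-group share against the shuffled baseline."""
    observed = within_group_share(edges, group)
    baseline = shuffled_baseline(edges, group, n_perm, seed)
    sd = statistics.pstdev(baseline)
    return (observed - statistics.fmean(baseline)) / sd if sd > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical co-authorship network: two blocs of three institutions, one bridge edge.
    group = {"a": "A", "b": "A", "c": "A", "d": "B", "e": "B", "f": "B"}
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    print(round(within_group_share(edges, group), 3), round(polarization_z(edges, group), 2))
```

Tracking such a z-score year by year over the study window is one way to express "processes of polarization and integration" as a time series.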
A stylized calibration to four providers using April 2026 data treats parameter values as inputs to a comparative risk mapping, not structural estimates.
Paper reports a calibration exercise using data from four providers (April 2026) and emphasizes it is a comparative mapping rather than structural estimation.
Discrimination (QoS gap) vanishes at a joint boundary rather than at a simple threshold in alpha alone.
Analytical result from the model characterizing the boundary conditions for non-discrimination.