Evidence (14922 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other, use the Evidence Explorer instead.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
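The counts above can be turned into shares per direction, which makes rows of very different size easier to compare. The following is a minimal sketch, not part of the site: the function name `direction_shares` and the `unlisted` key are hypothetical, and the three rows are copied by hand from the table. In some rows the four direction columns sum to less than the Total (for example, Other: 795 + 210 + 105 + 955 = 2065 against a Total of 2131). The page does not explain the gap, so treating it as claims with no listed direction is an assumption.

```python
# Direction counts copied from three rows of the table above.
# Tuple order: (positive, negative, mixed, null, total).
ROWS = {
    "Wages & Compensation": (78, 37, 25, 6, 146),
    "Job Displacement": (12, 83, 23, 1, 119),
    "Other": (795, 210, 105, 955, 2131),
}


def direction_shares(positive, negative, mixed, null, total):
    """Return each direction's share of the row total, plus any remainder.

    "unlisted" is a hypothetical label for claims that are counted in
    Total but do not appear in any of the four direction columns.
    """
    listed = positive + negative + mixed + null
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
        "unlisted": (total - listed) / total,
    }


if __name__ == "__main__":
    for outcome, counts in ROWS.items():
        shares = direction_shares(*counts)
        summary = ", ".join(f"{name} {value:.1%}" for name, value in shares.items())
        print(f"{outcome}: {summary}")
```

Read this way, 78 of the 146 Wages & Compensation claims (about 53%) are positive, while 83 of the 119 Job Displacement claims (about 70%) are negative. Dividing by the Total column rather than by the sum of the four columns keeps the shares comparable with the totals shown on the page.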
Heterogeneity analysis: universities bridge distant domains through knowledge diversity.
Stratified/heterogeneity analysis reported in the paper showing that university actors are associated with cross-domain bridging and higher measured knowledge diversity in the diffusion paths.
LinuxArena is the largest and most diverse control setting for software engineering to date.
Authors assert this comparative claim based on the reported scale and diversity (20 environments, 1,671 main tasks, 184 side tasks); no detailed comparison data included in the excerpt.
Our findings can help practitioners, educators, and policymakers promote responsible and effective use of AI tools.
Authors assert applicability of their qualitative findings and the proposed framework (derived from 22 interviews) to inform stakeholders.
We demonstrate that, by modifying the agent's tools (FreeCAD and the assembly solver), we are able to create a strong verification signal which enables our system to build 3D assemblies with movable parts.
Claim of experimental demonstration: authors state they modified tools (FreeCAD and the assembly solver) to create a verification signal that enables building 3D assemblies with movable parts. Implied evidence is demonstrations/experiments in the paper (details, sample sizes, benchmarks not included in excerpt).
This design decision allows AADvark to reason directly about assemblies with moving parts and can thereby achieve cross-cutting goals, including but not limited to mechanical movements.
Claim about functional consequence of the design choice (ability to reason about moving assemblies and achieve related goals); evidence implied to be from system behavior/demonstrations in the paper but not provided in the excerpt.
In simulation (chess, using learned human models from large-scale gameplay data), our approach consistently outperforms interventions based on the strongest chess engine (Stockfish) across a wide range of settings.
Simulation experiments in chess using models of human play trained from large-scale gameplay data; comparisons against Stockfish-based interventions (details described in paper).
This paper presents the first comparative study of game-theoretic mechanisms designed to enable cooperative outcomes between rational agents in equilibrium.
Authors' characterization of their contribution: a comparative study across four social dilemmas evaluating multiple mechanisms; no external validation provided in excerpt.
Data elements provide a unique mechanism that enables late‑entrant firms to catch up technologically.
Interpretation drawn from the observed stronger positive association between data factor utilization and AI patent output among low‑TFP (late‑entrant) firms in the panel analyses.
These results demonstrate that hierarchical agent systems can automate the full AI model development process from task specification to deployable model, suggesting a pathway toward broadly accessible AI development with minimal human intervention.
Interpretation and conclusion based on the reported evaluation results on MLE-Bench.
Extensive empirical observation in the paper suggests that context completeness may be more strongly associated with output quality than prompting technique alone.
Interpretive statement based on the observational study results described in the paper (comparisons of structured vs. baseline interactions and associated acceptance/iteration metrics).
AI adoption improves efficiency, cost reduction, and strategic innovation.
Synthesis across included empirical studies reporting organizational outcomes following AI implementation (effects reported qualitatively across the 27 studies).
Perceived complexity is not overly high (i.e., AI adoption was not seen as overly difficult to implement), which supports adoption.
Paper reports complexity as one of the DOI-related variables and states that adoption is not perceived as overly difficult—based on responses from 110 ICT professionals and PLS-SEM results.
Perceived operational benefits (clear operational value) of AI encourage its adoption.
Paper lists perceived benefits / operational value as one of the examined variables and reports it as a positive factor in the outcomes (from survey of 110, analyzed with PLS-SEM).
Exploitative innovation is associated with performance through incremental efficiency mechanisms.
Authors' interpretation of model results from the survey (104 managers) suggesting exploitative innovation improves performance via incremental efficiency, though specific mechanisms were not separately measured.
Progress on ClawBench brings us closer to AI agents that can function as reliable general-purpose assistants.
Author's concluding, aspirational statement regarding the benchmark's role in tracking and fostering progress.
This is the first impossibility result in AI governance, establishing a formal boundary below which current paradigms remain valid and above which distributed accountability mechanisms become necessary.
Claim of novelty in the paper (author assertion). The paper provides the formal theorem and discusses implications; novelty relative to prior literature is asserted but not empirically demonstrated.
This approach significantly reduced regression risk during large-scale refactoring.
Case study reports that model-generated tests plus constrained refactoring validated by passing tests reduced regression risk; qualitative claim without quantitative metric in abstract.
These findings establish a framework for evidence-based policy interventions to align the NIH AI portfolio with health equity goals and strategic research priorities.
Interpretive/concluding statement proposing that the reported portfolio analysis can inform policy interventions; framed as implication rather than an empirical result.
EcoAssist increased developers' awareness of energy use.
Reported outcome from the controlled study with 20 developers (stated increase in awareness); no numeric magnitude provided in excerpt.
The positive effect of AIRC on productivity is mediated through improvements in reproducibility.
Structural equation modeling (SEM) reports mediation through reproducibility metrics in the OECD panel analysis.
The positive effect of AIRC on productivity is mediated through improvements in review efficiency.
Structural equation modeling (SEM) indicates mediation paths from AIRC to productivity via measures of review efficiency in the panel data.
Human-in-the-loop governance is a practical lever to align GenAI productivity with environmental efficiency.
Interpretation of the experimental results: findings that certain prompt-based governance (operational constraints/decision rules) reduced footprint while preserving outputs, leading to the recommendation (argumentative claim).
Inference efficiency and system-level optimisation are growing rapidly in the Green AI literature.
Temporal / thematic analysis of literature cited in the paper's mapping (asserted growth; no growth rates or counts provided in abstract).
Exposing codebase-specific verification mechanisms may significantly improve the performance of externally trained agents operating in unfamiliar environments.
Paper suggests that providing access to repository-specific verification (tests, static analysis) could improve externally trained agents, based on the observed advantage of models that used validation tools.
Iterative verification helps achieve effective agent behavior.
Paper infers from analysis (models using iterative verification achieved better performance) that iterative verification contributes to effective agent behavior.
Experts (pooled) forecast annualized GDP growth rising to around 4% under a 'rapid' AI progress scenario.
Conditional survey forecasts elicited under a described 'rapid' AI capabilities scenario (abstract summarizes pooled expert forecasts across groups). Exact sample sizes not provided in excerpt.
As a consequence of these dynamics, 'algorithmic unions' (organised, coordinated resistance) may evolve organically as a survival strategy against over-optimized management systems.
Interpretation/implication drawn from the EGT model results (theoretical suggestion), not supported by empirical observations in the paper.
Coordinated digital green development strategies are important to promote a more balanced and inclusive transition toward China’s dual-carbon goals.
Policy implication drawn from the study's empirical findings (AI reduces inequality while green innovation has not diffused), recommending coordinated digital and green development to achieve balanced outcomes.
The analysis has specific implications for healthcare leadership and procurement (e.g., both should consider incentive and risk-allocation effects, not just task optimisation).
Authors' conclusions/recommendations drawn from the theoretical analysis and typology (prescriptive claim in the paper; no empirical evaluation reported in the abstract).
The occupational upgrading among women is consistent with task-based demand shifts associated with technological change and the entry of younger, more educated female cohorts.
Authors' interpretation linking observed reallocation patterns to task-based demand shifts and changing female cohort composition; supported by decomposition of employment flows and cohort/education patterns (as described).
These patterns suggest personality as a predictor of readiness beyond stage-based tailoring: vulnerable users benefit from targeted rather than comprehensive interventions.
Authors' inference from the clustered outcome patterns observed in the experiment (resilient/overcontrolled/undercontrolled differences) indicating personality moderates responsiveness to different intervention types.
Overcontrolled workers showed outcome-specific improvements with theory-driven AI.
Reported experimental finding: participants in the overcontrolled cluster improved on certain (outcome-specific) measures when assigned to the theory-driven AI (Trucey) condition.
Resilient workers achieved broad psychological gains primarily from the handbook.
Reported experimental result: resilient cluster exhibited broad psychological improvements, with the traditional negotiation handbook (Control-NoAI) producing those gains.
Autonomous coding agents, able to create branches, open pull requests, and perform code reviews, now actively contribute to real-world projects.
Empirical observations reported in the dataset and study showing agent-originated branches, PRs, and review actions in open-source projects (paper asserts these actions occurred in real projects).
Workplace organization (W) materially modifies the augmentation function so that two firms with identical technology investments can realize 'radically different' augmentation outcomes.
Conceptual claim supported by the paper's theoretical model (phi(D,W)) and cited empirical illustration (Colombia EDIT survey interaction result).
AI enhances innovation and productivity, even though it currently contributes to higher CO2 emissions.
Statement in the study linking AI adoption to improvements in innovation and productivity alongside the empirical finding of higher CO2 emissions (based on the same cross-country panel analysis over 2000–2023).
The revealed preference approach is a powerful mechanism for communicating human preferences to AI agents, but its success depends on careful implementation.
Overall findings from the online experiment showing higher predictive accuracy from revealed preferences combined with contextual results about subjects' choices and AI alignment; authors' synthesis and recommendation.
Because other AI systems exhibit similar scaling-law economics, the mechanisms identified extend beyond computer vision, reinforcing that partial automation is often the economically rational long-run outcome, not merely a transitional phase.
Theoretical argument generalized from scaling-law evidence in the paper; no additional cross-domain empirical evidence reported in the summary.
These findings support the practical value of structured intent representation as a robust, protocol-like communication layer for human-AI interaction.
Aggregate interpretation of the experimental results (cross-language variance reduction, model compensation pattern, equivalence of structured frameworks, and user-study improvements).
We further provide initial evidence that this AI-for-AI paradigm can transfer beyond the AI stack through experiments in mathematics and biomedicine.
Reported preliminary experiments in mathematics and biomedicine intended to test transfer beyond the AI development stack.
To our knowledge, ASI-Evolve is the first unified framework to demonstrate AI-driven discovery across three central components of AI development: data, architectures, and learning algorithms.
Authors' claim of primacy based on reported experiments demonstrating AI-driven discovery in pretraining data curation, neural architecture design, and reinforcement learning algorithm design.
Intelligent manufacturing policies can generate economically meaningful benefits by improving firms’ sustainability performance and the credibility of ESG information, which are central to capital allocation and the effectiveness of green governance.
Synthesis/implication drawn from the empirical findings reported in the paper (positive effects on ESG ratings, reduced greenwashing, and lower ESG uncertainty).
The growth of digital platforms contributes to the decentralization of job creation.
Paper cites contemporary data on the growth of digital platforms as part of its analysis (no specific platform-level datasets or sample sizes cited in the abstract).
The paper's predictions are consistent with practitioner reports.
Authors claim qualitative consistency with practitioner reports (no systematic survey or sample size given in the provided text).
The paper's predictions are consistent with empirical observations from scientific productivity data.
Authors state they compare model predictions to scientific productivity data (no sample sizes or dataset details given in the provided text).
The paper's predictions are consistent with empirical observations from AI coding benchmarks.
Authors state they compare model predictions to AI coding benchmark results (no sample sizes or specific benchmarks given in the provided text).
An AI planner that uses a mix of static analysis with AI instructions can create migration plans for very complex code components that are reliably followed by the combination of an orchestrator and coders, using AI-generated example-based playbooks.
Methodological description and reported demonstrations in the paper (planner + orchestrator + coders following playbooks); no numeric sample size reported in abstract.
AI-enabled ESG ratings, green innovation, ethical AI, RegTech, and explainable AI in finance are becoming highly influential in international financial markets.
Paper identifies these themes as emerging and influential based on trends in the reviewed literature and topical focus areas; no quantitative adoption metrics or sample sizes are provided in the excerpt.
With experience, users issue more targeted queries and engage more deeply with supporting citations.
Longitudinal analysis of user behavior in the Asta dataset showing changes over time/with experience: increased use of targeted queries and higher engagement (clicks/inspect actions) with citations.
Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways.
Interaction-log analysis showing patterns of revisits, non-linear navigation between generated outputs and cited evidence within sessions in the Asta dataset.