Evidence (14922 claims)

Search and filter individual claims pulled from the papers. If you're looking for a specific finding ("what's the effect on wages?"), you're in the right place. If you want to compare whole outcome categories against each other, use the Evidence Explorer.

The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics. Click one to filter the claims below.

Human-AI Collaboration

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome	Positive	Negative	Mixed	Null	Total
Other	795	210	105	955	2131
Governance & Regulation	886	414	197	126	1654
Organizational Efficiency	826	204	129	87	1257
Technology Adoption Rate	681	259	128	110	1189
Research Productivity	464	138	65	349	1028
Output Quality	503	196	61	53	813
Decision Quality	351	180	84	51	673
AI Safety & Ethics	238	288	71	34	637
Firm Productivity	455	58	92	20	631
Market Structure	186	172	123	25	511
Task Allocation	222	70	76	34	407
Innovation Output	238	28	48	18	334
Skill Acquisition	177	62	62	17	318
Employment Level	107	57	108	13	287
Fiscal & Macroeconomic	135	72	44	26	284
Firm Revenue	172	50	28	5	256
Consumer Welfare	121	68	45	12	246
Task Completion Time	183	33	10	13	240
Inequality Measures	45	126	50	6	227
Worker Satisfaction	95	74	23	12	204
Error Rate	77	98	11	4	190
Regulatory Compliance	84	73	17	7	181
Automation Exposure	61	61	27	14	166
Training Effectiveness	98	21	14	19	154
Wages & Compensation	78	37	25	6	146
Developer Productivity	105	18	14	6	144
Team Performance	87	17	28	10	143
Job Displacement	12	83	23	1	119
Hiring & Recruitment	53	8	8	3	72
Social Protection	39	17	8	2	66
Creative Output	32	20	8	3	64
Skill Obsolescence	5	50	6	1	62
Labor Share of Income	17	20	17	—	54
Worker Turnover	15	15	—	3	33
Industry	—	—	—	1	1
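Here is a minimal Python sketch of how a per-outcome, per-direction board like the one above could be tallied. The claim records and their `outcome`/`direction` fields are hypothetical, not the site's actual data model. The sketch's total is just the sum of the four direction columns. In the table above, some totals are larger than that sum (for Other, 795 + 210 + 105 + 955 = 2,065 versus 2,131), so the site's Total likely also counts claims outside these four directions.

```python
from collections import Counter, defaultdict

# Hypothetical claim records: each carries an outcome category and a direction.
claims = [
    {"outcome": "Wages & Compensation", "direction": "positive"},
    {"outcome": "Wages & Compensation", "direction": "negative"},
    {"outcome": "Job Displacement", "direction": "negative"},
    {"outcome": "Wages & Compensation", "direction": "positive"},
]

DIRECTIONS = ("positive", "negative", "mixed", "null")

def tally(records):
    """Count claims per outcome category, broken down by direction."""
    table = defaultdict(Counter)
    for rec in records:
        table[rec["outcome"]][rec["direction"]] += 1
    return table

def rows(table):
    """Return (outcome, positive, negative, mixed, null, total), largest total first."""
    out = []
    for outcome, counts in table.items():
        cells = [counts[d] for d in DIRECTIONS]  # a missing direction counts as 0
        out.append((outcome, *cells, sum(cells)))
    return sorted(out, key=lambda r: r[-1], reverse=True)

for row in rows(tally(claims)):
    print(*row, sep="\t")
```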

Heterogeneity analysis: universities bridge distant domains through knowledge diversity.

Stratified/heterogeneity analysis reported in the paper showing that university actors are associated with cross-domain bridging and higher measured knowledge diversity in the diffusion paths.

medium positive Mapping China’s digital transformation: a multilayer network... universities' role in bridging domains via knowledge diversity

LinuxArena is the largest and most diverse control setting for software engineering to date.

Authors assert this comparative claim based on the reported scale and diversity (20 environments, 1,671 main tasks, 184 side tasks); no detailed comparison data included in the excerpt.

medium positive LinuxArena: A Control Setting for AI Agents in Live Producti... relative size and diversity of the control setting compared to prior work

Our findings can help practitioners, educators, and policymakers promote responsible and effective use of AI tools.

Authors assert applicability of their qualitative findings and the proposed framework (derived from 22 interviews) to inform stakeholders.

medium positive Towards an Appropriate Level of Reliance on AI: A Preliminar... promotion of responsible and effective AI use (policy/education/practice guidanc...

We demonstrate that, by modifying the agent's tools (FreeCAD and the assembly solver), we are able to create a strong verification signal which enables our system to build 3D assemblies with movable parts.

Claim of experimental demonstration: the authors state that they modified the tools (FreeCAD and the assembly solver) to create a verification signal that enables the system to build movable 3D assemblies. The evidence is presumably demonstrations or experiments in the paper; details, sample sizes, and benchmarks are not included in the excerpt.

medium positive Agent-Aided Design for Dynamic CAD Models ability to build 3D assemblies with movable parts (enabled by enhanced verificat...

This design decision allows AADvark to reason directly about assemblies with moving parts and can thereby achieve cross-cutting goals, including but not limited to mechanical movements.

Claim about functional consequence of the design choice (ability to reason about moving assemblies and achieve related goals); evidence implied to be from system behavior/demonstrations in the paper but not provided in the excerpt.

medium positive Agent-Aided Design for Dynamic CAD Models reasoning about assemblies with moving parts / achieving mechanical movement goa...

In simulation (chess, using learned human models from large-scale gameplay data), our approach consistently outperforms interventions based on the strongest chess engine (Stockfish) across a wide range of settings.

Simulation experiments in chess using models of human play trained from large-scale gameplay data; comparisons against Stockfish-based interventions (details described in paper).

medium positive Improving Human Performance with Value-Aware Interventions: ... assisted player performance in simulations (chess game outcomes / score improvem...

This paper presents the first comparative study of game-theoretic mechanisms designed to enable cooperative outcomes between rational agents in equilibrium.

Authors' characterization of their contribution: a comparative study across four social dilemmas evaluating multiple mechanisms; no external validation provided in excerpt.

medium positive CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... existence of a comparative study of equilibrium-enabling mechanisms

Data elements provide a unique mechanism that enables late‑entrant firms to catch up technologically.

Interpretation drawn from the observed stronger positive association between data factor utilization and AI patent output among low‑TFP (late‑entrant) firms in the panel analyses.

medium positive The level of data element utilization in the integration of ... technological catch‑up (proxied by AI patent output increases among late entrant...

These results demonstrate that hierarchical agent systems can automate the full AI model development process from task specification to deployable model, suggesting a pathway toward broadly accessible AI development with minimal human intervention.

Interpretation and conclusion based on the reported evaluation results on MLE-Bench.

medium positive AIBuildAI: An AI Agent for Automatically Building AI Models feasibility of end-to-end automation of AI development and accessibility of AI d...

Extensive empirical observation in the paper suggests that context completeness may be more strongly associated with output quality than prompting technique alone.

Interpretive statement based on the observational study results described in the paper (comparisons of structured vs. baseline interactions and associated acceptance/iteration metrics).

medium positive Context Engineering: A Practitioner Methodology for Structur... association between context completeness and output quality

AI adoption improves efficiency, cost reduction, and strategic innovation.

Synthesis across included empirical studies reporting organizational outcomes following AI implementation (effects reported qualitatively across the 27 studies).

medium positive Artificial Intelligence for Business Decision-Making in Lati... efficiency, costs, and innovation outcomes

Perceived complexity is not overly high (i.e., AI adoption was not seen as overly difficult to implement), which supports adoption.

Paper reports complexity as one of the DOI-related variables and states that adoption is not perceived as overly difficult—based on responses from 110 ICT professionals and PLS-SEM results.

medium positive Drivers of AI Adoption: The Role of Innovation Attributes, O... AI adoption (in relation to perceived complexity)

Perceived operational benefits (clear operational value) of AI encourage its adoption.

Paper lists perceived benefits / operational value as one of the examined variables and reports it as a positive factor in the outcomes (from survey of 110, analyzed with PLS-SEM).

medium positive Drivers of AI Adoption: The Role of Innovation Attributes, O... AI adoption

Exploitative innovation is associated with performance through incremental efficiency mechanisms.

Authors' interpretation of model results from the survey (104 managers) suggesting exploitative innovation improves performance via incremental efficiency, though specific mechanisms were not separately measured.

medium positive Generative AI Adoption in B2B Firms: Ethical Governance, Inn... long-term competitive performance

Progress on ClawBench brings us closer to AI agents that can function as reliable general-purpose assistants.

Authors' concluding, aspirational statement about the benchmark's role in tracking and fostering progress.

medium positive ClawBench: Can AI Agents Complete Everyday Online Tasks? long-term agent reliability / general-purpose assistant capability

This is the first impossibility result in AI governance, establishing a formal boundary below which current paradigms remain valid and above which distributed accountability mechanisms become necessary.

Claim of novelty in the paper (author assertion). The paper provides the formal theorem and discusses implications; novelty relative to prior literature is asserted but not empirically demonstrated.

medium positive The Accountability Horizon: An Impossibility Theorem for Gov... novelty (first impossibility result) and policy implication (necessity of distri...

This approach significantly reduced regression risk during large-scale refactoring.

Case study reports that model-generated tests plus constrained refactoring validated by passing tests reduced regression risk; qualitative claim without quantitative metric in abstract.

medium positive AI-Assisted Unit Test Writing and Test-Driven Code Refactori... regression risk
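The regression-risk claim above rests on a common pattern: pin down current behavior with tests, then accept a refactor only if those tests still pass. The Python sketch below illustrates that "characterization test" idea. The pricing functions and inputs are hypothetical, the tests here are recorded rather than model-generated, and this is not the case study's actual setup.

```python
# Hypothetical legacy function (prices in integer cents to avoid float drift).
def legacy_price_cents(qty, unit_cents):
    total = 0
    for _ in range(qty):
        total += unit_cents
    if qty >= 10:              # bulk discount: 10% off, rounded down to whole cents
        total -= total // 10
    return total

# Refactored version intended to keep exactly the same behavior.
def refactored_price_cents(qty, unit_cents):
    total = qty * unit_cents
    if qty >= 10:
        total -= total // 10
    return total

# A refactor with a subtle boundary bug (> instead of >=), for contrast.
def buggy_price_cents(qty, unit_cents):
    total = qty * unit_cents
    if qty > 10:
        total -= total // 10
    return total

def characterize(fn, inputs):
    """Record fn's current outputs so later versions can be checked against them."""
    return {args: fn(*args) for args in inputs}

def regressions(fn, snapshot):
    """Return the inputs on which fn disagrees with the recorded snapshot."""
    return [args for args, expected in snapshot.items() if fn(*args) != expected]

INPUTS = [(q, u) for q in (0, 1, 9, 10, 25) for u in (0, 199, 1000)]
snapshot = characterize(legacy_price_cents, INPUTS)

print(regressions(refactored_price_cents, snapshot))  # []
print(regressions(buggy_price_cents, snapshot))       # [(10, 199), (10, 1000)]
```

The boundary inputs (9 and 10 units) are what catch the buggy refactor. Snapshot tests are only as good as the inputs chosen to record.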

These findings establish a framework for evidence-based policy interventions to align the NIH AI portfolio with health equity goals and strategic research priorities.

Interpretive/concluding statement proposing that the reported portfolio analysis can inform policy interventions; framed as implication rather than an empirical result.

medium positive An Analysis of Artificial Intelligence Adoption in NIH-Funde... use of analysis to support policy interventions

EcoAssist increased developers' awareness of energy use.

Reported outcome from the controlled study with 20 developers (stated increase in awareness); no numeric magnitude provided in excerpt.

medium positive EcoAssist: Embedding Sustainability into AI-Assisted Fronten... developers' awareness of energy use

The positive effect of AIRC on productivity is mediated through improvements in reproducibility.

Structural equation modeling (SEM) reports mediation through reproducibility metrics in the OECD panel analysis.

medium positive AI-Augmented Peer Review and Scientific Productivity: A Cros... research reproducibility (as mediator)

The positive effect of AIRC on productivity is mediated through improvements in review efficiency.

Structural equation modeling (SEM) indicates mediation paths from AIRC to productivity via measures of review efficiency in the panel data.

medium positive AI-Augmented Peer Review and Scientific Productivity: A Cros... review efficiency (as mediator)
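The two mediation claims above say that AIRC affects productivity partly *through* a mediator (reproducibility, review efficiency). The paper uses SEM on OECD panel data. The Python sketch below instead uses the simpler product-of-coefficients approach on simulated cross-sectional data, so it only illustrates what "mediated through" means. It is not the paper's model, and the path values 0.5, 0.8, and 0.2 are invented.

```python
import random

def mean(v):
    return sum(v) / len(v)

def centered_cross(u, v):
    """Sum of products of deviations from the mean (n times the covariance)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def mediation(x, m, y):
    """Product-of-coefficients mediation estimate from two OLS regressions.

    a: slope of M on X.
    direct (c'), b: slopes of Y on X and M jointly, solved from the 2x2
    normal equations with Cramer's rule. Indirect effect = a * b.
    """
    sxx, smm, sxm = centered_cross(x, x), centered_cross(m, m), centered_cross(x, m)
    sxy, smy = centered_cross(x, y), centered_cross(m, y)
    a = sxm / sxx
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return {"a": a, "b": b, "direct": c_prime, "indirect": a * b}

# Simulated data with known paths: X -> M (0.5), M -> Y (0.8), X -> Y direct (0.2).
random.seed(0)
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [0.2 * xi + 0.8 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]

est = mediation(x, m, y)
print({k: round(v, 2) for k, v in est.items()})
```

With this simulated data, the indirect estimate should land near 0.5 × 0.8 = 0.4 and the direct estimate near 0.2. Panel SEM additionally models latent constructs and time structure, which this sketch ignores.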

Human-in-the-loop governance is a practical lever to align GenAI productivity with environmental efficiency.

Interpretation of the experimental results, which found that certain prompt-based governance (operational constraints and decision rules) reduced the footprint while preserving outputs; the recommendation itself is an argumentative claim.

medium positive On the Carbon Footprint of Economic Research in the Age of G... alignment between GenAI-assisted productivity and environmental efficiency via g...

Inference efficiency and system-level optimisation are growing rapidly in the Green AI literature.

Temporal / thematic analysis of literature cited in the paper's mapping (asserted growth; no growth rates or counts provided in abstract).

medium positive On the Carbon Footprint of Economic Research in the Age of G... growth of specific research themes (inference efficiency, system-level optimisat...

Exposing codebase-specific verification mechanisms may significantly improve the performance of externally trained agents operating in unfamiliar environments.

Paper suggests that providing access to repository-specific verification (tests, static analysis) could improve externally trained agents based on observed advantage for models that used validation tools.

medium positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... performance of externally trained agents in unfamiliar codebases

Iterative verification helps achieve effective agent behavior.

Paper infers from analysis (models using iterative verification achieved better performance) that iterative verification contributes to effective agent behavior.

medium positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... agent effectiveness (behavior leading to task success)
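The two ProdCodeBench claims above associate better agent performance with using verification tools iteratively. The Python sketch below shows a generic propose-then-verify loop: try a candidate, run the checks, and stop at the first candidate that passes or when the attempt budget runs out. The tests, candidate "patches", and function names are hypothetical, and this is not ProdCodeBench's harness.

```python
from typing import Callable, Iterable, Optional

# Hypothetical repository tests for an `add(a, b)` function: (args, expected).
TESTS = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]

def run_checks(candidate: Callable[[int, int], int]) -> list:
    """Return the failing test cases (an empty list means the candidate passes)."""
    failures = []
    for args, expected in TESTS:
        try:
            got = candidate(*args)
        except Exception as exc:  # a crash counts as a failure
            got = repr(exc)
        if got != expected:
            failures.append((args, expected, got))
    return failures

def iterate_with_verification(proposals: Iterable[Callable], budget: int) -> Optional[Callable]:
    """Try proposals in order; return the first one that passes all checks."""
    for attempt, candidate in enumerate(proposals, start=1):
        if attempt > budget:
            break
        failures = run_checks(candidate)
        if not failures:
            return candidate
        # A real agent would condition its next proposal on `failures`;
        # here the proposals are fixed, so the feedback is only reported.
        print(f"attempt {attempt}: {len(failures)} failing check(s)")
    return None

proposals = [lambda a, b: a * b, lambda a, b: a - b, lambda a, b: a + b]
fix = iterate_with_verification(proposals, budget=5)
print(fix(10, 4))  # 14
```

The design point is that the verification signal, not the proposer, decides when to stop. With `budget=2`, the loop gives up before reaching the correct candidate and returns `None`.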

Experts (pooled) forecast annualized GDP growth rising to around 4% under a 'rapid' AI progress scenario.

Conditional survey forecasts elicited under a described 'rapid' AI capabilities scenario (abstract summarizes pooled expert forecasts across groups). Exact sample sizes not provided in excerpt.

medium positive Forecasting the Economic Effects of AI annualized GDP growth under rapid AI scenario
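To give a sense of scale for an annualized rate of around 4%, the standard compounding arithmetic is worked below. The 10-year horizon and the 2% comparison rate are illustrative assumptions, not figures from the paper.

```latex
G(T) = (1+g)^{T} \quad \text{(cumulative growth factor over } T \text{ years at annual rate } g\text{)}

G(10)\big|_{g=0.04} = 1.04^{10} \approx 1.480, \qquad G(10)\big|_{g=0.02} = 1.02^{10} \approx 1.219

T_{\text{double}} = \frac{\ln 2}{\ln(1+g)} \approx \frac{0.6931}{0.0392} \approx 17.7 \text{ years at } g = 0.04
```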

As a consequence of these dynamics, 'algorithmic unions' (organised, coordinated resistance) may evolve organically as a survival strategy against over-optimized management systems.

Interpretation/implication drawn from the EGT model results (theoretical suggestion), not supported by empirical observations in the paper.

medium positive THE RED QUEEN in the DASHBOARD: CO-EVOLUTIONARY DYNAMICS of ... emergence / viability of organized coordinated resistance ('algorithmic unions')
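The claim above comes from an evolutionary game theory (EGT) model. The Python sketch below uses generic two-strategy replicator dynamics to show how a coordinated strategy can either die out or spread, depending on its starting share. The payoff matrix and the labels "resist"/"comply" are invented for illustration and are not the paper's model or parameters.

```python
def replicator_path(x0, payoff, steps=2000, dt=0.01):
    """Euler-integrate two-strategy replicator dynamics.

    x is the population share playing strategy 0; payoff[i][j] is the payoff
    to strategy i when meeting strategy j. dx/dt = x * (1 - x) * (f0 - f1).
    """
    x = x0
    path = [x]
    for _ in range(steps):
        f0 = payoff[0][0] * x + payoff[0][1] * (1 - x)  # expected payoff of strategy 0
        f1 = payoff[1][0] * x + payoff[1][1] * (1 - x)  # expected payoff of strategy 1
        x += dt * x * (1 - x) * (f0 - f1)
        x = min(max(x, 0.0), 1.0)                        # keep the share in [0, 1]
        path.append(x)
    return path

# Hypothetical coordination payoffs: strategy 0 ("resist") pays 3 when others also
# resist and 0 when acting alone; strategy 1 ("comply") always pays 1.
# Resisting beats complying once 3x > 1, i.e. above a threshold share x* = 1/3.
PAYOFF = [[3, 0], [1, 1]]

for start in (0.3, 0.4):
    print(start, round(replicator_path(start, PAYOFF)[-1], 3))
```

Under these assumed payoffs, resistance collapses from a 30% starting share but takes over from 40%. This is the kind of threshold behavior that makes "organic" emergence of coordinated resistance plausible in such models.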

Coordinated digital green development strategies are important to promote a more balanced and inclusive transition toward China’s dual-carbon goals.

Policy implication drawn from the study's empirical findings (AI reduces inequality while green innovation has not diffused), recommending coordinated digital and green development to achieve balanced outcomes.

medium positive Artificial intelligence, green innovation, and regional carb... balanced and inclusive transition to carbon peak and neutrality goals

The analysis implies specific implications for healthcare leadership and procurement (e.g., procurement and leadership should consider incentive and risk-allocation effects, not just task optimisation).

Authors' conclusions/recommendations drawn from the theoretical analysis and typology (prescriptive claim in the paper; no empirical evaluation reported in the abstract).

medium positive Incentives, Equilibria, and the Limits of Healthcare AI: A G... recommended focus of healthcare leadership and procurement decisions

The occupational upgrading among women is consistent with task-based demand shifts associated with technological change and the entry of younger, more educated female cohorts.

Authors' interpretation linking observed reallocation patterns to task-based demand shifts and changing female cohort composition; supported by decomposition of employment flows and cohort/education patterns (as described).

medium positive Routine-Biased Technological Change and the Gender Wage Gap ... consistency of observed upgrading with task-demand shifts and cohort composition

These patterns suggest personality as a predictor of readiness beyond stage-based tailoring: vulnerable users benefit from targeted rather than comprehensive interventions.

Authors' inference from the clustered outcome patterns observed in the experiment (resilient/overcontrolled/undercontrolled differences) indicating personality moderates responsiveness to different intervention types.

medium positive Not My Truce: Personality Differences in AI-Mediated Workpla... readiness/responsiveness to interventions (i.e., likelihood of benefit from targ...

Overcontrolled workers showed outcome-specific improvements with theory-driven AI.

Reported experimental finding: participants in the overcontrolled cluster improved on certain (outcome-specific) measures when assigned to the theory-driven AI (Trucey) condition.

medium positive Not My Truce: Personality Differences in AI-Mediated Workpla... outcome-specific improvements (unspecified in abstract; likely negotiation-relev...

Resilient workers achieved broad psychological gains primarily from the handbook.

Reported experimental result: resilient cluster exhibited broad psychological improvements, with the traditional negotiation handbook (Control-NoAI) producing those gains.

medium positive Not My Truce: Personality Differences in AI-Mediated Workpla... psychological gains (broad, unspecified psychological measures)

Autonomous coding agents, able to create branches, open pull requests, and perform code reviews, now actively contribute to real-world projects.

Empirical observations reported in the dataset and study showing agent-originated branches, PRs, and review actions in open-source projects (paper asserts these actions occurred in real projects).

medium positive Investigating Autonomous Agent Contributions in the Wild: Ac... presence of agent-originated development activities (branches, PRs, reviews)

Workplace organization (W) materially modifies the augmentation function so that two firms with identical technology investments can realize 'radically different' augmentation outcomes.

Conceptual claim supported by the paper's theoretical model (phi(D,W)) and cited empirical illustration (Colombia EDIT survey interaction result).

medium positive From Automation to Augmentation: A Framework for Designing H... augmentation outcomes / returns to technology

AI enhances innovation and productivity, even though it currently contributes to higher CO2 emissions.

Statement in the study linking AI adoption to improvements in innovation and productivity alongside the empirical finding of higher CO2 emissions (based on the same cross-country panel analysis over 2000–2023).

medium positive Artificial Intelligence: A Blessing or a Curse for Climate A... innovation and productivity

The revealed preference approach is a powerful mechanism for communicating human preferences to AI agents, but its success depends on careful implementation.

Authors' synthesis and recommendation, based on the online experiment's overall findings: revealed preferences yielded higher predictive accuracy, alongside contextual results on subjects' choices and AI alignment.

medium positive Should I State or Should I Show? Aligning AI with Human Pref... effectiveness of revealed-preference communication for aligning AI with human pr...

Because other AI systems exhibit similar scaling-law economics, the mechanisms identified extend beyond computer vision, reinforcing that partial automation is often the economically rational long-run outcome, not merely a transitional phase.

Theoretical argument generalized from scaling-law evidence in the paper; no additional cross-domain empirical evidence reported in the summary.

medium positive Economics of Human and AI Collaboration: When is Partial Aut... prevalence of partial automation across AI application domains

These findings support the practical value of structured intent representation as a robust, protocol-like communication layer for human-AI interaction.

Aggregate interpretation of the experimental results (cross-language variance reduction, model compensation pattern, equivalence of structured frameworks, and user-study improvements).

medium positive Structured Intent as a Protocol-Like Communication Layer: Cr... practical utility / robustness of structured intent representations

We further provide initial evidence that this AI-for-AI paradigm can transfer beyond the AI stack through experiments in mathematics and biomedicine.

Reported preliminary experiments in mathematics and biomedicine intended to test transfer beyond the AI development stack.

medium positive ASI-Evolve: AI Accelerates AI transferability of AI-for-AI paradigm to domains outside core AI (mathematics an...

To our knowledge, ASI-Evolve is the first unified framework to demonstrate AI-driven discovery across three central components of AI development: data, architectures, and learning algorithms.

Authors' claim of primacy based on reported experiments demonstrating AI-driven discovery in pretraining data curation, neural architecture design, and reinforcement learning algorithm design.

medium positive ASI-Evolve: AI Accelerates AI breadth of AI-driven discovery across data, architectures, and learning algorith...

Intelligent manufacturing policies can generate economically meaningful benefits by improving firms’ sustainability performance and the credibility of ESG information, which are central to capital allocation and the effectiveness of green governance.

Synthesis/implication drawn from the empirical findings reported in the paper (positive effects on ESG ratings, reduced greenwashing, and lower ESG uncertainty).

medium positive Intelligent Manufacturing Demonstration Projects Driving Cor... sustainability performance and credibility of ESG information

The growth of digital platforms contributes to the decentralization of job creation.

Paper cites contemporary data on the growth of digital platforms as part of its analysis (no specific platform-level datasets or sample sizes cited in the abstract).

medium positive AI Civilization and the Transformation of Work role of digital platforms in job creation / decentralization

The paper's predictions are consistent with practitioner reports.

Authors claim qualitative consistency with practitioner reports (no systematic survey/sample size provided in the provided text).

medium positive The Novelty Bottleneck: A Framework for Understanding Human ... qualitative alignment with practitioner experiences

The paper's predictions are consistent with empirical observations from scientific productivity data.

Authors state they compare model predictions to scientific productivity data (no sample sizes or dataset details provided in the provided text).

medium positive The Novelty Bottleneck: A Framework for Understanding Human ... consistency with scientific productivity patterns

The paper's predictions are consistent with empirical observations from AI coding benchmarks.

Authors state they compare model predictions to AI coding benchmark results (no sample sizes or specific benchmarks reported in the provided text).

medium positive The Novelty Bottleneck: A Framework for Understanding Human ... consistency with AI coding benchmark performance

An AI planner that uses a mix of static analysis with AI instructions can create migration plans for very complex code components that are reliably followed by the combination of an orchestrator and coders, using AI-generated example-based playbooks.

Methodological description and reported demonstrations in the paper (planner + orchestrator + coders following playbooks); no numeric sample size reported in abstract.

medium positive A Multi-agent AI System for Deep Learning Model Migration fr... reliability of migration plans being followed (plan adherence)

AI-enabled ESG ratings, green innovation, ethical AI, RegTech, and explainable AI in finance are becoming highly influential in international financial markets.

Paper identifies these themes as emerging and influential based on trends in the reviewed literature and topical focus areas; no quantitative adoption metrics or sample sizes are provided in the excerpt.

medium positive Artificial intelligence in sustainable finance and Environme... influence/adoption of specific AI-related ESG themes in financial markets

With experience, users issue more targeted queries and engage more deeply with supporting citations.

Longitudinal analysis of user behavior in the Asta dataset showing changes over time/with experience: increased use of targeted queries and higher engagement (clicks/inspect actions) with citations.

medium positive Understanding Usage and Engagement in AI-Powered Scientific ... targeted query frequency and citation engagement over user experience/time

Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways.

Interaction-log analysis showing patterns of revisits, non-linear navigation between generated outputs and cited evidence within sessions in the Asta dataset.

medium positive Understanding Usage and Engagement in AI-Powered Scientific ... revisit and navigation behavior (frequency of revisits, non-linear navigation pa...
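The revisit findings above come from interaction-log analysis. The Python sketch below shows one simple way such revisits could be counted per session. The log schema (ordered `(session_id, item_id)` pairs, where an item is a generated response or a cited source) and the revisit definition are assumptions, not the Asta dataset's actual format.

```python
from collections import defaultdict

def revisit_stats(events):
    """Count views and revisits per session.

    events: iterable of (session_id, item_id) in chronological order
    (hypothetical schema). A revisit is a view of an item that was
    already viewed earlier in the same session.
    """
    seen = defaultdict(set)
    stats = defaultdict(lambda: {"views": 0, "revisits": 0})
    for session, item in events:
        s = stats[session]
        s["views"] += 1
        if item in seen[session]:
            s["revisits"] += 1
        seen[session].add(item)
    return dict(stats)

log = [
    ("s1", "answer-1"), ("s1", "cite-A"), ("s1", "answer-2"),
    ("s1", "answer-1"), ("s1", "cite-A"),
    ("s2", "answer-1"), ("s2", "cite-B"),
]
print(revisit_stats(log))
# {'s1': {'views': 5, 'revisits': 2}, 's2': {'views': 2, 'revisits': 0}}
```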
