Evidence (4793 claims)
- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Productivity
ChatGPT adoption among private households has been rapid following release, but adoption is far from uniform.
Descriptive adoption patterns measured from Comscore browsing data over time (pre- and post-Nov 30, 2022) on the household panel (2021–2024); time-series of observed ChatGPT site visits and adoption rates.
Despite the diminishing returns scaling laws predict, progress in practice has often continued through rapidly improving efficiency, visible for example in falling cost per token.
Observed industry/empirical trend cited in the paper (example: falling cost per token). No numerical samples or sample size given in the excerpt.
Scaling laws are largely empirical and observational, but they appear repeatedly across model families and increasingly across training-adjacent regimes.
Paper asserts repeated empirical appearance across model families and training-adjacent regimes; claim is descriptive/observational without sample size in the excerpt.
Scaling laws make progress predictable, albeit at a declining rate.
Conceptual claim in the paper based on the power-law form of scaling laws (no numerical quantification or sample size provided in the excerpt).
Classical AI scaling laws, especially for pre-training, describe how training loss decreases with compute in a power-law form.
Stated observationally in the paper as established empirical regularity across pre-training runs and prior literature on scaling laws (no sample size or specific experiments reported in the excerpt).
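The power-law regularity described above (training loss falling with compute) can be made concrete with a quick numerical sketch. This is illustrative code on synthetic data with assumed coefficients, not the paper's analysis:

```python
import numpy as np

# Classical pre-training scaling laws take the form L(C) = a * C**(-b):
# loss falls as a power of compute, so each doubling of compute buys a
# smaller absolute improvement (the "diminishing returns" noted above).
a, b = 10.0, 0.1                               # hypothetical coefficients
compute = np.array([1e18, 1e19, 1e20, 1e21])   # synthetic training FLOPs
loss = a * compute ** (-b)

# The law is linear in log-log space, so a least-squares fit on logs
# recovers the exponent from observed (compute, loss) pairs.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
print(round(-slope, 3))  # prints 0.1, the recovered exponent b
```

Because progress is a power law, it is predictable but at a declining rate: the same multiplicative increase in compute yields ever-smaller loss reductions.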
Task-level analyses show that activities expanded in AI-enabled projects—particularly ideation and experimentation—are increasingly compatible with large language model capabilities, suggesting potential for future productivity gains as these technologies mature.
Task-level classification mapping tasks described in proposals to LLM-relevant capabilities using LLM-based classification; finding that tasks expanded in AI-enabled projects cluster on ideation and experimentation, which align with current LLM strengths.
AI-enabled projects undertake a broader set of tasks.
Task-level analysis of proposal descriptions (task inventories) classifying tasks via keyword extraction and LLMs, showing AI-enabled proposals list a wider variety of activities than non-AI proposals.
AI-enabled projects involve larger teams.
Comparison of team structure in proposals (team size) between AI-enabled and non-AI projects using the same comprehensive proposal dataset and LLM-based classification of AI presence.
AI-enabled projects reallocate resources toward human capital (i.e., shift budget allocations toward labor / human capital).
Analysis of detailed budget allocations in the proposal dataset, comparing projects identified as AI-enabled versus non-AI projects using keyword extraction and LLM classification to identify AI presence and role.
In the short run, AI adoption is associated with modest improvements in scientific outcomes concentrated in the upper tail.
Observational analysis linking identified AI presence in a comprehensive dataset of research proposals (funded and unfunded) to subsequent publication outcomes; AI presence identified via keyword extraction combined with large language model (LLM) classification; publication outcomes measured after proposal submission.
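The identification strategy above combines keyword extraction with LLM classification to flag AI presence in proposals. A minimal sketch of the keyword-extraction stage is below; the keyword list is an assumption, and in the described pipeline flagged texts would additionally be passed to an LLM classifier rather than relying on keywords alone:

```python
import re

# Hypothetical keyword list for a first-pass AI-presence filter over
# proposal text; not the paper's actual lexicon.
AI_KEYWORDS = [
    r"\bmachine learning\b", r"\bdeep learning\b", r"\bneural network",
    r"\blarge language model", r"\bLLM\b", r"\bartificial intelligence\b",
]

def keyword_flag(text: str) -> bool:
    """Return True if any AI-related keyword appears in the proposal text."""
    return any(re.search(pat, text, flags=re.IGNORECASE) for pat in AI_KEYWORDS)

print(keyword_flag("We will train a neural network to screen compounds."))  # True
print(keyword_flag("A field survey of wetland bird populations."))          # False
```

A two-stage design like this trades recall for cost: cheap keyword matching narrows the corpus, and the more expensive LLM call adjudicates the ambiguous cases.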
Education and workforce development should shift focus from rote knowledge accumulation to cultivating skills in human-AI collaboration, creative problem-solving, and the design of novel economic domains.
Normative policy recommendation derived from the paper's framework and analysis of anticipated labor market changes (no empirical evaluation or trial data reported in the abstract).
Human-AI co-evolution will significantly increase individual productivity and open new frontiers of economic activity.
Projected outcome based on combined analysis of AI capabilities, historical patterns, and platform growth; the abstract does not report empirical measurement or sample sizes for this projection.
AI-driven productivity augmentation dramatically lowers the barriers to creating economic value, enabling the decentralized generation of employment.
Argument supported by paper's analysis of contemporary labor market dynamics and the growth of digital platforms; no quantified empirical estimates or sample sizes provided in the abstract.
The transition to an AI-civilization will fundamentally restructure the mechanisms of employment creation from a centralized model (few organizations creating jobs for the many) to a decentralized ecosystem where individuals are empowered to generate their own employment opportunities.
Central thesis of the paper, motivated by theoretical argumentation and synthesis of contemporary data on labor markets and digital platforms (no empirical test or sample sizes specified in the abstract).
Historical precedents from past technological revolutions suggest that innovation tends to expand, rather than shrink, the scope of economic activity and employment in the long run.
Paper draws on analysis of economic history (qualitative historical analysis implied; no specific historical datasets or sample sizes provided in the abstract).
Google has been pioneering machine learning usage across dozens of products.
Contextual statement in the abstract about the organization's activity; asserted without empirical detail in abstract.
The techniques and approaches described can be generalized for other framework migrations and general code transformation tasks.
Authors' stated expectation/generalization claim in the abstract; no empirical evidence or cross-framework experiments reported in the abstract.
The system creates a virtuous circle in which AI effectively supports its own development workflow.
Conceptual claim supported by the system's design and reported improvements that enable iterative AI-assisted development; described qualitatively in the paper.
Our approach dramatically reduces the time required for deep learning model migrations (a 6.4x-8x speedup).
Quantitative speedup figure reported in the paper's abstract (6.4x-8x); likely based on measured migration times on demonstrated cases, though the abstract does not state sample size or exact experimental setup.
The system accelerates code migrations in a large hyperscaler environment on commercial real-world use-cases.
Reported demonstration and evaluation in a hyperscaler (commercial) environment using real-world cases as described in the paper; no detailed sample size given in abstract.
We define quality metrics and AI-based judges that accelerate development when the code to evaluate has no tests and has to adhere to strict style and dependency requirements.
Design and implementation of quality metrics and AI-based judges described in the paper; claimed acceleration of development workflow (no numeric quantification in abstract).
We built an AI-based multi-agent system to support automatic migration of TensorFlow-based deep learning models into JAX-based ones.
System implementation and description in the paper; demonstration on real-world code migration tasks in a hyperscaler environment (qualitative description in abstract).
The productivity channel raises corporate cash flows and is equity-bullish.
Model mechanism described in the paper: productivity effects of AI increase corporate cash flows which, within the model, produce an equity-bullish effect on the ERP/valuations.
Efficient conversion of R&D into technological barriers is key to avoiding the 'AI trap'; new energy vehicle firms should prioritize R&D efficiency, translate innovation into stable returns, and maintain sound financial conditions.
Paper's conclusion/recommendation derived from empirical findings (2013–2023 sample) linking R&D conversion/patent transformation and intelligent equipment output to reduced financial risk from AI dependence.
Strong knowledge or intelligent equipment output and effective patent transformation mitigate the financial risks associated with AI dependence.
Moderation and heterogeneity tests reported in the paper using the same sample (listed NEV and automobile manufacturers, 2013–2023) indicate these factors reduce the adverse effect of AI dependence on financial safety.
The dataset, contexts, annotations, and evaluation harness are released publicly.
Paper states that dataset, contexts, annotations, and evaluation harness are released publicly (release / open-source claim).
A structured 2,000-token diff-with-summary prompt outperforms a 2,500-token full-context prompt (enriched with execution context, behaviour mapping, and test signatures) across all 8 models.
Direct prompt/context-size comparison across the 8 models on SWE-PRBench; reported that the 2,000-token diff-with-summary prompt yields better performance than the 2,500-token full-context prompt with extra enrichments.
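A diff-with-summary prompt of the kind compared here might be assembled as follows. This is a hypothetical sketch: the section names, template, and the rough 4-characters-per-token budget heuristic are assumptions, not the benchmark's actual implementation:

```python
def build_diff_prompt(summary: str, diff: str, token_budget: int = 2000) -> str:
    """Compose a structured review prompt from a PR summary and its diff,
    truncating the diff so the whole prompt stays within a token budget."""
    char_budget = token_budget * 4  # rough ~4 chars/token heuristic (assumed)
    header = "## PR summary\n" + summary.strip() + "\n\n## Diff\n"
    footer = "\n\nReview this change and list any defects."
    room = max(0, char_budget - len(header) - len(footer))
    return header + diff.strip()[:room] + footer

prompt = build_diff_prompt("Fix off-by-one in pagination.",
                           "- limit = n\n+ limit = n - 1")
print(len(prompt) <= 2000 * 4)  # True: short inputs fit the budget
```

The finding that this leaner structure beats a larger enriched context suggests that concise, well-organized input can matter more than raw context volume.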
The LLM-as-judge framework used for evaluation is validated at kappa = 0.75.
Inter-judge validation reported in paper (agreement metric kappa reported as 0.75). Specific validation sample size not stated in the excerpt.
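The kappa = 0.75 agreement figure is a chance-corrected statistic (Cohen's kappa). A minimal computation on two label sequences, using illustrative data rather than the paper's validation set:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical judges labelling 10 review comments as valid/invalid.
judge1 = ["valid", "valid", "invalid", "valid", "invalid",
          "valid", "valid", "invalid", "valid", "valid"]
judge2 = ["valid", "valid", "invalid", "valid", "valid",
          "valid", "valid", "invalid", "valid", "valid"]
print(round(cohens_kappa(judge1, judge2), 2))  # prints 0.74
```

A kappa around 0.75 is conventionally read as substantial agreement, well above the chance-level baseline that raw percent agreement can mask.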
Pull requests are drawn from active open-source repositories, filtered from 700 candidates using a Repository Quality Score.
Dataset curation procedure reported: initial pool of 700 candidate repositories/PRs filtered by a Repository Quality Score to produce the final benchmark.
We introduce SWE-PRBench, a benchmark of 350 pull requests with human-annotated ground truth for evaluating AI code review quality.
Dataset construction described in paper: benchmark contains 350 pull requests with human annotations. Pull requests drawn from active open-source repositories and filtered from 700 candidates using a Repository Quality Score.
The paper concludes by articulating expected outcomes for management practice and proposes a research agenda calling for future mixed-methods validation of the framework.
Stated conclusion and explicit call for mixed-methods validation; no validation results provided in this paper.
The review derives constructs, hypothesized links among them, and governance implications for managing and institutionalizing workplace AI.
Paper reports that reviewed sources were used to derive constructs and governance implications; this is a conceptual derivation rather than empirical testing.
The framework and synthesis can be used to diagnose patterns of disengagement and pilot-to-production failure in corporate AI initiatives.
Proposed analytical structure derived from literature synthesis and conceptual mapping; intended as a diagnostic tool but not empirically validated within this paper.
The paper integrates adoption frameworks (TAM and TOE) with evidence on human-AI interaction to produce a scaling-oriented conceptual framework for diagnosing disengagement and pilot-to-production failures.
Comparative conceptual analysis and framework building based on reviewed literature; no new empirical validation reported.
Integrating technological, human, and organizational capabilities is important to maximize the benefits of AI in smart manufacturing.
Conclusion based on thematic patterns in interviews, observations, and document analysis from purposively sampled supply chain and production professionals; identified as an implementation implication.
Firms adopting AI-driven forecasting and inventory strategies can achieve higher operational agility, better strategic resource alignment, and maintain a competitive advantage in dynamic manufacturing contexts.
Synthesis and implications drawn from thematic analysis of interviews, site visits, and documents from purposively sampled industry practitioners; presented as study conclusions rather than quantitatively tested outcomes.
AI supports sustainability initiatives within manufacturing operations.
Thematic analysis of practitioner interviews and organizational documentation where respondents linked AI-based forecasting/inventory optimization to sustainability outcomes (e.g., waste reduction).
AI improves supply chain coordination among partners and internal functions.
Interview and document-based thematic findings from purposively sampled supply chain managers and industry experts reporting enhanced coordination following AI adoption.
AI contributes to operational resilience in manufacturing supply chains.
Qualitative evidence from interviews and organizational documents indicating that AI-enabled forecasting and inventory controls improve firms' ability to adapt to disruptions; thematic analysis produced resilience as a reported benefit.
Organizational readiness, skilled personnel, data quality, and robust technological infrastructure are critical factors influencing AI effectiveness.
Recurring themes identified via thematic analysis of semi-structured interviews with supply chain and production professionals, corroborated by observational site visits and organizational documents from purposive sample.
AI reduces excess inventory levels in manufacturing firms.
Thematic findings from interviews, site visits, and documents from industry experts and practitioners who reported decreased excess inventory following AI-driven forecasting and inventory optimization.
AI reduces stockouts in manufacturing supply chains.
Practitioner accounts and organizational document evidence from purposive qualitative sampling and thematic analysis indicating fewer stockouts associated with AI-driven forecasting and inventory controls.
AI adoption reduces operational inefficiencies in manufacturing processes.
Thematic analysis of qualitative data (semi-structured interviews, site observations, organizational documents) from purposively sampled industry practitioners reporting reductions in inefficiencies after AI implementation.
AI supports proactive decision-making among supply chain and production stakeholders.
Qualitative reports from interviews and document review with supply chain managers, production planners, and industry experts; thematic analysis identified proactive decision-making as a theme associated with AI use.
AI enables adaptive inventory management in manufacturing operations.
Findings from thematic analysis of semi-structured interviews with supply chain managers, production planners, and industry experts, plus observational site visits and organizational documents (purposive sampling).
AI technologies enhance forecasting accuracy in smart manufacturing.
Qualitative evidence from purposive sample of supply chain managers, production planners, and industry experts gathered via semi-structured interviews, observational site visits, and organizational documents; analyzed using thematic analysis.
Our dataset is available at https://guide-bench.github.io.
Paper's statement providing a URL for dataset access.
Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software (e.g., PowerPoint, Photoshop).
Motivating claim in the paper's introduction/abstract, based on prior work and the authors' argument about potential application domains.
Providing user context significantly improved performance, raising Help Prediction by up to 50.2 percentage points.
Experimental comparison reported in the paper showing differences in Help Prediction performance with and without provided user context; reported improvement magnitude of up to 50.2 percentage points.
GUIDE defines three tasks: (i) Behavior State Detection, (ii) Intent Prediction, and (iii) Help Prediction, which test a model's ability to recognize behavior state, reason about goals, and decide when and how to help.
Paper's benchmark/task definitions describing three evaluation tasks and their goals.