Evidence (7953 claims)

Claim counts by topic:

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Training footprint is the largest cluster in the mapped Green AI literature.
Result from the paper's literature mapping / clustering (statement in abstract; no numeric cluster sizes given).
We map the recent Green AI literature into seven themes: training footprint is the largest cluster, while inference efficiency and system-level optimisation are growing rapidly, alongside measurement protocols, green algorithms, governance, and security and efficiency trade-offs.
Bibliometric / thematic mapping of recent Green AI literature described in the paper (method: literature mapping; exact number of papers or mapping procedure not specified in abstract).
Compared with relationship-based debt, stable equity more strongly promotes high-quality development in the high-end equipment manufacturing and new energy industries.
Comparative subgroup regression analysis on the same dataset (743 listed enterprises, 2014–2023) indicating that the coefficient for stable equity is significantly larger than that for relationship-based debt in the high-end equipment manufacturing and new energy industry subsamples.
The effects of two distinct forms of patient capital—stable equity and relationship-based debt—are more pronounced in promoting high-quality development in the new energy vehicle industry, energy conservation and environmental protection industry, biotechnology industry, new materials industry, and next-generation information technology industry.
Industry heterogeneity / subgroup analyses on the 2014–2023 panel of 743 listed firms showing stronger estimated effects of both stable equity and relationship-based debt on firm high-quality development within these specified industries.
The impact of patient capital on the high-quality development of enterprises exhibits regional heterogeneity: the high-quality development of enterprises in the central region is more sensitive to patient capital.
Subsample/regional heterogeneity analysis on the panel of 743 listed enterprises (2014–2023) comparing region-specific coefficients and finding a larger/stronger effect in the central region.
The application of artificial intelligence enhances the positive impact of patient capital on the high-quality development of enterprises in strategic emerging industries.
Moderation analysis using the same firm panel (743 listed enterprises, 2014–2023) that includes an interaction term between patient capital and measures of AI application, with the interaction reported as positive and statistically significant.
Patient capital promotes the high-quality development of these enterprises by easing financing constraints.
Mediation analysis on panel data of 743 listed firms (2014–2023) reporting that financing-constraint indicators mediate the impact of patient capital on firm high-quality development.
Patient capital promotes the high-quality development of these enterprises by alleviating information asymmetry.
Mediation tests using firm-level panel data (743 listed enterprises, 2014–2023) that include measures of information asymmetry and show a mediating effect in the patient capital → high-quality development pathway.
Patient capital promotes the high-quality development of these enterprises by enhancing the level of synergy in digital and green transformation (digital-green transformation synergy).
Mediation analysis on the same panel (743 listed enterprises, 2014–2023) showing that measures of digital-green transformation synergy mediate the relationship between patient capital and firm high-quality development.
Patient capital plays a significant role in promoting the high-quality development of enterprises in strategic emerging industries.
Empirical analysis using panel data from 743 listed enterprises in China’s strategic emerging industries over 2014–2023; regression analysis reporting a statistically significant positive coefficient for patient capital on a firm-level measure of high-quality development.
Average ratings for same-caste matches were up to 25% higher (on a 10-point scale) than for inter-caste matches.
Quantitative result reported in the analysis comparing average ratings (10-point scale) between same-caste and inter-caste matches; statement specifies magnitude 'up to 25%'.
Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably.
Reported results across evaluated LLMs showing consistent patterns where same-caste profile pairings received higher ratings than inter-caste pairings.
We share our methodology and lessons learned to enable other organizations to construct similar production-derived benchmarks.
Paper states intention and contribution: releasing methodology and lessons to allow replication by other organizations.
We detail data collection and curation practices including LLM-based task classification, test relevance validation, and multi-run stability checks to address challenges in constructing reliable evaluation signals from monorepo environments.
Methodological description in paper listing specific practices (LLM-based classification, test relevance validation, multi-run stability checks) aimed at producing reliable evaluation signals in monorepos.
Models making greater use of work validation tools, such as executing tests and invoking static analysis, achieve higher solve rates.
Reported relationship from paper's analysis correlating models' use of verification tools (test execution, static analysis) with higher solve rates across evaluated models.
Systematic analysis of four foundation models yields solve rates from 53.2% to 72.2%.
Empirical evaluation reported in paper: four foundation models were evaluated on the ProdCodeBench benchmark producing reported solve-rate range.
Each curated sample consists of a verbatim prompt, a committed code change and fail-to-pass tests spanning seven programming languages.
Descriptive dataset claim in paper specifying components of each sample and that samples cover seven programming languages.
We present ProdCodeBench, a benchmark built from real sessions with a production AI coding assistant.
Paper describes methodology and introduces ProdCodeBench explicitly as constructed from real production assistant sessions.
Benchmarks that reflect production workloads are better suited to evaluating AI coding agents in industrial settings.
Argument presented in paper motivating creation of production-derived benchmark; no specific empirical comparison to alternative benchmarks reported in the abstract.
Carbon emissions initially increase with the expansion of robotics manufacturing.
Panel regressions across 277 Chinese prefecture-level cities (2008–2019) showing the left-hand (rising) portion of the inverted U-shaped relationship.
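An inverted-U relationship of this kind is typically detected by adding a quadratic term to the regression and checking its sign. A minimal sketch on synthetic data (the variables, parameters, and specification below are illustrative assumptions, not the paper's actual data or model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic city-year observations: robotics-manufacturing intensity x
# and carbon emissions y following an inverted U (illustrative values
# only, not the paper's data).
n = 2000
x = rng.uniform(0, 20, n)
y = 2.0 * x - 0.1 * x**2 + rng.normal(0, 1.0, n)

# Fit y = b0 + b1*x + b2*x^2 by OLS; an inverted U requires b2 < 0.
X = np.column_stack([np.ones(n), x, x**2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Emissions rise until the turning point -b1/(2*b2), then decline;
# observations below the turning point trace the "rising portion".
turning_point = -b1 / (2.0 * b2)
print(f"b2 = {b2:.3f}, turning point = {turning_point:.2f}")
```

A finding that most cities sit to the left of the estimated turning point corresponds to the rising portion the claim describes.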
A representative incident (ISS-004) demonstrated boundary-based containment with 10-minute detection latency, zero user exposure, and 80-minute resolution.
Incident ISS-004 report in the paper giving specific timings for detection latency (10 minutes), user exposure (zero), and resolution (80 minutes).
The multi-agent approach improved reliability: audited handoffs detected and blocked a coordinate transformation error affecting all 2,452 stations before publication.
Incident detection reported in the SF2Bench deployment where audited handoffs prevented publication of a coordinate transformation error that would have affected all 2,452 stations.
The multi-agent approach improved efficiency: the SF2Bench deployment was completed by a single operator in two days with repeated artifact reuse across deployments.
Operational report from the production deployment: single operator completion time of two days and reuse of artifacts across deployments as stated in the paper.
SF2Bench, a compound flooding benchmark comprising 2,452 monitoring stations and 8,557 published files spanning 39 years, validates the multi-agent workflow.
Reported dataset composition and use in the paper: SF2Bench with stated counts and temporal span used to validate the multi-agent workflow.
EnviSmart treats reliability as an architectural property through two mechanisms: (1) a three-track knowledge architecture that externalizes behaviors (governance constraints), domain knowledge (retrievable context), and skills (tool-using procedures) as persistent, interlocking artifacts; and (2) a role-separated multi-agent design where deterministic validators and audited handoffs restore fail-stop semantics at trust boundaries before irreversible steps.
System architecture and design description in the paper; presented as the core reliability mechanisms implemented in EnviSmart.
We introduce EnviSmart, a production data management system deployed on campus-wide storage infrastructure for environmental research.
System description and statement of deployment in the paper; presented as a production deployment (no randomized evaluation reported).
Embedding LLM-driven agents into environmental FAIR data management can externalize operational knowledge and scale curation across heterogeneous data and evolving conventions.
Conceptual / argumentative claim made in the paper as a motivation for the system; no quantitative experiment tied to this statement in the excerpt.
For Nigeria to successfully leverage the AI revolution for inclusive economic growth and ensure long-term workforce resilience, it must overcome the structural skill deficit through deliberate investment in tertiary-education reform and strong public-private partnerships for continuous vocational learning.
Study conclusion synthesizing survey results (150 firms) and qualitative policy/workforce analysis to make policy recommendations.
The rate of new job creation hinges critically on the immediate implementation of targeted, scalable reskilling programs.
Paper's projections and analysis drawing on the survey of 150 firms and qualitative interviews; presented as a conditional/projection based on current skills gap and training initiatives.
The agentic-specificity classification helps organizations distinguish challenges that require novel approaches from those that are addressable with established practices.
Authors' proposed classification (agentic-specific vs. carried-over/amplified) intended as a practical decision aid; derived from the coding and comparative analysis.
The taxonomy provides a diagnostic framework for identifying priority barrier dimensions and understanding cross-dimensional amplification mechanisms.
Authors present a taxonomy derived from the review and claim it can be used diagnostically by organizations; supported by the coded barrier classification and STS mapping.
Azar et al. (2023) show that monopsonistic employers have stronger incentives to automate, and US commuting zones with higher labor market concentration experienced more robot adoption.
Citation to Azar et al. (2023) empirical evidence reported in the paper.
Noy and Zhang (2023) and Brynjolfsson et al. (2025) provide emerging empirical evidence that AI can function as a labor-complementary technology when designed to do so.
Cited empirical studies referenced in the paper arguing that certain AI applications complement human labor.
Eloundou et al. (2024) predict that half of US jobs are significantly exposed to recent advances in generative AI.
Citation to Eloundou et al. (2024) empirical study reported in the paper's introduction.
Firms may not sufficiently account for non-monetary aspects (safety, meaning of work) when choosing technologies; a planner would include these non-monetary considerations in steering technological progress.
Theoretical argument and model extension in Section 6 on monetary vs non-monetary aspects of technology choices.
In multi-good economies, a planner can raise poor agents' real incomes not only by affecting factor incomes but also by focusing technological progress on making goods cheaper that are disproportionately consumed by poorer agents.
Extension of the baseline model to multiple goods (Section 5) identifying distributional consumption-channel effects.
When capital and labor are gross complements, a planner concerned with workers' welfare would favor capital-augmenting innovations to raise wages.
Analytical result from a factor-augmenting application of the paper's model examining complementarity conditions between capital and labor.
A welfare-maximizing planner will impose positive robot taxes when robots substitute for human labor, with the optimal tax rate increasing in the planner's concern for workers' welfare.
Model application to robot taxation presented in the paper; comparative statics on planner weights.
When redistribution is costly or incomplete, production efficiency is no longer optimal and a planner will distort technology choice to improve distribution (i.e., engage more in steering).
Theoretical derivation extending Atkinson-Stiglitz framework with endogenous technology and costly redistribution; comparative statics on redistribution cost.
The welfare benefits of steering technological progress are greater the less efficient social safety nets are.
Theoretical result derived in the paper's baseline and extended models analyzing a planner who can shape technology choices and faces costly/incomplete redistribution.
In the short run, with fixed human capital, wages, and job boundaries, AI raises productivity by reducing the time required to perform steps.
Model distinction between short-run (fixed job design and skills) and long-run horizons; short-run optimization shows AI reduces expected execution times for steps, thereby raising productivity.
Aggregating heterogeneous firms that deploy a commonly available AI technology yields an aggregate production function that admits a constant elasticity of substitution (CES) representation with three inputs: aggregate manual labor, aggregate AI-assisted labor, and aggregate capital.
Theoretical aggregation argument drawing on Houthakker (1955) and Levhari (1968), deriving a macro-level CES representation from a microfounded algorithmic cost function defined by firms' joint optimization over AI deployment and job design.
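The aggregate representation described above can be written in schematic CES form; the notation below is illustrative, not necessarily the paper's own:

```latex
Y \;=\; A\left[\alpha L_m^{\rho} + \beta L_a^{\rho} + \gamma K^{\rho}\right]^{1/\rho},
\qquad \sigma = \frac{1}{1-\rho},
```

where $L_m$ is aggregate manual labor, $L_a$ aggregate AI-assisted labor, $K$ aggregate capital, and $\sigma$ the elasticity of substitution across the three inputs.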
Improvements in AI quality generate non-linear effects on labor demand and wages because firms' cost-minimizing AI deployment and job designs change discretely at particular AI quality thresholds (microfoundation for the productivity J-curve).
Theoretical analysis of discrete switches in the cost-minimizing arrangement as AI success probability and execution times change; characterization of threshold effects and discussion linking to the J-curve phenomenon (model results and comparative statics).
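The threshold mechanism can be sketched as a lower envelope over a finite set of configurations; this is a schematic in my own notation, not the paper's model:

```latex
C(q) \;=\; \min_{d \in \mathcal{D}} c_d(q),
\qquad
d^{*}(q) \;=\; \arg\min_{d \in \mathcal{D}} c_d(q),
```

where $q$ is AI quality (e.g. a success probability), $\mathcal{D}$ the finite set of feasible (AI deployment, job design) configurations, and $c_d(q)$ the unit cost under configuration $d$. Because $\mathcal{D}$ is finite, $d^{*}(q)$ jumps at the crossing points of the $c_d$ curves, so labor demand and wages can respond discontinuously to smooth improvements in $q$.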
Adjacency to AI-executed steps increases the likelihood that a given step is executed by AI (local complementarities): a step is more likely to be AI-executed in occupations where its neighboring steps are also AI-executed.
Empirical comparisons of conceptually similar steps across occupations paired with workflow adjacency information and realized AI execution outcomes from Anthropic’s Economic Index; statistical tests reported in the paper.
AI-executed steps co-occur in contiguous chains rather than being randomly scattered across a production workflow.
Empirical analysis linking O*NET tasks to human assessments of AI exposure (Eloundou et al., 2024), realized AI execution outcomes from Anthropic’s Economic Index (Handa et al., 2025), and GPT-generated workflow orderings for occupations; statistical tests comparing observed contiguity to random/scaled baselines reported in the paper.
Instrumenting AI use cases with treatment assignment suggests each additional AI use case prompted by treatment leads to approximately 26% higher revenue.
Instrumental variable analysis using randomized treatment as instrument for number of AI use cases in the 515-firm sample; outcome measured as revenue.
Instrumenting AI use cases with treatment assignment suggests each additional AI use case prompted by treatment leads to 0.85 more completed tasks.
Instrumental variable analysis using randomized treatment as instrument for number of AI use cases in the 515-firm sample; outcome measured as completed tasks.
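The IV logic behind the two estimates above (randomized treatment as an instrument for the number of AI use cases) can be sketched as a hand-rolled two-stage least squares on synthetic data; all variable names and parameter values below are illustrative assumptions, not the study's data or estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic firm sample: randomized treatment z shifts the number of AI
# use cases d, which raises the outcome y; u is an unobserved confounder
# that biases naive OLS (all parameters illustrative).
n = 5000
z = rng.integers(0, 2, n).astype(float)             # treatment assignment
u = rng.normal(0, 1, n)                             # unobserved confounder
d = 1.0 + 1.5 * z + 0.5 * u + rng.normal(0, 1, n)   # AI use cases
y = 2.0 + 0.85 * d + 1.0 * u + rng.normal(0, 1, n)  # outcome (e.g. tasks)

def ols(X, target):
    return np.linalg.lstsq(X, target, rcond=None)[0]

# Stage 1: predict use cases from the instrument (treatment assignment).
Z = np.column_stack([np.ones(n), z])
d_hat = Z @ ols(Z, d)

# Stage 2: regress the outcome on the predicted use cases.
beta_iv = ols(np.column_stack([np.ones(n), d_hat]), y)[1]

# Naive OLS of y on d is biased upward by the confounder u.
beta_ols = ols(np.column_stack([np.ones(n), d]), y)[1]
print(f"IV: {beta_iv:.2f}  OLS: {beta_ols:.2f}")
```

Because z is randomly assigned, it affects y only through d, so the second-stage coefficient recovers the per-use-case effect that naive OLS overstates.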
Revenue and investment gains are largest at the 90th percentile and above, suggesting AI expands the upper range of what firms achieve.
Quantile/upper-tail analysis of revenue and investment outcomes in the randomized sample (515 firms); reported concentration of gains at the 90th percentile+.
Treated firms generate 1.9x the revenue of control firms.
RCT with 515 firms; revenue reported by firms during and after the accelerator; comparison of mean revenues between treated and control groups.
Treated firms are 11 percentage points (18%) more likely to acquire paying customers.
RCT with 515 firms; customer acquisition measured in weekly reports / traction outcomes; treatment vs control comparison.