Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
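
The matrix is easier to compare across outcomes as shares than as raw counts. The sketch below is illustrative only: it copies four rows from the matrix above and computes the positive share of each row's Total. It assumes that the dash in a cell means zero claims, and it divides by the Total column because in some rows the four direction columns do not sum to it (for Firm Productivity they sum to 598 against a Total of 604). The names `ROWS`, `parse_count`, and `positive_share` are hypothetical helpers, not part of the source page.

```python
# Rows copied from the evidence matrix: outcome, positive, negative, mixed, null, total.
ROWS = """\
Firm Productivity\t435\t55\t88\t20\t604
AI Safety & Ethics\t214\t276\t65\t33\t593
Job Displacement\t12\t80\t20\t1\t113
Labor Share of Income\t17\t17\t17\t—\t51
"""


def parse_count(cell):
    # Assumption: the matrix's dash marks an empty cell, treated here as zero.
    return 0 if cell.strip() == "—" else int(cell)


def positive_share(line):
    # Split one tab-separated row and return (outcome, positive / total).
    outcome, *cells = line.split("\t")
    positive, negative, mixed, null, total = map(parse_count, cells)
    return outcome, positive / total


shares = dict(positive_share(line) for line in ROWS.splitlines())
for outcome, share in shares.items():
    print(f"{outcome}: {share:.1%} positive")
```

On these rows, Firm Productivity is about 72% positive while Job Displacement is about 11% positive, which is the kind of contrast the raw counts make harder to see.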

The impact of patient capital on the high-quality development of enterprises is regionally heterogeneous: the effect is stronger for enterprises in the central region.

Subsample (regional heterogeneity) analysis on the panel of 743 listed enterprises (2014–2023), comparing region-specific coefficients and finding a stronger effect in the central region.

high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (differential effect across regions)

The application of artificial intelligence enhances the positive impact of patient capital on the high-quality development of enterprises in strategic emerging industries.

Moderation analysis using the same firm panel (743 listed enterprises, 2014–2023) that includes an interaction term between patient capital and measures of AI application, with the interaction reported as positive and statistically significant.

high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (moderated by AI application)

Patient capital promotes the high-quality development of these enterprises by easing financing constraints.

Mediation analysis on panel data of 743 listed firms (2014–2023) reporting that financing-constraint indicators mediate the impact of patient capital on firm high-quality development.

high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (mediated by financing constraints)

Patient capital promotes the high-quality development of these enterprises by alleviating information asymmetry.

Mediation tests using firm-level panel data (743 listed enterprises, 2014–2023) that include measures of information asymmetry and show a mediating effect in the patient capital → high-quality development pathway.

high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (mediated by information asymmetry)

Patient capital promotes the high-quality development of these enterprises by enhancing the level of synergy in digital and green transformation (digital-green transformation synergy).

Mediation analysis on the same panel (743 listed enterprises, 2014–2023) showing that measures of digital-green transformation synergy mediate the relationship between patient capital and firm high-quality development.

high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (mediated by digital-green transformatio...

Patient capital plays a significant role in promoting the high-quality development of enterprises in strategic emerging industries.

Empirical analysis using panel data from 743 listed enterprises in China’s strategic emerging industries over 2014–2023; regression analysis reporting a statistically significant positive coefficient for patient capital on a firm-level measure of high-quality development.

high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (firm-level)

Average ratings [for same-caste matches were] up to 25% higher (on a 10-point scale) than inter-caste matches.

Quantitative result reported in the analysis comparing average ratings (10-point scale) between same-caste and inter-caste matches; statement specifies magnitude 'up to 25%'.

high positive Sima AIunty: Caste Audit in LLM-Driven Matchmaking average rating on a 10-point scale

Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably.

Reported results across evaluated LLMs showing consistent patterns where same-caste profile pairings received higher ratings than inter-caste pairings.

high positive Sima AIunty: Caste Audit in LLM-Driven Matchmaking favorability ratings for same-caste vs inter-caste matches

We share our methodology and lessons learned to enable other organizations to construct similar production-derived benchmarks.

Paper states intention and contribution: releasing methodology and lessons to allow replication by other organizations.

high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... ability of other organizations to construct similar benchmarks

We detail data collection and curation practices including LLM-based task classification, test relevance validation, and multi-run stability checks to address challenges in constructing reliable evaluation signals from monorepo environments.

Methodological description in paper listing specific practices (LLM-based classification, test relevance validation, multi-run stability checks) aimed at producing reliable evaluation signals in monorepos.

high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... reliability of evaluation signals derived from monorepo environments

Models making greater use of work validation tools, such as executing tests and invoking static analysis, achieve higher solve rates.

Reported relationship from paper's analysis correlating models' use of verification tools (test execution, static analysis) with higher solve rates across evaluated models.

high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... solve rate (task success) as a function of verification tool usage

Systematic analysis of four foundation models yields solve rates from 53.2% to 72.2%.

Empirical evaluation reported in paper: four foundation models were evaluated on the ProdCodeBench benchmark producing reported solve-rate range.

high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... solve rate (task success rate)

Each curated sample consists of a verbatim prompt, a committed code change and fail-to-pass tests spanning seven programming languages.

Descriptive dataset claim in paper specifying components of each sample and that samples cover seven programming languages.

high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... dataset composition (prompt, code change, tests) and language coverage (7 langua...

We present ProdCodeBench, a benchmark built from real sessions with a production AI coding assistant.

Paper describes methodology and introduces ProdCodeBench explicitly as constructed from real production assistant sessions.

high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... existence and provenance of benchmark (production-derived dataset)

Benchmarks that reflect production workloads are better for evaluating AI coding agents in industrial settings.

Argument presented in paper motivating creation of production-derived benchmark; no specific empirical comparison to alternative benchmarks reported in the abstract.

high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... quality of evaluation for AI coding agents (suitability of benchmark)

Carbon emissions initially increase with the expansion of robotics manufacturing.

Panel regressions on the 277 Chinese prefecture-level cities (2008–2019) showing the left-hand (rising) portion of the inverted U-shaped relationship.

high positive Exploring the nonlinear relationship between robotics manufa... urban carbon emissions
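
An inverted U-shaped relationship is commonly modeled with a quadratic term, in which case emissions rise until a turning point and fall afterwards. The sketch below assumes such a quadratic specification (the entry above does not state the paper's exact functional form) and uses made-up coefficients purely for illustration; `turning_point` is a hypothetical helper.

```python
def turning_point(beta_linear, beta_quadratic):
    """Peak of y = beta_linear * x + beta_quadratic * x**2.

    An inverted U requires a negative quadratic coefficient.
    """
    if beta_quadratic >= 0:
        raise ValueError("an inverted U needs a negative quadratic coefficient")
    return -beta_linear / (2 * beta_quadratic)


# Hypothetical coefficients, NOT estimates from the paper.
peak = turning_point(0.8, -0.1)
print(f"emissions rise until x = {peak:.1f}, then decline")
```

Observations to the left of the peak correspond to the rising portion described in the claim.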

A representative incident (ISS-004) demonstrated boundary-based containment with 10-minute detection latency, zero user exposure, and 80-minute resolution.

Incident ISS-004 report in the paper giving specific timings for detection latency (10 minutes), user exposure (zero), and resolution (80 minutes).

high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... incident detection latency, user exposure, and time-to-resolution

The multi-agent approach improved reliability: audited handoffs detected and blocked a coordinate transformation error affecting all 2,452 stations before publication.

Incident detection reported in the SF2Bench deployment where audited handoffs prevented publication of a coordinate transformation error that would have affected all 2,452 stations.

high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... detection/blocking of a systemic coordinate transformation error (error preventi...

The multi-agent approach improved efficiency: the SF2Bench deployment was completed by a single operator in two days, with repeated artifact reuse across deployments.

Operational report from the production deployment: single operator completion time of two days and reuse of artifacts across deployments as stated in the paper.

high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... time to complete deployment (task completion time) and operator effort

SF2Bench, a compound flooding benchmark comprising 2,452 monitoring stations and 8,557 published files spanning 39 years, validates the multi-agent workflow.

Reported dataset composition and use in the paper: SF2Bench with stated counts and temporal span used to validate the multi-agent workflow.

high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... scale and temporal coverage of benchmark used to validate workflow (stations, fi...

EnviSmart treats reliability as an architectural property through two mechanisms: (1) a three-track knowledge architecture that externalizes behaviors (governance constraints), domain knowledge (retrievable context), and skills (tool-using procedures) as persistent, interlocking artifacts; and (2) a role-separated multi-agent design where deterministic validators and audited handoffs restore fail-stop semantics at trust boundaries before irreversible steps.

System architecture and design description in the paper; presented as the core reliability mechanisms implemented in EnviSmart.

high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... architectural approach to reliability (design features implemented)

We introduce EnviSmart, a production data management system deployed on campus-wide storage infrastructure for environmental research.

System description and statement of deployment in the paper; presented as a production deployment (no randomized evaluation reported).

high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... existence and production deployment of EnviSmart

Embedding LLM-driven agents into environmental FAIR data management can externalize operational knowledge and scale curation across heterogeneous data and evolving conventions.

Conceptual / argumentative claim made in the paper as a motivation for the system; no quantitative experiment tied to this statement in the excerpt.

high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... ability to externalize operational knowledge and scale curation

Overcoming the structural skill deficit through deliberate investment in tertiary education reform and strong private-public partnerships for continuous vocational learning is mandatory for Nigeria to successfully leverage the AI revolution for inclusive economic growth and ensure long-term workforce resilience.

Study conclusion synthesizing survey results (150 firms) and qualitative policy/workforce analysis to make policy recommendations.

high positive Human Capital and the AI-Powered Future of Work: (Training, ... inclusive economic growth and long-term workforce resilience

The rate of new job creation hinges critically on the immediate implementation of targeted, scalable reskilling programs.

Paper's projections and analysis drawing on the survey of 150 firms and qualitative interviews; presented as a conditional/projection based on current skills gap and training initiatives.

high positive Human Capital and the AI-Powered Future of Work: (Training, ... rate of new job creation

The agentic-specificity classification helps organizations distinguish challenges that require novel approaches from those that are addressable with established practices.

Authors' proposed classification (agentic-specific vs. carried-over/amplified) intended as a practical decision aid; derived from the coding and comparative analysis.

high positive BARRIERS TO AGENTIC AI ENTERPRISE TRANSFORMATION practical utility of the agentic-specificity classification

The taxonomy provides a diagnostic framework for identifying priority barrier dimensions and understanding cross-dimensional amplification mechanisms.

Authors present a taxonomy derived from the review and claim it can be used diagnostically by organizations; supported by the coded barrier classification and STS mapping.

high positive BARRIERS TO AGENTIC AI ENTERPRISE TRANSFORMATION usefulness of the taxonomy for diagnosis

Azar et al. (2023) show that monopsonistic employers have stronger incentives to automate, and US commuting zones with higher labor market concentration experienced more robot adoption.

Citation to Azar et al. (2023) empirical evidence reported in the paper.

high positive Steering Technological Progress robot adoption correlated with labor market concentration

Noy and Zhang (2023) and Brynjolfsson et al. (2025) provide emerging empirical evidence that AI can function as a labor-complementary technology when designed to do so.

Cited empirical studies referenced in the paper arguing that certain AI applications complement human labor.

high positive Steering Technological Progress AI's complementarity to labor / effect on labor demand

Eloundou et al. (2024) predict that half of US jobs are significantly exposed to recent advances in generative AI.

Citation to Eloundou et al. (2024) empirical study reported in the paper's introduction.

high positive Steering Technological Progress share of US jobs exposed to generative AI

Firms may not sufficiently account for non-monetary aspects (safety, meaning of work) when choosing technologies; a planner would include these non-monetary considerations in steering technological progress.

Theoretical argument and model extension in Section 6 on monetary vs non-monetary aspects of technology choices.

high positive Steering Technological Progress inclusion of non-monetary considerations in technology choice

In multi-good economies, a planner can raise poor agents' real incomes not only by affecting factor incomes but also by focusing technological progress on making goods cheaper that are disproportionately consumed by poorer agents.

Extension of the baseline model to multiple goods (Section 5) identifying distributional consumption-channel effects.

high positive Steering Technological Progress real income of poorer agents

When capital and labor are gross complements, a planner concerned with workers' welfare would favor capital-augmenting innovations to raise wages.

Analytical result from a factor-augmenting application of the paper's model examining complementarity conditions between capital and labor.

high positive Steering Technological Progress wages

A welfare-maximizing planner will impose positive robot taxes when robots substitute for human labor, with the optimal tax rate increasing in the planner's concern for workers' welfare.

Model application to robot taxation presented in the paper; comparative statics on planner weights.

high positive Steering Technological Progress optimal robot tax rate

When redistribution is costly or incomplete, production efficiency is no longer optimal and a planner will distort technology choice to improve distribution (i.e., engage more in steering).

Theoretical derivation extending Atkinson-Stiglitz framework with endogenous technology and costly redistribution; comparative statics on redistribution cost.

high positive Steering Technological Progress extent of technological steering

The welfare benefits of steering technological progress are greater the less efficient social safety nets are.

Theoretical result derived in the paper's baseline and extended models analyzing a planner who can shape technology choices and faces costly/incomplete redistribution.

high positive Steering Technological Progress welfare benefits of technological steering

In the short run, with fixed human capital, wages, and job boundaries, AI raises productivity by reducing the time required to perform steps.

Model distinction between short-run (fixed job design and skills) and long-run horizons; short-run optimization shows AI reduces expected execution times for steps, thereby raising productivity.

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation time required to complete production steps (task completion time)

Aggregating heterogeneous firms that deploy a commonly available AI technology yields an aggregate production function that admits a constant elasticity of substitution (CES) representation with three inputs: aggregate manual labor, aggregate AI-assisted labor, and aggregate capital.

Theoretical aggregation argument drawing on Houthakker (1955) and Levhari (1968), deriving a macro-level CES representation from a microfounded algorithmic cost function defined by firms' joint optimization over AI deployment and job design.

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation form of the aggregate production function (CES representation and separability o...
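
For orientation, a generic three-input CES form is sketched below. This is an assumed textbook form, not the paper's derived expression: the symbols $A$, $\alpha_i$, $\rho$, and $\sigma$ are introduced here for illustration, and the paper's own representation may nest or separate the inputs differently.

```latex
% Generic (non-nested) CES with manual labor L_M, AI-assisted labor L_A, and capital K.
% All symbols are illustrative assumptions.
\[
  Y = A \left( \alpha_M L_M^{\rho} + \alpha_A L_A^{\rho} + \alpha_K K^{\rho} \right)^{1/\rho},
  \qquad
  \sigma = \frac{1}{1 - \rho},
  \qquad
  \alpha_M + \alpha_A + \alpha_K = 1,
\]
% where \sigma is the elasticity of substitution between any pair of inputs
% and \rho < 1, \rho \neq 0.
```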

Improvements in AI quality generate non-linear effects on labor demand and wages because firms' cost-minimizing AI deployment and job designs change discretely at particular AI quality thresholds (microfoundation for the productivity J-curve).

Theoretical analysis of discrete switches in the cost-minimizing arrangement as AI success probability and execution times change; characterization of threshold effects and discussion linking to the J-curve phenomenon (model results and comparative statics).

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation labor demand and wages response to AI quality improvements (non-linear threshold...

Adjacency to AI-executed steps increases the likelihood that a given step is executed by AI (local complementarities): a step is more likely to be AI-executed in occupations where its neighboring steps are also AI-executed.

Empirical comparisons of conceptually similar steps across occupations paired with workflow adjacency information and realized AI execution outcomes from Anthropic’s Economic Index; statistical tests reported in the paper.

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation probability (or likelihood) that a step is AI-executed conditional on neighborin...

AI-executed steps co-occur in contiguous chains rather than being randomly scattered across a production workflow.

Empirical analysis linking O*NET tasks to human assessments of AI exposure (Eloundou et al., 2024), realized AI execution outcomes from Anthropic’s Economic Index (Handa et al., 2025), and GPT-generated workflow orderings for occupations; statistical tests comparing observed contiguity to random/scaled baselines reported in the paper.

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation contiguity of AI-executed steps in occupation workflows

Instrumenting AI use cases with treatment assignment suggests each additional AI use case prompted by treatment leads to approximately 26% higher revenue.

Instrumental variable analysis using randomized treatment as instrument for number of AI use cases in the 515-firm sample; outcome measured as revenue.

high positive Mapping AI into Production: A Field Experiment on Firm Perfo... firm revenue (per additional AI use case)

Instrumenting AI use cases with treatment assignment suggests each additional AI use case prompted by treatment leads to 0.85 more completed tasks.

Instrumental variable analysis using randomized treatment as instrument for number of AI use cases in the 515-firm sample; outcome measured as completed tasks.

high positive Mapping AI into Production: A Field Experiment on Firm Perfo... number of tasks completed (per additional AI use case)
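
With a binary randomized instrument and no covariates, the instrumental-variable estimate reduces to the Wald ratio: the treatment-control difference in the outcome divided by the treatment-control difference in the number of AI use cases. The sketch below illustrates that ratio on toy data; the numbers are invented, and the paper's actual specification (controls, standard errors) is not given in the entries above. `wald_iv` is a hypothetical helper.

```python
from statistics import mean


def wald_iv(outcome, endogenous, assigned):
    """Wald estimator: reduced-form difference divided by first-stage difference."""
    def group_mean(values, arm):
        return mean(v for v, z in zip(values, assigned) if z == arm)

    reduced_form = group_mean(outcome, 1) - group_mean(outcome, 0)
    first_stage = group_mean(endogenous, 1) - group_mean(endogenous, 0)
    if first_stage == 0:
        raise ValueError("instrument does not shift the endogenous variable")
    return reduced_form / first_stage


# Toy data, NOT from the paper: 3 treated firms, 3 control firms.
assigned = [1, 1, 1, 0, 0, 0]
use_cases = [9, 8, 10, 6, 7, 5]        # treated mean 9, control mean 6
tasks_done = [22, 20, 24, 19, 20, 18]  # treated mean 22, control mean 19

effect_per_use_case = wald_iv(tasks_done, use_cases, assigned)
print(effect_per_use_case)
```

Read this way, the reported 0.85 tasks per additional use case is the effect of treatment on completed tasks scaled by the effect of treatment on use cases.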

Revenue and investment gains are largest at the 90th percentile and above, suggesting AI expands the upper range of what firms achieve.

Quantile (upper-tail) analysis of revenue and investment outcomes in the randomized sample (515 firms); gains are reported as concentrated at the 90th percentile and above.

high positive Mapping AI into Production: A Field Experiment on Firm Perfo... distribution of revenue and investment gains (percentile analysis)

Treated firms generate 1.9x higher revenue compared to control firms.

RCT with 515 firms; revenue reported by firms during and after the accelerator; comparison of mean revenues between treated and control groups.

high positive Mapping AI into Production: A Field Experiment on Firm Perfo... firm revenue

Treated firms are 11 percentage points (18%) more likely to acquire paying customers.

RCT with 515 firms; customer acquisition measured in weekly reports / traction outcomes; treatment vs control comparison.

high positive Mapping AI into Production: A Field Experiment on Firm Perfo... probability of acquiring paying customers

Treated firms complete 12% more tasks.

RCT with 515 firms; weekly progress reports used to measure tasks completed; comparison of completed tasks between treatment (255) and control (260) groups.

high positive Mapping AI into Production: A Field Experiment on Firm Perfo... number of tasks completed

The additional AI use cases discovered by treated firms are concentrated in product development and strategy-related domains.

Analysis of categorized AI use cases reported in weekly progress reports from the randomized accelerator sample (515 firms); comparison of functional distribution of use cases between treated and control firms.

high positive Mapping AI into Production: A Field Experiment on Firm Perfo... distribution of AI use cases across firm functions (e.g., product development, s...

Treated firms discover 2.7 additional AI use cases (a 44% increase).

Randomized field experiment in a 3-month accelerator; sample of 515 high-growth startups, 255 treatment and 260 control; weekly progress reports capturing AI use cases; treatment delivered case-study workshops prompting broader search for AI use cases.

high positive Mapping AI into Production: A Field Experiment on Firm Perfo... number of AI use cases discovered
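
Where an entry reports both an absolute and a relative effect, the control-group baseline can be backed out by dividing one by the other. The sketch below does this for two figures in the entries above (11 percentage points, or 18%, for customer acquisition; 2.7 use cases, or 44%). The results are approximate because the published figures are rounded, and they assume the relative change is measured against the control mean. `implied_baseline` is a hypothetical helper.

```python
def implied_baseline(absolute_gain, relative_gain):
    """Control-group level implied by an absolute gain and its relative size."""
    return absolute_gain / relative_gain


# Customer acquisition: +11 percentage points reported as +18%.
control_customer_rate = implied_baseline(11.0, 0.18)  # in percent

# AI use cases: +2.7 use cases reported as +44%.
control_use_cases = implied_baseline(2.7, 0.44)

print(f"implied control customer rate: {control_customer_rate:.1f}%")
print(f"implied control use cases: {control_use_cases:.1f}")
```

Under these assumptions, roughly 61% of control firms acquired paying customers and control firms reported about 6.1 AI use cases.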

Under an extreme calibration in which A.I. makes the entire economy grow like the computer industry, growth 'explodes', with incomes becoming infinite in finite time; even then, infinite income does not occur until around 2060.

Simulation of the endogenous-automation endogenous-growth model calibrated to the fast-automation (computer industry) scenario.

high positive Past Automation and Future A.I.: How Weak Links Tame the Gro... occurrence and timing of a finite-time singularity (infinite income) in simulate...
