Evidence (13827 claims)

- Adoption: 8454 claims
- Productivity: 7544 claims
- Governance: 6789 claims
- Human-AI Collaboration: 6327 claims
- Org Design: 4126 claims
- Innovation: 4058 claims
- Labor Markets: 3520 claims
- Skills & Training: 2924 claims
- Inequality: 2057 claims

Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
The impact of patient capital on the high-quality development of enterprises exhibits regional heterogeneity: the high-quality development of enterprises in the central region is more sensitive to patient capital.
Subsample (regional heterogeneity) analysis on the panel of 743 listed enterprises (2014–2023), comparing region-specific coefficients and finding a larger effect in the central region.
The application of artificial intelligence enhances the positive impact of patient capital on the high-quality development of enterprises in strategic emerging industries.
Moderation analysis using the same firm panel (743 listed enterprises, 2014–2023) that includes an interaction term between patient capital and measures of AI application, with the interaction reported as positive and statistically significant.
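A moderation test of this kind regresses the outcome on patient capital, AI application, and their product, and reads the moderating effect off the interaction coefficient. The sketch below shows only that mechanic with NumPy on synthetic data; it omits the fixed effects, controls, and clustered standard errors a firm panel study would normally use, and the variable names and simulated coefficients are illustrative assumptions.

```python
import numpy as np

def fit_moderation(y, x, z):
    """OLS of y on x, z and x*z (plus intercept); returns [b0, b_x, b_z, b_xz]."""
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 2_000
x = rng.normal(size=n)  # stand-in for patient capital
z = rng.normal(size=n)  # stand-in for AI application (the moderator)
y = 1.0 + 0.5 * x + 0.2 * z + 0.3 * x * z + rng.normal(scale=0.5, size=n)
coef = fit_moderation(y, x, z)
# A positive coef[3] (the x*z term) means the effect of x grows with z;
# the marginal effect of x at moderator level z is coef[1] + coef[3] * z.
```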
Patient capital promotes the high-quality development of enterprises in strategic emerging industries by easing financing constraints.
Mediation analysis on panel data of 743 listed firms (2014–2023) reporting that financing-constraint indicators mediate the impact of patient capital on firm high-quality development.
Patient capital promotes the high-quality development of enterprises in strategic emerging industries by alleviating information asymmetry.
Mediation tests using firm-level panel data (743 listed enterprises, 2014–2023) that include measures of information asymmetry and show a mediating effect in the patient capital → high-quality development pathway.
Patient capital promotes the high-quality development of enterprises in strategic emerging industries by strengthening the synergy between their digital and green transformation (digital-green transformation synergy).
Mediation analysis on the same panel (743 listed enterprises, 2014–2023) showing that measures of digital-green transformation synergy mediate the relationship between patient capital and firm high-quality development.
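The three mediation claims above share one logic: patient capital shifts a mediator (financing constraints, information asymmetry, or digital-green synergy), which in turn shifts high-quality development. A minimal product-of-coefficients sketch, assuming simple linear paths on simulated data, separates the indirect effect a*b from the direct effect c'. The paper's exact mediation procedure, controls, and significance tests are not given here, and all names are illustrative.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with an intercept; returns [intercept, coef_1, coef_2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation(x, m, y):
    a = ols(m, x)[1]             # path a: treatment -> mediator
    _, direct, b = ols(y, x, m)  # direct effect c' and path b: mediator -> outcome
    return {"a": a, "b": b, "indirect": a * b, "direct": direct}

# Simulated data with a true indirect effect of 0.6 * 0.5 = 0.3 and a direct effect of 0.2.
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)                      # stand-in for patient capital
m = 0.6 * x + rng.normal(size=n)            # stand-in for a mediator, e.g. eased financing constraints
y = 0.2 * x + 0.5 * m + rng.normal(size=n)  # stand-in for the high-quality development index
res = mediation(x, m, y)
```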
Patient capital plays a significant role in promoting the high-quality development of enterprises in strategic emerging industries.
Empirical analysis using panel data from 743 listed enterprises in China’s strategic emerging industries over 2014–2023; regression analysis reporting a statistically significant positive coefficient for patient capital on a firm-level measure of high-quality development.
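The summary does not spell out the regression specification. Assuming a standard firm fixed-effects panel model, the within transformation below removes each firm's own mean before estimating the slope; the toy data (hypothetical firms "A" and "B") show how pooling firms with different baselines can even flip the sign of the estimate.

```python
from collections import defaultdict
from statistics import fmean

def within_slope(firm_ids, x, y):
    """Slope of y on x after subtracting firm-specific means (one-way firm fixed effects)."""
    groups = defaultdict(list)
    for i, firm in enumerate(firm_ids):
        groups[firm].append(i)
    x_dm, y_dm = [0.0] * len(x), [0.0] * len(y)
    for idx in groups.values():
        mx = fmean(x[i] for i in idx)
        my = fmean(y[i] for i in idx)
        for i in idx:
            x_dm[i] = x[i] - mx
            y_dm[i] = y[i] - my
    sxy = sum(a * b for a, b in zip(x_dm, y_dm))
    sxx = sum(a * a for a in x_dm)
    return sxy / sxx

# Two made-up firms with the same within-firm slope (2) but different baselines.
firms = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 5, 6, 7]
y = [12, 14, 16, 10, 12, 14]
fe_slope = within_slope(firms, x, y)            # 2.0
pooled_slope = within_slope(["all"] * 6, x, y)  # one group = pooled OLS: -4/28, about -0.14
```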
Average ratings [for same-caste matches were] up to 25% higher (on a 10-point scale) than [those for] inter-caste matches.
Quantitative result reported in the analysis comparing average ratings (10-point scale) between same-caste and inter-caste matches; statement specifies magnitude 'up to 25%'.
Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably.
Reported results across evaluated LLMs showing consistent patterns where same-caste profile pairings received higher ratings than inter-caste pairings.
We share our methodology and lessons learned to enable other organizations to construct similar production-derived benchmarks.
Paper states intention and contribution: releasing methodology and lessons to allow replication by other organizations.
We detail data collection and curation practices including LLM-based task classification, test relevance validation, and multi-run stability checks to address challenges in constructing reliable evaluation signals from monorepo environments.
Methodological description in paper listing specific practices (LLM-based classification, test relevance validation, multi-run stability checks) aimed at producing reliable evaluation signals in monorepos.
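The summary names multi-run stability checks but not their exact procedure. One plausible reading, sketched below with hypothetical callables standing in for test executions, keeps a candidate test only if it fails on every run before the change and passes on every run after it. The function name and the run count of 5 are assumptions.

```python
from itertools import cycle

def stable_fail_to_pass(run_before, run_after, runs=5):
    """Keep a test only if it fails on every run before the change and passes on every run after.

    run_before / run_after are callables returning True when the test passes.
    """
    before = [run_before() for _ in range(runs)]
    after = [run_after() for _ in range(runs)]
    return not any(before) and all(after)

keep_clean = stable_fail_to_pass(lambda: False, lambda: True)
flaky_after = cycle([True, False]).__next__  # alternates pass/fail: a flaky test
keep_flaky = stable_fail_to_pass(lambda: False, flaky_after)
```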
Models making greater use of work validation tools, such as executing tests and invoking static analysis, achieve higher solve rates.
Reported relationship from paper's analysis correlating models' use of verification tools (test execution, static analysis) with higher solve rates across evaluated models.
Systematic analysis of four foundation models yields solve rates from 53.2% to 72.2%.
Empirical evaluation reported in paper: four foundation models were evaluated on the ProdCodeBench benchmark producing reported solve-rate range.
Each curated sample consists of a verbatim prompt, a committed code change and fail-to-pass tests spanning seven programming languages.
Descriptive dataset claim in paper specifying components of each sample and that samples cover seven programming languages.
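A sample as described (verbatim prompt, committed change, fail-to-pass tests, one of seven languages) can be modeled as a small record. The sketch assumes a common convention for fail-to-pass benchmarks: a sample counts as solved only when every one of its fail-to-pass tests passes on the agent's change. The field names, scoring rule, and example samples are illustrative, not ProdCodeBench's published schema.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    prompt: str            # verbatim user prompt from the session
    committed_change: str  # diff that was actually committed
    language: str          # one of the benchmark's seven languages
    fail_to_pass: list = field(default_factory=list)  # IDs of tests that must flip to passing

def is_solved(sample, passing_tests):
    """Assumed rule: solved only if every fail-to-pass test passes after the agent's change."""
    return bool(sample.fail_to_pass) and all(t in passing_tests for t in sample.fail_to_pass)

def solve_rate(samples, passing_by_sample):
    """Fraction of samples solved; passing_by_sample[i] is the set of tests passing for samples[i]."""
    solved = sum(is_solved(s, p) for s, p in zip(samples, passing_by_sample))
    return solved / len(samples)

samples = [
    Sample("Fix the date parser", "<diff>", "Python", ["test_parse_iso"]),
    Sample("Add retry to client", "<diff>", "Go", ["TestRetry", "TestBackoff"]),
]
rate = solve_rate(samples, [{"test_parse_iso"}, {"TestRetry"}])  # second sample misses TestBackoff
```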
We present ProdCodeBench, a benchmark built from real sessions with a production AI coding assistant.
Paper describes methodology and introduces ProdCodeBench explicitly as constructed from real production assistant sessions.
Benchmarks that reflect production workloads are better for evaluating AI coding agents in industrial settings.
Argument presented in paper motivating creation of production-derived benchmark; no specific empirical comparison to alternative benchmarks reported in the abstract.
Carbon emissions initially increase with the expansion of robotics manufacturing.
Panel regressions on 277 Chinese prefecture-level cities (2008–2019) showing the left-hand (rising) portion of the inverted U-shaped relationship.
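An inverted U is usually tested by adding a squared term, y = b0 + b1*x + b2*x^2, and checking b1 > 0 and b2 < 0; the outcome rises up to the turning point x* = -b1/(2*b2) and falls after it. The NumPy sketch below fits that quadratic to made-up noiseless data. It leaves out the fixed effects and controls a city panel regression would normally include.

```python
import numpy as np

def inverted_u(x, y):
    """Fit y = b0 + b1*x + b2*x**2; return (b1, b2, turning point -b1 / (2*b2))."""
    b2, b1, _ = np.polyfit(x, y, 2)  # polyfit returns the highest power first
    return b1, b2, -b1 / (2 * b2)

# Made-up, noiseless data: the outcome rises until x = 4 and falls afterwards.
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x - 0.25 * x**2
b1, b2, x_star = inverted_u(x, y)
is_inverted_u = b1 > 0 and b2 < 0
```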
A representative incident (ISS-004) demonstrated boundary-based containment with 10-minute detection latency, zero user exposure, and 80-minute resolution.
Incident ISS-004 report in the paper giving specific timings for detection latency (10 minutes), user exposure (zero), and resolution (80 minutes).
The multi-agent approach improved reliability: audited handoffs detected and blocked a coordinate transformation error affecting all 2,452 stations before publication.
Incident detection reported in the SF2Bench deployment where audited handoffs prevented publication of a coordinate transformation error that would have affected all 2,452 stations.
The multi-agent approach improved efficiency: the SF2Bench deployment was completed by a single operator in two days, with artifacts reused repeatedly across deployments.
Operational report from the production deployment: completion by a single operator in two days and reuse of artifacts across deployments, as stated in the paper.
SF2Bench, a compound flooding benchmark comprising 2,452 monitoring stations and 8,557 published files spanning 39 years, validates the multi-agent workflow.
Reported dataset composition and use in the paper: SF2Bench with stated counts and temporal span used to validate the multi-agent workflow.
EnviSmart treats reliability as an architectural property through two mechanisms: (1) a three-track knowledge architecture that externalizes behaviors (governance constraints), domain knowledge (retrievable context), and skills (tool-using procedures) as persistent, interlocking artifacts; and (2) a role-separated multi-agent design where deterministic validators and audited handoffs restore fail-stop semantics at trust boundaries before irreversible steps.
System architecture and design description in the paper; presented as the core reliability mechanisms implemented in EnviSmart.
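To make "fail-stop at trust boundaries" concrete: a deterministic validator either lets the payload through or raises, and the handoff is logged only after every validator passes, so a bad batch never reaches the irreversible step. All names below (ValidationError, check_coordinates, audited_handoff, publish) are hypothetical and not EnviSmart's API, and the station records are made up.

```python
class ValidationError(Exception):
    """Raised by a deterministic validator; stops the workflow (fail-stop)."""

def check_coordinates(records):
    """Reject any station whose latitude/longitude fall outside valid ranges."""
    for r in records:
        if not (-90.0 <= r["lat"] <= 90.0 and -180.0 <= r["lon"] <= 180.0):
            raise ValidationError(f"station {r['id']}: coordinates out of range")

def audited_handoff(records, validators, audit_log, sender, receiver):
    """Run every validator, then record the handoff; a failure raises before logging."""
    for validate in validators:
        validate(records)
    audit_log.append({"from": sender, "to": receiver, "n_records": len(records)})
    return records

def publish(records, validators, audit_log, published):
    """The extend() below stands in for an irreversible publication step."""
    checked = audited_handoff(records, validators, audit_log, "curator", "publisher")
    published.extend(checked)

audit_log, published = [], []
publish([{"id": 1, "lat": 37.8, "lon": -122.3}], [check_coordinates], audit_log, published)

blocked = None
try:
    # A faulty transform that swapped latitude and longitude.
    publish([{"id": 2, "lat": -122.3, "lon": 37.8}], [check_coordinates], audit_log, published)
except ValidationError as err:
    blocked = str(err)
```

A range check catches a latitude/longitude swap only when the longitude's magnitude exceeds 90 degrees; in practice a validator would need stronger checks, such as comparing against known station locations.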
We introduce EnviSmart, a production data management system deployed on campus-wide storage infrastructure for environmental research.
System description and statement of deployment in the paper; presented as a production deployment (no randomized evaluation reported).
Embedding LLM-driven agents into environmental FAIR data management can externalize operational knowledge and scale curation across heterogeneous data and evolving conventions.
Conceptual / argumentative claim made in the paper as a motivation for the system; no quantitative experiment tied to this statement in the excerpt.
Overcoming the structural skill deficit through deliberate investment in tertiary education reform and strong private-public partnerships for continuous vocational learning is mandatory for Nigeria to successfully leverage the AI revolution for inclusive economic growth and ensure long-term workforce resilience.
Study conclusion synthesizing survey results (150 firms) and qualitative policy/workforce analysis to make policy recommendations.
The rate of new job creation hinges critically on the immediate implementation of targeted, scalable reskilling programs.
Paper's projections and analysis drawing on the survey of 150 firms and qualitative interviews; presented as a conditional/projection based on current skills gap and training initiatives.
The agentic-specificity classification helps organizations distinguish challenges that require novel approaches from those that are addressable with established practices.
Authors' proposed classification (agentic-specific vs. carried-over/amplified) intended as a practical decision aid; derived from the coding and comparative analysis.
The taxonomy provides a diagnostic framework for identifying priority barrier dimensions and understanding cross-dimensional amplification mechanisms.
Authors present a taxonomy derived from the review and claim it can be used diagnostically by organizations; supported by the coded barrier classification and STS mapping.
Azar et al. (2023) show that monopsonistic employers have stronger incentives to automate, and US commuting zones with higher labor market concentration experienced more robot adoption.
Citation to Azar et al. (2023) empirical evidence reported in the paper.
Noy and Zhang (2023) and Brynjolfsson et al. (2025) provide emerging empirical evidence that AI can function as a labor-complementary technology when designed to do so.
Cited empirical studies referenced in the paper arguing that certain AI applications complement human labor.
Eloundou et al. (2024) predict that half of US jobs are significantly exposed to recent advances in generative AI.
Citation to Eloundou et al. (2024) empirical study reported in the paper's introduction.
Firms may not sufficiently account for non-monetary aspects (safety, meaning of work) when choosing technologies; a planner would include these non-monetary considerations in steering technological progress.
Theoretical argument and model extension in Section 6 on monetary vs non-monetary aspects of technology choices.
In multi-good economies, a planner can raise poor agents' real incomes not only by affecting factor incomes but also by focusing technological progress on making goods cheaper that are disproportionately consumed by poorer agents.
Extension of the baseline model to multiple goods (Section 5) identifying distributional consumption-channel effects.
When capital and labor are gross complements, a planner concerned with workers' welfare would favor capital-augmenting innovations to raise wages.
Analytical result from a factor-augmenting application of the paper's model examining complementarity conditions between capital and labor.
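The complementarity condition can be illustrated with a standard two-factor CES function, Y = [alpha*(A_K*K)^rho + (1-alpha)*(A_L*L)^rho]^(1/rho), where sigma = 1/(1-rho) and gross complements means sigma < 1 (rho < 0). This textbook specification is assumed for illustration and is not necessarily the paper's model. The wage is the marginal product of labor:

```python
def ces_output(K, L, A_K, A_L, alpha, rho):
    """Two-factor CES output; rho < 1, rho != 0, sigma = 1 / (1 - rho)."""
    return (alpha * (A_K * K) ** rho + (1 - alpha) * (A_L * L) ** rho) ** (1 / rho)

def wage(K, L, A_K, A_L, alpha, rho):
    """Competitive wage = marginal product of labor, dY/dL."""
    Y = ces_output(K, L, A_K, A_L, alpha, rho)
    return (1 - alpha) * A_L**rho * L ** (rho - 1) * Y ** (1 - rho)

alpha, rho = 0.5, -4.0  # sigma = 0.2 < 1: gross complements
w0 = wage(1.0, 1.0, 1.0, 1.0, alpha, rho)
w_capital = wage(1.0, 1.0, 1.1, 1.0, alpha, rho)  # 10% capital-augmenting gain
w_labor = wage(1.0, 1.0, 1.0, 1.1, alpha, rho)    # 10% labor-augmenting gain
```

Differentiating the log wage gives d ln w / d ln A_K = (1-rho)*s_K > 0 and d ln w / d ln A_L = s_L + rho*s_K, where s_K and s_L are the factor shares. When rho < 0 the second term can turn negative, as it does with these parameters. This is consistent with a worker-focused planner favoring capital-augmenting innovations under complementarity.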
A welfare-maximizing planner will impose positive robot taxes when robots substitute for human labor, with the optimal tax rate increasing in the planner's concern for workers' welfare.
Model application to robot taxation presented in the paper; comparative statics on planner weights.
When redistribution is costly or incomplete, production efficiency is no longer optimal and a planner will distort technology choice to improve distribution (i.e., engage more in steering).
Theoretical derivation extending Atkinson-Stiglitz framework with endogenous technology and costly redistribution; comparative statics on redistribution cost.
The welfare benefits of steering technological progress are greater the less efficient social safety nets are.
Theoretical result derived in the paper's baseline and extended models analyzing a planner who can shape technology choices and faces costly/incomplete redistribution.
In the short run, with fixed human capital, wages, and job boundaries, AI raises productivity by reducing the time required to perform steps.
Model distinction between short-run (fixed job design and skills) and long-run horizons; short-run optimization shows AI reduces expected execution times for steps, thereby raising productivity.
Aggregating heterogeneous firms that deploy a commonly available AI technology yields an aggregate production function that admits a constant elasticity of substitution (CES) representation with three inputs: aggregate manual labor, aggregate AI-assisted labor, and aggregate capital.
Theoretical aggregation argument drawing on Houthakker (1955) and Levhari (1968), deriving a macro-level CES representation from a microfounded algorithmic cost function defined by firms' joint optimization over AI deployment and job design.
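For reference, a generic three-input CES function with a single common elasticity takes the form below, with L_M aggregate manual labor, L_A aggregate AI-assisted labor, K aggregate capital, positive weights theta, and sigma the elasticity of substitution between any pair of inputs. This standard form is assumed for illustration. The paper's derived representation may differ, for example in nesting, weights, or how AI quality enters, and the symbols are illustrative labels.

```latex
Y = \left[\theta_M L_M^{\rho} + \theta_A L_A^{\rho} + \theta_K K^{\rho}\right]^{1/\rho},
\qquad \theta_M, \theta_A, \theta_K > 0,
\qquad \sigma = \frac{1}{1-\rho}
```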
Improvements in AI quality generate non-linear effects on labor demand and wages because firms' cost-minimizing AI deployment and job designs change discretely at particular AI quality thresholds (microfoundation for the productivity J-curve).
Theoretical analysis of discrete switches in the cost-minimizing arrangement as AI success probability and execution times change; characterization of threshold effects and discussion linking to the J-curve phenomenon (model results and comparative statics).
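A toy single-step cost comparison shows how a discrete switch produces non-linear labor demand. Assume, hypothetically, that a worker can do a step in t_human hours at a given wage, or AI can attempt it at a fixed cost ai_cost and succeed with probability p, with a worker redoing failed attempts. AI is then chosen once p exceeds ai_cost / (wage * t_human), and expected labor hours jump down at that point rather than declining smoothly. This is not the paper's model, which optimizes over whole job designs.

```python
def choose_arrangement(p, wage, t_human, ai_cost):
    """Return the cost-minimizing arrangement and the expected human hours it uses."""
    manual_cost = wage * t_human
    # AI attempts the step; with probability 1 - p it fails and a worker redoes it.
    ai_cost_expected = ai_cost + (1 - p) * wage * t_human
    if ai_cost_expected < manual_cost:
        return "ai", (1 - p) * t_human
    return "manual", t_human

# The threshold here is ai_cost / (wage * t_human) = 4 / 10 = 0.4.
below = choose_arrangement(0.39, wage=1.0, t_human=10.0, ai_cost=4.0)
above = choose_arrangement(0.41, wage=1.0, t_human=10.0, ai_cost=4.0)
```

Moving p from 0.39 to 0.41 cuts expected human hours from 10 to about 5.9, even though AI quality barely changed.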
Adjacency to AI-executed steps increases the likelihood that a given step is executed by AI (local complementarities): a step is more likely to be AI-executed in occupations where its neighboring steps are also AI-executed.
Empirical comparisons of conceptually similar steps across occupations paired with workflow adjacency information and realized AI execution outcomes from Anthropic’s Economic Index; statistical tests reported in the paper.
AI-executed steps co-occur in contiguous chains rather than being randomly scattered across a production workflow.
Empirical analysis linking O*NET tasks to human assessments of AI exposure (Eloundou et al., 2024), realized AI execution outcomes from Anthropic’s Economic Index (Handa et al., 2025), and GPT-generated workflow orderings for occupations; statistical tests comparing observed contiguity to random/scaled baselines reported in the paper.
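The contiguity claim can be checked with a permutation test: count adjacent pairs of AI-executed steps in the observed workflow order and compare the count with random reorderings of the same steps. The sketch below does this for two hypothetical eight-step workflows (1 = AI-executed). The paper's actual statistic and its scaled baselines may differ.

```python
import random

def adjacent_ai_pairs(steps):
    """Number of neighboring step pairs that are both AI-executed (1 = AI, 0 = human)."""
    return sum(1 for a, b in zip(steps, steps[1:]) if a == 1 and b == 1)

def contiguity_test(steps, n_perm=5000, seed=0):
    """Observed adjacency count and a one-sided permutation p-value against random orderings."""
    rng = random.Random(seed)
    observed = adjacent_ai_pairs(steps)
    shuffled = list(steps)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacent_ai_pairs(shuffled) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)

chain = [0, 0, 1, 1, 1, 1, 0, 0]      # AI steps form one contiguous chain
scattered = [1, 0, 1, 0, 1, 0, 1, 0]  # same number of AI steps, never adjacent
obs_chain, p_chain = contiguity_test(chain)
obs_scattered, p_scattered = contiguity_test(scattered)
```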
Instrumenting AI use cases with treatment assignment suggests each additional AI use case prompted by treatment leads to approximately 26% higher revenue.
Instrumental variable analysis using randomized treatment as instrument for number of AI use cases in the 515-firm sample; outcome measured as revenue.
Instrumenting AI use cases with treatment assignment suggests each additional AI use case prompted by treatment leads to 0.85 more completed tasks.
Instrumental variable analysis using randomized treatment as instrument for number of AI use cases in the 515-firm sample; outcome measured as completed tasks.
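With a randomized binary treatment as the instrument and no covariates, the IV estimate reduces to the Wald ratio: the treatment-control difference in the outcome divided by the treatment-control difference in AI use cases. A minimal sketch on made-up numbers follows; the paper's specification may add covariates.

```python
from statistics import fmean

def wald_iv(z, d, y):
    """IV estimate of dy/dd with a binary instrument z: reduced form / first stage."""
    treated = [i for i, zi in enumerate(z) if zi == 1]
    control = [i for i, zi in enumerate(z) if zi == 0]
    first_stage = fmean(d[i] for i in treated) - fmean(d[i] for i in control)
    reduced_form = fmean(y[i] for i in treated) - fmean(y[i] for i in control)
    return reduced_form / first_stage

# Made-up firms: z = random assignment, d = AI use cases, y = completed tasks.
z = [1, 1, 0, 0]
d = [3, 5, 1, 1]
y = [10, 12, 4, 0]
beta_iv = wald_iv(z, d, y)  # (11 - 2) / (4 - 1) = 3.0
```

If the outcome were log revenue (the summary does not say), a coefficient beta would correspond to roughly 100*(e^beta - 1)% higher revenue per additional use case.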
Revenue and investment gains are largest at the 90th percentile and above, suggesting AI expands the upper range of what firms achieve.
Quantile/upper-tail analysis of revenue and investment outcomes in the randomized sample (515 firms); reported concentration of gains at the 90th percentile+.
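Comparing upper tails rather than means can be as simple as differencing the 90th percentiles of the treated and control distributions. The sketch uses the standard-library statistics module on made-up revenue figures in which only the top fifth of treated firms differ, so the medians match while the 90th percentiles diverge. The paper's estimator (for example, quantile regression) and its inference are not specified here.

```python
from statistics import median, quantiles

def p90(values):
    """90th percentile: the last of the nine decile cut points."""
    return quantiles(values, n=10)[-1]

def upper_tail_gap(treated, control):
    return p90(treated) - p90(control)

# Made-up revenues: identical except that the top 20 treated firms earn three times as much.
control = list(range(1, 101))
treated = control[:80] + [3 * v for v in control[80:]]
gap = upper_tail_gap(treated, control)
median_gap = median(treated) - median(control)  # 0: the middle of the distribution is unchanged
```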
Treated firms generate 1.9x higher revenue compared to control firms.
RCT with 515 firms; revenue reported by firms during and after the accelerator; comparison of mean revenues between treated and control groups.
Treated firms are 11 percentage points (18%) more likely to acquire paying customers.
RCT with 515 firms; customer acquisition measured in weekly reports / traction outcomes; treatment vs control comparison.
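The two figures are consistent with each other: an 11-percentage-point gain that equals an 18% relative gain implies a control-group rate of about 0.11 / 0.18, or roughly 61%, and a treated rate of roughly 72%. Because both reported numbers are rounded, the implied rates are approximate. A small helper for this back-calculation:

```python
def implied_rates(pp_gain, relative_gain):
    """Back out (control, treated) rates from an absolute gain and the matching relative gain."""
    control = pp_gain / relative_gain
    return control, control + pp_gain

control_rate, treated_rate = implied_rates(0.11, 0.18)  # roughly 0.61 and 0.72
```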
Treated firms complete 12% more tasks.
RCT with 515 firms; weekly progress reports used to measure tasks completed; comparison of completed tasks between treatment (255) and control (260) groups.
The additional AI use cases discovered by treated firms are concentrated in product development and strategy-related domains.
Analysis of categorized AI use cases reported in weekly progress reports from the randomized accelerator sample (515 firms); comparison of functional distribution of use cases between treated and control firms.
Treated firms discover 2.7 additional AI use cases (a 44% increase).
Randomized field experiment in a 3-month accelerator; sample of 515 high-growth startups, 255 treatment and 260 control; weekly progress reports capturing AI use cases; treatment delivered case-study workshops prompting broader search for AI use cases.
Under an extreme calibration in which A.I. makes the entire economy grow like the computer industry, growth 'explodes', with incomes becoming infinite in finite time; even so, infinite income does not occur until around 2060.
Simulation of the endogenous-automation endogenous-growth model calibrated to the fast-automation (computer industry) scenario.
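Why can growth "explode" in finite time? Here is a generic illustration, not the paper's model. If output grows at a rate that itself rises with the level of output, dY/dt = a*Y^(1+eps) with eps > 0, then the solution Y(t) = Y0 / (1 - t/T)^(1/eps) diverges at T = 1/(a*eps*Y0^eps). The sketch below computes T and the path; the parameter values are arbitrary.

```python
def blowup_time(a, eps, y0):
    """Finite time T at which dY/dt = a * Y**(1 + eps) diverges, starting from Y(0) = y0."""
    return 1.0 / (a * eps * y0**eps)

def y_path(t, a, eps, y0):
    """Closed-form solution Y(t) = y0 / (1 - t/T)**(1/eps); infinite from T onwards."""
    T = blowup_time(a, eps, y0)
    if t >= T:
        return float("inf")
    return y0 / (1.0 - t / T) ** (1.0 / eps)

T = blowup_time(a=1.0, eps=1.0, y0=1.0)       # 1.0
y_half = y_path(0.5, a=1.0, eps=1.0, y0=1.0)  # 2.0
```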