The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
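As a rough illustration of how the matrix can be read, the sketch below parses a few rows copied from the table and computes the share of positive findings. The names `MatrixRow`, `parse_row`, and `positive_share` are hypothetical. Dividing by the row's Total is an assumption: the four direction counts do not always sum to Total (for Other, 749 + 195 + 97 + 889 = 1930 against a Total of 1979).

```python
from typing import NamedTuple


class MatrixRow(NamedTuple):
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int


def parse_row(line: str) -> MatrixRow:
    """Parse a flattened row: the last five tokens are counts, the rest is the outcome name."""
    tokens = line.split()
    counts = [int(t) for t in tokens[-5:]]  # raises ValueError if a cell is missing
    name = " ".join(tokens[:-5])
    return MatrixRow(name, *counts)


def positive_share(row: MatrixRow) -> float:
    """Share of a row's claims with a positive finding (assumes Total is the denominator)."""
    return row.positive / row.total


rows = [parse_row(line) for line in [
    "Firm Productivity 435 55 88 20 604",
    "Job Displacement 12 80 20 1 113",
    "AI Safety & Ethics 214 276 65 33 593",
]]
for r in rows:
    print(f"{r.outcome}: {positive_share(r):.1%} positive")
# Firm Productivity: 72.0% positive
# Job Displacement: 10.6% positive
# AI Safety & Ethics: 36.1% positive
```

Rows that list fewer than five numbers (Labor Share of Income, Worker Turnover, Industry) raise `ValueError` here rather than being silently misread, since the table does not show which column is empty.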
The impact of patient capital on the high-quality development of enterprises exhibits regional heterogeneity: enterprises in the central region are more sensitive to patient capital in terms of high-quality development.
Subsample (regional heterogeneity) analysis on the panel of 743 listed enterprises (2014–2023), comparing region-specific coefficients and finding a stronger effect in the central region.
high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (differential effect across regions)
The application of artificial intelligence enhances the positive impact of patient capital on the high-quality development of enterprises in strategic emerging industries.
Moderation analysis using the same firm panel (743 listed enterprises, 2014–2023) that includes an interaction term between patient capital and measures of AI application, with the interaction reported as positive and statistically significant.
high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (moderated by AI application)
Patient capital promotes the high-quality development of these enterprises by easing financing constraints.
Mediation analysis on panel data of 743 listed firms (2014–2023) reporting that financing-constraint indicators mediate the impact of patient capital on firm high-quality development.
high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (mediated by financing constraints)
Patient capital promotes the high-quality development of these enterprises by alleviating information asymmetry.
Mediation tests using firm-level panel data (743 listed enterprises, 2014–2023) that include measures of information asymmetry and show a mediating effect in the patient capital → high-quality development pathway.
high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (mediated by information asymmetry)
Patient capital promotes the high-quality development of these enterprises by enhancing the level of synergy in digital and green transformation (digital-green transformation synergy).
Mediation analysis on the same panel (743 listed enterprises, 2014–2023) showing that measures of digital-green transformation synergy mediate the relationship between patient capital and firm high-quality development.
high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (mediated by digital-green transformatio...
Patient capital plays a significant role in promoting the high-quality development of enterprises in strategic emerging industries.
Empirical analysis using panel data from 743 listed enterprises in China’s strategic emerging industries over 2014–2023; regression analysis reporting a statistically significant positive coefficient for patient capital on a firm-level measure of high-quality development.
high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (firm-level)
Average ratings [for same-caste matches were] up to 25% higher (on a 10-point scale) than inter-caste matches.
Quantitative result reported in the analysis comparing average ratings (10-point scale) between same-caste and inter-caste matches; statement specifies magnitude 'up to 25%'.
high positive Sima AIunty: Caste Audit in LLM-Driven Matchmaking average rating on a 10-point scale
Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably.
Reported results across evaluated LLMs showing consistent patterns where same-caste profile pairings received higher ratings than inter-caste pairings.
high positive Sima AIunty: Caste Audit in LLM-Driven Matchmaking favorability ratings for same-caste vs inter-caste matches
We share our methodology and lessons learned to enable other organizations to construct similar production-derived benchmarks.
Paper states intention and contribution: releasing methodology and lessons to allow replication by other organizations.
high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... ability of other organizations to construct similar benchmarks
We detail data collection and curation practices including LLM-based task classification, test relevance validation, and multi-run stability checks to address challenges in constructing reliable evaluation signals from monorepo environments.
Methodological description in paper listing specific practices (LLM-based classification, test relevance validation, multi-run stability checks) aimed at producing reliable evaluation signals in monorepos.
high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... reliability of evaluation signals derived from monorepo environments
Models making greater use of work validation tools, such as executing tests and invoking static analysis, achieve higher solve rates.
Reported relationship from paper's analysis correlating models' use of verification tools (test execution, static analysis) with higher solve rates across evaluated models.
high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... solve rate (task success) as a function of verification tool usage
Systematic analysis of four foundation models yields solve rates from 53.2% to 72.2%.
Empirical evaluation reported in paper: four foundation models were evaluated on the ProdCodeBench benchmark producing reported solve-rate range.
high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... solve rate (task success rate)
Each curated sample consists of a verbatim prompt, a committed code change and fail-to-pass tests spanning seven programming languages.
Descriptive dataset claim in paper specifying components of each sample and that samples cover seven programming languages.
high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... dataset composition (prompt, code change, tests) and language coverage (7 langua...
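A minimal sketch of how such a sample might be represented, and of what "fail-to-pass" combined with the multi-run stability checks mentioned above could mean in code. The class, field names, and function are hypothetical and are not taken from the paper, which is not quoted here on its data schema.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BenchmarkSample:
    """Hypothetical container for one curated sample."""
    prompt: str          # verbatim prompt from the assistant session
    code_change: str     # the committed change, e.g. as a diff
    language: str        # one of the seven covered languages
    test_ids: List[str]  # identifiers of the fail-to-pass tests


def is_fail_to_pass(
    run_before: Callable[[], bool],
    run_after: Callable[[], bool],
    runs: int = 3,
) -> bool:
    """True if the test fails on every run before the change and passes on every run after.

    Repeating each run is a simple stand-in for a multi-run stability check:
    a flaky test that flips between runs is rejected.
    """
    before = [run_before() for _ in range(runs)]
    after = [run_after() for _ in range(runs)]
    return not any(before) and all(after)


sample = BenchmarkSample(
    prompt="Fix the off-by-one error in the pager",
    code_change="--- a/pager.py\n+++ b/pager.py\n",
    language="python",
    test_ids=["test_last_page"],
)
print(is_fail_to_pass(lambda: False, lambda: True))  # stable fail-to-pass test
```

Each callable returns `True` when the test passes; in a real setup they would invoke a test runner against the repository before and after the change.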
We present ProdCodeBench, a benchmark built from real sessions with a production AI coding assistant.
Paper describes methodology and introduces ProdCodeBench explicitly as constructed from real production assistant sessions.
high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... existence and provenance of benchmark (production-derived dataset)
Benchmarks that reflect production workloads are better for evaluating AI coding agents in industrial settings.
Argument presented in paper motivating creation of production-derived benchmark; no specific empirical comparison to alternative benchmarks reported in the abstract.
high positive ProdCodeBench: A Production-Derived Benchmark for Evaluating... quality of evaluation for AI coding agents (suitability of benchmark)
Carbon emissions initially increase with the expansion of robotics manufacturing.
Panel regressions on the 277 Chinese prefecture-level cities (2008–2019) showing the left-hand (rising) portion of the inverted U-shaped relationship.
A representative incident (ISS-004) demonstrated boundary-based containment with 10-minute detection latency, zero user exposure, and 80-minute resolution.
Incident ISS-004 report in the paper giving specific timings for detection latency (10 minutes), user exposure (zero), and resolution (80 minutes).
high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... incident detection latency, user exposure, and time-to-resolution
The multi-agent approach improved reliability: audited handoffs detected and blocked a coordinate transformation error affecting all 2,452 stations before publication.
Incident detection reported in the SF2Bench deployment where audited handoffs prevented publication of a coordinate transformation error that would have affected all 2,452 stations.
high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... detection/blocking of a systemic coordinate transformation error (error preventi...
The multi-agent approach improved efficiency: the SF2Bench deployment was completed by a single operator in two days with repeated artifact reuse across deployments.
Operational report from the production deployment: single operator completion time of two days and reuse of artifacts across deployments as stated in the paper.
high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... time to complete deployment (task completion time) and operator effort
SF2Bench, a compound flooding benchmark comprising 2,452 monitoring stations and 8,557 published files spanning 39 years, validates the multi-agent workflow.
Reported dataset composition and use in the paper: SF2Bench with stated counts and temporal span used to validate the multi-agent workflow.
high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... scale and temporal coverage of benchmark used to validate workflow (stations, fi...
EnviSmart treats reliability as an architectural property through two mechanisms: (1) a three-track knowledge architecture that externalizes behaviors (governance constraints), domain knowledge (retrievable context), and skills (tool-using procedures) as persistent, interlocking artifacts; and (2) a role-separated multi-agent design where deterministic validators and audited handoffs restore fail-stop semantics at trust boundaries before irreversible steps.
System architecture and design description in the paper; presented as the core reliability mechanisms implemented in EnviSmart.
high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... architectural approach to reliability (design features implemented)
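The second mechanism (deterministic validators plus audited handoffs that stop the pipeline before an irreversible step) can be sketched roughly as follows. All names are hypothetical, and the coordinate range check is only an illustrative validator; the entries above do not describe how EnviSmart detected the coordinate transformation error.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Station = Dict[str, float]
Validator = Callable[[List[Station]], List[str]]


class HandoffRejected(Exception):
    """Raised when a validator blocks a handoff (fail-stop behavior)."""


def coords_in_range(stations: List[Station]) -> List[str]:
    """Deterministic check: latitude and longitude must lie in their valid ranges."""
    errors = []
    for i, s in enumerate(stations):
        if not (-90.0 <= s["lat"] <= 90.0 and -180.0 <= s["lon"] <= 180.0):
            errors.append(f"station {i}: coordinates out of range")
    return errors


@dataclass
class AuditedHandoff:
    validators: List[Validator]
    # Each entry records (stage name, accepted?, number of errors).
    audit_log: List[Tuple[str, bool, int]] = field(default_factory=list)

    def pass_on(self, stage: str, stations: List[Station]) -> List[Station]:
        """Run every validator, log the outcome, and stop on any error."""
        errors = [e for v in self.validators for e in v(stations)]
        self.audit_log.append((stage, not errors, len(errors)))
        if errors:
            raise HandoffRejected(f"{stage}: {len(errors)} validation error(s)")
        return stations


handoff = AuditedHandoff(validators=[coords_in_range])
handoff.pass_on("ingest", [{"lat": 29.7, "lon": -95.4}])
try:
    # Swapped latitude and longitude, as a stand-in for a transformation bug.
    handoff.pass_on("publish", [{"lat": -95.4, "lon": 29.7}])
except HandoffRejected as exc:
    print("blocked:", exc)
```

The point of the design is that the data cannot reach the publish step unless every validator returns no errors, and every attempt leaves an audit record either way.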
We introduce EnviSmart, a production data management system deployed on campus-wide storage infrastructure for environmental research.
System description and statement of deployment in the paper; presented as a production deployment (no randomized evaluation reported).
high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... existence and production deployment of EnviSmart
Embedding LLM-driven agents into environmental FAIR data management can externalize operational knowledge and scale curation across heterogeneous data and evolving conventions.
Conceptual / argumentative claim made in the paper as a motivation for the system; no quantitative experiment tied to this statement in the excerpt.
high positive Exploring Robust Multi-Agent Workflows for Environmental Dat... ability to externalize operational knowledge and scale curation
Overcoming the structural skill deficit through deliberate investment in tertiary education reform and strong private-public partnerships for continuous vocational learning is mandatory for Nigeria to successfully leverage the AI revolution for inclusive economic growth and ensure long-term workforce resilience.
Study conclusion synthesizing survey results (150 firms) and qualitative policy/workforce analysis to make policy recommendations.
high positive Human Capital and the AI-Powered Future of Work: (Training, ... inclusive economic growth and long-term workforce resilience
The rate of new job creation hinges critically on the immediate implementation of targeted, scalable reskilling programs.
Paper's projections and analysis drawing on the survey of 150 firms and qualitative interviews; presented as a conditional/projection based on current skills gap and training initiatives.
The agentic-specificity classification helps organizations distinguish challenges that require novel approaches from those that are addressable with established practices.
Authors' proposed classification (agentic-specific vs. carried-over/amplified) intended as a practical decision aid; derived from the coding and comparative analysis.
high positive BARRIERS TO AGENTIC AI ENTERPRISE TRANSFORMATION practical utility of the agentic-specificity classification
The taxonomy provides a diagnostic framework for identifying priority barrier dimensions and understanding cross-dimensional amplification mechanisms.
Authors present a taxonomy derived from the review and claim it can be used diagnostically by organizations; supported by the coded barrier classification and STS mapping.
high positive BARRIERS TO AGENTIC AI ENTERPRISE TRANSFORMATION usefulness of the taxonomy for diagnosis
Azar et al. (2023) show that monopsonistic employers have stronger incentives to automate, and US commuting zones with higher labor market concentration experienced more robot adoption.
Citation to Azar et al. (2023) empirical evidence reported in the paper.
high positive Steering Technological Progress robot adoption correlated with labor market concentration
Noy and Zhang (2023) and Brynjolfsson et al. (2025) provide emerging empirical evidence that AI can function as a labor-complementary technology when designed to do so.
Cited empirical studies referenced in the paper arguing that certain AI applications complement human labor.
high positive Steering Technological Progress AI's complementarity to labor / effect on labor demand
Eloundou et al. (2024) predict that half of US jobs are significantly exposed to recent advances in generative AI.
Citation to Eloundou et al. (2024) empirical study reported in the paper's introduction.
high positive Steering Technological Progress share of US jobs exposed to generative AI
Firms may not sufficiently account for non-monetary aspects (safety, meaning of work) when choosing technologies; a planner would include these non-monetary considerations in steering technological progress.
Theoretical argument and model extension in Section 6 on monetary vs non-monetary aspects of technology choices.
high positive Steering Technological Progress inclusion of non-monetary considerations in technology choice
In multi-good economies, a planner can raise poor agents' real incomes not only by affecting factor incomes but also by directing technological progress toward lowering the prices of goods that poorer agents disproportionately consume.
Extension of the baseline model to multiple goods (Section 5) identifying distributional consumption-channel effects.
high positive Steering Technological Progress real income of poorer agents
When capital and labor are gross complements, a planner concerned with workers' welfare would favor capital-augmenting innovations to raise wages.
Analytical result from a factor-augmenting application of the paper's model examining complementarity conditions between capital and labor.
high positive Steering Technological Progress wages
A welfare-maximizing planner will impose positive robot taxes when robots substitute for human labor, with the optimal tax rate increasing in the planner's concern for workers' welfare.
Model application to robot taxation presented in the paper; comparative statics on planner weights.
high positive Steering Technological Progress optimal robot tax rate
When redistribution is costly or incomplete, production efficiency is no longer optimal and a planner will distort technology choice to improve distribution (i.e., engage more in steering).
Theoretical derivation extending Atkinson-Stiglitz framework with endogenous technology and costly redistribution; comparative statics on redistribution cost.
high positive Steering Technological Progress extent of technological steering
The welfare benefits of steering technological progress are greater the less efficient social safety nets are.
Theoretical result derived in the paper's baseline and extended models analyzing a planner who can shape technology choices and faces costly/incomplete redistribution.
high positive Steering Technological Progress welfare benefits of technological steering
In the short run, with fixed human capital, wages, and job boundaries, AI raises productivity by reducing the time required to perform steps.
Model distinction between short-run (fixed job design and skills) and long-run horizons; short-run optimization shows AI reduces expected execution times for steps, thereby raising productivity.
high positive Chaining Tasks, Redefining Work: A Theory of AI Automation time required to complete production steps (task completion time)
Aggregating heterogeneous firms that deploy a commonly available AI technology yields an aggregate production function that admits a constant elasticity of substitution (CES) representation with three inputs: aggregate manual labor, aggregate AI-assisted labor, and aggregate capital.
Theoretical aggregation argument drawing on Houthakker (1955) and Levhari (1968), deriving a macro-level CES representation from a microfounded algorithmic cost function defined by firms' joint optimization over AI deployment and job design.
high positive Chaining Tasks, Redefining Work: A Theory of AI Automation form of the aggregate production function (CES representation and separability o...
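For readers unfamiliar with the functional form, a generic three-input CES function is Y = (a_M * M^rho + a_A * A^rho + a_K * K^rho)^(1/rho), with elasticity of substitution 1 / (1 - rho). The sketch below evaluates it. The share parameters and the value of rho are hypothetical, and the paper's own representation (weights, nesting, separability) may differ from this plain version.

```python
import math
from typing import Sequence


def ces_output(
    manual_labor: float,
    ai_assisted_labor: float,
    capital: float,
    shares: Sequence[float] = (0.4, 0.3, 0.3),  # hypothetical share parameters
    rho: float = 0.5,                           # hypothetical substitution parameter
) -> float:
    """Generic three-input CES aggregate; inputs are assumed strictly positive."""
    if rho == 0 or rho >= 1:
        raise ValueError("this sketch needs rho < 1 and rho != 0")
    inputs = (manual_labor, ai_assisted_labor, capital)
    return sum(a * x ** rho for a, x in zip(shares, inputs)) ** (1.0 / rho)


def elasticity_of_substitution(rho: float) -> float:
    """Constant elasticity implied by the CES parameter rho."""
    return 1.0 / (1.0 - rho)


y = ces_output(2.0, 4.0, 6.0)
y_doubled = ces_output(4.0, 8.0, 12.0)
print(math.isclose(y_doubled, 2.0 * y))  # constant returns to scale
print(elasticity_of_substitution(0.5))
```

Doubling all three inputs doubles output, which is the constant-returns property of this form; rho = 0.5 corresponds to an elasticity of 2, i.e. inputs that are gross substitutes.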
Improvements in AI quality generate non-linear effects on labor demand and wages because firms' cost-minimizing AI deployment and job designs change discretely at particular AI quality thresholds (microfoundation for the productivity J-curve).
Theoretical analysis of discrete switches in the cost-minimizing arrangement as AI success probability and execution times change; characterization of threshold effects and discussion linking to the J-curve phenomenon (model results and comparative statics).
high positive Chaining Tasks, Redefining Work: A Theory of AI Automation labor demand and wages response to AI quality improvements (non-linear threshold...
Adjacency to AI-executed steps increases the likelihood that a given step is executed by AI (local complementarities): a step is more likely to be AI-executed in occupations where its neighboring steps are also AI-executed.
Empirical comparisons of conceptually similar steps across occupations paired with workflow adjacency information and realized AI execution outcomes from Anthropic’s Economic Index; statistical tests reported in the paper.
high positive Chaining Tasks, Redefining Work: A Theory of AI Automation probability (or likelihood) that a step is AI-executed conditional on neighborin...
AI-executed steps co-occur in contiguous chains rather than being randomly scattered across a production workflow.
Empirical analysis linking O*NET tasks to human assessments of AI exposure (Eloundou et al., 2024), realized AI execution outcomes from Anthropic’s Economic Index (Handa et al., 2025), and GPT-generated workflow orderings for occupations; statistical tests comparing observed contiguity to random/scaled baselines reported in the paper.
high positive Chaining Tasks, Redefining Work: A Theory of AI Automation contiguity of AI-executed steps in occupation workflows
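One simple way to compare observed contiguity with a random baseline is a permutation test: count adjacent pairs of AI-executed steps in a workflow, then see how often a random reordering of the same steps produces at least as many. This is only an illustrative sketch with a made-up workflow; the paper's actual statistic and baselines are not specified in the entries above.

```python
import random
from typing import List


def adjacent_ai_pairs(steps: List[bool]) -> int:
    """Number of neighboring step pairs in which both steps are AI-executed."""
    return sum(1 for a, b in zip(steps, steps[1:]) if a and b)


def permutation_p_value(steps: List[bool], n_shuffles: int = 2000, seed: int = 0) -> float:
    """Share of random orderings with at least as many adjacent AI pairs as observed."""
    rng = random.Random(seed)
    observed = adjacent_ai_pairs(steps)
    shuffled = list(steps)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if adjacent_ai_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical 10-step workflow; True marks an AI-executed step.
workflow = [False, False, True, True, True, True, False, False, False, False]
print(adjacent_ai_pairs(workflow))    # 3 adjacent AI pairs in one chain of four
print(permutation_p_value(workflow))  # small when chaining is unlikely by chance
```

Shuffling keeps the number of AI-executed steps fixed and only randomizes their positions, so a small p-value points to clustering rather than to a high overall AI share.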
Instrumenting AI use cases with treatment assignment suggests each additional AI use case prompted by treatment leads to approximately 26% higher revenue.
Instrumental variable analysis using randomized treatment as instrument for number of AI use cases in the 515-firm sample; outcome measured as revenue.
high positive Mapping AI into Production: A Field Experiment on Firm Perfo... firm revenue (per additional AI use case)
Instrumenting AI use cases with treatment assignment suggests each additional AI use case prompted by treatment leads to 0.85 more completed tasks.
Instrumental variable analysis using randomized treatment as instrument for number of AI use cases in the 515-firm sample; outcome measured as completed tasks.
high positive Mapping AI into Production: A Field Experiment on Firm Perfo... number of tasks completed (per additional AI use case)
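With a binary, randomly assigned instrument and no covariates, an instrumental-variable estimate reduces to the Wald ratio: the treatment-control difference in the outcome divided by the treatment-control difference in the number of AI use cases. The sketch below shows that ratio on made-up numbers; the paper's specification may include controls or other adjustments, so this is an assumption about the general approach, not a reproduction.

```python
from statistics import mean
from typing import Sequence


def wald_iv(assigned: Sequence[int], exposure: Sequence[float], outcome: Sequence[float]) -> float:
    """Wald estimator: effect of one extra unit of exposure, instrumented by assignment."""
    treated = [i for i, z in enumerate(assigned) if z]
    control = [i for i, z in enumerate(assigned) if not z]
    # First stage: effect of assignment on exposure (e.g. AI use cases).
    first_stage = mean(exposure[i] for i in treated) - mean(exposure[i] for i in control)
    # Reduced form: effect of assignment on the outcome (e.g. completed tasks).
    reduced_form = mean(outcome[i] for i in treated) - mean(outcome[i] for i in control)
    if first_stage == 0:
        raise ValueError("assignment does not shift exposure; the ratio is undefined")
    return reduced_form / first_stage


# Hypothetical toy data: four firms, two assigned to treatment.
assigned = [1, 1, 0, 0]
use_cases = [9, 8, 6, 6]
tasks_done = [12, 11, 9, 10]
print(wald_iv(assigned, use_cases, tasks_done))  # 2.0 / 2.5 = 0.8
```

The estimate is interpretable as an effect of use cases only if assignment influences the outcome solely through the number of use cases, which is the usual exclusion restriction.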
Revenue and investment gains are largest at the 90th percentile and above, suggesting AI expands the upper range of what firms achieve.
Quantile (upper-tail) analysis of revenue and investment outcomes in the randomized sample (515 firms); gains are reported to be concentrated at and above the 90th percentile.
high positive Mapping AI into Production: A Field Experiment on Firm Perfo... distribution of revenue and investment gains (percentile analysis)
Treated firms generate 1.9x higher revenue compared to control firms.
RCT with 515 firms; revenue reported by firms during and after the accelerator; comparison of mean revenues between treated and control groups.
Treated firms are 11 percentage points (18%) more likely to acquire paying customers.
RCT with 515 firms; customer acquisition measured in weekly reports / traction outcomes; treatment vs control comparison.
high positive Mapping AI into Production: A Field Experiment on Firm Perfo... probability of acquiring paying customers
Treated firms complete 12% more tasks.
RCT with 515 firms; weekly progress reports used to measure tasks completed; comparison of completed tasks between treatment (255) and control (260) groups.
high positive Mapping AI into Production: A Field Experiment on Firm Perfo... number of tasks completed
The additional AI use cases discovered by treated firms are concentrated in product development and strategy-related domains.
Analysis of categorized AI use cases reported in weekly progress reports from the randomized accelerator sample (515 firms); comparison of functional distribution of use cases between treated and control firms.
high positive Mapping AI into Production: A Field Experiment on Firm Perfo... distribution of AI use cases across firm functions (e.g., product development, s...
Treated firms discover 2.7 additional AI use cases (a 44% increase).
Randomized field experiment in a 3-month accelerator; sample of 515 high-growth startups, 255 treatment and 260 control; weekly progress reports capturing AI use cases; treatment delivered case-study workshops prompting broader search for AI use cases.
high positive Mapping AI into Production: A Field Experiment on Firm Perfo... number of AI use cases discovered
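Two of the claims above pair an absolute gain with a relative one (11 percentage points and 18%; 2.7 use cases and 44%). Assuming each percentage is measured relative to the control-group mean, the pair implies an approximate control baseline. The calculation below is a back-of-the-envelope sketch with a hypothetical helper name, and the results are only as precise as the rounded figures allow.

```python
def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Control-group level implied by an absolute gain and the matching relative gain."""
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    return absolute_gain / relative_gain


# 11 percentage points described as an 18% increase.
customers_control = implied_baseline(0.11, 0.18)
# 2.7 additional use cases described as a 44% increase.
use_cases_control = implied_baseline(2.7, 0.44)

print(f"control share with paying customers: about {customers_control:.2f}")      # about 0.61
print(f"treated share with paying customers: about {customers_control + 0.11:.2f}")  # about 0.72
print(f"control AI use cases: about {use_cases_control:.1f}")                     # about 6.1
print(f"treated AI use cases: about {use_cases_control + 2.7:.1f}")               # about 8.8
```

This also makes the distinction concrete: the 11-point figure is a difference between two shares, while the 18% figure is that difference divided by the control share.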
Under an extreme calibration where A.I. makes the entire economy grow like the computer industry, growth 'explodes' with incomes becoming infinite in finite time; infinite income does not occur until around 2060 even in this extreme calibration.
Simulation of the endogenous-automation endogenous-growth model calibrated to the fast-automation (computer industry) scenario.
high positive Past Automation and Future A.I.: How Weak Links Tame the Gro... occurrence and timing of a finite-time singularity (infinite income) in simulate...
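"Infinite in finite time" can be illustrated with a textbook growth equation that is not the paper's model: if income follows dy/dt = g * y^(1 + phi) with phi > 0, the solution is y(t) = y0 * (1 - phi * g * y0^phi * t)^(-1/phi), which diverges at t* = 1 / (phi * g * y0^phi). The parameter values below are arbitrary and are not calibrated to the paper or to its 2060 date.

```python
def blow_up_time(y0: float, g: float, phi: float) -> float:
    """Time at which y(t) diverges for dy/dt = g * y**(1 + phi), phi > 0."""
    if y0 <= 0 or g <= 0 or phi <= 0:
        raise ValueError("y0, g and phi must all be positive")
    return 1.0 / (phi * g * y0 ** phi)


def income_path(t: float, y0: float, g: float, phi: float) -> float:
    """Closed-form solution of the same equation, valid before the blow-up time."""
    remaining = 1.0 - phi * g * (y0 ** phi) * t
    if remaining <= 0:
        raise ValueError("t is at or past the blow-up time")
    return y0 * remaining ** (-1.0 / phi)


# Arbitrary illustration: 2% initial growth rate, phi = 1.
t_star = blow_up_time(y0=1.0, g=0.02, phi=1.0)
print(t_star)                               # 50.0
print(income_path(25.0, 1.0, 0.02, 1.0))    # 2.0, halfway to the blow-up
print(income_path(49.0, 1.0, 0.02, 1.0))    # much larger, close to the blow-up
```

With phi = 0 the equation is ordinary exponential growth, which never reaches infinity at any finite date; the singularity comes entirely from growth accelerating with the level of income.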