The Commonplace

Evidence (14156 claims)

Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 761 200 101 904 2020
Governance & Regulation 829 400 191 122 1566
Organizational Efficiency 784 193 125 84 1197
Technology Adoption Rate 637 236 124 97 1103
Research Productivity 431 131 58 340 972
Output Quality 481 183 59 47 770
Decision Quality 332 177 82 49 647
Firm Productivity 439 57 88 20 610
AI Safety & Ethics 218 279 66 33 602
Market Structure 181 170 123 24 503
Task Allocation 214 64 72 33 388
Skill Acquisition 174 62 62 17 315
Innovation Output 204 27 45 18 295
Employment Level 105 54 108 13 282
Fiscal & Macroeconomic 132 69 43 26 277
Consumer Welfare 117 63 42 11 233
Firm Revenue 154 48 26 3 231
Task Completion Time 173 31 8 12 225
Inequality Measures 44 123 50 6 223
Worker Satisfaction 89 65 22 12 188
Error Rate 71 92 10 2 175
Regulatory Compliance 77 69 14 5 165
Automation Exposure 58 56 26 13 156
Training Effectiveness 96 21 14 19 152
Wages & Compensation 77 37 25 6 145
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 81 21 1 115
Hiring & Recruitment 52 7 8 3 70
Creative Output 32 20 8 3 64
Skill Obsolescence 5 47 6 1 59
Social Protection 28 16 8 2 54
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
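
The matrix can be read as direction shares per outcome. The sketch below copies three rows from the table and computes each direction's share. It assumes the denominator should be the sum of the four direction counts rather than the listed Total, because for some rows the two differ (Firm Productivity lists 610 while its four counts sum to 604); the page does not explain that gap. The names `ROWS`, `direction_shares`, and `total_gap` are illustrative only.

```python
# Direction counts copied from three rows of the Evidence Matrix.
# Tuple order: (positive, negative, mixed, null, listed_total).
ROWS = {
    "Firm Productivity": (439, 57, 88, 20, 610),
    "Inequality Measures": (44, 123, 50, 6, 223),
    "Job Displacement": (12, 81, 21, 1, 115),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Share of each direction among the four listed direction counts.

    Assumption: the denominator is the sum of the four counts,
    not the listed total, which is sometimes larger.
    """
    by_direction = counts[:4]
    summed = sum(by_direction)
    return {name: value / summed for name, value in zip(DIRECTIONS, by_direction)}


def total_gap(counts):
    """Listed total minus the sum of the four direction counts."""
    return counts[4] - sum(counts[:4])


for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    print(
        f"{outcome}: positive {shares['positive']:.1%}, "
        f"negative {shares['negative']:.1%}, unexplained gap {total_gap(counts)}"
    )
```

Rows with a nonzero gap are worth reading with care, since some claims in the listed total are not attributed to any of the four directions shown.
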
Current research has largely focused on short-horizon tasks over a limited set of software with limited economic value (e.g., basic e-commerce and OS-configuration tasks).
Narrative literature/field observation reported in paper introduction (no numeric study reported in excerpt).
high negative Gym-Anything: Turn any Software into an Agent Environment scope and horizon of existing research tasks
There is a fundamental gap in current agent capabilities: functional correctness alone is insufficient for design-aware issue resolution, motivating design-aware evaluation beyond functional correctness.
Synthesis of experimental findings: low design-satisfaction despite functional correctness, prevalence of design violations, and only partial improvement from guidance support the conclusion.
high negative Does Pass Rate Tell the Whole Story? Evaluating Design Const... agent capability for design-aware issue resolution
Design violations are widespread in agent-produced patches.
Empirical results from experiments on the benchmark showing many patches violate validated design constraints; backed by counts/percentages in evaluation (as summarized in abstract).
high negative Does Pass Rate Tell the Whole Story? Evaluating Design Const... number/occurrence of design violations
Test-based correctness substantially overestimates patch quality: fewer than half of resolved issues are fully design-satisfying.
Experimental evaluation with state-of-the-art LLM-based agents on the benchmark (reported in paper). Sample implicit: benchmark issues (495) used to evaluate agents; comparison between test pass rates and design-satisfaction measured by verifier.
high negative Does Pass Rate Tell the Whole Story? Evaluating Design Const... design-satisfaction of patches (design compliance)
Despite growing investment in data analytics, the decision-making and coordination layers of these workflows remain predominantly manual, reactive, and fragmented across outlets, distribution centers, and supplier networks.
Stated as an observation in the paper (abstract); no quantitative evidence, metrics, or comparative analysis provided in the excerpt.
high negative Flowr -- Scaling Up Retail Supply Chain Operations Through A... degree of manual decision-making and coordination (fragmentation/reactivity)
Retail supply chain operations in supermarket chains involve continuous, high-volume manual workflows spanning demand forecasting, procurement, supplier coordination, and inventory replenishment.
Descriptive claim stated in the paper's introduction/abstract; no empirical data, sample, or methods reported to substantiate this characterization within the text provided.
high negative Flowr -- Scaling Up Retail Supply Chain Operations Through A... degree of manual operations / automation exposure
We identify a temporal constraint: the window during which semiconductor manufacturing concentration makes hardware-level governance implementable is narrowing, while R&D timelines for critical mechanisms span years.
Authors' temporal analysis combining industry structure observations (semiconductor manufacturing concentration) with estimated R&D timelines for mechanisms (qualitative/engineering timeline estimates). No empirical time-series sample size provided.
high negative Hardware-Level Governance of AI Compute: A Feasibility Taxon... temporal feasibility window for hardware-level governance
We assess principal threats to compute-based governance, including algorithmic efficiency gains, distributed training methods, and sovereignty concerns.
Authors' threat analysis (qualitative assessment of technical and geopolitical threat vectors). No quantitative sample size; based on literature and engineering reasoning.
high negative Hardware-Level Governance of AI Compute: A Feasibility Taxon... threats to feasibility and effectiveness of compute-based governance
Our analysis reveals a structural mismatch: the mechanisms most needed for treaty verification, including on-chip compute metering, cryptographic proof-of-training, and hardware-embedded enforcement, are also the least mature.
Authors' feasibility assessments of mechanisms (qualitative/engineering evaluation across the taxonomy); identification of critical mechanisms for treaty verification and corresponding feasibility ratings. No empirical trial or sample size reported.
high negative Hardware-Level Governance of AI Compute: A Feasibility Taxon... maturity/feasibility of treaty-relevant hardware mechanisms
The governance of frontier AI increasingly relies on controlling access to computational resources, yet the hardware-level mechanisms invoked by policy proposals remain largely unexamined from an engineering perspective.
Authors' framing and literature review presented in the paper (conceptual/qualitative argument; no empirical sample size reported).
high negative Hardware-Level Governance of AI Compute: A Feasibility Taxon... hardware-level governance examination / policy-technical gap
The review identifies persistent gaps in population coverage, multimodal integration, equity optimization, explainability, validation, and governance that constrain inclusiveness and robustness of GeoAI applications in urban mobility research.
Authors' gap analysis based on the contents and limitations of the 18 included studies.
high negative GeoAI and Multimodal Geospatial Data Fusion for Inclusive Ur... coverage and robustness limitations in multimodal GeoAI research (population cov...
Urban mobility is a central challenge for sustainable and inclusive cities, as climate change, congestion, and spatial inequality increasingly reveal mobility patterns as expressions of deeper social and spatial structures.
Introductory framing statement in the paper; general literature/contextual claim (no original empirical test reported in this paper).
high negative GeoAI and Multimodal Geospatial Data Fusion for Inclusive Ur... centrality of urban mobility as a challenge for sustainability and inclusivity
In an additive model where human utility and fitness differ, if deception increases fitness beyond genuine utility then evolution will select for deception.
Mathematical analysis of an additive model in the paper showing selection pressure favors traits (deception) that increase the fitness function even when they reduce true human utility (theoretical derivation).
high negative A mathematical theory of evolution for self-designing AIs selection for deception trait versus genuine utility alignment
The two margins interact through a self-undermining feedback that can generate low-archive traps (multiple equilibria with low accumulated public archive).
Dynamic equilibrium analysis in the theoretical model showing interacting feedbacks and possible trap equilibria (model-derived result).
high negative When AI Improves Answers but Slows Knowledge Creation: Match... accumulated archive size / equilibrium archive level
Resolution margin: the probability that posted queries are resolved declines because AI raises contributors' outside options, thinning the contributor pool and creating congestion on the platform.
Mechanism and comparative-static implication produced by the paper's theoretical model; no empirical sample provided in the excerpt.
high negative When AI Improves Answers but Slows Knowledge Creation: Match... probability that posted queries are resolved (conditional resolution rate)
Flow margin: the posted volume of knowledge-enhancing queries declines as AI resolves more problems privately before they reach the platform.
Mechanism derived in the theoretical model; stated as the flow-margin channel (no empirical quantification in the provided text).
high negative When AI Improves Answers but Slows Knowledge Creation: Match... posted volume of knowledge-enhancing queries
AI reduces archive creation through two distinct margins: a flow margin and a resolution margin.
Analytical decomposition derived within the paper's theoretical model (mechanism claimed by the model).
high negative When AI Improves Answers but Slows Knowledge Creation: Match... archive creation (rate and quality of accumulated solutions)
Generative AI resolves user problems without leaving a public trace, so fewer discussions and solutions reach public platforms.
Stated as an empirical motivation in the paper; no empirical sample or quantified measurement reported in the provided text.
high negative When AI Improves Answers but Slows Knowledge Creation: Match... volume of public posts / archival content
The literature remains fragmented, with limited integrative frameworks to explain how AI-human dynamics and decision-making typologies shape outcomes.
Conclusion drawn from the systematic review and bibliometric analysis of the 627-article corpus as reported in the abstract.
high negative Advancing Decision-Making through AI-Human Collaboration: A ... degree of integration/coherence of the academic literature; presence of integrat...
Green AI research has largely measured the footprint of models rather than the downstream workflows in which GenAI is a tool.
Literature review / mapping of recent Green AI literature reported in the paper; descriptive claim about the focus of the field (no sample size or numerical counts reported in the abstract).
high negative On the Carbon Footprint of Economic Research in the Age of G... scope/emphasis of Green AI research (model-level vs. workflow-level measurement)
These findings highlight how existing caste hierarchies are reproduced in LLM decision-making and underscore the need for culturally grounded evaluation and intervention strategies in AI systems deployed in socially sensitive domains.
Interpretation and policy recommendation based on empirical patterns found in the audit (consistent hierarchical ratings and up-to-25% differences).
high negative Sima AIunty: Caste Audit in LLM-Driven Matchmaking risk of reinforcing historical exclusion through LLM decision-making
Inter-caste matches are further ordered according to traditional caste hierarchy.
Reported analytic pattern where inter-caste match ratings follow the traditional caste ranking (implied ordering across Brahmin, Kshatriya, Vaishya, Shudra, Dalit).
high negative Sima AIunty: Caste Audit in LLM-Driven Matchmaking ordinal rating/order of inter-caste matches by caste
Existing benchmarks differ from real usage in programming language distribution, prompt style and codebase structure.
Paper asserts mismatch between existing benchmarks and production usage as motivation for producing a production-derived benchmark (stated differences: language distribution, prompt style, codebase structure).
high negative ProdCodeBench: A Production-Derived Benchmark for Evaluating... representativeness of benchmarks relative to real usage
Within robotics subsectors, system integration delivers earlier and stronger carbon-reduction effects than ontology manufacturing.
Subsector analysis in the panel data (277 prefecture-level cities, 2008–2019) comparing effects of system integration versus ontology manufacturing on urban carbon emissions.
high negative Exploring the nonlinear relationship between robotics manufa... urban carbon emissions (subsector-differentiated effects)
The carbon-mitigation effects of robotics manufacturing are more pronounced in the central region of China than in the eastern region, indicating a latecomer advantage in green industrialization.
Heterogeneity analysis across geographic regions (central vs eastern regions) using the same panel of 277 prefecture-level cities (2008–2019).
high negative Exploring the nonlinear relationship between robotics manufa... urban carbon emissions (heterogeneous effect by region)
A stage-dependent sequential mechanism operates: mature robotics manufacturing promotes robot adoption, which improves urban energy efficiency, and ultimately reduces carbon emissions; this channel is inactive at early stages of industry development.
Mechanism/mediation analysis using the panel data of 277 prefecture-level cities (2008–2019), presented as sequential pathway evidence in the paper.
high negative Exploring the nonlinear relationship between robotics manufa... robot adoption; urban energy efficiency; urban carbon emissions
Once robotics manufacturing reaches a moderate scale, further expansion leads to declines in urban carbon emissions.
Same panel dataset (277 prefecture-level cities, 2008–2019); econometric identification of the right-hand (declining) portion of the inverted U-shaped curve.
Replacing deterministic components with probabilistic workflows changes the failure mode: LLM pipelines may generate plausible but incorrect outputs that pass superficial checks and propagate into irreversible actions such as DOI minting and public release.
Conceptual argument supported by the paper's incident descriptions (e.g., a detected coordinate transformation error); the statement is presented as a general risk rationale.
high negative Exploring Robust Multi-Agent Workflows for Environmental Dat... propensity for plausible-but-incorrect outputs to bypass checks and propagate to...
Up to 25% of routine administrative tasks face high automation risk.
Quantitative survey of 150 leading Nigerian firms across finance, tech, and manufacturing reporting the share of tasks at high automation risk.
high negative Human Capital and the AI-Powered Future of Work: (Training, ... share of routine administrative tasks at high automation risk
There is a significant deficit in high-demand technical competencies such as data engineering, machine learning maintenance, and AI ethics within the Nigerian workforce.
Findings reported from the quantitative survey of 150 leading Nigerian firms (finance, tech, manufacturing) supplemented by qualitative workforce interviews and policy analysis.
high negative Human Capital and the AI-Powered Future of Work: (Training, ... availability/deficit of technical competencies (data engineering, ML maintenance...
The remaining 26 barriers are carried over from prior digital transformation waves — 22 in amplified form and 4 unchanged.
Comparative coding/classification within the review corpus indicating whether each barrier is novel or carried over, and whether it is amplified versus unchanged.
high negative BARRIERS TO AGENTIC AI ENTERPRISE TRANSFORMATION novelty_vs_carried_over_of_barriers
Three barriers were identified as agentic-specific: error propagation in multi-agent systems, role ambiguity, and accountability diffusion.
Classification of the 29 coded barriers by 'agentic specificity' within the literature review; these three barriers were labeled agentic-specific by the authors.
high negative BARRIERS TO AGENTIC AI ENTERPRISE TRANSFORMATION agentic_specific_barriers
Acemoglu and Restrepo (2022) attribute 50–70% of the increase in US wage inequality between 1980 and 2016 to displacement of workers from tasks by automation.
Citation to Acemoglu and Restrepo (2022) empirical decomposition reported in the paper.
high negative Steering Technological Progress contribution of automation-driven displacement to wage inequality growth
Dechezleprêtre et al. (2025), exploiting Germany's Hartz reforms, estimate an elasticity of automation innovation to low-skill wages of 2–5 at the firm level.
Citation to Dechezleprêtre et al. (2025) empirical estimate reported in the literature review.
high negative Steering Technological Progress elasticity of automation innovation with respect to low-skill wages
When employers have monopsony power, they choose technologies that expand this power beyond what a social planner would consider optimal.
Model results and discussion in Section 7 on interaction of technological choices and monopsony power.
high negative Steering Technological Progress extent of monopsony-enhancing technology adoption
Profit-maximizing firms pursue innovations that erode workers' market power (make them more replaceable), even at the expense of production efficiency; a social planner would instead prefer technologies that preserve workers' market power.
Theoretical analysis in the paper of firms' profit-maximizing technology choices under market power considerations, plus comparative planner outcome.
high negative Steering Technological Progress technology choice with respect to workers' replaceability
A welfare-maximizing planner chooses to automate fewer tasks than a production-efficiency benchmark would dictate when workers' welfare is heavily weighted.
Model analysis of optimal task automation vs. production efficiency under different welfare weights on workers.
high negative Steering Technological Progress level of task automation
Occupations whose AI-exposed steps are more dispersed across the production workflow (higher fragmentation) exhibit a substantially lower share of their steps actually executed by AI, conditional on AI exposure share.
Empirical regression analysis controlling for share of AI-exposed steps; uses dataset linking O*NET tasks, human AI exposure assessments, Anthropic Economic Index execution outcomes, and GPT-generated workflow orderings (details in Sections 5.1 and 7).
high negative Chaining Tasks, Redefining Work: A Theory of AI Automation share (fraction) of steps executed by AI at the occupation/job level
Treated firms' demand for external capital investment falls by just over $220,000 relative to the control group.
RCT with 515 firms; reported dollar-change in external investment demand between treated and control firms.
high negative Mapping AI into Production: A Field Experiment on Firm Perfo... change in external capital investment demand (USD)
Despite faster growth, treated firms do not scale inputs proportionally: their demand for external capital investment falls by 39.5% relative to the control group.
RCT with 515 firms; firms reported external capital demand/investment requests; comparison of investment demand between treatment and control groups.
high negative Mapping AI into Production: A Field Experiment on Firm Perfo... demand for external capital investment
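
The two entries above report what appears to be the same effect in dollars and in percent, which allows a rough back-of-envelope check of the implied control-group level. The sketch assumes both figures describe the same comparison, that the percentage is taken relative to the control-group mean, and that "just over $220,000" can be approximated as exactly 220,000. None of the derived values are reported in the entries themselves.

```python
# Figures as stated in the two entries above.
# Assumption: "just over $220,000" is approximated as exactly 220,000.
drop_usd = 220_000
drop_share = 0.395  # 39.5% fall relative to the control group

# If drop_usd = drop_share * control_demand, the control-group level follows directly.
implied_control_demand = drop_usd / drop_share
implied_treated_demand = implied_control_demand - drop_usd

print(f"Implied control-group demand: ${implied_control_demand:,.0f}")
print(f"Implied treated-group demand: ${implied_treated_demand:,.0f}")
```

Because the dollar figure is a lower approximation, the implied levels should be read as approximate orders of magnitude rather than estimates from the paper.
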
For the private business sector, if the set of automated tasks were frozen in 1950, 87% of TFP growth between 1950 and 2023 would have been eliminated.
Counterfactual growth-accounting exercise that freezes the set of automated tasks at 1950 while allowing capital, labor, and other productivity growth to follow historical rates (simulation based on calibrated accounting).
high negative Past Automation and Future A.I.: How Weak Links Tame the Gro... fraction of historical TFP growth eliminated by freezing automation
The sum of "other" TFP growth and average labor productivity growth ($\hat{Z}_t + \hat{\psi}_{\ell t}$) is small, for example equal to -0.1% per year for the private business sector since 1950.
Growth-accounting decomposition for the private business sector since 1950 using BEA/BLS data in the task-based framework.
high negative Past Automation and Future A.I.: How Weak Links Tame the Gro... combined growth rate of other TFP and average labor productivity ($\hat{Z}_t + \hat{\psi}_{\ell t}$)
Under the rapid scenario, economists forecast the share of wealth held by the wealthiest 10% of households rising to 80.0% by 2050.
Conditional forecasts in Key Findings for the economist respondent group under the rapid AI scenario (2050 horizon).
high negative Forecasting the Economic Effects of AI fraction of wealth held by top 10% of households by 2050 (rapid scenario)
Conditional on the rapid scenario, economists forecast the labor force participation rate falling from its current level of 62% to 55% by 2050.
Conditional forecasts in Key Findings for the economist respondent group under the rapid AI scenario (2050 horizon).
high negative Forecasting the Economic Effects of AI labor force participation rate (LFPR) by 2050 under rapid scenario
There are macroeconomic risks associated with AI-led unemployment.
Paper's macroeconomic analysis drawing on labor economics and technology adoption research; no quantitative estimates or sample sizes provided in the summary.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... macroeconomic risk indicators (e.g., unemployment, aggregate demand shortfalls)
Managerial incentives drive premature workforce contraction during AI adoption.
Analytical claim grounded in labor economics and organizational behavior review; the summary indicates examination of managerial incentives but does not report primary empirical tests or sample sizes.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... timing and extent of workforce contraction
Premature workforce contraction in response to AI adoption foreshadows deeper structural challenges as AI systems mature.
Forward-looking claim based on synthesis of literature and theoretical projection; no empirical quantification or sample provided in the summary.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... long-run structural economic challenges (e.g., systemic instability, labor marke...
This pattern of premature workforce reductions reflects longstanding corporate short-termism rather than genuine technological displacement.
The paper's interpretation drawing on labor economics and organizational behavior literature; no empirical study or sample size reported in the summary.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... drivers of workforce reduction (managerial incentives vs. actual automation capa...
Organizations face mounting pressure to demonstrate immediate returns on AI investments, often through workforce reductions that outpace actual automation capabilities.
Argument in paper citing accelerating AI adoption across sectors and observed managerial responses; no primary dataset or sample size reported in the text.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... workforce reductions / layoffs
In the limiting case of full automation, the model predicts that optimal recombination distance collapses to zero, suggesting that fully AI-driven research would undermine the very knowledge creation that it seeks to accelerate.
Limiting-case analytical result of the model: as the share of AI-automated tasks approaches 1 (full automation), the derived optimal recombination distance converges to zero.
high negative Bridging Distant Ideas: the Impact of AI on R&D and Recombin... optimal recombination distance (approaches zero under full automation)