The Commonplace

Evidence (6507 claims)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
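
As a rough reading aid for the matrix above, the following Python sketch computes the share of positive findings among the four listed directions for a few rows copied from the table. The row selection and the helper name `positive_share` are illustrative choices, not part of the source. The listed Total can exceed the sum of the four direction counts (for "Other", 1581 versus 1615); the sketch assumes, without confirmation from the page, that the remainder are claims with no listed direction.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
rows = {
    "Other": (609, 159, 77, 736, 1615),
    "Firm Productivity": (385, 46, 85, 17, 539),
    "Job Displacement": (11, 71, 16, 1, 99),
}


def positive_share(counts):
    """Share of positive findings among the four listed directions (hypothetical helper)."""
    positive, negative, mixed, null, _total = counts
    return positive / (positive + negative + mixed + null)


for outcome, counts in rows.items():
    listed = sum(counts[:4])
    unlisted = counts[4] - listed  # assumption: claims with no listed direction
    print(f"{outcome}: {positive_share(counts):.1%} positive, {unlisted} unlisted")
```

Dividing by the sum of the listed directions rather than by Total keeps the share interpretable even when the two differ.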
Filter: Productivity
Retail supply chain operations in supermarket chains involve continuous, high-volume manual workflows spanning demand forecasting, procurement, supplier coordination, and inventory replenishment.
Descriptive claim stated in the paper's introduction/abstract; no empirical data, sample, or methods reported to substantiate this characterization within the text provided.
high negative Flowr -- Scaling Up Retail Supply Chain Operations Through A... degree of manual operations / automation exposure
The two margins interact through a self-undermining feedback that can generate low-archive traps (multiple equilibria with low accumulated public archive).
Dynamic equilibrium analysis in the theoretical model showing interacting feedbacks and possible trap equilibria (model-derived result).
high negative When AI Improves Answers but Slows Knowledge Creation: Match... accumulated archive size / equilibrium archive level
Resolution margin: the probability that posted queries are resolved declines because AI raises contributors' outside options, thinning the contributor pool and creating congestion on the platform.
Mechanism and comparative-static implication produced by the paper's theoretical model; no empirical sample provided in the excerpt.
high negative When AI Improves Answers but Slows Knowledge Creation: Match... probability that posted queries are resolved (conditional resolution rate)
Flow margin: the posted volume of knowledge-enhancing queries declines as AI resolves more problems privately before they reach the platform.
Mechanism derived in the theoretical model; stated as the flow-margin channel (no empirical quantification in the provided text).
high negative When AI Improves Answers but Slows Knowledge Creation: Match... posted volume of knowledge-enhancing queries
AI reduces archive creation through two distinct margins: a flow margin and a resolution margin.
Analytical decomposition derived within the paper's theoretical model (mechanism claimed by the model).
high negative When AI Improves Answers but Slows Knowledge Creation: Match... archive creation (rate and quality of accumulated solutions)
Generative AI resolves user problems without leaving a public trace, so fewer discussions and solutions reach public platforms.
Stated as an empirical motivation in the paper; no empirical sample or quantified measurement reported in the provided text.
high negative When AI Improves Answers but Slows Knowledge Creation: Match... volume of public posts / archival content
Green AI research has largely measured the footprint of models rather than the downstream workflows in which GenAI is a tool.
Literature review / mapping of recent Green AI literature reported in the paper; descriptive claim about the focus of the field (no sample size or numerical counts reported in the abstract).
high negative On the Carbon Footprint of Economic Research in the Age of G... scope/emphasis of Green AI research (model-level vs. workflow-level measurement)
Existing benchmarks differ from real usage in programming language distribution, prompt style and codebase structure.
Paper asserts mismatch between existing benchmarks and production usage as motivation for producing a production-derived benchmark (stated differences: language distribution, prompt style, codebase structure).
high negative ProdCodeBench: A Production-Derived Benchmark for Evaluating... representativeness of benchmarks relative to real usage
Replacing deterministic components with probabilistic workflows changes the failure mode: LLM pipelines may generate plausible but incorrect outputs that pass superficial checks and propagate into irreversible actions such as DOI minting and public release.
Conceptual argument supported by the paper's incident descriptions (e.g., a detected coordinate transformation error); the statement is presented as a general risk rationale.
high negative Exploring Robust Multi-Agent Workflows for Environmental Dat... propensity for plausible-but-incorrect outputs to bypass checks and propagate to...
Occupations whose AI-exposed steps are more dispersed across the production workflow (higher fragmentation) exhibit a substantially lower share of their steps actually executed by AI, conditional on AI exposure share.
Empirical regression analysis controlling for share of AI-exposed steps; uses dataset linking O*NET tasks, human AI exposure assessments, Anthropic Economic Index execution outcomes, and GPT-generated workflow orderings (details in Sections 5.1 and 7).
high negative Chaining Tasks, Redefining Work: A Theory of AI Automation share (fraction) of steps executed by AI at the occupation/job level
Treated firms' demand for external capital investment falls by just over $220,000 relative to the control group.
RCT with 515 firms; reported dollar-change in external investment demand between treated and control firms.
high negative Mapping AI into Production: A Field Experiment on Firm Perfo... change in external capital investment demand (USD)
Despite faster growth, treated firms do not scale inputs proportionally: their demand for external capital investment falls by 39.5% relative to the control group.
RCT with 515 firms; firms reported external capital demand/investment requests; comparison of investment demand between treatment and control groups.
high negative Mapping AI into Production: A Field Experiment on Firm Perfo... demand for external capital investment
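
The two figures reported for external capital demand (a fall of just over $220,000 and a fall of 39.5%) can be cross-checked with simple arithmetic. The sketch below assumes both numbers describe the same treatment effect measured against the control-group mean, which the excerpt does not state explicitly; the implied baseline is therefore an inference, not a reported value.

```python
# Figures from the two claims above; "just over $220,000" is treated as 220,000.
absolute_fall_usd = 220_000
relative_fall = 0.395

# Assumption: relative_fall = absolute_fall_usd / control_mean_usd.
implied_control_mean_usd = absolute_fall_usd / relative_fall
implied_treated_mean_usd = implied_control_mean_usd - absolute_fall_usd

print(f"implied control mean: ${implied_control_mean_usd:,.0f}")
print(f"implied treated mean: ${implied_treated_mean_usd:,.0f}")
```

Because the dollar figure is described as "just over" $220,000, the implied means are approximate lower bounds under this assumption.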
For the private business sector, if the set of automated tasks were frozen in 1950, 87% of TFP growth between 1950 and 2023 would have been eliminated.
Counterfactual growth-accounting exercise that freezes the set of automated tasks at 1950 while allowing capital, labor, and other productivity growth to follow historical rates (simulation based on calibrated accounting).
high negative Past Automation and Future A.I.: How Weak Links Tame the Gro... fraction of historical TFP growth eliminated by freezing automation
The sum of "other" TFP growth and average labor productivity growth ($\hat{Z}_t + \hat{\psi}_{\ell t}$) is small; for example, it equals -0.1% per year for the private business sector since 1950.
Growth-accounting decomposition for the private business sector since 1950 using BEA/BLS data in the task-based framework.
high negative Past Automation and Future A.I.: How Weak Links Tame the Gro... combined growth rate of other TFP and average labor productivity ($\hat{Z}_t + \hat{\psi}_{\ell t}$)
Under the rapid scenario, economists forecast the share of wealth held by the wealthiest 10% of households rising to 80.0% by 2050.
Conditional forecasts in Key Findings for the economist respondent group under the rapid AI scenario (2050 horizon).
high negative Forecasting the Economic Effects of AI fraction of wealth held by top 10% of households by 2050 (rapid scenario)
Conditional on the rapid scenario, economists forecast the labor force participation rate falling from its current level of 62% to 55% by 2050.
Conditional forecasts in Key Findings for the economist respondent group under the rapid AI scenario (2050 horizon).
high negative Forecasting the Economic Effects of AI labor force participation rate (LFPR) by 2050 under rapid scenario
There are macroeconomic risks associated with AI-led unemployment.
Paper's macroeconomic analysis drawing on labor economics and technology adoption research; no quantitative estimates or sample sizes provided in the summary.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... macroeconomic risk indicators (e.g., unemployment, aggregate demand shortfalls)
Managerial incentives drive premature workforce contraction during AI adoption.
Analytical claim grounded in labor economics and organizational behavior review; the summary indicates examination of managerial incentives but does not report primary empirical tests or sample sizes.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... timing and extent of workforce contraction
Premature workforce contraction in response to AI adoption foreshadows deeper structural challenges as AI systems mature.
Forward-looking claim based on synthesis of literature and theoretical projection; no empirical quantification or sample provided in the summary.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... long-run structural economic challenges (e.g., systemic instability, labor marke...
This pattern of premature workforce reductions reflects longstanding corporate short-termism rather than genuine technological displacement.
The paper's interpretation drawing on labor economics and organizational behavior literature; no empirical study or sample size reported in the summary.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... drivers of workforce reduction (managerial incentives vs. actual automation capa...
Organizations face mounting pressure to demonstrate immediate returns on AI investments, often through workforce reductions that outpace actual automation capabilities.
Argument in paper citing accelerating AI adoption across sectors and observed managerial responses; no primary dataset or sample size reported in the text.
high negative A Shorter Workweek as Economic Infrastructure: Managing AI-D... workforce reductions / layoffs
Applying the Auditor-Corrector methodology to ELT-Bench uncovers that most failed transformation tasks contain benchmark-attributable errors — including rigid evaluation scripts, ambiguous specifications, and incorrect ground truth — that penalize correct agent outputs.
Audit results on ELT-Bench identifying categories of benchmark errors (rigid scripts, ambiguous specs, incorrect ground truth) and attributing many failed transformation tasks to these errors; no numeric breakdown or sample count given in the excerpt.
high negative ELT-Bench-Verified: Benchmark Quality Issues Underestimate A... proportion of failed transformation tasks attributable to benchmark errors (qual...
On ELT-Bench, the first benchmark for end-to-end ELT pipeline construction, AI agents initially showed low success rates, suggesting they lacked practical utility.
Reference to initial evaluation results on ELT-Bench showing low success rates for AI agents; the provided excerpt does not give numerical success rates or sample size.
high negative ELT-Bench-Verified: Benchmark Quality Issues Underestimate A... agent success rate on ELT-Bench (agent capability / practical utility)
The way we're thinking about generative AI right now is fundamentally individual (this appears in how users interact with models, how models are built, how they're benchmarked, and how commercial and research strategies using AI are defined).
Author's observational/descriptive claim supported by argumentative examples (mentions user interaction patterns, model design and benchmarking practices, and commercial/research strategies); no empirical sample or quantitative analysis reported in the excerpt.
high negative The Future of AI is Many, Not One conceptual framing and practices around generative AI (individual-focused design...
Traditional questionnaires yielded slightly higher accuracy in risk assessment.
Result reported from the two experiments comparing traditional questionnaires to adaptive ARQuest versions; no numeric accuracy or sample size provided in the excerpt.
Insurers must blindly trust users' responses, increasing the chances of fraud.
Stated as a motivating problem in the paper; presented as a logical/empirical concern rather than supported by a reported study within the paper.
high negative AI in Insurance: Adaptive Questionnaires for Improved Risk P... fraud risk from self-reported responses
Insurance application processes often rely on lengthy and standardized questionnaires that struggle to capture individual differences.
Descriptive claim in paper introduction arguing limitations of standard questionnaires; no experiment or sample size reported for this assertion.
high negative AI in Insurance: Adaptive Questionnaires for Improved Risk P... ability of standardized questionnaires to capture individual differences
A stylised inpatient capacity signalling example and minimal game-theoretic reasoning indicate that task optimisation alone is unlikely to change system outcomes when incentives are unchanged.
Theoretical analysis using a stylised inpatient capacity signalling example and game-theoretic reasoning presented in the paper (no empirical data/sample reported in the abstract).
high negative Incentives, Equilibria, and the Limits of Healthcare AI: A G... system-level outcomes in healthcare (response to task optimisation interventions...
Deploying AI systems carries significant costs, including ongoing monitoring costs, and it is unclear whether optimism about a deus ex machina solution is well placed.
Conceptual/argumentative claim made by the authors in the paper (no empirical study or sample size reported in the abstract).
high negative Incentives, Equilibria, and the Limits of Healthcare AI: A G... costs and uncertainty associated with AI deployment (including monitoring costs)
Improvements in operational resilience (OR) effectively reduce corporate operational risk.
Further analysis reported in the paper linking higher OR to lower operational risk measures for firms in the sample.
high negative Does Artificial Intelligence Improve the Operational Resilie... corporate operational risk (reduction)
AI promotes operational resilience by reducing management agency conflicts.
Mechanism (mediation) tests reported in the paper showing AI associated with reductions in measures of agency/management conflict, which in turn relate to OR improvements.
high negative Does Artificial Intelligence Improve the Operational Resilie... management agency conflicts (reduction)
No regulatory framework requires disclosure of machine/AI labor output.
Author's assertion in the paper (policy claim; no legislative survey or quantification reported).
high negative HEWU: A Standardized Framework for Measuring Machine-Generat... presence of regulatory disclosure requirements for machine labor
No index tracks machine labor output over time.
Author's assertion in the paper (stated lack of existing indices; no systematic review/sample reported).
high negative HEWU: A Standardized Framework for Measuring Machine-Generat... existence of time-series index for machine labor output
This labor force is entirely invisible to the economic infrastructure humanity has built to measure work: no standardized unit of measurement exists.
Author's assertion/diagnosis in the paper (argumentative/observational, no empirical survey or sample reported).
high negative HEWU: A Standardized Framework for Measuring Machine-Generat... existence of standardized unit for machine labor
Agent contributions are associated with more churn over time than human-authored code.
Longitudinal comparison between agent-generated and human-authored contributions reported in the paper (churn/survival estimates described; association between agent contributions and higher churn asserted).
high negative Investigating Autonomous Agent Contributions in the Wild: Ac... code churn rate over time (agent-generated vs human-authored)
Unbalanced or poorly governed adoption of Big Data and AI contributes to increased systemic risk, cybersecurity vulnerability, regulatory fragmentation and third-party dependence on BigTech platforms.
Argument based on qualitative literature review and synthesis of international empirical studies and comparative sector analysis; no single-sample empirical study in this paper.
high negative Implications of Big Data Technologies for the Resilience of ... systemic risk; cybersecurity vulnerability; regulatory fragmentation; third-part...
Extreme automation (high AI intensity) causes employment decline.
Part of the U-shaped relationship reported by the paper's empirical results; described qualitatively in the abstract/summary.
high negative Impact Of Artificial Intelligence (AI) On Employment employment decline
Task orchestration is the most under-researched dimension among the five workplace-design components.
Finding from the PRISMA-guided systematic review of 120 papers, which mapped coverage across the five dimensions and identified task orchestration as having the least research attention.
high negative From Automation to Augmentation: A Framework for Designing H... volume/coverage of research on task orchestration
Decision authority allocation emerges as the binding constraint for Society 5.0 transitions.
Result synthesized from the systematic review and theoretical analysis mapping the five workplace-design dimensions; stated as the binding constraint in the paper's findings.
high negative From Automation to Augmentation: A Framework for Designing H... constraint on transitions to human-centric (Society 5.0) technology integration
The literature shows persistent gaps in empirical validation, standardized evaluation methods, and sector-specific comparative analyses of agentic AI in financial services.
Review-level assessment noting limited empirical studies, heterogeneous evaluation metrics, and few direct cross-sector comparisons up to mid-2024.
high negative A Comparative & Systematic Review of Literature on the I... availability/quality of empirical validation and evaluation standards
Significant implementation barriers persist, notably workforce transformation challenges, legacy system integration difficulties, and trust deficits.
Thematic synthesis across empirical and conceptual papers in the review reporting implementation barriers and change management issues.
high negative A Comparative & Systematic Review of Literature on the I... implementation barriers (workforce, legacy systems, trust)
Ethical concerns—including bias, lack of transparency, and regulatory compliance risks—remain critical for agentic AI in financial services and necessitate layered governance and human-AI collaboration.
Collation of ethical, legal, and governance issues reported across the reviewed multidisciplinary studies and normative discussions.
high negative A Comparative & Systematic Review of Literature on the I... prevalence/severity of ethical and regulatory risks and governance needs
Insurance is comparatively underrepresented in the literature and in reported agentic AI deployments compared with banking and investment.
Review finding (counts/themes across included studies indicating fewer studies/applications in insurance relative to banking and investment).
high negative A Comparative & Systematic Review of Literature on the I... relative representation/adoption across financial subsectors
A weak manager directing a weak worker achieves a 42% success rate, performing worse than the weak agent alone which achieves 44%.
Empirical comparison across the same 200 SWE-bench Lite instances and pipeline configurations, comparing weak-manager+weak-worker pipeline to weak single-agent baseline.
high negative Can AI Models Direct Each Other? Organizational Structure as... task success rate (percentage of tasks solved)
Task complexity shapes substitution: low-complexity tasks see high substitution, while high-complexity tasks favor limited partial automation.
Calibration of the model to O*NET tasks + expert survey + GPT-4o decompositions; implementation results reported for computer vision showing substitution varies with task complexity.
high negative Economics of Human and AI Collaboration: When is Partial Aut... degree of labor substitution as a function of task complexity
AI systems exhibit predictable but diminishing returns to data, compute, and model size (scaling-law experiments), implying the cost of higher accuracy is convex: good performance may be inexpensive, but near-perfect accuracy is disproportionately costly.
Scaling-law experiments estimating performance as a function of data, compute, and model size; described experimental estimation of production function.
high negative Economics of Human and AI Collaboration: When is Partial Aut... marginal returns to inputs (data, compute, model size) and marginal cost of accu...
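
To see why diminishing returns imply a convex cost of accuracy, consider a minimal sketch that assumes a power-law relation between error and a single input (compute), error = a * compute^(-b). This functional form is one common way to write scaling laws; the constants `a` and `b` below are hypothetical and are not taken from the paper.

```python
def compute_needed(target_error: float, a: float = 1.0, b: float = 0.1) -> float:
    """Invert error = a * compute**(-b) to get the compute needed for a target error.

    a and b are hypothetical constants chosen for illustration only.
    """
    return (a / target_error) ** (1.0 / b)


# Under this form, each halving of the error multiplies required compute by 2**(1/b).
for err in (0.4, 0.2, 0.1, 0.05):
    print(f"error {err:.2f}: compute {compute_needed(err):.3e}")
```

With the illustrative exponent b = 0.1, every halving of the error costs about a thousand times more compute, which is the sense in which good performance can be inexpensive while near-perfect accuracy is disproportionately costly.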
The common claim that generative AI simply amplifies the Dunning–Kruger effect is too coarse to capture the available evidence.
Paper's synthesis of heterogeneous empirical findings from human–AI interaction, learning research, and model evaluation used to critique the uniform-amplification interpretation; no single empirical countertest reported.
high negative Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... validity of the 'amplified Dunning–Kruger' interpretation
LLM use degrades metacognitive accuracy and flattens the classic competence–confidence gradient across skill groups (i.e., reduces calibration and narrows differences in self-assessed confidence by skill level).
Synthesis of studies from human–AI interaction and learning research reported in the paper that document worsened calibration and a reduction in the competence–confidence gradient when users rely on LLM outputs; the paper does not report a single combined sample size.
high negative Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... metacognitive accuracy / calibration and competence–confidence gradient
The agent team topology exhibits higher operational fragility due to multi-author code generation.
Reported empirical observation from experiments comparing architectures, attributing increased fragility/errors to multi-author code generation in the agent team setup (stated qualitatively; no quantitative failure rates provided in the abstract).
high negative An Empirical Study of Multi-Agent Collaboration for Automate... operational fragility / error-proneness associated with multi-author code genera...
Azar et al. (2023) show that monopsonistic employers have stronger incentives to automate and document that US commuting zones with higher labor market concentration experienced more robot adoption.
Citation reported in the paper summarizing Azar et al. (2023); empirical analysis across US commuting zones (no sample size provided here).
high negative NBER WORKING PAPER SERIES robot adoption correlated with labor market concentration; incentives to automat...