The Commonplace

Evidence (4560 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Filter: Productivity
When employers have monopsony power, they choose technologies that expand this power beyond what a social planner would consider optimal.
Model results on monopsonistic employer incentives and their technological choices; discussion supported by citations.
high negative NBER WORKING PAPER SERIES expansion of monopsony power via technological choice
Profit-maximizing firms pursue innovations that erode workers' market power by making them more easily replaceable, even at the expense of production efficiency; a social planner who values worker welfare would employ technologies that preserve workers' market power.
Theoretical analysis of interactions between technological choice and market power; supported by cited empirical evidence (e.g., Azar et al. 2023) in the paper.
high negative NBER WORKING PAPER SERIES choice of innovation affecting workers' market power / production efficiency tra...
A welfare-maximizing planner would choose to automate fewer tasks than production efficiency would dictate when workers' welfare is heavily weighted.
Model analysis of welfare-maximizing automation level compared to production-efficient automation; analytical result in the automation application.
high negative NBER WORKING PAPER SERIES extent/level of task automation chosen
Observed declines in browsing time due to ChatGPT adoption are concentrated in website categories such as search and news, which are highly exposed to substitution by generative AI.
Category-level changes in browsing time across website classifications; declines concentrate in categories whose content overlaps heavily with chatbot capabilities, as identified by web scraping and LLM-based site-level overlap classification.
high negative https://arxiv.org/pdf/2603.03144 browsing time on search and news website categories
High-income and younger households adopt generative AI substantially faster than low-income and older counterparts, and this gap is widening over time ('generative AI divide').
Descriptive heterogeneity analysis using Comscore household demographics (income and age bins) and observed adoption trajectories across 2021–2024; authors report widening gap rather than convergence.
high negative https://arxiv.org/pdf/2603.03144 heterogeneity in adoption rates by income and age (inequality in adoption)
Diminishing returns are not only a geometric flattening of the loss curve, but also rising pressure for cost reduction, system-level innovation, and the breakthroughs needed to sustain Moore-like efficiency doublings.
Analytical claim in the paper about the implications of diminishing returns for cost pressure and innovation requirements (qualitative; no sample size in excerpt).
high negative The Unreasonable Effectiveness of Scaling Laws in AI pressure for cost reduction and need for system-level innovation/breakthroughs
Prominent studies predict substantial job displacement due to automation.
Paper asserts this as background, referencing the existence of prominent studies in the literature (no specific citations or sample sizes provided in the abstract).
high negative AI Civilization and the Transformation of Work job losses / displacement
For organizations of n humans with AI agents, the optimal team size decreases with agent capability.
Derived implication from the stylized model's analysis of multi-human organizations interacting with AI agents.
high negative The Novelty Bottleneck: A Framework for Understanding Human ... optimal team size as a function of agent capability
There is no smooth sublinear regime for human effort; it transitions sharply from O(E) to O(1) with no intermediate scaling class.
Mathematical derivation from a stylized model of human-AI collaboration that assumes tasks decompose into atomic decisions, a fraction ν are novel, and specification/verification/error correction scale with task size.
high negative The Novelty Bottleneck: A Framework for Understanding Human ... human effort scaling (human time/effort required as task size E grows)
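The scaling claim above can be illustrated with a toy calculation. The sketch below is not the paper's model: the linear functional form, the function name human_effort, and the parameters cost_per_novel_decision and fixed_overhead are all assumptions made for illustration. It only shows why a fixed novel fraction ν > 0 gives effort that grows linearly in E, while ν = 0 leaves a constant.

```python
def human_effort(task_size, novelty_fraction,
                 cost_per_novel_decision=1.0, fixed_overhead=1.0):
    """Toy effort model (an assumption, not the paper's formula).

    Each of the `task_size` atomic decisions is novel with probability
    `novelty_fraction`; every novel decision costs the human a fixed amount,
    and there is a constant overhead regardless of task size.
    """
    return fixed_overhead + cost_per_novel_decision * novelty_fraction * task_size


for nu in (0.0, 0.01):
    efforts = [human_effort(E, nu) for E in (100, 10_000, 1_000_000)]
    print(f"nu={nu}: {efforts}")
```

Under these assumptions, any positive ν eventually dominates the constant term, so effort is O(E); only ν = 0 yields O(1), with nothing in between.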
So far, maintenance and migration work has been done largely manually by human experts.
Background assertion in the paper's introduction/abstract; no empirical backing provided in abstract.
high negative A Multi-agent AI System for Deep Learning Model Migration fr... degree of manual effort for model maintenance and migration historically
The regime divide deepens under AI capital concentration, admits a permanent displacement attractor in shallow markets, and generates equity market participation hysteresis in which the ERP remains elevated after employment has normalised.
Model-based assertions: analysis shows capital concentration magnifies regime separation, yields a permanent displacement attractor in shallow-market parameterizations, and produces hysteresis in participation leading to persistently elevated ERP after employment recovery.
high negative When Does AI Raise the Equity Risk Premium? Displacement, Pa... equity risk premium (ERP) persistence / participation hysteresis
The alignment risk channel is specific to agentic AI: correlated misalignment in AI objectives generates aggregate output shocks with fat left tails; formalised via Hansen-Sargent multiplier preferences, the resulting alignment risk premium (ARP) enters the equilibrium ERP decomposition as a priced factor additively separable from the participation wedge.
Theoretical formalisation in the paper: uses Hansen-Sargent multiplier preferences to capture model uncertainty/robustness and defines an ARP that is additively separable in the ERP decomposition.
high negative When Does AI Raise the Equity Risk Premium? Displacement, Pa... alignment risk premium (ARP) contribution to ERP
The participation compression channel operates through household wealth: displacement pushes marginal households below the equity market entry cost κ, concentrating aggregate consumption risk on a shrinking investor pool and—by the Basak-Cuoco mechanism—raising the required risk premium even as fundamentals improve.
Model mechanism described in the paper: heterogeneous-agent model with an explicit market entry cost κ and reference to the Basak-Cuoco mechanism leading to a higher required risk premium when investor base shrinks.
high negative When Does AI Raise the Equity Risk Premium? Displacement, Pa... equity risk premium (ERP)
AI can worsen financial and market performance if it crowds out normal R&D.
Paper's empirical analysis and interpretation linking AI dependence to poorer financial/market performance through displacement of standard R&D activities; presented as a study finding.
high negative The 'Intelligent Trap' in Corporate Finance—A Study Based on... financial and market performance
High AI dependency disclosed in financial reports does not improve firms' financial health and may even endanger it.
Empirical results drawn from the study's analysis of listed new energy vehicle and automobile manufacturers (2013–2023); statement appears in the paper's findings/conclusions.
high negative The 'Intelligent Trap' in Corporate Finance—A Study Based on... financial health / corporate financial condition
AI dependency reduces financial safety for listed new energy vehicle and automobile manufacturers.
Empirical analysis of a sample of listed new energy vehicle and automobile manufacturers covering 2013–2023; the paper reports data analysis showing AI dependency reduces financial safety.
high negative The 'Intelligent Trap' in Corporate Finance—A Study Based on... financial safety / corporate financial risk
Performance degradation persists even when context is provided via structured semantic layers including AST-extracted function context and import graph resolution.
Experiments comparing unstructured versus structured context provision; structured semantic layers (AST context, import graph resolution) were evaluated and models still degraded with more context.
high negative SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... model detection/performance when given structured semantic context
Models' performance degrades monotonically from diff-only (config_A) to diff+file content (config_B) to full context (config_C) across all 8 models.
Systematic ablation across three frozen context configurations (config_A, config_B, config_C) reported; all 8 evaluated models show monotonic performance decline as more context is provided.
high negative SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... model performance score across context-provision configurations
Eight frontier models detect only 15–31% of human-flagged issues on the diff-only configuration (config_A).
Empirical evaluation across 8 models on SWE-PRBench (350 PRs) under the diff-only configuration; reported detection rates of 15–31% relative to human-flagged issues.
high negative SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... detection rate of human-flagged issues
There is a growing gap between rapid experimentation with AI tools and limited organizational capability to institutionalize them in everyday workflows.
Argument supported by targeted literature synthesis and review of recent scholarly and institutional sources; no primary empirical sample reported in this paper.
high negative Behavioral Factors as Determinants of Successful Scaling of ... organizational capability to institutionalize AI initiatives (pilot-to-productio...
Evaluations across eight state-of-the-art multimodal models reveal that models achieved only 55.0% accuracy on help prediction.
Experimental evaluation reported in the paper comparing eight multimodal models on the Help Prediction task with reported accuracy metric.
Evaluations across eight state-of-the-art multimodal models reveal that models achieved only 44.6% accuracy on behavior state detection.
Experimental evaluation reported in the paper comparing eight multimodal models on the Behavior State Detection task with reported accuracy metric.
high negative GUIDE: A Benchmark for Understanding and Assisting Users in ... behavior state detection accuracy
Ikema is a severely endangered Ryukyuan language spoken in Okinawa, Japan, with approximately 1,300 remaining speakers, most of whom are over 60 years old.
Demographic/descriptive claim reported in the paper's background (likely citing prior surveys or census estimates); the abstract states the ~1,300 speakers figure and age distribution.
high negative Automatic Speech Recognition for Documenting Endangered Lang... number and age distribution of speakers
The financial planning and investment management profession is undergoing a radical transformation driven by Generative AI (GenAI) and Agentic AI, creating urgent workforce displacement challenges that require coordinated government policy intervention alongside educational reform.
Author assertion in the paper's introduction/abstract; framing argument based on the paper's synthesized analysis (no empirical sample, no reported statistical test).
high negative STRENGTHENING FINANCIAL WORKFORCE COMPETITIVENESS: A CURRICU... rate of workforce displacement in the financial planning and investment manageme...
LLM design agents can fixate on existing paradigms and fail to explore alternatives when solving design challenges, potentially leading to suboptimal solutions (a pathology analogous to human designers).
Literature/background claim and authors' characterization of observed agent behavior; motivated the proposed metacognitive interventions. No numerical sample size reported.
high negative Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regul... tendency to fixate on existing paradigms / lack of exploration leading to subopt...
Real estate pro forma development remains one of the most time-intensive functions in property investment, typically requiring twenty to forty hours per multifamily project through manual research, Excel-based modeling, and iterative scenario analysis.
Statement in paper asserting typical industry practice; not tied to the paper's controlled test. No empirical sample size or survey data reported alongside this assertion.
Traditional car-following models, such as the Intelligent Driver Model (IDM), often struggle to generalize across diverse traffic scenarios and typically do not account for fuel efficiency.
Literature-based statement within the paper motivating the study (review of limitations of traditional car-following models). No sample size reported.
high negative Macroscopic Characteristics of Mixed Traffic Flow with Deep ... model generalizability and accounting for fuel efficiency
Standard evaluation of LLM confidence relies on calibration metrics (ECE, Brier score) that conflate two distinct capacities: how much a model knows (Type-1 sensitivity) and how well it knows what it knows (Type-2 metacognitive sensitivity).
Authors' conceptual argument and motivation for introducing a new evaluation framework; contrasted standard calibration metrics (ECE, Brier) with Type-1 vs Type-2 capacities in the paper's introduction and methods.
high negative Do LLMs Know What They Know? Measuring Metacognitive Efficie... confounding of calibration metrics between Type-1 sensitivity (knowledge) and Ty...
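For reference, the two calibration metrics named above can be computed as follows. This is a minimal sketch using the standard textbook definitions, not the paper's evaluation code; the function names, the equal-width binning with n_bins=10, and the toy data are assumptions.

```python
def brier_score(confidences, correct):
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(confidences, correct)) / len(confidences)


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: weighted mean of |accuracy - confidence|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(p for p, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece


# Toy data (hypothetical): both models answer the same 4 items, 2 correctly.
correct = [1, 0, 1, 0]
flat = [0.5, 0.5, 0.5, 0.5]    # confidence never varies with correctness
sharp = [0.9, 0.1, 0.9, 0.1]   # confidence tracks correctness

ece_flat = expected_calibration_error(flat, correct)
ece_sharp = expected_calibration_error(sharp, correct)
brier_flat = brier_score(flat, correct)
brier_sharp = brier_score(sharp, correct)
```

In this toy case the flat model has the lower ECE even though its confidence says nothing about which answers are right, which is one way a single calibration number can hide the distinction between knowing and knowing what one knows.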
Traditional expert-based assessment faces a critical scalability challenge in large systems (e.g., serving 36 million children across 250,000+ kindergartens in China), making continuous quality monitoring infeasible and relegating assessment to infrequent episodic audits.
Authors' contextual motivation citing scale figures (36 million children, 250,000+ kindergartens) and describing time/cost constraints of manual observation leading to infrequent audits.
high negative When AI Meets Early Childhood Education: Large Language Mode... feasibility/scalability of manual expert-based assessment
Preliminary evaluation reveals that current foundation action models struggle substantially with professional desktop applications (~60% task failure rate).
Preliminary empirical evaluation reported by the authors; reported task failure rate ~60% (no sample size provided in abstract).
high negative CUA-Suite: Massive Human-annotated Video Demonstrations for ... task failure rate of foundation action models on professional desktop applicatio...
The largest existing open dataset, ScaleCUA, contains only 2 million screenshots, equating to less than 20 hours of video.
Quantitative statement about ScaleCUA reported in the paper: 2,000,000 screenshots, equivalent to less than 20 hours of video.
high negative CUA-Suite: Massive Human-annotated Video Demonstrations for ... size/coverage of existing open dataset (ScaleCUA)
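The screenshot-to-video equivalence can be checked with simple arithmetic. The excerpt does not state the frame rate behind the comparison; the sketch below assumes 30 frames per second, and the function name screenshots_to_hours is hypothetical.

```python
def screenshots_to_hours(n_frames, fps):
    """Hours of continuous video represented by n_frames at a given frame rate."""
    return n_frames / fps / 3600


ASSUMED_FPS = 30  # assumption: the frame rate is not given in the excerpt
hours = screenshots_to_hours(2_000_000, ASSUMED_FPS)
print(f"{hours:.1f} hours at {ASSUMED_FPS} fps")
```

At the assumed 30 fps this comes to roughly 18.5 hours, which is consistent with the "less than 20 hours" figure; a different frame rate would change the result proportionally.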
Progress toward general-purpose CUAs is bottlenecked by the scarcity of continuous, high-quality human demonstration videos.
Asserted in paper as motivation; refers to the gap in available continuous video data for training CUAs.
high negative CUA-Suite: Massive Human-annotated Video Demonstrations for ... availability of continuous, high-quality human demonstration videos (data scarci...
Reliance on massive, schema-heavy prompts results in prohibitive per-token API costs and high latency, hindering scalable production deployment.
Introductory problem statement in the paper arguing that large context prompts increase per-token API costs and latency for API-based LLMs; no quantitative study or sample size provided for this claim within the excerpt.
high negative Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... latency and per-token API cost
AI-enabled, democratised production is more likely to intensify competition and produce winner-take-most outcomes than to generate broadly distributed entrepreneurial success.
Synthesised theoretical prediction based on the unified framework (attention scarcity + free-entry dilution + superstar/preferential attachment dynamics) developed in the paper; no empirical validation provided.
high negative The Economics of Builder Saturation in Digital Markets prevalence of broadly distributed entrepreneurial success versus concentration
When the framework is extended to include quality heterogeneity and reinforcement dynamics, equilibrium outcomes exhibit declining average payoffs.
Analytical extension of the baseline formal model to incorporate heterogeneous quality and reinforcement (preferential attachment) dynamics; theoretical derivation in the paper; no empirical sample.
high negative The Economics of Builder Saturation in Digital Markets average payoffs to producers
In markets with near-zero marginal costs and free entry, increases in the number of producers dilute average attention and returns per producer.
Formal theoretical model introduced in the paper (Builder Saturation Effect) that assumes near-zero marginal costs, free entry, and finite human attention; no empirical sample or experimental data reported.
high negative The Economics of Builder Saturation in Digital Markets average returns per producer
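The dilution mechanism in this entry can be shown with a one-line calculation. This is a deliberately simplified sketch, not the paper's formal model: it assumes a fixed total attention budget split evenly across producers, and the function name and the numbers are hypothetical.

```python
def average_attention(total_attention, n_producers):
    """Even split of a fixed attention budget across producers (an assumption)."""
    if n_producers <= 0:
        raise ValueError("n_producers must be positive")
    return total_attention / n_producers


TOTAL_ATTENTION_HOURS = 1_000_000  # hypothetical fixed budget of human attention
for n in (1_000, 10_000, 100_000):
    print(n, average_attention(TOTAL_ATTENTION_HOURS, n))
```

With total attention held fixed, every tenfold increase in the number of producers cuts the average share by a factor of ten, which is the dilution the entry describes before quality heterogeneity is considered.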
Agent memories currently remain private and non-transferable because there is no way to validate their value.
Descriptive assertion in the paper about current state of agent memories; no empirical survey or measurement cited.
high negative Infrastructure for Valuable, Tradable, and Verifiable Agent ... transferability and marketability of agent memories under current conditions
Insufficient organizational resources significantly inhibit AI adoption in procurement (β = -0.19, p < 0.05).
Questionnaire survey (n=326) with multiple linear regression analysis; reported coefficient β = -0.19 with p < 0.05.
high negative Research on the Adoption of Artificial Intelligence and Proc... AI adoption in procurement
Measuring only technical model performance (such as predictive accuracy) is insufficient for assessing the strategic impact of AI in drug discovery.
Argued in the paper as a critique of current evaluation practices; presented as a conceptual point rather than supported by new empirical data in the excerpt.
high negative Strategic Key Performance Indicators for AI in Lead Optimiza... adequacy of technical model performance metrics for capturing strategic impact
Pressure remains high to increase the probability of success to improve the effectiveness of pharmaceutical R&D.
Asserted in the paper as motivational context for the work; framed as an industry pressure point rather than backed by a specific empirical sample or quantified survey in the excerpt.
high negative Strategic Key Performance Indicators for AI in Lead Optimiza... probability of success in pharmaceutical R&D
Increasing cost and failure rates in the pharmaceutical R&D process have not fundamentally improved over the last decade.
Stated as a contextual observation in the paper's opening paragraph; presented as a summary of industry trends (no specific dataset, sample size, or citation included in the excerpt).
high negative Strategic Key Performance Indicators for AI in Lead Optimiza... cost and failure rates in pharmaceutical R&D
Without support, performance stays stable up to three issues but declines as additional issues increase cognitive load.
Empirical study / human-AI negotiation case study in a property rental scenario that varied the number of negotiated issues; the paper reports observed performance across different numbers of issues (no sample size for this specific comparison stated in the abstract).
high negative From Overload to Convergence: Supporting Multi-Issue Human-A... negotiation performance (ability to find good agreements) under increasing numbe...
Reliance on automated content generation introduces risks of cognitive overreliance, algorithmic bias, and strategic misalignment.
The paper articulates these risks as conceptual/qualitative concerns in its discussion; no quantitative estimates or empirical tests of these specific risks are reported in the provided excerpt.
high negative The Strategic Impact of Generative Artificial Intelligence o... risks to decision-making including cognitive overreliance, algorithmic bias, str...
Wide disagreement among AIs created confusion and undermined appropriate reliance on advice.
Reported experimental finding from the paper: manipulating within-panel disagreement across tasks produced wide disagreement conditions that, according to the abstract, led to confusion and reduced appropriate reliance. No quantitative metrics reported in abstract.
high negative More Isn't Always Better: Balancing Decision Accuracy and Co... appropriate reliance on advice / decision-making
High within-panel consensus fostered overreliance on AI advice.
Experimental manipulation of within-panel consensus across the three tasks; the abstract reports that high consensus increased participants' reliance on AI (interpreted as overreliance). Specific measures and sample size not provided in abstract.
high negative More Isn't Always Better: Balancing Decision Accuracy and Co... reliance on AI advice (overreliance)
Improvements in AI ('better' AI) amplify the excess automation as well.
Model comparative statics: increased AI capabilities raise private incentives to automate, leading to more displacement than is socially optimal; theoretical analysis only.
high negative The AI Layoff Trap level of automation / worker displacement as a function of AI capability
More competition amplifies the excess automation (the automation arms race).
Comparative-statics result in the competitive task-based theoretical model showing increased competition raises firms' incentives to automate; no empirical sample.
high negative The AI Layoff Trap level of automation / worker displacement as a function of competition intensity
The resulting loss from excess automation harms both workers and firm owners.
Welfare comparisons from the model showing negative payoff changes for workers (lower wages/less employment) and reduced owner returns when automation is excessive; theoretical analysis, no empirical data.
high negative The AI Layoff Trap welfare/profits of workers and firm owners (losses caused by excess automation)
In a competitive task-based model, demand externalities trap rational firms in an automation arms race, displacing workers well beyond what is collectively optimal.
Formal equilibrium analysis in the paper's theoretical competitive task-based model; comparative statics and welfare analysis (no empirical sample).
high negative The AI Layoff Trap extent of worker displacement relative to social optimum
Knowing that AI-driven displacement can erode demand is not enough for firms to stop automating.
Analytical result from the paper's competitive task-based model showing firms' incentives do not internalize demand externalities; no empirical sample.
high negative The AI Layoff Trap firm automation decisions (propensity to automate) despite awareness of aggregat...