The Commonplace

Evidence (4560 claims)

Adoption (5267 claims)
Productivity (4560 claims)
Governance (4137 claims)
Human-AI Collaboration (3103 claims)
Labor Markets (2506 claims)
Innovation (2354 claims)
Org Design (2340 claims)
Skills & Training (1945 claims)
Inequality (1322 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Active filter: Productivity
The four-variable account (produced output, underlying understanding, calibration accuracy, self-assessed ability) better explains phenomena like overconfidence, over- and under-reliance on AI, 'crutch' effects, and weak transfer than the simpler claim that generative AI merely amplifies the Dunning–Kruger effect.
Argumentative synthesis in the paper comparing explanatory power of the proposed four-variable framework against the more general Dunning–Kruger metaphor; draws on examples and empirical patterns from the reviewed literature rather than a single empirical test.
high mixed Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... explanatory fit for phenomena such as overconfidence, reliance patterns, crutch ...
A useful working model is 'AI-mediated metacognitive decoupling': LLM use widens the gap among produced output, underlying understanding, calibration accuracy, and self-assessed ability.
Conceptual synthesis and theoretical proposal grounded in reviewed empirical findings from multiple literatures (human–AI interaction, learning research, model evaluation); presented as the paper's working model rather than as a single empirical estimate.
high mixed Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... degree of alignment/decoupling between produced output, underlying understanding...
There is a fundamental trade-off between operational stability and theoretical deliberation across multi-agent coordination frameworks.
Empirical results from controlled benchmarks comparing agent architectures under fixed computational time budgets, as reported in the paper (no numeric sample size or statistical details provided in the abstract).
high mixed An Empirical Study of Multi-Agent Collaboration for Automate... operational stability versus depth/quality of theoretical deliberation
As technological progress devalues labor, the welfare benefits of steering at first increase; beyond a critical threshold they decline, and optimal policy shifts toward greater redistribution.
Theoretical model extension analyzing planner's optimal choice as labor's economic value changes; the paper states a non-monotonic relationship with a critical threshold.
high mixed NBER WORKING PAPER SERIES welfare benefits of steering; optimal policy (steering vs redistribution)
Using pre-existing exposure as an instrument for ChatGPT adoption in a long-difference IV design, ChatGPT adoption causes households to spend more time on digital leisure activities while leaving total time spent on productive online activities unchanged.
IV long-difference empirical design: instrumenting household adoption with pre-ChatGPT exposure (2021 browsing); outcome measured as changes in categorized browsing durations (LLM-based classification into 'leisure' vs 'productive' sites); controls include demographic-by-region fixed effects and browsing composition controls.
high mixed https://arxiv.org/pdf/2603.03144 change in time spent on digital leisure activities and total time on productive ...
Once efficiency is made explicit, the main practical question becomes how many efficiency doublings are required to keep scaling productive despite diminishing returns.
Framing/forecasting claim in the paper presenting an operational research question (conceptual; no empirical sample in excerpt).
high mixed The Unreasonable Effectiveness of Scaling Laws in AI required number of efficiency doublings to sustain productive scaling
The practical burden of scaling depends on how efficiently real resources are converted into that (logical) compute.
Argument in the paper linking conceptual 'logical compute' to real-world conversion efficiency (qualitative claim; no empirical sample in excerpt).
high mixed The Unreasonable Effectiveness of Scaling Laws in AI efficiency of converting real resources into logical compute
The compute variable is best understood as logical compute, an implementation-agnostic notion of model-side work.
Conceptual argument presented in the paper reframing 'compute' as an abstract, implementation-agnostic quantity (no empirical sample provided).
high mixed The Unreasonable Effectiveness of Scaling Laws in AI definition/interpretation of the 'compute' variable
These patterns are consistent with a reorganization of the scientific production process rather than immediate efficiency gains, in line with theories of general-purpose technologies.
Interpretation linking observed changes in budget allocation, team size, and task breadth (from the proposal dataset and task-level analyses) to theoretical predictions about general-purpose technologies (GPTs); empirical findings show organizational change rather than large average short-run productivity gains.
high mixed Artificial Intelligence in Science: Returns, Reallocation, a... organizational reorganization vs efficiency gains (qualitative interpretation)
This paper offers a forward-looking framework that emphasizes the decentralizing potential of AI on labor markets, moving beyond the traditional displacement-versus-creation dichotomy.
Paper's stated contribution; based on conceptual framework and synthesis of historical and contemporary analyses (no empirical validation presented in the abstract).
high mixed AI Civilization and the Transformation of Work conceptual framing of AI's labor-market effects
The emergence of artificial intelligence and robotics is catalyzing a profound transformation in the nature of human labor.
Stated as a central premise in the paper's abstract; supported by the paper's synthesis of economic history, contemporary labor market data, and analysis of digital platform growth (no specific datasets or sample sizes reported in the abstract).
high mixed AI Civilization and the Transformation of Work nature of human labor / structure of labor markets
The resulting AI safety profile is asymmetric: AI is bottlenecked on frontier research (novel tasks) but unbottlenecked on exploiting existing knowledge.
Theoretical implication of the novelty-bottleneck model distinguishing novel (human-judgment) vs. routine (covered by agent prior) components of tasks.
high mixed The Novelty Bottleneck: A Framework for Understanding Human ... AI capability bottlenecks in frontier research vs. exploitation
Wall-clock time can be reduced to O(√E) through team parallelism, but total human effort remains O(E).
Model-derived result showing parallelism across humans can speed wall-clock completion time while aggregate human effort does not drop asymptotically.
high mixed The Novelty Bottleneck: A Framework for Understanding Human ... wall-clock task completion time and total human effort
Better agents improve the coefficient on human effort but not the exponent (i.e., they reduce the constant factor but do not change the asymptotic scaling class).
Analytic result from the stylized model under the paper's assumptions about task decomposition and novelty fraction ν.
high mixed The Novelty Bottleneck: A Framework for Understanding Human ... human effort (coefficient vs. asymptotic scaling exponent)
India's systematic investment plan (SIP) flows provide a high-frequency observable for the model's endogenous participation rate and constitute the natural empirical laboratory for the displacement–participation mechanism.
Empirical suggestion in the paper proposing SIP flows as an observable proxy for the modelled participation rate and recommending India as a lab to test the displacement–participation channel (no empirical test reported in the excerpt).
high mixed When Does AI Raise the Equity Risk Premium? Displacement, Pa... equity market participation rate (proxied by SIP flows)
Three analytical results characterise non-linear financial fragility, regime-contingent risk premium divergence, and the general equilibrium alignment squeeze.
Stated analytical results in the paper derived from the theoretical model describing three named phenomena (non-linear fragility, regime-contingent divergence, alignment squeeze).
high mixed When Does AI Raise the Equity Risk Premium? Displacement, Pa... financial fragility / risk premium behaviour / alignment-induced output effects
Whether AI is equity-bullish or equity-bearish depends on which channel dominates—a condition that differs sharply between deep financial markets, where the ARP is the dominant driver of elevated risk premia (Regime D), and shallow markets, where participation compression dominates (Regime E).
Model regime analysis in the paper distinguishing Regime D (deep markets, ARP-dominated) and Regime E (shallow markets, participation-compression-dominated) and stating comparative dominance determines net bullish/bearish outcome.
high mixed When Does AI Raise the Equity Risk Premium? Displacement, Pa... net effect of AI on equity returns / ERP
The equilibrium equity risk premium decomposes into three additively separable terms corresponding to these three channels (Proposition 1).
Formal proposition (Proposition 1) in the paper deriving an additive decomposition of the equilibrium ERP into the productivity, participation compression, and alignment risk terms.
high mixed When Does AI Raise the Equity Risk Premium? Displacement, Pa... equity risk premium (ERP) decomposition
We develop a heterogeneous-agent framework in which AI-driven labour displacement affects the equity risk premium (ERP) through three co-equal channels.
Stated model contribution in the paper: a theoretical heterogeneous-agent framework that posits three channels linking AI-driven labour displacement to the ERP (productivity, participation compression, alignment risk).
The top four models are statistically indistinguishable (mean score 0.147–0.153) while a clear tier gap separates them from the remaining four models (mean score ≤ 0.113).
Reported mean performance scores across 8 models and statement of statistical indistinguishability for the top four vs lower-tier four; numerical means provided.
high mixed SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... mean model performance score
Behavioral factors — specifically trust calibration, cognitive load, and affective reactions — shape the transition of corporate AI initiatives from pilot deployments to scalable, sustained use.
Synthesis of human-AI interaction literature integrated with adoption frameworks (TAM and TOE); conceptual linkage rather than new empirical testing in this paper.
high mixed Behavioral Factors as Determinants of Successful Scaling of ... success of pilot-to-production transition (scalability and sustained use)
AI accelerates value-chain maturation while creating distinct risks — including professional responsibility tensions and potential system-level externalities.
Conceptual argument and risk analysis in the Article (theoretical reasoning and synthesis of management/ethics literature). No empirical causal estimate reported in the excerpt.
high mixed Rewired: Reconceptualizing Legal Services for the AI Age acceleration of value-chain maturation and emergence of professional responsibil...
The legal profession is at a crossroads, caught between intensifying fears of AI-driven displacement and a generational opportunity for transformation.
Author's synthesis and framing in the Article (conceptual assessment; literature/contextual synthesis). No empirical sample or experiment reported in the excerpt.
high mixed Rewired: Reconceptualizing Legal Services for the AI Age risk of AI-driven displacement and opportunity for transformation in the legal p...
This advantage is contingent upon robust AI governance, ethical frameworks, and the transition from 'pilot-lite' projects to integrated, data-driven 'AI-first' business models.
Conditional claim in the paper linking success to governance, ethics, and organizational integration; appears to be normative/analytical rather than empirical in the abstract.
high mixed The AI Advantage: Strategic Innovation and Global Expansion ... dependency of AI-driven advantage on governance, ethics, and organizational inte...
Machine-readable metrics and open scholarly infrastructure are reshaping scholarly profiles and incentives.
Conceptual and historical discussion referring to platforms and metrics (e.g., arXiv, Google Scholar, ORCID) as mechanisms changing incentives; no new empirical estimates provided.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... changes in scholarly incentives and profile construction due to machine-readable...
That interconnected ecosystem is fundamentally restructuring who can do science (access), how fast discoveries propagate, and what counts as a valid scientific contribution.
Argumentative claim linking infrastructural and tool changes to changes in access, dissemination speed, and norms of contribution. The paper presents examples and narrative but no systematic empirical evaluation or sample.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... access to scientific practice, speed of discovery dissemination, and norms of sc...
The most consequential development is not any single tool but the emergence of an interconnected ecosystem—AI agents, preprint platforms, open source codebases, and citation infrastructure—that forms a feedback loop.
Synthesis/argument based on multiple examples (LLM agents, preprint servers like arXiv, open-source code repositories, citation indices). No quantitative measurement or causal identification reported.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... emergence of an interconnected scientific infrastructure ecosystem
The central tension in AI for science is between automation (building systems that replace human researchers) and augmentation (tools that amplify human creativity and judgement).
Analytical claim based on the paper's review of historical examples and conceptual discussion; no primary data or experimental design reported.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... relationship between automation and augmentation in research practice
Science has repeatedly delegated its bottlenecks to machines—first inference, then search, then measurement, then the full workflow—and each delegation solves one problem while exposing a harder one underneath.
Interpretive historical argument drawing on examples across AI-for-science milestones (e.g., DENDRAL, search and inference systems, measurement automation, and contemporary end-to-end workflows). No quantitative sample or experimental method reported.
high mixed A Brief History of AI for Scientific Discovery: Open Researc... pattern of delegation and emergent bottlenecks in research workflows
Testing revealed AI excels at computational tasks but consistently misses nuanced factors like new construction rent premiums and infrastructure proximity impacts, validating the framework's hybrid structure as essential for professional-grade underwriting.
Findings from the controlled ChatGPT-4 test on the single 150-unit scenario: qualitative and comparative observations showing AI handled computations well but failed to capture specific local-market nuances, leading authors to endorse a hybrid human-AI framework.
Phase Two requires human-led professional validation to correct AI limitations, apply local market knowledge, and integrate risk factors.
Framework description supported by observations from the controlled test where human review was used to correct AI outputs and apply local knowledge (e.g., adjusting for nuanced market factors).
Traffic performance is sensitive to the distribution of safe time gaps and the proportion of RL vehicles.
Simulation results comparing Fundamental Diagrams across scenarios with different distributions of safe time gaps and shares of RL-controlled vehicles. Number of simulation runs or replicates not stated in the claim text.
high mixed Macroscopic Characteristics of Mixed Traffic Flow with Deep ... traffic performance (e.g., flow, capacity) sensitivity to time-gap distribution ...
AUROC_2 and M-ratio produce fully inverted model rankings, demonstrating these metrics answer fundamentally different evaluation questions.
Metric comparison across models showing that AUROC_2-based ranking and M-ratio-based ranking are fully inverted in the reported results on the evaluated dataset.
high mixed Do LLMs Know What They Know? Measuring Metacognitive Efficie... model ranking by AUROC_2 versus model ranking by M-ratio
Temperature manipulation shifts Type-2 criterion while meta-d' remains stable for two of four models, dissociating confidence policy from metacognitive capacity.
Experimental manipulation (temperature changes) applied to models; reported result that Type-2 criterion shifted with temperature while meta-d' was stable for two models (out of four) in the 224,000-trial dataset.
high mixed Do LLMs Know What They Know? Measuring Metacognitive Efficie... Type-2 criterion (confidence policy) and meta-d' (metacognitive capacity)
Metacognitive efficiency is domain-specific, with different models showing different weakest domains, invisible to aggregate metrics.
Domain-level analyses reported in the paper showing per-domain M-ratio results and identification of different weakest domains per model, contrasted with aggregate metric behavior.
high mixed Do LLMs Know What They Know? Measuring Metacognitive Efficie... domain-specific metacognitive efficiency (M-ratio) across task domains
Metacognitive efficiency varies substantially across models even when Type-1 sensitivity is similar — Mistral achieves the highest d' but the lowest M-ratio.
Empirical comparison of Type-1 sensitivity (d') and metacognitive efficiency (M-ratio) across the four evaluated LLMs on the 224,000 QA trials; explicit statement that Mistral had highest d' but lowest M-ratio.
high mixed Do LLMs Know What They Know? Measuring Metacognitive Efficie... Type-1 sensitivity (d') and metacognitive efficiency (M-ratio)
The paper's primary contribution is to combine established ingredients—attention scarcity, free-entry dilution, superstar effects, and preferential attachment—into a unified framework directed at claims about AI-enabled entrepreneurship.
Stated contribution and methodological description in the paper (synthesis and applied formalisation); this is a descriptive/methodological claim rather than an empirical result.
high mixed The Economics of Builder Saturation in Digital Markets n/a (methodological contribution)
Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior.
Statement in paper's introduction/abstract summarizing prior capabilities and limitations of pretrained time-series foundation models (no experimental sample or numeric evidence provided in the excerpt).
high mixed GARP-EFM: Improving Foundation Models with Revealed Preferen... ability of pretrained time-series models to forecast and degree to which they in...
The governance risk-mitigation effects of AI operate through increasing financial risk exposure.
Authors' mechanism tests indicate a relationship between AI adoption and changes in financial risk exposure measures, which they interpret as a channel affecting executive behavior.
high mixed The risk-mitigation effects of artificial intelligence adopt... financial risk exposure (financial risk/proxy metrics)
Organizational culture and technological readiness moderate the effectiveness of generative AI integration in decision-making processes.
The paper reports moderation effects tested in the SEM framework using survey data from senior managers, decision-makers, and AI adoption specialists (SmartPLS). No numeric moderator effect sizes or sample size provided in the excerpt.
high mixed The Strategic Impact of Generative Artificial Intelligence o... effectiveness of generative AI integration in decision-making (moderation effect...
Small language models offer privacy-preserving alternatives to frontier models, but their specialization is hindered by fragmented development pipelines that separate tool integration, data generation, and training.
Background claim stated in paper/abstract; no experimental data provided for this statement within the abstract.
high mixed EnterpriseLab: A Full-Stack Platform for developing and depl... privacy-preserving capability and ease of specialization of small LMs (vs fronti...
Extensive synthetic experiments show that policy regularizations reshape the narrative about which DRL method is best for inventory management.
Paper states results from extensive synthetic experiments that change which DRL methods are considered best under policy regularization; abstract does not provide the experimental sample size, specific methods, or quantitative comparisons.
high mixed DeepStock: Reinforcement Learning with Policy Regularization... relative performance/ranking of DRL methods for inventory management
Implementation of human-replacing technologies leads to significant transformations in skill demand: it reduces reliance on low-skilled labour while increasing demand for qualified engineers, system operators and specialists in digital technologies.
Sector-specific analysis and review of international labour-market studies cited in the article documenting skill-biased effects of automation and digitalization; qualitative assessment for Ukraine's mining and metallurgical sector under workforce shortage conditions.
high mixed Human-replacing technologies as a driver of labour productiv... skill demand composition (shift from low-skilled to high-skilled roles)
Foreign direct investment (FDI) shows an insignificantly positive direct effect on local TFCP but a significantly negative indirect (spillover) effect, attributed to a 'pollution haven' effect.
Spatial Durbin Model estimates for FDI on panel (30 provinces, 2010–2023): direct coefficient positive but not significant; indirect coefficient significantly negative; interpretation given as pollution-haven mechanism.
high mixed Study on the impact of industrial intelligence and the digit... total factor carbon productivity (TFCP)
Industrial intelligence exhibits regional heterogeneity: a significantly negative direct effect in the east, a significantly positive direct effect in the central region, an insignificant direct effect in the west, and positive indirect (spillover) effects in the east and west.
Regional/subsample Spatial Durbin Model analyses dividing the sample into east, central, and west regions (30 provinces, 2010–2023); reported region-specific direct and indirect coefficients and significance levels.
high mixed Study on the impact of industrial intelligence and the digit... total factor carbon productivity (TFCP)
Industrial intelligence has an insignificantly negative direct effect on local TFCP, but its positive spatial spillover effect is significant at the 1% level, producing a significantly positive total effect.
Spatial Durbin Model results for industrial intelligence on panel (30 provinces, 2010–2023): direct coefficient negative and not statistically significant; indirect coefficient positive and significant at 1%; total effect positive and significant.
high mixed Study on the impact of industrial intelligence and the digit... total factor carbon productivity (TFCP)
China's TFCP rose overall from 2010 to 2023 but exhibited a widening regional gap of 'higher in the east, lower in the west'.
Panel data of 30 Chinese provincial-level regions (2010–2023); TFCP measured using an undesirable-output super-efficiency SBM model and summarized temporal and spatial patterns.
high mixed Study on the impact of industrial intelligence and the digit... total factor carbon productivity (TFCP)
The study identifies the main AI-enabled mechanisms advancing CE principles in smart manufacturing, waste valorisation, supply-chain transparency, and sustainable design.
Bibliometric network analysis of 196 peer-reviewed articles (2023–2024) and systematic review of 104 studies, per the abstract; identification is presented as a product of these analyses.
high mixed Artificial intelligence as a catalyst for the circular econo... AI-enabled mechanisms advancing circular economy principles (e.g., in smart manu...
Governmental structures, labor supply and demand, and incorporation of financial measures act as key intervening variables affecting achieved ROI from GenAI implementations.
Qualitative synthesis and theoretical analysis reported in the paper identifying contextual/intervening variables.
high mixed Measuring Business ROI of Generative AI Adoption on Azure Cl... influence of governance and labor market factors on ROI
Generative AI serves as an effective 'wingman' for employment lawyers, capable of replacing substantial junior associate work while requiring continued human expertise for client counseling, supervision, and final legal advice preparation.
Authors' synthesis of experimental results showing AI-produced substantive analysis plus discussion about remaining limitations (e.g., citation errors) and required human oversight; qualitative assertion about substitutability for junior associate tasks.
high mixed Robot Wingman: Using AI to Assess an Employment Termination potential replacement of junior associate tasks and required human oversight