The Commonplace

Evidence (14055 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
We re-recruited 530 participants from 52 countries two years after they gave their preferences in the PRISM dataset to evaluate personalised and non-personalised language models in blinded multi-turn conversations (large-scale within-subject experiment).
Study methodology reported in paper: within-subject experiment, re-recruitment of 530 participants from 52 countries, blinded multi-turn conversations comparing models.
high null result PRISM-X: Experiments on Personalised Fine-Tuning with Human ... experimental sample composition and study design
Personalisation is a standard feature of conversational AI systems used by millions; yet, the efficacy of personalisation methods is often evaluated in academic research using simulated users rather than real people.
Authors' literature and field observation stated in introduction; contextual claim about common practice in academic evaluations (no numeric experiment reported for this claim).
high null result PRISM-X: Experiments on Personalised Fine-Tuning with Human ... prevalence of simulation-based evaluation in academic research
The study includes Natural Language Processing (NLP) analysis of 5 million consumer contacts.
Methodological statement in the paper specifying the NLP data volume.
high null result Augmented Intelligence: Resolving the AI integration-obsoles... methodological sample (NLP consumer contacts)
The study includes surveys of 800 marketers.
Methodological statement in the paper specifying the survey sample size.
high null result Augmented Intelligence: Resolving the AI integration-obsoles... methodological sample (marketer survey)
The study includes AI adoption audits from 120 organizations.
Methodological statement in the paper specifying the audits sample size.
high null result Augmented Intelligence: Resolving the AI integration-obsoles... methodological sample (AI adoption audits)
LLM-generated solutions contain roughly the same number of ideas as participant-generated solutions.
Comparative analysis of idea counts within solutions reported in the paper; phrased as 'roughly the same number of ideas' (no numeric effect size provided in the abstract).
high null result "Like Taking the Path of Least Resistance": Exploring the Im... number of ideas per solution
The findings are consolidated via the AI Engineering Integration Framework and the Skills Transition Risk Matrix, which provide guidelines for strategically harnessing AI while safeguarding the engineering profession.
Paper reports development of two conceptual/practical tools (framework and matrix) as outputs of the study; no validation details provided in abstract.
high null result The AI-engineering imperative - Navigating synergy and obsol... existence of the AI Engineering Integration Framework and Skills Transition Risk...
Case studies were performed covering five major industries.
Paper's reported methodology (number of case studies stated in abstract).
high null result The AI-engineering imperative - Navigating synergy and obsol... number of industry case studies
A Delphi study was conducted with 40 global experts.
Paper's reported methodology (Delphi sample explicitly stated in abstract).
high null result The AI-engineering imperative - Navigating synergy and obsol... Delphi panel size (experts consulted)
A comprehensive mixed-methods study was conducted, incorporating a survey of 320 organizations.
Paper's reported methodology (survey sample explicitly stated in abstract).
high null result The AI-engineering imperative - Navigating synergy and obsol... survey sample size (organizations surveyed)
AwareLLM was evaluated in a user study with 20 participants, compared to a standard LLM assistant across multiple tasks.
Experimental methods statement in paper; explicitly reports a user study and sample size.
high null result AwareLLM: A Proactive Multimodal Ecosystem for Personalized ... evaluation study (design)
Using an agent-based simulation of a multi-SKU convenience store environment, the study evaluates deployment efficiency, inventory responsiveness, and managerial cognitive reallocation.
Methodological claim: the paper reports an agent-based simulation experiment in a multi-SKU convenience store context; details such as number of simulations, parameter settings, or statistical results are not provided in the excerpt.
high null result From Configuration to Cognition: A Self-Configuring Agentic ... deployment efficiency; inventory responsiveness; managerial cognitive reallocati...
The dominant paradigm for AI agents is an "on-the-fly" loop in which agents synthesize plans and execute actions within seconds or minutes in response to user prompts.
Statement in paper presenting a characterization of current AI agent design; conceptual/observational claim with no empirical data or sample reported.
Persistent data gaps—especially concerning worker-level outcomes, informal labor, and non-Anglophone markets—warrant urgent research investment.
Authors' assessment based on scope of included studies and acknowledged limitations in observation windows and geographic/labor-form coverage.
high null result Creation, validation, obsolescence: observed evidence of AI-... availability of data on worker-level outcomes, informal labor, and non-Anglophon...
Following PRISMA 2020 guidelines, we systematically searched six academic databases (Scopus, Web of Science, EconLit, SSRN, IEEE Xplore, Google Scholar) for empirical studies documenting observed—not predicted—labor market changes since 2020; from 1,847 initial records, 94 studies meeting inclusion criteria were retained for qualitative synthesis and 42 for quantitative data extraction.
Methods: systematic literature search following PRISMA 2020 across six named databases; initial records = 1,847; retained = 94 for qualitative synthesis, 42 for quantitative extraction.
high null result Creation, validation, obsolescence: observed evidence of AI-... systematic_review_search_and_screen_counts (initial records; studies retained)
We thematically analysed twelve semi-structured interviews with SME owners and managers conducted in early 2025 using Atlas.ti, yielding 19 codes grouped into six categories.
Methods statement in the paper describing qualitative sample and analysis procedures.
high null result Artificial Intelligence, Social Capital, and Sustainable Emp... qualitative_analysis_results (codes/categories)
We examine the interplay between AI adoption, social capital formation, workforce dynamics, and sustainable development in Eastern Macedonia and Thrace (EMT), one of the EU's least developed regions.
Study context and scope as stated in the paper; empirical work conducted in EMT.
high null result Artificial Intelligence, Social Capital, and Sustainable Emp... regional_AI_adoption_and_social_capital_interplay
Research has concentrated on advanced urban economies, leaving underexplored the implications of AI for peripheral small and medium-sized enterprises (SMEs) operating under weak human capital, thin digital infrastructure, and constrained social capital.
Statement in the paper contrasting existing research focus (advanced urban economies) with a lack of attention to peripheral SMEs; no empirical sample size for this bibliographic claim reported in the excerpt.
high null result Artificial Intelligence, Social Capital, and Sustainable Emp... research_coverage_of_peripheral_SMEs
Once functional deployment and operational investment are controlled for, worker-task use is not associated with employment declines.
Multivariate regression results reported in the paper using BTOS AI supplement data showing the coefficient on worker-task use becomes statistically indistinguishable from zero after controlling for functional deployment and operational investment; exact model details and sample size not provided in excerpt.
high null result The Microstructure of AI Diffusion: Evidence from Firms, Bus... association between worker-task AI use and employment change conditional on othe...
This study conducts an empirical analysis using data on industrial robots from the International Federation of Robotics (IFR) and panel data from 14 sub-sectors of China's manufacturing industry.
Statement in paper describing data and methods: use of IFR robot data combined with panel data covering 14 manufacturing sub-sectors (panel regression framework implied).
high null result Research on the impact of industrial robot application on th... data and sample composition (use of IFR robot data and panel of 14 sub-sectors)
The location of the Pareto frontier depends only on population characteristics, utility functions and the fairness score, but not on the technical design of the algorithm — the findings hold for pre-processing, in-processing, and post-processing approaches alike.
Theoretical proof/argument demonstrating that the Pareto frontier characterization is a function of distributions, utilities and fairness metric, independent of algorithmic implementation approach (pre-, in-, post-processing).
high null result Fairness vs Performance: Characterizing the Pareto Frontier ... dependence of Pareto frontier location on algorithmic design
Under the Brier score specifically, with type-independent inflation cost, the second-best welfare equals the first-best welfare (welfare equivalence).
Analytical result/proof specialized to the Brier score and the assumption of type-independent inflation costs; comparative welfare analysis in the model.
high null result The Endogeneity of Miscalibration: Impossibility and Escape ... principal welfare (second-best vs. first-best) under Brier scoring and type-inde...
Return forecasts are translated into long–short portfolios to assess economic performance.
Stated evaluation approach: conversion of predicted returns into long–short portfolios for economic/performance assessment.
high null result Optimizing stock market prediction and stock trading strateg... economic performance of long–short portfolios constructed from forecasts
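One common way to turn return forecasts into a long–short portfolio is sketched below for illustration. It is a minimal sketch under stated assumptions, not the paper's procedure: the equal weighting, the `quantile` leg size, and the function name `long_short_return` are all hypothetical choices.

```python
def long_short_return(forecasts, realized, quantile=0.2):
    """Equal-weighted long-short return for a single period.

    forecasts, realized: dicts mapping asset id -> predicted / realized return.
    quantile: fraction of assets placed in each leg (an assumed choice,
        not taken from the paper).
    """
    # Rank assets from lowest to highest forecast return.
    ranked = sorted(forecasts, key=forecasts.get)
    n_leg = max(1, int(len(ranked) * quantile))
    short_leg = ranked[:n_leg]   # lowest forecasts are sold short
    long_leg = ranked[-n_leg:]   # highest forecasts are bought
    long_ret = sum(realized[a] for a in long_leg) / n_leg
    short_ret = sum(realized[a] for a in short_leg) / n_leg
    return long_ret - short_ret
```

The spread between the two legs is what would be tracked over time to judge economic performance. With a large `quantile` on a small universe the legs can overlap, so a real implementation would need to guard against that.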
The analysis is based on 30 market, liquidity, valuation, profitability, technical and risk factors and compares linear models, tree-based machine learning and deep learning architectures (including GRU, LSTM and Transformer) within a rolling-window forecasting framework.
Description of empirical design: use of 30 factor variables and explicit listing of model families (linear, tree-based, GRU, LSTM, Transformer) and use of a rolling-window forecasting setup.
high null result Optimizing stock market prediction and stock trading strateg... model comparison across 30 factors within rolling-window forecasting
We introduce the weighted evaluation index (WEI), a finance-specific performance metric that integrates prediction accuracy with market adaptability.
Methodological contribution stated in the paper: introduction of a new performance metric called WEI described as integrating accuracy and market adaptability.
high null result Optimizing stock market prediction and stock trading strateg... performance evaluation metric (WEI)
We introduce the Diff-RMSE method for nonlinear factor identification.
Methodological contribution stated in the paper: introduction of a new method named 'Diff-RMSE' for identifying nonlinear factors.
high null result Optimizing stock market prediction and stock trading strateg... method for nonlinear factor identification
The study uses A-share market data from 2013 to 2024 with equity and firm-characteristic data available from databases such as RESSET and CSMAR for more than 5,000 listed firms.
Empirical dataset description in the paper: time period 2013–2024, sources named (RESSET, CSMAR), and statement 'more than 5,000 listed firms'.
high null result Optimizing stock market prediction and stock trading strateg... dataset coverage (time span and number of firms)
The synthesis covers research and practitioner guidance from the years 2023–2025.
Methods statement specifying the temporal scope of sources used for the synthesis.
This paper synthesizes recent research and practitioner guidance (2023–2025) to develop a practical model for designing human–AI collaboration in the financial reporting function (controllership).
Methods section declaration describing scope and approach (literature/practitioner guidance synthesis covering 2023–2025).
high null result Collaborative Intelligence in Accounting: A Human + AI Compl... organizational_efficiency
We conducted a controlled experiment comparing traditional task-splitting methods with AI-assisted approaches using GitLab Duo.
Methodological statement in the paper reporting a controlled experiment using GitLab Duo; sample size not stated in the provided summary.
high null result Splitting User Stories Into Tasks with AI -- A Foe or an All... method comparison (experimental design)
We audited 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central.
Direct data collection and audit described in the paper: dataset of 111,000,000 references from 2,500,000 papers across the four named preprint/repository sources.
high null result LLM hallucinations in the wild: Large-scale evidence from no... number of references audited / dataset coverage
Future research should test these findings across different institutional contexts, particularly European economies.
Paper's stated limitations and suggestions for future research.
high null result The Inverted-U Relationship Between AI and Corporate Innovat... recommendation for external validation across contexts
The analysis employs fixed-effects models, U-tests, bootstrap mediation, and patent text similarity analysis.
Methods statement listing econometric and text-analytic techniques used in the paper.
The study uses a sample of 25,204 firm-year observations from Chinese A-share manufacturing companies (2010–2023).
Paper statement of sample and period; descriptive sample construction (firm-year observations = 25,204).
The empirical analysis is based on Chinese A-share listed firms observed from 2012 to 2024 and uses a difference-in-differences (DID) identification strategy.
Study description in the paper's methods/abstract specifying sample period (2012–2024), population (Chinese A-share listed firms), and methodology (DID).
high null result Government-Guided Funds and Corporate Digital–Intelligent Tr... study design / data sample
We validate the framework empirically on five benchmarks (MATH, MMLU, TriviaQA, SimpleQA, LiveCodeBench) across eight models from five providers.
Empirical experiments reported in the paper using five named datasets and eight models from five providers (experimental evaluation / benchmarking).
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... empirical validation of theoretical framework via experiments on benchmarks
For k-model cascades, first-order conditions imply a single shadow price that equalizes marginal quality-per-cost across stage boundaries.
Analytical derivation of first-order conditions for k-stage cascades within the decision-theoretic constrained-optimization framework presented in the paper.
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... marginal quality-per-cost equality across cascade stages (first-order optimality...
Given a pool of k models, the frontier achievable by deterministic two-model threshold cascades is the pointwise envelope over the C(k, 2) = k(k-1)/2 pairwise cascades, with switching points where the optimal pair changes.
Theoretical characterization/derivation in the paper (mathematical result about deterministic two-model threshold cascades and combinatorial envelope over pairwise cascades).
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... achievable cost-quality frontier for a k-model pool under deterministic two-mode...
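The pointwise-envelope idea in this claim can be illustrated with a short sketch. It assumes each pairwise cascade's frontier is available as a callable, here the hypothetical `pair_frontier(small, large, budget)`, and evaluates the envelope on a discrete budget grid. The paper's own construction is analytical and may differ.

```python
from itertools import combinations

def envelope(models, pair_frontier, budgets):
    """Pointwise upper envelope over all C(k, 2) pairwise cascades.

    models: list of model names.
    pair_frontier: hypothetical callable (small, large, budget) -> best
        quality that two-model cascade reaches at that budget, or None
        if the budget is infeasible for the pair.
    budgets: increasing sequence of cost budgets to evaluate.
    Returns a list of (budget, best_quality, best_pair) tuples.
    """
    result = []
    for b in budgets:
        best_q, best_pair = None, None
        for pair in combinations(models, 2):
            q = pair_frontier(pair[0], pair[1], b)
            if q is not None and (best_q is None or q > best_q):
                best_q, best_pair = q, pair
        result.append((b, best_q, best_pair))
    return result

def switching_points(env):
    """Grid budgets at which the optimal pair differs from the previous one."""
    return [cur[0] for prev, cur in zip(env, env[1:]) if cur[2] != prev[2]]
```

On a grid, the switching points are only located to within the grid spacing. Ties are resolved in favour of the first pair enumerated.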
Reciprocal shadow prices link the budget-constrained and quality-constrained formulations of the cascade optimization.
Analytical derivation in the decision-theoretic framework using constrained optimization and duality presented in the paper.
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... relationship between budget- and quality-constrained optimization formulations (...
For a two-model cascade, the cost-quality frontier is piecewise concave on decreasing-benefit regions of the confidence support.
Theoretical development in a decision-theoretic framework using constrained optimization and duality; proven properties for the two-model case reported in the paper (analytical result).
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... shape of the cost-quality frontier (concavity properties) for two-model cascades
These results are robust to alternative model specifications, including different lag lengths and forecast horizons.
Robustness checks reported in the paper: re-estimation of TVP-VAR with alternative lag lengths and forecast horizons producing consistent qualitative results.
high null result Artificial Intelligence and Financial Market Connectedness: ... stability of connectedness findings across model specifications
The emergence of generative AI is not associated with a uniform increase in financial connectedness.
Empirical TVP-VAR analysis comparing connectedness measures before and after the emergence of generative AI (paper compares connectedness over the sample period and reports no uniform increase).
high null result Artificial Intelligence and Financial Market Connectedness: ... level of financial connectedness
This study uses daily data from January 2021 to December 2025 to analyze spillover dynamics among AI-related equities, cryptocurrencies, and traditional financial assets within a time-varying parameter vector autoregression (TVP-VAR) framework.
Statement of data frequency and sample period plus description of methodology (TVP-VAR) in the paper; empirical analysis applied to specified asset groups.
high null result Artificial Intelligence and Financial Market Connectedness: ... spillover dynamics / connectedness among asset classes
Under standard smoothness and finite variance conditions, SGD is minimax optimal for finding stationary points measured by l2-norms, thereby fundamentally precluding any complexity gains for sign-based methods in standard settings.
Theoretical statement based on prior minimax optimality results for SGD under standard smoothness and finite-variance assumptions (as cited/used in the paper). No new experiment; relies on worst-case lower-bound theory.
high null result When and Why SignSGD Outperforms SGD: A Theoretical Study Ba... minimax optimality for finding l2-norm stationary points (optimization complexit...
The boundaries (critical thresholds) separating the tax regimes are derived from the workers' budget constraint.
Analytic derivation in the paper showing that constraints coming from the workers' budget constraint produce critical values of τ_ai and τ_f that determine transitions between the three regimes.
high null result The Economic Singularity: Core Mathematical Model critical_thresholds for tax parameters
The model features quadratic self-amplification in both AI capability (λ A^2) and financial capital (γ_F K_f^2), coupled through investment flows.
Model specification and equations in the paper showing terms λ A^2 for AI capability growth and γ_F K_f^2 for financial capital growth, with explicit investment flow terms linking AI and financial capital.
high null result The Economic Singularity: Core Mathematical Model model_dynamics (self-amplification terms)
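To see why a quadratic term such as λ A^2 is called self-amplifying, consider that term in isolation. The sketch below is a simplification that ignores the investment-flow coupling and every other term of the paper's model. Under that assumption, dA/dt = λ A^2 with A(0) = A0 > 0 has the closed-form solution A(t) = A0 / (1 − λ A0 t), which diverges at the finite time t* = 1 / (λ A0). The function names are hypothetical.

```python
def blowup_time(a0, lam):
    """Finite blow-up time of dA/dt = lam * A**2 with A(0) = a0 > 0, lam > 0."""
    return 1.0 / (lam * a0)

def capability(t, a0, lam):
    """Closed-form A(t) = a0 / (1 - lam * a0 * t), valid only before blow-up.

    This covers the isolated quadratic term only; the coupled system in
    the paper is not reproduced here.
    """
    if t >= blowup_time(a0, lam):
        raise ValueError("t is at or beyond the blow-up time")
    return a0 / (1.0 - lam * a0 * t)
```

For example, with A0 = 1 and λ = 0.5 the isolated term doubles capability by t = 1 and diverges at t = 2.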
The study uses a panel dataset of 35,347 firm-year observations from 2010 to 2023.
Reported sample description in the paper: panel dataset covering 2010–2023 with 35,347 firm-year observations.
high null result When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... N/A (sample description)
AI-assisted decision-making paradigms do not have a significant direct effect on task performance.
Experimental study of 59 pre-service teachers using a two-factor mixed design (between-subjects: AI-assisted decision-making paradigms; within-subjects: human-AI consistency). Data analyzed with Bayesian cumulative link mixed model and structural equation modeling; authors report no significant direct effect.
In the U.S., no single 'AI Act' has passed (as of 2026).
Stated in the paper as a factual legal/policy status; this is verifiable via legislative records and is presented without an underlying sample (paper cites status as of 2026).
high null result Emerging AI Trends passage of a comprehensive federal 'AI Act' in the U.S.
The authors ran a within-subjects study comparing authoring audio descriptions (AD) from scratch against editing AI drafts of varying quality.
Explicit methodological statement in the paper (within-subjects study design); sample size not reported in the excerpt.
high null result Making AI Drafts Count: A Quality Threshold in Audio Descrip... comparison of authoring modes