The Commonplace

Evidence (8625 claims)

Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 761 200 101 904 2020
Governance & Regulation 829 400 191 122 1566
Organizational Efficiency 784 193 125 84 1197
Technology Adoption Rate 637 236 124 97 1103
Research Productivity 431 131 58 340 972
Output Quality 481 183 59 47 770
Decision Quality 332 177 82 49 647
Firm Productivity 439 57 88 20 610
AI Safety & Ethics 218 279 66 33 602
Market Structure 181 170 123 24 503
Task Allocation 214 64 72 33 388
Skill Acquisition 174 62 62 17 315
Innovation Output 204 27 45 18 295
Employment Level 105 54 108 13 282
Fiscal & Macroeconomic 132 69 43 26 277
Consumer Welfare 117 63 42 11 233
Firm Revenue 154 48 26 3 231
Task Completion Time 173 31 8 12 225
Inequality Measures 44 123 50 6 223
Worker Satisfaction 89 65 22 12 188
Error Rate 71 92 10 2 175
Regulatory Compliance 77 69 14 5 165
Automation Exposure 58 56 26 13 156
Training Effectiveness 96 21 14 19 152
Wages & Compensation 77 37 25 6 145
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 81 21 1 115
Hiring & Recruitment 52 7 8 3 70
Creative Output 32 20 8 3 64
Skill Obsolescence 5 47 6 1 59
Social Protection 28 16 8 2 54
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
Behavioral findings from any single framework therefore warrant cross-configuration validation before being claimed as general.
Prescriptive conclusion derived from the observed cross-configuration heterogeneity in the paper's empirical results.
high null result Same Signal, Different Semantics: A Cross-Framework Behavior... validity/generalizability of behavioral findings across agent configurations
Framework identity accounts for more of the between-configuration variation than LLM family: for mean turns, framework explains 64% of the between-configuration variance against the LLM's 10%.
Variance decomposition / explained-variance analysis reported for 'mean turns' across configurations (reported percentages: 64% vs 10%).
high null result Same Signal, Different Semantics: A Cross-Framework Behavior... mean turns (average number of turns per task)
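One way to read the 64% versus 10% comparison is as an explained-variance share: the between-group sum of squares for a factor divided by the total sum of squares across configurations. The sketch below is illustrative only. The function name `explained_share`, the toy numbers, and the one-factor-at-a-time definition are assumptions, and the paper's actual decomposition may differ.

```python
from collections import defaultdict


def explained_share(values, labels):
    """Share of the total sum of squares captured by the group means of `labels`."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    # Between-group sum of squares: group size times squared deviation of the group mean.
    between_ss = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    return between_ss / total_ss


# Hypothetical per-configuration mean turns (NOT data from the paper).
mean_turns = [10.0, 12.0, 30.0, 32.0]
framework = ["A", "A", "B", "B"]   # hypothetical framework labels
llm = ["x", "y", "x", "y"]         # hypothetical LLM labels

framework_share = explained_share(mean_turns, framework)
llm_share = explained_share(mean_turns, llm)
print(f"framework: {framework_share:.3f}, LLM: {llm_share:.3f}")
```

In this toy design the two factors are balanced, so the two shares add up to one; with unbalanced real data they generally would not.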
The analysis separates framework effects from LLM effects by holding each layer fixed in turn and measures one behavior–outcome effect per configuration to examine agreement across configurations.
Methods description in the paper: experimental design holding LLM or framework fixed to disentangle effects.
high null result Same Signal, Different Semantics: A Cross-Framework Behavior... behavior–outcome effects per configuration (methodological approach)
This study analyzes 64,380 SWE-bench runs from 126 agent configurations spanning 43 frameworks, where each configuration pairs an LLM with a framework supplying tools and workflow.
Dataset and experimental design reported in the paper: 64,380 runs; 126 configurations; 43 frameworks.
high null result Same Signal, Different Semantics: A Cross-Framework Behavior... number of benchmark runs / experimental scale
The paper's contribution is an evaluation and benchmark paradigm (discipline stability / trace-based evaluation), not a new optimizer or a universal claim about MARL.
Author statement in the abstract/summary clarifying the contribution is methodological (evaluation/benchmark) rather than proposing a new optimizer or making universal claims about multi-agent RL.
high null result When Outcome Looks Right But Discipline Fails: Trace-Based E... scope of contribution (evaluation paradigm vs. optimizer/new universal claim)
These factors evolve over time, have inter-dependencies across multiple resource dimensions, and generally do not lend themselves to closed-form analysis.
Methodological observation motivating simulation/sequence-based evaluation; asserted in the paper's rationale.
high null result Designing Datacenter Power Delivery Hierarchies for the AI E... tractability of closed-form analysis for power delivery design
Higher sectoral digitalization potential (telework feasibility and digital intensity) does not significantly affect aggregate employment levels.
Difference-in-differences (DiD) analysis using the COVID-19 shock as a quasi-natural experiment on a quarterly panel for 27 EU Member States (2018–2024), N = 36,685; reported DiD coefficient = 0.06, p ≈ 0.98.
high null result Digital transformation and labor market indicators in the EU... aggregate employment levels
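For orientation, the canonical two-group, two-period difference-in-differences estimate is the change in the treated group minus the change in the comparison group. The sketch below uses made-up employment rates and a hypothetical `did_estimate` helper; the paper itself works with a quarterly panel, so this shows only the basic logic, not its specification.

```python
def mean(xs):
    return sum(xs) / len(xs)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 DiD: (treated change over time) minus (control change over time)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical employment rates in percent (NOT figures from the paper).
high_digital_pre, high_digital_post = [70.0, 72.0], [69.0, 71.0]
low_digital_pre, low_digital_post = [65.0, 67.0], [64.0, 66.0]

effect = did_estimate(
    high_digital_pre, high_digital_post, low_digital_pre, low_digital_post
)
print(effect)
```

Both groups fall by the same amount in this toy data, so the estimate is zero, which is the shape a null result takes in this setup.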
The convergence properties of the explore-then-exploit pricing pipeline can be characterized via a fluid-limit ordinary differential equation (ODE) analysis.
Analytical method used in the paper: fluid-limit ODE analysis applied to the multi-firm explore-then-exploit model to study convergence.
high null result Misspecified Explore-then-Exploit Leads to Supra-Competitive... convergence behavior of prices under the pricing pipeline
Firms following an explore-then-exploit pipeline randomize prices during an initial exploration phase, then estimate demand from their own historical data and set prices myopically thereafter; the estimation relies on a misspecified, monopoly-style model that omits competitors' prices.
Model specification and assumptions described in the paper (methodological setup).
high null result Misspecified Explore-then-Exploit Leads to Supra-Competitive... pricing algorithm structure (exploration then myopic exploitation based on missp...
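The structure of such a pipeline can be sketched for one firm in a duopoly. Everything specific here is an assumption made for illustration: the linear demand form, the parameter values `A`, `B`, `C`, zero marginal cost, and the uniform exploration range. The sketch shows only the three steps (randomize, fit a model that omits the rival's price, price myopically) and does not attempt to reproduce the paper's convergence or price-level results.

```python
import random


def ols(xs, ys):
    """One-regressor least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed true demand for firm 1: q1 = A - B * p1 + C * p2 (hypothetical values).
A, B, C = 10.0, 2.0, 0.5
rng = random.Random(0)

# Step 1, exploration: both firms draw prices independently at random.
T = 2000
p1 = [rng.uniform(1.0, 3.0) for _ in range(T)]
p2 = [rng.uniform(1.0, 3.0) for _ in range(T)]
q1 = [A - B * own + C * rival for own, rival in zip(p1, p2)]

# Step 2, misspecified estimation: regress own demand on own price only.
a_hat, slope_hat = ols(p1, q1)
b_hat = -slope_hat

# Step 3, myopic exploitation: maximize p * (a_hat - b_hat * p), so p = a_hat / (2 * b_hat).
exploit_price = a_hat / (2 * b_hat)
print(round(a_hat, 2), round(b_hat, 2), round(exploit_price, 2))
```

Because the rival's price is omitted, its average effect during exploration is absorbed into the fitted intercept `a_hat`, which is where the misspecification enters the chosen price.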
The paper constructs estimators for the own-adoption, spillover, and total effects and an inference procedure that allows for spatial dependence.
Presentation of concrete estimators and an inference procedure in the paper; the inference approach explicitly accommodates spatial dependence (methodological contribution).
high null result Identification and Estimation of Staggered Difference-in-Dif... estimator definitions and inference procedure robustness to spatial dependence
Spillover effects are learned from never-treated units and evaluated for treated cohorts under the exposure distribution they face.
Methodological procedure in the paper: estimation of spillover effects using never-treated units as the source of variation, then applying those estimates to treated cohorts based on their observed exposure distributions.
high null result Identification and Estimation of Staggered Difference-in-Dif... spillover effect estimation strategy (learning from never-treated units)
Identification uses a prespecified summary of spillover exposure and parallel trends comparisons among units with the same exposure at the baseline and target dates.
Identification strategy articulated in the paper: assumption of a prespecified exposure summary and use of parallel trends comparisons conditional on equal exposure profiles at baseline and event dates.
high null result Identification and Estimation of Staggered Difference-in-Dif... identification of causal effects under specified exposure summaries and parallel...
For each treated cohort and event time, the framework separates the effect of own adoption, the spillover effect generated by other adopters, and the total effect under the realized rollout.
Analytical decomposition provided in the paper that defines separate estimands for (i) own-adoption effect, (ii) spillover effect from other adopters, and (iii) total realized effect for cohorts and event times.
high null result Identification and Estimation of Staggered Difference-in-Dif... decomposition of treatment effects into own adoption, spillover, and total effec...
The paper develops a difference-in-differences framework for staggered policy adoption when units can be affected by other units' adoption.
Theoretical development in the paper: presentation of a DID framework that explicitly allows units to be affected by other units' adoption (methodological derivation and formal description).
high null result Identification and Estimation of Staggered Difference-in-Dif... availability of an econometric framework for staggered adoption with spillovers
IIQ is positioned as a deployment-oriented measurement framework: a formal proposal for tracking AI embedding in workflows, not a direct measure of model capability or a substitute for causal productivity evaluation.
Explicit positioning statement in paper: authors state scope and limits of IIQ as deployment/usage metric rather than capability or causal productivity estimator (conceptual/positioning).
high null result Intelligence Impact Quotient (IIQ): A Framework for Measurin... scope/limitations (not measuring model capability or causal productivity)
Sources were selected purposively through explicit inclusion and exclusion criteria tied to conceptual relevance, scholarly quality, and direct contribution to framework building; higher-order categories were retained only after iterative comparison across the four literature streams.
Author-reported sampling and analytic procedure for the integrative review.
high null result RegTech-enabled governance of sanctions-safe enterprise ecos... review source selection and analytic procedure
Methodologically, the paper uses a structured integrative review combined with interpretive theory synthesis to connect literature on RegTech, sanctions compliance, institutional voids, supply chain governance, and algorithmic accountability.
Explicit methodological description in the paper (authors' stated approach).
high null result RegTech-enabled governance of sanctions-safe enterprise ecos... methodological approach used
Existing studies on regulatory technology mainly present it as a firm-level compliance tool, giving little attention to its role in shaping coordination across wider enterprise ecosystems in post-conflict and sanctions-affected settings.
Review finding based on purposive selection and comparison of literature on RegTech and related fields (method: structured integrative review and interpretive theory synthesis).
high null result RegTech-enabled governance of sanctions-safe enterprise ecos... scope of RegTech literature (firm-level focus vs ecosystem coordination)
The study uses World Bank Enterprise Survey firm-level data from 2007 to 2024 and employs feasible generalized least squares (FGLS), robust ordinary least squares (OLS), and high-dimensional fixed effects (HDFE) linear regression techniques.
Direct methodological statement in the paper's abstract/summary. This is a descriptive factual claim about data and methods.
high null result Estimation of Firm Labour Productivity and Sales Growth from... data source and econometric methods
Neither survey nor transcript-based measures of participation equity improved under LLM facilitation (an "illusion of inclusion").
Quantitative survey measures and transcript-based analyses of participation equity (e.g., measures of turn-taking, speaking/typing share) showed no improvement in equity metrics for facilitated conditions compared to controls across the experiments.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... participation equity (survey and transcript-derived measures of participation ba...
Across both studies, LLM facilitation did not significantly improve group consensus.
Experimental comparison across the two studies (total N=879) measuring agreement/consensus metrics for groups randomized to LLM facilitation versus other facilitators or no facilitation; reported null effect on consensus.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... group consensus (agreement level among group members)
Study 2 (N=675) compares facilitator strategies against a no-facilitation baseline.
Study 2 comprised N=675 participants (groups of three) randomized to different LLM facilitation strategies and a no-facilitation control.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... comparison of facilitation strategies vs no-facilitation
Study 1 (N=204) compares three frontier LLMs as facilitators.
Study 1 comprised N=204 participants (groups of three) randomized to facilitator conditions comparing three frontier language models.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... comparison of facilitator LLM models
We present two empirical studies (N=879) of real-time, text-based group deliberation in an incentive-compatible charity allocation task with real financial stakes ($7,200 USD).
Two online experiments involving real-time, text-based group deliberation. Total participants N=879 in groups of three; total monetary stakes for the charity allocation task equal $7,200 USD.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... experiment setup (incentive-compatible charity allocation, total stakes $7,200 U...
We re-recruited 530 participants from 52 countries two years after they gave their preferences in the PRISM dataset to evaluate personalised and non-personalised language models in blinded multi-turn conversations (large-scale within-subject experiment).
Study methodology reported in paper: within-subject experiment, re-recruitment of 530 participants from 52 countries, blinded multi-turn conversations comparing models.
high null result PRISM-X: Experiments on Personalised Fine-Tuning with Human ... experimental sample composition and study design
Personalisation is a standard feature of conversational AI systems used by millions; yet, the efficacy of personalisation methods is often evaluated in academic research using simulated users rather than real people.
Authors' literature and field observation stated in introduction; contextual claim about common practice in academic evaluations (no numeric experiment reported for this claim).
high null result PRISM-X: Experiments on Personalised Fine-Tuning with Human ... prevalence of simulation-based evaluation in academic research
Using an agent-based simulation of a multi-SKU convenience store environment, the study evaluates deployment efficiency, inventory responsiveness, and managerial cognitive reallocation.
Methodological claim: the paper reports an agent-based simulation experiment in a multi-SKU convenience store context; details such as number of simulations, parameter settings, or statistical results are not provided in the excerpt.
high null result From Configuration to Cognition: A Self-Configuring Agentic ... deployment efficiency; inventory responsiveness; managerial cognitive reallocati...
The dominant paradigm for AI agents is an "on-the-fly" loop in which agents synthesize plans and execute actions within seconds or minutes in response to user prompts.
Statement in paper presenting a characterization of current AI agent design; conceptual/observational claim with no empirical data or sample reported.
Persistent data gaps—especially concerning worker-level outcomes, informal labor, and non-Anglophone markets—warrant urgent research investment.
Authors' assessment based on scope of included studies and acknowledged limitations in observation windows and geographic/labor-form coverage.
high null result Creation, validation, obsolescence: observed evidence of AI-... availability of data on worker-level outcomes, informal labor, and non-Anglophon...
Following PRISMA 2020 guidelines, we systematically searched six academic databases (Scopus, Web of Science, EconLit, SSRN, IEEE Xplore, Google Scholar) for empirical studies documenting observed—not predicted—labor market changes since 2020; from 1,847 initial records, 94 studies meeting inclusion criteria were retained for qualitative synthesis and 42 for quantitative data extraction.
Methods: systematic literature search following PRISMA 2020 across six named databases; initial records = 1,847; retained = 94 for qualitative synthesis, 42 for quantitative extraction.
high null result Creation, validation, obsolescence: observed evidence of AI-... systematic_review_search_and_screen_counts (initial records; studies retained)
We thematically analysed twelve semi-structured interviews with SME owners and managers conducted in early 2025 using Atlas.ti, yielding 19 codes grouped into six categories.
Methods statement in the paper describing qualitative sample and analysis procedures.
high null result Artificial Intelligence, Social Capital, and Sustainable Emp... qualitative_analysis_results (codes/categories)
We examine the interplay between AI adoption, social capital formation, workforce dynamics, and sustainable development in Eastern Macedonia and Thrace (EMT), one of the EU's least developed regions.
Study context and scope as stated in the paper; empirical work conducted in EMT.
high null result Artificial Intelligence, Social Capital, and Sustainable Emp... regional_AI_adoption_and_social_capital_interplay
Research has concentrated on advanced urban economies, leaving underexplored the implications of AI for peripheral small and medium-sized enterprises (SMEs) operating under weak human capital, thin digital infrastructure, and constrained social capital.
Statement in the paper contrasting existing research focus (advanced urban economies) with a lack of attention to peripheral SMEs; no empirical sample size for this bibliographic claim reported in the excerpt.
high null result Artificial Intelligence, Social Capital, and Sustainable Emp... research_coverage_of_peripheral_SMEs
Once functional deployment and operational investment are controlled for, worker-task use is not associated with employment declines.
Multivariate regression results reported in the paper using BTOS AI supplement data showing the coefficient on worker-task use becomes statistically indistinguishable from zero after controlling for functional deployment and operational investment; exact model details and sample size not provided in excerpt.
high null result The Microstructure of AI Diffusion: Evidence from Firms, Bus... association between worker-task AI use and employment change conditional on othe...
This study conducts an empirical analysis using data on industrial robots from the International Federation of Robotics (IFR) and panel data from 14 sub-sectors of China's manufacturing industry.
Statement in paper describing data and methods: use of IFR robot data combined with panel data covering 14 manufacturing sub-sectors (panel regression framework implied).
high null result Research on the impact of industrial robot application on th... data and sample composition (use of IFR robot data and panel of 14 sub-sectors)
Return forecasts are translated into long–short portfolios to assess economic performance.
Stated evaluation approach: conversion of predicted returns into long–short portfolios for economic/performance assessment.
high null result Optimizing stock market prediction and stock trading strateg... economic performance of long–short portfolios constructed from forecasts
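A common way to turn return forecasts into a long-short portfolio is to rank assets by forecast, buy the top slice, and short the bottom slice. The sketch below assumes equal weighting and a hypothetical `fraction` cutoff; the paper's actual portfolio construction rules are not given in the excerpt, and the numbers are invented.

```python
def long_short_return(forecasts, realized, fraction=0.2):
    """Equal-weight return of (top `fraction` by forecast) minus (bottom `fraction`)."""
    if len(forecasts) != len(realized):
        raise ValueError("forecasts and realized must have the same length")
    k = max(1, int(len(forecasts) * fraction))
    # Indices sorted from lowest to highest forecast.
    order = sorted(range(len(forecasts)), key=lambda i: forecasts[i])
    short_leg, long_leg = order[:k], order[-k:]
    long_ret = sum(realized[i] for i in long_leg) / k
    short_ret = sum(realized[i] for i in short_leg) / k
    return long_ret - short_ret


# Hypothetical one-period forecasts and realized returns for five stocks.
forecasts = [0.03, -0.01, 0.00, 0.05, -0.04]
realized = [0.02, -0.02, 0.01, 0.04, -0.03]

spread = long_short_return(forecasts, realized, fraction=0.2)
print(round(spread, 4))
```

With five stocks and a 20% cutoff, each leg holds one stock, so the spread is the realized return of the highest-forecast stock minus that of the lowest-forecast stock.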
The analysis is based on 30 market, liquidity, valuation, profitability, technical and risk factors and compares linear models, tree-based machine learning and deep learning architectures (including GRU, LSTM and Transformer) within a rolling-window forecasting framework.
Description of empirical design: use of 30 factor variables and explicit listing of model families (linear, tree-based, GRU, LSTM, Transformer) and use of a rolling-window forecasting setup.
high null result Optimizing stock market prediction and stock trading strateg... model comparison across 30 factors within rolling-window forecasting
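A rolling-window setup refits each model on a fixed-length block of past periods and forecasts the period(s) immediately after it, then slides forward. The generator below is a minimal sketch; the window length, test length, and step are hypothetical parameters, not values reported in the paper.

```python
def rolling_windows(n_periods, train_size, test_size=1, step=1):
    """Yield (train_indices, test_indices) for a fixed-length rolling window."""
    start = 0
    while start + train_size + test_size <= n_periods:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


# Six periods, train on three, forecast the next one, slide by one period.
splits = [(list(train), list(test)) for train, test in rolling_windows(6, train_size=3)]
for train, test in splits:
    print(train, "->", test)
```

Each test index lies strictly after its training indices, which keeps the evaluation free of look-ahead.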
We introduce the weighted evaluation index (WEI), a finance-specific performance metric that integrates prediction accuracy with market adaptability.
Methodological contribution stated in the paper: introduction of a new performance metric called WEI described as integrating accuracy and market adaptability.
high null result Optimizing stock market prediction and stock trading strateg... performance evaluation metric (WEI)
We introduce the Diff-RMSE method for nonlinear factor identification.
Methodological contribution stated in the paper: introduction of a new method named 'Diff-RMSE' for identifying nonlinear factors.
high null result Optimizing stock market prediction and stock trading strateg... method for nonlinear factor identification
The study uses A-share market data from 2013 to 2024 with equity and firm-characteristic data available from databases such as RESSET and CSMAR for more than 5,000 listed firms.
Empirical dataset description in the paper: time period 2013–2024, sources named (RESSET, CSMAR), and statement 'more than 5,000 listed firms'.
high null result Optimizing stock market prediction and stock trading strateg... dataset coverage (time span and number of firms)
Future research should test these findings across different institutional contexts, particularly European economies.
Paper's stated limitations and suggestions for future research.
high null result The Inverted-U Relationship Between AI and Corporate Innovat... recommendation for external validation across contexts
The analysis employs fixed-effects models, U-tests, bootstrap mediation, and patent text similarity analysis.
Methods statement listing econometric and text-analytic techniques used in the paper.
The study uses a sample of 25,204 firm-year observations from Chinese A-share manufacturing companies (2010–2023).
Paper statement of sample and period; descriptive sample construction (firm-year observations = 25,204).
The empirical analysis is based on Chinese A-share listed firms observed from 2012 to 2024 and uses a difference-in-differences (DID) identification strategy.
Study description in the paper's methods/abstract specifying sample period (2012–2024), population (Chinese A-share listed firms), and methodology (DID).
high null result Government-Guided Funds and Corporate Digital–Intelligent Tr... study design / data sample
We validate the framework empirically on five benchmarks (MATH, MMLU, TriviaQA, SimpleQA, LiveCodeBench) across eight models from five providers.
Empirical experiments reported in the paper using five named datasets and eight models from five providers (experimental evaluation / benchmarking).
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... empirical validation of theoretical framework via experiments on benchmarks
For k-model cascades, first-order conditions imply a single shadow price that equalizes marginal quality-per-cost across stage boundaries.
Analytical derivation of first-order conditions for k-stage cascades within the decision-theoretic constrained-optimization framework presented in the paper.
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... marginal quality-per-cost equality across cascade stages (first-order optimality...
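The shadow-price statement can be written out as a standard Lagrangian condition. The notation below is assumed for illustration and is not taken from the paper: thresholds \tau_1, ..., \tau_{k-1} mark the stage boundaries, Q is expected quality, C is expected cost, B is the budget, and \lambda is the multiplier on the budget constraint. The sketch also assumes an interior optimum with a binding budget and nonzero marginal cost at each boundary.

```latex
\[
\begin{aligned}
&\max_{\tau_1,\dots,\tau_{k-1}} \; Q(\tau)
  \quad \text{subject to} \quad C(\tau) \le B, \\
&\mathcal{L}(\tau,\lambda) = Q(\tau) - \lambda \bigl( C(\tau) - B \bigr), \\
&\frac{\partial Q}{\partial \tau_j} = \lambda \, \frac{\partial C}{\partial \tau_j}
  \;\Longrightarrow\;
  \frac{\partial Q / \partial \tau_j}{\partial C / \partial \tau_j} = \lambda
  \qquad \text{for } j = 1, \dots, k-1 .
\end{aligned}
\]
```

Read this way, the same \lambda appears at every boundary j, so the marginal quality gained per unit of marginal cost is equal across all stage boundaries.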
Given a pool of k models, the frontier achievable by deterministic two-model threshold cascades is the pointwise envelope over the C(k, 2) = k(k-1)/2 pairwise cascades, with switching points where the optimal pair changes.
Theoretical characterization/derivation in the paper (mathematical result about deterministic two-model threshold cascades and combinatorial envelope over pairwise cascades).
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... achievable cost-quality frontier for a k-model pool under deterministic two-mode...
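The envelope construction can be sketched numerically: enumerate every pair of models, evaluate each pair's cost-quality frontier on a cost grid, and keep the best pair at each cost. The model names, their (cost, quality) values, and especially the linear-interpolation frontier in `make_pair_frontier` are placeholders assumed for illustration. A real threshold cascade's frontier would come from the confidence distribution, which the excerpt does not specify.

```python
from itertools import combinations

# Hypothetical models: name -> (cost per query, expected quality), ordered by cost.
models = {"a": (1.0, 0.5), "b": (2.0, 0.8), "c": (4.0, 0.9)}


def make_pair_frontier(cheap, strong):
    """Placeholder frontier for one pair: interpolate between the two models."""
    (c1, q1), (c2, q2) = cheap, strong

    def frontier(cost):
        if cost < c1:
            return float("-inf")  # budget too small to run even the cheap model
        if cost >= c2:
            return q2             # budget covers always using the strong model
        return q1 + (q2 - q1) * (cost - c1) / (c2 - c1)

    return frontier


# One frontier per unordered pair: C(k, 2) of them for k models.
pair_frontiers = {
    (m1, m2): make_pair_frontier(models[m1], models[m2])
    for m1, m2 in combinations(models, 2)
}


def envelope(frontiers, costs):
    """Pointwise maximum over pair frontiers; returns (quality, pair) per cost."""
    result = []
    for cost in costs:
        pair, quality = max(
            ((p, f(cost)) for p, f in frontiers.items()), key=lambda item: item[1]
        )
        result.append((quality, pair))
    return result


costs = [1.5, 2.5, 3.0]
env = envelope(pair_frontiers, costs)
best_pairs = [pair for _, pair in env]
# Grid costs at which the optimal pair differs from the previous grid point.
switches = [costs[i] for i in range(1, len(env)) if env[i][1] != env[i - 1][1]]
print(best_pairs, switches)
```

On this toy grid the best pair changes from (a, b) to (b, c) as the budget grows, which is the kind of switching point the claim describes.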
Reciprocal shadow prices link the budget-constrained and quality-constrained formulations of the cascade optimization.
Analytical derivation in the decision-theoretic framework using constrained optimization and duality presented in the paper.
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... relationship between budget- and quality-constrained optimization formulations (...
For a two-model cascade, the cost-quality frontier is piecewise concave on decreasing-benefit regions of the confidence support.
Theoretical development in a decision-theoretic framework using constrained optimization and duality; proven properties for the two-model case reported in the paper (analytical result).
high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... shape of the cost-quality frontier (concavity properties) for two-model cascades
In the U.S., no single 'AI Act' has passed (as of 2026).
Stated in the paper as a factual legal/policy status; this is verifiable via legislative records and is presented without an underlying sample (paper cites status as of 2026).
high null result Emerging AI Trends passage of a comprehensive federal 'AI Act' in the U.S.