The Commonplace

Evidence (14055 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
This study analyzes 64,380 SWE-bench runs from 126 agent configurations spanning 43 frameworks, where each configuration pairs an LLM with a framework supplying tools and workflow.
Dataset and experimental design reported in the paper: 64,380 runs; 126 configurations; 43 frameworks.
high null result Same Signal, Different Semantics: A Cross-Framework Behavior... number of benchmark runs / experimental scale
The paper's contribution is an evaluation and benchmark paradigm (discipline stability / trace-based evaluation), not a new optimizer or a universal claim about MARL.
Author statement in the abstract/summary clarifying the contribution is methodological (evaluation/benchmark) rather than proposing a new optimizer or making universal claims about multi-agent RL.
high null result When Outcome Looks Right But Discipline Fails: Trace-Based E... scope of contribution (evaluation paradigm vs. optimizer/new universal claim)
The formal semantics and proof-checked admission model are specified and under active development, with evaluation of the verified core reserved for future work.
Author statement in the paper about the current development status and that evaluation of the verified core is deferred to future work.
high null result GraphFlow: An Architecture for Formally Verifiable Visual Wo... development status and lack of current evaluation
Reward is non-positive in the CybORG CAGE-2 environment, so all configurations operate in a failure-mitigation mode.
Environment specification reported in the paper (CybORG CAGE-2 modeled as a POMDP with non-positive reward structure).
high null result Context, Reasoning, and Hierarchy: A Cost-Performance Study ... sign and interpretation of reward
The evaluation spanned five model families, six models, and twelve configurations, totaling 3,475 episodes with token-level cost accounting.
Methods description in the paper reporting the experimental design and sample counts.
high null result Context, Reasoning, and Hierarchy: A Cost-Performance Study ... study scope (models, configurations, episodes)
Skills can be mapped into three categories: those AI is absorbing, those needed to work alongside AI today, and those that make humans irreplaceable tomorrow.
Conceptual taxonomy offered in the chapter, based on labour market data and workplace evidence; presented as an analytical framework rather than a quantified finding.
high null result 7. AI and the Future of Work classification of skills relative to AI impact
Fear and hype about technological transitions are temporary.
One of five lessons drawn from historical analogy and labour market history as presented in the chapter.
high null result 7. AI and the Future of Work duration of public fear/hype following technological change
Virtually every job is being touched by AI.
Stated in chapter summary; claimed on the basis of labour market data and emerging workplace evidence (no numeric sample given in excerpt).
high null result 7. AI and the Future of Work incidence of AI affecting jobs
Only 9% of jobs are fully automatable.
Reported directly in chapter; based on labour market data (specific data source and sample size not stated in the excerpt).
high null result 7. AI and the Future of Work share of jobs fully automatable
AI automates tasks, not jobs.
Conceptual argument in chapter drawing on labour market data and historical analogy; presented as a framing claim rather than a specific empirical estimate.
high null result 7. AI and the Future of Work unit of automation (tasks vs jobs)
These factors evolve over time, have inter-dependencies across multiple resource dimensions, and generally do not lend themselves to closed-form analysis.
Methodological observation motivating simulation/sequence-based evaluation; asserted in the paper's rationale.
high null result Designing Datacenter Power Delivery Hierarchies for the AI E... tractability of closed-form analysis for power delivery design
Higher sectoral digitalization potential (telework feasibility and digital intensity) does not significantly affect aggregate employment levels.
Difference-in-differences (DiD) analysis using the COVID-19 shock as a quasi-natural experiment on a quarterly panel for 27 EU Member States (2018–2024), N = 36,685; reported DiD coefficient = 0.06, p ≈ 0.98.
high null result Digital transformation and labor market indicators in the EU... aggregate employment levels
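A minimal sketch of the difference-in-differences logic behind the estimate above, reduced to the simplest two-group, two-period case. The paper itself uses a quarterly panel of 27 EU Member States; the function name `did_estimate`, the row layout, and the toy numbers below are illustrative assumptions, not the authors' specification or data.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences from (treated, post, outcome) rows."""
    # Collect outcomes into the four group-by-period cells.
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in rows:
        cells[(bool(treated), bool(post))].append(outcome)
    avg = {key: mean(values) for key, values in cells.items()}
    # Change over time in each group, then the difference between those changes.
    change_treated = avg[(True, True)] - avg[(True, False)]
    change_control = avg[(False, True)] - avg[(False, False)]
    return change_treated - change_control


# Hypothetical toy data (not from the paper): "treated" stands in for sectors
# with high digitalization potential, "post" for quarters after the shock.
toy_rows = [
    (True, False, 10.0), (True, False, 12.0),
    (True, True, 9.0), (True, True, 11.0),
    (False, False, 20.0), (False, False, 22.0),
    (False, True, 19.0), (False, True, 21.0),
]
print(did_estimate(toy_rows))  # prints 0.0
```

In this toy case both groups fall by one unit after the shock, so the estimate is zero, which is the pattern a null DiD result describes. A panel regression version would add fixed effects and standard errors, which this sketch omits.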
The study used a structured questionnaire (five-point Likert) administered to employees in AI-enabled organizations across various sectors and analyzed the data using SPSS (descriptive statistics, reliability analysis, correlation analysis, regression analysis).
Methods section summary provided in the paper (survey instrument description and analytical techniques).
high null result Opportunities and Challenges of Human-AI Collaboration in W... methodological approach / data collection and analysis procedures
The convergence properties of the explore-then-exploit pricing pipeline can be characterized via a fluid-limit ordinary differential equation (ODE) analysis.
Analytical method used in the paper: fluid-limit ODE analysis applied to the multi-firm explore-then-exploit model to study convergence.
high null result Misspecified Explore-then-Exploit Leads to Supra-Competitive... convergence behavior of prices under the pricing pipeline
Firms following an explore-then-exploit pipeline randomize prices during an initial exploration phase, then estimate demand from their own historical data and set prices myopically thereafter; the estimation relies on a misspecified, monopoly-style model that omits competitors' prices.
Model specification and assumptions described in the paper (methodological setup).
high null result Misspecified Explore-then-Exploit Leads to Supra-Competitive... pricing algorithm structure (exploration then myopic exploitation based on missp...
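The pipeline described above can be sketched in a few lines. This is an illustrative simulation under stated assumptions, not the paper's model: two symmetric firms, linear demand with hypothetical parameters `a`, `b`, `c`, zero marginal cost, uniform random exploration prices, and a single estimation step after exploration. The names `ols_fit` and `explore_then_exploit` are hypothetical.

```python
import random


def ols_fit(xs, ys):
    """Least-squares intercept and slope for y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def explore_then_exploit(a=10.0, b=2.0, c=0.5, rounds=2000,
                         low=1.0, high=4.0, seed=0):
    """Simulate exploration, misspecified estimation, and myopic pricing."""
    rng = random.Random(seed)
    # Exploration phase: each firm draws its prices at random.
    p1 = [rng.uniform(low, high) for _ in range(rounds)]
    p2 = [rng.uniform(low, high) for _ in range(rounds)]
    # Assumed true demand depends on both the own and the rival price.
    d1 = [a - b * own + c * rival for own, rival in zip(p1, p2)]
    d2 = [a - b * own + c * rival for own, rival in zip(p2, p1)]
    # Misspecified estimation: each firm regresses its demand on its own
    # price only, omitting the competitor's price.
    fits = [ols_fit(p1, d1), ols_fit(p2, d2)]
    # Myopic exploitation: maximise p * (alpha + beta * p) at zero cost,
    # which gives p = -alpha / (2 * beta) when beta is negative.
    prices = [-alpha / (2.0 * beta) for alpha, beta in fits]
    return fits, prices


fits, prices = explore_then_exploit()
for (alpha, beta), price in zip(fits, prices):
    print(f"alpha={alpha:.3f} beta={beta:.3f} price={price:.3f}")
```

Because the rival's price is omitted, its average effect is absorbed into each firm's estimated intercept. The sketch only shows that mechanism; it makes no claim about the equilibrium prices the paper derives.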
We evaluate PRISM across 35 enterprise conversational agents over a three-week deployment period on the Yellow.ai V3 platform.
Statement in abstract: evaluation across 35 agents over a three-week deployment on Yellow.ai V3 platform (empirical deployment described).
high null result PRISM: Prompt Reliability via Iterative Simulation and Monit... deployment evaluation sample and duration
A four-dimensional Flexibility Index is developed to assess reallocation authority, forecast cycles, AI integration, and transparency.
Methods section: construction of an index with four dimensions (reallocation authority, forecast cycles, AI integration, transparency).
high null result Budgeting for Agility: A Cross-Sectoral Analysis of Fiscal F... budget flexibility (measured via Flexibility Index)
The analysis draws on Form 10-K filings from Microsoft, Johnson & Johnson, Procter & Gamble, and ExxonMobil (2019–2023), alongside public sector data from the Open Budget Survey 2023, the OECD Budget Practices Database, and U.S. GAO oversight reports.
Methods/data section listing data sources and firm sample (four named firms, 2019–2023) and public datasets.
high null result Budgeting for Agility: A Cross-Sectoral Analysis of Fiscal F... data sources and sample composition
The study investigates the non-linear impact of AI on economic growth in 19 G20 countries (2005–2023) using the Generalized Method of Moments (GMM) with both linear and quadratic models.
Methodological description provided in the paper: panel dataset covering 19 G20 countries over 2005–2023 and estimation via GMM with linear and quadratic specifications.
The paper constructs estimators for the own-adoption, spillover, and total effects and an inference procedure that allows for spatial dependence.
Presentation of concrete estimators and an inference procedure in the paper; the inference approach explicitly accommodates spatial dependence (methodological contribution).
high null result Identification and Estimation of Staggered Difference-in-Dif... estimator definitions and inference procedure robustness to spatial dependence
Spillover effects are learned from never-treated units and evaluated for treated cohorts under the exposure distribution they face.
Methodological procedure in the paper: estimation of spillover effects using never-treated units as the source of variation, then applying those estimates to treated cohorts based on their observed exposure distributions.
high null result Identification and Estimation of Staggered Difference-in-Dif... spillover effect estimation strategy (learning from never-treated units)
Identification uses a prespecified summary of spillover exposure and parallel trends comparisons among units with the same exposure at the baseline and target dates.
Identification strategy articulated in the paper: assumption of a prespecified exposure summary and use of parallel trends comparisons conditional on equal exposure profiles at baseline and event dates.
high null result Identification and Estimation of Staggered Difference-in-Dif... identification of causal effects under specified exposure summaries and parallel...
For each treated cohort and event time, the framework separates the effect of own adoption, the spillover effect generated by other adopters, and the total effect under the realized rollout.
Analytical decomposition provided in the paper that defines separate estimands for (i) own-adoption effect, (ii) spillover effect from other adopters, and (iii) total realized effect for cohorts and event times.
high null result Identification and Estimation of Staggered Difference-in-Dif... decomposition of treatment effects into own adoption, spillover, and total effec...
The paper develops a difference-in-differences framework for staggered policy adoption when units can be affected by other units' adoption.
Theoretical development in the paper: presentation of a DiD framework that explicitly allows units to be affected by other units' adoption (methodological derivation and formal description).
high null result Identification and Estimation of Staggered Difference-in-Dif... availability of an econometric framework for staggered adoption with spillovers
IIQ is positioned as a deployment-oriented measurement framework: a formal proposal for tracking AI embedding in workflows, not a direct measure of model capability or a substitute for causal productivity evaluation.
Explicit positioning statement in paper: authors state scope and limits of IIQ as deployment/usage metric rather than capability or causal productivity estimator (conceptual/positioning).
high null result Intelligence Impact Quotient (IIQ): A Framework for Measurin... scope/limitations (not measuring model capability or causal productivity)
Sources were selected purposively through explicit inclusion and exclusion criteria tied to conceptual relevance, scholarly quality, and direct contribution to framework building; higher-order categories were retained only after iterative comparison across the four literature streams.
Author-reported sampling and analytic procedure for the integrative review.
high null result RegTech-enabled governance of sanctions-safe enterprise ecos... review source selection and analytic procedure
Methodologically, the paper uses a structured integrative review combined with interpretive theory synthesis to connect literature on RegTech, sanctions compliance, institutional voids, supply chain governance, and algorithmic accountability.
Explicit methodological description in the paper (authors' stated approach).
high null result RegTech-enabled governance of sanctions-safe enterprise ecos... methodological approach used
Existing studies on regulatory technology mainly present it as a firm-level compliance tool, giving little attention to its role in shaping coordination across wider enterprise ecosystems in post-conflict and sanctions-affected settings.
Review finding based on purposive selection and comparison of literature on RegTech and related fields (method: structured integrative review and interpretive theory synthesis).
high null result RegTech-enabled governance of sanctions-safe enterprise ecos... scope of RegTech literature (firm-level focus vs ecosystem coordination)
The study uses World Bank Enterprise Survey firm-level data from 2007 to 2024 and employs feasible generalized least squares (FGLS), robust ordinary least squares (OLS), and high-dimensional fixed effects (HDFE) linear regression techniques.
Direct methodological statement in the paper's abstract/summary. This is a descriptive factual claim about data and methods.
high null result Estimation of Firm Labour Productivity and Sales Growth from... data source and econometric methods
AI deployment has limited effects on retrial rates.
Randomized field experiment reported in the paper; retrial rates (repeat customer contacts) were measured and showed little or no substantive change under AI deployment.
high null result Agentic AI and Human-in-the-Loop Interventions: Field Experi... retrial rates (repeat contact rate)
The findings are based on India-focused samples.
Paper explicitly notes the sample/context is India-focused.
high null result Enhancing Forensic Accounting Practice: A Proactive Risk Man... geographic scope of sample
PRIF was developed and validated using mixed-method design: interviews with 30 risk advisors, case studies, and analysis of 30 forensic reports, with validation via thematic coding, risk metrics, and Delphi panel refinement.
Reported methods in the paper: mixed-method design including 30 risk advisor interviews and analysis of 30 forensic reports; validation methods named (thematic coding, risk metrics, Delphi panel).
high null result Enhancing Forensic Accounting Practice: A Proactive Risk Man... methodological validation and sample description
Five structural characteristics define the Metis AI zone: consequential irreversibility, relational irreducibility, normative open texture, adversarial co-evolution, and accountability anchoring.
Theoretical specification and definition of five characteristics grounded in social science, philosophy, and humanitarian practice; no empirical prevalence or measurement reported.
high null result Metis AI: The Overlooked Middle Zone Between AI-Native and W... defining properties of Metis tasks
The dominant discourse on AI limitations frames the boundary of AI capability as a divide between digital tasks (where AI excels) and physical tasks (where embodiment is required).
Statement in paper framing prevailing discourse; conceptual observation rather than empirical test (literature critique). No sample size reported.
high null result Metis AI: The Overlooked Middle Zone Between AI-Native and W... framing of AI capability boundary
Including the 2020-2021 COVID-19 lockdowns lets the study use the pandemic to separate structural inequalities from transient market shocks.
Design choice: use of data spanning 2016–2021, including pandemic lockdown period, to separate persistent structural disparities from short-term shock effects.
high null result The Broken Shield of European Palliative Care: Evidence from... Ability to distinguish structural inequalities from transient shocks using pre/p...
Neither survey nor transcript-based measures of participation equity improved under LLM facilitation (an "illusion of inclusion").
Quantitative survey measures and transcript-based analyses of participation equity (e.g., measures of turn-taking, speaking/typing share) showed no improvement in equity metrics for facilitated conditions compared to controls across the experiments.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... participation equity (survey and transcript-derived measures of participation ba...
Across both studies, LLM facilitation did not significantly improve group consensus.
Experimental comparison across the two studies (total N=879) measuring agreement/consensus metrics for groups randomized to LLM facilitation versus other facilitators or no facilitation; reported null effect on consensus.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... group consensus (agreement level among group members)
Study 2 (N=675) compares facilitator strategies against a no-facilitation baseline.
Study 2 comprised N=675 participants (groups of three) randomized to different LLM facilitation strategies and a no-facilitation control.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... comparison of facilitation strategies vs no-facilitation
Study 1 (N=204) compares three frontier LLMs as facilitators.
Study 1 comprised N=204 participants (groups of three) randomized to facilitator conditions comparing three frontier language models.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... comparison of facilitator LLM models
We present two empirical studies (N=879) of real-time, text-based group deliberation in an incentive-compatible charity allocation task with real financial stakes ($7,200 USD).
Two online experiments involving real-time, text-based group deliberation. Total participants N=879 in groups of three; total monetary stakes for the charity allocation task equal $7,200 USD.
high null result Real-Time Group Dynamics with LLM Facilitation: Evidence fro... experiment setup (incentive-compatible charity allocation, total stakes $7,200 U...
The study used a qualitative interpretivist research design drawing on semistructured interviews with 28 managers and professionals from 12 organizations across technology, finance and knowledge-intensive service sectors in Europe and Asia, using thematic and interpretive analysis supported by organizational document review.
Methodology statement from the paper (explicit description of sample, sectors, regions and analytic approach).
high null result Reimagining work in the age of intelligent automation: a qua... research design and sample characteristics
AI should be conceptualized as a co-evolving organizational capability rather than a deterministic technology.
Argument developed from interpretive analysis of interview data (n=28), literature engagement and organizational document review.
high null result Reimagining work in the age of intelligent automation: a qua... conceptual framing of AI within organizations
The study develops an emergent framework of AI–human co-adaptation comprising three interrelated dimensions: technological alignment, cognitive calibration and ethical anchoring.
Framework derived from thematic/interpretive analysis of interview data (n=28) and supporting organizational documents.
high null result Reimagining work in the age of intelligent automation: a qua... dimensions of AI–human co-adaptation
The paper introduces the concept of 'augmented work agency' as a multi-level, interpretive form of human agency in algorithmically mediated environments.
Conceptual development within the paper grounded in literature review and qualitative interview data (28 participants) and organizational document review.
high null result Reimagining work in the age of intelligent automation: a qua... agency, control and coordination in algorithmic workplaces
This study used a three-wave lagged survey design with 381 valid matched employees from knowledge-intensive firms in China.
Methods statement in paper reporting study design and sample composition: three-wave lagged survey and 381 valid matched employee responses from knowledge-intensive Chinese firms.
high null result The impact of generative artificial intelligence (GenAI) usa... study sample and design (methodological description)
The overall impact of prompt design on readability remains limited.
Reported results from prompt-dimension experiments indicating that while some prompt elements influence readability, the aggregate effect size of prompt engineering on overall readability was limited.
high null result The Readability Spectrum: Patterns, Issues, and Prompt Effec... overall_effect_of_prompt_design_on_readability
Current LLMs produce code with overall readability comparable to human-written code.
Comparison of readability scores (from the paper's readability model) between LLM-generated code and human-written code across 5,869 scenarios; reported summary conclusion that overall readability is comparable.
high null result The Readability Spectrum: Patterns, Issues, and Prompt Effec... code_readability (overall/readability score)
The analysis proceeded through within-case coding and cross-case pattern matching across five dimensions: intelligence source, AI mechanism, decision domain, economic implication, and boundary condition.
Method section describing coding and analytical procedures applied to the archival corpus across the four cases.
high null result Artificial Intelligence Enabled Competitive Intelligence as ... analytic method (coding and cross-case pattern matching across specified dimensi...
The empirical corpus comprises annual reports, 10-K filings, earnings releases, and official corporate materials published mainly between 2024 and 2026, complemented by recent peer-reviewed literature.
Paper's data description listing document types and time window for archival evidence; number of documents not enumerated.
high null result Artificial Intelligence Enabled Competitive Intelligence as ... composition and timeframe of empirical corpus (document types and years)
The study adopts a qualitative comparative multiple-case design using four theoretically sampled cases: Walmart, Unilever, Sprinklr, and DoubleVerify.
Methodological statement in the paper describing case selection and study design.
high null result Artificial Intelligence Enabled Competitive Intelligence as ... study design and sample (case selection)