The Commonplace

Evidence (13661 claims)

Adoption: 8339 claims
Productivity: 7479 claims
Governance: 6715 claims
Human-AI Collaboration: 6267 claims
Org Design: 4098 claims
Innovation: 3987 claims
Labor Markets: 3488 claims
Skills & Training: 2888 claims
Inequality: 2016 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 740 192 95 871 1945
Governance & Regulation 796 388 185 119 1512
Organizational Efficiency 765 186 123 82 1166
Technology Adoption Rate 610 227 121 95 1061
Research Productivity 409 121 56 331 928
Output Quality 464 174 58 47 743
Decision Quality 318 173 75 42 615
Firm Productivity 432 55 88 20 601
AI Safety & Ethics 214 273 65 33 589
Market Structure 175 165 120 24 489
Task Allocation 206 64 70 31 376
Skill Acquisition 161 57 57 16 291
Innovation Output 201 27 41 18 288
Fiscal & Macroeconomic 130 69 43 26 275
Employment Level 104 50 105 13 274
Consumer Welfare 116 62 42 11 231
Firm Revenue 149 45 26 3 223
Inequality Measures 43 120 49 6 218
Task Completion Time 164 29 8 12 214
Worker Satisfaction 89 60 20 12 181
Error Rate 69 89 9 2 169
Regulatory Compliance 74 67 14 4 159
Training Effectiveness 91 19 13 19 144
Wages & Compensation 77 33 25 6 141
Team Performance 86 17 27 9 140
Automation Exposure 49 50 22 12 136
Developer Productivity 91 17 14 5 128
Job Displacement 12 80 19 1 112
Hiring & Recruitment 51 7 8 3 69
Creative Output 31 16 7 2 57
Skill Obsolescence 5 43 6 1 55
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
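As a reading aid for the matrix above, the following minimal sketch computes each direction's share of the listed counts for three rows copied from the table. In some rows the Total exceeds the sum of the four direction columns (for example 432 + 55 + 88 + 20 = 595 against a Total of 601 for Firm Productivity), so shares are computed against the four listed columns only. That choice, and the names `MATRIX`, `direction_shares`, and `unlisted_count`, are assumptions made for illustration, not part of the dashboard.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
MATRIX = {
    "Firm Productivity": (432, 55, 88, 20, 601),
    "AI Safety & Ethics": (214, 273, 65, 33, 589),
    "Job Displacement": (12, 80, 19, 1, 112),
}
DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(row):
    """Return each direction's share of the four listed direction counts."""
    counts = row[:4]
    listed = sum(counts)
    return {name: count / listed for name, count in zip(DIRECTIONS, counts)}


def unlisted_count(row):
    """Claims counted in Total but not in any of the four listed columns."""
    return row[4] - sum(row[:4])


for outcome, row in MATRIX.items():
    shares = {name: round(value, 3) for name, value in direction_shares(row).items()}
    print(outcome, shares, "unlisted:", unlisted_count(row))
```

Reading shares rather than raw counts makes rows of very different size comparable, for instance a mostly negative row such as Job Displacement against a mostly positive one such as Firm Productivity.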
AI integration significantly enhances customer satisfaction.
Paper reports statistically significant positive association between AI integration and customer satisfaction using System GMM and robustness checks (no details on customer satisfaction measurement or sample size in the supplied text).
AI integration significantly enhances risk-adjusted returns.
Reported empirical results using System GMM with FE and RE robustness checks; the paper states statistical significance but does not provide effect magnitudes in the supplied summary.
AI integration significantly enhances operational efficiency.
Same empirical analysis using System GMM, with FE and RE models for robustness (no sample size or numeric estimates provided in the supplied text).
AI integration significantly enhances return on assets (ROA).
Empirical analysis reported in the paper using System Generalized Method of Moments (System GMM) estimator, with Fixed Effects (FE) and Random Effects (RE) models used as robustness checks. (No sample size or test statistics provided in the text supplied.)
Synthesizing evidence, the paper identifies gaps and opportunities in current responsible AI research: (1) to engage with the diverse range of levers that influence organizations to abandon AI development, and (2) to better support appropriate engagement or disengagement with AI system development.
Synthesis and discussion section combining the taxonomy and empirical case analysis to produce research agenda and recommendations.
high positive To Build or Not to Build? Factors that Lead to Non-Developme... research and practical opportunities to influence AI development decisions
Decisions taken in earlier stages of development shape which systems are ultimately released, representing potential points for intervention to influence AI deployment outcomes.
Conceptual argument supported by the paper's taxonomy and case analyses showing pre-deployment factors that lead to abandonment.
high positive To Build or Not to Build? Factors that Lead to Non-Developme... influence of early-stage development decisions on eventual system release/abando...
Academic responsible AI communities often emphasize ethical risks as reasons to not develop AI.
Observation from the scoping review and literature synthesis comparing academic emphases with other sources.
high positive To Build or Not to Build? Factors that Lead to Non-Developme... frequency/degree of emphasis on ethical risks in responsible AI academic literat...
The authors collected data on real-world cases of AI system abandonment via an AI incident database and a practitioner survey to evidence and compare factors that drive abandonment both prior to and following system deployment.
Empirical data collection described in the paper: use of an AI incident database and a practitioner survey; summary does not report sample sizes or survey response counts.
high positive To Build or Not to Build? Factors that Lead to Non-Developme... observed drivers of AI abandonment in real-world cases (pre- vs post-deployment)
Through thematic analysis of reviewed sources, the paper develops a taxonomy of six categories of factors contributing to AI abandonment: ethical concerns, stakeholder feedback, development lifecycle challenges, organizational dynamics, resource constraints, and legal/regulatory concerns.
Qualitative thematic analysis of the scoping review materials, resulting taxonomy enumerated in the paper; number of documents/sources not stated in the summary provided.
high positive To Build or Not to Build? Factors that Lead to Non-Developme... categorization of factors driving AI abandonment
The authors performed a scoping review of academic literature, civil society resources, and grey literature (including journalism and industry reports) to identify factors influencing AI abandonment.
Methods statement in the paper describing a systematic scoping review of multiple source types; no numeric sample size reported in the summary.
high positive To Build or Not to Build? Factors that Lead to Non-Developme... scope and composition of reviewed sources
The paper reframes AI safety as layered control, authorization, and externally reviewable limits, linking alignment, security engineering, organizational economics, and institutional design.
Synthesis and prescriptive claim based on the paper's theoretical analysis and proposed framework; supported by conceptual integration rather than empirical testing.
high positive AI Safety as Control of Irreversibility: A Systems Framework... safety governance approach (layered controls and limits)
The main result is a boundary stabilization theorem showing that safety need not require proving that advanced systems are always correct; instead it requires institutional and technical designs that prevent irreversible power from being released by a single high-efficiency node.
Formal/theoretical claim presented as the paper's primary theorem (a 'boundary stabilization theorem') demonstrated within the paper's formal model.
high positive AI Safety as Control of Irreversibility: A Systems Framework... safety (effectiveness of layered controls vs. proof-of-correctness)
The index diverges sharply from existing AI exposure measures for specific occupation groups: power plant operators, railroad conductors, and aircraft cargo handling supervisors score high on RL feasibility but low on general AI exposure.
Empirical comparison between the RL Feasibility Index and existing AI-exposure measures, with named occupation groups showing opposite rankings.
high positive What Jobs Can AI Learn? Measuring Exposure by Reinforcement ... relative RL feasibility vs. general AI exposure for named occupations
Using LLM annotators guided by a rubric developed with RL experts and validated against confirmed deployment cases, we score all 17,951 O*NET tasks for training feasibility and aggregate to the occupation level, producing an RL Feasibility Index.
Empirical method described in paper: LLM-based annotation process guided by expert-developed rubric; validation against confirmed deployment cases; explicit enumeration of 17,951 O*NET tasks scored and aggregated into an index.
high positive What Jobs Can AI Learn? Measuring Exposure by Reinforcement ... training feasibility of O*NET tasks; RL Feasibility Index at task and occupation...
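The aggregation step described above can be sketched as follows. The summary does not say how task scores are weighted, so this sketch assumes an unweighted mean of task scores per occupation; the scores, the 0 to 1 scale, and the function name `occupation_index` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def occupation_index(task_scores):
    """Aggregate (occupation, task_score) pairs to an occupation-level mean.

    An unweighted mean is an assumption; the paper may weight tasks differently.
    """
    by_occupation = defaultdict(list)
    for occupation, score in task_scores:
        by_occupation[occupation].append(score)
    return {occ: mean(scores) for occ, scores in by_occupation.items()}


# Hypothetical rubric scores on a 0-1 scale, not values from the paper.
example = [
    ("Power Plant Operators", 0.8),
    ("Power Plant Operators", 0.6),
    ("Railroad Conductors", 0.9),
]
index = occupation_index(example)
print(index)
```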
We examine RL training feasibility for every occupation in the US economy.
Statement of study scope in the paper (methodological claim about coverage).
high positive What Jobs Can AI Learn? Measuring Exposure by Reinforcement ... coverage of US occupations in the RL feasibility analysis
The no-talk baseline establishes that communication is necessary.
Experimental no-talk baseline showing worse coordination without communication between agents.
high positive Talk is Cheap, Communication is Hard: Dynamic Grounding Fail... coordination performance with vs without communication
These results highlight dynamic grounding as a critical and understudied axis of multi-agent coordination.
Synthesis/interpretation of the experimental findings reported in the paper.
high positive Talk is Cheap, Communication is Hard: Dynamic Grounding Fail... importance of dynamic grounding for multi-agent coordination
We introduce an iterated, multi-turn negotiation game in which two agents allocate shared resources toward private projects with verifiable jointly optimal outcomes.
Methodological contribution described in the paper (design of a new multi-turn negotiation game).
high positive Talk is Cheap, Communication is Hard: Dynamic Grounding Fail... existence of a multi-turn negotiation benchmark with verifiable optimal outcomes
Grounding is the collaborative process of establishing mutual belief sufficient for the current communicative purpose.
Conceptual/definitional statement presented by the authors (no empirical data reported).
The frontier for AI-augmented science is not acceleration; it is the redesign of the certifying infrastructure around these new scarcities.
Prescriptive conclusion in the paper arguing priority of institutional redesign over mere speed gains; presented without empirical testing in the excerpt.
high positive AI-Augmented Science and the New Institutional Scarcities prioritization of redesigning certifying infrastructure versus accelerating scie...
Competent-looking judgment, including selecting, ranking, attributing, and certifying, is now produced at scale at marginal cost approaching zero, inverting the dominant economics-of-AI reading that treats judgment as the scarce complement to cheap prediction.
Argumentative/theoretical claim in the paper; no empirical sample, experiment, or quantitative data reported in the excerpt (implicit basis: observation of scalable AI outputs).
high positive AI-Augmented Science and the New Institutional Scarcities production of competent-looking judgment (selecting, ranking, attributing, certi...
Policy recommendations: invest in digital infrastructure, human capital development, and inclusive technology diffusion strategies to ensure more equitable distribution of AI-driven economic value.
Policy implications drawn from study findings (heterogeneous effects and mediation by structural conditions).
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... equitable distribution of AI-driven economic value (policy interventions)
The magnitude of AI's growth effects varies across economic contexts: developed economies experience substantially stronger growth impacts (approximately 0.33) than emerging economies (approximately 0.15).
Heterogeneity analysis / subgroup comparisons (developed vs emerging economies) using the panel data regressions and/or quantile regressions on the 2015–2024 dataset; exact sample sizes per subgroup not reported.
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... economic growth (heterogeneous treatment effects by country group)
AI adoption has a comparatively weaker direct effect on economic growth (direct effect β = 0.09).
Mediation/structural decomposition from the paper showing direct (non-mediated) coefficient from AI adoption to growth.
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... economic growth (direct effect)
Agentic AI influences economic growth primarily through a productivity channel (mediated effect β = 0.35, p < 0.01).
Mediation analysis (panel data) estimating indirect effect of AI adoption on GDP growth via measured productivity channel; data sources: World Bank and OECD indicators, 2015–2024.
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... economic growth (mediated via productivity)
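To put the two reported coefficients side by side, the sketch below applies the standard linear mediation identity (total effect = direct effect + indirect effect). It assumes the direct (0.09) and mediated (0.35) coefficients are on the same scale and that the decomposition is additive, neither of which the summary confirms. Under those assumptions the total is about 0.44 and roughly four fifths of it runs through the productivity channel.

```python
direct = 0.09    # reported direct effect of AI adoption on growth
mediated = 0.35  # reported effect through the productivity channel


def mediation_summary(direct_effect, indirect_effect):
    """Return (total effect, share mediated) under an additive linear model."""
    total = direct_effect + indirect_effect
    return total, indirect_effect / total


total, share = mediation_summary(direct, mediated)
print(f"total={total:.2f}, share mediated={share:.1%}")
```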
AI adoption significantly improves firm-level productivity (β = 0.18, p < 0.01).
Fixed-effects panel regression using an AI Adoption Index as predictor on firm-level productivity; data drawn from World Bank (World Development Indicators and Enterprise Surveys) and OECD AI indicators for 2015–2024 (sample size not reported in text).
Agentic AI has strong potential to boost productivity and growth.
Statement in paper motivated by literature review and the study's empirical results linking AI adoption to productivity and growth.
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... productivity and economic growth (general)
HAAS can serve as a pre-deployment workbench for comparing and inspecting human–AI allocation policies before organisational commitment.
Claim about intended use and demonstration of HAAS as an implemented tool; based on the framework implementation and benchmark experiments reported. No deployment-scale evaluation or sample sizes provided in the excerpt.
high positive HAAS: A Policy-Aware Framework for Adaptive Task Allocation ... ability to compare and inspect allocation policies prior to deployment
In manufacturing, stronger governance can improve operational performance and reduce fatigue simultaneously — a workload-buffering effect.
Domain-specific empirical result reported for the manufacturing benchmark in the paper, comparing operational performance and fatigue under different governance strengths. No numeric sample size or effect sizes provided in the excerpt.
high positive HAAS: A Policy-Aware Framework for Adaptive Task Allocation ... operational performance and worker fatigue
Task–agent fit is represented through five auditable cognitive dimensions and a five-mode autonomy spectrum (from human-only to fully autonomous) embedded in a reproducible benchmark spanning software engineering and manufacturing.
Design and benchmark description within the paper; specification of five cognitive dimensions and a five-mode autonomy spectrum and a reproducible benchmark across two domains. No numeric sample size provided.
high positive HAAS: A Policy-Aware Framework for Adaptive Task Allocation ... representation of task–agent fit and benchmarking across domains
HAAS combines a rule-based expert system that enforces governance constraints before any learning occurs, and a contextual-bandit learner that selects among feasible collaboration modes from outcome feedback.
Descriptive claim about the implemented HAAS framework as presented in the paper; method description of system architecture (rule-based expert system + contextual-bandit learner). No sample size reported.
high positive HAAS: A Policy-Aware Framework for Adaptive Task Allocation ... mechanism for adaptive task allocation (selected collaboration mode)
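The two-layer design described above (rules first, learning second) can be illustrated with a minimal sketch. This is not the HAAS implementation: the intermediate mode names, the governance rules, the task fields, and the epsilon-greedy strategy are all assumptions chosen to show how a rule layer can restrict the set of arms before a contextual bandit picks among them.

```python
import random

# Five-mode spectrum from human-only to fully autonomous; the three
# intermediate names are hypothetical placeholders.
MODES = ["human_only", "ai_assisted", "shared", "ai_led", "fully_autonomous"]


def feasible_modes(task):
    """Rule layer: hypothetical governance rules prune modes before learning."""
    modes = list(MODES)
    if task.get("safety_critical"):
        modes = [m for m in modes if m not in ("ai_led", "fully_autonomous")]
    if task.get("requires_sign_off"):
        modes = [m for m in modes if m != "fully_autonomous"]
    return modes


class EpsilonGreedyBandit:
    """Contextual bandit over discrete contexts (epsilon-greedy is assumed)."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}
        self.values = {}

    def select(self, context, modes):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(modes)
        return max(modes, key=lambda m: self.values.get((context, m), 0.0))

    def update(self, context, mode, reward):
        # Incremental mean of observed rewards for this (context, mode) pair.
        key = (context, mode)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.values.get(key, 0.0)
        self.values[key] = old + (reward - old) / n


def allocate(bandit, task):
    """Filter by rules, then let the learner choose among what remains."""
    return bandit.select(task["domain"], feasible_modes(task))
```

Because the learner only ever sees the output of `feasible_modes`, exploration cannot select a mode that the rule layer has excluded.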
The field's near-term research agenda should explicitly include collecting and using triadic data.
Normative recommendation in the paper; presented as the authors' advised research priority rather than empirically justified within the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... inclusion of triadic data collection/use in near-term research agendas in the SW...
Triadic data is the empirical key to four open questions in agent training.
Argumentative claim in the paper asserting that triadic data is central to four open research questions, which the excerpt does not specify; no empirical demonstration is included in the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... resolvability of four open questions in agent training using triadic data
This triadic data is capturable in 12-18 months with methods already mature in adjacent fields.
Claim in the paper based on authors' assessment of methodological maturity in adjacent fields; no empirical project timeline or pilot data is provided in the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... time required to collect a triadic dataset using existing methods
Any such corpus -- triadic or otherwise -- must justify its quality to a fine-tuning researcher through a four-tier evidence framework: mechanical verification, statistical corpus characterization, probe experiments, and pre-registered blind evaluation.
Methodological proposal in the paper outlining a four-tier evidence framework; presented as normative guidance rather than validated by application to a corpus in the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... quality and trustworthiness of fine-tuning corpora as judged by the four-tier fr...
The canonical instantiation of triadic data is two complementary products: long-horizon expert trajectories captured under stimulated-recall protocols, and simulated cross-functional companies -- instrumented teams of senior engineers, product managers, designers, and data scientists working through ambiguous deliverables on shared infrastructure.
Prescriptive specification in the paper proposing two concrete dataset types as canonical instantiations; presented as design/recommendation rather than empirically tested.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... availability and suitability of dataset modalities (stimulated-recall expert tra...
The substrate for the next generation of software-engineering (SWE) agents is neither larger GitHub scrapes, nor more solo-agent trajectories, nor open human-AI dialogue logs on their own; it is triadic data: synchronized capture of the human-human conversations where engineering context is formed, the human-AI sessions where that context is partially consumed, and the multi-week cross-functional work that surrounds both.
Argument and conceptual proposal in the paper; no empirical validation or comparative experiments are provided in the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... effectiveness of training data substrates for improving agent performance on lon...
SCDPs are a useful framework for policy simulation for the digital economy, mechanism design for information systems, and digital twin modeling of cyberinfrastructure.
Paper posits these applications as prospective uses of the framework (argumentative/speculative; no empirical evaluation reported in abstract).
high positive The Design and Composition of Structural Causal Decision Pro... usefulness for policy simulation, mechanism design, and digital twin modeling
SCDPs are capable of modeling variable discounting, a tool used widely in social scientific modeling.
Paper states the capability as part of SCDP definition and examples (theoretical claim).
high positive The Design and Composition of Structural Causal Decision Pro... modeling of variable discounting
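For readers unfamiliar with the term, variable discounting means the discount factor can change from step to step instead of being a fixed constant. The sketch below is a generic illustration of that idea, not the paper's SCDP formalism: it weights each reward by the product of all discount factors from earlier steps. The function name and the example numbers are hypothetical.

```python
def variable_discounted_return(rewards, discounts):
    """Sum rewards, weighting step t by the product of discounts before t.

    discounts[t] is the factor applied when moving from step t to step t + 1.
    """
    if len(rewards) != len(discounts):
        raise ValueError("rewards and discounts must have the same length")
    total, weight = 0.0, 1.0
    for reward, gamma in zip(rewards, discounts):
        total += weight * reward
        weight *= gamma
    return total


# A constant factor recovers ordinary geometric discounting.
constant = variable_discounted_return([1, 1, 1], [0.5, 0.5, 0.5])
# A step-dependent factor changes how heavily later rewards count.
varying = variable_discounted_return([1, 1, 1], [0.9, 0.5, 1.0])
print(constant, varying)
```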
An SCDP can endogenously model the memory-formation process and is thus useful for modeling resource‑rational agents in dynamic settings.
Paper asserts SCDP can represent memory-formation endogenously and discusses application to resource-rational agents (theoretical modeling capability).
high positive The Design and Composition of Structural Causal Decision Pro... ability to model endogenous memory formation / resource-rational agents
SCDPs are strictly more expressive than POMDPs because they do not assume rational belief formation.
Comparative expressiveness claim stated in the paper; supported by theoretical argument or formal separation result (paper text states the claim explicitly).
high positive The Design and Composition of Structural Causal Decision Pro... expressiveness relative to POMDPs (ability to represent non-rational belief form...
SCDPs inherit the composition properties of SCDMs (i.e., SCDPs benefit from SCDM composability).
Logical consequence argued in the paper from SCDP being constructed from SCDMs; likely supported by formal argumentation in the text.
high positive The Design and Composition of Structural Causal Decision Pro... inheritance of composability by SCDPs
A Structural Causal Decision Process (SCDP) is defined as a recurring SCDM with a discount variable.
Formal definition introduced in the paper (theoretical definition).
high positive The Design and Composition of Structural Causal Decision Pro... definition of SCDP as recurring SCDM with discounting
SCDMs have a well-defined and computationally useful property of composability.
Paper states and demonstrates ("We show") composability property — presumably via formal proofs or constructive arguments in the text (theoretical proofs/exposition).
high positive The Design and Composition of Structural Causal Decision Pro... composability of causal decision models
SCDMs can have open root variables for which no probability distribution or structural equation is given.
Model definitions in the paper explicitly allow open root variables (theoretical description).
high positive The Design and Composition of Structural Causal Decision Pro... support for open root variables in model formalism
In SCDMs, agent decisions can be constrained by their causal antecedents (i.e., decisions can be constrained by their causal parents).
Model specification and definitions in the paper describing constraints on decisions as part of SCDM structure (theoretical construction).
high positive The Design and Composition of Structural Causal Decision Pro... decision constraints by causal antecedents
Structural Causal Decision Models (SCDMs) expand on Structural Causal Influence Models by explicitly representing the causal relationships between model variables and the payoffs of agent decisions.
Formal model development and comparison to existing SCIMs provided in the paper (theoretical definitions and arguments).
high positive The Design and Composition of Structural Causal Decision Pro... explicit representation of causal relationships between variables and payoffs
We present two new classes of causal models of decision-making agents: Structural Causal Decision Models (SCDMs) and Structural Causal Decision Processes (SCDPs).
Paper introduces formal definitions for two model classes and describes their properties in the text (theoretical exposition).
high positive The Design and Composition of Structural Causal Decision Pro... introduction of new model classes (SCDMs and SCDPs)
We propose PAEF (Production Agentic Evaluation Framework), a five-dimension evaluation framework with an open-source reference implementation, designed for continuous evaluation on production traffic rather than episodic benchmark runs.
Author contribution: design and open-source implementation of PAEF described in the paper.
high positive Evaluating Agentic AI in the Wild: Failure Modes, Drift Patt... provision of a continuous, production-focused evaluation framework (PAEF)
The taxonomy and its failure modes are grounded in observations from systems operating at billion-event scale.
Author statement that observations underlying the taxonomy come from systems operating at billion-event scale.
high positive Evaluating Agentic AI in the Wild: Failure Modes, Drift Patt... empirical grounding (scale) of observations used to derive the taxonomy