The Commonplace

Evidence (2340 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
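
The matrix above lists raw counts; comparing outcomes is easier as shares of each row's Total. The following is a minimal sketch, not part of the dashboard itself: the three rows are copied from the matrix, while the names `MATRIX`, `DIRECTIONS`, and `direction_shares` are hypothetical. Note that in some rows the four direction counts sum to less than the listed Total (Firm Productivity: 389 of 394), so the shares need not add up to 1.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
# Only three rows are included here for illustration.
MATRIX = {
    "Firm Productivity": (277, 34, 68, 10, 394),
    "AI Safety & Ethics": (117, 177, 44, 24, 364),
    "Task Completion Time": (78, 5, 4, 2, 89),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(row):
    """Return each direction's count as a fraction of the row's listed Total."""
    *counts, total = row
    return {name: count / total for name, count in zip(DIRECTIONS, counts)}


if __name__ == "__main__":
    for outcome, row in MATRIX.items():
        shares = direction_shares(row)
        summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
        print(f"{outcome}: {summary}")
```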
Recommended next steps for validation include controlled pilots, before-after studies on operational metrics, and cross-firm panel analyses to estimate economic impacts and risk reductions.
Authors' explicit recommendations for empirical validation in the Data & Methods and Implications sections.
high null result Governed Hyperautomation for CRM and ERP: A Reference Patter... feasibility of empirical validation designs and future measurement (research des...
There is no reported large-scale quantitative evaluation (e.g., productivity gains, cost-benefit metrics, or causal impact estimates) supporting the framework in the paper.
Explicit limitation noted by the authors stating absence of large-scale quantitative evaluation.
high null result Governed Hyperautomation for CRM and ERP: A Reference Patter... existence/absence of large-scale quantitative evaluation
The evidence base for the paper is qualitative: a synthesis of industry best practices and lessons from multi-sector enterprise implementations; methods used include conceptual framework development, architecture design, and case-based illustration.
Explicit methodological statement in the Data & Methods section of the paper.
high null result Governed Hyperautomation for CRM and ERP: A Reference Patter... type of evidence and methods used (qualitative, case-based, conceptual)
Limitation: the study analyzes national‑level formal policy texts only and does not measure enforcement, implementation outcomes, or public reactions.
Author‑stated limitations in the paper specifying scope restricted to formal policy documents and absence of empirical enforcement/compliance data.
high null result Balancing openness and security in scientific data governanc... study scope and limitations (no enforcement/implementation measurement)
The paper uses qualitative content analysis, coding documents against the four analytical dimensions to generate a comparative typology of coordination approaches.
Method description: manual qualitative coding of the 36 documents into the specified dimensions, producing the typology distinguishing Chinese and U.S. approaches.
high null result Balancing openness and security in scientific data governanc... methodological approach (qualitative content analysis / coding)
The study's empirical basis comprises 36 national‑level policy documents (18 from China; 18 from the United States) focused on scientific data governance.
Author‑reported dataset and sampling description in the Data & Methods section.
high null result Balancing openness and security in scientific data governanc... dataset size and composition (number of documents by country)
The comparative analysis is organized across four dimensions: coordination objectives, institutional actors, governance mechanisms, and stakeholder legitimacy.
Methodological design reported in the paper; documents were coded against these four analytic categories.
high null result Balancing openness and security in scientific data governanc... analytic framework / coding schema
The authors recommend empirical approaches for future work including randomized controlled trials in labs, before-after adoption studies, and collection of microdata on instrument usage, model versions, and provenance to measure impacts.
Explicit methodological recommendations in the Measurement and empirical research agenda section; these are proposals rather than executed studies.
high null result ChatMicroscopy: A Perspective Review of Large Language Model... recommended empirical metrics: throughput, cost, error rates, time-to-discovery,...
There is a need for rigorous evaluation metrics and benchmarks for safety, reproducibility, and empirical studies quantifying productivity or scientific impact of LLM-driven instrument control.
Identified research gaps and recommended empirical research agenda described by the authors; these are recommendations rather than empirical findings.
high null result ChatMicroscopy: A Perspective Review of Large Language Model... gap in evaluation infrastructure and lack of benchmarks for LLM-driven instrumen...
The evidence presented consists mainly of qualitative arguments drawn from documented advances and discussion of prototypes; no controlled experimental evaluation is presented.
Authors' own description in the Data & Methods section about the nature of evidence supporting their perspective.
high null result ChatMicroscopy: A Perspective Review of Large Language Model... availability and type of empirical evidence for claims (qualitative/prototype vs...
This paper is a conceptual perspective/review rather than an original empirical study.
Explicit statement in the Data & Methods section that the contribution is a perspective synthesizing literature and illustrative examples with no controlled experimental evaluation.
high null result ChatMicroscopy: A Perspective Review of Large Language Model... type of scholarly contribution (conceptual review)
Modern microscopes are increasingly software-driven and data-intensive, while existing ML tools for microscopy are task-specific and fragmented.
Synthesis of recent literature on optical microscopes, detectors, and task-specific ML for image analysis referenced in the perspective (descriptive claim; no new empirical data collected).
high null result ChatMicroscopy: A Perspective Review of Large Language Model... degree of software control and data volume/intensity in modern microscopy system...
Empirical validation of the book’s proposals would require complementary case studies, model documentation, and outcome measurements.
Author/reviewer recommendation in the blurb about methodological limitations and next steps; not an empirical finding.
high null result Governing The Future need for empirical case studies, documented models, and outcome metrics to valid...
The book is predominantly conceptual and policy-analytic and uses illustrative case vignettes rather than presenting a single empirical study.
Explicit methodological description in the Data & Methods blurb: synthesis of technical ideas, governance requirements, and illustrative vignettes; no empirical sample or experimental protocol described.
high null result Governing The Future presence or absence of empirical methodology in the book
The evidence base is qualitative: the study uses conceptual framework synthesis, comparative analysis of multi-sector implementations, and case examples rather than randomized or large-sample empirical evaluation.
Methods and limitations section of the paper explicitly describing the evidence base and methods (qualitative synthesis, pattern extraction, cross-case lessons).
high null result Governed Hyperautomation for CRM and ERP: A Reference Patter... type and rigor of empirical evidence supporting claims
The paper presents a deployment pattern intended to be adapted by sector and regulatory context rather than a one-size-fits-all blueprint.
Explicit statement in the paper and the described pattern design; based on qualitative pattern extraction and prescriptive guidance.
high null result Governed Hyperautomation for CRM and ERP: A Reference Patter... character of the deployment guidance (adaptable pattern vs. fixed blueprint)
Methodological claim: combining fixed-effects panel estimation, mediation analysis, and panel threshold models is an effective multi-method approach to (a) estimate average effects, (b) unpack causal channels, and (c) detect nonlinear stage-dependent impacts.
The paper's applied methodology: fixed-effects panel regressions, mediation framework, and panel threshold modeling on the 2012–2022 provincial panel.
high null result Digital rural development and agricultural green total facto... Methodological validity / estimation strategy
The paper constructs a multidimensional digitalization index composed of digital infrastructure, digital service capacity, and the digital development environment.
Index construction described in data/methods: composite indicator combining measures of connectivity/broadband (infrastructure), e-commerce/digital finance (service capacity), and policy/institutional/human capital indicators (development environment).
high null result Digital rural development and agricultural green total facto... Digitalization index components (infrastructure, service capacity, development e...
The study is observational (panel) and subject to limitations: residual confounding is possible; two-way fixed-effects estimators can be biased with heterogeneous treatment timing or dynamics; external validity beyond China and non-grain crops is not established.
Authors' stated limitations and caveats in the paper regarding identification and generalizability of results from the CLDS 2014–2018 observational panel.
high null result Whole-Process Agricultural Production Chain Management and L... study validity and generalizability (methodological limitation)
The study uses two-way fixed-effects (household and year) models as the primary identification strategy and employs propensity score matching (PSM) as a robustness check.
Methods section of the paper describing estimation strategy applied to the CLDS 2014–2018 panel of grain-producing households.
high null result Whole-Process Agricultural Production Chain Management and L... methodological approach (no substantive outcome)
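
The two-way fixed-effects strategy named in this entry can be illustrated with the standard within transformation. This is a sketch under stated assumptions, not the authors' code: it assumes a balanced panel with a single regressor, and the function name `twfe_slope` and the input layout are hypothetical. On a balanced panel, subtracting unit means and year means and adding back the grand mean removes both sets of fixed effects, after which the slope is an ordinary least-squares ratio.

```python
def twfe_slope(panel):
    """Two-way fixed-effects slope for one regressor on a BALANCED panel.

    panel maps (unit, year) -> (x, y). Every unit must appear in every year;
    the double-demeaning shortcut used here is not valid otherwise.
    """
    units = sorted({u for u, _ in panel})
    years = sorted({t for _, t in panel})

    def double_demean(idx):
        vals = {key: obs[idx] for key, obs in panel.items()}
        grand = sum(vals.values()) / len(vals)
        unit_mean = {u: sum(vals[(u, t)] for t in years) / len(years) for u in units}
        year_mean = {t: sum(vals[(u, t)] for u in units) / len(units) for t in years}
        # Remove unit and year effects; add the grand mean back once.
        return {
            (u, t): vals[(u, t)] - unit_mean[u] - year_mean[t] + grand
            for (u, t) in vals
        }

    x_dd = double_demean(0)
    y_dd = double_demean(1)
    sxx = sum(v * v for v in x_dd.values())
    sxy = sum(x_dd[key] * y_dd[key] for key in x_dd)
    return sxy / sxx
```

The entry's companion limitation (bias under heterogeneous treatment timing) is not addressed by this sketch; it only shows the mechanics of the estimator.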
The ManagerWorker two-agent pipeline (an expensive text-only manager plus a cheaper worker with repo access) can replace expensive execution with expensive reasoning: the manager does the reasoning, and the cheaper worker does the execution.
System design description plus empirical results on 200 SWE-bench Lite instances showing parity in success rates between a strong-manager/weak-worker pipeline and a strong single agent while using fewer strong-model tokens.
high positive Can AI Models Direct Each Other? Organizational Structure as... ability to substitute expensive execution with expensive reasoning (operationali...
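
The division of labor described in this entry can be sketched as a small control loop. Everything below is an assumption made for illustration, not the paper's implementation: the function name `run_manager_worker`, the `manager(task, feedback)` and `worker(plan)` signatures, the result dictionary keys, and the retry-with-feedback loop are all hypothetical. The only properties taken from the entry are that the manager works on text alone and that only the worker touches the repository.

```python
from typing import Callable, Dict


def run_manager_worker(
    task: str,
    manager: Callable[[str, str], str],
    worker: Callable[[str], Dict[str, object]],
    max_rounds: int = 3,
) -> Dict[str, object]:
    """Hypothetical manager-worker loop.

    manager(task, feedback) -> plan text (stands in for the expensive model;
    it never touches the repository).
    worker(plan) -> {"passed": bool, "log": str} (stands in for the cheaper
    model that edits the repository and runs checks).
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        plan = manager(task, feedback)   # expensive reasoning, text only
        result = worker(plan)            # cheap execution with repo access
        if result["passed"]:
            return {"solved": True, "rounds": round_no}
        feedback = str(result["log"])    # the manager sees only the text log
    return {"solved": False, "rounds": max_rounds}
```

In this shape, strong-model tokens are spent only on plans and on reading logs, which is one way the token saving reported in the entry could arise.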
A minimal review-only manager loop adds only 2 percentage points over the baseline, whereas structured exploration and planning by the manager add 11 percentage points, demonstrating that active direction (not mere reviewing) produces most of the benefit.
Ablation-style comparison of pipeline variants on the 200-instance SWE-bench Lite evaluation: review-only manager loop versus manager with structured exploration and planning; reported improvements in percentage points.
high positive Can AI Models Direct Each Other? Organizational Structure as... improvement in task success rate (percentage-point increase)
A strong manager directing a weak worker achieves a 62% success rate on software-engineering tasks, matching a strong single agent, which achieves 60%, while using a fraction of the strong-model tokens.
Empirical evaluation on 200 instances from SWE-bench Lite across five pipeline configurations and model pairings; measured task success rates and token usage for manager-worker pipelines versus single-agent baselines.
high positive Can AI Models Direct Each Other? Organizational Structure as... task success rate (percentage of tasks solved)
Overall, the HCT is a robust, accurate, and transparent alternative to the AI-as-advisor approach, offering a simple mechanism to tap into the wisdom of hybrid crowds.
Overall conclusion drawn from the empirical comparisons across datasets and analyses described in the paper (summary statement in abstract).
high positive Beyond AI advice -- independent aggregation boosts human-AI ... overall decision-making performance / robustness / transparency
Using signal detection theory, the paper finds that the HCT outperforms the AI-as-advisor approach because people cannot discriminate well enough between correct and incorrect AI advice.
Analysis in the paper applying signal detection theory to the empirical results (as stated in abstract).
high positive Beyond AI advice -- independent aggregation boosts human-AI ... discriminability between correct and incorrect AI advice (signal detection metri...
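
Discriminability in signal detection theory is commonly summarized by the sensitivity index d′ = z(hit rate) − z(false-alarm rate). The sketch below shows that calculation only; the abstract does not say which metric the paper uses, so this is an assumption. The mapping is also an assumption: a "hit" is following AI advice that was correct, and a "false alarm" is following AI advice that was incorrect. The function name `d_prime` and the add-0.5 correction (which keeps rates away from 0 and 1) are illustrative choices.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumed mapping (not from the paper):
      hit               = followed AI advice that was correct
      miss              = rejected AI advice that was correct
      false alarm       = followed AI advice that was incorrect
      correct rejection = rejected AI advice that was incorrect
    """
    # Log-linear correction: add 0.5 to each count so rates never hit 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A d′ near zero means advice is followed at the same rate whether or not it is correct, which is the situation the entry describes as poor discrimination.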
The HCT also performed better in almost all cases in which the AI offered an explanation of its judgment.
Empirical results on the subset of four datasets with AI explanations (abstract reports HCT performed better in 'almost all' of these cases).
high positive Beyond AI advice -- independent aggregation boosts human-AI ... decision accuracy when AI provides explanations
The HCT outperformed the AI-as-advisor approach in all datasets.
Empirical comparisons reported across the 10 datasets (statement in abstract that HCT 'outperformed' in all datasets). Specific performance metrics not provided in abstract.
high positive Beyond AI advice -- independent aggregation boosts human-AI ... decision accuracy / task performance
The conceptual model's results support corporate GenAI policies, leadership development programs, and HR assessment of leader readiness for GenAI-enabled delegation and communication.
Practical implications and recommendations section arguing policy and HR applications based on the conceptual model.
high positive LEADER EMOTIONAL INTELLIGENCE IN THE GENERATIVE AI ERA: “HUM... policy and HR adoption/application
The article introduces an EI-driven trust-calibration framework as an explanatory mechanism showing when generative AI improves leadership effectiveness and when it amplifies managerial errors.
Novel theoretical framework developed in the paper synthesizing EI, trust calibration, and psychological safety to explain boundary conditions of AI in leadership.
high positive LEADER EMOTIONAL INTELLIGENCE IN THE GENERATIVE AI ERA: “HUM... leadership effectiveness (and amplification of managerial errors)
The paper provides an operationalization toolkit including measures: GenAI use intensity; delegation quality indices (clarity, boundaries, success criteria); communication quality indices (empathy, tone, transparency); psychological safety markers; and behavioral trust-calibration measures.
Operationalization section in the paper listing suggested indices and markers for empirical measurement.
high positive LEADER EMOTIONAL INTELLIGENCE IN THE GENERATIVE AI ERA: “HUM... measurement constructs for empirical studies (e.g., GenAI use intensity, delegat...
As a follow-up validation path, the paper proposes a two-wave time-lag design and 180° assessment (leader + subordinates) to reduce common-method bias.
Methodological proposal in the paper describing longitudinal and multi-rater validation approaches.
high positive LEADER EMOTIONAL INTELLIGENCE IN THE GENERATIVE AI ERA: “HUM... robustness/validity of empirical findings (reduction of common-method bias)
The paper proposes a 'Package B' rapid empirical design: a randomized online experiment manipulating access to generative AI in core managerial tasks (decision, delegation, team communication), combined with EI measurement and trust-calibration indicators.
Methodology section proposing the rapid randomized online experiment design as the primary empirical test.
high positive LEADER EMOTIONAL INTELLIGENCE IN THE GENERATIVE AI ERA: “HUM... experimental test of human–AI leadership effects
Emotional intelligence strengthens the positive impact of generative AI on managerial outcomes when trust is properly calibrated and psychological safety is maintained.
Conceptual model and integrative argument combining EI, trust-calibration, and psychological safety; supported by proposed empirical test design.
high positive LEADER EMOTIONAL INTELLIGENCE IN THE GENERATIVE AI ERA: “HUM... managerial outcomes (e.g., decision quality)
The paper conceptualizes human–AI leadership as an integrated managerial competence.
Conceptual modeling presented in the paper integrating EI theory, psychological safety, and trust calibration (theoretical synthesis).
high positive LEADER EMOTIONAL INTELLIGENCE IN THE GENERATIVE AI ERA: “HUM... human–AI leadership competence (integrated managerial competence)
Large language model (LLM) use can improve observable output and short-term task performance.
Paper synthesizes empirical findings from human–AI interaction studies, learning-research experiments, and model-evaluation work indicating improved produced outputs and short-term task performance when humans use LLMs; no single pooled sample size or unified effect estimate is reported in the paper.
high positive Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupli... observable output quality and short-term task performance
These empirical insights provide actionable guidelines advocating dynamically routed architectures that adapt their collaborative structures to real-time task complexity.
Authors' recommendation derived from reported empirical findings comparing architectures under varying time budgets and task complexities (prescriptive claim based on study results).
high positive An Empirical Study of Multi-Agent Collaboration for Automate... effectiveness of dynamically routed architectures in matching collaborative stru...
Given extended compute budgets, the agent team topology achieves the deep theoretical alignment necessary for complex architectural refactoring.
Empirical benchmarks run with longer/extended computational budgets showing agent teams perform better on complex architectural refactoring tasks (qualitative claim; no numeric effect sizes or sample counts provided in the abstract).
high positive An Empirical Study of Multi-Agent Collaboration for Automate... ability to perform complex architectural refactoring / depth of theoretical alig...
The subagent mode functions as a highly resilient, high-throughput search engine optimal for broad, shallow optimizations under strict time constraints.
Benchmark comparisons in the execution-based testbed under strictly fixed computational time budgets showing subagent architecture excels in throughput/resilience for broad, shallow optimization tasks (qualitative claim in paper; no numeric effect sizes provided).
high positive An Empirical Study of Multi-Agent Collaboration for Automate... search throughput/resilience and effectiveness on broad, shallow optimization ta...
Task-level analyses show that activities expanded in AI-enabled projects—particularly ideation and experimentation—are increasingly compatible with large language model capabilities, suggesting potential for future productivity gains as these technologies mature.
Task-level classification mapping tasks described in proposals to LLM-relevant capabilities using LLM-based classification; finding that tasks expanded in AI-enabled projects cluster on ideation and experimentation, which align with current LLM strengths.
high positive Artificial Intelligence in Science: Returns, Reallocation, a... frequency/expansion of specific task categories (ideation, experimentation) and ...
AI-enabled projects undertake a broader set of tasks.
Task-level analysis of proposal descriptions (task inventories) classifying tasks via keyword extraction and LLMs, showing AI-enabled proposals list a wider variety of activities than non-AI proposals.
high positive Artificial Intelligence in Science: Returns, Reallocation, a... breadth/variety of tasks undertaken in projects
AI-enabled projects involve larger teams.
Comparison of team structure in proposals (team size) between AI-enabled and non-AI projects using the same comprehensive proposal dataset and LLM-based classification of AI presence.
high positive Artificial Intelligence in Science: Returns, Reallocation, a... team size / team structure
AI-enabled projects reallocate resources toward human capital, shifting budget allocations toward labor.
Analysis of detailed budget allocations in the proposal dataset, comparing projects identified as AI-enabled versus non-AI projects using keyword extraction and LLM classification to identify AI presence and role.
high positive Artificial Intelligence in Science: Returns, Reallocation, a... budget allocation share toward human capital (labor share)
In the short run, AI adoption is associated with modest improvements in scientific outcomes concentrated in the upper tail.
Observational analysis linking identified AI presence in a comprehensive dataset of research proposals (funded and unfunded) to subsequent publication outcomes; AI presence identified via keyword extraction combined with large language model (LLM) classification; publication outcomes measured after proposal submission.
high positive Artificial Intelligence in Science: Returns, Reallocation, a... subsequent publication outcomes (scientific outcomes)
Education and workforce development should shift focus from rote knowledge accumulation to cultivating skills in human-AI collaboration, creative problem-solving, and the design of novel economic domains.
Normative policy recommendation derived from the paper's framework and analysis of anticipated labor market changes (no empirical evaluation or trial data reported in the abstract).
high positive AI Civilization and the Transformation of Work educational focus / skill composition
Human-AI co-evolution will significantly increase individual productivity and open new frontiers of economic activity.
Projected outcome based on combined analysis of AI capabilities, historical patterns, and platform growth; the abstract does not report empirical measurement or sample sizes for this projection.
high positive AI Civilization and the Transformation of Work individual productivity and emergence of new economic activities
AI-driven productivity augmentation dramatically lowers the barriers to creating economic value, enabling the decentralized generation of employment.
Argument supported by paper's analysis of contemporary labor market dynamics and the growth of digital platforms; no quantified empirical estimates or sample sizes provided in the abstract.
high positive AI Civilization and the Transformation of Work barriers to entry for value creation / individual productivity
The transition to an AI-civilization will fundamentally restructure the mechanisms of employment creation from a centralized model (few organizations creating jobs for the many) to a decentralized ecosystem where individuals are empowered to generate their own employment opportunities.
Central thesis of the paper, motivated by theoretical argumentation and synthesis of contemporary data on labor markets and digital platforms (no empirical test or sample sizes specified in the abstract).
high positive AI Civilization and the Transformation of Work structure/mechanism of employment creation (centralized vs decentralized)
Historical precedents from past technological revolutions suggest that innovation tends to expand, rather than shrink, the scope of economic activity and employment in the long run.
Paper draws on analysis of economic history (qualitative historical analysis implied; no specific historical datasets or sample sizes provided in the abstract).
high positive AI Civilization and the Transformation of Work scope of economic activity and long-run employment levels
By formalizing the end-to-end transaction model together with its asset and incentive layers, EpochX reframes agentic AI as an organizational design problem focused on infrastructures where verifiable work leaves persistent, reusable artifacts and value flows support durable human-agent collaboration.
Theoretical framing and normative claim in the paper; no empirical evaluation demonstrating that this reframing yields measurable benefits.
high positive EpochX: Building the Infrastructure for an Emergent Agent Ci... organizational framing and potential for durable human-agent collaboration
Credits lock task bounties, allow budget delegation, settle rewards upon acceptance, and compensate creators when verified assets are reused.
Functional description of the credit mechanics and settlement rules within the proposed EpochX marketplace; presented as part of system design without empirical settlement or user-behavior data.
high positive EpochX: Building the Infrastructure for an Emergent Agent Ci... incentive flows, reward settlement, and compensation for asset reuse
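
The credit mechanics in this entry (lock a bounty, settle on acceptance, pay creators on reuse) can be sketched as a small ledger. This is a hypothetical illustration, not EpochX's design: the class `CreditLedger`, its method names, integer credits, and the refund-on-rejection rule are all assumptions, and budget delegation is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CreditLedger:
    """Hypothetical credit ledger; all names and rules here are assumptions."""

    balances: Dict[str, int] = field(default_factory=dict)
    locked: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # task -> (poster, amount)

    def deposit(self, account: str, amount: int) -> None:
        self.balances[account] = self.balances.get(account, 0) + amount

    def _withdraw(self, account: str, amount: int) -> None:
        if self.balances.get(account, 0) < amount:
            raise ValueError(f"{account} has insufficient credits")
        self.balances[account] -= amount

    def lock_bounty(self, task_id: str, poster: str, amount: int) -> None:
        """Move credits out of the poster's balance into escrow for a task."""
        self._withdraw(poster, amount)
        self.locked[task_id] = (poster, amount)

    def settle(self, task_id: str, worker: str, accepted: bool) -> None:
        """Release escrow: to the worker on acceptance, else back to the poster.

        Refunding the poster on rejection is an assumption of this sketch.
        """
        poster, amount = self.locked.pop(task_id)
        self.deposit(worker if accepted else poster, amount)

    def pay_reuse(self, user: str, creator: str, fee: int) -> None:
        """Compensate the creator of a verified asset when someone reuses it."""
        self._withdraw(user, fee)
        self.deposit(creator, fee)
```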