The Commonplace

Evidence (3470 claims)

Adoption (7395 claims)
Productivity (6507 claims)
Governance (5877 claims)
Human-AI Collaboration (5157 claims)
Innovation (3492 claims)
Org Design (3470 claims)
Labor Markets (3224 claims)
Skills & Training (2608 claims)
Inequality (1835 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                   Positive Negative Mixed Null Total
Other                          609      159    77  736  1615
Governance & Regulation        664      329   160   99  1273
Organizational Efficiency      624      143   105   70   949
Technology Adoption Rate       502      176    98   78   861
Research Productivity          348      109    48  322   836
Output Quality                 391      120    44   40   595
Firm Productivity              385       46    85   17   539
Decision Quality               275      143    62   34   521
AI Safety & Ethics             183      241    59   30   517
Market Structure               152      154   109   20   440
Task Allocation                158       50    56   26   295
Innovation Output              178       23    38   17   257
Skill Acquisition              137       52    50   13   252
Fiscal & Macroeconomic         120       64    38   23   252
Employment Level                93       46    96   12   249
Firm Revenue                   130       43    26    3   202
Consumer Welfare                99       51    40   11   201
Inequality Measures             36      105    40    6   187
Task Completion Time           134       18     6    5   163
Worker Satisfaction             79       54    16   11   160
Error Rate                      64       78     8    1   151
Regulatory Compliance           69       64    14    3   150
Training Effectiveness          81       15    13   18   129
Wages & Compensation            70       25    22    6   123
Team Performance                74       16    21    9   121
Automation Exposure             41       48    19    9   120
Job Displacement                11       71    16    1    99
Developer Productivity          71       14     9    3    98
Hiring & Recruitment            49        7     8    3    67
Social Protection               26       14     8    2    50
Creative Output                 26       14     6    2    49
Skill Obsolescence               5       37     5    1    48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
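The matrix is easier to compare across outcomes when each row is read as shares rather than raw counts. The following is a minimal Python sketch using three rows copied from the table above. The function name and the choice of the reported Total as the denominator are assumptions made here for illustration, not part of the dashboard. The reported Total can exceed the sum of the four direction columns (Firm Productivity: 533 vs. 539), so the shares need not sum to 1.

```python
# Rows copied from the Evidence Matrix:
# (outcome, positive, negative, mixed, null, reported total)
ROWS = [
    ("Firm Productivity", 385, 46, 85, 17, 539),
    ("AI Safety & Ethics", 183, 241, 59, 30, 517),
    ("Job Displacement", 11, 71, 16, 1, 99),
]


def direction_shares(row):
    """Return (outcome, shares), where each share is count / reported total.

    Hypothetical helper: dividing by the reported total is an assumption.
    """
    outcome, positive, negative, mixed, null, total = row
    counts = {
        "positive": positive,
        "negative": negative,
        "mixed": mixed,
        "null": null,
    }
    return outcome, {name: count / total for name, count in counts.items()}


SHARES = dict(direction_shares(row) for row in ROWS)

if __name__ == "__main__":
    for outcome, shares in SHARES.items():
        print(outcome, {name: round(value, 3) for name, value in shares.items()})
```

Read this way, Firm Productivity claims are mostly positive, while Job Displacement claims are mostly negative.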
Filter: Org Design
The paper uses a comprehensive longitudinal dataset comprising tens of millions of users from a leading Chinese video-sharing platform.
Statement in paper summarizing data source: a longitudinal dataset covering 'tens of millions of users' from a major Chinese video-sharing platform; used for descriptive and comparative analyses of creation and consumption behavior.
high null result Scale over Preference: The Impact of AI-Generated Content on... dataset coverage (number of users observed)
These chats were committed to public repositories as part of routine development, capturing in-the-wild behavior.
Data collection method: analysis of chat transcripts that were committed to public repositories (authors state collected from repos and describe them as routine commits).
high null result Programming by Chat: A Large-Scale Behavioral Analysis of 11... degree to which collected chats represent in-the-wild developer behavior (public...
We analyze 74,998 developer messages from 11,579 chat sessions across 1,300 repositories and 899 developers using Cursor and GitHub Copilot.
Reported dataset counts in the paper (message, session, repository, developer counts) drawn from public commit histories of chats.
high null result Programming by Chat: A Large-Scale Behavioral Analysis of 11... number of developer messages / chat sessions / repositories / developers analyze...
Conventional microeconomic models often treat interactions between algorithmic platforms and workers as static principal-agent problems.
Literature statement in paper (conceptual framing / literature review); no empirical sample reported.
high null result THE RED QUEEN in the DASHBOARD: CO-EVOLUTIONARY DYNAMICS of ... characterization of theoretical models (static principal-agent framing)
The study sample comprises 21,428 firm-year observations from Chinese A-share listed manufacturing companies over 2010–2022.
Data description provided in the paper's abstract/introduction specifying the sample frame and time period.
high null result Artificial Intelligence Innovation, Internal Structure Optim... sample composition (firm-year observations)
This paper employs a staggered difference-in-differences (DID) model using data from Chinese A-share listed manufacturing companies from 2012 to 2023 and uses the National Artificial Intelligence Innovative Application Pioneer Zone (AIIAPZ) policy as a quasi-natural experiment.
Staggered DID empirical design; sample described as Chinese A-share listed manufacturing firms, 2012–2023; AIIAPZ policy used as treatment assignment (quasi-natural experiment).
high null result Does Artificial Intelligence Improve the Operational Resilie... methodological design / identification strategy (use of staggered DID and policy...
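For orientation, staggered DID designs of this kind are commonly written as a two-way fixed-effects regression. The specification below is a generic textbook form, not the paper's reported equation, and every symbol in it is an assumption introduced here for illustration.

```latex
Y_{it} = \alpha + \beta\, D_{it} + \gamma^{\top} X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
```

Here $Y_{it}$ is the outcome for firm $i$ in year $t$, $D_{it}$ equals 1 once firm $i$ is covered by the AIIAPZ policy and 0 otherwise, $X_{it}$ collects firm-level controls, $\mu_i$ and $\lambda_t$ are firm and year fixed effects, and $\beta$ is the DID estimate of the policy effect.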
This study uses semi-structured interviews with 10 practitioners to examine perceptions of collaborating with human versus AI teammates.
Methods statement in the paper: semi-structured interviews; sample size explicitly reported as 10 practitioners.
high null result Bridging the Socio-Emotional Gap: The Functional Dimension o... methodological description (data collection approach)
The study is based on a qualitative analysis of recent academic literature, comparative analysis of sector-specific applications of Big Data technologies, and synthesis of empirical findings from international studies using a systemic and structural analysis approach.
Methodological statement within the paper describing data sources and analytic approach; not an empirical claim about outcomes.
high null result Implications of Big Data Technologies for the Resilience of ... methodological approach (literature synthesis, comparative analysis, systemic/st...
Society 5.0 and Industry 5.0 call for human-centric technology integration, but the concept lacks an operational definition that can be measured, optimized, or evaluated at the firm level.
Motivating claim grounded in literature gap analysis presented in the paper (argument that normative frameworks lack formal, operational metrics at firm level).
high null result From Automation to Augmentation: A Framework for Designing H... operationalizability/measurability of 'human-centricity' at firm level
We propose the Workplace Augmentation Design Index (WADI), a 36-item theory-grounded instrument for diagnosing human-centricity at the firm level.
Instrument design/proposal presented in the paper (36 items mapped to the five workplace-design dimensions); no validation sample reported in the abstract.
high null result From Automation to Augmentation: A Framework for Designing H... diagnosis/measurement of firm-level human-centric workplace design
We conducted a PRISMA-guided systematic review of 120 papers (screened from 6,096 records) to map the evidence base for each workplace-design dimension.
Systematic literature review using PRISMA protocol; final sample = 120 papers; initial records screened = 6,096.
high null result From Automation to Augmentation: A Framework for Designing H... coverage/evidence for each workplace-design dimension in the literature
Existing models of human-AI complementarity treat the augmentation function phi(D) as exogenous and thus ignore that two firms with identical technology investments can achieve radically different augmentation outcomes depending on workplace organization.
Argument based on literature review of prior models (the paper contrasts its approach with existing complementarity models). No new empirical sample reported for this specific claim.
high null result From Automation to Augmentation: A Framework for Designing H... augmentation outcomes (human-AI augmentation productivity)
A subset of four datasets included settings in which the AI provided explanations of its decision.
Paper states that four of the datasets involved AI explanations (explicitly stated in abstract).
high null result Beyond AI advice -- independent aggregation boosts human-AI ... presence_of_AI_explanation
The study compared HCT to the AI-as-advisor approach using 10 datasets from various domains, including medical diagnostics and misinformation discernment.
Paper reports an empirical comparison across 10 datasets spanning multiple domains (explicitly stated in abstract).
The hybrid confirmation tree (HCT) elicits a human judgment and an AI judgment independently; if they agree that decision is accepted, and if they disagree a second human breaks the tie.
Description of the HCT method in the paper (procedural/design specification).
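The decision rule described above can be sketched in a few lines of Python. This is an illustrative reading of the procedure as stated here, not the authors' implementation. The function and parameter names are hypothetical, and the second human is modelled as a callable so that the tie-breaking judgment is only elicited on disagreement.

```python
from typing import Callable, Hashable


def hybrid_confirmation_tree(
    first_human: Hashable,
    ai: Hashable,
    second_human: Callable[[], Hashable],
) -> Hashable:
    """Combine independently elicited human and AI judgments.

    If the first human and the AI agree, their shared decision is accepted.
    If they disagree, a second human is consulted and breaks the tie.
    """
    if first_human == ai:
        return first_human
    # Disagreement: only now is the second human's judgment requested.
    return second_human()


if __name__ == "__main__":
    # Agreement: the second human is never asked.
    print(hybrid_confirmation_tree("benign", "benign", lambda: "malignant"))
    # Disagreement: the second human's judgment decides.
    print(hybrid_confirmation_tree("benign", "malignant", lambda: "malignant"))
```

Passing the second judgment as a callable reflects that the extra human effort is spent only on cases where the first human and the AI disagree.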
We implement a rigorously controlled execution-based testbed featuring Git worktree isolation and explicit global memory to evaluate agent coordination frameworks.
Methodological description in the paper indicating the testbed design choices (Git worktree isolation, explicit global memory) used to ensure controlled, reproducible execution of agent-generated code.
high null result An Empirical Study of Multi-Agent Collaboration for Automate... experimental reproducibility and isolation (testbed design)
We benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs) using a rigorously controlled, execution-based testbed.
Description of experimental setup in the paper: an execution-based testbed with Git worktree isolation and explicit global memory; experiments explicitly compare single-agent, subagent, and agent-team architectures under fixed computational time budgets.
high null result An Empirical Study of Multi-Agent Collaboration for Automate... comparative performance of agent architectures (benchmarking setup)
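Git worktree isolation generally means giving each agent its own checked-out working directory and branch backed by one shared repository. The Python sketch below shows one way this could look. It is an assumption about the general technique, not the paper's testbed code. The helper names, the branch naming scheme, and the directory layout are hypothetical; only the `git worktree add -b <branch> <path> <commit-ish>` command form is standard Git.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, agent_id: str, base_ref: str = "HEAD") -> list[str]:
    """Build the git command that creates an isolated worktree for one agent.

    Hypothetical layout: a sibling directory named <repo>-wt-<agent_id>,
    checked out on a new branch agent/<agent_id> starting at base_ref.
    """
    worktree_path = repo.parent / f"{repo.name}-wt-{agent_id}"
    branch = f"agent/{agent_id}"
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", branch,
        str(worktree_path),
        base_ref,
    ]


def create_worktree(repo: Path, agent_id: str, base_ref: str = "HEAD") -> Path:
    """Run the command and return the path of the new worktree."""
    cmd = worktree_add_command(repo, agent_id, base_ref)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
    return Path(cmd[-2])
```

Each agent then edits and executes code only inside its own worktree, so concurrent agents do not overwrite one another's files while still sharing one commit history.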
Methods combine targeted literature synthesis, comparative conceptual analysis, and framework building (with recent scholarly and institutional sources reviewed).
Explicit methodological statement in the paper describing the review and analytic approach; no primary-data methods used.
high null result Behavioral Factors as Determinants of Successful Scaling of ... methodological approach (literature synthesis and conceptual framework developme...
AI coding assistants are a high-visibility class of corporate AI and are given special attention as an illustrative case in the paper.
Paper specifically calls out AI coding assistants as a focal example in the conceptual analysis and discussion; based on literature review rather than original measurement.
high null result Behavioral Factors as Determinants of Successful Scaling of ... role of coding assistants as illustrative case for scaling and behavioral dynami...
The Article translates these insights into risk-sensitive guideposts for modernizing governance of AI-enabled tools and emerging modalities, from agentic systems to blockchain-deployed smart contracts.
Prescriptive/conceptual policy guidance presented in the Article (normative recommendations; governance framework).
high null result Rewired: Reconceptualizing Legal Services for the AI Age provision of governance guideposts for AI-enabled legal technologies
The Innovation Frontier traces LegalTech’s evolution from 2000s-vintage e-discovery to generative AI.
Historical/chronological analysis in the Article (literature review/history of LegalTech provided by authors).
high null result Rewired: Reconceptualizing Legal Services for the AI Age narrative/historical scope of LegalTech evolution covered in the Article
The Legal Services Value Chain disaggregates the lifecycle of a legal matter into five distinct nodes of activity.
Model description in the Article (conceptual architecture; decomposition of legal work).
high null result Rewired: Reconceptualizing Legal Services for the AI Age number and structure of nodes in the proposed value-chain model
The Article develops two core organizing models: the Legal Services Value Chain and the Innovation Frontier.
Explicit claim in the Article describing conceptual/model contributions (theoretical/model-building).
high null result Rewired: Reconceptualizing Legal Services for the AI Age presence of two organizing conceptual models in the Article
This Article provides a practical framework for navigating the shifting terrain of legal innovation and AI.
Statement of purpose in the Article (conceptual contribution; framework development). No empirical validation reported in the excerpt.
high null result Rewired: Reconceptualizing Legal Services for the AI Age existence of a practical framework for legal-AI governance and strategy
Three interlocking threads characterize AI for science: (1) AI as research instrument, (2) AI for research infrastructure, and (3) the reshaping of scholarly profiles and incentives by machine-readable metrics.
Conceptual framework presented in the paper; organization of topics rather than empirical measurement. The paper indicates these threads are followed through historical and contemporary examples.
high null result A Brief History of AI for Scientific Discovery: Open Researc... conceptual decomposition of AI-for-science developments
The history of artificial intelligence for scientific discovery is not a two-year story about chatbots learning to write papers; it is a sixty-year story beginning with DENDRAL (1965).
Historical narrative / literature review citing early systems such as DENDRAL (1965) and subsequent developments in scholarly infrastructure (arXiv, Google Scholar, ORCID). No empirical sample or statistical test reported.
high null result A Brief History of AI for Scientific Discovery: Open Researc... historical scope and timeline of AI for scientific discovery
Both the positive (approach) and negative (avoidance) AI job crafting pathways failed to significantly affect life satisfaction, indicating domain specificity of AI-related psychological mechanisms.
Analysis of the same multi-source, multi-wave dataset of 287 employee–leader dyads; tests of effects on life satisfaction showed non-significant results for both pathways.
For readers less familiar with the Bayesian and decision-theoretic language, key terms are defined in a glossary at the end of the article.
Statement about the article's structure and supporting material (presence of glossary noted in the article).
high null result Retraining as Approximate Bayesian Inference availability of glossary/terminology definitions
The gap between a continuously updated belief state and your frozen deployed model is 'learning debt.'
Terminology/definition introduced by the author in the article (glossary and definitional exposition).
high null result Retraining as Approximate Bayesian Inference definition/labeling of model staleness
Model retraining is usually treated as an ongoing maintenance task.
Author's descriptive claim in the article; presented as an observation about prevailing practice (no empirical sample or data reported).
high null result Retraining as Approximate Bayesian Inference how retraining is operationalized (treated as maintenance)
The study was conducted by the Mohammed bin Rashid School of Government’s Future of Government Center, in collaboration with global AI pioneers.
Authorship and collaboration statement in the report.
high null result Charting AI Governance Future in the Arab Region: A Policy R... institutional authorship and collaboration on the study
The report highlights the key findings of a field study covering ten Arab countries to explore the realities and challenges of AI governance.
Report statement describing the geographic scope of the field study (explicitly: ten Arab countries).
high null result Charting AI Governance Future in the Arab Region: A Policy R... geographic coverage of the field study (number of countries)
The recommendations are based on regional research that included hundreds of leaders active in the AI domains, from the public and private sectors.
Report statement claiming participant base of the underlying research (described as 'hundreds of leaders').
high null result Charting AI Governance Future in the Arab Region: A Policy R... scope and participant coverage of the underlying research
Data sources include field research conducted in 2024 and public reports from the Ministry of Industry and Information Technology and the National Bureau of Statistics.
Paper statement describing data provenance: field surveys in 2024 (n=326) plus public reports from MIIT and National Bureau of Statistics.
high null result Research on the Adoption of Artificial Intelligence and Proc... data provenance / sources
The visualization avoided redistributing value.
Reported result from the within-subjects experiment (N=32) stating that the visualization did not redistribute value between parties (i.e., it improved outcomes/efficiency without changing value split).
high null result From Overload to Convergence: Supporting Multi-Issue Human-A... distribution of value between negotiating parties (value split / surplus allocat...
Human-like presentations did not raise conformity pressure.
Reported experimental result: manipulation of presentation style (human-like vs. not) and measurement of conformity pressure; the abstract states that human-like presentation increased perceived usefulness/agency without increasing conformity pressure. No quantitative details provided in abstract.
Larger panels yielded no gains in accuracy relative to a single AI.
Reported experimental comparison manipulating panel size in the study (three tasks). The abstract states that larger panels did not produce accuracy gains versus a single AI. (No sample size or numerical effect reported in abstract.)
The paper proposes an original 'Revenue-Sharing as Infrastructure' (RSI) model in which the platform offers its AI infrastructure for free and takes a percentage of the revenues generated by developers' applications, reversing the traditional upstream payment logic.
Theoretical model proposal and conceptual description in the paper; presented as original contribution (no empirical implementation reported).
high null result Revenue-Sharing as Infrastructure: A Distributed Business Mo... business model design (revenue-sharing vs pay-upfront)
Recent literature distinguishes three generations of business models: a first generation modeled on cloud computing (pay-per-use), a second characterized by diversification (freemium, subscriptions), and a third, emerging generation exploring multi-layer market architectures with revenue-sharing mechanisms.
Literature review and conceptual synthesis presented in the paper; no empirical study or sample reported.
high null result Revenue-Sharing as Infrastructure: A Distributed Business Mo... classification of business model generations
We evaluate our approach on spapi, a production in-vehicle API system at Volvo Group involving 192 endpoints, 420 properties, and 776 CAN signals across six functional domains.
Case study / evaluation dataset description (explicit counts provided in paper).
high null result LLM-Powered Workflow Optimization for Multidisciplinary Soft... evaluation dataset scale and scope (endpoints, properties, CAN signals, domains)
We document a systematic pattern we call the 'Intent-Source Divide' (experiential vs transactional intent is associated with different source mixes).
Labeling of the observed consistent association between query intent (experiential vs transactional) and citation-source mix in the audited dataset of Google Gemini responses.
high null result The End of Rented Discovery: How AI Search Redistributes Pow... association between query intent and source mix
We audit 1,357 grounding citations from Google Gemini across 156 hotel queries in Tokyo.
Manual audit of Google Gemini grounding citations for 156 hotel queries in Tokyo; counted 1,357 grounding citations.
high null result The End of Rented Discovery: How AI Search Redistributes Pow... number of grounding citations audited
This study uses a mixed-method research design combining quantitative ROI modelling and cost–benefit analysis, qualitative synthesis of secondary enterprise case studies, and architectural analysis of Azure-native GenAI services.
Explicit methodological description in the abstract of the paper.
high null result Measuring Business ROI of Generative AI Adoption on Azure Cl... research design / methods
This Article presents the results of an experiment in which a transcript of a hypothetical client interview involving potential disability discrimination, retaliation, and wrongful termination claims was submitted to each AI system, with prompts requesting identification and assessment of viable legal theories.
Methodological description of the experiment: one hypothetical client interview transcript fed to each of four AI engines with prompts to identify and assess legal theories.
high null result Robot Wingman: Using AI to Assess an Employment Termination experimental procedure (input and prompts)
The paper derives formal conditions under which the inversion (smaller, orchestrated models outperforming frontier models) holds.
Mathematical derivations and stated sufficient/necessary conditions presented in the paper.
high null result Punctuated Equilibria in Artificial Intelligence: The Instit... parameter conditions for comparative performance inversion
We develop the Institutional Fitness Manifold, a mathematical framework that evaluates AI systems along four dimensions: capability, institutional trust, affordability, and sovereign compliance.
Theoretical/model development presented in the paper (formal definition of the manifold and its four dimensions).
high null result Punctuated Equilibria in Artificial Intelligence: The Instit... institutional fitness evaluated across four dimensions
There have been five eras of AI development since 1943, and within the current Generative AI Era there are four distinct epochs, each initiated by a discontinuous event.
Descriptive/historical classification within the paper (counts of eras and epochs; named initiating events such as the transformer and the 'DeepSeek Moment').
high null result Punctuated Equilibria in Artificial Intelligence: The Instit... count and classification of historical AI eras/epochs
Despite fears of mass unemployment, aggregate labor-market data through 2025 show limited labor-market disruption from generative AI.
Review of aggregate employment and labor-market studies and macro-level data through 2025 cited in the brief; methods include analyses of employment statistics and macro labor indicators (no single sample size reported).
high null result AI, Productivity, and Labor Markets: A Review of the Empiric... aggregate employment / labor-market disruption
We scored rule-breaking and abuse outcomes with an independent rubric-based judge across 28,112 transcript segments from multi-agent governance simulations.
Reported methodology: multi-agent governance simulations with agents in formal governmental roles, outcomes evaluated by an independent rubric-based judge; explicit sample count of 28,112 transcript segments.
high null result I Can't Believe It's Corrupt: Evaluating Corruption in Multi... rule-breaking and abuse outcomes (as assessed by rubric-based judge)
Controlled experiments were run with N = 250 across five content types to validate the mechanisms.
Experimental methods reported in the paper: controlled experiments with specified sample size and content-type breakdown.
high null result Governed Memory: A Production Architecture for Multi-Agent W... experimental sample size and content-type breadth (N=250, 5 content types)