The Commonplace

Evidence (13870 claims)

Adoption (8467 claims)
Productivity (7558 claims)
Governance (6805 claims)
Human-AI Collaboration (6363 claims)
Org Design (4132 claims)
Innovation (4065 claims)
Labor Markets (3526 claims)
Skills & Training (2945 claims)
Inequality (2066 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 196 98 892 1984
Governance & Regulation 817 394 188 121 1544
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 627 233 123 96 1088
Research Productivity 411 123 56 332 933
Output Quality 467 178 59 47 751
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 167 122 24 496
Task Allocation 207 64 71 32 379
Skill Acquisition 165 59 60 17 301
Innovation Output 203 27 43 18 292
Employment Level 105 52 107 13 279
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 150 48 26 3 227
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 63 20 12 184
Error Rate 69 92 10 2 173
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 93 21 13 19 148
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Creative Output 31 17 7 3 59
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
Zero-shot baselines and standard retrieval stagnate around 50-60% accuracy across model generations on the graduate-level final exam.
Pilot study on a full graduate-level final exam comparing zero-shot and standard retrieval baselines across model generations; reported accuracy range is ~50-60%. The exact number of exam questions and of models compared is not stated.
high null result From 50% to Mastery in 3 Days: A Low-Resource SOP for Locali... exam accuracy (percentage correct)
Afriat's theorem guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint.
Theoretical claim citing Afriat's theorem (mathematical result used as foundational justification in the paper).
high null result GARP-EFM: Improving Foundation Models with Revealed Preferen... logical equivalence between GARP and utility-maximizing demand
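Afriat's theorem turns GARP into a condition that can be checked on finite data. A minimal sketch of the standard check, assuming observations are given as lists of prices and chosen quantities (function names are illustrative, not from the paper): build the direct revealed-preference relation, take its transitive closure with Warshall's algorithm, and flag any pair in which bundle t is (indirectly) revealed preferred to bundle s while bundle t was strictly cheaper than bundle s at the prices under which s was chosen.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def satisfies_garp(prices, bundles):
    """Return True if the observed (price, bundle) pairs satisfy GARP.

    prices[t] and bundles[t] are equal-length sequences for observation t.
    """
    n = len(prices)
    # expend[t][s] = cost of bundle s at the prices of observation t
    expend = [[dot(prices[t], bundles[s]) for s in range(n)] for t in range(n)]
    # Direct revealed preference: t R0 s if bundle s was affordable when t was chosen
    reach = [[expend[t][t] >= expend[t][s] for s in range(n)] for t in range(n)]
    # Warshall's algorithm: transitive closure R of R0
    for k in range(n):
        for t in range(n):
            if reach[t][k]:
                for s in range(n):
                    if reach[k][s]:
                        reach[t][s] = True
    # GARP: t R s must not coexist with s being strictly directly revealed preferred to t
    for t in range(n):
        for s in range(n):
            if reach[t][s] and expend[s][s] > expend[s][t]:
                return False
    return True
```

For example, with prices [[2, 1], [1, 2]], choosing [2, 1] and then [1, 2] (more of the expensive good each time) violates GARP, while choosing [1, 2] and then [2, 1] satisfies it.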
We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents.
Methods described in the paper: authors report fine-tuning Chronos-2 on synthetically generated time series from utility-maximizing agents (methodological statement).
high null result GARP-EFM: Improving Foundation Models with Revealed Preferen... model fine-tuning procedure / training data source
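The paper's synthetic-data generator is not described here. As a hypothetical illustration of series "generated from utility-maximizing agents", the sketch below assumes Cobb-Douglas preferences, whose optimal demand has the closed form x_i = alpha_i * m / p_i, and lets prices and income follow random walks in logs. The function names and all parameter values are placeholders, not the authors' setup.

```python
import math
import random


def cobb_douglas_demand(alphas, prices, income):
    """Optimal bundle for u(x) = prod(x_i ** alpha_i), assuming sum(alphas) == 1."""
    return [a * income / p for a, p in zip(alphas, prices)]


def synthetic_series(alphas, periods, seed=0, price_vol=0.05, income_vol=0.02):
    """Hypothetical generator: log-random-walk prices and income, optimal demand each period.

    Returns a list of (prices, income, bundle) tuples, one per period.
    """
    rng = random.Random(seed)
    prices = [1.0] * len(alphas)
    income = 100.0
    series = []
    for _ in range(periods):
        # Multiplicative shocks keep prices and income strictly positive
        prices = [p * math.exp(rng.gauss(0.0, price_vol)) for p in prices]
        income *= math.exp(rng.gauss(0.0, income_vol))
        series.append((prices, income, cobb_douglas_demand(alphas, prices, income)))
    return series
```

Each bundle exhausts the budget and keeps expenditure shares equal to the alphas, so the generated data are consistent with utility maximization (and hence GARP) by construction.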
This yields a common scale (bits of usable information) for comparing a wide range of interventions, contexts, and models.
Theoretical implication of the authors' formalization combining Bayesian persuasion and V-usable information (paper argues for a common information scale measured in bits).
high null result Mecha-nudges for Machines bits of usable information as a comparability metric
To formalize mecha-nudges, we combine the Bayesian persuasion framework with V-usable information, a generalization of Shannon information that is observer-relative.
Methodological/theoretical development described in the paper (formal combination of two theoretical frameworks).
high null result Mecha-nudges for Machines formal representation of information available to observers/agents
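For reference, the standard definition of V-usable information (Xu et al., 2020): given a predictive family V (the set of predictors an observer can use), where f[X] is a predictor's distribution over Y given side information X and f[∅] is its prediction without it, the V-entropies and the usable information are defined below. Base-2 logarithms give the result in bits; when V is unrestricted, the quantity reduces to Shannon mutual information. How the paper combines this with Bayesian persuasion is not reproduced here.

```latex
H_{\mathcal{V}}(Y \mid \varnothing) = \inf_{f \in \mathcal{V}} \mathbb{E}\big[-\log_2 f[\varnothing](Y)\big],
\qquad
H_{\mathcal{V}}(Y \mid X) = \inf_{f \in \mathcal{V}} \mathbb{E}\big[-\log_2 f[X](Y)\big],
\\
I_{\mathcal{V}}(X \to Y) = H_{\mathcal{V}}(Y \mid \varnothing) - H_{\mathcal{V}}(Y \mid X).
```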
We introduce mecha-nudges: changes to how choices are presented that systematically influence AI agents without degrading the decision environment for humans.
Conceptual/definitional contribution made in the paper (novel concept introduced by authors).
high null result Mecha-nudges for Machines influence on AI agents while preserving human decision environment
Nudges are subtle changes to the way choices are presented to human decision-makers (e.g., opt-in vs. opt-out by default) that shift behavior without restricting options or changing incentives.
Background/definition stated in the paper (conceptual; references to standard behavioral-economics definition of nudges).
high null result Mecha-nudges for Machines behavioral response to choice presentation
Data sources include field research conducted in 2024 and public reports from the Ministry of Industry and Information Technology and the National Bureau of Statistics.
Paper statement describing data provenance: field surveys in 2024 (n=326) plus public reports from MIIT and National Bureau of Statistics.
high null result Research on the Adoption of Artificial Intelligence and Proc... data provenance / sources
The visualization avoided redistributing value.
Reported result from the within-subjects experiment (N=32) stating that the visualization did not redistribute value between parties (i.e., it improved outcomes/efficiency without changing value split).
high null result From Overload to Convergence: Supporting Multi-Issue Human-A... distribution of value between negotiating parties (value split / surplus allocat...
We conduct an in-depth case study of SWE-bench GitHub issue resolution using two representative models, GPT-5 mini and DeepSeek v3.2.
Descriptive: authors report running an in-depth case study on the SWE-bench GitHub issue resolution dataset using two named models (GPT-5 mini and DeepSeek v3.2).
high null result Computational Arbitrage in AI Model Markets execution of a case study on SWE-bench GitHub issue resolution with two named mo...
Human-like presentations did not raise conformity pressure.
Reported experimental result: presentation style (human-like vs. not) was manipulated and conformity pressure was measured; the abstract states that human-like presentation increased perceived usefulness/agency without increasing conformity pressure. No quantitative details are provided in the abstract.
Larger panels yielded no gains in accuracy relative to a single AI.
Reported experimental comparison manipulating panel size in the study (three tasks). The abstract states that larger panels did not produce accuracy gains versus a single AI. (No sample size or numerical effect reported in abstract.)
The authors construct a mean-reverting jump-diffusion stochastic process model and conduct Monte Carlo simulations to evaluate hedging efficiency of the proposed futures contracts.
Methodological claim: explicit description of the mathematical model (mean-reverting jump-diffusion) and simulation method (Monte Carlo) used in the paper.
high null result AI Token Futures Market: Commoditization of Compute and Deri... hedging efficiency (as evaluated via simulation)
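A minimal Monte Carlo sketch of this kind of evaluation, under assumptions that are ours rather than the paper's: log-price follows an Ornstein-Uhlenbeck process with normally distributed jumps arriving at Poisson times (Euler discretization); the hedger's actual compute cost equals the contract's settlement price times (1 + independent basis noise); and hedging efficiency is measured Ederington-style as 1 - Var(hedged) / Var(unhedged). All parameter values and the basis assumption are hypothetical.

```python
import math
import random
import statistics


def terminal_price(rng, s0=1.0, kappa=2.0, theta=0.0, sigma=0.3,
                   jump_rate=3.0, jump_mean=0.0, jump_std=0.2,
                   horizon=1.0, steps=50):
    """Simulate S_T under a mean-reverting jump-diffusion in log-price (Euler scheme).

    d(log S) = kappa * (theta - log S) dt + sigma dW + J dN,
    with N a Poisson process of intensity jump_rate and J ~ Normal(jump_mean, jump_std).
    """
    dt = horizon / steps
    x = math.log(s0)
    for _ in range(steps):
        x += kappa * (theta - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if rng.random() < jump_rate * dt:  # approximation: at most one jump per small step
            x += rng.gauss(jump_mean, jump_std)
    return math.exp(x)


def hedging_efficiency(n_paths=2000, basis_std=0.05, seed=0):
    """Ederington-style efficiency: 1 - Var(hedged cost) / Var(unhedged cost)."""
    rng = random.Random(seed)
    spot = [terminal_price(rng) for _ in range(n_paths)]
    f0 = statistics.fmean(spot)  # assumed futures price agreed at inception
    unhedged, hedged = [], []
    for s_t in spot:
        cost = s_t * (1.0 + rng.gauss(0.0, basis_std))  # hypothetical basis noise
        unhedged.append(cost)
        hedged.append(cost - (s_t - f0))  # long one contract settled at S_T
    return 1.0 - statistics.pvariance(hedged) / statistics.pvariance(unhedged)
```

With no basis noise the futures leg offsets the price risk completely and the efficiency is 1; basis noise pushes it below 1.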
The paper proposes an original 'Revenue-Sharing as Infrastructure' (RSI) model in which the platform offers its AI infrastructure for free and takes a percentage of the revenues generated by developers' applications, reversing the traditional upstream payment logic.
Theoretical model proposal and conceptual description in the paper; presented as original contribution (no empirical implementation reported).
high null result Revenue-Sharing as Infrastructure: A Distributed Business Mo... business model design (revenue-sharing vs pay-upfront)
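To make the reversal of the payment logic concrete, a toy comparison with hypothetical numbers: under pay-per-use the developer pays for infrastructure usage regardless of revenue, while under revenue sharing the developer pays nothing until the application earns. The break-even revenue is where the two payments coincide. This is an illustrative calculation with made-up function names, not the paper's model.

```python
def pay_per_use_cost(units, price_per_unit):
    """Traditional upstream model: developer pays for usage regardless of revenue."""
    return units * price_per_unit


def revenue_share_cost(revenue, share):
    """RSI-style model: infrastructure is free; platform takes a share of app revenue."""
    return share * revenue


def breakeven_revenue(units, price_per_unit, share):
    """App revenue at which both models cost the developer the same amount."""
    return pay_per_use_cost(units, price_per_unit) / share
```

For instance, 1,000 usage units at 0.02 each cost 20 up front; with a 10% share, revenue sharing is cheaper for the developer below 200 in revenue and costs nothing when revenue is zero, which shifts early-stage risk to the platform.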
Recent literature distinguishes three generations of business models: a first generation modeled on cloud computing (pay-per-use), a second characterized by diversification (freemium, subscriptions), and a third, emerging generation exploring multi-layer market architectures with revenue-sharing mechanisms.
Literature review and conceptual synthesis presented in the paper; no empirical study or sample reported.
high null result Revenue-Sharing as Infrastructure: A Distributed Business Mo... classification of business model generations
Capital income taxes, worker equity participation, universal basic income, upskilling, and Coasian bargaining cannot eliminate the excess automation.
Model-based policy counterfactuals evaluated in the paper showing these interventions fail to achieve the social optimum in the theoretical framework; no empirical sample.
high null result The AI Layoff Trap effectiveness of listed policies at preventing excessive automation / preserving...
Wage adjustments and free entry cannot eliminate the excess automation.
Analytical result in the model showing endogenous wage changes and free entry do not restore the socially optimal level of employment; theoretical equilibrium analysis, no empirical data.
high null result The AI Layoff Trap ability of wage adjustments and free entry to correct excessive automation / res...
We analyze a regional standardized sentiment database (97,719 responses).
Dataset description in the paper specifying the size of the standardized sentiment database.
high null result Engineering Distributed Governance for Regional Prosperity: ... data sample size (sentiment responses)
We analyze a raw Fukui spending database (90,350 records).
Dataset description in the paper specifying the size of the raw Fukui spending database.
high null result Engineering Distributed Governance for Regional Prosperity: ... data sample size (spending records)
We evaluate our approach on spapi, a production in-vehicle API system at Volvo Group involving 192 endpoints, 420 properties, and 776 CAN signals across six functional domains.
Case study / evaluation dataset description (explicit counts provided in paper).
high null result LLM-Powered Workflow Optimization for Multidisciplinary Soft... evaluation dataset scale and scope (endpoints, properties, CAN signals, domains)
The analysis relies on partial least squares path modeling (PLS-PM) to test eight predictions linking technological perceptions, organizational factors, and adoption outcomes.
Author-stated analytical method: PLS-PM; eight predictions tested; uses the paper's cross-sectional survey data (523 respondents from 184 organizations).
high null result Artificial Intelligence Adoption in Talent Acquisition: Effe... analytical approach / hypothesis testing
The study uses cross-sectional survey data from 523 human resource professionals and hiring managers representing 184 organizations across multiple industries in the United States.
Author-stated sample description in the paper: cross-sectional survey; 523 HR professionals/hiring managers; 184 organizations; multiple industries; U.S.
high null result Artificial Intelligence Adoption in Talent Acquisition: Effe... sample composition / data source
Each task is evaluated under three agent configurations (no-skills, LLM-generated skills, and human-expert skills) and validated through real hardware execution.
Experimental design described in the paper specifying three agent configurations per task and hardware validation of task runs.
high null result Skilled AI Agents for Embedded and IoT Systems Development evaluation configuration and validation modality
IoT-SkillsBench spans three representative embedded platforms, 23 peripherals, and 42 tasks across three difficulty levels.
Benchmark composition statistics reported in the paper (counts of platforms, peripherals, tasks, and difficulty levels).
high null result Skilled AI Agents for Embedded and IoT Systems Development benchmark scope (platforms, peripherals, tasks, difficulty levels)
We introduce a skills-based agentic framework for HIL embedded development together with IoT-SkillsBench, a benchmark designed to systematically evaluate AI agents in real embedded programming environments.
Methodological contribution described in the paper (introduction of framework and benchmark; the paper reports design and implementation).
high null result Skilled AI Agents for Embedded and IoT Systems Development availability of a skills-based agentic framework and benchmark
The cooperative video game KeyWe, with a scripted agent, served as a valid testbed for studying human-agent teamwork and the effects of the training intervention.
Methodological choice: KeyWe was used as the experimental environment and the agent behavior was scripted for consistency; all behavioral and performance measures were collected within this setting.
high null result Teaming Up With an AI Agent: Training Humans to Develop Huma... experimental_testbed_description
Half of the participants received the teamwork training and half did not (between-subjects comparison).
Experimental design description: participants were split into trained and untrained groups (50/50).
high null result Teaming Up With an AI Agent: Training Humans to Develop Huma... experimental_assignment (trained vs. untrained)
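A minimal sketch of a balanced between-subjects split, assuming simple randomization of a participant list into two equal halves; the study's actual randomization procedure is not stated here, and the function name is illustrative. With an odd number of participants, the extra person lands in the untrained group.

```python
import random


def assign_half(participant_ids, seed=None):
    """Randomly split participants into 'trained' and 'untrained' groups of (near-)equal size."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducible assignment
    half = len(ids) // 2
    return {pid: ("trained" if i < half else "untrained") for i, pid in enumerate(ids)}
```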
The study observes five delivery configurations: a traditional baseline and four successive platform versions (V1–V4).
Study design described by the authors; outcomes measured across these five configurations for the three programs.
high null result Orchestrating Human-AI Software Delivery: A Retrospective Lo... delivery configuration variations (baseline, V1–V4)
The study covers three real software modernization programs: a COBOL banking migration (~30k LOC), a large accounting modernization (~400k LOC), and a .NET/Angular mortgage modernization (~30k LOC).
Study design / sample description provided by the authors in the paper's methods section.
high null result Orchestrating Human-AI Software Delivery: A Retrospective Lo... study programs and codebase sizes (lines of code)
Evidence on AI in software engineering still leans heavily toward individual task completion, while evidence on team-level delivery remains scarce.
Paper's literature-context statement (intro); asserted by the authors as motivation for the study (no primary data supporting this meta-claim provided within the study).
high null result Orchestrating Human-AI Software Delivery: A Retrospective Lo... distribution of prior evidence (individual task vs team-level delivery) in the l...
The research methodology is based on the envelope model ("input" orientation) to assess the level of transformation of labor resources and labor markets due to the spread of artificial intelligence.
Methodological statement in the paper specifying the use of an input-oriented envelope model applied to a sample of European Union countries.
high null result Artificial intelligence as a driver of economic growth: Chal... method of measurement / assessment approach
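Assuming the "envelope model" is data envelopment analysis (DEA), the input-oriented CCR score asks how far a unit's inputs could be shrunk proportionally while still producing its outputs, benchmarked against the best-practice frontier. In the simplest case of one input and one output under constant returns to scale, the linear program collapses to a ratio against the best unit, as sketched below. The multi-input, multi-output case needs an LP solver (for example `scipy.optimize.linprog`), and the input and output variables used for EU countries are not specified here.

```python
def ccr_input_efficiency(inputs, outputs):
    """Input-oriented CCR (constant returns to scale) DEA scores, one input and one output.

    With a single input and a single output, the envelopment LP reduces to
    theta_o = (y_o / x_o) / max_j (y_j / x_j); a score of 1 means the unit is on the frontier.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]
```

For hypothetical inputs [2, 4, 5] and outputs [4, 4, 10], the scores are [1.0, 0.5, 1.0]: the second unit could produce its output with half its input.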
We document a systematic pattern we call the 'Intent-Source Divide' (experiential vs transactional intent is associated with different source mixes).
Label the authors give to the consistent association observed between query intent (experiential vs transactional) and citation-source mix in the audited dataset of Google Gemini responses.
high null result The End of Rented Discovery: How AI Search Redistributes Pow... association between query intent and source mix
We audit 1,357 grounding citations from Google Gemini across 156 hotel queries in Tokyo.
Manual audit of Google Gemini grounding citations for 156 hotel queries in Tokyo; counted 1,357 grounding citations.
high null result The End of Rented Discovery: How AI Search Redistributes Pow... number of grounding citations audited
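The audit works out to about 8.7 grounding citations per query (1,357 / 156). A minimal sketch of the tabulation behind a source-mix comparison, assuming each audited citation has been hand-coded with a query intent and a source type (the category labels in any input are hypothetical): it returns the share of each source type within each intent.

```python
from collections import Counter, defaultdict


def source_mix(citations):
    """Share of each source type within each query intent.

    citations: iterable of (intent, source_type) pairs, one per grounding citation.
    """
    counts = defaultdict(Counter)
    for intent, source in citations:
        counts[intent][source] += 1
    return {
        intent: {src: n / sum(c.values()) for src, n in c.items()}
        for intent, c in counts.items()
    }
```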
The model yields two limits on the speed of learning and adoption: a structural limit determined by prerequisite reachability and an epistemic limit determined by uncertainty about the target.
Theoretical result stated in the paper (model-derived identification of two distinct limiting factors on learning speed).
high null result A Mathematical Theory of Understanding speed of learning / adoption
Teaching is modeled as sequential communication with a latent target.
Modeling assumption explicitly stated in the paper (formalization of teaching in the theoretical framework).
high null result A Mathematical Theory of Understanding model specification (teaching process)
The paper models the learner as a mind: an abstract learning system characterized by a prerequisite structure over concepts.
Modeling assumption explicitly stated in the paper (definition of the 'mind' in the theoretical model).
high null result A Mathematical Theory of Understanding model specification (representation of learner)
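A toy reading of "prerequisite reachability", under assumptions not taken from the paper: concepts form a directed acyclic prerequisite graph, a learner can acquire a concept only once all its prerequisites are known, and one concept is taught per step. Then every unknown concept on an unknown prerequisite chain leading to the target must be taught, so the size of that set is a lower bound on the number of teaching steps, and a topological order achieves it. Function names are hypothetical.

```python
def missing_prerequisites(target, prereqs, known=frozenset()):
    """Unknown concepts that must be learned to reach target (search stops at known ones)."""
    needed, stack = set(), [target]
    while stack:
        concept = stack.pop()
        if concept in needed or concept in known:
            continue
        needed.add(concept)
        stack.extend(prereqs.get(concept, ()))
    return needed


def teaching_order(target, prereqs, known=frozenset()):
    """Teach one concept per step, each only after all its prerequisites are known.

    Under these assumptions the returned order has minimal length, because each
    missing concept has to be taught exactly once.
    """
    learned = set(known)
    remaining = missing_prerequisites(target, prereqs, learned)
    order = []
    while remaining:
        ready = sorted(c for c in remaining if set(prereqs.get(c, ())) <= learned)
        if not ready:
            raise ValueError("prerequisite structure contains a cycle")
        nxt = ready[0]
        order.append(nxt)
        learned.add(nxt)
        remaining.discard(nxt)
    return order
```

For example, with calculus requiring algebra and functions, functions requiring algebra, and algebra requiring arithmetic, a learner who already knows algebra needs only two steps: functions, then calculus.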
The findings provide evidence against concerns that AI mediation undermines people's ability to distinguish truth from lies.
Synthesis of experimental results showing unchanged lie-detection accuracy despite declines in perceived trust/confidence.
high null result Through the Looking-Glass: AI-Mediated Video Communication R... ability to distinguish truth from lies (lie-detection accuracy)
Participants were no more inclined to suspect those using AI tools of lying.
Experimental comparisons assessing participants' propensity to suspect AI-mediated speakers of deception showed no increase in suspicion for users of AI tools.
high null result Through the Looking-Glass: AI-Mediated Video Communication R... inclination to suspect AI-mediated speakers of lying
Participants' actual judgment accuracy (ability to detect lies) remained unchanged across AI-mediated and non-AI-mediated videos.
Primary experimental result comparing lie-detection accuracy (truthful vs deceptive statements) across the three AI mediation conditions in the preregistered experiments (N = 2,000).
high null result Through the Looking-Glass: AI-Mediated Video Communication R... judgment accuracy (lie-detection accuracy)
We conducted two preregistered online experiments (N = 2,000).
Methods statement in the paper: two preregistered online experiments with a combined sample size of 2,000 participants.
high null result Through the Looking-Glass: AI-Mediated Video Communication R... study design / sample size (methodological claim)
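A hedged sketch of one way to compare lie-detection accuracy between two conditions: a pooled two-proportion z-test on counts of correct judgments. It assumes independent binary judgments; if each participant rated several videos, a mixed-effects model would be more appropriate, and the paper's actual analysis is not stated here. A non-significant difference is also not by itself evidence of equivalence; an equivalence test such as TOST is the usual tool for supporting an "unchanged" conclusion.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Any counts passed to this function in an illustration are hypothetical; for example, 500 of 1,000 correct in both conditions gives z = 0 and p = 1.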
The study collected data from 293 questionnaire respondents and 12 interview participants.
Mixed-methods data collection reported in the paper: n=293 survey respondents and n=12 interviewees.
high null result The Impact of Artificial Intelligence on Financial Inclusion... study sample / data collection
The study synthesises findings from 36 peer-reviewed articles published between 2015 and 2025.
Systematic literature synthesis / review of peer-reviewed articles; sample = 36 articles (2015–2025) as stated in the paper.
high null result The Influence of Automation on Tax Compliance Behaviour scope of evidence base (number of articles reviewed)
This research deepens theoretical understanding by integrating CE principles, Industry 4.0 architectures, green innovation theory, and lifecycle assessment into a unified conceptual framework.
Authors' description of theoretical contribution in the abstract, based on their synthesis of the bibliometric and systematic review findings.
high null result Artificial intelligence as a catalyst for the circular econo... conceptual/theoretical integration (framework development)
This study offers the first comprehensive mixed-methods assessment of how AI transforms industrial production ecosystems in the post-ChatGPT era.
Authors' methodological/novelty claim in the abstract; supported by description of methods (bibliometric analysis of 196 articles and systematic review of 104 studies).
high null result Artificial intelligence as a catalyst for the circular econo... novelty / comprehensiveness of the study
We construct a multidimensional energy justice index to analyze AI’s net effects, pathways, and institutional dependencies.
Methodological statement: authors create an energy justice index (multidimensional) used as dependent variable in empirical analysis.
high null result Artificial intelligence adoption for advancing energy justic... multidimensional energy justice index
This study uses a panel dataset for 30 Chinese provinces from 2008 to 2022.
Statement of dataset coverage in the paper: 30 provinces, years 2008–2022 (panel data).
high null result Artificial intelligence adoption for advancing energy justic... dataset coverage (30 provinces, 2008–2022)
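With 30 provinces over the 15 years 2008 to 2022, the panel has up to 450 province-year observations. The components and weights of the energy justice index are not given here; a common construction, assumed in this sketch, min-max normalizes each indicator across the whole panel, flips indicators where lower is better, and averages them with equal weights. Indicator names and the function name are hypothetical.

```python
def composite_index(rows, indicators, higher_is_better):
    """Equal-weight composite index from min-max normalized indicators.

    rows: list of dicts (e.g. one per province-year) holding raw indicator values.
    higher_is_better: dict mapping indicator -> bool (False flips the scale).
    """
    bounds = {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in indicators}
    scores = []
    for r in rows:
        parts = []
        for k in indicators:
            lo, hi = bounds[k]
            z = (r[k] - lo) / (hi - lo) if hi > lo else 0.0  # constant indicator -> 0
            parts.append(z if higher_is_better[k] else 1.0 - z)
        scores.append(sum(parts) / len(parts))
    return scores
```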
This study uses a mixed-method research design combining quantitative ROI modelling and cost–benefit analysis, qualitative synthesis of secondary enterprise case studies, and architectural analysis of Azure-native GenAI services.
Explicit methodological description in the abstract of the paper.
high null result Measuring Business ROI of Generative AI Adoption on Azure Cl... research design / methods
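The ROI and cost-benefit components of such a design typically reduce to two formulas, sketched here with hypothetical cash flows: simple ROI as (benefit - cost) / cost, and the net present value of yearly net cash flows at an assumed discount rate. Nothing here reflects Azure pricing or the paper's figures.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now (t = 0), later entries yearly."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def simple_roi(total_benefit, total_cost):
    """(benefit - cost) / cost, as a fraction."""
    return (total_benefit - total_cost) / total_cost
```

For example, an up-front cost of 100 followed by two yearly net benefits of 60 has an undiscounted NPV of 20; discounting lowers it.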
Ninety-five high-quality studies were analyzed using principal component analysis and k-means clustering.
Paper states screening produced 95 high-quality studies which were subjected to PCA and k-means clustering for analysis.
high null result AI Governance Risk Tiering for Sustainable Digital Infrastru... number of studies analyzed and analytical methods applied
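A minimal sketch of the clustering step, assuming each study has already been reduced to a numeric feature vector (for instance its leading principal-component scores; the PCA itself would normally come from a linear-algebra library such as NumPy and is omitted). This is plain Lloyd's k-means with random initialization; the paper's features, number of clusters, and initialization are not stated here.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: centroid = mean of its assigned points (keep old one if empty)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[j])
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, labels
```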
A systematic literature review of 450 records from major databases was conducted using PRISMA 2020 guidelines.
Statement in the paper describing methods: systematic literature review using PRISMA 2020; initial search returned 450 records from major databases.
high null result AI Governance Risk Tiering for Sustainable Digital Infrastru... number of records screened in systematic review
This Article presents the results of an experiment in which a transcript of a hypothetical client interview involving potential disability discrimination, retaliation, and wrongful termination claims was submitted to each AI system, with prompts requesting identification and assessment of viable legal theories.
Methodological description of the experiment: one hypothetical client interview transcript fed to each of four AI engines with prompts to identify and assess legal theories.
high null result Robot Wingman: Using AI to Assess an Employment Termination experimental procedure (input and prompts)