The Commonplace

Evidence (4560 claims)

Adoption — 5267 claims
Productivity — 4560 claims
Governance — 4137 claims
Human-AI Collaboration — 3103 claims
Labor Markets — 2506 claims
Innovation — 2354 claims
Org Design — 2340 claims
Skills & Training — 1945 claims
Inequality — 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                     Positive  Negative  Mixed  Null  Total
Other                            378       106     59   455   1007
Governance & Regulation          379       176    116    58    739
Research Productivity            240        96     34   294    668
Organizational Efficiency        370        82     63    35    553
Technology Adoption Rate         296       118     66    29    513
Firm Productivity                277        34     68    10    394
AI Safety & Ethics               117       177     44    24    364
Output Quality                   244        61     23    26    354
Market Structure                 107       123     85    14    334
Decision Quality                 168        74     37    19    301
Fiscal & Macroeconomic            75        52     32    21    187
Employment Level                  70        32     74     8    186
Skill Acquisition                 89        32     39     9    169
Firm Revenue                      96        34     22     –    152
Innovation Output                106        12     21    11    151
Consumer Welfare                  70        30     37     7    144
Regulatory Compliance             52        61     13     3    129
Inequality Measures               24        68     31     4    127
Task Allocation                   75        11     29     6    121
Training Effectiveness            55        12     12    16     96
Error Rate                        42        48      6     –     96
Worker Satisfaction               45        32     11     6     94
Task Completion Time              78         5      4     2     89
Wages & Compensation              46        13     19     5     83
Team Performance                  44         9     15     7     76
Hiring & Recruitment              39         4      6     3     52
Automation Exposure               18        17      9     5     50
Job Displacement                   5        31     12     –     48
Social Protection                 21        10      6     2     39
Developer Productivity            29         3      3     1     36
Worker Turnover                   10        12      3     –     25
Skill Obsolescence                 3        19      2     –     24
Creative Output                   15         5      3     1     24
Labor Share of Income             10         4      9     –     23
Filter: Productivity
The history of artificial intelligence for scientific discovery is not a two-year story about chatbots learning to write papers; it is a sixty-year story beginning with DENDRAL (1965).
Historical narrative / literature review citing early systems such as DENDRAL (1965) and subsequent developments in scholarly infrastructure (arXiv, Google Scholar, ORCID). No empirical sample or statistical test reported.
high null result A Brief History of AI for Scientific Discovery: Open Researc... historical scope and timeline of AI for scientific discovery
At the macroeconomic level, Kazakhstan's state programs (e.g., 'Digital Kazakhstan' and the Industrial and Innovation Development Program) and international indices (WIPO Global Innovation Index, OECD digital assessments, IMF data) are used to evaluate and position Kazakhstan within the global digital economy.
Macro-level analysis using national programs and international indices described in the article to assess Kazakhstan's digital economy standing.
high null result Digitalization and labor costs: efficiency of industrial ent... Kazakhstan's position in global digital economy (evaluative metric)
Deep Reinforcement Learning (DRL) has shown strong microscopic performance in car-following conditions, but its macroscopic traffic flow characteristics remain underexplored.
Literature synthesis / motivation in the paper (review of existing DRL work focused on microscopic performance). No empirical sample size.
high null result Macroscopic Characteristics of Mixed Traffic Flow with Deep ... extent of prior research on macroscopic traffic flow characteristics for DRL mod...
For readers less familiar with the Bayesian and decision-theoretic language, key terms are defined in a glossary at the end of the article.
Statement about the article's structure and supporting material (presence of glossary noted in the article).
high null result Retraining as Approximate Bayesian Inference availability of glossary/terminology definitions
The gap between a continuously updated belief state and your frozen deployed model is 'learning debt.'
Terminology/definition introduced by the author in the article (glossary and definitional exposition).
high null result Retraining as Approximate Bayesian Inference definition/labeling of model staleness
Model retraining is usually treated as an ongoing maintenance task.
Author's descriptive claim in the article; presented as an observation about prevailing practice (no empirical sample or data reported).
high null result Retraining as Approximate Bayesian Inference how retraining is operationalized (treated as maintenance)
Afriat's theorem guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint.
Theoretical claim citing Afriat's theorem (mathematical result used as foundational justification in the paper).
high null result GARP-EFM: Improving Foundation Models with Revealed Preferen... logical equivalence between GARP and utility-maximizing demand
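For readers unfamiliar with revealed preference, the GARP test cited above can be sketched in a few lines; the function name, matrix construction, and toy datasets below are illustrative assumptions, not the paper's code or data:

```python
# Minimal sketch of a GARP check over observed (prices, quantities) pairs.
# All names and the two toy datasets are illustrative, not from the paper.
import numpy as np

def satisfies_garp(prices, quantities):
    """Check the Generalized Axiom of Revealed Preference.

    Bundle i is directly revealed preferred to j if p_i.x_i >= p_i.x_j
    (strictly if >). GARP fails when x_i is revealed preferred to x_j
    via the transitive closure while x_j is strictly directly revealed
    preferred to x_i.
    """
    P = np.asarray(prices, dtype=float)
    X = np.asarray(quantities, dtype=float)
    n = len(P)
    expend = P @ X.T                      # expend[i, j] = p_i . x_j
    own = np.diag(expend)                 # cost of own bundle at own prices
    direct = own[:, None] >= expend       # x_i R0 x_j
    strict = own[:, None] > expend        # x_i P0 x_j
    # Transitive closure of the direct relation (Warshall's algorithm).
    R = direct.copy()
    for k in range(n):
        R |= R[:, [k]] & R[[k], :]
    # GARP: no pair with x_i R x_j while x_j P0 x_i.
    return not np.any(R & strict.T)
```

By Afriat's theorem, data passing this check can be rationalized by some utility function; a dataset where each bundle is strictly cheaper than the other at its own prices passes, while mutually "too expensive" choices fail.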
We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents.
Methods described in the paper: authors report fine-tuning Chronos-2 on synthetically generated time series from utility-maximizing agents (methodological statement).
high null result GARP-EFM: Improving Foundation Models with Revealed Preferen... model fine-tuning procedure / training data source
Data sources include field research conducted in 2024 and public reports from the Ministry of Industry and Information Technology and the National Bureau of Statistics.
Paper statement describing data provenance: field surveys in 2024 (n=326) plus public reports from MIIT and National Bureau of Statistics.
high null result Research on the Adoption of Artificial Intelligence and Proc... data provenance / sources
The visualization avoided redistributing value.
Reported result from the within-subjects experiment (N=32) stating that the visualization did not redistribute value between parties (i.e., it improved outcomes/efficiency without changing value split).
high null result From Overload to Convergence: Supporting Multi-Issue Human-A... distribution of value between negotiating parties (value split / surplus allocat...
Human-like presentations did not raise conformity pressure.
Reported experimental result: manipulation of presentation style (human-like vs. not) and measurement of conformity pressure; the abstract states that human-like presentation increased perceived usefulness/agency without increasing conformity pressure. No quantitative details provided in abstract.
Larger panels yielded no gains in accuracy relative to a single AI.
Reported experimental comparison manipulating panel size in the study (three tasks). The abstract states that larger panels did not produce accuracy gains versus a single AI. (No sample size or numerical effect reported in abstract.)
Capital income taxes, worker equity participation, universal basic income, upskilling, and Coasian bargaining cannot eliminate the excess automation.
Model-based policy counterfactuals evaluated in the paper showing these interventions fail to achieve the social optimum in the theoretical framework; no empirical sample.
high null result The AI Layoff Trap effectiveness of listed policies at preventing excessive automation / preserving...
Wage adjustments and free entry cannot eliminate the excess automation.
Analytical result in the model showing endogenous wage changes and free entry do not restore the socially optimal level of employment; theoretical equilibrium analysis, no empirical data.
high null result The AI Layoff Trap ability of wage adjustments and free entry to correct excessive automation / res...
We analyze a regional standardized sentiment database (97,719 responses).
Dataset description in the paper specifying the size of the standardized sentiment database.
high null result Engineering Distributed Governance for Regional Prosperity: ... data sample size (sentiment responses)
We analyze a raw Fukui spending database (90,350 records).
Dataset description in the paper specifying the size of the raw Fukui spending database.
high null result Engineering Distributed Governance for Regional Prosperity: ... data sample size (spending records)
We evaluate our approach on spapi, a production in-vehicle API system at Volvo Group involving 192 endpoints, 420 properties, and 776 CAN signals across six functional domains.
Case study / evaluation dataset description (explicit counts provided in paper).
high null result LLM-Powered Workflow Optimization for Multidisciplinary Soft... evaluation dataset scale and scope (endpoints, properties, CAN signals, domains)
The analysis relies on partial least squares path modeling (PLS-PM) to test eight predictions linking technological perceptions, organizational factors, and adoption outcomes.
Author-stated analytical method: PLS-PM; eight predictions tested; uses the survey data described above.
high null result Artificial Intelligence Adoption in Talent Acquisition: Effe... analytical approach / hypothesis testing
The study uses cross-sectional survey data from 523 human resource professionals and hiring managers representing 184 organizations across multiple industries in the United States.
Author-stated sample description in the paper: cross-sectional survey; 523 HR professionals/hiring managers; 184 organizations; multiple industries; U.S.
high null result Artificial Intelligence Adoption in Talent Acquisition: Effe... sample composition / data source
Each task is evaluated under three agent configurations (no-skills, LLM-generated skills, and human-expert skills) and validated through real hardware execution.
Experimental design described in the paper specifying three agent configurations per task and hardware validation of task runs.
high null result Skilled AI Agents for Embedded and IoT Systems Development evaluation configuration and validation modality
IoT-SkillsBench spans three representative embedded platforms, 23 peripherals, and 42 tasks across three difficulty levels.
Benchmark composition statistics reported in the paper (counts of platforms, peripherals, tasks, and difficulty levels).
high null result Skilled AI Agents for Embedded and IoT Systems Development benchmark scope (platforms, peripherals, tasks, difficulty levels)
We introduce a skills-based agentic framework for HIL embedded development together with IoT-SkillsBench, a benchmark designed to systematically evaluate AI agents in real embedded programming environments.
Methodological contribution described in the paper (introduction of framework and benchmark; the paper reports design and implementation).
high null result Skilled AI Agents for Embedded and IoT Systems Development availability of a skills-based agentic framework and benchmark
The study observes five delivery configurations: a traditional baseline and four successive platform versions (V1–V4).
Study design described by the authors; outcomes measured across these five configurations for the three programs.
high null result Orchestrating Human-AI Software Delivery: A Retrospective Lo... delivery configuration variations (baseline, V1–V4)
The study covers three real software modernization programs: a COBOL banking migration (~30k LOC), a large accounting modernization (~400k LOC), and a .NET/Angular mortgage modernization (~30k LOC).
Study design / sample description provided by the authors in the paper's methods section.
high null result Orchestrating Human-AI Software Delivery: A Retrospective Lo... study programs and codebase sizes (lines of code)
Evidence on AI in software engineering still leans heavily toward individual task completion, while evidence on team-level delivery remains scarce.
Paper's literature-context statement (intro); asserted by the authors as motivation for the study (no primary data supporting this meta-claim provided within the study).
high null result Orchestrating Human-AI Software Delivery: A Retrospective Lo... distribution of prior evidence (individual task vs team-level delivery) in the l...
This research deepens theoretical understanding by integrating CE principles, Industry 4.0 architectures, green innovation theory, and lifecycle assessment into a unified conceptual framework.
Authors' description of theoretical contribution in the abstract, based on their synthesis of the bibliometric and systematic review findings.
high null result Artificial intelligence as a catalyst for the circular econo... conceptual/theoretical integration (framework development)
This study offers the first comprehensive mixed-methods assessment of how AI transforms industrial production ecosystems in the post-ChatGPT era.
Authors' methodological/novelty claim in the abstract; supported by description of methods (bibliometric analysis of 196 articles and systematic review of 104 studies).
high null result Artificial intelligence as a catalyst for the circular econo... novelty / comprehensiveness of the study
This study uses a mixed-method research design combining quantitative ROI modelling and cost–benefit analysis, qualitative synthesis of secondary enterprise case studies, and architectural analysis of Azure-native GenAI services.
Explicit methodological description in the abstract of the paper.
high null result Measuring Business ROI of Generative AI Adoption on Azure Cl... research design / methods
This Article presents the results of an experiment in which a transcript of a hypothetical client interview involving potential disability discrimination, retaliation, and wrongful termination claims was submitted to each AI system, with prompts requesting identification and assessment of viable legal theories.
Methodological description of the experiment: one hypothetical client interview transcript fed to each of four AI engines with prompts to identify and assess legal theories.
high null result Robot Wingman: Using AI to Assess an Employment Termination experimental procedure (input and prompts)
Industrial robot penetration is used as a proxy measure for AI adoption in Chinese provinces.
Paper explicitly states industrial robot penetration was used as the proxy for AI adoption in the empirical analysis.
high null result Nonlinear effects of ageing population and AI on China’s GDP... AI adoption (proxied by industrial robot penetration)
The study uses panel data on 31 Chinese provinces for the period 2000–2022 and employs panel threshold regression models with ageing and AI adoption as threshold variables.
Paper description: panel data from 31 provinces (2000–2022); use of panel threshold regression models; threshold variables specified as ageing and AI adoption (industrial robot penetration).
high null result Nonlinear effects of ageing population and AI on China’s GDP... methodological approach (panel threshold regression)
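For context on the method named above, a toy single-threshold grid search (in the spirit of Hansen-style panel threshold regression) looks roughly like this; the simulated data, variable names, and threshold value are assumptions for illustration, not the paper's data or estimates:

```python
# Toy single-threshold estimation: grid-search the threshold gamma that
# minimizes the two-regime least-squares SSR. Simulated data only.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
q = rng.uniform(0, 1, n)        # threshold variable (e.g. ageing rate)
x = rng.normal(size=n)          # regressor (e.g. robot penetration)
gamma_true = 0.6
y = np.where(q <= gamma_true, 1.0, -0.5) * x + rng.normal(scale=0.2, size=n)

def fit_threshold(y, x, q, grid):
    """Return (gamma, ssr) minimizing the two-regime least-squares SSR."""
    best = (None, np.inf)
    for g in grid:
        ssr = 0.0
        for mask in (q <= g, q > g):
            xm, ym = x[mask], y[mask]
            b = (xm @ ym) / (xm @ xm)     # regime-specific OLS slope
            ssr += np.sum((ym - b * xm) ** 2)
        best = min(best, (g, ssr), key=lambda t: t[1])
    return best

gamma_hat, _ = fit_threshold(y, x, q, np.linspace(0.1, 0.9, 81))
```

With a sharp slope change across regimes, the SSR profile has a clear minimum near the true threshold; the full method additionally bootstraps the significance of each threshold.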
The experiment compared three prompt conditions: (A) simple prompts, (B) raw PPS JSON, and (C) natural-language-rendered PPS.
Method description of the three prompt conditions used in the controlled experiment.
The study used three specific LLMs: DeepSeek-V3, Qwen-Max, and Kimi.
Method section listing the three models evaluated in the experiment.
We ran a controlled three-condition study across 60 tasks in three domains (business, technical, and travel), three large language models (DeepSeek-V3, Qwen-Max, and Kimi), and three prompt conditions, collecting 540 AI-generated outputs evaluated by an LLM judge.
Authors report an experimental study design: 60 tasks × 3 models × 3 prompt conditions = 540 outputs, with outputs evaluated by an LLM judge (methodological description in the paper).
high null result Evaluating 5W3H Structured Prompting for Intent Alignment in... experimental_data_collection (AI outputs evaluated by LLM judge)
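The factorial layout described above (60 tasks × 3 models × 3 prompt conditions = 540 outputs) can be sketched as a small enumeration; the task labels are placeholders, not the paper's materials:

```python
# Enumerate the 60 x 3 x 3 design; one AI-generated output per cell.
# Task identifiers are placeholders, not the study's actual tasks.
from itertools import product

models = ["DeepSeek-V3", "Qwen-Max", "Kimi"]
conditions = ["A: simple prompt", "B: raw PPS JSON", "C: NL-rendered PPS"]
tasks = [f"task-{i:02d}" for i in range(60)]   # 20 per domain x 3 domains

cells = list(product(tasks, models, conditions))
assert len(cells) == 540
```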
The study uses panel data for 30 Chinese provinces from 2013–2022 to measure urban circular economy efficiency (UCEE) with a Super-SBM model including undesirable outputs, track dynamics via the Global Malmquist–Luenberger index, and estimate spatial effects with a spatial Durbin model.
Methodological description in the abstract: explicit statement of data (30 provinces, 2013–2022) and the three methods used (Super-SBM with undesirable outputs, GML index, spatial Durbin model).
high null result How artificial intelligence and environmental regulation inf... use of Super-SBM measurement, GML dynamics, and spatial Durbin estimation (metho...
Despite fears of mass unemployment, aggregate labor-market data through 2025 show limited labor-market disruption from generative AI.
Review of aggregate employment and labor-market studies and macro-level data through 2025 cited in the brief; methods include analyses of employment statistics and macro labor indicators (no single sample size reported).
high null result AI, Productivity, and Labor Markets: A Review of the Empiric... aggregate employment / labor-market disruption
We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines.
Empirical study design described in the paper: open competition with reported counts of teams and participants (29 teams, 80 participants); comparison between participant submissions and AI-only baselines.
high null result AgentDS Technical Report: Benchmarking the Future of Human-A... competition participation enabling comparison
AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking.
Descriptive dataset/benchmark specification in the paper stating task count and industry coverage.
high null result AgentDS Technical Report: Benchmarking the Future of Human-A... number of challenges and industry coverage
Open research challenges that define the research agenda include scaling beyond benchmarks, achieving compositionality over changes, metrics for validating specifications, handling rich logics, and designing human-AI specification interactions.
Authors' explicit enumeration of open problems and a proposed multi-disciplinary research agenda; presented as expert opinion rather than empirical finding.
high null result Intent Formalization: A Grand Challenge for Reliable Coding ... progress on research questions (research agenda advancement)
Data ethics, as a central pillar of digital ethics, emphasizes the responsible use and protection of personal information.
Conceptual/definitional statement in the paper situating data ethics within digital ethics and highlighting protection of personal information as a core concern.
Big data usage is proxied by keyword frequency in firms' annual reports.
Operationalization described in the paper: frequency/count of big-data-related keywords in annual reports used as the proxy for firms' big data application.
high null result How Big Data Enhances Firm Value Under Data Privacy Regulati... big data usage (proxy)
The empirical analysis uses a fixed-effects regression approach to measure the impact of big data application on firm value.
Methodological statement in the paper specifying fixed-effects regression as the primary econometric approach.
The study analyzes panel data covering Chinese A-share listed companies from 2007 to 2021.
Description of dataset in the paper: panel of Chinese A-share listed companies spanning the years 2007–2021 (sample period stated).
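The fixed-effects (within) estimator named above can be sketched on simulated data; the firm-effect structure, the coefficient value, and the variable names are illustrative assumptions standing in for the A-share panel, not the study's dataset:

```python
# Within (entity-demeaning) estimator on a simulated firm-year panel.
# Firm effects, coefficient, and variable names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_years, beta = 200, 15, 0.5    # e.g. 2007-2021 spans 15 years

firm = np.repeat(np.arange(n_firms), n_years)
alpha = rng.normal(size=n_firms)[firm]                        # firm fixed effects
big_data = rng.normal(size=n_firms * n_years) + 0.8 * alpha   # regressor correlated with effects
firm_value = alpha + beta * big_data + rng.normal(scale=0.1, size=big_data.size)

def demean(v, groups):
    """Subtract each group's mean (the 'within' transformation)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

x_w, y_w = demean(big_data, firm), demean(firm_value, firm)
beta_hat = (x_w @ y_w) / (x_w @ x_w)     # within OLS slope
```

Because the firm effect is constant within each firm, demeaning removes it exactly, so the slope is recovered even though the regressor is correlated with the effects (the case where pooled OLS would be biased).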
The analysis extends the dynamic taxation setup of Slavik and Yazici (2014).
Methodological claim: the model and solution approach build on and modify the framework from Slavik and Yazici (2014) (reference to prior theoretical framework rather than empirical data).
high null result Workers' Incentives and the Optimal Taxation of AI scope and structure of the theoretical model (extension of the referenced dynami...
We characterize the optimal tax policy in an economy with human manual and cognitive labor, physical capital, and artificial intelligence (AI).
Theoretical/analytical work: the paper develops and analyzes a dynamic general-equilibrium model that includes manual and cognitive human labor, physical capital, and AI. (No empirical sample; model-based characterization.)
high null result Workers' Incentives and the Optimal Taxation of AI form and properties of the optimal tax policy in the specified theoretical econo...
All code, data, and logs are publicly available at https://github.com/pepealonso95/TDAD.
Provision of a public GitHub repository URL in the paper.
high null result TDAD: Test-Driven Agentic Development - Reducing Code Regres... availability of code, data, and logs (public repository)
Evaluation was performed on SWE-bench Verified with two local models: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances.
Experimental setup reported in the paper specifying benchmark (SWE-bench Verified) and model-instance counts.
high null result TDAD: Test-Driven Agentic Development - Reducing Code Regres... evaluation sample size / benchmark coverage (number of instances per model)
Controlled experiments were run with N = 250 across five content types to validate the mechanisms.
Experimental methods reported in the paper: controlled experiments with specified sample size and content-type breakdown.
high null result Governed Memory: A Production Architecture for Multi-Agent W... experimental sample size and content-type breadth (N=250, 5 content types)
Research agenda: empirical microdata on managerial time use, task-level automation, performance outcomes, and wage impacts are needed to quantify substitution versus complementarity and to evaluate human-in-the-loop designs' effects on firm performance and distributional outcomes.
Explicit methodological recommendation within the paper; identifies gaps due to the paper's conceptual (non-empirical) approach.
high null result Comparative analysis of strategic vs. computational thinking... availability and use of microdata on managerial tasks, automation, firm performa...
No original quantitative dataset or controlled evaluation is reported in this paper.
Methodological description in the paper stating reliance on prior literature, conceptual analysis, and prescriptive recommendations; paper does not present new experiments.
high null result LLM Alignment should go beyond Harmlessness–Helpfulness and ... existence of original empirical data or controlled experiments in the paper