The Commonplace

Evidence (13870 claims)

Adoption
8467 claims
Productivity
7558 claims
Governance
6805 claims
Human-AI Collaboration
6363 claims
Org Design
4132 claims
Innovation
4065 claims
Labor Markets
3526 claims
Skills & Training
2945 claims
Inequality
2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 196 98 892 1984
Governance & Regulation 817 394 188 121 1544
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 627 233 123 96 1088
Research Productivity 411 123 56 332 933
Output Quality 467 178 59 47 751
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 167 122 24 496
Task Allocation 207 64 71 32 379
Skill Acquisition 165 59 60 17 301
Innovation Output 203 27 43 18 292
Employment Level 105 52 107 13 279
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 150 48 26 3 227
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 63 20 12 184
Error Rate 69 92 10 2 173
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 93 21 13 19 148
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Creative Output 31 17 7 3 59
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
The study sample comprises 21,428 firm-year observations from Chinese A-share listed manufacturing companies over 2010–2022.
Data description provided in the paper's abstract/introduction specifying the sample frame and time period.
high null result Artificial Intelligence Innovation, Internal Structure Optim... sample composition (firm-year observations)
We find little evidence of crashing waves (in contrast to recent work by METR).
Analysis of the >3,000 tasks and >17,000 evaluations, which reportedly shows no abrupt, concentrated surges in AI capability on small sets of tasks.
high null result Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... presence of abrupt concentrated capability surges ('crashing waves')
The evaluation is based on more than 17,000 evaluations by workers from these jobs.
Reported sample of >17,000 human evaluations of model outputs.
high null result Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... number of human evaluations
We test for these effects in preliminary evidence from an ongoing evaluation of AI capabilities across over 3,000 broad-based tasks derived from the U.S. Department of Labor O*NET categorization that are text-based and thus LLM-addressable.
Empirical study design reporting an ongoing evaluation covering >3,000 text-based tasks mapped from O*NET.
high null result Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... coverage of LLM-addressable tasks (task sample)
Green innovation does not yet significantly reduce carbon inequality.
Empirical results from the provincial panel analysis (2003–2021) showing that measures of green innovation are not associated with a statistically significant reduction in carbon inequality.
This paper employs a staggered difference-in-differences (DID) model using data from Chinese A-share listed manufacturing companies from 2012 to 2023 and uses the National Artificial Intelligence Innovative Application Pioneer Zone (AIIAPZ) policy as a quasi-natural experiment.
Staggered DID empirical design; sample described as Chinese A-share listed manufacturing firms, 2012–2023; AIIAPZ policy used as treatment assignment (quasi-natural experiment).
high null result Does Artificial Intelligence Improve the Operational Resilie... methodological design / identification strategy (use of staggered DID and policy...
The paper characterizes the symmetric Nash equilibrium in a preemption game of competing frontier-AI firms.
Analytic game-theoretic model and equilibrium derivations presented in the paper (formal characterization/propositions).
high null result Optimal Release Timing of AI Systems: A Strategic Analysis w... strategic equilibrium (symmetric Nash) in release-timing preemption game
The study uses panel data from listed manufacturing firms in China and employs a quasi-natural experiment approach.
Statement in the abstract describing data source (panel of listed manufacturing firms in China) and empirical strategy (quasi-natural experiment).
high null result The impact of R&D innovation strategy on the sustainable... data source and identification strategy
Big data analytics and blockchain technologies show no significant correlations with exports to specific destinations (multivariate probit result).
Multivariate probit model of destination-specific export decisions showing non-significant coefficients for big data analytics and blockchain across destinations (sample size not reported).
high null result How Digitalization Shapes Export Potential: Firm-Level Insig... exporting to specific destination regions (binary/region-specific firm export de...
Adopting blockchain technologies does not have a statistically significant effect on a firm's likelihood of exporting (probit model result).
Probit regression analysis showing a non-significant coefficient for blockchain adoption (sample size not reported).
high null result How Digitalization Shapes Export Potential: Firm-Level Insig... likelihood/probability of exporting (firm-level)
Adopting big data analytics does not have a statistically significant effect on a firm's likelihood of exporting (probit model result).
Probit regression analysis showing a non-significant coefficient for big data analytics adoption (sample size not reported).
high null result How Digitalization Shapes Export Potential: Firm-Level Insig... likelihood/probability of exporting (firm-level)
We introduce the Agentic Task Exposure (ATE) score, a composite measure computed algorithmically from O*NET task data using calibrated adoption parameters (not a regression estimate), incorporating AI capability scores, workflow coverage factors, and logistic adoption velocity.
Methodological description in the paper; algorithmic construction from O*NET task data with specified calibrated adoption parameters and components (AI capability scores, workflow coverage, logistic adoption).
high null result Agentic AI and Occupational Displacement: A Multi-Regional T... NA (methodological construct for measuring exposure/adoption)
Code authoring and review are only a small part of the larger software engineering process; the resulting code must also be maintained and updated over time.
Conceptual/argumentative claim presented in the paper to motivate longitudinal analysis (not presented as an empirical estimate from the dataset).
high null result Investigating Autonomous Agent Contributions in the Wild: Ac... relative share of authoring/review versus maintenance in software development (c...
We offer several longitudinal estimates of survival and churn rates for agent-generated versus human-authored code.
Longitudinal analysis reported in the paper comparing survival and churn for agent-generated and human-authored code over time using the dataset (paper states these estimates were produced).
high null result Investigating Autonomous Agent Contributions in the Wild: Ac... survival rates and churn rates of code contributions
We compare five popular coding agents, including OpenAI Codex, Claude Code, GitHub Copilot, Google Jules, and Devin, examining how their usage differs in various development aspects such as merge frequency, edited file types, and developer interaction signals, including comments and reviews.
Comparative analysis across agents using the constructed dataset of ~110,000 PRs (paper states these five agents were compared on metrics like merge frequency, edited file types, and interaction signals).
high null result Investigating Autonomous Agent Contributions in the Wild: Ac... merge frequency, edited file types, developer interaction signals (comments, rev...
We construct a novel dataset of approximately 110,000 open-source pull requests, including associated commits, comments, reviews, issues, and file changes, collectively representing millions of lines of source code.
Descriptive dataset construction reported in the paper (stated sample size ~110,000 PRs including commits, comments, reviews, issues, file changes; representing millions of lines of code).
high null result Investigating Autonomous Agent Contributions in the Wild: Ac... number of pull requests and total lines of source code in dataset
The paper extends classical (Solow) and endogenous (Romer) growth models to incorporate TAI, producing a dynamic framework for analyzing AI-driven structural change.
Methodological claim: the authors explicitly state that they build on Solow (1956) and Romer (1990) to develop an integrated dynamic model that incorporates TAI; the evidence is the model extension and formalization described within the paper.
high null result Transformative AI and the Evolution of Growth Models: Extend... modeling framework / analytical capacity to study AI-driven structural change
The study uses dynamic fixed-effects and dynamic panel threshold regression techniques on a panel of 23 developed and developing countries from 2002 to 2023.
Methodological statement in the abstract specifying the estimation techniques and the dataset: a panel of 23 countries over 2002–2023.
This study uses semi-structured interviews with 10 practitioners to examine perceptions of collaborating with human versus AI teammates.
Methods statement in the paper: semi-structured interviews; sample size explicitly reported as 10 practitioners.
high null result Bridging the Socio-Emotional Gap: The Functional Dimension o... methodological description (data collection approach)
The study is based on a qualitative analysis of recent academic literature, comparative analysis of sector-specific applications of Big Data technologies, and synthesis of empirical findings from international studies using a systemic and structural analysis approach.
Methodological statement within the paper describing data sources and analytic approach; not an empirical claim about outcomes.
high null result Implications of Big Data Technologies for the Resilience of ... methodological approach (literature synthesis, comparative analysis, systemic/st...
The research documents a transition in the literature (2013–2025) from early 'risk-of-automation' evaluations toward task-based and firm-level econometric models.
Literature review/synthesis across the 2013–2025 body of research as described in the paper.
high null result Impact Of Artificial Intelligence (AI) On Employment research methods / framework change
Society 5.0 and Industry 5.0 call for human-centric technology integration, but the concept lacks an operational definition that can be measured, optimized, or evaluated at the firm level.
Motivating claim grounded in literature gap analysis presented in the paper (argument that normative frameworks lack formal, operational metrics at firm level).
high null result From Automation to Augmentation: A Framework for Designing H... operationalizability/measurability of 'human-centricity' at firm level
We propose the Workplace Augmentation Design Index (WADI), a 36-item theory-grounded instrument for diagnosing human-centricity at the firm level.
Instrument design/proposal presented in the paper (36 items mapped to the five workplace-design dimensions); no validation sample reported in the abstract.
high null result From Automation to Augmentation: A Framework for Designing H... diagnosis/measurement of firm-level human-centric workplace design
We conducted a PRISMA-guided systematic review of 120 papers (screened from 6,096 records) to map the evidence base for each workplace-design dimension.
Systematic literature review using PRISMA protocol; final sample = 120 papers; initial records screened = 6,096.
high null result From Automation to Augmentation: A Framework for Designing H... coverage/evidence for each workplace-design dimension in the literature
Existing models of human-AI complementarity treat the augmentation function phi(D) as exogenous and thus ignore that two firms with identical technology investments can achieve radically different augmentation outcomes depending on workplace organization.
Argument based on literature review of prior models (the paper contrasts its approach with existing complementarity models). No new empirical sample reported for this specific claim.
high null result From Automation to Augmentation: A Framework for Designing H... augmentation outcomes (human-AI augmentation productivity)
The widening effect of AI adoption on the electricity output growth gap diminishes over time and becomes statistically insignificant after approximately three years.
Temporal (dynamic) empirical analysis / event-study-style estimation tracing the AI adoption effect over multiple years post-adoption; statistical significance reported to fade by year ~3. Sample size / exact time windows not provided in the summary.
high null result The Impact of AI Adoption on Electricity Output Growth Gap: ... corporate electricity output growth gap (time-varying effect)
The review employed a systematic analysis of multidisciplinary studies (qualitative, quantitative, and bibliometric) focused on agentic AI technologies in financial domains, covering literature published up to mid-2024.
Stated methodology of the paper (systematic review description).
high null result A Comparative & Systematic Review of Literature on the I... scope and methods of the review itself
A subset of four datasets included settings in which the AI provided explanations of its decision.
Paper states that four of the datasets involved AI explanations (explicitly stated in abstract).
high null result Beyond AI advice -- independent aggregation boosts human-AI ... presence_of_AI_explanation
The study compared HCT to the AI-as-advisor approach using 10 datasets from various domains, including medical diagnostics and misinformation discernment.
Paper reports an empirical comparison across 10 datasets spanning multiple domains (explicitly stated in abstract).
The hybrid confirmation tree (HCT) elicits a human judgment and an AI judgment independently; if they agree, that decision is accepted, and if they disagree, a second human breaks the tie.
Description of the HCT method in the paper (procedural/design specification).
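The HCT procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `hct_decide` and the callable `tie_breaker` are hypothetical, and the second human is modeled as a callable so that they are consulted only when the first two judgments disagree.

```python
from typing import Callable, TypeVar

Label = TypeVar("Label")


def hct_decide(human: Label, ai: Label, tie_breaker: Callable[[], Label]) -> Label:
    """Return the decision for one case under a hybrid confirmation tree.

    human and ai are judgments formed independently of each other.
    tie_breaker (hypothetical name) asks a second human and is called
    only when the first two judgments differ.
    """
    if human == ai:
        # Agreement: accept the shared judgment without further input.
        return human
    # Disagreement: the second human's judgment decides the case.
    return tie_breaker()


# Example: the second human is needed only in the second case.
print(hct_decide("true", "true", lambda: "false"))
print(hct_decide("true", "false", lambda: "false"))
```

Because the tie-breaker is invoked lazily, the number of calls to it equals the number of human-AI disagreements, which is the extra human effort this design requires.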
The cross-sectional, self-reported survey design prevents strong causal claims about the effect of algorithms or selective exposure on polarization.
Authors explicitly note methodological limitations: cross-sectional survey of N = 450, reliance on self-reported consumption, and lack of platform log or longitudinal/experimental data.
high null result Echo Chambers, Filter Bubbles, and Selective Exposure: Media... causal inference ability (limitation due to design)
The study adopted a positivist philosophy and a descriptive-correlational design.
Methods section statement in the paper describing the research philosophy and study design.
high null result Technology Innovation Strategy and the Competitiveness of Ke... research design / methodology
Data were collected from innovation-focused executives across 39 licensed Kenyan commercial banks.
Paper statement specifying sample source: 'Using data from innovation-focused executives across 39 licensed banks.'
high null result Technology Innovation Strategy and the Competitiveness of Ke... sample composition / data source
Technological innovation was assessed via adoption of new systems, integration of digital channels, and use of Artificial Intelligence and data analytics.
Measurement description provided in the paper listing the components used to operationalize technological innovation.
high null result Technology Innovation Strategy and the Competitiveness of Ke... measurement/operationalization of technological innovation
Competitiveness in the study was measured through market share, return on equity and customer satisfaction.
Measurement description provided in the paper describing dependent variable operationalization (explicit list of three indicators).
high null result Technology Innovation Strategy and the Competitiveness of Ke... measurement/operationalization of competitiveness
The research method is normative legal research using statutory, conceptual, and comparative approaches, supported by a literature analysis of SINTA-indexed national journals and reputable international journals.
Method statement clearly given in the paper's abstract/methodology.
high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... research methodology (normative legal research and literature review)
The study assesses the adequacy of the legal protection available to workers affected by layoffs (PHK) resulting from AI adoption.
Statement of the research aim and analytical approach (normative, comparative), supported by a literature review of selected journals.
high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... adequacy of legal protection for workers affected by AI
This study aims to analyze how the Job Creation Law (Undang-Undang Cipta Kerja) and its implementing regulations classify and justify termination of employment (Pemutusan Hubungan Kerja, PHK) resulting from AI adoption.
Statement of the research aim given in the methodology/introduction section; statutory approach within normative legal research.
high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... classification and justification of layoffs (PHK) under the Job Creation Law framework
The user study had N=50 participants.
Reported user study sample size (N=50) used to evaluate AI-assisted intent expansion in ecologically valid settings.
high null result Structured Intent as a Protocol-Like Communication Layer: Cr... user study sample size
Under the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient.
Controlled comparison between three structured frameworks (5W3H, CO-STAR, RISEN) across the evaluated outputs, with no meaningful differences reported between them.
The study evaluated 3,240 model outputs (3 languages x 6 conditions x 3 models x 3 domains x 20 tasks) using an independent judge (DeepSeek-V3).
Reported experimental design and evaluation: 3 languages, 6 conditions, 3 models, 3 domains, 20 tasks; judged by DeepSeek-V3.
high null result Structured Intent as a Protocol-Like Communication Layer: Cr... number of model outputs evaluated / evaluation procedure
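The reported total of 3,240 outputs follows from a full factorial design. The sketch below enumerates that grid to check the arithmetic; the factor labels are placeholders, since the summary gives only the size of each factor and not the names of the languages, conditions, models, or domains.

```python
from itertools import product

# Placeholder labels (assumption): only the factor sizes come from the summary.
languages = [f"lang_{i}" for i in range(3)]
conditions = [f"cond_{i}" for i in range(6)]
models = [f"model_{i}" for i in range(3)]
domains = [f"domain_{i}" for i in range(3)]
tasks = [f"task_{i}" for i in range(20)]

# One cell per (language, condition, model, domain, task) combination.
cells = list(product(languages, conditions, models, domains, tasks))
n_outputs = len(cells)

print(n_outputs)  # 3 * 6 * 3 * 3 * 20 = 3240
```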
The paper frames the LLM-politician relationship through principal-agent theory and bounded rationality, conceptualizing the legislator as a principal delegating advisory tasks to a boundedly rational agent under structural information asymmetry.
Explicit theoretical framing described in the introduction or theory section of the paper.
Model outputs were evaluated using a dual framework combining LLM-as-Judge semantic scoring and programmatic text similarity metrics.
Paper describes the evaluation methodology: semantic scoring via LLM-as-Judge plus programmatic text similarity measures applied to model-generated rationales vs official memoranda.
high null result Can Commercial LLMs Be Parliamentary Political Companions? C... evaluation method / scoring approach
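The summary does not say which programmatic similarity metrics the paper used. As a rough illustration of that half of the dual framework, the sketch below computes two common stand-ins from the Python standard library: a character-level sequence ratio and a token-level Jaccard overlap. Both function names are hypothetical, and the choice of metrics is an assumption.

```python
from difflib import SequenceMatcher


def sequence_ratio(generated: str, reference: str) -> float:
    """Character-level similarity in [0, 1] from difflib's matching blocks."""
    return SequenceMatcher(None, generated, reference).ratio()


def token_jaccard(generated: str, reference: str) -> float:
    """Share of distinct lowercase tokens that the two texts have in common."""
    a = set(generated.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


# Toy strings standing in for a model rationale and an official memorandum.
rationale = "the bill updates labor inspection rules"
memorandum = "the bill updates rules for labor inspection"
print(sequence_ratio(rationale, memorandum))
print(token_jaccard(rationale, memorandum))
```

Surface metrics of this kind reward shared wording rather than shared meaning, which is one reason to pair them with a semantic judge.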
Six LLMs were evaluated: GPT-5-mini, GPT-5-chat (OpenAI), Claude Haiku 4.5 (Anthropic), and Llama 4 Maverick, Llama 3.3 70B, Llama 3.1 8B (Meta).
Paper explicitly lists the six evaluated models spanning three provider families and multiple capability tiers.
The study uses a dataset of 15 Romanian Senate law proposals paired with their official explanatory memoranda (expuneri de motive).
Explicit statement in the paper describing the dataset composition: 15 Romanian Senate law proposals each paired with its official explanatory memorandum.
high null result Can Commercial LLMs Be Parliamentary Political Companions? C... dataset size / data corpus
We implement a rigorously controlled execution-based testbed featuring Git worktree isolation and explicit global memory to evaluate agent coordination frameworks.
Methodological description in the paper indicating the testbed design choices (Git worktree isolation, explicit global memory) used to ensure controlled, reproducible execution of agent-generated code.
high null result An Empirical Study of Multi-Agent Collaboration for Automate... experimental reproducibility and isolation (testbed design)
We benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs) using a rigorously controlled, execution-based testbed.
Description of experimental setup in the paper: an execution-based testbed with Git worktree isolation and explicit global memory; experiments explicitly compare single-agent, subagent, and agent-team architectures under fixed computational time budgets.
high null result An Empirical Study of Multi-Agent Collaboration for Automate... comparative performance of agent architectures (benchmarking setup)
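The entries above name Git worktree isolation but give no implementation details. One plausible way to set it up is sketched below: each agent gets its own working directory and branch via `git worktree add`, so concurrent edits never touch the same checkout. The directory layout, branch naming scheme, and function names are hypothetical.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, agent_id: str, base_ref: str = "HEAD") -> list[str]:
    """Build the git command that gives one agent its own checkout and branch."""
    # Hypothetical layout: a sibling directory per agent next to the main repo.
    path = repo.parent / f"{repo.name}-wt-{agent_id}"
    branch = f"agent/{agent_id}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base_ref]


def create_worktree(repo: Path, agent_id: str, base_ref: str = "HEAD") -> Path:
    """Run the command; raises CalledProcessError if git reports a failure."""
    cmd = worktree_add_command(repo, agent_id, base_ref)
    subprocess.run(cmd, check=True)
    return Path(cmd[-2])  # the worktree path is the second-to-last argument


# Show the commands for two agents without touching any repository.
for agent in ("a1", "a2"):
    print(" ".join(worktree_add_command(Path("repo"), agent)))
```

Starting every agent from the same `base_ref` keeps the conditions comparable. `create_worktree` must be called against an existing Git repository.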
Data construction: The authors treat Wikipedia technology pages as distinct technologies and trace them across patents and job postings from 1976 to 2007, using technical bigrams to identify technologies in texts.
Description of dataset construction building on Kalyani et al. (2025) in Section 2; methodological description of linking Wikipedia pages, patent text, and job postings.
high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE coverage and method of technology identification in data
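Bigram-based technology identification, as described in the data-construction entry above, can be illustrated with a toy matcher. The vocabulary below is invented for the example, and the tokenization and matching rule (a technology counts as present if any of its bigrams appears) are assumptions rather than the authors' procedure.

```python
import re


def bigrams(text: str) -> set[tuple[str, str]]:
    """Return the set of adjacent lowercase token pairs in text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return set(zip(tokens, tokens[1:]))


def technologies_in(text: str, vocab: dict[str, set[tuple[str, str]]]) -> list[str]:
    """List the technologies with at least one technical bigram in text."""
    found = bigrams(text)
    return sorted(tech for tech, grams in vocab.items() if grams & found)


# Hypothetical vocabulary: technology name -> its technical bigrams.
vocab = {
    "Machine learning": {("machine", "learning"), ("neural", "network")},
    "Relational database": {("relational", "database")},
}

posting = "Seeking an engineer with neural network experience."
print(technologies_in(posting, vocab))  # ['Machine learning']
```

Applying the same matcher to dated patents and job postings would give, for each technology, the years in which it appears in each source.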
Proposition 1: With a constant pace of technology creation (m(b)=m), the model admits a unique balanced growth path (BGP) along which real wages and output grow at rate g, and the skill premium remains constant and independent of m.
Analytical result (proposition) proved in the paper's model appendix under model assumptions.
high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE skill premium dependence on pace parameter m along BGP
The modal technology in the top 1% densest locations (e.g., New York, San Francisco) is 34 years old, while the modal technology in the 50% lowest-density locations is 48 years old, indicating sizable diffusion gaps.
Empirical measurement from the text-based technology dataset tracking vintage of technologies across locations; reported modal ages by location density percentile.
high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE modal technology age by location density