The Commonplace

Evidence (7156 claims)

Adoption: 5126 claims
Productivity: 4409 claims
Governance: 4049 claims
Human-AI Collaboration: 2954 claims
Labor Markets: 2432 claims
Org Design: 2273 claims
Innovation: 2215 claims
Skills & Training: 1902 claims
Inequality: 1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 369 105 58 432 972
Governance & Regulation 365 171 113 54 713
Research Productivity 229 95 33 294 655
Organizational Efficiency 354 82 58 34 531
Technology Adoption Rate 277 115 63 27 486
Firm Productivity 273 33 68 10 389
AI Safety & Ethics 112 177 43 24 358
Output Quality 228 61 23 25 337
Market Structure 105 118 81 14 323
Decision Quality 154 68 33 17 275
Employment Level 68 32 74 8 184
Fiscal & Macroeconomic 74 52 32 21 183
Skill Acquisition 85 31 38 9 163
Firm Revenue 96 30 22 148
Innovation Output 100 11 20 11 143
Consumer Welfare 66 29 35 7 137
Regulatory Compliance 51 61 13 3 128
Inequality Measures 24 66 31 4 125
Task Allocation 64 6 28 6 104
Error Rate 42 47 6 95
Training Effectiveness 55 12 10 16 93
Worker Satisfaction 42 32 11 6 91
Task Completion Time 71 5 3 1 80
Wages & Compensation 38 13 19 4 74
Team Performance 41 8 15 7 72
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 17 15 9 5 46
Job Displacement 5 28 12 45
Social Protection 18 8 6 1 33
Developer Productivity 25 1 2 1 29
Worker Turnover 10 12 3 25
Creative Output 15 5 3 1 24
Skill Obsolescence 3 18 2 23
Labor Share of Income 7 4 9 20
Heavy LLM users were significantly more likely than other participants to report that their writing was less creative and not in their own voice.
Self-reported measures from participants in the human user study comparing heavy LLM users to others; no sample size or exact statistics provided in the excerpt.
high negative How LLMs Distort Our Written Language self-reported creativity and 'in-your-voice' authenticity of writing
In Chicago, the model shows moderate under-detection of Black residents with DIR equal to 0.22.
Reported DIR value from simulation results on Chicago 2022 data.
high negative Unmasking Algorithmic Bias in Predictive Policing: A GAN-Bas... Disparate Impact Ratio (DIR) indicating under-detection of Black residents
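For readers unfamiliar with the metric, a Disparate Impact Ratio is conventionally the rate for a protected group divided by the rate for a reference group, with 1.0 meaning parity. The sketch below assumes that conventional definition; the excerpt does not give the paper's exact formula, and the function name and counts are hypothetical, chosen only so the ratio comes out at the reported 0.22.

```python
def disparate_impact_ratio(protected_hits, protected_total,
                           reference_hits, reference_total):
    """Protected-group rate divided by reference-group rate (1.0 = parity)."""
    if protected_total <= 0 or reference_total <= 0:
        raise ValueError("group totals must be positive")
    reference_rate = reference_hits / reference_total
    if reference_rate == 0:
        raise ValueError("reference rate is zero; the ratio is undefined")
    protected_rate = protected_hits / protected_total
    return protected_rate / reference_rate


# Hypothetical counts (not from the paper): 11% vs. 50% detection rates.
dir_value = disparate_impact_ratio(11, 100, 50, 100)
print(round(dir_value, 2))
```

Under this definition, a value below 1.0 means the protected group is detected at a lower rate than the reference group.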
It is impractical to uniformly apply an alignment method across diverse, independently developed AI models in strategic settings.
Paper assertion / motivating argument (stated as motivation for investigating zero-shot Nash-like behavior); not presented as an empirical finding within the paper.
high negative Reasonably reasoning AI agents can avoid game-theoretic fail... practicality/adoption feasibility of universal alignment methods
The gap between informal natural language requirements and precise program behavior (the 'intent gap') has always plagued software engineering, but AI-generated code amplifies it to an unprecedented scale.
Conceptual claim and argumentation in the paper; presented as an observed escalation in the scale of the existing 'intent gap' due to AI code generation. No quantitative evidence or sample size given in the excerpt.
high negative Intent Formalization: A Grand Challenge for Reliable Coding ... mismatch between intended and actual program behavior (intent gap) / resulting c...
The crowding-out effect of AI washing on green innovation is heterogeneous: private enterprises, small and medium-sized enterprises (SMEs), and firms in highly competitive sectors suffer more severe negative impacts.
Subgroup/heterogeneity analysis reported in the paper on the same sample of Chinese A-share listed companies (2006–2024); abstract identifies private firms, SMEs, and firms in highly competitive industries as more affected.
high negative The Spillover Effects of Peer AI Rinsing on Corporate Green ... green innovation (heterogeneous treatment effects across firm types and industri...
The negative relationship between AI washing and green innovation is transmitted through dual channels in both product and capital markets.
Mechanism analysis reported in the paper (presumably mediation or channel analysis) using the same dataset of Chinese A-share firms' annual reports and firm-level market data; abstract states product- and capital-market channels convey the crowding-out effect.
high negative The Spillover Effects of Peer AI Rinsing on Corporate Green ... green innovation (via product-market and capital-market channels)
Corporate AI washing exerts a significant crowding-out effect on green innovation.
Empirical analysis using semantic measures of 'AI washing' derived from large language model (LLM) analysis of annual reports for Chinese A-share listed companies (2006–2024); paper reports statistically significant negative relationship between AI washing and firms' green innovation (details of regression models not provided in abstract).
The capital-output elasticity dropped significantly, from 0.42 in 2010–2015 to 0.35 in 2016–2022.
Estimated from an extended Cobb–Douglas production function applied to China's economy over 2010–2022, with period split 2010–2015 vs 2016–2022 (as reported in the study summary).
high negative Analysis of China's Economic Growth Drivers: An Empirical St... capital-output elasticity (elasticity of output with respect to capital)
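To make the size of that drop concrete, the sketch below assumes a plain Cobb-Douglas form, Y = A * K**alpha * (other inputs), and holds everything except capital fixed. The study's "extended" specification is not shown in the summary, so this is an illustration of what an elasticity means, not a reproduction of the paper's model. The 10% capital increase is a hypothetical input.

```python
def output_growth_from_capital(alpha, capital_growth):
    """Proportional change in output when capital grows by `capital_growth`,
    holding technology and all other inputs fixed (Y proportional to K**alpha)."""
    return (1.0 + capital_growth) ** alpha - 1.0


CAPITAL_GROWTH = 0.10  # hypothetical 10% rise in the capital stock

early = output_growth_from_capital(0.42, CAPITAL_GROWTH)  # 2010-2015 elasticity
late = output_growth_from_capital(0.35, CAPITAL_GROWTH)   # 2016-2022 elasticity
print(f"output growth: {early:.4f} (early) vs {late:.4f} (late)")
```

With the lower elasticity, the same proportional increase in capital yields a smaller proportional increase in output.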
These dynamics amplify initial disparities and produce persistent performance gaps across the population.
Main theoretical conclusion of the paper: analysis of the proposed dynamical system showing amplification and persistence of gaps (authors' demonstrated result).
high negative Actionable Recourse in Competitive Environments: A Dynamic G... magnitude and persistence of performance disparities across population over time
Exclusion-based cohesion can produce state-contingent illusory precision, effective input concentration, and dynamic lock-in at the same time: these phenomena co-occur under the same parameter regimes of the model.
Analytical model results showing co-occurrence of multiple adverse phenomena (bias that grows in tails, illusory precision, input concentration, lock-in) under the same exclusion mechanisms; derived within the paper's theoretical framework.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... co-occurrence of multiple adverse outcomes: tail bias, observed disagreement, ef...
When the anchor belief is updated from internally filtered aggregates, the system can exhibit dynamic lock-in: delayed recognition of regime shifts followed by abrupt correction.
Analytical dynamics studied in the model when anchor updates depend on filtered (excluded) aggregates; derivations demonstrate delayed detection and abrupt adjustments. This is a theoretical/dynamical model result, no empirical data.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... delay in regime recognition and magnitude/timing of corrective update
Exclusion leads to effective concentration of decision inputs: the effective number of independent inputs falls below the nominal participant count.
Model-derived analytic result showing that report shrinkage and discarding reduce effective information contributions, quantified relative to nominal participation in the theoretical framework. No empirical sample.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... effective number of independent decision inputs (information concentration)
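One common way to express an "effective number" of inputs is the inverse of the sum of squared weight shares (an inverse Herfindahl index). The sketch below uses that measure as an assumption; the summary does not state which measure the paper uses, and the weights are hypothetical.

```python
def effective_input_count(weights):
    """Effective number of inputs: (sum of weights)^2 / sum of squared weights.
    Equals the nominal count when all weights are equal and nonzero."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return total ** 2 / sum(w * w for w in weights)


# Hypothetical panel of 10 participants.
full_inclusion = [1.0] * 10                       # everyone counted equally
filtered = [1.0] * 4 + [0.5] * 2 + [0.0] * 4      # two reports shrunk, four discarded

n_full = effective_input_count(full_inclusion)
n_filtered = effective_input_count(filtered)
print(n_full, round(n_filtered, 2))
```

In this illustration the nominal participant count stays at 10 in both cases, while the effective count falls once reports are shrunk or discarded.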
Exclusion-based cohesion induces 'illusory precision': observed disagreement can fall while actual estimation error in tail regimes rises (i.e., lower recorded variance despite higher true error).
Theoretical result derived from the signal-aggregation model showing a regime in which filtered reports reduce observed variance even as tail-regime estimation error increases. No empirical validation provided.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... observed disagreement (reported variance) versus true estimation error in tail r...
Relative to a full-inclusion benchmark, exclusion-based cohesion produces state-contingent bias that is small in normal regimes but grows sharply under regime displacement (tail events).
Analytical comparisons between the exclusion model and a full-inclusion benchmark within the theoretical model; derivations showing bias as a function of regime and exclusion parameters. The result is from model analysis, not empirical data.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... estimation bias (especially under regime displacement/tail events)
The establishment of the China–ASEAN Free Trade Area (CAFTA) reduced regional trade policy uncertainty.
Empirical analysis treats CAFTA as an exogenous policy shock and measures a decline in regional trade policy uncertainty using firm‑ and trade‑level data from the China Industrial Enterprise Database and China Customs Database covering 2000–2014; identification via difference‑in‑differences (DID). (Sample sizes not specified in provided summary.)
high negative How regional trade policy uncertainty affects agricultural i... regional trade policy uncertainty (measured at regional/firm level)
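The difference-in-differences logic can be sketched with a basic two-group, two-period comparison. This is a minimal illustration under stated assumptions: the group means are hypothetical, and the paper's actual specification (controls, fixed effects, uncertainty measure) is not described in the summary.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: change in the treated group
    minus change in the control group over the same period."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical mean uncertainty index values (not from the paper).
effect = did_estimate(
    treated_pre=0.50, treated_post=0.25,    # units exposed to the policy
    control_pre=0.50, control_post=0.375,   # comparable unexposed units
)
print(effect)
```

A negative estimate here would correspond to a decline in uncertainty for exposed units beyond the change seen in the comparison group, which is valid only if the two groups would otherwise have followed parallel trends.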
Limitations include potentially limited generalizability to other organizations, since the study covers a single Fortune 500 lab; dependence of the ABS results on model specification and calibration; and operational definitions of 'resilience' and 'planning cycle' that require careful reading.
Authors' reported limitations based on study design: single lab context (n = 23), dependence of ABS on model choices, and nontrivial operational definitions.
high negative The Algorithmic Canvas: On the Autopoietic Redefinition of S... generalizability and robustness of study findings
Some declines (in self-efficacy and meaningfulness) from passive AI use persist after participants return to manual work.
Within-experiment assessment of outcomes after participants returned to manual (no-AI) tasks following the AI-use manipulation in the pre-registered experiment (N = 269); reported persistent reductions in self-efficacy and meaningfulness for the passive condition.
high negative Relying on AI at work reduces self-efficacy, ownership, and ... self-efficacy; perceived meaningfulness (measured post-return to manual work)
Passive use of AI reduces perceived meaningfulness of work.
Pre-registered experiment (N = 269) with self-reported measure of work meaningfulness; passive-copy condition showed lower meaningfulness ratings than No-AI and Active-collaboration conditions.
high negative Relying on AI at work reduces self-efficacy, ownership, and ... perceived meaningfulness of work
Passive use of AI reduces psychological ownership of the produced outputs.
Same pre-registered experiment (N = 269). Participants in the passive-copy AI condition reported lower psychological ownership of their outputs (self-report scales) relative to No-AI and Active-collaboration conditions.
high negative Relying on AI at work reduces self-efficacy, ownership, and ... psychological ownership of outputs
Passive use of AI (copying AI-generated output) reduces workers' self-efficacy.
Pre-registered between-subjects experiment (N = 269) using occupation-specific writing tasks. Participants assigned to a passive-copy AI condition reported lower self-efficacy (self-reported confidence to complete tasks without AI) compared to the No-AI (manual) and Active-collaboration conditions.
high negative Relying on AI at work reduces self-efficacy, ownership, and ... self-efficacy (confidence to complete tasks without AI)
Securitization of economic dependencies—especially in strategic sectors (semiconductors, telecoms, cloud)—frames partner states as security risks and exposes them to blacklists, de-risking campaigns, and sudden loss of market access.
Process tracing of export controls and blacklisting episodes; chronologies of sanction/policy actions affecting firms and partners; policy documents and public lists (e.g., export-control lists). (Data sources: export-control lists, sanction policy documents, corporate/access denials; sample sizes not specified.)
high negative China-US Trade War and the Challenges for Developing Countri... incidence of blacklisting/sanctions affecting partners, sudden changes in market...
Large-scale AI models have significant energy and resource costs, creating a notable environmental footprint that must be addressed.
Narrative integration of prior empirical studies measuring compute, energy consumption, and embodied emissions of large models (cited literature); the review does not present new quantitative measurements itself.
high negative The Evolution and Societal Impact of Artificial Intelligence... energy consumption, carbon emissions, and resource use associated with large-sca...
As AI is deployed in safety-critical domains, reliability, regulation, and human-oriented system design become essential to avoid harms.
Review of literature on safety-critical systems, human–machine interaction studies, and regulatory policy discussions; the paper reports this as a consensus implication rather than presenting new empirical tests.
high negative The Evolution and Societal Impact of Artificial Intelligence... system reliability/safety and risk of harm in safety-critical deployments
Stronger empirical evidence is needed on how hazard, exposure, and vulnerability interact across space and time to shape aggregated multi-risks.
Evaluation of project activities and case studies identifying gaps in empirical spatio-temporal analyses of interacting risk components; synthesis recommends targeted empirical work.
high negative Reducing risk together: moving towards a more holistic appro... empirical understanding of spatio-temporal interactions among hazard, exposure, ...
The current literature is skewed toward descriptive and engineering work; there is a lack of causal, field‑experimental evidence on NLP interventions' effects on customer behavior and firm profits.
Review coding of study types in the sample (engineering/descriptive vs. experimental/causal) showing few field experiments or causal designs.
high negative Natural language processing in bank marketing: a systematic ... presence vs. absence of causal/experimental studies measuring effects on custome...
Important gaps include customer acquisition, personalization at scale, use of external text sources (social media, news, reviews), operational process improvement, and cross‑channel integration.
Gap detection via low‑density regions in the UMAP thematic map of sentence‑transformer embeddings and manual review showing low article counts for these topics within the 109‑article sample.
high negative Natural language processing in bank marketing: a systematic ... topical coverage by customer journey stage and source type (acquisition, persona...
Existing literature on NLP in marketing is concentrated around customer retention tasks (e.g., churn prediction, complaint handling, relationship management).
Thematic clustering from sentence‑transformer embeddings of article text combined with UMAP visualization, and manual review of article topics and keywords identifying frequent retention‑related themes.
high negative Natural language processing in bank marketing: a systematic ... topical frequency/coverage by customer journey stage (retention)
NLP applications in bank marketing are severely under‑studied.
Descriptive result from the PRISMA review showing only 8/109 articles focused on NLP in bank marketing (≈7%), plus thematic mapping showing sparse coverage in bank‑marketing/NLP intersection.
high negative Natural language processing in bank marketing: a systematic ... proportion and absolute count of studies at the intersection of NLP and bank mar...
AI‑enabled platforms can magnify winner‑takes‑most dynamics in digital services trade, concentrating market power.
Theoretical and empirical literature on network effects and platform markets reviewed in the paper; illustrative examples (no novel empirical aggregation).
high negative Analysis of Digital Services Trade and Export Competitivenes... market concentration / competition in digital services
Current data governance regimes in China can impede cross‑border data flows.
Comparative policy analysis and literature documenting data localization and privacy/regulatory regimes that restrict flows (descriptive evidence in the review).
high negative Analysis of Digital Services Trade and Export Competitivenes... volume/feasibility of cross‑border data flows
Institutional barriers—fragmented international rules on data flows and privacy, regulatory divergence including data localization, weak participation in multilateral rule setting, and uneven domestic regulation of platforms—impede digital services trade.
Comparative policy analysis and literature review, supported by policy documents and case examples (qualitative evidence; no original econometric tests).
high negative Analysis of Digital Services Trade and Export Competitivenes... cross‑border digital services trade / export competitiveness
Problem C is the practical difficulty of attributing responsibility and agency across distributed socio-technical systems (robots, algorithms, institutions, humans).
Conceptual diagnosis developed in the paper and exemplified with vignettes from three application domains; defined as an analytic concept rather than empirically measured.
high negative Examining ethical challenges in human–robot interaction usin... ability to attribute responsibility/agency in distributed socio-technical system...
Jurisdictions are taking divergent policy approaches (e.g., U.S. emphasis on innovation/competition, EU emphasis on rights/standards like GDPR), producing fragmented digital trade rules.
Comparative legal and policy analysis of existing national/regional rules and international instruments (examples cited include GDPR and U.S. policy orientations); descriptive, with specific regulatory texts analyzed.
high negative Path Analysis of Digital Economy and Reconstruction of Inter... regulatory fragmentation / interoperability of digital trade rules
AI creates novel non-tariff frictions, e.g., pressures toward data localization and regulatory requirements for algorithmic transparency.
Comparative legal and policy analysis of emerging regulations (e.g., data localization laws, algorithmic regulation initiatives) and illustrative jurisdictional examples.
high negative Path Analysis of Digital Economy and Reconstruction of Inter... non-tariff regulatory frictions (data-flow restrictions, transparency/compliance...
Vietnam's civil-law features—statutory specificity, formal procedures, and constitutional principles like legal certainty and fairness—make straightforward AI deployment legally fraught.
Close textual analysis of Vietnam's statutes, constitutional provisions, and administrative procedures (doctrinal legal analysis); no quantitative sample.
high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... legal compatibility of AI deployment (degree of legal obstacles to deployment)
Automated decisions complicate assigning responsibility and hinder judicial and administrative reviewability.
Doctrinal examination of accountability and review mechanisms in administrative law plus comparative institutional analysis of automated decision-making governance.
high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... clarity of accountability (ability to assign responsibility) and effectiveness o...
Opaque AI models risk violating notice, reason-giving, and appeal rights protected under administrative due process.
Analysis of procedural due-process requirements (notice, reason-giving, appeal) in Vietnam's legal framework and assessment of opacity issues in algorithmic systems; qualitative reasoning, no empirical testing.
high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... compliance with due-process requirements (notice, reasons, appealability)
Provider incentives may be misaligned (e.g., optimizing for engagement or test performance instead of durable learning), requiring contracts, regulation, or purchaser design to align incentives.
Consensus from interdisciplinary workshop (50 scholars) highlighting incentive risks and market-design considerations; descriptive, not empirical.
high negative The Future of Feedback: How Can AI Help Transform Feedback t... provider optimization metrics (engagement/test performance) vs. durable learning...
Extensive learner data needed to personalize AI feedback raises privacy and data-governance concerns (consent, storage, usage).
Qualitative consensus from workshop participants (50 scholars) noting data-collection requirements and governance risks; no empirical governance studies included.
high negative The Future of Feedback: How Can AI Help Transform Feedback t... volume/type of learner data collected; privacy risk indicators; compliance with ...
Automated feedback may not capture pedagogical nuances expert teachers use (motivation, socio-emotional cues, complex reasoning), limiting pedagogical fit.
Expert syntheses from the workshop of 50 scholars highlighting limits of automation relative to expert teacher judgment; no empirical comparisons presented.
high negative The Future of Feedback: How Can AI Help Transform Feedback t... coverage of socio-emotional and complex-reasoning cues in feedback; corresponden...
AI-generated feedback can be incorrect, misleading, or misaligned with learning objectives; assessing feedback quality is nontrivial.
Repeated concern raised across workshop participants (50 scholars) in qualitative synthesis; noted as a substantive risk and open challenge rather than empirically quantified here.
high negative The Future of Feedback: How Can AI Help Transform Feedback t... feedback factual correctness; alignment with stated learning objectives; rate of...
Exposure to top-rated exemplar papers produced large reductions in interquartile range (IQR) of estimates—within converging measure families, IQR fell by roughly 80–99%.
Stage 3 of the protocol: after agents were shown top-rated exemplar papers, measured within-measure-family IQRs of agents' estimates decreased substantially; reported quantitative reduction range of 80%–99% within measure families that converged.
high negative Nonstandard Errors in AI Agents percentage reduction in interquartile range (IQR) of effect estimates within mea...
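A percentage reduction in IQR of this kind can be computed as follows. The sketch uses Python's standard-library `statistics.quantiles`; the estimates are hypothetical and chosen only to land inside the reported 80-99% range, and the paper's own quartile convention is not stated in the summary.

```python
import statistics


def iqr(values):
    """Interquartile range using inclusive quartiles (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def iqr_reduction_pct(before, after):
    """Percentage by which the IQR shrinks from `before` to `after`."""
    iqr_before = iqr(before)
    if iqr_before == 0:
        raise ValueError("IQR before exposure is zero; reduction is undefined")
    return 100.0 * (iqr_before - iqr(after)) / iqr_before


# Hypothetical agent estimates for one measure family (not from the paper).
before_exposure = [0, 10, 20, 30, 40, 50, 60, 70, 80]
after_exposure = [36, 37, 38, 39, 40, 41, 42, 43, 44]

reduction = iqr_reduction_pct(before_exposure, after_exposure)
print(f"IQR reduction: {reduction:.1f}%")
```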
Frontier language models and human editors do not reliably reproduce the evaluative signal contained in institutional publication records.
Comparison of zero-shot frontier-model average accuracy (31%) and human-panel majority-vote accuracy (42%) versus fine-tuned models (up to 59% and higher in economics), indicating that neither zero-shot frontier models nor the human panels matched fine-tuned performance on the held-out benchmarks.
high negative Machines acquire scientific taste from institutional traces Relative prediction accuracy on held-out benchmark(s) of research-pitch quality
Eleven frontier language models (proprietary and open) averaged 31% accuracy on a held-out four-tier benchmark of management research pitches (chance ≈25%); this is only marginally above chance.
Zero-shot (or as-provided) evaluation of eleven state-of-the-art language models on the held-out four-tier management pitches benchmark, yielding an average accuracy of 31% versus chance ≈25%. (Exact list of models and number of benchmark examples not provided in the supplied text.)
high negative Machines acquire scientific taste from institutional traces Accuracy on the four-tier management research-pitch benchmark
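To put "marginally above chance" in perspective, accuracy can be rescaled so that chance maps to 0 and perfect prediction to 1. The rescaling below is an illustrative normalization, not a statistic reported by the paper, and it assumes the four tiers are equally likely. The accuracies are the ones quoted in the entries above.

```python
def chance_corrected_accuracy(accuracy, n_classes):
    """Rescale accuracy so uniform guessing scores 0.0 and perfect scores 1.0."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    chance = 1.0 / n_classes
    return (accuracy - chance) / (1.0 - chance)


N_TIERS = 4  # four-tier benchmark, so uniform chance is 25%

reported = {
    "frontier models (zero-shot average)": 0.31,
    "human panel (majority vote)": 0.42,
    "fine-tuned models (up to)": 0.59,
}
corrected = {name: chance_corrected_accuracy(acc, N_TIERS)
             for name, acc in reported.items()}
for name, score in corrected.items():
    print(f"{name}: {score:.2f}")
```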
Generalization across domains and long-term robustness to adversarial adaptation require further validation.
Authors explicitly note the need for further validation; the paper's reported experiments do not (in the provided summary) disclose broad domain coverage, longitudinal tests, or adversarial evolution studies.
high negative CoMAI: A Collaborative Multi-Agent Framework for Robust and ... generalization across domains; long-term robustness to adaptive adversaries
A modular system may increase engineering complexity and compute overhead compared to a single LLM endpoint.
Authors' caveat in the paper noting higher engineering and compute costs as a trade-off for modularity; the summary does not provide quantitative cost or latency measurements.
high negative CoMAI: A Collaborative Multi-Agent Framework for Robust and ... engineering complexity and compute/resource overhead
Quality of CoMAI depends on rubric design and on how the finite-state machine and agent prompts are specified.
Authors' noted limitation/caveat in the paper that system performance hinges on rubric and prompt/FSM design choices; this is a qualitative dependency rather than an empirically quantified effect in the summary.
high negative CoMAI: A Collaborative Multi-Agent Framework for Robust and ... assessment quality as a function of rubric/FSM/agent prompt design
Using C.A.P. entails trade-offs: potential increases in latency and compute cost and a risk of over-correction (unnecessary clarification).
Paper explicitly notes these trade-offs as part of the design discussion and proposes measuring latency, compute cost, and unnecessary clarification rate in evaluations; this is an acknowledged design risk rather than an empirically quantified result.
high negative A Context Alignment Pre-processor for Enhancing the Coherenc... response latency, compute cost per session, rate of unnecessary clarifications
Integration costs—domain modeling, human-in-the-loop protocols, and regulatory/liability frameworks—are significant barriers to deployment.
Conceptual assessment of operational and regulatory requirements; no quantified cost studies provided.
high negative Argumentative Human-AI Decision-Making: Toward AI Agents Tha... implementation cost and organizational burden for deploying argumentative AI sys...
AFs and LLMs may be gamed or misled: adversaries may exploit these systems, leading to strategic argumentation or manipulation.
Conceptual security/adversarial concern based on known vulnerabilities in ML and strategic behavior; no adversarial tests reported.
high negative Argumentative Human-AI Decision-Making: Toward AI Agents Tha... system vulnerability metrics / susceptibility to adversarial manipulation