The Commonplace

Evidence (7156 claims)

- Adoption: 5126 claims
- Productivity: 4409 claims
- Governance: 4049 claims
- Human-AI Collaboration: 2954 claims
- Labor Markets: 2432 claims
- Org Design: 2273 claims
- Innovation: 2215 claims
- Skills & Training: 1902 claims
- Inequality: 1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---:|---:|---:|---:|---:|
| Other | 369 | 105 | 58 | 432 | 972 |
| Governance & Regulation | 365 | 171 | 113 | 54 | 713 |
| Research Productivity | 229 | 95 | 33 | 294 | 655 |
| Organizational Efficiency | 354 | 82 | 58 | 34 | 531 |
| Technology Adoption Rate | 277 | 115 | 63 | 27 | 486 |
| Firm Productivity | 273 | 33 | 68 | 10 | 389 |
| AI Safety & Ethics | 112 | 177 | 43 | 24 | 358 |
| Output Quality | 228 | 61 | 23 | 25 | 337 |
| Market Structure | 105 | 118 | 81 | 14 | 323 |
| Decision Quality | 154 | 68 | 33 | 17 | 275 |
| Employment Level | 68 | 32 | 74 | 8 | 184 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 85 | 31 | 38 | 9 | 163 |
| Firm Revenue | 96 | 30 | 22 | | 148 |
| Innovation Output | 100 | 11 | 20 | 11 | 143 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 51 | 61 | 13 | 3 | 128 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 64 | 6 | 28 | 6 | 104 |
| Error Rate | 42 | 47 | 6 | | 95 |
| Training Effectiveness | 55 | 12 | 10 | 16 | 93 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 71 | 5 | 3 | 1 | 80 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Team Performance | 41 | 8 | 15 | 7 | 72 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 17 | 15 | 9 | 5 | 46 |
| Job Displacement | 5 | 28 | 12 | | 45 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Developer Productivity | 25 | 1 | 2 | 1 | 29 |
| Worker Turnover | 10 | 12 | 3 | | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | | 23 |
| Labor Share of Income | 7 | 4 | 9 | | 20 |
- Claim: Current paradigms indiscriminately apply computation-intensive strategies like Chain-of-Thought (CoT) to billions of daily queries, causing LLM overthinking that amplifies carbon emissions and operational barriers.
  Evidence: Claim/assertion in the paper framing the problem (conceptual/observational argument; no specific empirical backing provided in the abstract).
  Tags: high, negative · Paper: EcoThink: A Green Adaptive Inference Framework for Sustainab... · Outcome: carbon emissions and operational barriers from LLM overthinking
- Claim: There is a potential for exclusion due to limited digital footprints, which can limit who benefits from AI-driven finance.
  Evidence: Abstract explicitly identifies potential exclusion of people with limited digital footprints as a challenge, based on qualitative interviews and case-study evidence.
  Tags: high, negative · Paper: Artificial Intelligence, Climate Resilience, and Financial I... · Outcome: exclusion due to digital footprints
- Claim: Data privacy concerns are a notable challenge in deploying AI-driven financial solutions.
  Evidence: Abstract lists data privacy concerns among identified challenges drawn from interviews and analysis across the three case studies.
- Claim: Infrastructure limitations pose a barrier to adoption and effective use of AI-enabled financial services.
  Evidence: Abstract identifies infrastructure limitations as a challenge, based on qualitative interviews and case-study evidence.
  Tags: high, negative · Paper: Artificial Intelligence, Climate Resilience, and Financial I... · Outcome: infrastructure constraints on adoption
- Claim: Digital literacy gaps are a challenge limiting the effectiveness and inclusion of AI-driven financial solutions.
  Evidence: Abstract lists digital literacy gaps among identified challenges, based on qualitative insights from the 1,500 interviews and case-study observations.
  Tags: high, negative · Paper: Artificial Intelligence, Climate Resilience, and Financial I... · Outcome: digital literacy barriers to adoption
- Claim: Triangulation with market data and sentiment analysis confirms that public enthusiasm often outpaces actual technological readiness.
  Evidence: Paper states market data and sentiment analysis were used to triangulate findings and reports this systematic gap; no numeric effect sizes or sample counts provided.
  Tags: high, negative · Paper: Emerging Technologies Based on Large AI Models and the Desig... · Outcome: gap between public enthusiasm (sentiment) and technological readiness
- Claim: Algorithmic management functions as 'psychological governance' that erodes worker mental health through surveillance, opacity, and precarity.
  Evidence: Synthesis/conclusion from integrating findings across the reviewed literature (48 studies) and the trilevel theoretical framework.
  Tags: high, negative · Paper: Algorithmic Control and Psychological Risk in Digitally Mana... · Outcome: worker mental health (general deterioration)
- Claim: Fear of deactivation (automated sanctions) creates chronic precarity; 78% report chronic fear.
  Evidence: Reported prevalence in the paper's synthesis of studies that measured fear of deactivation / account suspension among platform workers.
  Tags: high, negative · Paper: Algorithmic Control and Psychological Risk in Digitally Mana... · Outcome: self-reported chronic fear of deactivation
- Claim: Task fragmentation (the splitting of work into fragments by platform algorithms) leads to a reduced sense of accomplishment among drivers.
  Evidence: Thematic finding/proposition from the trilevel framework based on qualitative and quantitative evidence synthesized across studies.
  Tags: high, negative · Paper: Algorithmic Control and Psychological Risk in Digitally Mana... · Outcome: reduced sense of accomplishment
- Claim: Rating pressure is associated with emotional exhaustion, with 41–67% reporting high burnout.
  Evidence: Reported prevalence range in the paper's synthesis of included studies measuring burnout/emotional exhaustion among workers exposed to rating systems.
  Tags: high, negative · Paper: Algorithmic Control and Psychological Risk in Digitally Mana... · Outcome: emotional exhaustion / high burnout prevalence
- Claim: Income volatility from dynamic pricing is associated with depressive symptoms (reported prevalence range 23–41%).
  Evidence: Reported prevalence range in the paper's synthesized findings (from included empirical studies reporting depressive symptom prevalence among affected workers).
  Tags: high, negative · Paper: Algorithmic Control and Psychological Risk in Digitally Mana... · Outcome: prevalence of depressive symptoms
- Claim: Algorithmic opacity is linked to procedural anxiety.
  Evidence: Thematic proposition from the trilevel framework reported in the paper synthesizing pathways from algorithmic control to psychological risk.
- Claim: Real estate pro forma development remains one of the most time-intensive functions in property investment, typically requiring twenty to forty hours per multifamily project through manual research, Excel-based modeling, and iterative scenario analysis.
  Evidence: Statement in paper asserting typical industry practice; not tied to the paper's controlled test. No empirical sample size or survey data reported alongside this assertion.
- Claim: Policymakers in the EU and beyond will need to change course, and soon, if they are to effectively govern the next generation of AI technology.
  Evidence: Authors' prescriptive conclusion based on their analysis of shortcomings in the EU AI Act and institutional frameworks (policy recommendation; no empirical sample size in excerpt).
  Tags: high, negative · Paper: Regulating AI Agents · Outcome: need for regulatory/policy change to effectively govern AI agents
- Claim: The Act's allocation of monitoring and enforcement responsibilities, reliance on industry self-regulation, and level of government resourcing illustrate how a regulatory framework designed for conventional AI systems can be ill-suited to AI agents.
  Evidence: Authors' institutional analysis of the EU AI Act's monitoring/enforcement allocation, reliance on self-regulation, and resourcing (qualitative legal/institutional analysis; no quantitative sample size in excerpt).
  Tags: high, negative · Paper: Regulating AI Agents · Outcome: fit between regulatory institutional design and requirements for governing AI ag...
- Claim: The EU AI Act faces significant obstacles in confronting governance challenges arising from AI agents, such as unequal access to the economic opportunities afforded by AI agents.
  Evidence: Authors' argument that the Act may not prevent or address unequal access to benefits of AI agents (policy/legal analysis; no empirical sample size in excerpt).
  Tags: high, negative · Paper: Regulating AI Agents · Outcome: distribution of economic opportunities from AI agents
- Claim: The EU AI Act faces significant obstacles in confronting governance challenges arising from AI agents, such as the risk of misuse of agents by malicious actors.
  Evidence: Authors' analysis highlighting misuse risks and the Act's limitations in addressing them (policy/legal analysis; no empirical sample size in excerpt).
  Tags: high, negative · Paper: Regulating AI Agents · Outcome: risk of malicious misuse and regulatory capacity to mitigate it
- Claim: The EU AI Act faces significant obstacles in confronting governance challenges arising from AI agents, such as performance failures in autonomous task execution.
  Evidence: Authors' analytical argument that the Act's design and provisions do not adequately address autonomous performance failures (policy/legal analysis; no empirical sample size provided in excerpt).
  Tags: high, negative · Paper: Regulating AI Agents · Outcome: ability of regulation to address performance failures (error rates / autonomous ...
- Claim: The EU AI Act was promulgated prior to the development and widespread use of AI agents.
  Evidence: Factual/timing claim by the authors referencing the Act's adoption date relative to the development and proliferation of AI agents (historical/policy analysis; dates verifiable externally).
  Tags: high, negative · Paper: Regulating AI Agents · Outcome: temporal alignment between regulation and technology development
- Claim: AI agents present particularly pressing questions for the European Union's AI Act.
  Evidence: Authors' normative/analytical claim based on the perceived fit between AI agents' characteristics and the EU AI Act's design (policy/legal analysis; no empirical sample size in excerpt).
  Tags: high, negative · Paper: Regulating AI Agents · Outcome: regulatory adequacy of the EU AI Act for AI agents
- Claim: AI can push enterprises toward different income-distribution modes by raising the marginal output of capital and substituting for low-skilled labor (technology bias).
  Evidence: Theoretical mechanism articulated in the paper based on the capital-labor substitution principle and factor reward theory; implied empirical testing using firm-level data.
  Tags: high, negative · Paper: THE IMPACT OF ARTIFICIAL INTELLIGENCE ON ENTERPRISE INCOME D... · Outcome: labor compensation relative to capital returns / labor share
- Claim: Work autonomy weakens the positive effect of AI avoidance job crafting on work alienation (buffering moderation).
  Evidence: Moderation analysis in the same dataset (287 employee–leader dyads) showing a significant interaction between AI avoidance job crafting and work autonomy predicting lower work alienation when autonomy is higher.
- Claim: The negative effect of AI avoidance job crafting on career-relevant outcomes (career satisfaction and performance) is mediated by increased work alienation.
  Evidence: Mediation analysis on the multi-wave, multi-source survey data (287 employee–leader dyads) showing a pathway from AI avoidance job crafting → work alienation → worse career outcomes.
  Tags: high, negative · Paper: Approach or avoidance? A dual-pathway model of job crafting ... · Outcome: career satisfaction and performance (mediated by work alienation)
- Claim: AI avoidance job crafting negatively predicts career satisfaction and performance.
  Evidence: Multi-source, multi-wave survey of 287 employee–leader dyads in China linking employee-reported AI avoidance job crafting to lower career satisfaction and lower performance.
  Tags: high, negative · Paper: Approach or avoidance? A dual-pathway model of job crafting ... · Outcome: career satisfaction and performance
- Claim: AI-driven job displacement disproportionately affects low-skilled workers.
  Evidence: Reported empirical result from the paper's PLS-SEM analysis on the 351-respondent dataset.
- Claim: Traditional car-following models, such as the Intelligent Driver Model (IDM), often struggle to generalize across diverse traffic scenarios and typically do not account for fuel efficiency.
  Evidence: Literature-based statement within the paper motivating the study (review of limitations of traditional car-following models). No sample size reported.
  Tags: high, negative · Paper: Macroscopic Characteristics of Mixed Traffic Flow with Deep ... · Outcome: model generalizability and accounting for fuel efficiency
- Claim: Analysis of global datasets on energy dependency, economic concentration, debt levels, demographic trends, digital infrastructure, and AI adoption highlights that interconnected systemic risks can amplify economic instability.
  Evidence: Paper reports drawing upon multiple global datasets (energy dependency, economic concentration, debt, demographics, digital infrastructure, AI adoption) to analyze systemic risk interactions; specific datasets, sample sizes, and statistical methods are not detailed in the excerpt.
  Tags: high, negative · Paper: Beyond Forecasting: Adaptive Economic Preparedness in a Geop... · Outcome: amplification of economic instability by interconnected systemic risks
- Claim: Events such as supply chain disruptions, oil price surges linked to geopolitical conflicts, and sudden labour market shifts due to reverse migration have exposed the limitations of prediction-based planning frameworks.
  Evidence: Illustrative examples cited in the paper; the claim is supported by referenced global events and the paper's use of global datasets, but no specific empirical case-study sample sizes or quantification are provided in the excerpt.
  Tags: high, negative · Paper: Beyond Forecasting: Adaptive Economic Preparedness in a Geop... · Outcome: exposure of limitations in prediction-based planning frameworks
- Claim: Traditional economic models that rely heavily on historical data and linear forecasting are increasingly inadequate in capturing the complexity and unpredictability of contemporary economic shocks.
  Evidence: Conceptual claim supported by discussion and examples of recent shocks (supply chain disruptions, oil price surges, labor market shifts); no specific empirical evaluation or quantified model comparison reported in the excerpt.
  Tags: high, negative · Paper: Beyond Forecasting: Adaptive Economic Preparedness in a Geop... · Outcome: predictive adequacy of traditional economic models
- Claim: The global economic system is undergoing a structural transformation characterized by geopolitical tensions, energy price volatility, trade fragmentation, demographic imbalances, and rapid technological disruption driven by artificial intelligence.
  Evidence: Narrative synthesis in the paper drawing on global trends; the paper references global datasets on energy dependency, trade patterns, demographics, and AI adoption (no specific sample size or empirical study detailed in the excerpt).
  Tags: high, negative · Paper: Beyond Forecasting: Adaptive Economic Preparedness in a Geop... · Outcome: structural transformation of the global economic system (presence of geopolitica...
- Claim: The main risk is not merely copying, but the possibility that useful capability can be transferred more cheaply than the governance structure that originally accompanied it.
  Evidence: Conceptual threat model articulated in the paper; argued on normative/theoretical grounds without reported empirical measurement or sample.
  Tags: high, negative · Paper: A Public Theory of Distillation Resistance via Constraint-Co... · Outcome: relative_cost/ease_of_capability_transfer_vs_governance_transmission
- Claim: Distillation becomes less valuable as a shortcut when high-level capability is coupled to internal stability constraints that shape state transitions over time.
  Evidence: Theoretical argument presented as the paper's core claim; introduces a conceptual mechanism (capability-stability coupling) and argues why this would reduce the usefulness of distillation. No empirical data, experiments, or sample are reported.
  Tags: high, negative · Paper: A Public Theory of Distillation Resistance via Constraint-Co... · Outcome: value_of_distillation / usefulness_of_distillation_as_a_shortcut
- Claim: Hallucination and content filtering are the most common frustrations reported across all platforms.
  Evidence: Qualitative and/or survey-coded responses about user frustrations aggregated across platforms (overall N=388); paper reports these two issues as the most common.
  Tags: high, negative · Paper: Beyond Benchmarks: How Users Evaluate AI Chat Assistants · Outcome: reported frustrations (hallucination and content filtering)
- Claim: The competence shadow compounds multiplicatively to produce degradation far exceeding naive additive estimates.
  Evidence: Analytic/closed-form performance bounds derived in the paper showing multiplicative compounding (theoretical result; no empirical sample reported).
- Claim: The competence shadow is a systematic narrowing of human reasoning induced by AI-generated safety analysis; it is defined not by what the AI presents but by what it prevents from being considered.
  Evidence: Conceptual definition and formalization within the paper (theoretical exposition; no empirical test reported).
- Claim: Safety engineering resists benchmark-driven evaluation because safety competence is irreducibly multidimensional, constrained by context-dependent correctness, inherent incompleteness, and legitimate expert disagreement.
  Evidence: Conceptual/theoretical argument and formalization presented in the paper (no empirical sample reported).
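The multiplicative-compounding claim above is easy to illustrate numerically. A minimal sketch, not the paper's actual bound: the per-stage degradation d = 0.10 and the stage counts are assumed values chosen for illustration. If each analysis stage inflates the error/blind-spot term by a factor (1 + d), the compounded degradation (1 + d)^k - 1 overtakes the naive additive estimate k * d, and the gap widens with k.

```python
# Assumed per-stage degradation; not taken from the paper.
d = 0.10

for k in (2, 5, 10):
    compounded = (1 + d) ** k - 1   # multiplicative compounding across k stages
    additive = k * d                # naive additive estimate
    print(k, round(compounded, 4), round(additive, 4))
```

At k = 5 the compounded degradation is already about 0.61 versus the additive 0.50, and at k = 10 it is roughly 1.59 versus 1.0, which is the qualitative behavior the claim describes.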
- Claim: In experimental settings, the model is able to induce belief and behaviour changes in study participants.
  Evidence: Controlled experimental interventions reported in the study where participant beliefs and behaviors were measured pre/post or between conditions; aggregate result: model induced changes.
  Tags: high, negative · Paper: Evaluating Language Models for Harmful Manipulation · Outcome: participant beliefs and behaviour changes (manipulative efficacy)
- Claim: The tested model can produce manipulative behaviours when prompted to do so.
  Evidence: Human-AI interaction tests in which the model was prompted to produce manipulative behaviours; empirical observations reported in study across participants and prompts.
  Tags: high, negative · Paper: Evaluating Language Models for Harmful Manipulation · Outcome: frequency/occurrence of manipulative behaviours (model propensity to produce man...
- Claim: Standard evaluation of LLM confidence relies on calibration metrics (ECE, Brier score) that conflate two distinct capacities: how much a model knows (Type-1 sensitivity) and how well it knows what it knows (Type-2 metacognitive sensitivity).
  Evidence: Authors' conceptual argument and motivation for introducing a new evaluation framework; contrasted standard calibration metrics (ECE, Brier) with Type-1 vs Type-2 capacities in the paper's introduction and methods.
  Tags: high, negative · Paper: Do LLMs Know What They Know? Measuring Metacognitive Efficie... · Outcome: confounding of calibration metrics between Type-1 sensitivity (knowledge) and Ty...
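The conflation described above can be made concrete with a toy example. This is a minimal sketch, not the paper's code; the function names and the toy data are illustrative. A model that always reports the same confidence equal to its overall accuracy scores a perfect ECE, yet its confidence carries zero information about which individual answers are right (no Type-2 sensitivity), while its nonzero Brier score mostly reflects Type-1 knowledge limits.

```python
def brier_score(conf, correct):
    """Mean squared gap between stated confidence and 0/1 correctness."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)

def expected_calibration_error(conf, correct, n_bins=10):
    """Weighted average of |bin accuracy - bin mean confidence| (standard ECE)."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 falls in the top bin
        bins[idx].append((c, y))
    n = len(conf)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            acc = sum(y for _, y in b) / len(b)
            ece += (len(b) / n) * abs(acc - mean_conf)
    return ece

# Toy model: always reports 0.70 confidence and is right on 7 of 10 items.
conf = [0.70] * 10
correct = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

print(brier_score(conf, correct))                 # ~0.21 (Type-1 limits)
print(expected_calibration_error(conf, correct))  # ~0.0: "perfectly calibrated"

# Type-2 sensitivity: does confidence separate hits from misses? Not at all here.
hits = [c for c, y in zip(conf, correct) if y == 1]
misses = [c for c, y in zip(conf, correct) if y == 0]
print(sum(hits) / len(hits) - sum(misses) / len(misses))  # ~0.0 gap
```

The point of the sketch: a calibration-only metric would rate this model flawless even though it "knows" nothing about which of its own answers are wrong, which is exactly the Type-1/Type-2 distinction the paper draws.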
- Claim: Traditional expert-based assessment faces a critical scalability challenge in large systems (e.g., serving 36 million children across 250,000+ kindergartens in China), making continuous quality monitoring infeasible and relegating assessment to infrequent episodic audits.
  Evidence: Authors' contextual motivation citing scale figures (36 million children, 250,000+ kindergartens) and describing time/cost constraints of manual observation leading to infrequent audits.
  Tags: high, negative · Paper: When AI Meets Early Childhood Education: Large Language Mode... · Outcome: feasibility/scalability of manual expert-based assessment
- Claim: There is a significant boundary in the reverse confidence scenario: a substantial proportion of participants struggled to override initial inductive biases and thus had difficulty learning in that condition.
  Evidence: Behavioral experiment (N = 200) reporting that many participants failed or struggled in the reverse confidence mapping condition; proportion described in paper (exact proportion not given here).
  Tags: high, negative · Paper: Learning to Trust: How Humans Mentally Recalibrate AI Confid... · Outcome: failure/struggle rate in reverse confidence condition (ability to learn mappings...
- Claim: Preliminary evaluation reveals that current foundation action models struggle substantially with professional desktop applications (~60% task failure rate).
  Evidence: Preliminary empirical evaluation reported by the authors; reported task failure rate ~60% (no sample size provided in abstract).
  Tags: high, negative · Paper: CUA-Suite: Massive Human-annotated Video Demonstrations for ... · Outcome: task failure rate of foundation action models on professional desktop applicatio...
- Claim: The largest existing open dataset, ScaleCUA, contains only 2 million screenshots, equating to less than 20 hours of video.
  Evidence: Quantitative statement about ScaleCUA reported in paper: 2,000,000 screenshots and <20 hours equivalence.
  Tags: high, negative · Paper: CUA-Suite: Massive Human-annotated Video Demonstrations for ... · Outcome: size/coverage of existing open dataset (ScaleCUA)
- Claim: Progress toward general-purpose CUAs is bottlenecked by the scarcity of continuous, high-quality human demonstration videos.
  Evidence: Asserted in paper as motivation; refers to the gap in available continuous video data for training CUAs.
  Tags: high, negative · Paper: CUA-Suite: Massive Human-annotated Video Demonstrations for ... · Outcome: availability of continuous, high-quality human demonstration videos (data scarci...
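The "2 million screenshots, under 20 hours" equivalence above is easy to sanity-check. A minimal sketch; the 30 fps capture rate is an assumption for illustration, not a figure stated in the excerpt.

```python
screenshots = 2_000_000
fps = 30                              # assumed video capture rate
hours = screenshots / fps / 3600      # frames -> seconds -> hours

print(round(hours, 1))                # about 18.5 hours, consistent with "< 20"
```

At any plausible capture rate near 30 fps, 2 million frames indeed correspond to well under 20 hours of continuous video, which is why the paper frames this dataset as small relative to the demonstration-data need.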
- Claim: Refining the state (as above) raises state-action blind mass from 0.0165 at τ = 50 to 0.1253 at τ = 1000.
  Evidence: Empirical measurement reported on the instantiated model over the BPI 2019 log showing state-action blind mass values at two threshold (τ) settings.
  Tags: high, negative · Paper: The Stochastic Gap: A Markovian Framework for Pre-Deployment... · Outcome: state-action blind mass (measure of unsupported next-step decisions)
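The direction of the effect above (blind mass rising with the threshold τ) follows from the shape of the metric. A minimal sketch under an assumed definition: here blind mass is taken to be the share of observed next-step decisions whose (state, action) pair has support below τ in the log. The definition, the function name, and the synthetic log entries (`create_PO`, `approve`, `escalate`) are illustrative assumptions, not the paper's exact formalization.

```python
from collections import Counter

def blind_mass(transitions, tau):
    """Share of observed (state, action) decisions whose support in the
    log falls below the threshold tau (assumed definition, for illustration)."""
    counts = Counter(transitions)
    total = sum(counts.values())
    unsupported = sum(c for c in counts.values() if c < tau)
    return unsupported / total

# Synthetic log: one well-supported decision, one rare decision.
log = [("create_PO", "approve")] * 100 + [("create_PO", "escalate")] * 5

print(blind_mass(log, 10))    # only the rare pair (5 of 105 events) is unsupported
print(blind_mass(log, 500))   # 1.0: raising tau can only grow blind mass
```

Because raising τ can only move pairs from "supported" to "unsupported", blind mass is non-decreasing in τ, matching the reported rise from 0.0165 at τ = 50 to 0.1253 at τ = 1000.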
- Claim: Empirical evidence shows that many failures arise from miscalibrated reliance, including overuse when AI is wrong and underuse when it is helpful.
  Evidence: Paper cites empirical literature (unspecified in excerpt) as the basis for this claim; no sample size or methods given here.
  Tags: high, negative · Paper: From Accuracy to Readiness: Metrics and Benchmarks for Human... · Outcome: failures due to miscalibrated reliance (overreliance/underreliance)
- Claim: Evaluation practices focus primarily on model accuracy rather than whether human-AI teams are prepared to collaborate safely and effectively.
  Evidence: Paper-level critique / literature observation asserted in text; no empirical method or sample reported in excerpt.
  Tags: high, negative · Paper: From Accuracy to Readiness: Metrics and Benchmarks for Human... · Outcome: evaluation focus (accuracy vs. team readiness)
- Claim: The reduction in engagement from AI labeling (AI-generated or AI-enhanced) was particularly pronounced for emotional content compared to rational content.
  Evidence: Interaction of content type (emotional vs. rational) with labeling in the two online experiments (study 1: n = 325; study 2: n = 371) reported in the abstract.
  Tags: high, negative · Paper: AI content labeling and user engagement on social media: The... · Outcome: affective and behavioral engagement for emotional content
- Claim: Labeling content as AI-enhanced reduced both affective and behavioral engagement compared to human-created content.
  Evidence: Same two online experiments on Prolific (study 1: n = 325; study 2: n = 371) where participants viewed Instagram profiles labeled as human-created, AI-enhanced, or AI-generated.
  Tags: high, negative · Paper: AI content labeling and user engagement on social media: The... · Outcome: affective and behavioral engagement
- Claim: Labeling content as AI-generated reduced both affective and behavioral engagement compared to human-created content.
  Evidence: Two online experiments conducted via Prolific (study 1: n = 325; study 2: n = 371). Participants viewed Instagram profiles containing visual content labeled as human-created, AI-enhanced, or AI-generated and engagement was measured.
  Tags: high, negative · Paper: AI content labeling and user engagement on social media: The... · Outcome: affective and behavioral engagement