The Commonplace

Evidence (11633 claims)

Adoption
7395 claims
Productivity
6507 claims
Governance
5877 claims
Human-AI Collaboration
5157 claims
Innovation
3492 claims
Org Design
3470 claims
Labor Markets
3224 claims
Skills & Training
2608 claims
Inequality
1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
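The matrix is easier to compare across outcomes when each row is expressed as shares of its Total. The sketch below copies four rows from the table and computes those shares. The helper name `direction_summary` and the "net balance" measure, (Positive − Negative) / Total, are illustrative choices, not something the page defines. For some rows the Total is larger than the four direction columns combined (Firm Productivity: 385 + 46 + 85 + 17 = 533 against a Total of 539). The page does not say what that remainder represents, so the sketch reports it as a count instead of guessing.

```python
# Four rows copied from the Evidence Matrix: (Positive, Negative, Mixed, Null, Total).
MATRIX = {
    "Firm Productivity": (385, 46, 85, 17, 539),
    "AI Safety & Ethics": (183, 241, 59, 30, 517),
    "Job Displacement": (11, 71, 16, 1, 99),
    "Inequality Measures": (36, 105, 40, 6, 187),
}


def direction_summary(row):
    """Shares of each finding direction relative to the reported Total (hypothetical helper)."""
    pos, neg, mixed, null, total = row
    return {
        "positive_share": pos / total,
        "negative_share": neg / total,
        # +1 would mean every claim is positive, -1 every claim negative.
        "net_balance": (pos - neg) / total,
        # Claims counted in Total but in none of the four direction columns.
        "unclassified": total - (pos + neg + mixed + null),
    }


for outcome, row in MATRIX.items():
    s = direction_summary(row)
    print(f"{outcome:22s} net={s['net_balance']:+.2f} unclassified={s['unclassified']}")
```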
Automation functions as a transnational shock that contracts demand for migrant labor in advanced economies.
Theoretical argument drawing on economic geography, labor economics, and development studies; comparative/regional field evidence referenced in the paper (no numerical sample size reported).
In algorithm-triggered emotional escalations, workers showed lower engagement: they sent fewer messages, contributed a smaller share of total chat rounds, and showed less proactivity in information seeking and solution provision.
Behavioral measures derived from chat logs in the randomized experiment comparing worker actions post-escalation across escalation types; reported differences in message counts, share of rounds, and proxies for proactivity.
high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... worker engagement measures (message count, share of chat rounds, proactivity ind...
Human intervention is less effective in algorithm-triggered emotional escalations (where customers express frustration or dissatisfaction).
Experimental subgroup analysis comparing intervention outcomes for algorithm-triggered emotional escalations versus technical escalations; emotional escalations showed worse post-intervention outcomes.
high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... service quality after emotional escalations
AI deployment substantially lowers ratings for AI-eligible chats.
Randomized field experiment measuring customer ratings for AI-eligible chats; treated condition (AI + human oversight) produced substantially lower ratings relative to control (humans only).
high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... customer ratings for AI-eligible chats
AI deployment reduces average chat duration.
Randomized field experiment on Alibaba's Taobao platform: workers in treatment supervised an agentic AI resolving AI-eligible chats while handling AI-ineligible chats; control workers resolved all chats without AI. Effect observed on average chat duration in experiment data.
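Random assignment is what allows the effects on chat duration and ratings to be read causally. The basic estimator compares the mean outcome in treatment against the mean in control. Below is a minimal sketch of that difference in means with a Welch (unequal-variance) standard error. The durations are made-up numbers, not data from the paper. Treatment appears to be assigned to workers, so a real analysis of chat-level outcomes would also need to account for clustering by worker. This sketch does not do that.

```python
from math import sqrt
from statistics import fmean, variance


def diff_in_means(treated, control):
    """Average treatment effect as a difference in means, with a Welch standard error.
    Interpretable causally only under random assignment."""
    estimate = fmean(treated) - fmean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return estimate, se


# Hypothetical chat durations in minutes (illustrative, not from the experiment).
treated = [4.0, 5.0, 6.0, 5.0]  # worker supervises AI on AI-eligible chats
control = [6.0, 7.0, 8.0, 7.0]  # worker resolves all chats without AI
est, se = diff_in_means(treated, control)
print(f"ATE = {est:.2f} min (SE {se:.2f})")
```

The same function applies unchanged to the customer-rating outcome. Only the input lists change.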
Rather than restoring stability, the AI-induced reskilling cycle intensifies anxiety, undermines mastery, and erodes professional confidence.
Theoretical claim about psychological outcomes from the conceptual reskilling loop; paper provides argumentation but no empirical measurements.
high negative AI-driven skill volatility and the emergence of re-skilling ... anxiety, sense of mastery, professional confidence
Based on Job Demands–Resources (JD-R) theory and Conservation of Resources (COR) theory, the paper conceptualizes an AI-induced reskilling loop in which ongoing technological change leads to skill erosion, continuous reskilling demands, cognitive and emotional depletion, and reinforced learning as a defensive response to perceived obsolescence.
Theoretical model/loop derived from applying JD-R and COR frameworks; no empirical test or sample reported in the paper.
high negative AI-driven skill volatility and the emergence of re-skilling ... cognitive/emotional depletion and defensive learning responses
The paper introduces the concept of 'reskilling fatigue' to explain the human consequences of persistent skill volatility among Established Knowledge Professionals (EKPs).
Conceptual/theoretical contribution presented by the authors; definition and argumentation rather than empirical validation.
high negative AI-driven skill volatility and the emergence of re-skilling ... experience of reskilling fatigue among EKPs
Continuous reskilling is widely promoted as a solution to AI-driven disruption, but little attention has been paid to its cumulative psychological costs.
Argument from literature review/observation in the paper; no empirical measurement or sample reported in the paper.
high negative AI-driven skill volatility and the emergence of re-skilling ... psychological costs of continuous reskilling (e.g., fatigue, stress)
Unless labour law evolves to address digitally mediated control and platform-based asymmetry, the gig economy risks normalising exploitative labour conditions under the guise of innovation and flexibility.
Predictive/theoretical claim based on the paper's synthesis of platform practices, legal gaps, and normative concerns; argued through comparative analysis and conceptual reasoning rather than quantitative forecasting.
high negative Corporate Accountability in the Gig Economy: Re-examining La... future trajectory of labour conditions and normalization of exploitative practic...
The paper uses the concept of 'digital slavery' as a normative framework to describe labour conditions shaped by coercive algorithmic management, absence of bargaining power, and structural precarity.
Conceptual and normative framing within the paper, using the 'digital slavery' metaphor to interpret observed platform labour practices and their implications; theoretical argumentation rather than empirical measurement.
high negative Corporate Accountability in the Gig Economy: Re-examining La... characterisation of labour conditions under algorithmic management
While several jurisdictions (UK, US, EU, India) have attempted to regulate gig work, most regulatory responses remain incomplete and fail to fully address platform accountability.
Comparative policy/regulatory analysis of the United Kingdom, United States, European Union and India assessing statutes, litigation and policy measures; qualitative assessment rather than statistical evaluation (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... completeness/effectiveness of regulatory responses to platform accountability
Platform companies rely on contractual misclassification, corporate structuring, and the legal fiction of neutrality to separate control from liability.
Legal and corporate-structure analysis across jurisdictions, examining contracts, corporate forms and legal doctrines; based on comparative statutory and case-law review (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... allocation of legal liability and regulatory accountability
The platform economy produces a deeply unequal labour structure marked by algorithmic control, economic dependency, surveillance, and lack of social protection.
Synthesis and critical analysis combining literature, policy review and comparative jurisdictional study to argue systemic effects on labour structure; primarily qualitative evidence and theoretical framing (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... distributional labour outcomes and social protection coverage
Gig workers, though formally classified as independent contractors, are functionally subjected to pricing control, performance monitoring, automated penalties, and deactivation mechanisms that closely resemble managerial authority.
Descriptive/qualitative evidence in the paper: examples and analysis of platform design and management practices (algorithmic pricing, monitoring, penalties, deactivation); based on platform policy documents, case examples and comparative review (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... degree of algorithmic/managerial control over workers
Digital labour platforms exercise employer-like control while avoiding employer-like legal responsibilities.
Argument and comparative legal analysis across jurisdictions (United Kingdom, United States, European Union, India) demonstrating platform practices and legal/regulatory responses; based on documentary/legal review and critical analysis (no quantitative sample size reported).
high negative Corporate Accountability in the Gig Economy: Re-examining La... legal employment classification and control/responsibility
Shifts persist in even the newest AI models despite remarkable progress in AI modeling, post-training alignment and safeguards.
Asserted in paper; supported by later empirical validation across multiple models and production chatbots (see other claims), but no explicit sample size in this sentence.
high negative Fusion-fission forecasts when AI will shift to undesirable b... persistence of undesirable behavioral shifts despite alignment/safeguards
ChatGPT-like AI behavior can shift, unnoticed, from desirable to undesirable (e.g., encouraging self-harm, extremist acts, financial losses, or costly medical and military mistakes), and no one can yet predict when.
Statement in paper framing the problem; qualitative observations and motivating examples (no numeric sample size provided in the excerpt).
high negative Fusion-fission forecasts when AI will shift to undesirable b... occurrence of unnoticed shifts from desirable to undesirable outputs
The characteristics that make Metis tasks resist automation are properties of the tasks themselves rather than limitations of current AI models.
Conceptual argument in the paper asserting task-inherent properties drive resistance to automation; supported by theory and argumentation, not by empirical model-comparison experiments.
high negative Metis AI: The Overlooked Middle Zone Between AI-Native and W... source of automation limitation (task-inherent vs model limitation)
The resistance of Metis tasks to automation is not due to computational intractability but to institutional, social, and normative entanglements.
Theoretical argument differentiating computational from institutional/social/normative causes; supported by citations and cross-disciplinary theory rather than empirical causal identification.
high negative Metis AI: The Overlooked Middle Zone Between AI-Native and W... cause of automation resistance
There exists a class of entirely digital tasks, called 'Metis AI', that resist reliable AI automation.
Conceptual identification and definition introduced by the authors; supported by theoretical grounding in social sciences, philosophy, and humanitarian practice rather than empirical trials or quantified samples.
high negative Metis AI: The Overlooked Middle Zone Between AI-Native and W... resistance to reliable AI automation
The prevailing digital-vs-physical framing of AI capabilities misses the most consequential boundary: the one within digital tasks.
Normative/theoretical argument presented in the paper contrasting existing framing with a proposed alternative; grounded in cross-disciplinary literature rather than empirical measurement.
high negative Metis AI: The Overlooked Middle Zone Between AI-Native and W... relevance of boundary framing for AI capabilities
Severe penalties in underfunded Eastern European systems, mediated by financial distress, drive families toward resource exhaustion.
Cross-country comparisons in SHARE-derived analyses showing larger financial penalties in underfunded Eastern European systems, with mediation analysis implicating financial distress and resultant resource exhaustion.
high negative The Broken Shield of European Palliative Care: Evidence from... Household resource exhaustion / severe financial toxicity in underfunded Eastern...
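"Mediated by financial distress" means part of the system-level penalty passes through distress and part does not. A common way to express this is the linear product-of-coefficients decomposition: the path a (exposure → mediator) multiplied by the path b (mediator → outcome, holding exposure fixed) gives the indirect effect. The coefficient on exposure in that same outcome regression is the direct effect. The sketch below is one such decomposition. It is not necessarily the estimator the paper uses, and it relies on linearity and no unmeasured confounding. The variable names and the simulated data are hypothetical, not SHARE data. With ordinary least squares, total = direct + a·b holds exactly, which the test checks.

```python
import random
from statistics import fmean


def slope(x, y):
    """OLS slope of y on x, with intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two(x, m, y):
    """OLS coefficients on x and m from regressing y on both (with intercept),
    solved from the centred 2x2 normal equations."""
    mx, mm, my = fmean(x), fmean(m), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det


def mediation(x, m, y):
    a = slope(x, m)               # exposure -> mediator
    direct, b = ols_two(x, m, y)  # exposure -> outcome given mediator; mediator -> outcome
    return {"indirect": a * b, "direct": direct, "total": slope(x, y)}


# Simulated data with known paths: a = 0.8, b = 0.5 (indirect 0.4), direct 0.3.
rng = random.Random(7)
n = 4000
x = [float(rng.random() < 0.5) for _ in range(n)]  # 1 = hypothetical underfunded system
m = [0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]  # financial distress
y = [0.3 * xi + 0.5 * mi + rng.gauss(0.0, 1.0) for xi, mi in zip(x, m)]  # resource exhaustion
res = mediation(x, m, y)
print({k: round(v, 3) for k, v in res.items()})
```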
Financial distress acts as a profound multiplier of the burdens associated with palliative care.
Interaction/moderation analyses in SHARE-derived synthetic data showing that pre-existing financial distress amplifies financial and caregiving burdens under PC.
high negative The Broken Shield of European Palliative Care: Evidence from... Magnitude of financial toxicity / household financial burden under PC, condition...
Socio-demographics heavily modulate exposure: lacking a spousal safety net inflates the burden.
Subgroup/moderation analyses in SHARE-derived data comparing households with and without spousal support, showing higher burdens when no spouse is present.
high negative The Broken Shield of European Palliative Care: Evidence from... Increased household burden (financial/time) when no spousal support is available
Non-cancer trajectories drive massive structural penalties that escalate at the distribution's tail, mechanically compounded by physical dependency.
Stratified analyses by disease trajectory (non-cancer vs cancer) using SHARE data (2016-2021) and quantile models showing larger penalties for non-cancer cases, especially in tail quantiles; physical dependency identified as a compounding factor.
high negative The Broken Shield of European Palliative Care: Evidence from... Increased financial penalties/out-of-pocket expenditures (especially at tails) a...
Quantile treatment models expose a 'broken shield' for vulnerable households and severe tail events (PC protection fails or reverses at distributional tails).
Application of quantile treatment effect models to synthesized SHARE-derived digital twins (2016-2021), explicitly examining distributional/tail effects.
high negative The Broken Shield of European Palliative Care: Evidence from... Extreme-tail outcomes of out-of-pocket expenditures and caregiving burden
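A quantile treatment effect compares a quantile of the treated outcome distribution with the same quantile of the control distribution. This is how an intervention can look protective at the median and still fail or reverse in the tail. The sketch below uses hypothetical spending figures, not the paper's SHARE-derived digital twins. It computes unconditional QTEs by linear interpolation between order statistics and sets them beside the mean effect. Two caveats apply. First, these are differences between distributions, not quantiles of individual household effects. Second, the paper's models may condition on covariates, which this sketch does not.

```python
from statistics import fmean


def quantile(data, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(data)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def quantile_effects(treated, control, qs):
    """Unconditional QTE: treated quantile minus control quantile at each q."""
    return {q: quantile(treated, q) - quantile(control, q) for q in qs}


# Hypothetical out-of-pocket spending. Treatment lowers typical costs,
# but a few treated households face much larger ones.
control = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
treated = [80, 90, 100, 110, 120, 130, 140, 150, 160, 260, 400]
effects = quantile_effects(treated, control, [0.1, 0.5, 0.9])
mean_effect = fmean(treated) - fmean(control)
print(effects, round(mean_effect, 2))
```

Here the median effect is −20 (protective) while the 90th-percentile effect is +70, and the mean effect is positive. A single average would hide that split.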
Parsing through LLM-generated code can be tedious and time-consuming, potentially negating the productivity gains promised by AI-coding tools.
Motivation/background statement in the paper: a qualitative claim about the cost (time/effort) of reviewing LLM-generated code; presented as motivation rather than empirically quantified evidence in the excerpt.
high negative Viverra: Text-to-Code with Guarantees time/effort required to review LLM-generated code
Employees experience technostress, anxiety and micro-political negotiation around AI tools in everyday work.
Reported experiences from semistructured interviews with 28 managers/professionals across 12 organizations; thematic analysis highlighting technostress and anxiety as themes.
high negative Reimagining work in the age of intelligent automation: a qua... technostress and anxiety among employees
An analysis of a 21-instrument inventory identifies an incentive gradient where geopolitical and industrial pressures systematically reward surface-level behavioral proxies over deep structural verification.
Empirical/qualitative analysis of an inventory of 21 governance instruments compiled and analysed in the paper (n=21 instruments).
high negative Position: Behavioural Assurance Cannot Verify the Safety Cla... governance_and_regulation
Behavioural assurance, even when carefully designed, is being asked to carry safety claims it cannot verify.
The paper's normative and conceptual argument synthesising governance requirements and the epistemic limits of behavioural testing.
Current assurance methodologies (primarily behavioural evaluations and red-teaming) are epistemically limited to observable model outputs and cannot verify latent representations or long-horizon agentic behaviours.
Conceptual/analytic argument and review of existing assurance methodologies presented in the paper.
Overthinking is a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.
Conclusion drawn by authors based on their empirical findings described in the abstract (amplification of output length across multiple models and transferability experiments).
high negative Inducing Overthink: Hierarchical Genetic Algorithm-based DoS... presence of shared vulnerability across models (qualitative security posture)
This overthinking behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS)-style resource exhaustion.
Authors assert increased latency and energy consumption as consequences of longer reasoning traces; framed as a potential attack vector in the abstract (no quantitative latency/energy measurements provided in abstract).
high negative Inducing Overthink: Hierarchical Genetic Algorithm-based DoS... inference latency and energy consumption
Large reasoning models (LRMs) exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces when confronted with incomplete or logically inconsistent inputs.
Empirical observation reported by the authors based on experiments described in the paper (abstract references experiments across multiple SOTA reasoning models); no numerical sample size for inputs reported in abstract.
high negative Inducing Overthink: Hierarchical Genetic Algorithm-based DoS... response length / reasoning trace length (verbosity and redundancy)
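The resource-exhaustion argument rests on arithmetic: generation cost scales with the number of tokens produced, so an input that inflates the reasoning trace multiplies latency and energy. The sketch below turns baseline and induced trace lengths into an amplification ratio and an added decode-time estimate. The token counts and the 50 tokens-per-second throughput are hypothetical. The linear time model is an assumption that ignores prefill, batching and hardware effects.

```python
from statistics import median


def amplification_stats(pairs, tokens_per_second):
    """pairs: (baseline_tokens, induced_tokens) per prompt.
    Assumes decode time grows linearly with generated tokens at a fixed throughput."""
    ratios = [induced / baseline for baseline, induced in pairs]
    extra_tokens = sum(induced - baseline for baseline, induced in pairs)
    return median(ratios), extra_tokens / tokens_per_second


# Hypothetical token counts for three prompts (not measurements from the paper).
pairs = [(500, 6000), (800, 4000), (400, 4400)]
med_ratio, extra_seconds = amplification_stats(pairs, tokens_per_second=50.0)
print(f"median amplification {med_ratio:.1f}x, extra decode time {extra_seconds:.0f} s")
```

The median is used rather than the mean so that one extreme prompt does not dominate the summary.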
Distinct readability issue patterns and limited effectiveness of prompt engineering reveal a latent technical debt in LLM-generated code that could affect long-term maintainability.
Interpretation/conclusion in paper combining empirical findings (distinct issue patterns and limited prompt impact) to argue for potential technical debt and maintainability risks; presented as a forward-looking implication rather than a quantified causal estimate.
high negative The Readability Spectrum: Patterns, Issues, and Prompt Effec... maintainability_risk / technical_debt_inferred_from_readability
LLM-generated code displays distinct readability issue patterns compared to human-written code.
Empirical analysis of readability subcomponents/features showing different patterns of readability issues between LLM-generated and human-written code (paper reports qualitative/quantitative distinctions in issue patterns).
high negative The Readability Spectrum: Patterns, Issues, and Prompt Effec... readability_issue_patterns (feature-level readability problems)
Policy responses in Europe are fragmented across the EU and Member State levels and do not match the potential scale of disruption from AGI.
Paper's policy analysis of EU- and Member-State-level responses (stated in abstract); no quantitative metrics provided in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
Europe has low rates of industrial AI adoption.
Paper's empirical/policy review claiming low industrial AI adoption in Europe (as stated in abstract); the abstract does not provide numeric adoption rates or sample sizes.
Europe exhibits structural weaknesses in compute infrastructure and talent retention.
Paper's structural assessment of Europe's AI value-chain capabilities (stated in abstract); no numerical measures provided in the abstract.
Europe has limited strategic awareness of frontier AI progress.
Paper's assessment of Europe's positioning based on policy analysis and review of capabilities monitoring (as stated in abstract); no supporting metrics or sample sizes provided in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
AGI could strain existing governance frameworks.
Paper's policy analysis describing potential mismatches between governance capacity and AGI-induced disruptions (as stated in abstract); no empirical tests or quantification reported in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
AGI could intensify interstate competition.
Paper's geopolitical analysis and scenario-based reasoning informed by trends in AI capabilities (stated in abstract); no quantitative measures reported in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
AGI could fundamentally alter the global distribution of economic and military power.
Paper's geopolitical analysis drawing on capability trends and scenario reasoning (as stated in abstract); no empirical quantification provided in the abstract.
high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
Increased levels of AI assistance may degrade productivity, leading to potentially significant shortfalls under the model's identified conditions.
Model-based comparative-statics and steady-state analysis showing scenarios where marginal increases in AI assistance reduce expected task output; examples/parameter illustrations provided in the paper (theoretical, no empirical sample).
high negative Human-AI Productivity Paradoxes: Modeling the Interplay of S... expected task output / productivity shortfalls associated with increased AI assi...
Introducing AI unreliability (errors/noise in AI outputs) in the model can also generate a productivity paradox: greater AI assistance may lower productivity.
Analytical/theoretical model incorporating AI unreliability; model derivations and examples demonstrating conditions under which unreliability leads to reduced productivity (no empirical data).
high negative Human-AI Productivity Paradoxes: Modeling the Interplay of S... agent productivity (task output) as influenced by AI assistance and AI unreliabi...
Incorporating endogeneity in skill development into the model can induce a productivity paradox where increased AI assistance reduces productivity.
Analytical/theoretical model of human-AI interaction with utility-maximizing human agents and endogenous skill development; steady-state and comparative-static analysis reported in the paper (no empirical sample).
high negative Human-AI Productivity Paradoxes: Modeling the Interplay of S... agent productivity (task output) as a function of AI assistance and endogenous s...
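A toy model shows how endogenous skill and AI unreliability can together produce the paradox. It is a swapped-in illustration, not the paper's utility-maximizing model. Assume a share a of the task goes to an AI of quality r. Practice on the remaining share builds skill at rate η, and skill decays at rate δ, so s ← s + η(1 − a) − δs. The steady state is s* = η(1 − a)/δ, and output is y(a) = a·r + (η/δ)(1 − a)². At a = 0 the slope is r − 2η/δ. More assistance therefore lowers output whenever r < 2η/δ, that is, when the AI is unreliable relative to the skill that practice would have built. All parameter values below are hypothetical.

```python
def steady_state_skill(a, eta, delta, steps=5000):
    """Iterate s <- s + eta*(1 - a) - delta*s from s = 0 until (numerically) converged.
    Practice on the human share (1 - a) builds skill; delta is the decay rate."""
    s = 0.0
    for _ in range(steps):
        s = s + eta * (1 - a) - delta * s
    return s


def expected_output(a, r, eta, delta):
    """AI does share a at quality r; the human does the rest at steady-state skill."""
    return a * r + (1 - a) * steady_state_skill(a, eta, delta)


params = dict(r=0.6, eta=0.1, delta=0.1)  # r < 2*eta/delta = 2, so the paradox applies
for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"a={a:.2f}  output={expected_output(a, **params):.4f}")
```

With these values output falls from 1.0 at a = 0 to 0.55 at a = 0.5. Even full delegation (0.6) stays below the no-AI level.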
Simulated users produce feedback dynamics that diverge from humans.
Temporal/interaction analysis in the replication showing differences in how simulators provide feedback across multi-turn interactions compared to humans.
high negative PRISM-X: Experiments on Personalised Fine-Tuning with Human ... feedback/interaction dynamics over multi-turn conversations (simulator vs human)
Simulated users exhibit amplified position biases relative to human participants.
Behavioral comparison in the simulator replication showing stronger position biases in simulated responses than in human responses.
high negative PRISM-X: Experiments on Personalised Fine-Tuning with Human ... magnitude of position bias in simulated vs human responses
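Position bias in pairwise judgments can be measured as the rate at which the option shown first is chosen. Without bias, and with order randomized, that rate should sit near 0.5. The sketch below computes the rate and an exact two-sided binomial test against 0.5 for two hypothetical choice sequences; they are illustrative, not PRISM-X data. The sketch only tests each group against 0.5. Comparing simulated and human bias directly would need a two-sample test.

```python
from math import comb


def first_position_rate(choices):
    """choices: 1 if the option shown first was picked, else 0. 0.5 means no position bias."""
    return sum(choices) / len(choices)


def binom_two_sided_p(k, n):
    """Exact two-sided binomial test of k first-position picks out of n against p = 0.5,
    computed by doubling the smaller tail (valid because the null is symmetric)."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))


# Hypothetical pairwise choices.
human = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
simulated = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
for name, c in (("human", human), ("simulated", simulated)):
    print(name, first_position_rate(c), round(binom_two_sided_p(sum(c), len(c)), 4))
```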
Simulated users discuss different topics compared to the human participants.
Analysis of conversation content in the simulator replication showing differences in topical distribution between simulators and humans.
high negative PRISM-X: Experiments on Personalised Fine-Tuning with Human ... topic distribution of conversations produced by simulators versus humans