The Commonplace

Evidence (7448 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | 3 | | 25 |
| Skill Obsolescence | 3 | 19 | 2 | | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | | 23 |

Automated compliance and credentialing systems raise governance issues (auditability, appeals mechanisms) and risk incorrect automated deregistration if not properly governed.
Governance and algorithmic-risk discussion in the paper; logical argumentation rather than case-based evidence.
high negative Electrotechnical education, institutional complianc... rate of incorrect automated decisions, existence and effectiveness of appeal pro...
The paper models career progression as a continuous function and treats certification gaps as discontinuities that impede labour-market mobility.
Mathematical/conceptual modeling described in the methods (career-progression-as-continuous-function approach); this is a modeling choice reported in the paper rather than an empirical finding.
high negative Electrotechnical education, institutional complianc... labour-market mobility / continuity of career progression (in the conceptual mod...
Industrial robotization (IR) is a robust negative predictor of provincial IWE after controlling for fixed effects and covariates.
Multiple regression specifications using province and year fixed effects and control variables; the negative IR–IWE coefficient remains statistically significant across alternative model specifications (robustness checks reported in the paper).
high negative Can Industrial Robotization Drive Sustainable Industrial Was... Industrial wastewater emissions (IWE)
Adoption of industrial robots substantially reduces industrial wastewater emissions (IWE) across Chinese provinces (2013–2022).
Panel data covering 30 Chinese provinces for 2013–2022 (≈300 province-year observations); fixed-effects regressions with province and year fixed effects and covariates; estimated negative coefficient on provincial IR intensity.
high negative Can Industrial Robotization Drive Sustainable Industrial Was... Industrial wastewater emissions (IWE) at the provincial level
There is limited long-term impact evidence and few system-level assessments of AI in developing-country agriculture.
Authors' methodological caveat based on the temporal scope and types of studies available in the >60-study review.
high negative A systematic review of the economic impact of artificial int... presence/absence of long-term impact evaluations and system-level assessments
The evidence base is skewed toward pilots and high‑performer contexts; there is a lack of long‑panel, multi‑project longitudinal studies to validate typical returns and scalability.
Authors' assessment of evidence types in the 160 studies: mix of conceptual papers, case studies, pilots, and only limited larger empirical evaluations.
high negative Digital Twins Across the Asset Lifecycle: Technical, Organis... representativeness and longitudinal robustness of evidence
Substantial compute and resource requirements for training and inference concentrate capabilities among well‑resourced labs and firms.
Paper discusses large compute budgets for training/inference and states that performance scales with data, model size, and compute; it infers concentration of capabilities but provides no empirical market concentration measures.
high negative Protein structure prediction powered by artificial intellige... distribution of computational capability/resources across organizations and resu...
Structure predictors depend on training data and exhibit biases; experimental validation remains necessary.
Paper notes dependence on training data biases and the need for experimental validation; references data sources (PDB, UniRef, metagenomic catalogs) but does not quantify bias magnitudes.
high negative Protein structure prediction powered by artificial intellige... bias in model predictions attributable to training data coverage/quality; requir...
Current limitations include inaccurate prediction of multi‑chain complexes, flexible or rare conformational states, and limited prediction of dynamic ensembles.
Paper explicitly enumerates these limitations in the 'Ongoing limitations' section; no quantitative failure rates are given.
high negative Protein structure prediction powered by artificial intellige... accuracy for multi‑chain complexes, flexible/rare conformations, and ensemble/dy...
Traditional computational methods struggle without homologous templates or with complex folding/dynamics.
Paper discusses limitations of traditional computational methods, emphasizing dependence on homologous templates and difficulty with complex folding/dynamics; specific method comparisons or sample sizes are not provided.
high negative Protein structure prediction powered by artificial intellige... accuracy/success of traditional computational structure prediction in low‑homolo...
Opacity, bias, and errors in AI systems demand auditing, standards, and governance (algorithmic accountability) to ensure trustworthy assessment.
Synthesis of literature on algorithmic bias and accountability plus policy analysis recommending audits and standards; supported by country cases that discuss governance concerns.
high negative The Future of Assessment: Rethinking Evaluation in an AI-Ass... algorithmic fairness, transparency, and reliability
Student data used by AI vendors raises risks around consent, reuse, commercial exploitation, and other data-privacy concerns.
Policy analysis and literature on data governance, privacy law debates; examples from national policy documents in the comparative cases. No original data on breaches or misuse presented.
high negative The Future of Assessment: Rethinking Evaluation in an AI-Ass... privacy risks and governance of student data
Empirical evaluation of integrated defenses, quantitative cost/benefit analyses, and standardized threat models for VR are research gaps that remain unaddressed in the literature window surveyed (2023–2025).
Authors' stated limitations from their comparative literature review of 31 studies noting an absence of primary empirical validation and quantitative economic analyses in the reviewed corpus.
high negative Securing Virtual Reality: Threat Models, Vulnerabilities, an... presence/absence of empirical validation, cost‑benefit studies, and standard thr...
Immersive VR systems collect continuous multimodal signals (motion tracking, gaze, voice, biometrics) that enable novel inference, spoofing, and manipulation attacks beyond traditional IT threats.
Synthesis of threat descriptions across the 31 reviewed peer‑reviewed studies (2023–2025) documenting sensor modalities and attack vectors; qualitative comparative evaluation of attack surfaces.
high negative Securing Virtual Reality: Threat Models, Vulnerabilities, an... existence and extent of expanded attack surface due to multimodal signal collect...
The Omnibus overlaps substantively with the DSA and other digital policies, creating potential jurisdictional and interpretive ambiguities about which rules apply to platforms and AI-enabled services.
Comparative mapping and legal/regulatory review identifying overlapping provisions; qualitative analysis of proposed texts (no quantitative sample).
high negative The Digital Omnibus and the Future of EU Regulation: Implica... jurisdictional/interpretive clarity of applicable rules for platforms and AI ser...
Pakistan prioritizes economic and digital governance objectives, with comparatively weak governance of military AI.
Review of Pakistan’s economic and digital governance plans, export‑control materials, and secondary literature on Pakistan’s civil–military relations.
high negative Regulating AI in National Security: A Comparative S... strength and formality of military AI governance
Large-scale machine learning enables invisible inferences about users from seemingly innocuous data.
Conceptual claim presented in the workshop and supported by referenced technical literature on inference capabilities of ML models (discussion in position papers); workshop itself did not present a new empirical experiment.
high negative Moving Beyond Clicks: Rethinking Consent and User Control in... privacy risk from inferred attributes (inference accuracy / presence of invisibl...
Inequities in climate-AI systems appear across three development phases—Inputs, Process, and Outputs—creating multiple failure points where Global North advantages propagate into final products.
Conceptual framework developed from cross-disciplinary synthesis, literature review, and illustrative examples (Inputs → Process → Outputs mapping).
high negative The Rise of AI in Weather and Climate Information and its Im... Presence of inequities at each phase of the AI development lifecycle (data avail...
Foundation-model development and high-performance computing (HPC) capacity are overwhelmingly located in the Global North.
Descriptive mapping of global HPC infrastructure and foundation-model authorship described in the paper (infrastructure mapping and authorship analysis). No single quantitative sample size reported; evidence based on spatial mapping and documented locations of compute centers and model-development institutions.
high negative The Rise of AI in Weather and Climate Information and its Im... Geographic distribution of HPC capacity and foundation-model development (locati...
Ambiguity about the probability of data leaks (a 10–50% range) reduces user adoption of AI personalization relative to a neutral privacy presentation.
Between-subjects online experiment, 2 (information environment: Risk vs Ambiguity) × 3 (privacy-treatment conditions), N = 610 participants randomized across arms. Leak-probability ambiguity presented as a 10–50% range; adoption (choice of personalized vs standard basket) was measured and privacy-threatening conditions under ambiguity produced a statistically significant reduction in adoption compared to neutral.
high negative The Data-Dollars Tradeoff: Privacy Harms vs. Economic Risk i... Adoption choice: proportion choosing AI-personalized basket versus standard bask...
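A between-subjects comparison like this one is typically analyzed with a two-proportion z-test. The sketch below is illustrative only: the summary reports N = 610 and a significant reduction, not the per-arm adoption counts, so the counts and the `two_prop_ztest` helper here are hypothetical.

```python
from math import sqrt, erf

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via erf.
    pval = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p1 - p2, z, pval

# Hypothetical cell counts: adopters of the personalized basket in the
# neutral arm vs. the ambiguity (10-50% leak range) arm.
diff, z, p = two_prop_ztest(180, 305, 140, 305)
print(f"adoption difference = {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the ambiguity arm adopts about 13 percentage points less often, and the difference clears conventional significance thresholds.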
Rank stability analysis across the whole citation distribution shows instability not only at the tail but across frequently cited domains; rankings shift substantially across samples.
Distribution-wide rank-stability methods applied to repeated-sample citation data from the three platforms and three topics, comparing domain ranks across samples and quantifying rank-change frequency and magnitude.
high negative Quantifying Uncertainty in AI Visibility: A Statistical Fram... rank stability of domains by citation frequency across repeated samples
Bootstrap-based confidence intervals show wide uncertainty: many domain-level differences that look meaningful in single-run snapshots fall within measurement noise.
Bootstrap resampling applied to repeated-sample data (collected across nine days and high-frequency sampling) to compute confidence intervals for citation shares and prevalence; many pairwise or between-domain differences were not statistically separable once CIs were considered.
high negative Quantifying Uncertainty in AI Visibility: A Statistical Fram... width of bootstrap confidence intervals for domain citation shares / prevalence ...
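A percentile bootstrap over repeated runs can be sketched as follows; the run indicators are hypothetical and `bootstrap_ci` is an illustrative helper, not the authors' code.

```python
import random

def bootstrap_ci(samples, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(samples)."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(samples) for _ in samples]) for _ in range(n_boot)
    )
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def share(xs):
    return sum(xs) / len(xs)

# Hypothetical repeated-query results: 1 if a given domain was cited in
# that run, 0 otherwise, over 40 runs of the same query.
runs = [1] * 14 + [0] * 26
lo, hi = bootstrap_ci(runs, share)
print(f"share = {share(runs):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The point is the interval's width: with 40 runs, two domains whose single-run shares differ by ten points can easily have overlapping intervals.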
Single-run point estimates of citation share or prevalence are misleading; visibility metrics should be treated as estimators with uncertainty and reported with confidence intervals.
Comparison of single-run snapshots to distributions obtained from repeated sampling (daily and 10-minute interval regimes) and bootstrap resampling showing wide sample-to-sample variation and wide CI widths for domain-level shares and prevalence metrics.
high negative Quantifying Uncertainty in AI Visibility: A Statistical Fram... bias/precision of single-run estimates of domain citation share and prevalence
Generative search platforms are non-deterministic: the same query at different times can yield different answers and different cited domains.
Repeated-query experiments performed on three platforms (Perplexity Search, OpenAI SearchGPT, Google Gemini) across three consumer-product topics, using multi-day sampling (one collection per day over nine days) and high-frequency sampling (repeated queries at 10-minute intervals); observed variation in responses and cited domains across runs.
high negative Quantifying Uncertainty in AI Visibility: A Statistical Fram... response variability (changes in generated answers) and cited domains per query
Performance degrades when forecasted features are removed from the downstream regression model.
Ablation study results reported in the paper which compare full FutureBoosting against variants without TSFM-generated forecasted features using the same evaluation protocols.
high negative Regression Models Meet Foundation Models: A Hybrid-AI Approa... Increase in MAE (worse forecast error) after removing forecasted features
Despite LoRA being parameter-efficient, fine-tuning and iterative human-in-the-loop workflows still require compute resources and researcher time; governance/versioning of tuned models is necessary.
Caveat stated in the paper about remaining computational and governance costs; no quantitative resource usage reported in the summary.
high negative THETA: A Textual Hybrid Embedding-based Topic Analysis Frame... compute/resource requirements and governance burden
Embedding fine-tuning (DAFT) risks amplifying domain-specific biases present in the tuning corpus, so domain experts and robust evaluation protocols are necessary.
Paper caveat noting bias-amplification risk from fine-tuning embeddings; aligns with known risks in the literature but no empirical bias audit results provided in the summary.
high negative THETA: A Textual Hybrid Embedding-based Topic Analysis Frame... amplification of biases in tuned embeddings / need for bias mitigation
Mean emotional self-alignment between poster and responder is 32.7%, indicating systematic affective mismatch rather than congruence.
Pairwise comparison of emotion labels across post–response pairs in the dataset; computation of mean percentage where poster and immediate responder share the same emotion (32.7%).
high negative What Do AI Agents Talk About? Emergent Communication Structu... percentage of post–response pairs with identical emotion labels (emotional self-...
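The metric itself is a simple match rate over post-response pairs; a minimal sketch with hypothetical emotion labels (not the paper's data):

```python
# Hypothetical emotion labels for six post-response pairs.
pairs = [("joy", "joy"), ("anger", "sadness"), ("joy", "neutral"),
         ("fear", "fear"), ("sadness", "anger"), ("neutral", "neutral")]

def self_alignment(pairs):
    """Fraction of post-response pairs with identical emotion labels."""
    return sum(a == b for a, b in pairs) / len(pairs)

print(f"self-alignment = {self_alignment(pairs):.1%}")  # 3 of 6 match: 50.0%
```

In the paper this quantity averages 32.7% over the full dataset, i.e. roughly two of every three responses carry a different emotion than the post they answer.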
Conversational coherence declines rapidly with thread depth, indicating shallow, weakly connected multi-turn exchanges.
Lexical-semantic coherence metrics (e.g., embedding-based similarity) computed across comment threads of varying depth in the Moltbook dataset; observed rapid decrease in coherence scores as thread depth increases.
high negative What Do AI Agents Talk About? Emergent Communication Structu... coherence (similarity) metric as a function of thread depth
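Embedding-based coherence of this kind reduces to cosine similarity between a root post and replies at increasing depth. The vectors below are toy stand-ins; a real pipeline would use a sentence encoder.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical embeddings for a root post and replies at depths 1-3,
# chosen so topical overlap with the root fades with depth.
root = [0.9, 0.1, 0.2]
replies = {1: [0.8, 0.2, 0.3], 2: [0.5, 0.5, 0.4], 3: [0.1, 0.9, 0.6]}

for depth, vec in replies.items():
    print(f"depth {depth}: coherence = {cosine(root, vec):.2f}")
```

A rapid fall-off in this similarity as depth grows is what the paper reports as shallow, weakly connected exchanges.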
When pipelines have cross-cutting ties, prices oscillate, allocation quality drops, and management becomes difficult.
Empirical simulation results from the ablation study: configurations with non-hierarchical, cross-cutting graph structures produced larger price volatility, frequent oscillations in price updates, and lower allocation value/throughput compared to hierarchical graphs (measured across many runs and random seeds within the 1,620-run experimental set).
high negative Real-Time AI Service Economy: A Framework for Agentic Comput... price volatility and oscillation frequency; allocation quality (value/throughput...
On the 22 postdating (contamination-free) incidents, no agent achieved end-to-end exploitation success across all 110 agent–incident pairs evaluated.
Empirical evaluation of 110 agent–incident pairs reported in the study (end-to-end exploit attempts on the 22 incidents).
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... end_to_end_exploitation_success_rate (per_agent_per_incident)
The original EVMbench had a data contamination risk because it relied on audit-contest data published before every evaluated model's release, which could have been seen during model training.
Timing relationship between the audit-contest dataset used by EVMbench and the release dates of evaluated models (dataset predated model releases).
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... dataset_contamination_risk (potential_training_data_leakage)
The original EVMbench evaluation was narrow: it evaluated 14 agent configurations and most models were tested only with their vendor-provided scaffold.
Description of the original EVMbench experimental setup (number of agent configurations and scaffold usage) cited in this study.
high negative Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... evaluation_breadth (number_of_agent_configurations; scaffold_variety)
There is a risk that NFD will overfit to individual practices and lead to privacy/IP leakage if crystallization is not carefully governed.
Limitations and risk analysis in the paper; conceptual argument and case study discussion raising privacy/IP concerns. No empirical incidence rates provided.
high negative Nurture-First Agent Development: Building Domain-Expert AI A... degree of overfitting to individual practice; instances of privacy/IP leakage
NFD requires sustained practitioner engagement and incentive alignment to be effective.
Limitations and discussion sections of the paper explicitly state this requirement; logical inference from method (human-in-the-loop commercialization and continual crystallization).
high negative Nurture-First Agent Development: Building Domain-Expert AI A... practitioner engagement/time invested
Stated limitations include reliance on self-reported perceptions (subject to response and survivorship bias), the absence of experimental or causal identification, a potentially non-representative sample, and a cross-sectional design that limits inference about long-term productivity effects.
Authors' stated limitations in the paper summary.
high negative Artificial Intelligence as a Catalyst for Innovation in Soft... validity threats (self-report bias, lack of causal design) as reported by author...
A mathematical analysis bounds or relates expected performance loss of the surrogate to measurable distribution mismatch between the training parameter distribution (samples) and the target parameter distribution.
Theoretical derivations presented in the paper that relate performance loss to distribution mismatch; the summary states the analysis provides a measurable diagnostic for when retraining or reweighting is needed.
high negative MCMC Informed Neural Emulators for Uncertainty Quantificatio... expected performance loss (e.g., increase in predictive loss) as a function of d...
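The summary does not state the paper's exact inequality, but bounds of this general shape follow from standard arguments. One common version, assuming a bounded loss, controls the expected-performance gap between the target parameter distribution p and the training distribution q by their total-variation distance:

```latex
\bigl|\,\mathbb{E}_{\theta \sim p}[\ell(\theta)] - \mathbb{E}_{\theta \sim q}[\ell(\theta)]\,\bigr|
  \;\le\; 2\,\|\ell\|_{\infty}\,\mathrm{TV}(p,q),
\qquad
\mathrm{TV}(p,q) \;=\; \tfrac{1}{2}\int \bigl|p(\theta) - q(\theta)\bigr|\,d\theta .
```

Because the mismatch term can be estimated from samples, a bound of this form doubles as the diagnostic the paper describes: when the estimated mismatch grows, retraining or reweighting is warranted.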
Neural estimators are less interpretable than closed-form or equilibrium-based estimators, which matters for policy applications and audits.
Conceptual claim/caveat: reasoning about model interpretability and regulatory transparency; not an empirical measurement in the summary.
high negative ForwardFlow: Simulation only statistical inference using dee... interpretability / transparency (qualitative)
Estimator performance depends on the fidelity of the simulation model to real data; misspecified simulation-generating processes can yield misleading estimates.
Methodological caveat: conceptual argument and standard concern about simulation-based inference; no specific empirical counterexamples provided in the summary, but stated as an important limitation.
high negative ForwardFlow: Simulation only statistical inference using dee... external validity / susceptibility to model misspecification (qualitative claim ...
MSE-trained point-estimator networks do not directly provide calibrated interval estimates or valid standard errors; integrating conditional density estimators or bootstrap-calibration is needed for uncertainty quantification.
Methodological caveat: logical/statistical argument and recommendation based on the fact that training with MSE produces point estimates; no empirical demonstration in the summary, but the limitation follows from standard statistical principles.
high negative ForwardFlow: Simulation only statistical inference using dee... availability of calibrated uncertainty quantification (absence of calibrated int...
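One of the recommended remedies, bootstrap calibration of a point estimator, can be sketched as follows. The Gaussian-mean setup is a toy stand-in for an MSE-trained network, not the paper's model.

```python
import random

def parametric_bootstrap_se(data, estimator, simulate, n_boot=1000, seed=0):
    """Parametric bootstrap: refit the estimator on data simulated under
    the point estimate; the spread of the refits estimates the SE."""
    rng = random.Random(seed)
    theta_hat = estimator(data)
    reps = [estimator(simulate(theta_hat, len(data), rng)) for _ in range(n_boot)]
    m = sum(reps) / len(reps)
    se = (sum((r - m) ** 2 for r in reps) / (len(reps) - 1)) ** 0.5
    return theta_hat, se

def sample_mean(xs):
    return sum(xs) / len(xs)

def simulate_norm(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]

# Toy problem: estimate the mean of Gaussian draws with known unit noise;
# the true parameter is 2.0, so the true SE is 1/sqrt(100) = 0.1.
rng0 = random.Random(1)
data = [rng0.gauss(2.0, 1.0) for _ in range(100)]
theta, se = parametric_bootstrap_se(data, sample_mean, simulate_norm)
print(f"estimate = {theta:.2f} +/- {1.96 * se:.2f}")
```

The same wrapper applies to a trained point-estimator network by substituting the network for `sample_mean` and the simulator for `simulate_norm`.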
Basic/minimal BSBM architectures (without ancilla modes or generalized postprocessing) are not universal generative models.
Analytical proof/argument in the paper demonstrating non-universality of the minimal BSBM architecture; theoretical reasoning about expressive limitations of the plain model family (no empirical sample size).
high negative Universality of Classically Trainable, Quantum-Deployed Boso... generative universality / expressive power (failure of universality)
Current bottlenecks are disparate quantum and classical resources operating in isolation, causing manual job orchestration, inefficient scheduling, data-movement overheads, and slow iteration that limit productivity and algorithmic exploration.
Use-case-driven analysis and observations from early hybrid deployments and literature; systems design decomposition highlighting latency and data-staging requirements; no quantitative benchmark data.
high negative Reference Architecture of a Quantum-Centric Supercomputer developer/researcher productivity, iteration latency, scheduling and data-transf...
If the value of deployment is the time-average reward realized by a single agent, optimizing the usual expected-value objective can lead to poor real-world outcomes.
Reasoning plus the paper's illustrative example demonstrating policies with high expected reward but poor or highly variable realized time-average outcomes; theoretical exposition, no empirical dataset.
high negative Ergodicity in reinforcement learning realized long-run (time-average) reward of deployed agent
Optimizing the expected cumulative reward (ensemble average across trajectories) can be misleading when reward-generating dynamics are non-ergodic because the ensemble expectation does not generally equal the time-average experienced by a single deployed agent.
Theoretical argumentation and a constructive illustrative example in the paper showing divergence between ensemble expectation and single-trajectory time-average; no empirical sample; analysis-based evidence.
high negative Ergodicity in reinforcement learning expected cumulative reward (ensemble expectation) vs. time-average realized rewa...
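The divergence can be reproduced with a textbook multiplicative-reward example (the numbers below are illustrative, not taken from the paper):

```python
# Each step multiplies the agent's wealth by 1.5 or 0.6 with equal
# probability (hypothetical factors).
up, down = 1.5, 0.6

# Ensemble average: expected one-step growth factor across trajectories.
ensemble_growth = 0.5 * up + 0.5 * down   # > 1, so the objective looks good

# Time average: a single long trajectory compounds the factors, so its
# per-step growth is their geometric mean.
time_avg_growth = (up * down) ** 0.5      # < 1, so one deployed agent decays

print(ensemble_growth, time_avg_growth)
```

The expected-value objective sees growth of about 5% per step, yet almost every individual trajectory shrinks over time, which is exactly the non-ergodic gap the paper constructs.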
A small linear spatial disadvantage requires an exponentially larger population to obtain the same probability of early discovery (scaling relation).
Analytic scaling result derived from extreme-value analysis of first-passage times in the model, with confirmation by numerical simulations (stochastic realizations; number of runs not specified). The result is internal to the theoretical model.
high negative Macroscopic Dominance from Microscopic Extremes: Symmetry Br... population size required to match probability of early discovery (or probability...
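The mechanism can be sketched with standard extreme-value reasoning, here assuming an exponential left tail for the first-passage times (the paper's exact model may differ). The earliest discovery among N independent searchers satisfies

```latex
P\Bigl(\min_{i \le N} T_i \le t\Bigr)
  = 1 - \bigl(1 - F(t)\bigr)^{N}
  \approx 1 - e^{-N F(t)},
\qquad
F(t) \sim C\, e^{(t-\mu)/\beta} \quad (t \ll \mu).
```

A linear disadvantage that shifts the location from μ to μ + δ multiplies F(t) by e^{-δ/β}, so holding the early-discovery probability fixed requires inflating the population to N' = N e^{δ/β}: exponential in the linear gap δ.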
Standard RLHF expected-cost constraints ignore distributional shape and can fail under heavy tails or rare catastrophic events.
Analytic/motivating argument presented in the paper contrasting expectation-based constraints with distributional behavior; illustrative examples and discussion of heavy-tailed and rare-event failure modes (no sample-size or dataset details provided in the summary).
high negative Safe RLHF Beyond Expectation: Stochastic Dominance for Unive... safety cost distribution properties (tail probability of high-cost/unsafe rollou...
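The failure mode is easy to exhibit with two hypothetical cost distributions that an expectation-only constraint cannot distinguish:

```python
# Two safety-cost distributions as (cost, probability) pairs with equal
# expected cost but very different tails (illustrative numbers).
routine = [(1.0, 0.5), (3.0, 0.5)]        # bounded, no extreme outcomes
heavy   = [(0.0, 0.99), (200.0, 0.01)]    # rare catastrophic outcome

def mean(dist):
    return sum(v * p for v, p in dist)

def tail(dist, c):
    """P(cost > c): the quantity an expected-cost constraint ignores."""
    return sum(p for v, p in dist if v > c)

print(mean(routine), mean(heavy))            # expectations match
print(tail(routine, 10.0), tail(heavy, 10.0))  # tail risks do not
```

Both distributions satisfy the same expected-cost budget, yet only the second carries a 1% chance of a catastrophic rollout, which motivates the paper's move to distribution-aware (stochastic-dominance) constraints.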
Improving explainability can trade off with predictive performance, privacy, and robustness; these trade-offs must be managed rather than ignored.
Review aggregates technical literature and conceptual analyses documenting trade-offs reported by researchers (e.g., simpler interpretable models sometimes having lower predictive accuracy; disclosure risks to privacy; robustness concerns). No single causal estimate provided.
high negative Explainable AI in High-Stakes Domains: Improving Trust, Tran... predictive performance, privacy risk, model robustness
The evidence base presented is limited to a single SME pilot, so generalizability across sectors, firm sizes, and data regimes is untested and requires further research.
Explicit limitation noted in the paper and the fact that the pilot illustrated is a single case study (sample size = 1 SME pilot).
high negative ALGORITHM FOR IMPLEMENTING AI IN THE MANAGEMENT LOOP OF SMES... external validity / generalizability of results beyond the single pilot
Tasks that are routine, repetitive, or pattern‑based (e.g., boilerplate coding, refactoring, unit test generation, some accessibility fixes) will be increasingly automated by AI.
Task‑level decomposition and examples of current automation capabilities (code generation, test suggestion tools); conceptual projection rather than empirical measurement.
high negative How AI Will Transform the Daily Life of a Techie within 5 Ye... rate of automation for routine software development tasks (proportion of such ta...
Common barriers to effective RM implementation include siloed functions/weak coordination, limited resources or expertise, poor data quality/lack of metrics, and cultural resistance driven by short-term incentives.
Frequent identification of these barriers across the reviewed literature and practitioner sources synthesized via thematic analysis over the last ten years.
high negative The Role of Risk Management as an Organizational Management ... barriers to RM adoption/implementation; likelihood of successful RM