Evidence (11633 claims)
Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
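The direction shares implied by the matrix can be computed directly from the counts. The sketch below is a minimal Python illustration using three rows copied from the table; the helper names `direction_shares` and `unlisted_claims` are hypothetical. Note that the Total column is not always the sum of the four listed directions (for "Other", 609 + 159 + 77 + 736 = 1581 against a reported total of 1615), so the sketch divides by the sum of the listed counts. That normalization is an assumption, not something the table specifies.

```python
# Rows copied from the Evidence Matrix:
# (positive, negative, mixed, null, reported_total)
ROWS = {
    "Other": (609, 159, 77, 736, 1615),
    "Job Displacement": (11, 71, 16, 1, 99),
    "Task Completion Time": (134, 18, 6, 5, 163),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the listed counts (assumed denominator)."""
    listed_total = sum(counts)
    return {name: count / listed_total for name, count in zip(DIRECTIONS, counts)}


def unlisted_claims(row):
    """Reported total minus the sum of the four listed directions."""
    *counts, reported_total = row
    return reported_total - sum(counts)


for outcome, row in ROWS.items():
    shares = direction_shares(row[:4])
    summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{outcome}: {summary} (unlisted: {unlisted_claims(row)})")
```

Dividing by the reported total instead would slightly lower every share in rows where some claims fall outside the four listed directions.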
Automation functions as a transnational shock that contracts demand for migrant labor in advanced economies.
Theoretical argument drawing on economic geography, labor economics, and development studies; comparative/regional field evidence referenced in the paper (no numerical sample size reported).
In algorithm-triggered emotional escalations, workers showed lower engagement: they sent fewer messages, contributed a smaller share of total chat rounds, and showed less proactivity in information seeking and solution provision.
Behavioral measures derived from chat logs in the randomized experiment comparing worker actions post-escalation across escalation types; reported differences in message counts, share of rounds, and proxies for proactivity.
Human intervention is less effective in algorithm-triggered emotional escalations (where customers express frustration or dissatisfaction).
Experimental subgroup analysis comparing intervention outcomes for algorithm-triggered emotional escalations versus technical escalations; emotional escalations showed worse post-intervention outcomes.
AI deployment substantially lowers customer ratings for AI-eligible chats.
Randomized field experiment measuring customer ratings for AI-eligible chats; treated condition (AI + human oversight) produced substantially lower ratings relative to control (humans only).
AI deployment reduces average chat duration.
Randomized field experiment on Alibaba's Taobao platform: workers in treatment supervised an agentic AI resolving AI-eligible chats while handling AI-ineligible chats; control workers resolved all chats without AI. Effect observed on average chat duration in experiment data.
Rather than restoring stability, the AI-induced reskilling loop intensifies anxiety, undermines mastery, and erodes professional confidence.
Theoretical claim about psychological outcomes from the conceptual reskilling loop; paper provides argumentation but no empirical measurements.
Based on Job Demands–Resources (JD-R) theory and Conservation of Resources (COR) theory, the paper conceptualizes an AI-induced reskilling loop in which ongoing technological change leads to skill erosion, continuous reskilling demands, cognitive and emotional depletion, and reinforced learning as a defensive response to perceived obsolescence.
Theoretical model/loop derived from applying JD-R and COR frameworks; no empirical test or sample reported in the paper.
The paper introduces the concept of 'reskilling fatigue' to explain the human consequences of persistent skill volatility among Established Knowledge Professionals (EKPs).
Conceptual/theoretical contribution presented by the authors; definition and argumentation rather than empirical validation.
Continuous reskilling is widely promoted as a solution to AI-driven disruption, but little attention has been paid to its cumulative psychological costs.
Argument from literature review/observation in the paper; no empirical measurement or sample reported in the paper.
Unless labour law evolves to address digitally mediated control and platform-based asymmetry, the gig economy risks normalising exploitative labour conditions under the guise of innovation and flexibility.
Predictive/theoretical claim based on the paper's synthesis of platform practices, legal gaps, and normative concerns; argued through comparative analysis and conceptual reasoning rather than quantitative forecasting.
The paper uses the concept of 'digital slavery' as a normative framework to describe labour conditions shaped by coercive algorithmic management, absence of bargaining power, and structural precarity.
Conceptual and normative framing within the paper, using the 'digital slavery' metaphor to interpret observed platform labour practices and their implications; theoretical argumentation rather than empirical measurement.
While several jurisdictions (UK, US, EU, India) have attempted to regulate gig work, most regulatory responses remain incomplete and fail to fully address platform accountability.
Comparative policy/regulatory analysis of the United Kingdom, United States, European Union and India assessing statutes, litigation and policy measures; qualitative assessment rather than statistical evaluation (no quantitative sample size reported).
Platform companies rely on contractual misclassification, corporate structuring, and the legal fiction of neutrality to separate control from liability.
Legal and corporate-structure analysis across jurisdictions, examining contracts, corporate forms and legal doctrines; based on comparative statutory and case-law review (no quantitative sample size reported).
The platform economy produces a deeply unequal labour structure marked by algorithmic control, economic dependency, surveillance, and lack of social protection.
Synthesis and critical analysis combining literature, policy review and comparative jurisdictional study to argue systemic effects on labour structure; primarily qualitative evidence and theoretical framing (no quantitative sample size reported).
Gig workers, though formally classified as independent contractors, are functionally subjected to pricing control, performance monitoring, automated penalties, and deactivation mechanisms that closely resemble managerial authority.
Descriptive/qualitative evidence in the paper: examples and analysis of platform design and management practices (algorithmic pricing, monitoring, penalties, deactivation); based on platform policy documents, case examples and comparative review (no quantitative sample size reported).
Digital labour platforms exercise employer-like control while avoiding employer-like legal responsibilities.
Argument and comparative legal analysis across jurisdictions (United Kingdom, United States, European Union, India) demonstrating platform practices and legal/regulatory responses; based on documentary/legal review and critical analysis (no quantitative sample size reported).
Shifts in AI behavior from desirable to undesirable persist even in the newest AI models, despite remarkable progress in AI modeling, post-training alignment, and safeguards.
Asserted in paper; supported by later empirical validation across multiple models and production chatbots (see other claims), but no explicit sample size in this sentence.
ChatGPT-like AI behavior can shift, unnoticed, from desirable to undesirable (e.g., encouraging self-harm, extremist acts, financial losses, or costly medical and military mistakes), and no one can yet predict when.
Statement in paper framing the problem; qualitative observations and motivating examples (no numeric sample size provided in the excerpt).
The characteristics that make Metis tasks resist automation are properties of the tasks themselves rather than limitations of current AI models.
Conceptual argument in the paper asserting task-inherent properties drive resistance to automation; supported by theory and argumentation, not by empirical model-comparison experiments.
The resistance of Metis tasks to automation is not due to computational intractability but to institutional, social, and normative entanglements.
Theoretical argument differentiating computational from institutional/social/normative causes; supported by citations and cross-disciplinary theory rather than empirical causal identification.
There exists a class of entirely digital tasks, called 'Metis AI', that resist reliable AI automation.
Conceptual identification and definition introduced by the authors; supported by theoretical grounding in social sciences, philosophy, and humanitarian practice rather than empirical trials or quantified samples.
Framing the limits of automation as a digital-versus-physical divide misses the most consequential boundary: the one within digital tasks.
Normative/theoretical argument presented in the paper contrasting existing framing with a proposed alternative; grounded in cross-disciplinary literature rather than empirical measurement.
Severe penalties in underfunded Eastern European systems, mediated by financial distress, drive families toward resource exhaustion.
Cross-country comparisons in SHARE-derived analyses showing larger financial penalties in underfunded Eastern European systems, with mediation analysis implicating financial distress and resultant resource exhaustion.
Financial distress acts as a profound multiplier of the burdens associated with palliative care.
Interaction/moderation analyses in SHARE-derived synthetic data showing that pre-existing financial distress amplifies financial and caregiving burdens under palliative care (PC).
Socio-demographics heavily modulate exposure: lacking a spousal net inflates the burden.
Subgroup/moderation analyses in SHARE-derived data comparing households with and without spousal support, showing higher burdens when no spouse is present.
Non-cancer trajectories drive massive structural penalties that escalate at the distribution's tail, mechanically compounded by physical dependency.
Stratified analyses by disease trajectory (non-cancer vs cancer) using SHARE data (2016-2021) and quantile models showing larger penalties for non-cancer cases, especially in tail quantiles; physical dependency identified as a compounding factor.
Quantile treatment models expose a 'broken shield' for vulnerable households and severe tail events (PC protection fails or reverses at distributional tails).
Application of quantile treatment effect models to synthesized SHARE-derived digital twins (2016-2021), explicitly examining distributional/tail effects.
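A quantile treatment effect compares treated and untreated outcome distributions at a chosen quantile rather than at the mean, which is how a protective effect at the median can coexist with a reversal in the tail. The sketch below is a minimal illustration using an unconditional difference of empirical quantiles. It is not the paper's model: the cost figures are invented for illustration, and the names `empirical_quantile` and `quantile_treatment_effect` are hypothetical.

```python
def empirical_quantile(values, q):
    """Linear-interpolation quantile of a non-empty sample, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def quantile_treatment_effect(treated, control, q):
    """Difference between treated and control outcomes at quantile q."""
    return empirical_quantile(treated, q) - empirical_quantile(control, q)


# Invented household cost figures (arbitrary units), for illustration only.
control = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Hypothetical treated group: lower costs for most households, far higher in the tail.
treated = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 20, 40]

for q in (0.25, 0.5, 0.9):
    effect = quantile_treatment_effect(treated, control, q)
    print(f"q={q}: treated minus control = {effect:+.2f}")
```

With these invented numbers the effect is negative at the median and positive at the 0.9 quantile, a sign flip that a comparison of means would hide.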
Parsing through LLM-generated code can be tedious and time-consuming, potentially negating the productivity gains promised by AI-coding tools.
Motivation/background statement in the paper: a qualitative claim about the cost (time/effort) of reviewing LLM-generated code; presented as motivation rather than empirically quantified evidence in the excerpt.
Employees experience technostress, anxiety and micro-political negotiation around AI tools in everyday work.
Reported experiences from semistructured interviews with 28 managers/professionals across 12 organizations; thematic analysis highlighting technostress and anxiety as themes.
An analysis of a 21-instrument inventory identifies an incentive gradient where geopolitical and industrial pressures systematically reward surface-level behavioral proxies over deep structural verification.
Empirical/qualitative analysis of an inventory of 21 governance instruments compiled and analysed in the paper (n=21 instruments).
Behavioural assurance, even when carefully designed, is being asked to carry safety claims it cannot verify.
The paper's normative and conceptual argument synthesising governance requirements and the epistemic limits of behavioural testing.
Current assurance methodologies (primarily behavioural evaluations and red-teaming) are epistemically limited to observable model outputs and cannot verify latent representations or long-horizon agentic behaviours.
Conceptual/analytic argument and review of existing assurance methodologies presented in the paper.
Overthinking is a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.
Conclusion drawn by authors based on their empirical findings described in the abstract (amplification of output length across multiple models and transferability experiments).
This overthinking behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS)-style resource exhaustion.
Authors assert increased latency and energy consumption as consequences of longer reasoning traces; framed as a potential attack vector in the abstract (no quantitative latency/energy measurements provided in abstract).
Large reasoning models (LRMs) exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces when confronted with incomplete or logically inconsistent inputs.
Empirical observation reported by the authors based on experiments described in the paper (abstract references experiments across multiple SOTA reasoning models); no numerical sample size for inputs reported in abstract.
Distinct readability issue patterns and limited effectiveness of prompt engineering reveal a latent technical debt in LLM-generated code that could affect long-term maintainability.
Interpretation/conclusion in paper combining empirical findings (distinct issue patterns and limited prompt impact) to argue for potential technical debt and maintainability risks; presented as a forward-looking implication rather than a quantified causal estimate.
LLM-generated code displays distinct readability issue patterns compared to human-written code.
Empirical analysis of readability subcomponents/features showing different patterns of readability issues between LLM-generated and human-written code (paper reports qualitative/quantitative distinctions in issue patterns).
Policy responses in Europe are fragmented across the EU and Member State levels and do not match the potential scale of disruption from AGI.
Paper's policy analysis of EU- and Member-State-level responses (stated in abstract); no quantitative metrics provided in the abstract.
Europe has low rates of industrial AI adoption.
Paper's empirical/policy review claiming low industrial AI adoption in Europe (as stated in abstract); the abstract does not provide numeric adoption rates or sample sizes.
Europe exhibits structural weaknesses in compute infrastructure and talent retention.
Paper's structural assessment of Europe's AI value-chain capabilities (stated in abstract); no numerical measures provided in the abstract.
Europe has limited strategic awareness of frontier AI progress.
Paper's assessment of Europe's positioning based on policy analysis and review of capabilities monitoring (as stated in abstract); no supporting metrics or sample sizes provided in the abstract.
AGI could strain existing governance frameworks.
Paper's policy analysis describing potential mismatches between governance capacity and AGI-induced disruptions (as stated in abstract); no empirical tests or quantification reported in the abstract.
AGI could intensify interstate competition.
Paper's geopolitical analysis and scenario-based reasoning informed by trends in AI capabilities (stated in abstract); no quantitative measures reported in the abstract.
AGI could fundamentally alter the global distribution of economic and military power.
Paper's geopolitical analysis drawing on capability trends and scenario reasoning (as stated in abstract); no empirical quantification provided in the abstract.
Increased levels of AI assistance may degrade productivity, leading to potentially significant shortfalls under the model's identified conditions.
Model-based comparative-statics and steady-state analysis showing scenarios where marginal increases in AI assistance reduce expected task output; examples/parameter illustrations provided in the paper (theoretical, no empirical sample).
Introducing AI unreliability (errors/noise in AI outputs) in the model can also generate a productivity paradox: greater AI assistance may lower productivity.
Analytical/theoretical model incorporating AI unreliability; model derivations and examples demonstrating conditions under which unreliability leads to reduced productivity (no empirical data).
Incorporating endogeneity in skill development into the model can induce a productivity paradox where increased AI assistance reduces productivity.
Analytical/theoretical model of human-AI interaction with utility-maximizing human agents and endogenous skill development; steady-state and comparative-static analysis reported in the paper (no empirical sample).
Simulated users produce feedback dynamics that diverge from humans.
Temporal/interaction analysis in the replication showing differences in how simulators provide feedback across multi-turn interactions compared to humans.
Simulated users exhibit amplified position biases relative to human participants.
Behavioral comparison in the simulator replication showing stronger position biases in simulated responses than in human responses.
Simulated users discuss different topics compared to the human participants.
Analysis of conversation content in the simulator replication showing differences in topical distribution between simulators and humans.