Evidence (7953 claims)
Claim counts by topic area. Topic counts sum to well over the 7953-claim total, indicating that a claim can be tagged under multiple topics.

| Topic | Claims |
|---|---|
| Adoption | 5539 |
| Productivity | 4793 |
| Governance | 4333 |
| Human-AI Collaboration | 3326 |
| Labor Markets | 2657 |
| Innovation | 2510 |
| Org Design | 2469 |
| Skills & Training | 2017 |
| Inequality | 1378 |
Evidence Matrix
Claim counts by outcome category and direction of finding. For several outcomes the row total exceeds the sum of the four direction columns shown, which suggests the totals also include claims whose direction was not classified into these four categories.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
The paper calls for subsequent quantitative validation (using task-based, matched employer-employee, and provider-level panel data) to estimate causal impacts on productivity, health outcomes, wages, and employment composition across the three interaction levels.
Stated research agenda and measurement recommendations in the paper's discussion section.
The study is qualitative and small-sample (four cases), and therefore interpretive and illustrative rather than statistically generalizable.
Explicit methodological statement in the paper: design = qualitative multiple case study, sample = four AI healthcare applications.
The study identifies a three-level taxonomy of human–AI interaction in healthcare: AI-assisted, AI-augmented, and AI-automated.
Conceptual taxonomy derived from multiple qualitative case studies (n=4) using cross-case comparison and the three-dimensional service-innovation framework of Bolton et al. (2018).
Few longitudinal or randomized studies were found, which limits the evidence base for causal claims about digital transformation's effect on productivity.
Review recorded a limited number of longitudinal analyses and quasi-experimental designs among the 145 studies; randomized studies were scarce or absent.
Measurement approaches are heterogeneous across studies, including self-reported productivity, output-per-worker metrics, and process efficiency indicators.
Extraction of productivity indicators from included studies (detailed in Methods/Extraction fields) showed multiple distinct measurement approaches.
There is a lack of standardized instruments and inconsistent controls for confounding factors across studies, limiting causal inference about the effect of digital transformation on productivity.
Review extraction documented varied instruments/measures and inconsistent adjustment for confounders across the included studies; few randomized or robust longitudinal designs were found.
Heterogeneous definitions of 'digital transformation' and a variety of productivity measurement approaches prevented a formal quantitative meta-analysis.
Extraction found wide variation in how digital transformation and productivity were defined and measured across the 145 studies (self-reported productivity, output per worker, process efficiency metrics, etc.), leading authors to forgo meta-analysis.
535 records were identified across Scopus, Web of Science, ScienceDirect, IEEE Xplore, and Google Scholar, of which 145 met PRISMA 2020 inclusion criteria.
Search and screening procedure documented in the review: initial database searches yielded 535 records → duplicates removed → screening → full-text evaluation → 145 included studies.
Non-probability sampling and self-reported measures limit claims about prevalence and causality; cross-sectional design cannot capture dynamics of skill acquisition over time.
Study limitations explicitly reported by authors: non-probability sampling, self-reported measures, and cross-sectional design.
There are few large-scale randomized controlled trials (RCTs) showing direct patient outcome improvements from GenAI CDS; high-quality real-world and longitudinal studies are limited but essential.
Evidence-maturity statement in the paper summarizing the literature; the paper explicitly notes scarcity of large RCTs and longitudinal evaluations.
The paper is primarily conceptual/theoretical and literature-based rather than an empirical case study or large-scale data experiment; it emphasizes the need for future empirical validation.
Explicit methodological description within the paper stating reliance on literature review and conceptual development; absence of empirical sample or case study.
Randomized or quasi-experimental evaluations of digital-ID rollouts, subsidy programs for fintech adoption, or sandboxed regulatory innovations can identify causal impacts on inclusion and growth.
Methodological recommendation proposing experimental and quasi-experimental designs to obtain causal inference; no implementation results reported in the paper summary.
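By way of illustration, a pilot of such a randomized evaluation would typically begin with a power calculation; a minimal sketch in Python (statsmodels), where the effect size, significance level, and power target are assumptions, not figures from the paper:

```python
# Sketch: sample size needed per arm for an RCT on financial inclusion.
# Effect size, alpha, and power are illustrative assumptions.
import math
from statsmodels.stats.power import TTestIndPower

n_per_arm = TTestIndPower().solve_power(effect_size=0.2,  # small assumed effect
                                        alpha=0.05, power=0.8)
print(math.ceil(n_per_arm))   # ~394 participants per arm
```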
AI economists should prioritize measuring how AI-driven services affect access, default rates, transaction costs, and market structure, disaggregated across income groups and regions.
Methodological recommendation in the 'Implications for AI Economics' section; suggested measurement priorities rather than an empirical finding.
There is a need for economic analysis of data governance regimes, model transparency requirements, algorithmic auditability, and incentives for responsible AI adoption in finance.
Methodological and policy recommendation based on identified gaps in the literature and regulatory practice; this is a stated research/policy need in the paper rather than an empirical claim requiring sample evidence.
Typical evaluation metrics reported are accuracy, precision, recall, F1-score, AUC, detection rate, false positive rate, latency, and computational cost.
Survey of evaluation practices in reviewed papers listing the metrics authors commonly report.
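To make the listed metrics concrete, here is a minimal sketch (Python, scikit-learn) computing them for a hypothetical binary IDS classifier; the labels and scores are made up, and latency/computational cost would be measured separately (e.g., by timing inference):

```python
# Illustrative only: the metrics above for a toy binary detector (1 = attack).
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])            # ground-truth labels
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])            # hard predictions
y_score = np.array([.1, .6, .8, .9, .2, .4, .3, .7])   # attack probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))   # a.k.a. detection rate
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))
print("FPR      :", fp / (fp + tn))                 # false positive rate
```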
Emerging approaches in the literature include federated learning, online/streaming learning, and transfer learning for cross-device generalization.
Trend analysis across recent papers indicating adoption of federated and continual learning paradigms and transfer-learning techniques.
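As a sketch of the federated paradigm's core aggregation step, the following toy FedAvg routine averages client parameter vectors weighted by local sample counts; the client weights and sizes are invented for illustration:

```python
# Minimal FedAvg sketch: clients train locally; the server averages their
# parameter vectors weighted by client sample counts.
import numpy as np

def fed_avg(client_weights, client_sizes):
    sizes = np.asarray(client_sizes, dtype=float)
    shares = sizes / sizes.sum()
    return sum(s * w for s, w in zip(shares, client_weights))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 50, 50]            # local training-set sizes (toy values)
print(fed_avg(clients, sizes))   # weighted average of client parameters
```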
Unsupervised and semi-supervised methods (clustering, one-class classifiers, autoencoder-based anomaly detectors) are commonly employed to handle unlabeled/anomalous IoT traffic.
Synthesis of studies using anomaly-detection paradigms and unsupervised techniques reported in the reviewed papers.
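A minimal sketch of this unsupervised setting, using an Isolation Forest as a stand-in for the one-class and autoencoder detectors the studies report (synthetic features, not real IoT traffic):

```python
# Sketch: flag statistical outliers in unlabeled traffic features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))  # benign-like features
anomal = rng.normal(loc=4.0, scale=1.0, size=(10, 4))   # injected anomalies
X = np.vstack([normal, anomal])

detector = IsolationForest(contamination=0.02, random_state=0).fit(X)
flags = detector.predict(X)      # -1 = anomaly, +1 = normal
print("flagged as anomalous:", int((flags == -1).sum()))
```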
Deep learning approaches used include CNNs, RNNs/LSTMs for sequence/traffic analysis, and autoencoders for anomaly detection.
Surveyed literature and taxonomy noting multiple studies that apply convolutional and recurrent architectures and autoencoders to network/traffic data.
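For the autoencoder variant specifically, a minimal PyTorch sketch: train to reconstruct (assumed) benign traffic features, then treat high reconstruction error as an anomaly signal. Dimensions and data are placeholders:

```python
# Toy autoencoder for anomaly detection on traffic feature vectors.
import torch
import torch.nn as nn

class TrafficAE(nn.Module):
    def __init__(self, n_features: int = 16, bottleneck: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 8), nn.ReLU(),
                                     nn.Linear(8, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 8), nn.ReLU(),
                                     nn.Linear(8, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TrafficAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
benign = torch.randn(256, 16)        # placeholder benign feature vectors

for _ in range(100):                 # reconstruction training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(benign), benign)
    loss.backward()
    opt.step()

with torch.no_grad():                # per-sample error = anomaly score
    score = ((model(benign) - benign) ** 2).mean(dim=1)
```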
Common ML approaches reported for IoT IDS include supervised models (random forest, SVM, gradient boosting, neural networks).
Taxonomy and literature synthesis showing frequent use of classical supervised classifiers in surveyed papers and experiments.
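A sketch of the common supervised setup, with a random forest on synthetic stand-in features (real studies would use labeled IoT traffic datasets):

```python
# Sketch: supervised IDS classification with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))              # stand-in flow/packet features
y = (X[:, 0] + X[:, 3] > 1).astype(int)     # toy attack label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```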
Empirical research suggestion: recommended outcome variables for future empirical work include productivity (TFP), profitability, exports, employment composition, and process innovation rates; explanatory variables include AI adoption intensity, strategic alignment indices, leadership commitment surveys, sensing activities, and institutional support measures.
Explicit research agenda and measurement suggestions provided in the paper based on the framework and gaps identified in the 72‑article review.
Scope & limits: the paper is a literature synthesis (no new primary empirical data), has a geographical emphasis on Ibero‑America, and covers literature up to 2024 (may omit post‑2024 developments).
Explicit limitations and scope noted in the paper (no primary data; regional emphasis; time window).
Methodological approach: the paper uses a structured narrative literature review following Torraco (2016) and Juntunen & Lehenkari (2021), analyzing a corpus of 72 articles from 2015–2024 via thematic synthesis and systematic coding.
Explicit methodological statement in the paper specifying approach, corpus size (72 articles), time window (2015–2024), and analytic techniques (thematic synthesis and coding).
The framework yields eight empirically testable propositions linking capability development to firm outcomes (the paper explicitly lists P1–P3 plus five further linked propositions).
Explicit claim in the reviewed paper: framework includes eight testable propositions; propositions are theoretical and untested empirically within the paper.
This work is a conceptual framework and design proposal synthesizing methods from recommender systems and human-robot interaction (HRI) rather than a report of novel empirical experiments.
Explicit statement in the Data & Methods section of the paper.
The review followed PRISMA guidelines and included 30 scholarly articles retrieved from Scopus, published between 2020 and 2025, selected using pre-specified inclusion criteria.
Methods section of the paper reporting the SLR protocol, database, time window, and number of included studies.
The study is primarily diagnostic and prescriptive rather than empirical: no explicit empirical dataset, causal identification strategy, or statistical estimation is reported.
Methods section of the paper explicitly characterizes the work as conceptual, systems-oriented, and not reporting empirical evaluation data.
The urban AI index is constructed via text-mining techniques to capture city-level AI capability/intensity.
Methodological description: authors report using text-mining to build a city-level AI capability/intensity index (details of sources and text-mining procedure not provided in the summary).
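Since the paper's text-mining procedure is not described in the summary, the following is purely illustrative of one common approach, keyword-frequency counting over city document corpora; the corpus, keywords, and normalization are assumptions:

```python
# Illustrative keyword-frequency index, NOT the paper's actual procedure.
from sklearn.feature_extraction.text import CountVectorizer

docs = {"CityA": "annual report mentions artificial intelligence and machine learning",
        "CityB": "annual report focuses on roads and housing"}
vec = CountVectorizer(vocabulary=["artificial intelligence", "machine learning"],
                      ngram_range=(1, 2))
counts = vec.fit_transform(docs.values()).sum(axis=1)
for city, c in zip(docs, counts.A1):
    print(city, int(c))   # raw AI-keyword counts; would be normalized by length
```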
The digital trade index is constructed using the entropy-TOPSIS method (multi-indicator aggregation).
Methodological description: digital trade index aggregation via entropy-TOPSIS reported by authors.
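A minimal sketch of one common entropy-TOPSIS variant (Python/NumPy); the indicator matrix is toy data, not the paper's, and all indicators are assumed benefit-type:

```python
# Entropy-weighted TOPSIS: dispersion-based weights, then closeness to ideal.
import numpy as np

X = np.array([[120., 0.8, 35.],   # rows: cities; cols: digital-trade indicators
              [ 80., 0.6, 50.],
              [200., 0.9, 20.]])

# 1. Entropy weights: indicators with more dispersion get more weight.
P = X / X.sum(axis=0)
E = -(P * np.log(P)).sum(axis=0) / np.log(X.shape[0])   # entropy per indicator
w = (1 - E) / (1 - E).sum()                             # entropy weights

# 2. TOPSIS: distances to ideal best/worst under vector normalization.
V = w * X / np.linalg.norm(X, axis=0)        # weighted normalized matrix
best, worst = V.max(axis=0), V.min(axis=0)   # benefit-type indicators assumed
d_best = np.linalg.norm(V - best, axis=1)
d_worst = np.linalg.norm(V - worst, axis=1)
score = d_worst / (d_best + d_worst)         # closeness coefficient in [0, 1]
print(score)                                 # higher = stronger digital trade
```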
The study's empirical identification relies on longitudinal variation with city fixed effects and time effects, plus non-linear/threshold identification via polynomial (DE^2) terms and threshold-regression using green-technology-innovation as the threshold variable.
Description of empirical strategy in the paper: panel fixed-effects models (controlling for time-invariant city heterogeneity and common time shocks), mediating-effect models for channel tests, and threshold-regression models for regime-dependent effects, applied to the 278-city 2011–2022 panel.
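A hedged sketch of the two-way fixed-effects portion of such a design using the linearmodels package; the variable names (de, dt) and the simulated panel are placeholders, and the threshold-regression step (e.g., Hansen-type estimation) is not shown:

```python
# Two-way FE panel regression with a squared term, on a simulated
# 278-city x 2011-2022 panel (placeholder data, not the paper's).
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product([range(278), range(2011, 2023)],
                                 names=["city", "year"])
df = pd.DataFrame({"de": rng.uniform(0, 1, len(idx))}, index=idx)
df["de2"] = df["de"] ** 2                     # DE^2 term for non-linearity
df["dt"] = 0.5 * df["de"] - 0.3 * df["de2"] + rng.normal(0, .1, len(df))

res = PanelOLS(df["dt"], df[["de", "de2"]],
               entity_effects=True, time_effects=True).fit(
                   cov_type="clustered", cluster_entity=True)
print(res.params)   # sign pattern on de/de2 indicates U or inverted-U shape
```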
Research recommendation: invest in longer-run, rigorous impact evaluations (RCTs, panel studies) and system-level assessments to capture spillovers and sustainability outcomes.
Authors' stated research agenda based on identified methodological gaps (limited long-term and system-level evidence) in the review.
There is variation in study design and quality in the evidence base (RCTs, quasi-experimental studies, observational case studies, pilots).
Methodological caveats noted by the authors summarizing the diversity of designs reported across reviewed studies.
The review used a structured literature review with thematic synthesis and a comparative effect-size analysis to quantify ranges for yield, cost, and efficiency outcomes.
Authors' description of review approach and analytical methods in the Data & Methods section.
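One standard ingredient of such a comparative effect-size analysis is a standardized mean difference; a minimal Cohen's d sketch with toy yield figures (not values from the review):

```python
# Cohen's d: difference in means scaled by the pooled standard deviation.
import numpy as np

def cohens_d(treat, control):
    nt, nc = len(treat), len(control)
    pooled_sd = np.sqrt(((nt - 1) * np.var(treat, ddof=1) +
                         (nc - 1) * np.var(control, ddof=1)) / (nt + nc - 2))
    return (np.mean(treat) - np.mean(control)) / pooled_sd

yield_ai = np.array([3.1, 3.4, 2.9, 3.6])    # t/ha with AI advisory (toy)
yield_ctrl = np.array([2.6, 2.8, 2.7, 3.0])  # t/ha baseline (toy)
print(round(cohens_d(yield_ai, yield_ctrl), 2))
```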
The evidence base reviewed comprises more than 60 peer-reviewed articles and institutional reports from 2020–2025, primarily focusing on Sub-Saharan Africa.
Statement in the paper's Data & Methods section describing the scope and composition of the review sample.
Effect sizes and impacts vary substantially across contexts—by crop, farm size, and institutional setting.
Comparative synthesis across studies showing heterogeneity in reported outcomes and authors' methodological caveats highlighting context dependence.
Technologies assessed in the review include predictive analytics, digital advisory systems, smart irrigation, pest/disease detection, and precision fertilization.
Descriptive synthesis of the types of AI and digital technologies evaluated across the >60 reviewed articles and reports (2020–2025).
These quantitative performance figures come from case‑level, high‑performer pilots and should not be treated as typical industry benchmarks.
Authors' caveat based on the composition of evidence in the review (skew towards pilots and selected advanced implementations; limited longitudinal/multi‑project empirical studies).
Inter‑rater reliability for the study selection/encoding was Cohen’s κ = 0.83 (substantial agreement).
Reported inter‑rater reliability statistic from the review's quality control step (Cohen's kappa = 0.83).
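For reference, Cohen's kappa corrects raw agreement for chance, κ = (p_o − p_e)/(1 − p_e); a toy two-screener example (not the review's data) lands near the reported value:

```python
# Kappa on made-up include/exclude decisions from two screeners.
from sklearn.metrics import cohen_kappa_score

rater_a = ["inc", "inc", "exc", "exc", "inc", "exc", "inc", "exc", "inc", "inc"]
rater_b = ["inc", "inc", "exc", "exc", "inc", "exc", "exc", "exc", "inc", "inc"]
print(cohen_kappa_score(rater_a, rater_b))   # 0.8 here; 1.0 = perfect agreement
```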
The review screened 463 Scopus records (2018–2026) and selected 160 peer‑reviewed studies using a PRISMA‑guided process.
Systematic literature review described in paper: Scopus search (2018–2026), PRISMA screening and eligibility filtering; initial n=463, final n=160.
The abstract does not report the study sample size, sectoral scope, or country/context, which limits assessment of external validity and generalizability.
Observation of reporting in the paper's abstract (absence of sample size, sectoral/country context information in the abstract as provided).
The study used a two-stage mixed-methods design: a qualitative exploratory phase to surface determinants of trust and inertia, followed by a quantitative phase to validate the conceptual framework.
Methods description in the paper: explicit two-stage mixed-methods approach (qualitative then quantitative) used to identify and test determinants of initial trust and inertia toward GAICS.
Kebumen UNESCO Global Geopark is used as a practical context to ground the framework; its ecological/cultural assets and emergent digital presence make it a suitable case for studying emerging destinations balancing innovation with authenticity.
Paper provides Kebumen Geopark as the illustrative case study/context for the conceptual framework; no systematic case-study data reported.
Operationalization suggestions: social proof via ratings, reviews, UGC volume and valence; behavioral proxies include bookings and inquiries as outcomes.
Paper explicitly lists social-proof indicators and behavioral proxies as part of recommended empirical approaches (digital-trace and platform data).
Operationalization suggestions: sustainability communication via message clarity, perceived authenticity, and specificity of eco-actions.
Operationalization guidance in the paper for measuring sustainability messaging in experiments/surveys.
Operationalization suggestions: AI personalization via perceived relevance, transparency, and perceived fairness of recommendations.
Operationalization guidance in the paper; proposed as latent construct indicators for future SEM or experiments.
Operationalization suggestions: digital experience quality via usability, information richness, responsiveness, multi-channel integration.
Operationalization guidance provided in the paper's methods suggestions; intended for future empirical measurement.
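To show how such indicators could enter the recommended SEM work, a hedged one-construct measurement-model sketch using the semopy package; the construct name, indicators, and data are illustrative placeholders, not the paper's instrument:

```python
# One-factor measurement model: three indicators loading on a latent construct.
import numpy as np
import pandas as pd
from semopy import Model

rng = np.random.default_rng(3)
latent = rng.normal(size=500)              # unobserved "AI personalization"
data = pd.DataFrame({
    "relevance":    latent + rng.normal(0, .5, 500),
    "transparency": latent + rng.normal(0, .5, 500),
    "fairness":     latent + rng.normal(0, .5, 500),
})

model = Model("ai_personalization =~ relevance + transparency + fairness")
model.fit(data)
print(model.inspect())   # factor loadings and variances
```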
Recommended empirical follow-ups include Structural Equation Modeling (SEM), experimental tests (lab/field/online), quasi-experimental causal-inference methods (DiD, IVs, RD), comparative/regional designs, and analysis of digital-trace/platform data (clickstreams, recommendation logs, bookings, UGC).
Methodological recommendations explicitly listed in the Data & Methods and Research Agenda sections of the paper; no primary empirical work conducted.
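Among the listed designs, DiD is straightforward to sketch: with synthetic booking data, the coefficient on the treated × post interaction is the causal estimate of interest. The variable names and data are assumptions, not from the paper:

```python
# Minimal difference-in-differences sketch with simulated booking outcomes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({"treated": rng.integers(0, 2, n),   # e.g. AI rollout group
                   "post": rng.integers(0, 2, n)})     # pre/post period
df["bookings"] = (10 + 2 * df["treated"] + 1 * df["post"]
                  + 3 * df["treated"] * df["post"] + rng.normal(0, 1, n))

m = smf.ols("bookings ~ treated * post", data=df).fit(cov_type="HC1")
print(m.params["treated:post"])   # DiD estimate of the treatment effect
```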
The framework produces ten testable propositions mapping hypothesized direct and mediated links among constructs and specifying contingencies for future empirical testing.
Explicit statement in the paper that the framework yields ten testable propositions; no empirical validation reported.
Experimental structure determination (X‑ray, NMR, cryo‑EM) remains the gold standard but is slow, costly, and low‑throughput.
Paper explicitly states experimental methods are 'gold standard' and characterizes them as slow, costly, low‑throughput; the PDB is cited as the source of structural ground truth.
The authors did not perform primary empirical validation or simulation of TVR‑Sec across real VR deployments.
Methods and limitations section explicitly state no original empirical experiments or simulations were conducted; analysis is conceptual and qualitative.
The paper's scope comprised a comparative literature review and conceptual integration of 31 peer‑reviewed studies published between 2023 and 2025.
Authors' methods description specifying sample size and publication window: 31 peer‑reviewed studies (2023–2025).