Evidence (7448 claims)

- Adoption: 5267 claims
- Productivity: 4560 claims
- Governance: 4137 claims
- Human-AI Collaboration: 3103 claims
- Labor Markets: 2506 claims
- Innovation: 2354 claims
- Org Design: 2340 claims
- Skills & Training: 1945 claims
- Inequality: 1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
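A matrix of this shape can be tabulated from a flat list of per-claim records. The sketch below is illustrative only: the field names (`outcome`, `direction`) and the pandas-based approach are assumptions, not the pipeline actually used to produce the table above.

```python
# Illustrative sketch: tabulating an evidence matrix from per-claim records.
# Field names ("outcome", "direction") and the pandas approach are assumptions,
# not the source's actual data pipeline.
import pandas as pd

claims = pd.DataFrame([
    {"outcome": "Firm Productivity", "direction": "Positive"},
    {"outcome": "Firm Productivity", "direction": "Mixed"},
    {"outcome": "Error Rate", "direction": "Negative"},
    # ... one row per extracted claim
])

matrix = (
    claims.pivot_table(index="outcome", columns="direction",
                       aggfunc="size", fill_value=0)
    .reindex(columns=["Positive", "Negative", "Mixed", "Null"], fill_value=0)
)
matrix["Total"] = matrix.sum(axis=1)
print(matrix.sort_values("Total", ascending=False))
```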
The v1.13 release includes a Go reference implementation of 22 packages covering all L1-L4 capabilities.
Repository statement describing a Go reference implementation comprising 22 packages, together with a coverage claim for L1-L4 capabilities.
The v1.13 specification comprises 36 technical documents organized into five conformance levels (L1-L5).
Explicit quantitative statement in the specification/repository describing document count and organization.
The experiment compared three prompt conditions: (A) simple prompts, (B) raw PPS JSON, and (C) natural-language-rendered PPS.
Method description of the three prompt conditions used in the controlled experiment.
The study used three specific LLMs: DeepSeek-V3, Qwen-Max, and Kimi.
Method section listing the three models evaluated in the experiment.
We ran a controlled three-condition study across 60 tasks in three domains (business, technical, and travel), three large language models (DeepSeek-V3, Qwen-Max, and Kimi), and three prompt conditions, collecting 540 AI-generated outputs evaluated by an LLM judge.
Authors report an experimental study design: 60 tasks × 3 models × 3 prompt conditions = 540 outputs, with outputs evaluated by an LLM judge (methodological description in the paper).
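The 540-output figure follows from fully crossing the three factors; a minimal sketch of the factorial design (task and condition identifiers below are placeholders, not the study's own labels):

```python
# Minimal sketch of the full factorial design: 60 tasks x 3 models x 3 prompt
# conditions = 540 generations, each evaluated once by an LLM judge.
# Task and condition identifiers are placeholders, not the study's labels.
from itertools import product

tasks = [f"task_{i:02d}" for i in range(60)]        # 60 tasks across 3 domains
models = ["DeepSeek-V3", "Qwen-Max", "Kimi"]
conditions = ["A_simple_prompt", "B_raw_PPS_json", "C_nl_rendered_PPS"]

runs = list(product(tasks, models, conditions))
assert len(runs) == 60 * 3 * 3 == 540
```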
The paper presents a formal evolutionary taxonomy of generative AI spanning five eras (1943–present) and analyzes frontier lab dynamics, sovereign AI emergence, and post-training alignment evolution from RLHF through GRPO.
Conceptual taxonomy and historical/organizational analysis provided in the paper. No empirical sample size reported in the excerpt.
The framework extends the Sustainability Index of Han et al. (2025) from hardware-level analysis to ecosystem-level analysis.
Conceptual / methodological extension claimed by the authors referencing Han et al. (2025). No empirical sample size reported in the excerpt.
Classical scaling laws model AI performance as monotonically improving with model size.
Statement about prior literature / modeling assumptions (classical scaling laws). No empirical sample size reported in the excerpt.
Existing financial question answering benchmarks primarily focus on company balance sheet data and rarely evaluate reasoning over how company stocks trade in the market or their interactions with fundamentals.
Literature/background claim made in the paper motivating the new benchmark; authors contrast prior benchmarks' focus on balance sheet data with the lack of market/trading-signal evaluation.
Retrieval provides limited benefit for trading-signal reasoning.
Experimental comparison reported in the paper showing that retrieval-augmentation had little impact on performance for trading-signal-focused questions.
To ensure reliability at scale, we adopt a calibration-then-scaling framework that combines expert seed questions, multi-model response generation, intra-model self-filtering, numerical auditing, and human-LLM judge alignment.
Methodological claim in the paper describing the QA and annotation pipeline; the paper reports using these components as part of their reliability framework.
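A hedged skeleton of how such a calibration-then-scaling pipeline could be composed; each helper below is a placeholder for a stage named in the claim, not the authors' implementation:

```python
# Hypothetical skeleton of a calibration-then-scaling QA pipeline.
# Every helper is a placeholder for a stage named in the claim above;
# none of this is the authors' actual code.
from typing import Callable

def calibrate_then_scale(
    seed_questions: list[str],
    generate_candidates: Callable[[str], list[dict]],   # multi-model response generation
    self_filter: Callable[[list[dict]], list[dict]],     # intra-model self-filtering
    audit_numbers: Callable[[dict], bool],               # numerical auditing
    judge_aligned: Callable[[dict], bool],               # human-LLM judge alignment check
) -> list[dict]:
    accepted = []
    for question in seed_questions:                      # expert seed questions
        candidates = self_filter(generate_candidates(question))
        accepted.extend(
            c for c in candidates if audit_numbers(c) and judge_aligned(c)
        )
    return accepted
```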
The benchmark is organized into three reasoning categories: fundamentals-focused, trading-signal-focused, and hybrid questions requiring cross-signal reasoning.
Direct description of the benchmark's taxonomy in the paper; the authors specify these three categories as the organizational structure for the 1,400 questions.
FinTradeBench contains 1,400 questions grounded in NASDAQ-100 companies over a ten-year historical window.
Statement in the paper describing the benchmark construction and scope; the paper reports the benchmark size (1,400 questions) and the dataset grounding (NASDAQ-100 over ten years).
The paper derives formal conditions under which the inversion (smaller, orchestrated models outperforming frontier models) holds.
Mathematical derivations and stated sufficient/necessary conditions presented in the paper.
We develop the Institutional Fitness Manifold, a mathematical framework that evaluates AI systems along four dimensions: capability, institutional trust, affordability, and sovereign compliance.
Theoretical/model development presented in the paper (formal definition of the manifold and its four dimensions).
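The excerpt names the four axes but not how they are combined; the sketch below is purely illustrative (the equal-weight linear scalarization is an assumption, not the paper's Institutional Fitness Manifold definition):

```python
# Illustrative only: a point in the four-dimensional space named in the claim
# (capability, institutional trust, affordability, sovereign compliance).
# The weighted-sum scalarization is an assumption for illustration.
from dataclasses import dataclass

@dataclass
class SystemProfile:
    capability: float             # each dimension normalized to [0, 1]
    institutional_trust: float
    affordability: float
    sovereign_compliance: float

def fitness(p: SystemProfile, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    dims = (p.capability, p.institutional_trust,
            p.affordability, p.sovereign_compliance)
    return sum(w * d for w, d in zip(weights, dims))

print(fitness(SystemProfile(0.9, 0.6, 0.4, 0.8)))
```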
There have been five eras of AI development since 1943, and within the current Generative AI Era there are four distinct epochs, each initiated by a discontinuous event.
Descriptive/historical classification within the paper (counts of eras and epochs; named initiating events such as the transformer and the 'DeepSeek Moment').
The study uses panel data for 30 Chinese provinces from 2013–2022 to measure urban circular economy efficiency (UCEE) with a Super-SBM model including undesirable outputs, track dynamics via the Global Malmquist–Luenberger index, and estimate spatial effects with a spatial Durbin model.
Methodological description in the abstract: explicit statement of data (30 provinces, 2013–2022) and the three methods used (Super-SBM with undesirable outputs, GML index, spatial Durbin model).
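For reference, the general form of a panel spatial Durbin model consistent with this description (the paper's exact covariates and spatial weights matrix are not given in the excerpt) is

$$
y_{it} = \rho \sum_{j} w_{ij}\, y_{jt} + \mathbf{x}_{it}'\boldsymbol{\beta} + \sum_{j} w_{ij}\, \mathbf{x}_{jt}'\boldsymbol{\theta} + \mu_i + \lambda_t + \varepsilon_{it},
$$

where $w_{ij}$ are spatial weights, $\rho$ is the spatial autoregressive coefficient, and $\boldsymbol{\theta}$ captures spillovers from neighboring provinces' covariates.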
Despite fears of mass unemployment, aggregate labor-market data through 2025 show limited labor-market disruption from generative AI.
Review of aggregate employment and labor-market studies and macro-level data through 2025 cited in the brief; methods include analyses of employment statistics and macro labor indicators (no single sample size reported).
We scored rule-breaking and abuse outcomes with an independent rubric-based judge across 28,112 transcript segments from multi-agent governance simulations.
Reported methodology: multi-agent governance simulations with agents in formal governmental roles, outcomes evaluated by an independent rubric-based judge; explicit sample count of 28,112 transcript segments.
We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines.
Empirical study design described in the paper: open competition with reported counts of teams and participants (29 teams, 80 participants); comparison between participant submissions and AI-only baselines.
AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking.
Descriptive dataset/benchmark specification in the paper stating task count and industry coverage.
Open research challenges that define the research agenda include scaling beyond benchmarks, achieving compositionality over changes, metrics for validating specifications, handling rich logics, and designing human-AI specification interactions.
Authors' explicit enumeration of open problems and a proposed multi-disciplinary research agenda; presented as expert opinion rather than empirical finding.
The interaction between selection and recourse generates a closed-loop dynamical system linking candidate selection and strategic recourse.
Formalization in the paper showing feedback dynamics between selection outcomes and candidate adjustments (modeling/result claim).
This setting produces endogenous selection, in which both the decision rule and the selection threshold are determined by the population's current feature state.
Derived implication of the framework and model dynamics described in the paper (theoretical consequence of the model).
The success benchmark evolves endogenously as many candidates adjust simultaneously.
Analytical property of the proposed model: simultaneous adjustments by candidates change the effective benchmark (theoretical result asserted by authors).
The study proposes a framework that models recourse as a strategic interaction among candidates under a risk-based selection rule.
The paper introduces a formal/modeling framework (methodological contribution described by the authors).
Actionable recourse studies whether individuals can modify feasible features to overturn unfavorable outcomes produced by AI-assisted decision-support systems.
Definition and framing stated by the authors in the paper's introduction/background (conceptual claim).
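A toy simulation of the kind of closed loop described above (one-dimensional feature, quantile-based selection, partial best-response adjustment); all functional forms and parameters are illustrative assumptions, not the paper's model:

```python
# Toy illustration of a selection-recourse feedback loop: rejected candidates
# move toward the current acceptance threshold, and the threshold is recomputed
# from the shifted population each round, so the benchmark evolves endogenously.
# Functional forms and parameters are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(0.0, 1.0, size=1_000)   # candidates' current feature/score
acceptance_rate = 0.2                        # selector admits the top 20%
effort = 0.3                                 # fraction of the gap closed per round

for round_ in range(5):
    threshold = np.quantile(scores, 1 - acceptance_rate)        # endogenous benchmark
    rejected = scores < threshold
    scores[rejected] += effort * (threshold - scores[rejected])  # strategic recourse
    print(f"round {round_}: threshold = {threshold:.3f}")
```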
A graph is constructed from the agents' reasoning embeddings and processed with a graph neural network (GNN); trading decisions are made using a PPO-DSR policy.
Method description: the paper reports embedding agents' reasoning, building a graph neural network (GNN) from those embeddings, and using a PPO-DSR reinforcement learning policy to trade. Specific GNN/PPO-DSR hyperparameters and architecture are not provided in the excerpt.
Four LLM agents output scores along with reasoning.
Method description: the paper states that four LLM agents produce numeric scores and associated textual reasoning. The number of agents is explicitly given as four; no further architecture or model-family details included in the excerpt.
BlindTrade blindfolds its agents by anonymizing all identifiers, including tickers and company names.
Methodological description in the paper: the system design explicitly replaces tickers and company names with anonymized identifiers. Implementation details and examples not provided in the excerpt.
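A toy sketch of the data flow described across the preceding claims: per-agent reasoning embeddings, a graph built over them, and a pooled representation passed to a decision head. The cosine-similarity edges, single message-passing step, and linear head are assumptions for illustration; the paper's GNN architecture and PPO-DSR training loop are not specified in the excerpt.

```python
# Toy sketch: four agents' reasoning embeddings -> graph -> pooled representation
# -> decision score. Cosine-similarity edges, one message-passing step, and the
# linear head are illustrative assumptions, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 16))                 # one reasoning embedding per agent

unit = X / np.linalg.norm(X, axis=1, keepdims=True)
A = unit @ unit.T                            # cosine-similarity adjacency
A_hat = A / A.sum(axis=1, keepdims=True)     # row-normalize

W = rng.normal(size=(16, 8))
H = np.tanh(A_hat @ X @ W)                   # one graph message-passing layer
graph_repr = H.mean(axis=0)                  # mean-pool node states

w_out = rng.normal(size=8)
decision_score = float(graph_repr @ w_out)   # a policy head trained with PPO-DSR
print(decision_score)                        # would map this to buy/hold/sell
```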
Data ethics, as a central pillar of digital ethics, emphasizes the responsible use and protection of personal information.
Conceptual/definitional statement in the paper situating data ethics within digital ethics and highlighting protection of personal information as a core concern.
Big data usage is proxied by keyword frequency in firms' annual reports.
Operationalization described in the paper: frequency/count of big-data-related keywords in annual reports used as the proxy for firms' big data application.
The empirical analysis uses a fixed-effects regression approach to measure the impact of big data application on firm value.
Methodological statement in the paper specifying fixed-effects regression as the primary econometric approach.
The study analyzes panel data covering Chinese A-share listed companies from 2007 to 2021.
Description of dataset in the paper: panel of Chinese A-share listed companies spanning the years 2007–2021 (sample period stated).
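A generic two-way fixed-effects specification consistent with this description (variable names and controls are illustrative; the paper's exact specification is not given in the excerpt):

$$
\text{FirmValue}_{it} = \alpha_i + \lambda_t + \beta\,\text{BigData}_{it} + \boldsymbol{\gamma}'\mathbf{X}_{it} + \varepsilon_{it},
$$

where $\text{BigData}_{it}$ is the annual-report keyword-frequency proxy, $\alpha_i$ and $\lambda_t$ are firm and year fixed effects, and $\mathbf{X}_{it}$ collects firm-level controls.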
The analysis extends the dynamic taxation setup of Slavik and Yazici (2014).
Methodological claim: the model and solution approach build on and modify the framework from Slavik and Yazici (2014) (reference to prior theoretical framework rather than empirical data).
We characterize the optimal tax policy in an economy with human manual and cognitive labor, physical capital, and artificial intelligence (AI).
Theoretical/analytical work: the paper develops and analyzes a dynamic general-equilibrium model that includes manual and cognitive human labor, physical capital, and AI. (No empirical sample; model-based characterization.)
Self-concordance did not mediate the AI-over-questionnaire effect on goal progress.
A preplanned mediation model reported in the paper found no evidence that self-concordance mediated the AI vs. questionnaire effect on goal progress; the effect was non-significant in the preregistered analysis.
Compared with the matched written-reflection questionnaire, the AI did not significantly improve overall goal progress.
Preplanned comparison within the preregistered RCT; reported non-significant difference between AI and written-reflection condition on overall goal progress at two-week follow-up (no significant p-value reported in the summary).
We conducted a preregistered three-arm randomized controlled trial (RCT) comparing an AI career coach ('Leon,' powered by Claude Sonnet), a matched structured written questionnaire, and a no-support control.
Preregistered RCT reported in the paper; three arms as described; total sample size N = 517; participants randomized to AI coach, written-reflection questionnaire, or no-support control; outcomes assessed at two-week follow-up.
All code, data, and logs are publicly available at https://github.com/pepealonso95/TDAD.
Provision of a public GitHub repository URL in the paper.
Evaluation was performed on SWE-bench Verified with two local models: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances.
Experimental setup reported in the paper specifying benchmark (SWE-bench Verified) and model-instance counts.
Controlled experiments were run with N = 250 across five content types to validate the mechanisms.
Experimental methods reported in the paper: controlled experiments with specified sample size and content-type breakdown.
The dependent variable is the Market Opportunity Index, which is a combination of indicators of innovation activity, the share of firms with new products, and the share of opportunity-oriented entrepreneurs.
Paper provides the construction/definition of the dependent variable (components listed in the excerpt).
The model included lags of the dependent variable to account for inertia in the development of entrepreneurial opportunities, and the stability of the impact of cognitive tools was tested.
Paper states the model specification included lagged dependent variables and that stability tests for the impact of cognitive tools were performed (no further details on lag length or test statistics in the excerpt).
The methodological foundation of the study was panel econometric modelling, which made it possible to account for cross-country differences observed over time and for the dynamics of domestic indicators.
Description of methods in the paper: use of panel econometric modelling on an international panel over the 2020–2024 period (sample size not specified in the excerpt).
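A generic dynamic panel specification matching that description (the index name is abbreviated; lag length and covariates are illustrative assumptions, not the paper's exact model):

$$
\text{MOI}_{it} = \alpha + \rho\,\text{MOI}_{i,t-1} + \boldsymbol{\beta}'\mathbf{Z}_{it} + \mu_i + \varepsilon_{it},
$$

where the lagged term captures inertia in the development of entrepreneurial opportunities, $\mathbf{Z}_{it}$ includes the cognitive-tool indicators, and $\mu_i$ are country effects.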
The field study used a 44-item questionnaire with 45 participants to measure comprehension, reported behavior change/adoption, and perceptions of volunteer legitimacy.
Methodological description provided in the paper: instrument and sample sizes explicitly reported.
Research agenda: empirical microdata on managerial time use, task-level automation, performance outcomes, and wage impacts are needed to quantify substitution versus complementarity and to evaluate human-in-the-loop designs' effects on firm performance and distributional outcomes.
Explicit methodological recommendation within the paper; identifies gaps due to the paper's conceptual (non-empirical) approach.
No original quantitative dataset or controlled evaluation is reported in this paper.
Methodological description in the paper stating reliance on prior literature, conceptual analysis, and prescriptive recommendations; paper does not present new experiments.
The paper is a position/normative paper (not an empirical study) that uses conceptual analysis, literature synthesis, and prescriptive roadmapping rather than new quantitative experiments or datasets.
Explicit methodological statement in the paper summarizing genre and methods used; absence of reported original data or controlled evaluations.
There is a need for longitudinal and cross‑country empirical research to measure how hybrid work and AI tools affect promotion rates, network centrality, productivity, privacy harms, trust, and long‑term career trajectories.
Statement of research gaps derived from the paper's methodological approach (conceptual synthesis and secondary case studies) and absence of longitudinal/cross‑cultural primary data.