Evidence (4114 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Innovation

Existing approaches remain fragmented across formal verification, runtime assurance, neuro-symbolic reasoning and trustworthy Artificial Intelligence (AI) research communities.

Author claim about the state of the research landscape; asserted fragmentation without bibliometric or survey data provided in excerpt.

high negative ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... degree of integration/coordination across research communities

Current reasoning systems still suffer from hidden logical inconsistencies, hallucinated symbolic transitions, unsupported theorem applications, and limited reliability guarantees.

Author assertion identifying failure modes of current reasoning systems; presented qualitatively without quantitative error rates or experimental sample sizes in the excerpt.

high negative ReasonOps: A Unified Operational Paradigm for Trustworthy Ve... reliability / correctness of reasoning systems

Translators have functioned as 'invisible teachers' of AI—through the construction of translation memories, post-editing, and quality assessment—without recognition as teachers of models.

Conceptual framing and synthesis of workflow practices (TM construction, post-editing, QA) and their role as supervision for ML; qualitative argument and illustrative examples in the paper. No quantitative sample reported.

high negative Translators as Invisible Teachers of AI: Copyright, Translat... lack of recognition/attribution for contributors who effectively trained AI

Translators' renditions have been bought as deliverables under contract, segmented as technical objects, and processed as 'information analysis' data under copyright law—resulting in the loss of moral, creative, and economic attribution to the translators who produced them.

Comparative reading of contract practices and copyright treatment (legal/contractual analysis across jurisdictions), descriptive examples of how translations are delivered, segmented, and processed; qualitative argumentation in the paper. No quantitative sample reported.

high negative Translators as Invisible Teachers of AI: Copyright, Translat... loss of attribution and economic recognition for translators

Current legal frameworks inadequately address the intellectual property of AI-generated works and the related enforcement challenges.

Analytic review of legal perspectives and enforcement issues presented in the paper; conclusion based on the author's analysis rather than quantitative data.

high negative Examining the Challenges of Intellectual Property in AI-Gene... adequacy of legal perspectives and enforcement mechanisms for AI-generated IP

The current Iranian legal framework contains significant regulatory gaps with respect to intellectual property protection for AI-generated works.

Comparative legal analysis of Iranian statutes (1969 Law for the Protection of Authors, Composers, and Artists Rights and the Patent and Trademark Registration Law) against other legal systems (European Union, United Kingdom, United States); the paper's findings are based on legal/textual analysis rather than empirical sampling.

high negative Examining the Challenges of Intellectual Property in AI-Gene... presence of regulatory gaps in Iranian IP law regarding AI-generated works

The most critical intellectual property issue raised by AI-generated outputs is ownership of moral and economic rights in the absence of a human creator.

Theoretical discussion and literature review presented in the paper identifying legal and doctrinal questions around authorship and ownership when no human creator is involved (no empirical sample size).

high negative Examining the Challenges of Intellectual Property in AI-Gene... clarity/assignment of moral and economic IP rights for works lacking a human aut...

There is an urgent question of how humans can effectively supervise and control an economy operated by AI agents when this system may expand beyond the capacity of traditional governance.

Framed as a central research/policy concern in the paper's abstract; conceptual argument rather than empirical finding.

high negative Regulatory Policy for the Agent Economy in the Digital Age: ... capacity of traditional governance to supervise/control AI-operated economy

The Agent Economy raises new regulatory challenges concerning data privacy, security, ethics, and the risk of job displacement.

Stated in paper abstract as identified risks; based on literature synthesis and comparative policy analysis approach (method described), but no empirical incidence metrics reported.

high negative Regulatory Policy for the Agent Economy in the Digital Age: ... regulatory challenges related to privacy, security, ethics, and job displacement...

Under water-constrained conditions, the framework achieves reductions of approximately 3-5% in generation-related freshwater withdrawals.

Quantitative results from simulation case studies on the IEEE test systems (reported percentage reduction ~3-5%); sample context: water-constrained simulation scenarios on IEEE 30-bus and 118-bus systems (sample_size = 2 test systems).

high negative From Accounting to Coordination: A Virtual Water-Aware Elect... generation-related freshwater withdrawals

Because they are decoupled from the optimization process, static statistical accounting approaches are incapable of guiding workload relocation or power dispatch to mitigate water stress.

Argumentative claim in paper about limitations of static accounting methods with respect to guiding operational decisions (methodological critique).

high negative From Accounting to Coordination: A Virtual Water-Aware Elect... suitability of static accounting to guide workload relocation and power dispatch...

Existing approaches typically rely on static statistical accounting to quantify these water footprints, but such static methods fail to capture how dispatch optimization and workload relocation dynamically affect water withdrawals.

Critical assessment in paper contrasting prior static statistical accounting approaches with dynamic needs; presented as methodological critique (no particular empirical sample in excerpt).

high negative From Accounting to Coordination: A Virtual Water-Aware Elect... accuracy/adequacy of static statistical accounting methods for water footprint a...

As these systems scale, the bottleneck shifts away from raw model capability toward coordination.

Analytical/argumentative claim in the paper framing a shift in primary constraint; no empirical study or quantified benchmark reported.

high negative Foundation Protocol: A Coordination Layer for Agentic Societ... primary system bottleneck (model capability versus coordination capacity)

Current systems still struggle with evidence preservation, reproducibility, weak-direction rejection, provenance tracking, cross-domain robustness, and accountable scientific closure.

Survey-identified recurring failure modes and limitations reported in literature and system descriptions; qualitative synthesis.

high negative AutoResearch AI: Towards AI-Powered Research Automation for ... capabilities related to evidence preservation, reproducibility, rejection of wea...

Current systems remain fragmented, differing in autonomy, domain scope, execution environment, validation mechanism, and human oversight.

Survey of existing systems and categorization across the listed dimensions; descriptive synthesis rather than an empirical meta-analysis.

high negative AutoResearch AI: Towards AI-Powered Research Automation for ... heterogeneity/fragmentation across AI research systems along autonomy, domain sc...

AI power demand is growing at an unprecedented rate while power grids are often ailing and struggle to keep up.

Statement in paper's motivation/background; no empirical method or sample size reported in the abstract.

high negative XWind: A Cross-site Router for Large Language Model Inferenc... strain on power grids relative to AI power demand

Monotonic baselines collapse when extrapolating beyond the training regime (e.g., predicting a 12B model up to 307B tokens) whereas the Shannon Scaling Law remains predictive.

Empirical comparison on the held-out 12B extrapolation: authors report collapse/failure of monotonic baseline scaling laws in that regime contrasted with Shannon law's successful prediction (pooled R^2 reported).

high negative LLMs as Noisy Channels: A Shannon Perspective on Model Capac... extrapolative predictive failure/success of baseline vs proposed scaling laws

This Shannon perspective reveals a fundamental Shannon capacity for LLMs: scaling model size or data without preserving a sufficient signal-to-noise ratio (SNR) inevitably amplifies noise, inducing a transition from monotonic improvement to U-shaped performance degradation.

Theoretical argument derived from the Shannon-Hartley based formulation plus supporting empirical examples claimed in the paper showing non-monotonic (U-shaped) loss/accuracy behavior when SNR is insufficient.

high negative LLMs as Noisy Channels: A Shannon Perspective on Model Capac... performance vs. scale behavior (transition from monotonic improvement to U-shape...

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute.

Author assertion based on literature/contextual observation and motivating examples (catastrophic overtraining, quantization-induced degradation) referenced in the paper; no specific numeric sample provided in the excerpt.

high negative LLMs as Noisy Channels: A Shannon Perspective on Model Capac... ability of prior scaling laws to explain non-monotonic performance phenomena (e....

Commercial or dual-use AI models and semiconductors do not meet the security exception criteria under GATT Article XXI(b), so security interests should be interpreted with restraint.

Legal argument and interpretive analysis in the paper contending that the GATT Article XXI(b) security exception does not encompass routine commercial or dual-use AI models and semiconductors; doctrinal legal reasoning rather than empirical measurement.

high negative Strategic Stalemates: The Paradox of Export Controls in the ... applicability of GATT Article XXI(b) security exception to dual-use/commercial A...

Overusing export controls can complicate dispute resolution and hinder AI progress.

Normative and legal-political argument in the paper: overuse raises legal disputes (e.g., WTO litigation) and may slow cross-border AI development and diffusion (qualitative reasoning).

high negative Strategic Stalemates: The Paradox of Export Controls in the ... frequency/complexity of trade disputes and pace of AI progress/development

Overly strict or arbitrary controls may violate WTO obligations.

Legal analysis in the paper arguing that some export controls could conflict with WTO law (GATT) depending on scope and justification; interpretive legal reasoning cited.

high negative Strategic Stalemates: The Paradox of Export Controls in the ... compatibility of export controls with WTO obligations

The long-term effectiveness of export controls is questionable.

Paper's argumentative assessment drawing on historical examples and theoretical considerations (qualitative reasoning rather than quantitative causal inference).

high negative Strategic Stalemates: The Paradox of Export Controls in the ... effectiveness of export controls over the long term

China responded with export curbs on critical minerals and filed a WTO complaint against the U.S. under GATT.

Factual claim citing China's counter-measures (export curbs) and legal action (WTO complaint under GATT) as described in the paper.

high negative Strategic Stalemates: The Paradox of Export Controls in the ... China's retaliatory trade measures and litigation

Technical bottlenecks (cross-border data compliance, algorithm interpretability) and ethical challenges (algorithmic bias, privacy infringement, cultural conflicts) are intertwined impediments to intelligent international marketing.

Synthesis of challenges identified across the reviewed literature (systematic review and content analysis, 2010–2025) as reported in the paper.

high negative Research on International Marketing in the Context of Intell... presence and interrelation of technical and ethical barriers

Traditional international marketing theories, constrained by static assumptions and linear logic, struggle to explain intelligent contexts.

Conclusion from the paper's systematic review and content analysis of core literature (2010–2025); no quantitative test or sample size reported in the summary.

high negative Research on International Marketing in the Context of Intell... theoretical explanatory adequacy of traditional international marketing theories

Because contracts are negotiated by legal departments alone, many apparent legal disputes are incentive misalignment problems that only scientists at the table can correctly diagnose.

Argumentative claim presented in the paper (normative/diagnostic); no empirical study or sample provided in the excerpt.

high negative Position: The Pre/Post-Training Boundary Should Govern IP in... quality of contract negotiations / correct diagnosis of incentives in disputes

These failures are not for scientific reasons, but because academics must publish while companies must protect models trained on proprietary data, and no standard contract framework resolves this tension.

The paper presents this as the causal explanation (analytical/argumentative claim); no empirical testing or sample reported in the provided text.

high negative Position: The Pre/Post-Training Boundary Should Govern IP in... incentive alignment between academic publication requirements and company IP pro...

Industry-academia ML collaborations routinely fail to launch.

Asserted in the paper as an empirical observation/statement; no empirical methods, data, or sample size reported in the provided text (argument/anecdote).

high negative Position: The Pre/Post-Training Boundary Should Govern IP in... success rate of launching industry-academia ML collaborations

Current regulatory frameworks—designed for human-intermediated payments—are ill-equipped to address the dynamic and decentralised nature of agent-led transactions.

Regulatory and legal analysis asserted in the abstract (argument that existing frameworks are mismatched to agent-led payments).

high negative AI Agents in Payments: Applications, Risks and Regulations adequacy of existing regulatory frameworks for agent-led transactions

The article identifies and categorises a range of technical, legal and societal risks, including cybersecurity vulnerabilities, liability gaps, regulatory non-compliance, and potential economic disruption.

Risk identification and categorisation presented in the paper (qualitative analysis and case studies referenced in the abstract). No quantitative risk measurement reported in the abstract.

high negative AI Agents in Payments: Applications, Risks and Regulations technical, legal and societal risks (cybersecurity, liability, regulatory non-co...

The lack of prediction stability and predictability can lead to advertiser-perceivable problems such as repeatability issues, cold start, and under-exploration.

Stated as an intuitive/motivational claim in the paper linking instability to advertiser-facing problems; no empirical quantification provided in the excerpt.

high negative LLM Retrieval for Stable and Predictable Ad Recommendations repeatability, cold start, under-exploration (advertiser-perceived issues)

Traditional ads recommendation systems have primarily focused on optimizing for prediction accuracy of click or conversion events using canonical metrics such as recall or normalized discounted cumulative gain (NDCG).

Background/contextual claim about prior work and standard practice; stated in the paper as motivation (no empirical evidence provided in the excerpt).

high negative LLM Retrieval for Stable and Predictable Ad Recommendations optimization focus on click/conversion prediction accuracy (recall, NDCG)

AIO is negatively associated with the carbon emission intensity of upstream suppliers.

Authors report a negative association between firms' AIO and the carbon emission intensity of their upstream suppliers in the empirical results using Chinese listed firms (2010–2023).

high negative Artificial intelligence orientation and decarbonization spil... carbon emission intensity (upstream suppliers)

AIO is negatively associated with the carbon emission intensity of industry peers.

Authors report a negative association between a firm's AIO and the carbon emission intensity of its industry peers based on their empirical analyses of Chinese listed companies over 2010–2023.

high negative Artificial intelligence orientation and decarbonization spil... carbon emission intensity (industry peers)

Stronger AIO is associated with lower carbon emission intensity within the focal firm.

Empirical association reported between firm-level AIO (measured via LLMs) and firm carbon emission intensity in the authors' analysis of Chinese listed firms (2010–2023); result described as a negative relationship.

high negative Artificial intelligence orientation and decarbonization spil... carbon emission intensity (focal firm)

There are findings indicating that public R&D expenditures are not being used effectively and efficiently (because public R&D shows a negative relationship).

Inference based on random-effects regression results showing the negative relationship (G8 + Türkiye, 2010-2020).

high negative AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... effectiveness/efficiency (interpretive inference, not directly measured)

There is a negative relationship between economic growth and the number of AI patents.

Panel regression (random effects) results are reported (G8 + Türkiye, 2010-2020); the economic growth variable (probably the GNP growth rate) is reported to show a negative relationship with the number of AI patents.

high negative AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... number of AI patents

There is a negative relationship between public R&D expenditures and the number of AI patents.

Results reported from a random-effects panel regression (G8 + Türkiye, 2010-2020); the public R&D expenditure variable is reported to show a negative relationship with the number of AI patents.

high negative AR-GE HARCAMALARININ VE VERGİ TEŞVİKLERİNİN YAPAY ZEKAYA ETK... number of AI patents

Science-to-technology knowledge flow in AI has been insufficiently examined in a systematic and structural way.

Literature-gap claim in the paper motivating the study.

high negative Knowledge flows from science to AI technology: Identifying c... extent of systematic/structural study of science-to-technology knowledge flow in...

Unrestricted frontier-scale checkpoint synthesis remains open (i.e., not yet solved).

Authors' assessment in the abstract noting current limits; asserts that unrestricted synthesis at frontier/model-scale has not been achieved.

high negative Position: Weight Space Should Be a First-Class Generative AI... feasibility/status of unrestricted frontier-scale checkpoint synthesis

In the context of search retrieval, current cold-start models suffer from the misalignment between training objectives and online business metrics, and they lack effective mechanisms to measure an item's growth potential.

Claim made in paper as motivation/background; no empirical details provided in the excerpt.

high negative Towards Sustainable Growth: A Multi-Value-Aware Retrieval Fr... alignment between model training objectives and online business metrics / abilit...

Existing systems tend to prioritize presenting users with already popular items, a phenomenon often referred to as the "Matthew effect".

Statement/observation in the paper; presented as background/motivation (no empirical evidence or sample size reported in the excerpt).

high negative Towards Sustainable Growth: A Multi-Value-Aware Retrieval Fr... presentation/exposure bias toward popular items

An analysis of a 21-instrument inventory identifies an incentive gradient where geopolitical and industrial pressures systematically reward surface-level behavioral proxies over deep structural verification.

Empirical/qualitative analysis of an inventory of 21 governance instruments compiled and analysed in the paper (n=21 instruments).

high negative Position: Behavioural Assurance Cannot Verify the Safety Cla... governance_and_regulation

Behavioural assurance, even when carefully designed, is being asked to carry safety claims it cannot verify.

The paper's normative and conceptual argument synthesising governance requirements and the epistemic limits of behavioural testing.

high negative Position: Behavioural Assurance Cannot Verify the Safety Cla... ai_safety_and_ethics

Current assurance methodologies (primarily behavioural evaluations and red-teaming) are epistemically limited to observable model outputs and cannot verify latent representations or long-horizon agentic behaviours.

Conceptual/analytic argument and review of existing assurance methodologies presented in the paper.

high negative Position: Behavioural Assurance Cannot Verify the Safety Cla... ai_safety_and_ethics

Policy responses in Europe are fragmented across the EU and Member State levels and do not match the potential scale of disruption from AGI.

Paper's policy analysis of EU- and Member-State-level responses (stated in abstract); no quantitative metrics provided in the abstract.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation

Europe has low rates of industrial AI adoption.

Paper's empirical/policy review claiming low industrial AI adoption in Europe (as stated in abstract); the abstract does not provide numeric adoption rates or sample sizes.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... adoption_rate

Europe exhibits structural weaknesses in compute infrastructure and talent retention.

Paper's structural assessment of Europe's AI value-chain capabilities (stated in abstract); no numerical measures provided in the abstract.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... adoption_rate

Europe has limited strategic awareness of frontier AI progress.

Paper's assessment of Europe's positioning based on policy analysis and review of capabilities monitoring (as stated in abstract); no supporting metrics or sample sizes provided in the abstract.

high negative Europe and the Geopolitics of AGI: The Need for a Preparedne... governance_and_regulation
