Evidence (11677 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	609	159	77	738	1617
Governance & Regulation	671	334	160	99	1285
Organizational Efficiency	626	147	105	70	955
Technology Adoption Rate	502	176	98	78	861
Research Productivity	349	109	48	322	838
Output Quality	391	121	45	40	597
Firm Productivity	385	46	85	17	539
Decision Quality	277	145	63	34	526
AI Safety & Ethics	189	244	59	30	526
Market Structure	152	154	109	20	440
Task Allocation	158	50	56	26	295
Innovation Output	178	23	38	17	257
Skill Acquisition	137	52	50	13	252
Fiscal & Macroeconomic	120	64	38	23	252
Employment Level	93	46	96	12	249
Firm Revenue	130	43	26	3	202
Consumer Welfare	99	51	40	11	201
Inequality Measures	36	106	40	6	188
Task Completion Time	134	18	6	5	163
Worker Satisfaction	79	54	16	11	160
Error Rate	64	79	8	1	152
Regulatory Compliance	69	66	14	3	152
Training Effectiveness	82	16	13	18	131
Wages & Compensation	70	25	22	6	123
Team Performance	74	16	21	9	121
Automation Exposure	41	48	19	9	120
Job Displacement	11	71	16	1	99
Developer Productivity	71	14	9	3	98
Hiring & Recruitment	49	7	8	3	67
Social Protection	26	14	8	2	50
Creative Output	26	14	6	2	49
Skill Obsolescence	5	37	5	1	48
Labor Share of Income	12	13	12	—	37
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
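
As a reading aid, the matrix can be summarized row by row. The sketch below copies five rows from the table above and computes, for each, the share of negative findings and the gap between the Total column and the sum of the four listed directions. The names `rows`, `summarize`, and `unlisted` are illustrative only. The page does not say why some totals exceed the four-column sum, so the code reports the gap without interpreting it.

```python
# Rows copied from the matrix above: (positive, negative, mixed, null, total).
rows = {
    "Other": (609, 159, 77, 738, 1617),
    "AI Safety & Ethics": (189, 244, 59, 30, 526),
    "Inequality Measures": (36, 106, 40, 6, 188),
    "Job Displacement": (11, 71, 16, 1, 99),
    "Skill Obsolescence": (5, 37, 5, 1, 48),
}


def summarize(counts):
    """Return the negative share of the row total and the count not
    accounted for by the four listed directions."""
    positive, negative, mixed, null, total = counts
    listed = positive + negative + mixed + null
    return {
        "negative_share": negative / total,
        "unlisted": total - listed,  # reason for any gap is not stated in the source
    }


summary = {name: summarize(counts) for name, counts in rows.items()}

for name, stats in summary.items():
    print(f"{name}: negative share {stats['negative_share']:.3f}, "
          f"unlisted {stats['unlisted']}")
```

Dividing by the Total column rather than by the four-column sum is a design choice; the two agree whenever the gap is zero.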

Training data scarcity is an emerging challenge for organizations that aim to train proprietary LLMs.

Paper highlights training data scarcity as a challenge in its analysis and discussion sections (qualitative observation).

high negative Buy Or Build? A Practitioner’s Framework for Large Language ... feasibility of training proprietary LLMs (availability of training data)

A gender gap in generative AI adoption persists, concentrated in the most exposed occupations.

Stratified/descriptive and regression analyses of the 2024 EWCS showing gender differences in self-reported generative AI adoption, with the gap largest among occupations with highest exposure; sample >36,600 workers across 35 countries.

high negative Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI by gender

An alternative specification that makes different choices about the timing of the pervasiveness of AI yields less robust results, though it also suggests that AI is labor saving.

Reported sensitivity analysis / alternative empirical specification in the paper; authors state the alternative yields less robust results but still indicates labor-saving effects.

high negative Early Estimates of the Impact of AI Within BEA’s Industry Ec... labor use (labor-saving effect)

Our baseline model finds evidence that AI is input saving.

Outcome reported from the baseline empirical specification indicating reductions in inputs associated with AI (authors' baseline model results).

high negative Early Estimates of the Impact of AI Within BEA’s Industry Ec... use of inputs (e.g., labor/capital inputs)

AI is driving states to reconsider interdependence not as the source of peace, but as a battlefield of power.

Normative and interpretive conclusion drawn from the paper's analysis of AI's geopolitical implications; no empirical data or sample reported in the abstract.

high negative ARTIFICIAL INTELLIGENCE AND THE WEAPONIZATION OF ECONOMIC IN... states' strategic framing of interdependence (from peace-building to power conte...

AI is redefining foreign policy in a multipolar world by making the line between economic cooperation and strategic vulnerability indistinct.

Theoretical claim and synthesis in the paper's thesis; no empirical evidence or sample size provided in the abstract.

high negative ARTIFICIAL INTELLIGENCE AND THE WEAPONIZATION OF ECONOMIC IN... ambiguity between economic cooperation and strategic vulnerability in foreign po...

AI is turning economic relationships between countries, previously sources of mutual benefit, into instruments of coercion.

The paper presents a theoretical analysis drawing on international political economy and foreign policy theory; no empirical measurements reported in the abstract.

high negative ARTIFICIAL INTELLIGENCE AND THE WEAPONIZATION OF ECONOMIC IN... transformation of international economic relationships from cooperation to coerc...

AI enhances the weaponization of economic interdependence by enabling states to monitor, predict, manipulate, and disrupt transnational networks with unprecedented accuracy.

The paper advances a theoretical argument and synthesis of international political economy and foreign policy literatures; no empirical sample or quantitative data reported in the abstract.

high negative ARTIFICIAL INTELLIGENCE AND THE WEAPONIZATION OF ECONOMIC IN... capacity to monitor, predict, manipulate, and disrupt transnational networks

The infrastructure for cross-user agent collaboration is entirely absent, let alone the governance mechanisms needed to secure it.

Authors' claim in the paper framing the research gap; presented as observational/argumentative (no empirical audit reported).

high negative ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... availability of cross-user collaboration infrastructure and governance mechanism...

Current AI agent frameworks have made remarkable progress in automating individual tasks, yet all existing systems serve a single user.

Statement in paper's introduction/positioning; conceptual survey-style claim (no empirical study or systematic benchmark reported).

high negative ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... automation scope (single-user vs multi-user)

Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations.

Paper asserts that existing/standard benchmarks do not adequately isolate parsing and computation-orchestration abilities, motivating the new benchmark.

high negative Time Series Augmented Generation for Financial Applications benchmark adequacy for isolating parsing/computation orchestration

As multimodal AI achieves human-parity understanding of speech and gesture, [the keyboard's] necessity dissolves.

Theoretical claim supported by multidisciplinary review (history, neuroscience, technology, organizational studies); no quantified empirical test reported.

high negative The Instrumental Dissolution of Typing: Why AI Challenges th... necessity/usage of keyboard as default input

General-purpose LLMs pose misinformation risks for development and policy experts, lacking epistemic humility for verifiable outputs.

Conceptual/argumentative claim stated in the paper's motivation; no empirical test reported in the abstract.

high negative Learning from AVA: Early Lessons from a Curated and Trustwor... misinformation risk / epistemic humility

Current session-based context handling (sessions ending, context windows filling, memory APIs returning flat facts) produces intelligence that is powerful per session but amnesiac across time.

Descriptive diagnostic argument in the paper; no empirical measurement reported in this text.

high negative The Continuity Layer: Why Intelligence Needs an Architecture... temporal persistence of model 'understanding' (memory/continuity)

In the AI condition, absolute retest performance fell by a nonsignificant amount, and the retest performance decrement was larger than in the comparison condition (i.e., retention decreased more after using Copilot).

Comparison of retest (one-week) performance across conditions reported in results; authors report a nonsignificant reduction and larger decrement for the AI/Copilot condition (n=22).

high negative Fast and Forgettable: A Controlled Study of Novices' Perform... retest performance (learning retention) after one week

To protect its advantage, the US restricts mobility and knowledge flows and challenges regulatory efforts.

Descriptive claim about US strategy (policy observation stated in the paper's framing; not quantified in the excerpt).

high negative Polarization and Integration in Global AI Research policy of restricting mobility and knowledge flows / effects on regulatory effor...

The AI race amplifies security risks and international tensions.

Introductory/interpretive claim motivating the study (no specific empirical quantification provided in the excerpt).

high negative Polarization and Integration in Global AI Research security risks and international tensions

The US and China form two poles around which global AI research increasingly revolves (i.e., global AI research is polarizing around these two countries).

Longitudinal network analysis of international collaboration and citation patterns derived from publication data compared to random realizations.

high negative Polarization and Integration in Global AI Research degree of polarization in global AI research networks

The US and China have long diverged in both cross-country collaboration and citation links, forming two poles around which global AI research increasingly revolves.

Large-scale data of scientific publications spanning three decades; analysis comparing cross-country collaboration and citation links to their random realizations (null models).

high negative Polarization and Integration in Global AI Research cross-country collaboration and citation links

Under logit demand and symmetric rivals, the QoS gap is strictly decreasing in API price and rival entry elasticity.

Comparative statics derived from the analytical model (logit demand, symmetric rivals).

high negative The Inference Bottleneck: A Formal Model of Vertical Foreclo... QoS gap

Current operational approaches typically involve scattered testing tools, resulting in partial coverage and errors that surface only after deployment.

Authors' characterization of industry practice and limitations (assertion in paper; no empirical sample size reported in abstract).

high negative Aether: Network Validation Using Agentic AI and Digital Twin test coverage and post-deployment error incidence

Network change validation remains a critical yet predominantly manual, time-consuming, and error-prone process in modern network operations.

Statement in paper framing the problem; based on authors' characterization of current operational practice (no empirical sample size reported in abstract).

high negative Aether: Network Validation Using Agentic AI and Digital Twin manual effort / error-proneness of network change validation

Thick subjectivist theories of meaning in life and meaningful work—those theories that emphasize that meaning-conferring activities are historically formed—enable us to appreciate how some losses cannot be made up, even if there are in principle ample alternative sources of meaning to be found elsewhere.

Theoretical claim about the explanatory power of 'thick subjectivist' normative theories; argued via conceptual philosophical analysis in the paper (no empirical testing reported).

high negative Is artificial intelligence a threat to meaningful work and l... capacity of theoretical framework (thick subjectivism) to account for non-substi...

Even if there are rich non-work sources of meaning, this does not entail that there is not a significant and multi-faceted loss of meaning, one that cannot be compensated for or offset elsewhere.

Normative/philosophical argument presented in the paper (conceptual reasoning rather than empirical measurement; no sample size).

high negative Is artificial intelligence a threat to meaningful work and l... loss of meaning due to automation and the (in)ability of non-work sources to com...

The argument that non-work goods can replace work-derived meaning fails to consider the embeddedness and thickness of meaning in human lives.

Philosophical/theoretical critique based on conceptual analysis (author's argument invoking the notions of embeddedness and thickness of meaning; no empirical study reported).

high negative Is artificial intelligence a threat to meaningful work and l... adequacy of non-work sources to substitute for work-derived meaning

The paper identifies governance challenges such as accountability gaps, digital sovereignty risks, ethical pluralism, and strategic weaponization arising from embedding AI in diplomatic practice.

Conceptual and normative analysis section of the paper outlining risks and governance challenges; illustrated by examples and argumentation.

high negative Strategic Cognition and Artificial Diplomacy: Designing Huma... presence of governance risks (accountability gaps, digital sovereignty, ethical ...

Traditional machine learning approaches, including the baseline methodology proposed in previous studies, typically optimize global predictive accuracy and therefore fail to capture business-critical outcomes, especially the identification of high-risk clients.

Conceptual critique and literature/contextual claim in the paper; contrasted with the study's business-aware methods (no direct external benchmarking numbers provided in the abstract).

high negative Advanced Insurance Risk Modeling for Pseudo-New Customers Us... identification of high-risk clients

Classifying customers without a prior history at a given company is particularly challenging due to the absence of historical behavior, extreme class imbalance, heavy-tailed loss distributions, and strict operational constraints.

Argumentation / problem statement in the paper (no empirical test reported); descriptive characterization of the insurance cold-start classification problem.

high negative Advanced Insurance Risk Modeling for Pseudo-New Customers Us... classification difficulty

Thin training coverage fosters anxiety about substitution and slows diffusion of AI tools.

Reported associations from surveys of mid-level managers and technical staff, interviews, and document analysis across cases; thematic coding identified links between limited training, worker anxiety, and slower diffusion. (Sample size not reported.)

high negative Overcoming Resistance to Change: Artificial Intelligence in ... worker anxiety and speed of diffusion/adoption

Upstream textile SMEs frequently exhibit constrained supply chain resilience owing to persistent information latency and structural dependence on downstream orders.

Background/contextual claim stated in paper (motivation for study); no specific quantitative test reported in abstract.

high negative Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... supply chain resilience (constrained due to information latency and downstream o...

There exist inequalities in the emergence of algorithmic bias and in transparency of these systems.

Paper states that inequalities and lack of transparency were observed/identified (citing Memarian, 2023; Bello, 2023; Gambacorta et al., 2024) and discusses these as findings.

high negative A Machine Learning Perspective on FinTech-Driven Inclusion: ... inequalities related to algorithmic bias and transparency

Algorithmic bias in automated credit scoring systems may block marginalized groups from accessing financial services.

Explicit statement in the introduction citing prior literature (Agboola, 2025; Nwafor et al., 2024; Oguntibeju, 2024) and motivating the study.

high negative A Machine Learning Perspective on FinTech-Driven Inclusion: ... access to credit for marginalized groups

The pharmaceutical R&D process is persistently challenged by high financial costs, protracted timelines, and remarkably low success rates.

Background statement in the review synthesizing prior literature and field knowledge; no original empirical data or sample sizes reported in the provided text.

high negative Artificial intelligence in drug discovery from advanced mole... financial costs, timelines, and success rates of pharmaceutical R&D

Platforms can exploit workers' uncertainty about the cost of labor to effectively suppress wages.

Interpretation / implication drawn from the theoretical model and the result that a platform can achieve coverage while paying only O(log(M)/M) fraction of total labor cost under assumptions about workers' cost estimates.

high negative Stochastic wage suppression on gig platforms and how to orga... worker wages / wage suppression

There exists a simple pricing strategy for the platform that covers all M tasks with wait time O(M) while paying only an O(log(M)/M) fraction of the total cost of labor.

Theoretical result from the paper's posted-price procurement model under stated assumptions on workers' estimated costs; formal analysis/proof showing existence of such a pricing strategy for general M (no empirical sample).

high negative Stochastic wage suppression on gig platforms and how to orga... fraction of total labor cost paid by the platform (platform payments / total wor...
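
To give a feel for the O(log(M)/M) scaling in the claim above, the following minimal sketch tabulates c · log(M) / M for a few task counts. The constant hidden by the big-O notation is not given on this page, so `c = 1.0` and the use of the natural logarithm are assumptions, and `payment_fraction_bound` is a hypothetical helper, not something taken from the paper.

```python
import math


def payment_fraction_bound(num_tasks: int, c: float = 1.0) -> float:
    """Illustrative c * ln(M) / M scaling for the fraction of total labor
    cost paid. The constant c is an assumption; the source only states
    the asymptotic order."""
    if num_tasks < 2:
        raise ValueError("num_tasks must be at least 2")
    return c * math.log(num_tasks) / num_tasks


for m in (10, 100, 1_000, 10_000):
    print(f"M = {m:>6}: bound on paid fraction ~ {payment_fraction_bound(m):.4f}")
```

Under these assumptions the bound shrinks toward zero as M grows, which is the sense in which the paid fraction becomes negligible for large task pools.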

Because the technical threshold for this transition is already crossed at modest engineering effort, the window for protective frameworks covering disclosure, consent, compensation and deployment restriction is the present, while deployment remains optional rather than infrastructural.

Authors' normative claim based on their implementation (distillation and deployment) and interpretation that modest engineering sufficed; used to argue policy urgency for disclosure/consent/compensation frameworks.

high negative The Relic Condition: When Published Scholarship Becomes Mate... need for protective policy frameworks and timing

We term this the Relic condition: when publication systems make stable reasoning architectures legible, extractable and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement.

Conceptual framing introduced by the authors as an interpretation of the observed results and their implications; not an empirical measurement but a named condition/argument.

high negative The Relic Condition: When Published Scholarship Becomes Mate... conceptual risk of intellectual-labor replacement derived from extractable publi...

Agency in software engineering is primarily constrained by organizational policies rather than individual preferences.

Authors' synthesis of qualitative results across the ACTA/Delphi and task/review phases indicating organizational policy factors were cited as primary constraints.

high negative From Junior to Senior: Allocating Agency and Navigating Prof... primary source of constraint on developer agency (organizational policy vs indiv...

The authors identify five 'decoys' that seemingly critique—but in actuality co-constitute—AI's emergent power relations and material political economy.

Analytical contribution of the paper: identification and conceptual description of five decoys based on literature synthesis; this is a descriptive/theoretical taxonomy rather than an empirical enumeration with sample size.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... presence and role of five specific decoys in shaping AI power relations

Decoys contribute to the network-making power that is at the heart of the Project's extraction and exploitation.

Theoretical synthesis and interpretive argument grounded in literature across relevant fields; the paper posits a mechanism (decoys → strengthened networks → increased extraction/exploitation) but provides no empirical quantification.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... network-making power and related extraction/exploitation

Decoys often create the illusion of accountability while masking the emerging political economies that the Project of AI has set into motion.

Conceptual critique supported by literature from communication, STS, and economic sociology; argument that particular practices/instruments function rhetorically to appear accountable while obscuring material political economy. No empirical sample or quantified measures reported.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... perceived accountability versus actual visibility of political economy

As AI funders and developers expand their access to resources and configure sociotechnical conditions, they benefit from decoys that animate scholars, critics, policymakers, journalists, and the public into co-constructing industry-empowering AI futures.

Theoretical analysis and literature review; paper identifies and interprets how discursive and institutional phenomena (termed 'decoys') function to produce consent and co-construction of industry-aligned futures. No empirical sample size provided.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... co-construction of industry-empowering AI futures by multiple societal actors

Those who fund and develop AI systems operate through and seek to sustain networks of power and wealth.

Conceptual argument and literature synthesis drawing on communication studies, science & technology studies (STS), and economic sociology; no empirical sample reported.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... operation and maintenance of networks of power and wealth by AI funders/develope...

In the geographical network, both technological diversity and technological proximity inhibit main path formation, implying macro-regional evolution requires specialized focus and complementary knowledge.

ERGM results for the geographical diffusion layer showing negative (inhibitory) associations for diversity and proximity variables; interpreted in regional evolution context.

high negative Mapping China’s digital transformation: a multilayer network... effect of diversity and proximity on main path formation (geographical layer)

Existing evaluations of large language models remain limited to judgmental tasks in simple formats, such as binary or multiple-choice questions, and do not capture forecasting over continuous quantities.

Literature/benchmark critique asserted in the paper (argument that current benchmarks focus on simple judgmental formats and miss continuous numerical forecasting capabilities).

high negative QuantSightBench: Evaluating LLM Quantitative Forecasting wit... scope/coverage of existing evaluation formats

Calibration degrades sharply at extreme magnitudes, revealing systematic overconfidence across all evaluated models.

Empirical observations from QuantSightBench evaluation showing model calibration performance as a function of magnitude (paper statement noting sharp degradation and overconfidence at extremes).

high negative QuantSightBench: Evaluating LLM Quantitative Forecasting wit... calibration / overconfidence of prediction intervals across magnitudes

The top performers Gemini 3.1 Pro (79.1%), Grok 4 (76.4%), and GPT-5.4 (75.3%) all fall at least 10 percentage points short of the 90% coverage target.

Reported empirical coverage percentages from evaluation on QuantSightBench for the listed models (paper provides these percentage values).

high negative QuantSightBench: Evaluating LLM Quantitative Forecasting wit... empirical coverage (prediction interval coverage) for specific models

None of the 11 evaluated frontier and open-weight models achieves the 90% coverage target.

Empirical evaluation on the newly introduced QuantSightBench benchmark across 11 frontier and open-weight models; models were assessed on empirical coverage of prediction intervals versus a 90% target (paper statement).

high negative QuantSightBench: Evaluating LLM Quantitative Forecasting wit... empirical coverage (prediction interval coverage)
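
The coverage metric behind these claims can be sketched as the fraction of true values that fall inside a model's stated prediction interval. The code below is a minimal illustration, not the benchmark's actual scoring code: `empirical_coverage` and the toy intervals are hypothetical, and closed intervals are assumed. Only the three coverage percentages and the 90% target come from the entries above.

```python
from typing import Sequence, Tuple


def empirical_coverage(
    intervals: Sequence[Tuple[float, float]], truths: Sequence[float]
) -> float:
    """Fraction of true values lying inside their (low, high) interval.
    Closed intervals are an assumption of this sketch."""
    if len(intervals) != len(truths) or not truths:
        raise ValueError("need equally sized, non-empty inputs")
    hits = sum(low <= y <= high for (low, high), y in zip(intervals, truths))
    return hits / len(truths)


# Hypothetical toy data: three of the four true values fall inside.
toy_intervals = [(0.0, 10.0), (5.0, 6.0), (100.0, 200.0), (1.0, 2.0)]
toy_truths = [4.0, 7.0, 150.0, 1.5]
toy_coverage = empirical_coverage(toy_intervals, toy_truths)

# Coverage values reported in the entries above, in percent.
TARGET = 90.0
reported = {"Gemini 3.1 Pro": 79.1, "Grok 4": 76.4, "GPT-5.4": 75.3}
shortfall = {name: round(TARGET - cov, 1) for name, cov in reported.items()}

print(f"toy coverage: {toy_coverage:.2f}")
for name, gap in shortfall.items():
    print(f"{name}: {gap} percentage points below target")
```

A well-calibrated 90% interval should contain the truth about 90% of the time, so coverage below the target is read as overconfidence: the intervals are too narrow.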

The study identified significant implementation challenges including algorithmic bias, digital divide concerns, data privacy risks, and low technology readiness among HR teams in Tier 2 cities.

Synthesis of qualitative case study findings from 4 organizations plus survey responses (N=150) reporting barriers and risks encountered during adoption.

high negative A Study on the Effectiveness of Technology-Driven Recruitmen... implementation challenges / risks

Current attack policies do not saturate LinuxArena (human-crafted attacks evade monitors at substantially higher rates than model-generated attacks, indicating headroom for attackers).

Empirical observation comparing human-crafted attacks (LaStraj) and elicited model-generated attacks; authors interpret higher human evasion rates as evidence that current automated attack policies have not saturated the challenge posed by LinuxArena.

high negative LinuxArena: A Control Setting for AI Agents in Live Producti... relative performance gap between human-crafted and model-generated attacks (impl...
