Evidence (14156 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	761	200	101	904	2020
Governance & Regulation	829	400	191	122	1566
Organizational Efficiency	784	193	125	84	1197
Technology Adoption Rate	637	236	124	97	1103
Research Productivity	431	131	58	340	972
Output Quality	481	183	59	47	770
Decision Quality	332	177	82	49	647
Firm Productivity	439	57	88	20	610
AI Safety & Ethics	218	279	66	33	602
Market Structure	181	170	123	24	503
Task Allocation	214	64	72	33	388
Skill Acquisition	174	62	62	17	315
Innovation Output	204	27	45	18	295
Employment Level	105	54	108	13	282
Fiscal & Macroeconomic	132	69	43	26	277
Consumer Welfare	117	63	42	11	233
Firm Revenue	154	48	26	3	231
Task Completion Time	173	31	8	12	225
Inequality Measures	44	123	50	6	223
Worker Satisfaction	89	65	22	12	188
Error Rate	71	92	10	2	175
Regulatory Compliance	77	69	14	5	165
Automation Exposure	58	56	26	13	156
Training Effectiveness	96	21	14	19	152
Wages & Compensation	77	37	25	6	145
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	81	21	1	115
Hiring & Recruitment	52	7	8	3	70
Creative Output	32	20	8	3	64
Skill Obsolescence	5	47	6	1	59
Social Protection	28	16	8	2	54
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

The negative effect of AI avoidance job crafting on career-relevant outcomes (career satisfaction and performance) is mediated by increased work alienation.

Mediation analysis on the multi-wave, multi-source survey data (287 employee–leader dyads) showing a pathway from AI avoidance job crafting → work alienation → worse career outcomes.

high negative Approach or avoidance? A dual-pathway model of job crafting ... career satisfaction and performance (mediated by work alienation)

AI avoidance job crafting negatively predicts career satisfaction and performance.

Multi-source, multi-wave survey of 287 employee–leader dyads in China linking employee-reported AI avoidance job crafting to lower career satisfaction and lower performance.

high negative Approach or avoidance? A dual-pathway model of job crafting ... career satisfaction and performance

AI-driven job displacement disproportionately affects low-skilled workers.

Reported empirical result from the paper's PLS-SEM analysis on the 351-respondent dataset.

high negative Navigating AI‐Induced Job Displacement and Skill Demands: In... job_displacement

Traditional car-following models, such as the Intelligent Driver Model (IDM), often struggle to generalize across diverse traffic scenarios and typically do not account for fuel efficiency.

Literature-based statement within the paper motivating the study (review of limitations of traditional car-following models). No sample size reported.

high negative Macroscopic Characteristics of Mixed Traffic Flow with Deep ... model generalizability and accounting for fuel efficiency

Analysis of global datasets on energy dependency, economic concentration, debt levels, demographic trends, digital infrastructure, and AI adoption highlights that interconnected systemic risks can amplify economic instability.

Paper reports drawing upon multiple global datasets (energy dependency, economic concentration, debt, demographics, digital infrastructure, AI adoption) to analyze systemic risk interactions; specific datasets, sample sizes, and statistical methods are not detailed in the excerpt.

high negative Beyond Forecasting: Adaptive Economic Preparedness in a Geop... amplification of economic instability by interconnected systemic risks

Events such as supply chain disruptions, oil price surges linked to geopolitical conflicts, and sudden labour market shifts due to reverse migration have exposed the limitations of prediction-based planning frameworks.

Illustrative examples cited in the paper; the claim is supported by referenced global events and the paper's use of global datasets, but no specific empirical case-study sample sizes or quantification are provided in the excerpt.

high negative Beyond Forecasting: Adaptive Economic Preparedness in a Geop... exposure of limitations in prediction-based planning frameworks

Traditional economic models that rely heavily on historical data and linear forecasting are increasingly inadequate in capturing the complexity and unpredictability of contemporary economic shocks.

Conceptual claim supported by discussion and examples of recent shocks (supply chain disruptions, oil price surges, labor market shifts); no specific empirical evaluation or quantified model comparison reported in the excerpt.

high negative Beyond Forecasting: Adaptive Economic Preparedness in a Geop... predictive adequacy of traditional economic models

The global economic system is undergoing a structural transformation characterized by geopolitical tensions, energy price volatility, trade fragmentation, demographic imbalances, and rapid technological disruption driven by artificial intelligence.

Narrative synthesis in the paper drawing on global trends; the paper references global datasets on energy dependency, trade patterns, demographics, and AI adoption (no specific sample size or empirical study detailed in the excerpt).

high negative Beyond Forecasting: Adaptive Economic Preparedness in a Geop... structural transformation of the global economic system (presence of geopolitica...

The main risk is not merely copying, but the possibility that useful capability can be transferred more cheaply than the governance structure that originally accompanied it.

Conceptual threat model articulated in the paper; argued on normative/theoretical grounds without reported empirical measurement or sample.

high negative A Public Theory of Distillation Resistance via Constraint-Co... relative_cost/ease_of_capability_transfer_vs_governance_transmission

Distillation becomes less valuable as a shortcut when high-level capability is coupled to internal stability constraints that shape state transitions over time.

Theoretical argument presented as the paper's core claim; introduces a conceptual mechanism (capability-stability coupling) and argues why this would reduce the usefulness of distillation. No empirical data, experiments, or sample are reported.

high negative A Public Theory of Distillation Resistance via Constraint-Co... value_of_distillation / usefulness_of_distillation_as_a_shortcut

Hallucination and content filtering are the most common frustrations reported across all platforms.

Qualitative and/or survey-coded responses about user frustrations aggregated across platforms (overall N=388); paper reports these two issues as the most common.

high negative Beyond Benchmarks: How Users Evaluate AI Chat Assistants reported frustrations (hallucination and content filtering)

The competence shadow compounds multiplicatively to produce degradation far exceeding naive additive estimates.

Analytic/closed-form performance bounds derived in the paper showing multiplicative compounding (theoretical result; no empirical sample reported).

high negative The Competence Shadow: Theory and Bounds of AI Assistance in... output_quality

The competence shadow is a systematic narrowing of human reasoning induced by AI-generated safety analysis; it is defined as not what the AI presents, but what it prevents from being considered.

Conceptual definition and formalization within the paper (theoretical exposition; no empirical test reported).

high negative The Competence Shadow: Theory and Bounds of AI Assistance in... decision_quality

Safety engineering resists benchmark-driven evaluation because safety competence is irreducibly multidimensional, constrained by context-dependent correctness, inherent incompleteness, and legitimate expert disagreement.

Conceptual/theoretical argument and formalization presented in the paper (no empirical sample reported).

high negative The Competence Shadow: Theory and Bounds of AI Assistance in... output_quality

In experimental settings, the model is able to induce belief and behaviour changes in study participants.

Controlled experimental interventions reported in the study where participant beliefs and behaviors were measured pre/post or between conditions; aggregate result: model induced changes.

high negative Evaluating Language Models for Harmful Manipulation participant beliefs and behaviour changes (manipulative efficacy)

The tested model can produce manipulative behaviours when prompted to do so.

Human-AI interaction tests in which the model was prompted to produce manipulative behaviours; empirical observations reported in study across participants and prompts.

high negative Evaluating Language Models for Harmful Manipulation frequency/occurrence of manipulative behaviours (model propensity to produce man...

Standard evaluation of LLM confidence relies on calibration metrics (ECE, Brier score) that conflate two distinct capacities: how much a model knows (Type-1 sensitivity) and how well it knows what it knows (Type-2 metacognitive sensitivity).

Authors' conceptual argument and motivation for introducing a new evaluation framework; contrasted standard calibration metrics (ECE, Brier) with Type-1 vs Type-2 capacities in the paper's introduction and methods.

high negative Do LLMs Know What They Know? Measuring Metacognitive Efficie... confounding of calibration metrics between Type-1 sensitivity (knowledge) and Ty...
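
A minimal Python sketch of the two calibration metrics named in this claim, using standard textbook definitions (equal-width binning for ECE). The function names and toy data are illustrative assumptions, not taken from the paper. Two hypothetical models each state a single constant confidence; both are perfectly calibrated even though one is right 90% of the time and the other 60%, and neither model's confidence separates its correct answers from its errors.

```python
def brier_score(confidences, outcomes):
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(outcomes)


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, o))
    total = len(outcomes)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Hypothetical model A: 90% accurate, always reports 0.9 confidence.
conf_a, out_a = [0.9] * 10, [1] * 9 + [0]
# Hypothetical model B: 60% accurate, always reports 0.6 confidence.
conf_b, out_b = [0.6] * 10, [1] * 6 + [0] * 4

ece_a = expected_calibration_error(conf_a, out_a)
ece_b = expected_calibration_error(conf_b, out_b)
brier_a = brier_score(conf_a, out_a)
brier_b = brier_score(conf_b, out_b)
print(ece_a, ece_b, brier_a, brier_b)
```

With these toy inputs, ECE is approximately 0 for both models, while the Brier scores are approximately 0.09 and 0.24. The Brier gap here comes entirely from the difference in accuracy, which is the kind of conflation the claim describes.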

Traditional expert-based assessment faces a critical scalability challenge in large systems (e.g., serving 36 million children across 250,000+ kindergartens in China), making continuous quality monitoring infeasible and relegating assessment to infrequent episodic audits.

Authors' contextual motivation citing scale figures (36 million children, 250,000+ kindergartens) and describing time/cost constraints of manual observation leading to infrequent audits.

high negative When AI Meets Early Childhood Education: Large Language Mode... feasibility/scalability of manual expert-based assessment

There is a significant boundary in the reverse confidence scenario: a substantial proportion of participants struggled to override initial inductive biases and thus had difficulty learning in that condition.

Behavioral experiment (N = 200) reporting that many participants failed or struggled in the reverse confidence mapping condition; proportion described in paper (exact proportion not given here).

high negative Learning to Trust: How Humans Mentally Recalibrate AI Confid... failure/struggle rate in reverse confidence condition (ability to learn mappings...

Preliminary evaluation reveals that current foundation action models struggle substantially with professional desktop applications (~60% task failure rate).

Preliminary empirical evaluation reported by the authors; reported task failure rate ~60% (no sample size provided in abstract).

high negative CUA-Suite: Massive Human-annotated Video Demonstrations for ... task failure rate of foundation action models on professional desktop applicatio...

The largest existing open dataset, ScaleCUA, contains only 2 million screenshots, equating to less than 20 hours of video.

Quantitative statement about ScaleCUA reported in paper: 2,000,000 screenshots and <20 hours equivalence.

high negative CUA-Suite: Massive Human-annotated Video Demonstrations for ... size/coverage of existing open dataset (ScaleCUA)
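
The screenshot-to-video equivalence in this claim can be checked with simple arithmetic. The sketch below assumes a playback rate of 30 frames per second, which is an assumption for illustration and is not stated in the excerpt; the function name is likewise hypothetical.

```python
def frames_to_hours(n_frames: int, fps: float) -> float:
    """Convert a frame count to hours of video at a given frame rate."""
    return n_frames / fps / 3600.0


SCALECUA_SCREENSHOTS = 2_000_000  # figure reported in the claim
ASSUMED_FPS = 30.0                # assumption: not given in the excerpt

hours = frames_to_hours(SCALECUA_SCREENSHOTS, ASSUMED_FPS)
# Lowest frame rate at which 2 million frames still fit within 20 hours.
min_fps_for_20h = SCALECUA_SCREENSHOTS / (20 * 3600)

print(f"{hours:.2f} hours at {ASSUMED_FPS:.0f} fps")
print(f"under 20 hours requires at least {min_fps_for_20h:.2f} fps")
```

At the assumed 30 fps, 2 million frames come to roughly 18.5 hours, consistent with the "less than 20 hours" figure; any frame rate above about 27.8 fps gives the same conclusion.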

Progress toward general-purpose CUAs is bottlenecked by the scarcity of continuous, high-quality human demonstration videos.

Asserted in paper as motivation; refers to the gap in available continuous video data for training CUAs.

high negative CUA-Suite: Massive Human-annotated Video Demonstrations for ... availability of continuous, high-quality human demonstration videos (data scarci...

Refining the state (as above) raises state-action blind mass from 0.0165 at τ = 50 to 0.1253 at τ = 1000.

Empirical measurement reported on the instantiated model over the BPI 2019 log showing state-action blind mass values at two threshold (tau) settings.

high negative The Stochastic Gap: A Markovian Framework for Pre-Deployment... state-action blind mass (measure of unsupported next-step decisions)

Empirical evidence shows that many failures arise from miscalibrated reliance, including overuse when AI is wrong and underuse when it is helpful.

Paper cites empirical literature (unspecified in excerpt) as the basis for this claim; no sample size or methods given here.

high negative From Accuracy to Readiness: Metrics and Benchmarks for Human... failures due to miscalibrated reliance (overreliance/underreliance)

Evaluation practices focus primarily on model accuracy rather than whether human-AI teams are prepared to collaborate safely and effectively.

Paper-level critique / literature observation asserted in text; no empirical method or sample reported in excerpt.

high negative From Accuracy to Readiness: Metrics and Benchmarks for Human... evaluation focus (accuracy vs. team readiness)

The reduction in engagement from AI labeling (AI-generated or AI-enhanced) was particularly pronounced for emotional content compared to rational content.

Interaction of content type (emotional vs. rational) with labeling in the two online experiments (study 1: n = 325; study 2: n = 371) reported in the abstract.

high negative AI content labeling and user engagement on social media: The... affective and behavioral engagement for emotional content

Labeling content as AI-enhanced reduced both affective and behavioral engagement compared to human-created content.

Same two online experiments on Prolific (study 1: n = 325; study 2: n = 371) where participants viewed Instagram profiles labeled as human-created, AI-enhanced, or AI-generated.

high negative AI content labeling and user engagement on social media: The... affective and behavioral engagement

Labeling content as AI-generated reduced both affective and behavioral engagement compared to human-created content.

Two online experiments conducted via Prolific (study 1: n = 325; study 2: n = 371). Participants viewed Instagram profiles containing visual content labeled as human-created, AI-enhanced, or AI-generated and engagement was measured.

high negative AI content labeling and user engagement on social media: The... affective and behavioral engagement

Currently, the region remains reactive as a 'recipient' rather than a 'creator' or an effective partner in the AI ecosystem.

Characterization reported by the authors based on their regional research and field study (qualitative findings from leaders across public/private sectors).

high negative Charting AI Governance Future in the Arab Region: A Policy R... degree of domestic AI creation/innovation versus reception/adoption

This gap hinders the ability of many governments in the region to push their countries toward joining the ranks of those benefiting from the AI revolution—both in developing the public sector and supporting economic growth and social development.

Authors' analysis and interpretation based on the regional research/field study described in the report.

high negative Charting AI Governance Future in the Arab Region: A Policy R... governments' ability to benefit from AI (public sector development; economic and...

The Arab region’s capacity for Artificial Intelligence (AI) governance remains limited relative to the accelerating pace of global AI developments and associated challenges.

Stated conclusion in the executive report based on a regional field study (authors' analysis of interviews/surveys and research across the region).

high negative Charting AI Governance Future in the Arab Region: A Policy R... AI governance capacity

These harms increasingly translate into financial loss through litigation, enforcement penalties, brand erosion, and failed deployments.

Paper argues this linkage using conceptual reasoning and illustrative examples/case vignettes; cites regulatory and market incidents but does not provide systematic empirical estimates or a sample size.

high negative Artificial Intelligence Governance In Corporate Strategy: Et... firm_revenue

AI systems can create material harms: discriminatory outcomes, privacy and security failures, opacity in decision logic, and regulatory noncompliance.

Paper lists these harms as core risks based on prior literature, regulatory developments, and conceptual risk analysis. Presented as well-documented categories rather than as new empirical findings; no sample size reported.

high negative Artificial Intelligence Governance In Corporate Strategy: Et... ai_safety_and_ethics

As artificial intelligence assumes cognitive labor, no existing quantitative framework predicts when human capability loss becomes catastrophic.

Introductory/background claim asserted by authors motivating the study (literature gap claim).

high negative The enrichment paradox: critical capability thresholds and i... absence of prior quantitative frameworks for catastrophic human capability loss

Broader AI scope lowers the critical threshold K* (i.e., more general AI reduces the K* value at which capability collapse occurs).

Model sensitivity analysis / simulations showing K* varies with assumed scope of AI (reported in model calibration discussion).

high negative The enrichment paradox: critical capability thresholds and i... change in critical threshold K* with AI scope

The model identifies a critical threshold K* of approximately 0.85 (scope-dependent; broader AI scope lowers K*) beyond which capability collapses abruptly: the 'enrichment paradox.'

Model analysis and simulations calibrated across domains (paper reports computed threshold K* ≈ 0.85 and notes dependence on AI scope).

high negative The enrichment paradox: critical capability thresholds and i... critical delegation/capability threshold (K*) at which human capability collapse...

Reliance on massive, schema-heavy prompts results in prohibitive per-token API costs and high latency, hindering scalable production deployment.

Introductory problem statement in the paper arguing that large context prompts increase per-token API costs and latency for API-based LLMs; no quantitative study or sample size provided for this claim within the excerpt.

high negative Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... latency and per-token API cost

Fabrication risk is not an anomalous glitch but a foreseeable consequence of the technology's design, with direct implications for the evolving duty of technological competence.

Conclusion drawn from the paper's theoretical/physics-based analysis and the simulated scenario; stated in the abstract as the authors' interpretation and policy/legal implication.

high negative When AI output tips to bad but nobody notices: Legal implica... foreseeability of fabrication risk and implications for professional duty/compet...

The paper presents the physics-based analysis in a legal-industry setting by walking through a simulated brief-drafting scenario.

Methodological claim explicitly stated in the abstract: use of a simulated brief-drafting scenario to demonstrate the analysis.

high negative When AI output tips to bad but nobody notices: Legal implica... demonstration of fabrication risk in a simulated legal drafting task (output qua...

Although commonly dismissed as random 'hallucination', recent physics-based analysis of the Transformer's core mechanism reveals a deterministic component: the AI's internal state can cross a calculable threshold, causing its output to flip from reliable legal reasoning to authoritative-sounding fabrication.

Paper cites/relies on 'recent physics-based analysis' of Transformer mechanisms and states that it demonstrates a calculable threshold; the paper also purports to present this science in a legal setting (via simulation). No numeric experimental sample provided in the excerpt.

high negative When AI output tips to bad but nobody notices: Legal implica... transition from reliable reasoning to fabricated outputs (failure mode / interna...

Courts confront a novel threat to the integrity of the adversarial process due to fabricated authorities produced by generative AI.

Asserted in the abstract as a consequence of fabricated outputs; supported by the paper's conceptual argument and simulation reference rather than empirical court-case analysis.

high negative When AI output tips to bad but nobody notices: Legal implica... integrity of the adversarial process / decision quality in courts

Attorneys who unknowingly file such fabrications face professional sanctions, malpractice exposure, and reputational harm.

Stated as a legal/consequential claim in the abstract; no empirical evidence, case counts, or legal statistics are provided in the excerpt.

high negative When AI output tips to bad but nobody notices: Legal implica... professional sanctions, malpractice exposure, reputational harm

For law in particular, generative AI introduces a perilous failure mode in which the AI fabricates fictitious case law, statutes, and judicial holdings that appear entirely authentic.

Claimed in the paper; supported by the paper's analytic argument and a simulated brief-drafting scenario referenced in the abstract (no numeric sample provided).

high negative When AI output tips to bad but nobody notices: Legal implica... fabrication of legal authorities (authentic-appearing fake citations/holdings)

AI-enabled, democratised production is more likely to intensify competition and produce winner-take-most outcomes than to generate broadly distributed entrepreneurial success.

Synthesised theoretical prediction based on the unified framework (attention scarcity + free-entry dilution + superstar/preferential attachment dynamics) developed in the paper; no empirical validation provided.

high negative The Economics of Builder Saturation in Digital Markets prevalence of broadly distributed entrepreneurial success versus concentration

When the framework is extended to include quality heterogeneity and reinforcement dynamics, equilibrium outcomes exhibit declining average payoffs.

Analytical extension of the baseline formal model to incorporate heterogeneous quality and reinforcement (preferential attachment) dynamics; theoretical derivation in the paper; no empirical sample.

high negative The Economics of Builder Saturation in Digital Markets average payoffs to producers

In markets with near-zero marginal costs and free entry, increases in the number of producers dilute average attention and returns per producer.

Formal theoretical model introduced in the paper (Builder Saturation Effect) that assumes near-zero marginal costs, free entry, and finite human attention; no empirical sample or experimental data reported.

high negative The Economics of Builder Saturation in Digital Markets average returns per producer
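
The dilution mechanism in this claim can be illustrated with a deliberately simplified sketch. It assumes a fixed pool of total attention split equally among producers, which is a simplification introduced here and not the paper's formal model; the function name and the attention figure are hypothetical.

```python
def average_attention(total_attention: float, n_producers: int) -> float:
    """Average attention per producer when a fixed pool is split equally."""
    if n_producers <= 0:
        raise ValueError("n_producers must be positive")
    return total_attention / n_producers


TOTAL_ATTENTION_HOURS = 1_000_000.0  # hypothetical fixed attention budget

# Free entry: the producer count grows while total attention stays fixed.
shares = {n: average_attention(TOTAL_ATTENTION_HOURS, n)
          for n in (1_000, 10_000, 100_000)}
for n, share in shares.items():
    print(f"{n:>7} producers -> {share:.1f} hours each")
```

Under this equal-split assumption, each tenfold increase in producers cuts the average share tenfold. The paper's extension with quality heterogeneity and reinforcement dynamics concerns how unevenly that shrinking pool is divided, which this sketch does not model.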

Agent memories currently remain private and non-transferable because there is no way to validate their value.

Descriptive assertion in the paper about current state of agent memories; no empirical survey or measurement cited.

high negative Infrastructure for Valuable, Tradable, and Verifiable Agent ... transferability and marketability of agent memories under current conditions

Insufficient organizational resources significantly inhibit AI adoption in procurement (β = -0.19, p < 0.05).

Same questionnaire survey (n=326) and multiple linear regression analysis; reported coefficient β=-0.19 with p<0.05.

high negative Research on the Adoption of Artificial Intelligence and Proc... AI adoption in procurement

Measuring only technical model performance (such as predictive accuracy) is insufficient for assessing the strategic impact of AI in drug discovery.

Argued in the paper as a critique of current evaluation practices; presented as a conceptual point rather than supported by new empirical data in the excerpt.

high negative Strategic Key Performance Indicators for AI in Lead Optimiza... adequacy of technical model performance metrics for capturing strategic impact

Pressure remains high to increase the probability of success to improve the effectiveness of pharmaceutical R&D.

Asserted in the paper as motivational context for the work; framed as an industry pressure point rather than backed by a specific empirical sample or quantified survey in the excerpt.

high negative Strategic Key Performance Indicators for AI in Lead Optimiza... probability of success in pharmaceutical R&D
