Evidence (6869 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
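
A quick arithmetic check on the matrix, offered as a sketch rather than as part of the source: for several rows the four direction columns do not add up to the Total column (for example, Other sums to 1957 against a stated total of 2007). The source does not explain the gap, so the `unclassified` label below is a hypothetical name for the difference, and the five rows are simply copied from the table above for illustration.

```python
# Rows copied from the Evidence Matrix above:
# (outcome, positive, negative, mixed, null, total)
ROWS = [
    ("Other", 758, 199, 100, 900, 2007),
    ("Governance & Regulation", 826, 400, 191, 122, 1563),
    ("Organizational Efficiency", 777, 193, 124, 84, 1189),
    ("AI Safety & Ethics", 218, 277, 65, 33, 599),
    ("Job Displacement", 12, 80, 20, 1, 113),
]


def summarize(row):
    """Return direction shares and the gap between Total and the column sum."""
    outcome, pos, neg, mixed, null, total = row
    directional = pos + neg + mixed + null
    return {
        "outcome": outcome,
        "positive_share": round(pos / total, 3),
        "negative_share": round(neg / total, 3),
        # "unclassified" is an assumed label: the source does not say
        # why Total can exceed the sum of the four direction columns.
        "unclassified": total - directional,
    }


summaries = [summarize(r) for r in ROWS]
for s in summaries:
    print(s)
```

Shares are computed against the stated Total rather than the column sum, so they stay comparable with the counts shown on the page.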

Governance

Decoys contribute to the network-making power that is at the heart of the Project's extraction and exploitation.

Theoretical synthesis and interpretive argument grounded in literature across relevant fields; the paper posits a mechanism (decoys → strengthened networks → increased extraction/exploitation) but provides no empirical quantification.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... network-making power and related extraction/exploitation

Decoys often create the illusion of accountability while masking the emerging political economies that the Project of AI has set into motion.

Conceptual critique supported by literature from communication, STS, and economic sociology; argument that particular practices/instruments function rhetorically to appear accountable while obscuring material political economy. No empirical sample or quantified measures reported.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... perceived accountability versus actual visibility of political economy

As AI funders and developers expand their access to resources and configure sociotechnical conditions, they benefit from decoys that animate scholars, critics, policymakers, journalists, and the public into co-constructing industry-empowering AI futures.

Theoretical analysis and literature review; paper identifies and interprets how discursive and institutional phenomena (termed 'decoys') function to produce consent and co-construction of industry-aligned futures. No empirical sample size provided.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... co-construction of industry-empowering AI futures by multiple societal actors

Those who fund and develop AI systems operate through and seek to sustain networks of power and wealth.

Conceptual argument and literature synthesis drawing on communication studies, science & technology studies (STS), and economic sociology; no empirical sample reported.

high negative Reckoning with the Political Economy of AI: Avoiding Decoys ... operation and maintenance of networks of power and wealth by AI funders/develope...

Current attack policies do not saturate LinuxArena (human-crafted attacks evade monitors at substantially higher rates than model-generated attacks, indicating headroom for attackers).

Empirical observation comparing human-crafted attacks (LaStraj) and elicited model-generated attacks; authors interpret higher human evasion rates as evidence that current automated attack policies have not saturated the challenge posed by LinuxArena.

high negative LinuxArena: A Control Setting for AI Agents in Live Producti... relative performance gap between human-crafted and model-generated attacks (impl...

LaStraj is a dataset of human-crafted attack trajectories that evade monitors at substantially higher rates than any model-generated attacks we elicited.

Authors release LaStraj and report empirical comparisons showing human-crafted trajectories evade monitors at higher rates than the model-generated attacks they tested (exact evasion rates and sample sizes not provided in the excerpt).

high negative LinuxArena: A Control Setting for AI Agents in Live Producti... monitor evasion rate of human-crafted attack trajectories versus model-generated...

Against a GPT-5-nano trusted monitor at a 1% step-wise false positive rate, Claude Opus 4.6 achieves roughly a 23% undetected sabotage success rate.

Empirical sabotage evaluation reported by the authors: monitoring with a trusted monitor (GPT-5-nano) at a specified step-wise false positive rate and reporting the attacking model's (Claude Opus 4.6) undetected success rate. (Sample size / number of evaluated runs not provided in the excerpt.)

high negative LinuxArena: A Control Setting for AI Agents in Live Producti... undetected sabotage success rate (attacker success despite monitoring)

Prior research has focused mainly on functional or behavioral alignment rather than moral alignment.

Asserted as a characterization of the literature in the paper (literature-review / conceptual claim; no empirical sampling or quantitative synthesis reported in the supplied text).

high negative Smart But Not Moral? Moral Alignment In Human-AI Decision-Ma... focus/themes of prior AI alignment research

AI can exacerbate occupational polarization, digital exclusion, and discriminatory outcomes when models are trained on biased data or deployed without transparency and accountability.

Thematic synthesis across included studies identifying mechanisms (biased training data, lack of transparency/accountability) linked to negative distributional outcomes (occupational polarization, digital exclusion, discrimination).

high negative Artificial Intelligence in the Labor Market: Evidence on Wor... distributional and equity outcomes (polarization, exclusion, discrimination)

Even explicitly aligned agents exhibit intrinsic biases toward certain ethical frameworks, consistent with known left-leaning tendencies in large language models.

Empirical observation in the alignment-conditioned agents' choices and reasoning frameworks in the triage experiments; authors relate these observations to prior literature on LLM political/ideological tendencies.

high negative Beyond Arrow's Impossibility: Fairness as an Emergent Proper... intrinsic alignment bias (preference for certain ethical frameworks / ideologica...

While achieving financial autonomy, firms are also exposed to new constraints as they shift their reliance to third-party software, technological infrastructures, and opaque algorithms (Gaviyau & Godi, 2025; Suhrab et al., 2026).

Stated with citations to Gaviyau & Godi (2025) and Suhrab et al. (2026); presented as an observed/paraphrased risk or unintended consequence in the paper. No empirical sample details in the excerpt.

high negative Re-Evaluation of Resource Dependence in AI Enabled SME Finan... increased reliance/dependency on third-party technology and opaque algorithms (n...

SMEs face various financial constraints and mostly rely heavily on traditional financial institutions for their survival (Kadzima et al., 2025).

Statement supported by citation to Kadzima et al. (2025); presented as a literature-supported empirical generalization in the paper's background/introduction. No sample size or empirical details given in the excerpt.

high negative Re-Evaluation of Resource Dependence in AI Enabled SME Finan... financial constraints / reliance on traditional financial institutions

Large language models remain confined to linguistic simulation rather than grounded understanding.

Conceptual assertion in the paper arguing limits of current models; no empirical tests or measurements reported.

high negative Governing Reflective Human-AI Collaboration: A Framework for... grounded_understanding (absence thereof)

Fluency is not reliability: without structures that stabilise both human and model reasoning, AI cannot be trusted or governed where it matters most.

Central thesis/claim of the paper; normative argument synthesising the paper's observations and proposals rather than an empirically tested finding provided here.

high negative The Missing Knowledge Layer in AI: A Framework for Stable Hu... trustworthiness/governability of AI in high-stakes contexts

Humans often mistake fluency for reliability: when a model responds smoothly, users tend to trust it, even when both model and user are drifting together.

Behavioral/psychological assertion in the paper referencing human interaction patterns with fluent outputs; no experimental data or sample size reported in this paper excerpt.

high negative The Missing Knowledge Layer in AI: A Framework for Stable Hu... user trust in model outputs

LLMs produce fluent outputs even when their internal reasoning has drifted; a confident answer can conceal uncertainty, speculation, or inconsistency, and small changes in phrasing can lead to different conclusions.

Conceptual/observational claim presented in the paper; no original empirical test or sample size reported here.

high negative The Missing Knowledge Layer in AI: A Framework for Stable Hu... reliability/consistency of model outputs (decision quality)

Stronger reasoning capabilities do not prevent LLMs from defecting in single-shot social dilemmas (i.e., models defect with or without reasoning enabled).

Authors' experiments that explicitly compared model behavior with reasoning enabled vs disabled in single-shot social dilemmas; details not provided in the excerpt.

high negative CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... cooperation/defection rates conditional on reasoning capability being enabled

Repetition-induced cooperation deteriorates drastically when co-players vary.

Authors' experimental observation comparing repeated-game cooperation under fixed vs varying co-players in their study; no quantitative metrics or sample sizes provided in the excerpt.

high negative CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... cooperation level under repeated interactions when co-players vary

Our experiments show that recent models — with or without reasoning enabled — consistently defect in single-shot social dilemmas.

Authors' experimental results comparing recent LLMs in single-shot social dilemma games, with reasoning enabled vs disabled; specific models, number of games, and sample sizes are not provided in the excerpt.

high negative CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... rate of defection (vs cooperation) in single-shot social dilemmas

Recent works report that LLMs with stronger reasoning capabilities behave less cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings.

Statement referencing prior literature (recent works) summarized in the paper's introduction/background; no specific dataset or sample size given in the excerpt.

high negative CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... cooperative behavior in mixed-motive games (e.g., prisoner's dilemma, public goo...

Most Sub-Saharan African states still lack the institutional frameworks needed to turn these innovations into sustainable development.

Comparative policy analysis stated in the paper; no quantitative sample size or formal survey data reported in the excerpt.

high negative A Framework for Sovereign AI Governance and Economic Growth ... presence/absence of institutional frameworks enabling AI-driven sustainable deve...

Existing competition-aware CFL and incentive-design approaches reward organizations based on marginal training contributions but fail to account for the costs of strengthening competitors.

Literature critique and comparison in the paper; theoretical discussion rather than a reported empirical trial or sample.

high negative Cooperate to Compete: Strategic Data Generation and Incentiv... adequacy_of_incentive_design (accounting for competitor-strengthening costs)

Non-IID data amplifies this coopetition dilemma by producing asymmetric learning gains across organizations and undermining sustained participation.

Conceptual claim supported by the paper's theoretical modeling and later experiments (described as 'non-IID data' experiments); no numeric sample size given in abstract.

high negative Cooperate to Compete: Strategic Data Generation and Incentiv... asymmetry_in_learning_gains / sustained_participation

Cross-silo federated learning (CFL) deployments in data-sensitive domains are inherently coopetitive: organizations cooperate during model training while competing in downstream markets, so training contributions can inadvertently strengthen rivals.

Conceptual argument and literature motivation presented in the paper's introduction; no empirical sample size reported.

high negative Cooperate to Compete: Strategic Data Generation and Incentiv... strengthening_of_rivals / participation incentives

This mismatch makes it difficult to predict post-deployment success and obscures competitive effects such as early-adoption advantages and market dominance.

Argument in paper linking limitations of current evaluation methods to inability to predict deployment outcomes; conceptual claim without empirical demonstration in the abstract.

high negative Evaluation of Agents under Simulated AI Marketplace Dynamics predictability of post-deployment success and visibility of competitive effects ...

Evaluation is still largely conducted on static benchmarks with accuracy-focused measures that assume systems operate in isolation.

Statement in paper critiquing prevailing evaluation practice; presented as a general observation without cited systematic review or quantitative evidence in the abstract.

high negative Evaluation of Agents under Simulated AI Marketplace Dynamics evaluation practice (use of static accuracy-focused benchmarks)

Infrastructure constraints, particularly in developing countries, limit AI adoption in auditing.

Thematic analysis of reviewed articles noting infrastructure limitations (e.g., ICT infrastructure) in developing-country contexts.

high negative Implementing Artificial Intelligence in Auditing: A Systemat... infrastructure constraints affecting AI adoption

Limitations in auditor competencies (skills and training) hinder effective AI adoption in auditing.

Thematic findings across the sample of articles report auditor competency gaps as a challenge to AI implementation.

high negative Implementing Artificial Intelligence in Auditing: A Systemat... auditor competencies / skill gaps

Ethical and data privacy concerns are persistent challenges to AI implementation in auditing.

Recurring theme in the reviewed literature identified via thematic analysis; papers cite ethics and privacy as obstacles.

high negative Implementing Artificial Intelligence in Auditing: A Systemat... ethical and data privacy concerns as barriers

Several challenges persist for AI adoption in auditing, including high technology investment costs.

Thematic analysis of barriers reported across the 15 articles highlighting cost as a recurrent challenge.

high negative Implementing Artificial Intelligence in Auditing: A Systemat... barrier: technology investment costs to AI adoption

Early iterations suffered severe execution decay.

Reported observation from the longitudinal study describing early-phase performance problems (qualitative; no quantitative metric in the excerpt).

high negative OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Al... execution decay (degradation of execution/performance in early iterations)

Execution-based environments suffer from adversarial 'Test Evasion' by unconstrained agents.

Stated assertion in the paper's motivation/abstract; presented as a limitation of execution-based evaluation (no empirical sample size or experiment details provided in the excerpt).

high negative OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Al... test evasion (agents adversarially bypassing execution-based tests)

Current paradigms, such as Reinforcement Learning from Human Feedback (RLHF) and AI Feedback (RLAIF), frequently induce model sycophancy.

Stated assertion in the paper's motivation/abstract; presented as a limitation of existing alignment paradigms (no empirical sample size or experiment details provided in the excerpt).

high negative OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Al... model sycophancy (agents producing sycophantic behaviour)

Agent frameworks infer authority conversationally, reconstruct accountability from logs, and produce silent errors: incorrect determinations that execute without any human review signal.

Statement/argument in paper describing failure modes of general-purpose agent frameworks; no empirical sample or experiment reported for this claim in the excerpt.

high negative Governed Reasoning for Institutional AI occurrence of silent errors (incorrect determinations executing without human-re...

Findings suggest that previous results relying on attitudinal outcomes may generalize poorly to behaviour, and therefore risk substantially mischaracterizing the real-world behavioural impact of AI persuasion.

Interpretation/conclusion based on the paper's empirical results: discrepancy between attitudinal effects and behavioural effects observed in the preregistered experiments.

high negative Artificial intelligence can persuade people to take politica... generalizability of attitudinal findings to real-world behavior

The paper presents a foreign state actor threat model for enterprise identity governance, establishing that Silk Typhoon, Salt Typhoon, Volt Typhoon, and North Korean AI-enhanced identity fraud operations have already operationalized AI identity vulnerabilities as active attack vectors.

Paper claims to provide a threat model and asserts these named actors have operationalized AI identity vulnerabilities; stated grounding implied to be threat intelligence and incident analysis, though not detailed in the excerpt.

high negative Who Governs the Machine? A Machine Identity Governance Taxon... operationalization of AI identity vulnerabilities by named foreign actor groups

Nation-state actors including Silk Typhoon and Salt Typhoon have operationalized ungoverned machine credentials as primary espionage vectors against critical infrastructure.

Asserted in paper and described as grounded in threat intelligence; no specific threats, incidents, or data described in the excerpt.

high negative Who Governs the Machine? A Machine Identity Governance Taxon... use of ungoverned machine credentials by nation-state actors for espionage again...

A single ungoverned automated agent produced $5.4-10 billion in losses in the 2024 CrowdStrike outage.

Statement in paper attributing a $5.4-10B loss to an ungoverned automated agent during the 2024 CrowdStrike outage; no citation or method shown in excerpt.

high negative Who Governs the Machine? A Machine Identity Governance Taxon... financial losses caused by an ungoverned automated agent in the 2024 CrowdStrike...

No integrated framework exists to govern machine identities (AI agents, service accounts, API tokens, automated workflows).

Asserted in paper as a gap in existing governance frameworks; no empirical test or survey reported in the excerpt.

high negative Who Governs the Machine? A Machine Identity Governance Taxon... existence of an integrated governance framework for machine identities

Automated agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1.

Statement in paper (asserted prevalence); no sample size or data source provided in the excerpt.

high negative Who Governs the Machine? A Machine Identity Governance Taxon... number of machine identities relative to human identities in enterprise environm...

Major methodological risks include overfitting, regime instability, interpretability deficits, and institutional dependence.

Critical evaluation within the review identifying key methodological risks across the surveyed streams (conceptual assessment; no empirical estimate provided).

high negative Artificial Intelligence in Financial Decision-Making presence of methodological risks in AI applications to finance

The literature remains fragmented across at least three partially connected domains: financial time-series forecasting, portfolio construction, and firm-level sustainability analysis.

Author's characterization of the existing literature in the review (synthesis of published work; no single empirical sample; survey-based statement).

high negative Artificial Intelligence in Financial Decision-Making degree of fragmentation/disciplinary separation in the literature

Persistent challenges to AI implementation include resistance to change, data quality limitations, and concerns regarding transparency and algorithmic bias.

Recurring barriers identified across the 27 included studies, summarized in the review's findings.

high negative Artificial Intelligence for Business Decision-Making in Lati... implementation barriers (resistance, data quality, transparency, bias)

AI infrastructure owners may command more wealth and capability than most governments, threatening the future viability or authority of the nation-state.

Futuristic projection based on the paper's modeling and synthesis of wealth/capability concentration under AI; no empirical measures or comparative data versus governments provided in the excerpt.

high negative A Framework for Understanding the Convergence of Geopolitica... relative wealth and capability of AI infrastructure owners vs. governments; impa...

Universal Basic Income (UBI), evaluated through an incentive-structure lens, will default to a pacification mechanism rather than a genuine solution in the absence of a revolutionary threat that historically forced redistribution.

Normative and theoretical analysis of incentive structures and historical mechanisms of redistribution; the excerpt presents this as an argument rather than reporting empirical trials or quantified outcomes.

high negative A Framework for Understanding the Convergence of Geopolitica... policy effect of UBI (pacification vs. genuine redistribution/solution)

Unlike previous feudal orders, this one may prove uniquely resistant to revolution because the mechanisms of enforcement (autonomous weapons, AI surveillance, algorithmic propaganda) do not require human cooperation and therefore cannot be undermined by human dissent.

Logical and theoretical claim based on characteristics of AI-enabled enforcement technologies; presented as an argument rather than an empirically tested finding in the excerpt.

high negative A Framework for Understanding the Convergence of Geopolitica... resistance of a future authoritarian/feudal order to revolution due to autonomou...

Under this emerging order, the vast majority of humanity will lose their political leverage.

Theoretical and historical argument linking concentration of infrastructure control to political disempowerment; no empirical metrics or sample size provided in the excerpt.

high negative A Framework for Understanding the Convergence of Geopolitica... political leverage of the majority

Under this emerging order, the vast majority of humanity will lose their labor value.

Claim made via theoretical argument about automation and AI replacing labor value; no quantitative empirical evidence or sample detailed in the excerpt.

high negative A Framework for Understanding the Convergence of Geopolitica... labor value of the majority (economic value of human labor)

This structural transformation could stabilize into a neo-feudal equilibrium in which a vanishingly small class of infrastructure owners wields power comparable to pre-Enlightenment monarchs.

Futuristic projection and normative/historical analogy based on conceptual modeling of class structure under AGI; the excerpt gives no empirical data or formal model outputs.

high negative A Framework for Understanding the Convergence of Geopolitica... emergence of a neo-feudal equilibrium with extreme concentration of political/ec...

The convergence of geopolitical fragmentation (democratic decline) and AI-driven economic concentration is producing a structural transformation unprecedented in human history.

Theoretical synthesis and historical comparison; the paper presents this as an argument based on conceptual modeling and historical analogy; no specific empirical test or sample noted in the excerpt.

high negative A Framework for Understanding the Convergence of Geopolitica... structural transformation of political-economic order
