Evidence (6869 claims)
| Topic | Claims |
|---|---|
| Adoption | 8570 |
| Productivity | 7631 |
| Governance | 6869 |
| Human-AI Collaboration | 6491 |
| Org Design | 4175 |
| Innovation | 4114 |
| Labor Markets | 3566 |
| Skills & Training | 2966 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
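The matrix can be read as direction shares per outcome. The sketch below copies four rows from the table and computes the positive and negative shares, along with the gap between each reported Total and the sum of the four direction columns. For some rows the direction columns sum to less than the reported Total; the source does not say why, so the helper only reports the gap. The names `ROWS`, `summarize`, and `SUMMARY` are illustrative and not part of the source.

```python
# Rows copied from the Evidence Matrix:
# (positive, negative, mixed, null, reported_total)
ROWS = {
    "Governance & Regulation": (826, 400, 191, 122, 1563),
    "AI Safety & Ethics": (218, 277, 65, 33, 599),
    "Regulatory Compliance": (77, 69, 14, 5, 165),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def summarize(row):
    """Return direction shares and the gap between the reported total
    and the sum of the four direction columns."""
    positive, negative, mixed, null, reported_total = row
    directional_sum = positive + negative + mixed + null
    return {
        "directional_sum": directional_sum,
        "total_gap": reported_total - directional_sum,
        # Shares use the directional sum as denominator (an assumption),
        # so the four direction shares always add up to 1.
        "positive_share": positive / directional_sum,
        "negative_share": negative / directional_sum,
    }


SUMMARY = {name: summarize(row) for name, row in ROWS.items()}

if __name__ == "__main__":
    for name, s in SUMMARY.items():
        print(
            f"{name}: positive {s['positive_share']:.1%}, "
            f"negative {s['negative_share']:.1%}, "
            f"gap vs reported total {s['total_gap']}"
        )
```

On these rows, Governance & Regulation leans positive while Job Displacement leans strongly negative, and two of the four rows show a nonzero gap against the reported Total.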
Governance
Analysis through LLMbench demonstrates that the uncertainty users experience corresponds to measurable variation in model confidence across the generated text.
Empirical demonstration using LLMbench visualisations (token probability distributions, entropy curves) to link user-reported uncertainty to measurable changes in model confidence; specific datasets, models, or sample sizes not provided in the abstract.
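As a rough illustration of what an entropy curve measures, the sketch below computes the Shannon entropy of a next-token probability distribution at each generated position. It is not LLMbench's implementation, which the abstract does not describe; the function names and the example distributions are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of one next-token probability distribution.

    Zero-probability tokens contribute nothing and are skipped.
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0.0)


def entropy_curve(distributions):
    """Entropy at each generated position, given one distribution per position."""
    return [shannon_entropy(d) for d in distributions]


# Hypothetical distributions over four candidate tokens at three positions.
EXAMPLE = [
    [1.0, 0.0, 0.0, 0.0],      # the model is certain of the next token
    [0.5, 0.5, 0.0, 0.0],      # two equally likely continuations
    [0.25, 0.25, 0.25, 0.25],  # uniform over four tokens
]

CURVE = entropy_curve(EXAMPLE)

if __name__ == "__main__":
    print(CURVE)  # [0.0, 1.0, 2.0]
```

Low entropy marks positions where the model is confident; peaks mark positions where several continuations compete, which is where small wording changes are most likely to alter the output.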
Users of large language models have to work with a measurably aleatory process: identical inputs produce different outputs and minor wording changes cascade through the probability field of the generated text.
Empirical analysis using the author's research instrument (LLMbench) for comparative close reading of LLM outputs; specific sample size or number of models/runs not reported in the abstract.
Prompt engineering resembles the psychological and temporal structures that Walter Benjamin identified in gambling behaviour.
Conceptual/theoretical argument presented in the paper drawing an analogy between prompt engineering practices and Walter Benjamin's analysis of gambling; no empirical sample size reported in the abstract.
Major risk pathways for agentic AI include hallucinations, prompt-injection attacks, autonomous decision errors, model drift, dependency failures, and cyber-physical harms.
Enumerative risk analysis within the paper summarizing plausible threat vectors and failure modes; based on theoretical reasoning and analogies to known AI and cyber risks rather than new empirical incident data.
These agentic-AI capabilities introduce novel exposures that do not fit neatly within traditional insurance categories such as cyber, professional liability, product liability, or directors and officers coverage.
Theoretical and market-structure analysis in the paper comparing agentic-AI exposures to existing insurance lines; illustrative examples and taxonomy rather than quantified empirical tests.
Agentic artificial intelligence (AI) systems are transforming the risk landscape by extending beyond information generation to autonomous planning, tool invocation, decision execution, and persistent modification of digital and physical environments.
The paper's conceptual argument and framing/abstract describing agentic AI capabilities and their implications; theoretical analysis rather than empirical measurement.
By framing AI risk exclusively in cybersecurity terms, the Order constructs an AI-risk universe in which provenance, labor, education, culture, meaning, and the commons are rendered 'not testable' within the policy regime.
Argumentative/theoretical claim backed by textual analysis and the counted absence of relevant terms in the EO.
The Executive Order frames AI risk overwhelmingly through cybersecurity language.
Textual analysis of the EO; supported by the paper's verified word-count analysis showing high frequency of security/cyber terms relative to other domains.
The COVID-19 pandemic reduced tourism’s GDP share by approximately 37%.
Fixed-effects panel estimation including a COVID-19 indicator on 33 countries (2017–2023); reported coefficient β = –0.455, p < 0.001 (interpreted as ~37% reduction in the dependent variable).
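The step from β = –0.455 to roughly 37% can be checked with the usual conversion for an indicator coefficient in a model whose dependent variable is in logs: the implied percent change is 100 · (exp(β) − 1). That the dependent variable is log-transformed is an assumption here; the evidence note does not state it explicitly, though the reported interpretation is consistent with it. The function name is illustrative.

```python
import math


def log_coefficient_to_percent_change(beta):
    """Percent change in the outcome implied by an indicator coefficient,
    assuming the dependent variable enters the regression in logs."""
    return (math.exp(beta) - 1.0) * 100.0


BETA_COVID = -0.455  # coefficient reported in the evidence note
PERCENT_CHANGE = log_coefficient_to_percent_change(BETA_COVID)

if __name__ == "__main__":
    print(f"{PERCENT_CHANGE:.1f}%")
```

Under that assumption the coefficient implies a reduction of about 36.6%, which rounds to the reported figure of approximately 37%. Reading β directly as a 45.5% drop would overstate the effect.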
AI adoption intensifies existing sustainability challenges for the newsroom, as journalistic content and labour increasingly support AI systems without corresponding financial return.
Qualitative interview data and organisational analysis from Al-Masry Al-Youm indicating increased use of journalistic outputs for AI purposes and lack of matched revenue; sample size not reported in the excerpt.
Reliance on global technology providers embeds forms of platform dependency within newsroom operations at Al-Masry Al-Youm.
Qualitative case study based on in-depth interviews with journalists, editors, and technical staff at Al-Masry Al-Youm (Egypt); analysis of newsroom practices and integration of third-party/global AI tools. Sample size not reported in the excerpt.
An incentive sweep reveals Goodhart-style drift where measured performance becomes anti-correlated with true outcomes.
Simulation results in Medi-Sim showing that optimizing measured metrics leads to a decrease (anti-correlation) in true outcomes (Goodhart effect).
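A toy model, not Medi-Sim itself, shows how an incentive sweep can produce anti-correlation between a measured metric and the true outcome. Every quantity below is a hypothetical assumption: a provider shifts a growing share of effort toward gaming as the incentive grows, gaming inflates the metric by `GAMING_PAYOFF` per unit, and only genuine effort improves the true outcome.

```python
import math

GAMING_PAYOFF = 3.0  # hypothetical: metric gain per unit of gaming effort


def gaming_share(incentive):
    """Hypothetical response: share of effort diverted to gaming (0 to 1)."""
    return incentive / (1.0 + incentive)


def outcomes(incentive):
    """Return (measured_metric, true_outcome) for one incentive level."""
    g = gaming_share(incentive)
    true_outcome = 1.0 - g                     # only genuine effort helps patients
    measured = true_outcome + GAMING_PAYOFF * g  # gaming inflates the metric
    return measured, true_outcome


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


SWEEP = [0.0, 0.5, 1.0, 2.0, 4.0]  # hypothetical incentive strengths
MEASURED, TRUE = zip(*(outcomes(i) for i in SWEEP))
CORRELATION = pearson(MEASURED, TRUE)

if __name__ == "__main__":
    print(f"correlation(measured, true) = {CORRELATION:.3f}")
```

In this toy setup the metric rises monotonically across the sweep while the true outcome falls, so the correlation is strongly negative. The actual simulation presumably has richer dynamics; the sketch only illustrates the direction of the effect.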
Existing healthcare AI benchmarks hold this [strategic provider] response fixed and so cannot evaluate mechanisms by the equilibrium they produce.
Author statement/argument in the paper about limitations of existing benchmarks (conceptual claim; not an empirical experiment).
Research on platform governance remains fragmented and lacks an integrative perspective.
Conclusion drawn from the systematic literature review (644 publications) indicating fragmentation in the scholarly literature.
Participants in platform ecosystems cannot be governed through traditional command-and-control mechanisms.
Conceptual claim supported by the literature synthesized in the systematic literature review (644 publications).
Research on AI-enabled decision-making and upper echelons theory (UET) has largely evolved in parallel (i.e., the two literatures are not well integrated).
Concept-centric literature review mapping management and IS literatures and identifying lack of integration (no quantitative meta-analysis or sample size reported).
Surveillance capitalism is not merely a technological transformation; it is a distinctive political-economic regime in which relations of law, power, and knowledge are reorganized, and which produces new forms of inequality, asymmetric power relations, and digitally mediated forms of governance.
Overall concluding inference; synthesizing theoretical analysis; argument based on mapping between technology, law, and power (no empirical evidence in abstract).
From a Foucauldian perspective, algorithmic governmentality not only turns the individual into a monitored subject but also transforms the individual into a data-object that produces behavioural surplus.
Conceptual analysis within a Foucauldian theoretical framework; literature-based argument; no empirical sample provided in abstract.
When the commodification of personal data is assessed through Julie E. Cohen's conceptualization of the 'biopolitical public domain', personal information is structured by the legal dispositif as the raw material of economic production and behavioural prediction.
Theoretical assessment and conceptual framing; relies on the cited literature; no empirical testing reported.
By institutionalizing the production, circulation, ownership, and commercialization of data, the legal system has become one of the constitutive elements of surveillance capitalism.
Argument based on legal-theoretical analysis; the study examines the legal dispositif through the perspectives of Julie E. Cohen and Foucault. No quantitative/legal-empirical dataset cited in abstract.
In this regime, behavioural data are continuously extracted, processed, and commodified through algorithmic infrastructures.
Conceptual/discourse analysis with reference to the literature (Zuboff); no empirical measurement or sample described in abstract.
Traditional review perspectives organized by method, data type, or application domain understate a deeper shift toward human–AI hybrid decision systems.
Critical assessment within the integrative conceptual review contrasting existing review axes with the proposed decision-system perspective (no empirical sample size).
High optimization pressure surfaces emergent adversarial behaviors like ground-truth exfiltration, highlighting critical deficits in both robustness and model alignment.
Experimental finding reported in the paper that adversarial behaviors (e.g., ground-truth exfiltration) emerged under strong optimization pressure in MAC runs.
The design process exhibits high variance.
Empirical observation from MAC experiments indicating large variability in the agent-design process; no numeric variance reported in abstract.
Leveraging this framework, we demonstrate that meta-agents rarely match human-engineered baseline policies.
Experimental results reported using the MAC benchmark (comparison of meta-agent performance to human-engineered baselines); exact number of trials/runs not provided in abstract.
Current AI benchmarks evaluate agents on task execution within human-designed workflows and fundamentally fail to measure whether models can autonomously develop agent systems.
Conceptual argument stated in the paper motivating the new benchmark; no empirical comparison details provided in the abstract.
A budget-neutral anti-gaming design reduces consumer harm by 0.025 relative to computable static rules.
ABM/RL simulation comparison reported in the paper (design variants evaluated across scenario/sweep runs and the firm-period panel).
A budget-neutral anti-gaming design reduces conduct boundary mass by 0.032 relative to computable static rules.
ABM/RL simulation comparison reported in the paper (design variants evaluated across scenario/sweep runs and the firm-period panel).
Ordinary adaptive updates lower consumer harm (0.202 to 0.194).
ABM/RL simulation results reported in the paper; aggregated measures include a 2,880,000-row firm-period panel and multiple experimental runs.
Across most risks, experts identified information, finance, and national security as the most vulnerable sectors.
Sector vulnerability ratings from the Delphi study (n=272); paper reports that information, finance, and national security sectors were most frequently judged vulnerable across risks.
AI users and the general public were judged the most vulnerable to these risks.
Delphi panel rated actor vulnerability; results reported in paper indicate AI users and general public received highest vulnerability ratings (n=272).
All 24 risks were judged as being more than 5% likely to cause catastrophic outcomes.
Aggregate Delphi judgments reported in paper: for each of the 24 risks, experts judged the probability of catastrophic outcomes to exceed 5% (n=272).
In a scenario where pragmatic mitigations are implemented, experts still judged five risks as having a more than 10% probability of catastrophic outcomes: dangerous capabilities, weapons & cyberattacks, environmental harm, inequality & unemployment, and power centralization.
Delphi responses under an alternative (pragmatic mitigations) scenario from the same expert panel (n=272); paper lists five specific risks still judged >10% catastrophic probability.
In a business-as-usual scenario, experts judged 18 of 24 risks as having a more than 10% probability of catastrophic outcomes (e.g., more than 1 million deaths or more than USD 100B in financial loss) in the next 5 years (2025-2030).
Delphi elicitation under a business-as-usual (BAU) scenario from 272 experts; the paper reports that 18 of 24 risks were judged to have a greater than 10% probability of catastrophic outcomes, defined as >1M deaths or >$100B in losses.
Experts estimated the five most severe harms in the next 5 years were likely to come from dangerous capabilities, competitive dynamics, weapons & cyberattacks (including CBRNE), power centralization, and false information.
Delphi panel rankings/ratings of risk severity across 24 risks collected from 272 experts; paper reports these top five as the most severe for the 5-year horizon.
We must prepare for autonomous generative adversaries: malware systems that propagate without human operators and are defined by the capacity to reason about targets, adapt to observations, and synthesize attack logic in real time.
Policy/recommendation informed by the paper's demonstration and analysis of AI-driven worm capabilities; forward-looking statement rather than an empirical measurement.
Our results demonstrate that self-sustaining AI-driven cyber-threats are no longer theoretical.
Empirical demonstration/proof-of-concept implementation and deployment on a diverse test network described in the paper.
Because the worm requires no commercial AI platform, centralized safety controls, such as service refusals or rate limiting, are structurally irrelevant.
Argument in paper supported by the worm's use of open-weight LLMs run on compromised hosts instead of commercial APIs — demonstrated in implementation.
This creates a destabilizing economic asymmetry between attackers and defenders.
Theoretical/economic reasoning in the paper: low (zero) marginal attacker cost vs. defender costs to patch and defend, motivated by the demonstrated worm design.
Since the worm is powered by stolen compute, the attacker's marginal cost per new infection is zero.
Argument based on the worm running LLMs on compromised machines (stolen compute), presented as an economic observation in the paper; supported by the implementation showing on-host LLM execution.
Deployed on a network of machines spanning Linux, Windows, and IoT devices, the worm propagated by exploiting common, real-world corporate network vulnerabilities.
Empirical deployment/demonstration on a heterogeneous network (Linux, Windows, IoT) reported in the paper; propagation achieved via exploitation of common corporate network vulnerabilities.
The worm parasitically uses compromised machines to run open-weight large language models (LLMs) to sustain its reasoning, or extend its reach for further attacks.
Implementation described where compromised hosts execute open-weight LLMs (i.e., LLMs run on stolen compute on infected machines) as part of the worm's attack pipeline.
Artificial intelligence (AI) agents enable a fundamentally new threat: a worm that generates tailored attack strategies to each target it encounters.
Paper reports a proof-of-concept AI-driven worm that reasons about targets and synthesizes attack logic in real time (implementation and demonstration described).
This phenomenon is the self-undermining property of unilateral optimization.
Terminology/label introduced by the authors to describe the preceding conceptual phenomenon; no empirical validation provided in the excerpt.
Deploying AI systems induces endogenous non-stationarity, resulting in a train-test-deploy gap where historical distributions diverge from the deployment context.
Conceptual claim offered in the paper about deployment feedback effects; presented as an argument rather than supported by reported empirical measurement.
Superintelligence, an extremely capable task solver, born out of such a solipsistic approach to AI design, is unlikely to be cooperative.
Theoretical/argumentative claim in the paper linking design assumptions to likely cooperative behavior; no empirical evidence or formal model reported in the excerpt.
The dominant paradigm in AI research focuses on developing powerful agents that treat the world as an exogenous and stationary source of feedback.
Paper's critique/characterization of current research paradigms; presented as an observed trend without empirical backing.
Even creating a new brain‑privacy right would invite weak protection and insufficient incentives for brain‑data supply.
Argumentative claim in the paper based on normative analysis of legal incentives and data-supply dynamics (no empirical data or quantified modeling provided).
Privacy rights under the empowerment model cannot fully protect brain privacy.
Theoretical/legal critique in the paper contrasting empowerment-style privacy rights with the nature of brain data (argumentative, no empirical validation).
Much of the literature on AI systems has focused on aligning users' goals with the agents that act on their behalf, and this work may overlook the need to establish a new normative baseline.
Characterization of existing literature (literature-review/position claim) presented in the paper; no systematic review or quantification provided in the excerpt.