Evidence (6869 claims)
Adoption
8570 claims
Productivity
7631 claims
Governance
6869 claims
Human-AI Collaboration
6491 claims
Org Design
4175 claims
Innovation
4114 claims
Labor Markets
3566 claims
Skills & Training
2966 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Governance
Remove filter
Decoys contribute to the network-making power that is at the heart of the Project's extraction and exploitation.
Theoretical synthesis and interpretive argument grounded in literature across relevant fields; the paper posits a mechanism (decoys → strengthened networks → increased extraction/exploitation) but provides no empirical quantification.
Decoys often create the illusion of accountability while masking the emerging political economies that the Project of AI has set into motion.
Conceptual critique supported by literature from communication, STS, and economic sociology; argument that particular practices/instruments function rhetorically to appear accountable while obscuring material political economy. No empirical sample or quantified measures reported.
As AI funders and developers expand their access to resources and configure sociotechnical conditions, they benefit from decoys that animate scholars, critics, policymakers, journalists, and the public into co-constructing industry-empowering AI futures.
Theoretical analysis and literature review; paper identifies and interprets how discursive and institutional phenomena (termed 'decoys') function to produce consent and co-construction of industry-aligned futures. No empirical sample size provided.
Those who fund and develop AI systems operate through and seek to sustain networks of power and wealth.
Conceptual argument and literature synthesis drawing on communication studies, science & technology studies (STS), and economic sociology; no empirical sample reported.
Current attack policies do not saturate LinuxArena (human-crafted attacks evade monitors at substantially higher rates than model-generated attacks, indicating headroom for attackers).
Empirical observation comparing human-crafted attacks (LaStraj) and elicited model-generated attacks; authors interpret higher human evasion rates as evidence that current automated attack policies have not saturated the challenge posed by LinuxArena.
LaStraj is a dataset of human-crafted attack trajectories that evade monitors at substantially higher rates than any model-generated attacks we elicited.
Authors release LaStraj and report empirical comparisons showing human-crafted trajectories evade monitors at higher rates than the model-generated attacks they tested (exact evasion rates and sample sizes not provided in the excerpt).
Against a GPT-5-nano trusted monitor at a 1% step-wise false positive rate, Claude Opus 4.6 achieves roughly a 23% undetected sabotage success rate.
Empirical sabotage evaluation reported by the authors: monitoring a trusted monitor (GPT-5-nano) at a specified step-wise false positive rate and reporting attacking model (Claude Opus 4.6) undetected success rate. (Sample size / number of evaluated runs not provided in the excerpt.)
Prior research has focused mainly on functional or behavioral alignment rather than moral alignment.
Asserted as a characterization of the literature in the paper (literature-review / conceptual claim; no empirical sampling or quantitative synthesis reported in the supplied text).
AI can exacerbate occupational polarization, digital exclusion, and discriminatory outcomes when models are trained on biased data or deployed without transparency and accountability.
Thematic synthesis across included studies identifying mechanisms (biased training data, lack of transparency/accountability) linked to negative distributional outcomes (occupational polarization, digital exclusion, discrimination).
Even explicitly aligned agents exhibit intrinsic biases toward certain ethical frameworks, consistent with known left-leaning tendencies in large language models.
Empirical observation in the alignment-conditioned agents' choices and reasoning frameworks in the triage experiments; authors relate these observations to prior literature on LLM political/ideological tendencies.
While achieving financial autonomy, firms are also getting exposed to new constraints by shifting their reliance on third-party software, technological infrastructures and opaque algorithms (Gaviyau & Godi, 2025; Suhrab et al., 2026).
Stated with citations to Gaviyau & Godi (2025) and Suhrab et al. (2026); presented as an observed/paraphrased risk or unintended consequence in the paper. No empirical sample details in the excerpt.
SMEs are suffering from various financial constraints, mostly relying heavily on traditional financial institutions for their survival (Kadzima et al., 2025).
Statement supported by citation to Kadzima et al. (2025); presented as a literature-supported empirical generalization in the paper's background/introduction. No sample size or empirical details given in the excerpt.
Large language models remain confined to linguistic simulation rather than grounded understanding.
Conceptual assertion in the paper arguing limits of current models; no empirical tests or measurements reported.
Fluency is not reliability: without structures that stabilise both human and model reasoning, AI cannot be trusted or governed where it matters most.
Central thesis/claim of the paper; normative argument synthesising the paper's observations and proposals rather than an empirically tested finding provided here.
Humans often mistake fluency for reliability: when a model responds smoothly, users tend to trust it, even when both model and user are drifting together.
Behavioral/psychological assertion in the paper referencing human interaction patterns with fluent outputs; no experimental data or sample size reported in this paper excerpt.
LLMs produce fluent outputs even when their internal reasoning has drifted; a confident answer can conceal uncertainty, speculation, or inconsistency, and small changes in phrasing can lead to different conclusions.
Conceptual/observational claim presented in the paper; no original empirical test or sample size reported here.
Stronger reasoning capabilities do not prevent LLMs from defecting in single-shot social dilemmas (i.e., models defect with or without reasoning enabled).
Authors' experiments that explicitly compared model behavior with reasoning enabled vs disabled in single-shot social dilemmas; details not provided in the excerpt.
Repetition-induced cooperation deteriorates drastically when co-players vary.
Authors' experimental observation comparing repeated-game cooperation under fixed vs varying co-players in their study; no quantitative metrics or sample sizes provided in the excerpt.
Our experiments show that recent models — with or without reasoning enabled — consistently defect in single-shot social dilemmas.
Authors' experimental results comparing recent LLMs in single-shot social dilemma games, with reasoning enabled vs disabled; specific models, number of games, and sample sizes are not provided in the excerpt.
Recent works report that LLMs with stronger reasoning capabilities behave less cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings.
Statement referencing prior literature (recent works) summarized in the paper's introduction/background; no specific dataset or sample size given in the excerpt.
Most Sub-Saharan African states still lack the institutional frameworks needed to turn these innovations into sustainable development.
Comparative policy analysis stated in the paper; no quantitative sample size or formal survey data reported in the excerpt.
Existing competition-aware CFL and incentive-design approaches reward organizations based on marginal training contributions but fail to account for the costs of strengthening competitors.
Literature critique and comparison in the paper; theoretical discussion rather than a reported empirical trial or sample.
Non-IID data amplifies this coopetition dilemma by producing asymmetric learning gains across organizations and undermining sustained participation.
Conceptual claim supported by the paper's theoretical modeling and later experiments (described as 'non-IID data' experiments); no numeric sample size given in abstract.
Cross-silo federated learning (CFL) deployments in data-sensitive domains are inherently coopetitive: organizations cooperate during model training while competing in downstream markets, so training contributions can inadvertently strengthen rivals.
Conceptual argument and literature motivation presented in the paper's introduction; no empirical sample size reported.
This mismatch makes it difficult to predict post-deployment success and obscures competitive effects such as early-adoption advantages and market dominance.
Argument in paper linking limitations of current evaluation methods to inability to predict deployment outcomes; conceptual claim without empirical demonstration in the abstract.
Evaluation is still largely conducted on static benchmarks with accuracy-focused measures that assume systems operate in isolation.
Statement in paper critiquing prevailing evaluation practice; presented as a general observation without cited systematic review or quantitative evidence in the abstract.
Infrastructure constraints, particularly in developing countries, limit AI adoption in auditing.
Thematic analysis of reviewed articles noting infrastructure limitations (e.g., ICT infrastructure) in developing-country contexts.
Limitations in auditor competencies (skills and training) hinder effective AI adoption in auditing.
Thematic findings across the sample of articles report auditor competency gaps as a challenge to AI implementation.
Ethical and data privacy concerns are persistent challenges to AI implementation in auditing.
Recurring theme in the reviewed literature identified via thematic analysis; papers cite ethics and privacy as obstacles.
Several challenges persist for AI adoption in auditing, including high technology investment costs.
Thematic analysis of barriers reported across the 15 articles highlighting cost as a recurrent challenge.
Early iterations suffered severe execution decay.
Reported observation from the longitudinal study describing early-phase performance problems (qualitative; no quantitative metric in the excerpt).
Execution-based environments suffer from adversarial 'Test Evasion' by unconstrained agents.
Stated assertion in the paper's motivation/abstract; presented as a limitation of execution-based evaluation (no empirical sample size or experiment details provided in the excerpt).
Current paradigms, such as Reinforcement Learning from Human Feedback (RLHF) and AI Feedback (RLAIF), frequently induce model sycophancy.
Stated assertion in the paper's motivation/abstract; presented as a limitation of existing alignment paradigms (no empirical sample size or experiment details provided in the excerpt).
Agent frameworks infer authority conversationally, reconstruct accountability from logs, and produce silent errors: incorrect determinations that execute without any human review signal.
Statement/argument in paper describing failure modes of general-purpose agent frameworks; no empirical sample or experiment reported for this claim in the excerpt.
Findings suggest that previous results relying on attitudinal outcomes may generalize poorly to behaviour, and therefore risk substantially mischaracterizing the real-world behavioural impact of AI persuasion.
Interpretation/conclusion based on the paper's empirical results: discrepancy between attitudinal effects and behavioural effects observed in the preregistered experiments.
A foreign state actor threat model for enterprise identity governance establishing that Silk Typhoon, Salt Typhoon, Volt Typhoon, and North Korean AI-enhanced identity fraud operations have already operationalized AI identity vulnerabilities as active attack vectors.
Paper claims to provide a threat model and asserts these named actors have operationalized AI identity vulnerabilities; stated grounding implied to be threat intelligence and incident analysis, though not detailed in the excerpt.
Nation-state actors including Silk Typhoon and Salt Typhoon have operationalized ungoverned machine credentials as primary espionage vectors against critical infrastructure.
Asserted in paper and described as grounded in threat intelligence; no specific threats, incidents, or data described in the excerpt.
A single ungoverned automated agent produced $5.4-10 billion in losses in the 2024 CrowdStrike outage.
Statement in paper attributing a $5.4-10B loss to an ungoverned automated agent during the 2024 CrowdStrike outage; no citation or method shown in excerpt.
No integrated framework exists to govern machine identities (AI agents, service accounts, API tokens, automated workflows).
Asserted in paper as a gap in existing governance frameworks; no empirical test or survey reported in the excerpt.
Automated agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1.
Statement in paper (asserted prevalence); no sample size or data source provided in the excerpt.
Major methodological risks include overfitting, regime instability, interpretability deficits, and institutional dependence.
Critical evaluation within the review identifying key methodological risks across the surveyed streams (conceptual assessment; no empirical estimate provided).
The literature remains fragmented across at least three partially connected domains: financial time-series forecasting, portfolio construction, and firm-level sustainability analysis.
Author's characterization of the existing literature in the review (synthesis of published work; no single empirical sample; survey-based statement).
Persistent challenges to AI implementation include resistance to change, data quality limitations, and concerns regarding transparency and algorithmic bias.
Recurring barriers identified across the 27 included studies, summarized in the review's findings.
AI infrastructure owners may command more wealth and capability than most governments, threatening the future viability or authority of the nation-state.
Futuristic projection based on the paper's modeling and synthesis of wealth/capability concentration under AI; no empirical measures or comparative data versus governments provided in the excerpt.
Universal Basic Income (UBI), evaluated through incentive-structure lens, will default to a pacification mechanism rather than a genuine solution in the absence of a revolutionary threat that historically forced redistribution.
Normative and theoretical analysis of incentive structures and historical mechanisms of redistribution; the excerpt presents this as an argument rather than reporting empirical trials or quantified outcomes.
Unlike previous feudal orders, this one may prove uniquely resistant to revolution because the mechanisms of enforcement (autonomous weapons, AI surveillance, algorithmic propaganda) do not require human cooperation and therefore cannot be undermined by human dissent.
Logical and theoretical claim based on characteristics of AI-enabled enforcement technologies; presented as an argument rather than an empirically tested finding in the excerpt.
Under this emerging order, the vast majority of humanity will lose their political leverage.
Theoretical and historical argument linking concentration of infrastructure control to political disempowerment; no empirical metrics or sample size provided in the excerpt.
Under this emerging order, the vast majority of humanity will lose their labor value.
Claim made via theoretical argument about automation and AI replacing labor value; no quantitative empirical evidence or sample detailed in the excerpt.
This structural transformation could stabilize into a neo-feudal equilibrium in which a vanishingly small class of infrastructure owners wields power comparable to pre-Enlightenment monarchs.
Futuristic projection and normative/historical analogy based on conceptual modeling of class structure under AGI; the excerpt gives no empirical data or formal model outputs.
The convergence of geopolitical fragmentation (democratic decline) and AI-driven economic concentration is producing a structural transformation unprecedented in human history.
Theoretical synthesis and historical comparison; the paper presents this as an argument based on conceptual modeling and historical analogy; no specific empirical test or sample noted in the excerpt.