Evidence (8570 claims)
Adoption
8570 claims
Productivity
7631 claims
Governance
6869 claims
Human-AI Collaboration
6491 claims
Org Design
4175 claims
Innovation
4114 claims
Labor Markets
3566 claims
Skills & Training
2966 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Adoption
Standard health system digital transformation policy, which typically addresses only the threshold failure through individual incentives, is predicted to systematically produce the partial adoption trap.
Model prediction contrasting full policy architecture vs. conventional policies that focus solely on individual incentives; analytical conclusion that such limited policies leave other failure modes unaddressed and therefore lead to stable partial adoption. Theoretical model; no empirical sample.
The barrier-lowering benefit of failed attempts is offset when trust erosion is rapid.
Model analysis combining cost-ratchet dynamics and trust erosion parameters; results showing interaction where fast trust erosion negates barrier reductions. Theoretical simulations/derivations; no empirical sample.
These failure modes are most severe precisely for the technologies with the greatest systemic value: the Value-Adoption Paradox.
Analytical result from the model showing failure-mode severity as a function of systemic value; theoretical identification of a paradox where higher systemic-value technologies face stronger coordination/trust/cultural barriers. Theoretical derivation; no empirical sample.
The basin of attraction of the partial adoption trap is enlarged by a cultural failure arising from negative coordination norms among doctors.
Model analysis including cultural coordination norms; theoretical demonstration that negative norms exacerbate partial adoption equilibria. Theoretical model; no empirical sample.
The basin of attraction of the partial adoption trap is enlarged by a trust failure arising from the organisation's inability to credibly commit to sharing productivity gains.
Model extension incorporating organisational commitment/transfer of gains; analytical results showing trust/commitment constraints increase stability of partial adoption. Theoretical model; no empirical sample.
The basin of attraction of the partial adoption trap is enlarged by a threshold coordination failure arising from the non-appropriable nature of systemic benefits.
Model analysis showing how non-appropriable systemic benefits (externalities) change payoff structure and enlarge the basin of attraction for partial adoption. Theoretical derivation; no empirical sample.
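The "basin of attraction" language in the claims above can be made concrete with a generic threshold-adoption model. This is a minimal sketch and not the paper's model: the logistic threshold distribution, the parameters `mean_threshold` and `steepness`, and all function names are assumptions, and the low-adoption equilibrium here only stands in for the partial adoption trap. Each period, the adopting share becomes the fraction of agents whose personal threshold is below the current share; the boundary between the two basins is the starting share below which adoption collapses.

```python
import math


def next_share(x, mean_threshold, steepness=12.0):
    """Share adopting next period, given the current share x (logistic thresholds)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - mean_threshold)))


def long_run_share(x0, mean_threshold, steps=500):
    """Iterate the adoption dynamics from the starting share x0."""
    x = x0
    for _ in range(steps):
        x = next_share(x, mean_threshold)
    return x


def trap_boundary(mean_threshold, tol=1e-6):
    """Bisect for the starting share that separates the low and high basins.

    Valid because the update map is increasing, so the long-run outcome
    is monotone in the starting share.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if long_run_share(mid, mean_threshold) > 0.5:
            hi = mid  # mid already escapes to the high-adoption equilibrium
        else:
            lo = mid  # mid still falls back to the low-adoption equilibrium
    return hi


# Raising the mean threshold (a stand-in for an added trust or cultural
# failure) pushes the boundary up, i.e. enlarges the low-adoption basin.
for m in (0.6, 0.7):
    print(f"mean threshold {m}: basin boundary at {trap_boundary(m):.3f}")
```

In this toy setting, any friction that raises agents' thresholds widens the range of starting points from which adoption falls back, which is the qualitative pattern the three "enlarged basin" claims describe.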
Current monolithic architectures struggle to enforce rigid brand constraints, frequently hallucinating unapproved visual assets.
Asserted critique of existing architectures in paper; no specific empirical metrics, datasets, or sample sizes provided.
Integration of generative video models into enterprise environments is restricted by temporal inconsistencies and severe brand misalignment.
Statement in paper describing deployment limitations; no empirical study, dataset, or sample size provided to quantify these restrictions.
Deterministic copy collapses uncertainty (i.e., copying deterministically collapses the learner's uncertainty over actions).
Ablation/diagnostic comparisons reported in the paper showing deterministic-copy policies reduce or collapse uncertainty compared to stochastic or trace-informed policies in the benchmark tasks.
Reward-only PPO variants miss trace alignment (they achieve reward/KPIs but do not align with benchmark trace/behavior).
Empirical comparison across the two-hotel benchmark and a compact hidden-budget bidding task showing reward-only PPO variants fail to match trace-based diagnostics.
Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance.
Empirical/modeling results from the paper's framework (simulation results using projection models + Azure operational data); the abstract claims material effects but does not report numeric sample sizes or effect sizes in the excerpt provided.
Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix.
Analytic/methodological claim enumerating interacting factors; stated as a complexity motivating the modeling framework.
Power utilization is particularly important as grid power capacity is a scarce resource in the AI era.
Contextual claim in the paper linking increased AI demand to constrained grid power capacity; supported by the paper's framing rather than reported empirical measurements in the abstract.
As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned.
Conceptual/mechanistic claim supported by the paper's modeling framework that examines mismatches between provisioned power and deployed demand; no numeric sample size provided in the abstract.
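The stranding mechanism can be shown with simple arithmetic. The sketch below is an illustrative assumption, not the paper's framework: it models a single power domain with a fixed power budget and a fixed number of rack slots, and the function name, parameters, and numbers are all hypothetical. Racks are added until either slots or power run out, and whatever provisioned power remains is stranded.

```python
def stranded_power_kw(provisioned_kw, rack_slots, rack_kw):
    """Provisioned power left unusable when identical racks fill one power domain."""
    racks_by_power = int(provisioned_kw // rack_kw)  # racks the power budget allows
    racks = min(rack_slots, racks_by_power)          # physical slots may bind first
    return provisioned_kw - racks * rack_kw


# Hypothetical domain designed for 20 racks at 10 kW each (200 kW provisioned).
for rack_kw in (10, 35, 7):
    stranded = stranded_power_kw(200, 20, rack_kw)
    print(f"{rack_kw} kW racks strand {stranded} kW")
```

With these assumed numbers, 10 kW racks strand nothing, 35 kW racks strand 25 kW because only five fit under the power budget, and 7 kW racks strand 60 kW because the slots run out first. Stranding therefore appears on both sides of the design density.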
This poses a major challenge for datacenter power delivery designers.
Argument based on the projected rise in rack power density and resulting engineering constraints; asserted in the paper's introduction/contextual framing rather than an experimental result.
Artificial intelligence creates both opportunities and threats for developing economies: AI may erode their labor-cost advantage.
Conceptual assessment; mechanism-based argument (automation reduces the labor-cost advantage); no empirical data or sample specified.
This transformation directly calls into question existing global value chain structures and countries' positions within those chains.
Conceptual discussion; the author's analytical framework argues that GVC (global value chain) structures can be re-evaluated in light of AI; no empirical sample.
Monte Carlo simulations illustrate that standard DID estimators that ignore spillovers can miss the total effect.
Monte Carlo simulation results reported in the paper comparing standard DID estimators (which ignore spillovers) to the proposed approach; simulations show standard DID can fail to capture the total effect under spillovers.
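A small Monte Carlo sketch shows why this happens. It is not the paper's simulation design: the data-generating process, the effect sizes `DIRECT` and `SPILL`, the equal group split, and the definition of the total effect as the average effect over all units are assumptions made for illustration. Because untreated units also receive a spillover, the control group no longer measures the no-policy counterfactual, and the standard DID contrast recovers roughly the direct effect minus the spillover.

```python
import random
import statistics

DIRECT = 2.0  # assumed direct effect on treated units
SPILL = 1.0   # assumed spillover onto untreated units
TREND = 0.5   # common time trend shared by both groups


def simulate_did(n_units=2000, seed=0):
    """Return the standard DID estimate for one simulated two-period panel."""
    rng = random.Random(seed)
    treated_changes, control_changes = [], []
    for i in range(n_units):
        treated = i % 2 == 0
        effect = DIRECT if treated else SPILL
        # Post-minus-pre change; unit fixed effects cancel in the difference.
        change = TREND + effect + rng.gauss(0.0, 1.0)
        (treated_changes if treated else control_changes).append(change)
    return statistics.mean(treated_changes) - statistics.mean(control_changes)


estimates = [simulate_did(seed=s) for s in range(200)]
did_mean = statistics.mean(estimates)
total_effect = 0.5 * DIRECT + 0.5 * SPILL  # average effect over all units

print(f"mean standard DID estimate: {did_mean:.2f}")
print(f"direct effect: {DIRECT}, average total effect: {total_effect}")
```

Under these assumptions the DID estimate centers on `DIRECT - SPILL`, which is below both the direct effect and the average total effect. With a negative spillover the bias would run the other way.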
No existing AI system replicates this: conversational recommenders treat recommendation as a terminal act, while general-purpose LLMs hallucinate product claims and default to generic promotional templates that fail to engage or persuade.
Author assertion/diagnosis comparing existing conversational recommenders and general-purpose LLMs; no empirical comparisons or quantified evaluation provided in the excerpt.
The paper examines recent data on AI diffusion in the G7 economies, which point to large and persistent differences between SMEs and large firms.
Empirical examination of recent diffusion/adoption data across G7 economies as described in the paper; no sample size or specific datasets provided in the excerpt.
Despite recent technological advances in AI tools, SMEs are more hesitant to adopt AI than they are to adopt other digital technologies, and more hesitant than larger firms.
Statement referencing 'recent data on AI diffusion in the G7 economies' showing differences between SMEs and large firms; implies empirical analysis of diffusion/adoption data (no sample size given in excerpt).
The analysis also identifies risks linked to exclusion, symbolic compliance, and concentration of control over compliance processes.
Theoretical risk mapping produced by the integrative review and interpretive synthesis; no primary empirical evidence presented.
Uncertainty around compliance and excessive risk avoidance reduce the space for lawful business activity.
Interpretive synthesis of evidence and arguments across the reviewed literatures (sanctions compliance, institutional voids); no original empirical test.
Firms working under such conditions often experience limited access to finance and markets.
Claim derived from literature on firm constraints in weak institutional/sanctioned contexts as reviewed in the paper; no primary empirical data reported.
Post-conflict and sanctions-affected environments are strongly affected by sanctions pressure, weak rule enforcement, and high levels of corruption risk.
Synthesis of literature on sanctions, weak institutions, and corruption risk presented in the integrative review; no new empirical sample reported.
Currently, systematic assessment errors cause owners of lower-valued properties to face disproportionately high tax burdens, creating regressivity in the property tax system.
Empirical analysis of property assessments and tax burdens using 26 million property sales across ~95% of U.S. counties, showing systematic errors that bias tax burdens toward lower-valued properties.
There are limits to technology‑led growth strategies in labor‑abundant contexts; such strategies do not reliably deliver inclusive employment gains.
Argument based on synthesis of theory and comparative field evidence demonstrating weak employment outcomes from technology‑led growth in labor‑abundant settings (no quantitative effect sizes reported).
Digital media play a significant role in shaping youth mobilization and political unrest in migrants' countries of origin.
Empirical observations and regional field evidence reported in the paper linking digital media use to youth mobilization and political outcomes (qualitative/comparative evidence; no numeric sample size provided).
Developing countries face macroeconomic vulnerabilities because of dependence on remittances, which are exposed by automation-driven changes in migrant labor demand.
Analytical linkage developed in the paper supported by comparative field evidence and macroeconomic reasoning; remittance dependence highlighted as a vulnerability (no quantitative estimates or sample sizes reported).
Technology adoption in core industries in advanced economies is linked with labor displacement, rising youth unemployment, and urban labor saturation in South Asia and North Africa.
Geographically grounded framework combined with comparative regional field evidence focused on South Asia and North Africa (qualitative/comparative field data referenced; no numeric sample sizes provided).
AI adoption and accelerating automation amplify employment precarity in labor‑surplus economies.
Conceptual synthesis grounded in economic geography and labor economics, supported by comparative field evidence cited for labor‑surplus contexts (no quantitative sample size reported).
Automation functions as a transnational shock that contracts demand for migrant labor in advanced economies.
Theoretical argument drawing on economic geography, labor economics, and development studies; comparative/regional field evidence referenced in the paper (no numerical sample size reported).
Shifts persist in even the newest AI models despite remarkable progress in AI modeling, post-training alignment and safeguards.
Asserted in paper; supported by later empirical validation across multiple models and production chatbots (see other claims), but no explicit sample size in this sentence.
ChatGPT-like AI behavior can shift, unnoticed, from desirable to undesirable (e.g., encouraging self-harm, extremist acts, financial losses, or costly medical and military mistakes), and no one can yet predict when.
Statement in paper framing the problem; qualitative observations and motivating examples (no numeric sample size provided in the excerpt).
An analysis of a 21-instrument inventory identifies an incentive gradient where geopolitical and industrial pressures systematically reward surface-level behavioral proxies over deep structural verification.
Empirical/qualitative analysis of an inventory of 21 governance instruments compiled and analysed in the paper (n=21 instruments).
Behavioural assurance, even when carefully designed, is being asked to carry safety claims it cannot verify.
The paper's normative and conceptual argument synthesising governance requirements and the epistemic limits of behavioural testing.
Current assurance methodologies (primarily behavioural evaluations and red-teaming) are epistemically limited to observable model outputs and cannot verify latent representations or long-horizon agentic behaviours.
Conceptual/analytic argument and review of existing assurance methodologies presented in the paper.
Policy responses in Europe are fragmented across the EU and Member State levels and do not match the potential scale of disruption from AGI.
Paper's policy analysis of EU- and Member-State-level responses (stated in abstract); no quantitative metrics provided in the abstract.
Europe has low rates of industrial AI adoption.
Paper's empirical/policy review claiming low industrial AI adoption in Europe (as stated in abstract); the abstract does not provide numeric adoption rates or sample sizes.
Europe exhibits structural weaknesses in compute infrastructure and talent retention.
Paper's structural assessment of Europe's AI value-chain capabilities (stated in abstract); no numerical measures provided in the abstract.
Europe has limited strategic awareness of frontier AI progress.
Paper's assessment of Europe's positioning based on policy analysis and review of capabilities monitoring (as stated in abstract); no supporting metrics or sample sizes provided in the abstract.
AGI could strain existing governance frameworks.
Paper's policy analysis describing potential mismatches between governance capacity and AGI-induced disruptions (as stated in abstract); no empirical tests or quantification reported in the abstract.
AGI could intensify interstate competition.
Paper's geopolitical analysis and scenario-based reasoning informed by trends in AI capabilities (stated in abstract); no quantitative measures reported in the abstract.
AGI could fundamentally alter the global distribution of economic and military power.
Paper's geopolitical analysis drawing on capability trends and scenario reasoning (as stated in abstract); no empirical quantification provided in the abstract.
Simulated users produce feedback dynamics that diverge from humans.
Temporal/interaction analysis in the replication showing differences in how simulators provide feedback across multi-turn interactions compared to humans.
Simulated users exhibit amplified position biases relative to human participants.
Behavioral comparison in the simulator replication showing stronger position biases in simulated responses than in human responses.
Simulated users discuss different topics compared to the human participants.
Analysis of conversation content in the simulator replication showing differences in topical distribution between simulators and humans.
Simulators perform far below human self-consistency baselines for individual judgements.
Comparison in the replication study between simulator consistency and human self-consistency on individual-level judgments; reported large performance gap (simulators far below humans).
Amplified sycophancy and relationship-seeking behaviours may introduce deleterious long-term consequences.
Authors' interpretation and cautionary note based on observed behavioral amplification after fine-tuning; presented as potential long-term risk rather than an empirically measured long-term outcome.
In a controlled experiment across six industry configurations (72 tool invocations using Qwen3-32B), unconstrained tool parameters produced a 43% hallucination rate for domain identifiers.
Controlled experiment reported in the paper: six industry configurations, 72 tool invocations, model used: Qwen3-32B; reported unconstrained parameter condition resulted in 43% hallucination rate for domain identifiers.
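One common way to constrain a tool parameter is to restrict it to a closed set of known identifiers and reject anything else before the tool runs. The sketch below is an assumption about what a constrained condition could look like, not the paper's implementation: the identifier set `ALLOWED_DOMAIN_IDS`, the `domain_id` argument name, and both functions are hypothetical.

```python
ALLOWED_DOMAIN_IDS = frozenset({"retail", "logistics", "healthcare"})  # hypothetical


def validate_tool_call(arguments):
    """Return the arguments unchanged if `domain_id` is known, else raise."""
    domain_id = arguments.get("domain_id")
    if domain_id not in ALLOWED_DOMAIN_IDS:
        raise ValueError(f"unknown domain_id: {domain_id!r}")
    return arguments


def hallucination_rate(calls):
    """Share of calls whose `domain_id` is not in the allowed set."""
    bad = sum(1 for c in calls if c.get("domain_id") not in ALLOWED_DOMAIN_IDS)
    return bad / len(calls)


# Two hypothetical model-generated calls; the second misspells an identifier.
calls = [{"domain_id": "retail"}, {"domain_id": "retial"}]
print(f"hallucination rate: {hallucination_rate(calls):.0%}")
```

A rate computed this way over logged invocations is the kind of figure the claim reports. The validation step turns a silent hallucinated identifier into an explicit error that the caller can retry or surface.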