Evidence (4114 claims)
Adoption (8570 claims)
Productivity (7631 claims)
Governance (6869 claims)
Human-AI Collaboration (6491 claims)
Org Design (4175 claims)
Innovation (4114 claims)
Labor Markets (3566 claims)
Skills & Training (2966 claims)
Inequality (2066 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Innovation
Firms with a learning culture strongly driven by AI reported higher innovation performance, both directly and indirectly through two mediating factors (knowledge orchestration and organisational intelligence).
Cross-sectional quantitative survey (N=348) using established scales for AI-driven learning culture (AIDLC), knowledge orchestration (KO), organisational intelligence (OI) and innovation performance (IP); statistical analysis testing direct and serial mediation relationships.
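For readers less familiar with serial mediation, the path decomposition can be written out. This sketch assumes linear path models with the serial chain AIDLC → KO → OI → IP (the structure commonly estimated as Hayes's PROCESS Model 6). The path labels $a_1, a_2, d_{21}, b_1, b_2, c'$ are illustrative notation, not the paper's. The serial indirect effect is the product $a_1 d_{21} b_2$.

```latex
\begin{aligned}
\mathrm{KO} &= i_1 + a_1\,\mathrm{AIDLC} + e_1\\
\mathrm{OI} &= i_2 + a_2\,\mathrm{AIDLC} + d_{21}\,\mathrm{KO} + e_2\\
\mathrm{IP} &= i_3 + c'\,\mathrm{AIDLC} + b_1\,\mathrm{KO} + b_2\,\mathrm{OI} + e_3\\
c &= \underbrace{c'}_{\text{direct}} + \underbrace{a_1 b_1}_{\text{via KO}} + \underbrace{a_2 b_2}_{\text{via OI}} + \underbrace{a_1 d_{21} b_2}_{\text{via KO}\,\to\,\text{OI}}
\end{aligned}
```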
Scholarly and empirical research should prioritize multilevel analysis, algorithmic governance, and ethical considerations to study the AI-infused strategic landscape.
Paper's concluding research agenda based on gaps identified in the conceptual analysis; prescriptive recommendation rather than empirical finding.
Although evaluated in the ads stack, this is a general framework that can be applied broadly to any large-scale recommendation or retrieval system facing similar scaling and predictability challenges.
Author statement about generalizability and applicability beyond ads; no cross-domain experiments reported in the excerpt to substantiate broad applicability.
We tested this LLM ads retrieval framework in a large-scale industrial ads recommendation system, demonstrating significant improvements across offline and online A/B experiments, showcasing gains in both predictability and traditional performance metrics.
Reported large-scale industrial deployment and both offline and online A/B experiments; authors state 'significant improvements' but no numeric effect sizes, p-values, or sample sizes are provided in the excerpt.
The approach extracts hierarchical semantic attributes from ad creatives to obtain LLM representations, which serve as the foundation for graph-based expansion to retrieve semantic variants of an ad.
Method description in paper: hierarchical semantic attribute extraction, LLM representations, graph-based expansion; presented as the core technical approach (no detailed quantitative validation in excerpt).
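The excerpt does not say how the graph expansion works. What follows is a hypothetical sketch only. It assumes ads are linked by similarity edges computed from their LLM representations, and it collects semantic variants of a seed ad with a bounded breadth-first expansion. `expand_semantic_variants`, `min_sim` and `max_hops` are illustrative names and are not taken from the paper.

```python
from collections import deque


def expand_semantic_variants(
    seed: str,
    neighbours: dict[str, list[tuple[str, float]]],
    min_sim: float = 0.8,
    max_hops: int = 2,
) -> dict[str, int]:
    """Collect ads reachable from `seed` through edges with similarity >= min_sim.

    neighbours maps an ad id to (neighbour_id, similarity) pairs, e.g. built by
    nearest-neighbour search over LLM representations (an assumption here).
    Returns {ad_id: hop_distance} for every variant found, excluding the seed.
    """
    found = {seed: 0}
    queue = deque([seed])
    while queue:
        ad = queue.popleft()
        if found[ad] == max_hops:
            continue  # do not expand past the hop limit
        for nbr, sim in neighbours.get(ad, []):
            if sim >= min_sim and nbr not in found:
                found[nbr] = found[ad] + 1  # BFS order gives the shortest hop count
                queue.append(nbr)
    del found[seed]
    return found
```

Bounding both the similarity threshold and the hop count keeps the expansion from drifting semantically far from the seed. The two parameters trade recall of variants against precision.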
We present an online validated semantic candidate generation framework powered by fine-tuned Large Language Models (LLMs) that showed significant improvement along these metrics by fundamentally improving the semantic-awareness of the system.
Claim backed by reported online validation and use of fine-tuned LLMs; paper states results come from online validation in a large-scale industrial ads recommendation system and offline/online A/B experiments (no numeric details provided in excerpt).
We introduce a new evaluation framework for quantifying stability and predictability of an ads recommender system.
Paper presents a methodological contribution (new evaluation framework) described in the text; no numerical validation details provided in the excerpt.
These effects (the negative association between AIO and carbon emission intensity) are linked to improvements in green innovation quality.
Authors report that the observed negative associations between AIO and carbon emission intensity are connected to measures of green innovation quality (suggesting a mediating mechanism) in their empirical analyses.
Policy recommendation: Tax deduction rates applied to firms operating in artificial intelligence technologies could be increased.
Policy implication of the research findings (the positive relationship between R&D tax incentives and the number of AI patents); a recommendation, not a direct empirical test.
Policy recommendation: The government could provide performance-based and project-based support to increase the efficiency of R&D spending.
Applied policy measure proposed by the authors on the basis of the study's findings; a recommendation that has not been empirically tested.
Policy recommendation: States that value technological progress and innovation should encourage private-sector R&D investment through instruments such as subsidies and low-interest loans.
Policy recommendation based on the study's regression findings; a practical recommendation derived from the study's results, not a direct empirical test.
The findings above indicate that private-sector R&D expenditure and R&D tax incentives are being used efficiently.
Interpretive inference drawn from the study's regression results on the positive relationships (G8 + Türkiye, 2010–2020, random-effects regression).
As R&D tax incentives increase, the number of artificial intelligence patents increases (positive relationship).
Same panel dataset and random-effects regression (G8 + Türkiye, 2010–2020); the coefficient of the tax-incentive variable on the number of AI patents was found to be positive.
There is a positive relationship between private-sector R&D expenditure and the number of artificial intelligence (AI) patents.
Panel data analysis: G8 countries + Türkiye, years 2010–2020; random-effects regression model; country-year data (9 countries × 11 years = 99 observations). The private-sector R&D expenditure variable is reported to show a statistically significant positive relationship with AI patent counts.
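The observation count follows directly from the panel dimensions. The sketch below also shows the standard quasi-demeaning weight used by feasible-GLS random-effects estimators, for readers unfamiliar with them. It assumes the variance components have already been estimated. The study's own software and variance estimates are not reported, so the variance values below are illustrative only.

```python
import math

# Panel dimensions reported for the study: G8 countries plus Türkiye, 2010-2020 inclusive.
N_COUNTRIES = 9
YEARS = 2020 - 2010 + 1      # inclusive year range -> 11 years
N_OBS = N_COUNTRIES * YEARS  # balanced panel -> 99 country-year observations


def re_theta(sigma_e2: float, sigma_u2: float, t_periods: int) -> float:
    """Quasi-demeaning weight of the random-effects (GLS) transformation.

    sigma_e2: idiosyncratic error variance; sigma_u2: country-effect variance.
    theta = 0 reduces to pooled OLS; theta -> 1 approaches the fixed-effects
    (within) transformation.
    """
    return 1.0 - math.sqrt(sigma_e2 / (sigma_e2 + t_periods * sigma_u2))


def quasi_demean(values: list[float], theta: float) -> list[float]:
    """Subtract theta times the unit's time mean from each of its observations."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


# Illustrative variance components (not taken from the study).
theta = re_theta(sigma_e2=1.0, sigma_u2=1.0, t_periods=YEARS)
print(N_OBS, round(theta, 4))
```

Random-effects regression is equivalent to running OLS on the quasi-demeaned dependent and explanatory variables. This is why it sits between pooled OLS and the fixed-effects estimator.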
AI is a knowledge-intensive field that is particularly shaped by the flow of knowledge from scientific research to technological development.
Framing/background claim in the introduction describing the nature of AI and its dependence on science-to-technology knowledge flow.
The analysis covers AI-related patents filed from 2002 to 2021.
Paper states the temporal scope of the patent dataset analyzed (2002–2021).
Abstracts from patents and their cited scientific publications were extracted and BERTopic modelling was applied; topic labels were generated using generative AI.
Method description: data extraction of patent abstracts and cited scientific publication abstracts, application of BERTopic for topic modeling, and use of generative AI to create topic labels.
AI patents are classified into four categories using centrality measures derived from a CPC co-occurrence network.
Method section describing construction of a CPC (Cooperative Patent Classification) co-occurrence network and use of centrality measures to partition patents into four categories.
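The excerpt does not give the specific centrality measures or the rule that maps them to four categories. This minimal sketch therefore covers only the first step: building a CPC co-occurrence network from patents' classification codes and computing one common measure, normalized degree centrality. The subclass-level codes in the usage check are illustrative inputs, not data from the paper.

```python
from collections import defaultdict
from itertools import combinations


def cpc_cooccurrence(patents: list[list[str]]) -> dict[tuple[str, str], int]:
    """Weighted co-occurrence edges: (code_a, code_b) -> number of patents listing both."""
    weights: dict[tuple[str, str], int] = defaultdict(int)
    for codes in patents:
        # Each unordered pair of distinct codes on one patent adds one co-occurrence.
        for a, b in combinations(sorted(set(codes)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree_centrality(edges: dict[tuple[str, str], int]) -> dict[str, float]:
    """Normalized degree centrality: distinct neighbours / (number of nodes - 1).

    Codes that never co-occur with another code do not appear in the network.
    """
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {code: len(nbrs) / (n - 1) for code, nbrs in neighbours.items()}
```

The edge weights are kept so that weighted measures (for example strength or weighted betweenness) could replace degree centrality. Which measures the paper uses is not stated in the excerpt.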
This study proposes a semantic science-technology exploration framework specifically designed for the AI domain, consisting of two stages: technology classification and semantic topic exploration.
Paper description of the proposed framework and its two-stage design (methodological contribution).
Software products and software R&D contributed 50 percent of the 1.2 percentage point acceleration in nonfarm business labor productivity (2017–2024 relative to 2012–2017).
Empirical decomposition comparing productivity growth rates across periods (2017–2024 vs 2012–2017) in the paper; the authors attribute half of the observed 1.2 percentage point acceleration to software products and software R&D.
Software products and software R&D contributed 50 percent of the 2 percent average growth rate in nonfarm business labor productivity from 2017 to 2024.
Empirical decomposition of nonfarm business labor productivity growth in the United States for the period 2017–2024 reported in the paper (the authors attribute shares of the observed 2% average growth to components including software products and software R&D).
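Here is a quick arithmetic check on the two decomposition claims. It assumes both use the same productivity series, that "2 percent" is the 2017–2024 average, and that the "1.2 percentage point acceleration" is that average minus the 2012–2017 average.

```python
# Figures stated in the two claims above (assumed to share one measurement basis).
growth_2017_2024 = 2.0  # average annual labor productivity growth, percent
acceleration_pp = 1.2   # 2017-2024 growth minus 2012-2017 growth, percentage points
software_share = 0.5    # share attributed to software products and software R&D

# Implied earlier-period growth: 2.0 - 1.2 = 0.8 percent.
growth_2012_2017 = growth_2017_2024 - acceleration_pp
# Implied software contribution to 2017-2024 growth: 0.5 * 2.0 = 1.0 point.
software_2017_2024 = software_share * growth_2017_2024
# Implied software contribution to the acceleration: 0.5 * 1.2 = 0.6 point.
software_accel = software_share * acceleration_pp
# Implied software contribution in 2012-2017: 1.0 - 0.6 = 0.4 point.
software_2012_2017 = software_2017_2024 - software_accel
# Implied software share of 2012-2017 growth: 0.4 / 0.8 = 0.5.
software_share_2012_2017 = software_2012_2017 / growth_2012_2017
```

Taken together, the two 50 percent figures imply that software also accounted for roughly half of productivity growth in 2012–2017 (about 0.4 of 0.8 points).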
AI is already materially affecting official productivity measures in the United States.
Empirical decomposition of U.S. productivity data reported in the paper that attributes portions of measured productivity growth to software-related channels linked to AI.
A framework that separates upstream innovation from downstream production suggests that AI boosts both upstream total factor productivity (TFP) and the use of intangible capital downstream.
Model/framework decomposition in the paper (theoretical separation of upstream vs downstream, combined with empirical application to productivity data); the paper reports results consistent with increases in upstream TFP and downstream intangible capital use.
The authors open-source optimize_anything with support for multiple backends as part of the GEPA project at https://github.com/gepa-ai/gepa.
Explicit statement and provided GitHub URL in the paper excerpt.
Given an equivalent per-problem budget, multi-task search outperforms independent optimization through cross-task transfer, and the benefit grows with the number of related tasks.
Reported experiments comparing multi-task search with independent per-problem optimization under an equal per-problem budget; the authors observe cross-task transfer benefits that increase with the number of related tasks.
Ablations across three domains reveal that actionable side information yields substantially higher final scores than score-only feedback.
Ablation studies across three domains (the same studies behind the faster-convergence claim); reported higher final optimization scores with actionable side information than with score-only feedback.
Ablations across three domains reveal that actionable side information yields faster convergence than score-only feedback.
Paper reports ablation studies in three domains comparing optimization with actionable side information versus score-only feedback and finds faster convergence with side information.
The system outperforms AlphaEvolve's reported circle packing solution (n=26).
Direct comparison with AlphaEvolve's reported circle packing solution. In this benchmark, n=26 most likely denotes the problem instance (packing 26 circles), not a sample of 26 instances or trials.
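This sketch shows what such an evaluator might look like and how it can return more than a bare score. It assumes the common formulation of this benchmark: place n circles (here n = 26) inside the unit square without overlap and maximize the sum of their radii. The list of violations is one concrete form of the "actionable side information" contrasted with score-only feedback in the ablations above. `Circle` and `evaluate_packing` are hypothetical names and are not part of the optimize_anything API.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    x: float  # centre x-coordinate
    y: float  # centre y-coordinate
    r: float  # radius


def evaluate_packing(circles: list[Circle], tol: float = 1e-9) -> tuple[float, list[str]]:
    """Score a packing of circles in the unit square [0, 1] x [0, 1].

    Returns (score, violations). The score is the sum of radii for a feasible
    packing and 0.0 otherwise; `violations` names each broken constraint in
    plain text, which an optimizer can act on instead of the bare score.
    """
    violations: list[str] = []
    for i, c in enumerate(circles):
        if c.r < 0:
            violations.append(f"circle {i}: negative radius")
        if (c.x - c.r < -tol or c.x + c.r > 1 + tol
                or c.y - c.r < -tol or c.y + c.r > 1 + tol):
            violations.append(f"circle {i}: extends outside the unit square")
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            a, b = circles[i], circles[j]
            if math.hypot(a.x - b.x, a.y - b.y) < a.r + b.r - tol:
                violations.append(f"circles {i} and {j} overlap")
    score = sum(c.r for c in circles) if not violations else 0.0
    return score, violations
```

A score-only evaluator would return just the first element. The second element tells the optimizer which circles to move or shrink.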
The system generates CUDA kernels where 87% match or beat PyTorch.
Reported evaluation of generated CUDA kernels against PyTorch implementations; paper states 87% of generated kernels match or outperform PyTorch.
The system finds scheduling algorithms that cut cloud costs by 40%.
Paper reports that its discovered scheduling algorithms reduce cloud costs by 40%; presumably measured by evaluating cost of scheduled workloads before/after optimization.
The system discovers agent architectures that nearly triple Gemini Flash's ARC-AGI accuracy (32.5% to 89.5%).
Reported comparison to Gemini Flash on the ARC-AGI benchmark with explicit accuracy numbers (32.5% baseline to 89.5% after optimization). Method: discovered agent architectures via LLM-based search; benchmark evaluation on ARC-AGI.
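This short arithmetic check on "nearly triple" uses only the two accuracies reported above.

```python
baseline_acc = 32.5   # Gemini Flash on ARC-AGI, percent (as reported)
optimized_acc = 89.5  # with the discovered agent architecture, percent (as reported)

relative_factor = optimized_acc / baseline_acc   # about 2.75x, i.e. "nearly triple"
absolute_gain_pp = optimized_acc - baseline_acc  # 57.0 percentage points
print(f"{relative_factor:.2f}x, +{absolute_gain_pp:.1f} pp")
```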
A single AI-based optimization system achieves state-of-the-art results across six diverse tasks.
Paper reports experiments applying a single LLM-based optimization system to six diverse tasks and claims SOTA results across them; no further per-task details provided in the excerpt.
Compute expansion increases data-centre electricity pressure.
Public institutional data on compute expansion and data-centre electricity demand analyzed with growth indicators (CAGR, relative growth) showing rising electricity demand associated with compute capacity expansion.
Industrial robots represent persistent cyber-physical action capacity (as evidenced by installations and operational stock).
Use of public data on robot installations and operational stock, summarized via stock-flow ratios and related indicators to characterize persistent robotic action capacity.
AI investment signals broad capital allocation.
Public institutional data on AI investment examined with indicators such as growth multipliers, CAGR and concentration ratios to infer capital allocation patterns.
AI adoption is accelerating.
Analysis of public institutional data on AI adoption using growth indicators (relative growth, CAGR, growth multipliers) within a conceptual-empirical quantitative diagnostic design (no causal econometric model).
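The descriptive indicators named in these evidence notes have standard definitions, sketched below. The paper's exact formulas are not given in the excerpt, so these textbook definitions are an assumption. The numbers in the usage check are illustrative, not institutional data.

```python
def growth_multiplier(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def relative_growth(start: float, end: float) -> float:
    """Proportional change between two observations, e.g. 0.5 for +50%."""
    return (end - start) / start


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate over `years` periods."""
    return (end / start) ** (1.0 / years) - 1.0


def concentration_ratio(values: list[float], k: int) -> float:
    """CR_k: share of the total held by the k largest values."""
    top_k = sorted(values, reverse=True)[:k]
    return sum(top_k) / sum(values)


def stock_flow_ratio(stock: float, annual_flow: float) -> float:
    """Operational stock divided by annual installations (roughly, years of
    installations embodied in the current stock)."""
    return stock / annual_flow
```

A doubling over five years, for example, corresponds to a CAGR of about 14.9 percent (2 ** 0.2 - 1). None of these indicators implies causation, which matches the paper's non-causal diagnostic design.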
Shifting the community's default mindset from optimizing models per task to sampling models from learned weight distributions will accelerate progress toward an era in which AI systems routinely improve or create other AI systems.
Normative/prognostic statement by the authors outlining the paper's intended impact and vision; not supported by empirical data in the abstract.
Adapter-scale and conditional generation are advancing rapidly.
Authors' assessment of the current research trajectory (statement in abstract); implies multiple recent papers showing progress at adapter and conditional scales but no specific quantification in the abstract.
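One reason adapter-scale generation is tractable is that an adapter has far fewer numbers to generate than a full weight matrix. This sketch assumes LoRA-style low-rank adapters (the abstract does not name an adapter type) and an illustrative 4096 × 4096 projection matrix. It counts parameters only, not training or generation compute.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense update to a d_out x d_in weight matrix."""
    return d_out * d_in


def low_rank_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a LoRA-style update B @ A with B: d_out x rank, A: rank x d_in."""
    return rank * (d_out + d_in)


d = 4096  # illustrative hidden size for one square projection matrix
full = full_update_params(d, d)                  # 16,777,216 parameters
adapter = low_rank_adapter_params(d, d, rank=8)  # 65,536 parameters
print(full, adapter, full // adapter)            # dense update is 256x larger
```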
The authors organize existing methods into a five-stage pipeline and survey applications where weight-space generative approaches are already practical.
Descriptive claim about the content and organization of this position paper (methodology and survey); evidence is the paper itself.
High-performing models occupy low-dimensional, highly structured regions of weight space shaped by symmetry, flatness, modularity, and shared subspaces.
Authors' theoretical/empirical contention synthesizing observations from recent work; presented as an explanatory claim in the paper's abstract rather than a specific experimental result.
Recent advances demonstrate that neural weights can be synthesized on demand, often matching fine-tuning performance while reducing adaptation cost by orders of magnitude.
Claim refers to recent empirical work in the literature showing weight-synthesis methods; no specific papers, sample sizes, or quantified studies are cited in the abstract.
Model checkpoints should be treated as a first-class data modality, and generative modeling in weight space should be standardized as a core machine learning primitive.
Normative argument made by the authors in the position paper (proposal/recommendation); not supported by an empirical study in the abstract.
Neural network checkpoints have quietly become a large-scale data resource: millions of trained weight vectors now exist, each encoding task-, domain-, and architecture-specific knowledge.
Statement in the paper's abstract describing the current state of checkpoints; references to public model zoos and industry practice are implied but not enumerated in the abstract.
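Treating checkpoints as a data modality starts with mapping each checkpoint to a fixed-length vector. This minimal sketch assumes a checkpoint is a dictionary of named parameters stored as nested lists, a stand-in for a framework state dict. The function names are hypothetical.

```python
def flatten_checkpoint(state: dict[str, list]) -> list[float]:
    """Concatenate all parameters of a checkpoint into one vector.

    Parameters are visited in sorted name order so that checkpoints of the same
    architecture map to vectors with aligned coordinates.
    """
    vector: list[float] = []
    for name in sorted(state):
        vector.extend(_flatten(state[name]))
    return vector


def _flatten(value) -> list[float]:
    """Recursively flatten nested lists (e.g. a matrix) into a flat list of floats."""
    if isinstance(value, list):
        out: list[float] = []
        for item in value:
            out.extend(_flatten(item))
        return out
    return [float(value)]
```

Sorting by parameter name is only a naive alignment. It ignores the weight-space symmetries (such as neuron permutations) that the abstract notes, which generative methods over weights typically have to account for.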
Deployment of GrowthGR delivered a non-trivial 0.3% gain in overall search GMV.
Reported result from the Taobao production deployment / online A/B testing (the same deployment that produced the 5.3% new-item GMV lift); no sample size or experimental details provided in the excerpt.
We successfully deployed GrowthGR on Taobao's production platform, achieving a substantial 5.3% lift in new item GMV.
Reported result from a production deployment / online A/B testing on Taobao (deployment and observed lift claimed in paper); no sample size or experimental details provided in the excerpt.
The Multi-Value-Aware Generative Retrieval (MultiGR) module, built on a semantic-ID-based generative retrieval architecture, leverages structured samples with search cascade signals and adopts a Multi-Value-Aware Policy Optimization (MoPO) training paradigm to align with multi-stage online values while explicitly balancing short-term transactional value and long-term growth potential estimated by ItemLTV.
Methodological description in the paper (design of MultiGR and MoPO); no empirical results cited in this sentence.
The Item Long-term Transaction Value Prediction (ItemLTV) module employs counterfactual inference to quantify the long-term value increment attributable to a single user interaction.
Methodological description in the paper (design of ItemLTV module); no experimental quantification provided in excerpt.
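The excerpt does not describe how ItemLTV performs its counterfactual inference. As a swapped-in illustration only, this sketch uses a simple stratified difference-in-means estimator of the long-term value increment attributable to an interaction. It is valid only if interaction is as good as random within each stratum of comparable items. `stratified_uplift` and the record format are hypothetical.

```python
from collections import defaultdict


def stratified_uplift(records: list[tuple[str, bool, float]]) -> float:
    """Estimate the average long-term value increment of one interaction.

    records: (stratum, interacted, future_value) tuples, where `stratum` groups
    comparable items (e.g. same category and age bucket). Within each stratum
    the estimate is mean(value | interacted) - mean(value | not interacted);
    strata are weighted by their size, and strata lacking either group are
    skipped.
    """
    groups: dict[str, dict[bool, list[float]]] = defaultdict(lambda: {True: [], False: []})
    for stratum, interacted, value in records:
        groups[stratum][bool(interacted)].append(value)
    total_weight = 0
    weighted_sum = 0.0
    for g in groups.values():
        treated, control = g[True], g[False]
        if not treated or not control:
            continue  # no within-stratum comparison is possible
        weight = len(treated) + len(control)
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += weight * diff
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no stratum contains both interacted and non-interacted items")
    return weighted_sum / total_weight
```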
We propose a Multi-Value-Aware retrieval framework (GrowthGR) tailored for e-commerce search, designed to better align with the cascaded online values across different stages of the search system while balancing immediate conversion and long-term item growth.
Methodological contribution described in the paper (system/algorithm proposal); no empirical evaluation details in this sentence.
Policymakers should combine support for technological development with strategic investments in finance, trade integration, and public infrastructure to maximize AI's economic benefits and transform its potential into sustainable and inclusive growth.
Policy recommendation derived from the empirical findings (positive AI effects and positive interactions with financial innovation, trade openness, and government consumption) reported for 19 G20 countries (2005–2023) using GMM.
The interaction between AI and government final consumption expenditure helps strengthen economic growth by improving public infrastructure, institutional quality, and capacity to leverage new technologies.
GMM interaction specifications using panel data for 19 G20 countries (2005–2023); reported AI × government final consumption expenditure interaction coefficient is positive and statistically significant, with interpretation linking it to public infrastructure and institutional capacity.
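The interaction result can be read through the marginal effect of AI. This sketch assumes a dynamic panel specification of the kind typically estimated by GMM, with a lagged dependent variable, controls $X_{it}$ and country effects $\mu_i$. The notation is illustrative, and $G_{it}$ stands for government final consumption expenditure. A positive $\beta_3$ means the marginal effect of AI on growth rises with $G_{it}$.

```latex
\begin{aligned}
y_{it} &= \rho\,y_{i,t-1} + \beta_1\,\mathrm{AI}_{it} + \beta_2\,G_{it} + \beta_3\,(\mathrm{AI}_{it}\times G_{it}) + \gamma' X_{it} + \mu_i + \varepsilon_{it}\\
\frac{\partial y_{it}}{\partial \mathrm{AI}_{it}} &= \beta_1 + \beta_3\,G_{it}
\end{aligned}
```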