Evidence (7278 claims)
Search and filter individual claims pulled from the papers. Looking for a specific finding ("what's the effect on wages?"), you're in the right place. Want to compare whole outcome categories against each other instead? Use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance (active filter) | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
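To compare outcomes of very different sizes, it can help to convert the direction counts into shares of each row's Total. The sketch below does this for four rows copied from the table. Note that in some rows the four direction columns sum to less than Total (for Governance & Regulation, 1623 of 1654); the page does not say why, so the sketch reports that remainder separately under the hypothetical label `unlabelled` rather than guessing at its meaning.

```python
# Direction counts copied from four rows of the table above:
# (positive, negative, mixed, null, total).
ROWS = {
    "Governance & Regulation": (886, 414, 197, 126, 1654),
    "AI Safety & Ethics": (238, 288, 71, 34, 637),
    "Job Displacement": (12, 83, 23, 1, 119),
    "Wages & Compensation": (78, 37, 25, 6, 146),
}


def direction_shares(positive, negative, mixed, null, total):
    """Return each direction's share of Total, plus the unexplained remainder."""
    labelled = positive + negative + mixed + null
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
        # "unlabelled" is a name chosen for this sketch, not a site category.
        "unlabelled": (total - labelled) / total,
    }


for outcome, counts in ROWS.items():
    shares = direction_shares(*counts)
    print(outcome, {k: round(v, 3) for k, v in shares.items()})
```

Read this way, Job Displacement is about 70% negative (83 of 119), while Wages & Compensation is about 53% positive (78 of 146).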
Governance
Canonical decision-theoretic strategies that account for adaptive user trajectories can be mapped so that agents transition between strategies based on interaction feedback to reach stable equilibria.
Analytical results from the decision-theoretic modeling in the paper showing adaptive trajectories and stable equilibria (theoretical model derivation).
The paper develops a decision- and game-theoretic approach to the human-AI delegation-verification dilemma.
Methodological contribution: construction of decision- and game-theoretic models described in the paper (modeling/theoretical development).
Emerging models of human-AI interaction predominantly advance the complementarity thesis variously dubbed human-AI collaboration and human-AI hybrid intelligence.
Literature characterization / conceptual review reported in the paper (no empirical sample or quantitative analysis cited).
Policy recommendation: The tax reduction rates applied to firms operating in the field of artificial intelligence technologies could be increased.
Policy implication of the research findings (the positive relationship between R&D tax incentives and the number of AI patents); a recommendation, not a direct empirical test.
Policy recommendation: The government could provide performance-based and project-based support to increase the efficiency of R&D spending.
Applied policy measure proposed by the authors on the basis of the study's findings; a recommendation that has not been empirically tested.
Policy recommendation: Governments that value technological progress and innovation should encourage private-sector R&D investment through instruments such as subsidies and low-interest loans.
Policy recommendation based on the study's regression findings; not a direct empirical test but a practical recommendation derived from the study's results.
The findings above indicate that private-sector R&D spending and R&D tax incentives are being used efficiently.
Interpretive inference drawn from the study's regression results on the positive relationships (G8 + Türkiye, 2010-2020, random effects regression).
As tax incentives for R&D increase, the number of AI patents increases (positive relationship).
Same panel dataset and random effects regression (G8 + Türkiye, 2010-2020); the coefficient of the tax incentives variable on the number of AI patents was found to be positive.
There is a positive relationship between private-sector R&D spending and the number of artificial intelligence (AI) patents.
Panel data analysis: G8 countries + Türkiye, years 2010-2020; random effects regression model; country-year level data (9 countries × 11 years = 99 observations). The private-sector R&D spending variable is reported to show a statistically positive relationship with the number of AI patents.
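The observation count in this note (9 × 11 = 99) follows from a balanced country-year panel. The sketch below rebuilds only the panel index to make that arithmetic concrete. The country list is an assumption: the note says "G8 + Türkiye" without naming the members, so the usual G8 membership is used here. No regression is run and no data values are included.

```python
from itertools import product

# Assumed membership: the note names only "G8 + Türkiye" (9 countries).
COUNTRIES = [
    "Canada", "France", "Germany", "Italy", "Japan",
    "Russia", "United Kingdom", "United States", "Türkiye",
]
YEARS = range(2010, 2021)  # 2010 through 2020 inclusive, i.e. 11 years

# A balanced panel has exactly one observation per country-year pair.
panel_index = list(product(COUNTRIES, YEARS))

print(len(COUNTRIES), len(YEARS), len(panel_index))
```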
AI is a knowledge-intensive field that is particularly shaped by the flow of knowledge from scientific research to technological development.
Framing/background claim in the introduction describing the nature of AI and its dependence on science-to-technology knowledge flow.
The analysis covers AI-related patents filed from 2002 to 2021.
Paper states the temporal scope of the patent dataset analyzed (2002–2021).
Abstracts from patents and their cited scientific publications were extracted and BERTopic modelling was applied; topic labels were generated using generative AI.
Method description: data extraction of patent abstracts and cited scientific publication abstracts, application of BERTopic for topic modeling, and use of generative AI to create topic labels.
AI patents are classified into four categories using centrality measures derived from a CPC co-occurrence network.
Method section describing construction of a CPC (Cooperative Patent Classification) co-occurrence network and use of centrality measures to partition patents into four categories.
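As a rough illustration of what a CPC co-occurrence network and a centrality measure look like, the sketch below links two CPC codes whenever they appear on the same patent and then computes degree centrality by hand. The patents and their code assignments are hypothetical. The note does not say which centrality measures or cut-offs define the four categories, so the sketch stops at a per-patent centrality score and does not attempt the four-way split.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical patents, each with its set of CPC codes (illustrative only).
patents = {
    "P1": {"G06N3/08", "G06F18/24"},
    "P2": {"G06N3/08", "G06V10/82"},
    "P3": {"G06N3/08", "G06F18/24", "G10L15/16"},
    "P4": {"G16H50/20"},
}

# Co-occurrence network: two codes are linked when they share a patent;
# the edge weight counts how many patents carry both codes.
edges = defaultdict(int)
nodes = set()
for codes in patents.values():
    nodes.update(codes)
    for a, b in combinations(sorted(codes), 2):
        edges[(a, b)] += 1

# Degree centrality: distinct neighbours divided by (n - 1).
neighbours = defaultdict(set)
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)
n = len(nodes)
degree_centrality = {c: len(neighbours[c]) / (n - 1) for c in nodes}

# One possible patent-level score: the mean centrality of its codes.
# This aggregation is an assumption, not the paper's stated rule.
patent_score = {
    pid: sum(degree_centrality[c] for c in codes) / len(codes)
    for pid, codes in patents.items()
}

for pid in sorted(patent_score):
    print(pid, round(patent_score[pid], 3))
```

In this toy network, `G06N3/08` co-occurs with three of the four other codes and is the most central, while the isolated code on `P4` scores zero.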
This study proposes a semantic science-technology exploration framework specifically designed for the AI domain, consisting of two stages: technology classification and semantic topic exploration.
Paper description of the proposed framework and its two-stage design (methodological contribution).
We develop a unified taxonomy mapping diverging terminology to a shared framework of measured signals based on what benchmark authors claim to measure.
Methodological contribution described in the paper: creation of a taxonomy to harmonize labels and claimed measurement targets across benchmarks (details and mapping provided in paper/tool).
We introduce and open-source Benchmarking-Cultures-25, a dataset of 231 benchmarks highlighted across 139 model releases in 2025 from 11 major AI builders, alongside an interactive tool to explore the data.
Empirical contribution: the paper publishes the dataset and tool (links provided). Counts reported in the paper metadata (231 benchmarks, 139 model releases, 11 builders).
Experiments on real-world and synthetic tabular datasets show that SPN consistently improves robustness and predictive performance under strategic manipulation compared with both tabular foundation models and classical tabular methods.
Empirical experiments reported in the paper (on unspecified real-world and synthetic tabular datasets) comparing SPN to PFN-style tabular foundation models and classical tabular methods; the abstract claims consistent improvements but does not report sample sizes, dataset names, or quantitative effect sizes.
SPN constructs strategic in-context examples to approximate post-manipulation inputs and aligns PFN predictions with the induced strategic distribution.
Description of SPN's mechanism in the paper (methodological detail). Presented as the approach used to approximate strategic post-manipulation inputs and align predictions; no quantitative details or sample sizes in the abstract.
We propose Strategic Prior-data Fitted Network (SPN), an inference-time strategy-aware framework that adapts tabular foundation models to strategic environments without retraining.
Methodological contribution described in the paper: SPN is introduced as an inference-time framework that modifies behavior without retraining. This is a description of the proposed method rather than quantified empirical evidence; no sample sizes reported in the abstract.
Tabular foundation models based on pretrained prior-data fitted networks (PFNs) have shown strong generalization on diverse tabular tasks, but they are typically designed for non-strategic settings where data distributions are independent of deployed classifiers.
Statement in the paper situating PFN-style tabular foundation models as having strong generalization in prior work and noting their design assumption of non-strategic, classifier-independent data distributions; no dataset/sample sizes provided in the abstract.
The framework extends platform capitalism theory to professional service contexts.
Theoretical contribution claimed in the paper, integrating platform capitalism literature with sociology of professions and critical information science.
Resistance requires collective organising, alternative infrastructure development, and recognition that current AI implementations conflict with core professional values.
Normative conclusion drawn from the paper's critical qualitative analysis and theoretical framing; prescriptive recommendations rather than empirical measurement.
Vendor monopolies exist, with a reported 84% market share among ARL member institutions at peak concentration.
Market concentration data synthesized in the paper (reported peak share among ARL member institutions).
We introduce the concept [of twin agents], distinguish it from digital twins, and outline the research questions this new class of agent demands.
Stated contribution of the paper (conceptual development and research agenda); content claim about what the paper contains rather than an empirical finding.
Cognitive forcing functions and related frameworks address overreliance effectively in contexts where there is a clear boundary between the AI and the human decision-maker.
Claim based on literature and frameworks cited or discussed by the authors (asserted effectiveness in boundary-defined contexts); the abstract does not provide empirical evaluation details or sample sizes.
The next role on that list is more personal: you — digital twins of each individual (twin agents) representing their knowledge, perspective, and communicative style to colleagues when they are unavailable.
Proposed argument supported by the authors' early design work in an ongoing project; conceptual proposal rather than reported empirical validation in the abstract.
Agentic AI has taken on the role of assistant, collaborator, and decision-support tool.
Asserted in the paper's framing/introduction; based on synthesis of prior work and the authors' characterization of current agentic-AI deployments (no empirical sample or quantitative data reported in the abstract).
The paper recommends staged, governance-aware implementation for responsible AI adoption in SMEs.
Policy and practice recommendation from the reviewer's synthesis and conclusions section.
This review extends the resource-based view to AI-enabled capabilities in SMEs.
Conceptual/theoretical contribution described in the paper based on synthesis of literature and interpretation of AI as a firm capability in SMEs.
AI enhances operational efficiency primarily in recruitment and performance analytics.
Synthesis across the 21 included studies in the review identifying recurring application domains (recruitment, performance analytics) and reported efficiency benefits.
Artificial intelligence (AI) is transforming human resource management (HRM) by automating tasks and enabling data-driven decisions.
Statement synthesized from the systematic literature review (PRISMA-based) of global studies on AI applications in HRM included in the paper; no single empirical estimate reported.
AI excels at structured, retrieval-grounded, and tool-mediated tasks.
Paper's synthesized conclusion from cross-stage analysis; appears to be based on qualitative benchmarking and review rather than a specific randomized trial in the excerpt.
Long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input.
Qualitative claim based on the paper's end-to-end analysis of AI across the research lifecycle (review of developments through April 2026); no specific trials or sample sizes reported in the excerpt.
Fully automated systems can now generate research papers for as little as $15.
Statement in paper's introduction asserting observed market/practice examples and cost estimates; no specific empirical sample or experiment reported in the excerpt.
Major deployed generative AI advertising systems preserve a visible boundary between commercial content and AI-generated responses.
Descriptive claim based on review of major deployed systems and design patterns; stated by authors as observed industry practice (no specific sample size or experiments reported in text).
Macro-level policy shocks activate market discipline in emerging market debt markets (illuminated by the observed penalty on AI washing firms).
Interpretive conclusion based on the observed post-FYP increases in debt financing costs for AI washing firms and associated analyses (inference from empirical results; not a direct test reported in the abstract).
Supply chain concentration and bank proximity attenuate the debt-cost penalty for AI washing firms.
Heterogeneity/interaction analyses indicating smaller post-shock financing-cost increases for AI washing firms with concentrated supply chains and closer bank relationships (moderator evidence; no sample sizes in abstract).
External validation shows this decoupling reflects strategic deception (AI washing) — evidenced by subsidy extraction and future regulatory violations — rather than benign ambition, supporting its validity as an AI washing proxy.
External validation analyses linking the residual decoupling to observed subsidy extraction and to higher incidence of future regulatory violations (validation methods described; sample size not provided in abstract).
The policy architecture required to escape the trap (targeting trust, sequencing, and team-level adoption) is characterised.
Model-derived policy prescriptions identifying interventions (trust-building, sequencing, team-level targeting) necessary to shift equilibria toward genuine adoption; theoretical argumentation. No empirical trial or sample.
Conditions are derived under which sustained but imperfect adoption pressure is welfare-improving.
Analytical derivation within the model framework characterising parameter regions where persistent imperfect adoption increases welfare (model-defined welfare metric). Theoretical analysis; no empirical sample.
A cost ratchet dynamic implies that failed adoption attempts permanently lower barriers even when embedding fails.
Model component introducing a cost-ratcheting mechanism; analytical/simulation results showing permanent barrier reductions following failed attempts. Theoretical model; no empirical sample.
Genflow establishes a robust framework for scalable, enterprise-grade generative systems.
Concluding claim in paper based on the proposed architecture and reported yield improvement; no broader deployment studies, scalability benchmarks, or enterprise trials detailed in this statement.
By transitioning to a multi-stage, self-correcting pipeline, Genflow improved the yield of brand-compliant video generations from 42% to 89%.
Reported empirical result comparing yield of brand-compliant video generations before and after applying Genflow; no sample size, dataset description, statistical significance, or experimental protocol provided in the text.
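The two reported yields can be restated in a few equivalent ways. The sketch below uses only the 42% and 89% figures from the claim. The "attempts per compliant video" reading assumes independent generations with a constant success probability, which the text does not state.

```python
before, after = 0.42, 0.89  # yields reported in the claim

point_gain = (after - before) * 100  # gain in percentage points
relative_gain = after / before       # multiple of the baseline yield

# Expected generations needed per compliant video, assuming independent
# attempts with constant success probability (an assumption of this sketch).
attempts_before = 1 / before
attempts_after = 1 / after

print(f"{point_gain:.0f} points, {relative_gain:.2f}x, "
      f"{attempts_before:.2f} -> {attempts_after:.2f} attempts")
```

Under that assumption, the improvement corresponds to roughly 2.4 generations per compliant video falling to roughly 1.1.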
We implement an Adversarial Multi-Agent Quality Control (QC) loop in which evaluator agents iteratively critique generated frames and prompt generators to refine outputs until a deterministic consensus is reached.
Method description of a multi-agent adversarial QC loop used in the pipeline; no experimental protocol, number of agents, or sample sizes provided in this sentence.
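A generate, critique, and refine loop of the kind described here might be structured as follows. This is a minimal sketch, not Genflow's implementation: every function name and signature is hypothetical, "consensus" is interpreted as unanimous approval by all evaluators, and a round budget is added so the loop always terminates.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: an evaluator returns a critique, or None to approve.
Evaluator = Callable[[str], Optional[str]]
Generator = Callable[[str, List[str]], str]


def qc_loop(generate: Generator, evaluators: List[Evaluator],
            prompt: str, max_rounds: int = 5) -> Tuple[str, bool]:
    """Regenerate until every evaluator approves or the round budget runs out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    critiques: List[str] = []
    output = ""
    for _ in range(max_rounds):
        # The generator sees the critiques raised against the previous output.
        output = generate(prompt, critiques)
        critiques = [c for c in (ev(output) for ev in evaluators) if c is not None]
        if not critiques:  # unanimous approval is treated as consensus
            return output, True
    return output, False


# Toy stand-ins so the loop can run; a real generator would produce frames.
def toy_generate(prompt: str, critiques: List[str]) -> str:
    return prompt + "".join(f" [{c}]" for c in critiques)


def needs_logo(frame: str) -> Optional[str]:
    return None if "logo" in frame else "add logo"


def needs_palette(frame: str) -> Optional[str]:
    return None if "palette" in frame else "use brand palette"


result, agreed = qc_loop(toy_generate, [needs_logo, needs_palette], "product shot")
print(result, agreed)
```

With the toy evaluators, the first output draws two critiques and the second, regenerated with those critiques, is approved by both.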
Our architecture integrates a retrieval-based 'Brand DNA' extraction module to parameterize generation according to established corporate identity guidelines.
Methodological description in paper indicating a retrieval-based module for extracting Brand DNA used to condition generation; no evaluation metrics or sample sizes provided in this statement.
We introduce Genflow, a Compound AI System designed to enforce brand consistency in generative media production.
Paper describes the proposed system architecture (Genflow) as a methodological contribution; description of modules and pipeline provided but no external validation details in this sentence.
Recent advancements in generative video models demonstrate high visual fidelity.
Asserted in paper as a background observation about recent generative video models; no specific dataset, benchmark, or sample size reported.
Trace-Prior RL adds bounded adaptation under capacity asymmetry.
Experiments contrasting Trace-Prior RL versus behavior cloning and reward-only approaches in settings with capacity asymmetry, showing Trace-Prior RL permits limited/adaptive deviation while preserving trace alignment.
Pure behavior cloning is nearly enough for symmetric imitation.
Empirical results in symmetric imitation settings (presumably in the two-hotel or bidding benchmarks) showing behavior cloning achieves close imitation without additional RL.
Trace-prior or corrected-history policies better preserve price or bid distributions.
Comparative experiments and ablations across the two-hotel benchmark and hidden-budget bidding task showing trace-prior and corrected-history policies retain price/bid distribution characteristics better than reward-only variants.