Evidence (7448 claims)

Claim counts by category:

- Adoption: 5267 claims
- Productivity: 4560 claims
- Governance: 4137 claims
- Human-AI Collaboration: 3103 claims
- Labor Markets: 2506 claims
- Innovation: 2354 claims
- Org Design: 2340 claims
- Skills & Training: 1945 claims
- Inequality: 1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
The analysis draws on data from 36 countries for 2018–2022 for the AI Vibrancy Score (AIVS)–EGDI comparison.
Data description in abstract explicitly reporting the AIVS–EGDI sample coverage as 36 countries for 2018–2022.
We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation.
Paper states that the anonymized Asta Interaction Dataset, accompanying analysis, and a new query intent taxonomy are being released publicly.
The Asta Interaction Dataset comprises over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform.
Statement in paper describing dataset composition: >200,000 user queries and interaction logs collected from two deployed tools (literature discovery and scientific Q&A) within a RAG platform. Dataset release described in methods/dataset section.
Methods combine targeted literature synthesis, comparative conceptual analysis, and framework building (with recent scholarly and institutional sources reviewed).
Explicit methodological statement in the paper describing the review and analytic approach; no primary-data methods used.
AI coding assistants are a high-visibility class of corporate AI and are given special attention as an illustrative case in the paper.
Paper specifically calls out AI coding assistants as a focal example in the conceptual analysis and discussion; based on literature review rather than original measurement.
AI’s societal integration in India is gradual, and therefore its impact on economic variables (like wages and inequality) is also gradual.
Synthesis in the paper based on empirical adoption figures (e.g., <0.7% adoption for AI ride services) and the observed weak changes in inequality measures in the transportation sector.
Despite AI’s introduction, wage inequality in the transportation sector (measured by the Gini coefficient) has not significantly worsened.
Empirical investigation reported in the paper analyzing transportation-sector wage disparities over time using the Gini coefficient; the paper reports no significant worsening post-introduction.
The Article translates these insights into risk-sensitive guideposts for modernizing governance of AI-enabled tools and emerging modalities, from agentic systems to blockchain-deployed smart contracts.
Prescriptive/conceptual policy guidance presented in the Article (normative recommendations; governance framework).
The Innovation Frontier traces LegalTech’s evolution from 2000s-vintage e-discovery to generative AI.
Historical/chronological analysis in the Article (literature review/history of LegalTech provided by authors).
The Legal Services Value Chain disaggregates the lifecycle of a legal matter into five distinct nodes of activity.
Model description in the Article (conceptual architecture; decomposition of legal work).
The Article develops two core organizing models: the Legal Services Value Chain and the Innovation Frontier.
Explicit claim in the Article describing conceptual/model contributions (theoretical/model-building).
This Article provides a practical framework for navigating the shifting terrain of legal innovation and AI.
Statement of purpose in the Article (conceptual contribution; framework development). No empirical validation reported in the excerpt.
There are action tools for higher-stakes tasks like financial transactions.
Observed examples of action tools in the monitored MCP repositories that perform higher-stakes functions, with financial transactions given as an explicit example in the paper.
We use O*NET mapping to identify each tool's task domain and consequentiality.
Method described in paper: mapping each tool to O*NET task domains and consequentiality using the monitored tool metadata and descriptions.
We categorise tools according to their direct impact: perception tools to access and read data, reasoning tools to analyse data or concepts, and action tools to directly modify external environments.
Methodological classification described in paper (taxonomy of tools into perception, reasoning, action); applied to monitored MCP server dataset.
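The perception/reasoning/action split above can be illustrated with a minimal rule-based sketch. The keyword sets and the `classify_tool` helper are hypothetical illustrations, not the paper's actual O*NET-based procedure:

```python
# Minimal rule-based sketch of the perception / reasoning / action
# taxonomy. The keyword sets are hypothetical illustrations, not the
# paper's classification procedure.

PERCEPTION_KEYWORDS = {"read", "fetch", "search", "list", "get"}
REASONING_KEYWORDS = {"analyze", "summarize", "classify", "plan"}
ACTION_KEYWORDS = {"write", "delete", "send", "pay", "execute"}

def classify_tool(description: str) -> str:
    """Assign a tool to a category from the verbs in its description."""
    words = set(description.lower().split())
    # Action tools are checked first: they directly modify external
    # environments and carry the highest stakes.
    if words & ACTION_KEYWORDS:
        return "action"
    if words & REASONING_KEYWORDS:
        return "reasoning"
    if words & PERCEPTION_KEYWORDS:
        return "perception"
    return "unclassified"

print(classify_tool("execute a payment transfer"))  # action
print(classify_tool("read account balance data"))   # perception
```

A real pipeline would match tools against full O*NET task descriptions rather than bare keywords, but the three-way output structure is the same.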
AI transparency alone did not significantly increase data-sharing.
Result reported from the randomized experiment (N=240) comparing actual data-sharing rates across human, white-box AI, and black-box AI conditions; authors state that transparency alone did not produce a significant increase in sharing.
The SRL did not generate designs with significantly better performance than RWL, even though it explored a different region of the design space.
Empirical comparison on the battery pack design task showing no significant performance improvement of SRL over RWL despite differing exploration; exact statistical tests, p-values, and sample sizes are not provided in the excerpt.
These energy reductions are achieved without statistically significant performance loss.
Paper states that performance loss is not statistically significant across the evaluated benchmarks (as reported in the abstract).
The empirical analysis is based on A-share listed companies from 2015 to 2023.
Data description in the paper stating the study sample and time period (A-share listed firms, 2015–2023).
The research surveys current methodologies and empirical evidence related to regulatory early-warning systems and synthesizes findings from the empirical literature.
Paper states it examines existing methodologies and empirical findings (literature review / synthesis); no scope (e.g., number of studies reviewed) given in the excerpt.
The study uses a mixed-methods approach combining qualitative insights from 1,500 semi-structured customer interviews with quantitative analysis of transaction records, loan repayment histories, and account activity.
Paper states methods explicitly in abstract: 1,500 semi-structured interviews plus quantitative analysis of transaction records, loan repayment histories, and account activity (case-study approach across three platforms).
Three interlocking threads characterize AI for science: (1) AI as research instrument, (2) AI for research infrastructure, and (3) the reshaping of scholarly profiles and incentives by machine-readable metrics.
Conceptual framework presented in the paper; organization of topics rather than empirical measurement. The paper indicates these threads are followed through historical and contemporary examples.
The history of artificial intelligence for scientific discovery is not a two-year story about chatbots learning to write papers; it is a sixty-year story beginning with DENDRAL (1965).
Historical narrative / literature review citing early systems such as DENDRAL (1965) and subsequent developments in scholarly infrastructure (arXiv, Google Scholar, ORCID). No empirical sample or statistical test reported.
Four control mechanisms emerged from the review: GPS tracking (panoptic surveillance), rating systems (emotional labour demands), dynamic pricing (income volatility), and automated sanctions (deactivation fear).
Thematic synthesis across the 48 reviewed studies identifying recurring algorithmic control mechanisms.
The thematic synthesis integrated the Job Demand-Control Model, Conservation of Resources Theory, and Algorithmic Management Theory to develop a multilevel theoretical framework.
Authors' stated method: thematic synthesis combining those three theoretical frameworks across the reviewed literature (48 studies).
PRISMA-guided systematic integrative review of 48 peer-reviewed studies (2016–2025) sourced from 4,812 initial records (Scopus, Web of Science, PubMed).
Methods statement in the paper: PRISMA-guided systematic integrative review; search across Scopus, Web of Science, PubMed; initial yield 4,812 records; final included studies = 48.
At the macroeconomic level, Kazakhstan's state programs (e.g., 'Digital Kazakhstan' and the Industrial and Innovation Development Program) and international indices (WIPO Global Innovation Index, OECD digital assessments, IMF data) are used to evaluate and position Kazakhstan within the global digital economy.
Macro-level analysis using national programs and international indices described in the article to assess Kazakhstan's digital economy standing.
This paper uses panel data of China's Shanghai and Shenzhen A-share non-financial listed companies from 2010 to 2022 to study AI's effects.
Explicit data description in the paper (sample frame and period stated).
Both the positive (approach) and negative (avoidance) AI job crafting pathways failed to significantly affect life satisfaction, indicating domain specificity of AI-related psychological mechanisms.
Analysis of the same multi-source, multi-wave dataset of 287 employee–leader dyads; tests of effects on life satisfaction showed non-significant results for both pathways.
Deep Reinforcement Learning (DRL) has shown strong microscopic performance in car-following conditions, but its macroscopic traffic flow characteristics remain underexplored.
Literature synthesis / motivation in the paper (review of existing DRL work focused on microscopic performance). No empirical sample size.
The paper is intentionally public-safe: it omits proprietary implementation details, training recipes, thresholds, hidden-state instrumentation, deployment procedures, and confidential system design choices, and therefore the contribution is theoretical rather than operational.
Statement about the paper's scope and publication choices; directly asserted by the authors regarding omitted content and the theoretical nature of the contribution.
The paper introduces a constraint-coupled reasoning framework with four elements: bounded transition burden, path-load accumulation, dynamically evolving feasible regions, and a capability-stability coupling condition.
Descriptive/theoretical: the paper explicitly defines and enumerates these four framework elements. This is a claim about the paper's content rather than an empirical finding.
The analysis uses data on 31 million users of Ctrip, China's largest online travel platform, to study "Wendao," an LLM-based AI assistant integrated into the platform.
Descriptive statement in the paper about data source: platform logs/usage data for Ctrip covering 31 million users and the Wendao assistant.
The top three platforms (Claude, ChatGPT, and DeepSeek) receive statistically indistinguishable satisfaction ratings despite vast differences in funding, team size, and benchmark performance.
Statistical comparison of self-reported satisfaction ratings collected via the paper's survey (overall N=388); statistical tests reported in paper (specific test and per-platform n not provided in abstract).
The frequency of manipulative behaviours (propensity) of an AI model is not consistently predictive of the likelihood of manipulative success (efficacy), underscoring the importance of studying these dimensions separately.
Analytic results reported in the study comparing model propensity (how often manipulative outputs are produced) with measures of success (induced belief/behavior changes), finding inconsistent or weak association.
For readers less familiar with the Bayesian and decision-theoretic language, key terms are defined in a glossary at the end of the article.
Statement about the article's structure and supporting material (presence of glossary noted in the article).
The gap between a continuously updated belief state and your frozen deployed model is 'learning debt.'
Terminology/definition introduced by the author in the article (glossary and definitional exposition).
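One way to make the 'learning debt' definition concrete, consistent with the article's Bayesian framing but using an example of my own construction, is a Beta-Bernoulli sketch that measures the gap between the updated belief and the frozen model as a KL divergence in bits:

```python
import math

# Illustrative sketch (my construction, not the article's formalism):
# treat "learning debt" as the divergence between a continuously updated
# Beta-Bernoulli belief about a success rate and the point estimate that
# was frozen into the deployed model.

def beta_mean(a: float, b: float) -> float:
    """Posterior mean of a Beta(a, b) belief."""
    return a / (a + b)

def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bern(p) || Bern(q)) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

# Frozen model: trained when 60 of 100 trials had succeeded.
frozen = beta_mean(60.0, 40.0)

# After deployment the world drifts: 30 successes in 100 new trials.
updated = beta_mean(60.0 + 30.0, 40.0 + 70.0)

# Learning debt grows as the updated belief pulls away from the
# frozen estimate.
debt = bernoulli_kl(updated, frozen)
print(f"learning debt ~ {debt:.4f} bits")
```

As drift accumulates, `debt` grows; retraining resets the frozen estimate to the current belief and pays the debt down to zero.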
Model retraining is usually treated as an ongoing maintenance task.
Author's descriptive claim in the article; presented as an observation about prevailing practice (no empirical sample or data reported).
We ran a behavioral experiment (N = 200) in which participants predicted the AI's correctness across four AI calibration conditions: standard, overconfidence, underconfidence, and a counterintuitive "reverse confidence" mapping.
Reported experimental design and sample size in the paper (behavioral experiment with N = 200; four experimental conditions).
Study methodology: two online experiments were conducted via the crowdsourcing platform Prolific, with samples of n = 325 (study 1) and n = 371 (study 2); participants' mean age was 35 years and 55% were female.
Methodological and sample description provided in the abstract.
Late disclosure of AI involvement did not improve affective engagement for AI-generated content.
Reported experimental result in the abstract from the two online studies manipulating disclosure timing (early vs. late).
The study was conducted by the Mohammed bin Rashid School of Government’s Future of Government Center, in collaboration with global AI pioneers.
Authorship and collaboration statement in the report.
The report highlights the key findings of a field study covering ten Arab countries to explore the realities and challenges of AI governance.
Report statement describing the geographic scope of the field study (explicitly: ten Arab countries).
The recommendations are based on regional research that included hundreds of leaders active in the AI domains, from the public and private sectors.
Report statement claiming participant base of the underlying research (described as 'hundreds of leaders').
Zero-shot baselines and standard retrieval stagnate around 50–60% accuracy across model generations on the graduate-level final exam.
Pilot study reported on a full graduate-level final exam comparing zero-shot and standard retrieval baselines across model generations; reported accuracy range given as ~50–60%. Exact number of exam questions or models compared not stated.
Afriat's theorem guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint.
Theoretical claim citing Afriat's theorem (mathematical result used as foundational justification in the paper).
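A minimal GARP consistency check in the spirit of Afriat's theorem can be sketched as follows; the example price/bundle data are hypothetical:

```python
# Sketch of a GARP consistency check in the spirit of Afriat's theorem:
# observed (prices, bundles) can be rationalized by some utility function
# iff they satisfy GARP. The example data below are hypothetical.

def satisfies_garp(prices, bundles):
    T = len(prices)
    dot = lambda p, x: sum(pi * xi for pi, xi in zip(p, x))
    # Direct revealed preference: x_t R x_s iff p_t . x_t >= p_t . x_s.
    R = [[dot(prices[t], bundles[t]) >= dot(prices[t], bundles[s])
          for s in range(T)] for t in range(T)]
    # Warshall's algorithm gives the transitive closure R*.
    for k in range(T):
        for t in range(T):
            for s in range(T):
                R[t][s] = R[t][s] or (R[t][k] and R[k][s])
    # GARP: x_t R* x_s implies x_s is NOT strictly directly revealed
    # preferred to x_t (i.e. not p_s . x_s > p_s . x_t).
    for t in range(T):
        for s in range(T):
            if R[t][s] and dot(prices[s], bundles[s]) > dot(prices[s], bundles[t]):
                return False
    return True

# A two-observation preference cycle violates GARP...
print(satisfies_garp([(2, 1), (1, 2)], [(2, 1), (1, 2)]))  # False
# ...while these choices are consistent with utility maximization.
print(satisfies_garp([(1, 2), (2, 1)], [(2, 1), (1, 2)]))  # True
```

In the violating dataset, bundle 1 is revealed preferred to bundle 2 at the first prices, yet bundle 2 is strictly revealed preferred to bundle 1 at the second prices, so by Afriat's theorem no utility function can rationalize the data.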
We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents.
Methods described in the paper: authors report fine-tuning Chronos-2 on synthetically generated time series from utility-maximizing agents (methodological statement).
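The excerpt does not describe the paper's generator, but a minimal sketch of synthetic demand data from utility-maximizing agents, assuming Cobb-Douglas utility (my assumption, not the paper's stated functional form), could look like:

```python
import random

# Hypothetical illustration (the paper's actual generator is not
# described in the excerpt): synthetic demand series from Cobb-Douglas
# utility maximizers. For u(x) = prod_i x_i^(alpha_i) with sum(alpha)=1,
# demand for good i at prices p and income m is x_i = alpha_i * m / p_i.

def cobb_douglas_demand(alpha, prices, income):
    return [a * income / p for a, p in zip(alpha, prices)]

def simulate_series(alpha, steps, income=100.0, seed=0):
    """Draw random price vectors and record the optimal bundles."""
    rng = random.Random(seed)
    out = []
    for _ in range(steps):
        prices = [rng.uniform(0.5, 2.0) for _ in alpha]
        out.append((prices, cobb_douglas_demand(alpha, prices, income)))
    return out

# Each generated bundle exhausts the budget (p . x = income) by
# construction, so the series is rationalizable in Afriat's sense.
data = simulate_series(alpha=[0.3, 0.7], steps=5)
```

Series like these give the fine-tuned forecaster targets that are guaranteed to be consistent with utility maximization.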
This yields a common scale (bits of usable information) for comparing a wide range of interventions, contexts, and models.
Theoretical implication of the authors' formalization combining Bayesian persuasion and V-usable information (paper argues for a common information scale measured in bits).
To formalize mecha-nudges, we combine the Bayesian persuasion framework with V-usable information, a generalization of Shannon information that is observer-relative.
Methodological/theoretical development described in the paper (formal combination of two theoretical frameworks).
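For reference, one standard formalization of predictive V-usable information for a predictive family \(V\) is sketched below; whether the paper adopts exactly this definition is not stated in the excerpt:

```latex
% H_V(Y | emptyset): uncertainty about Y using no side information.
H_V(Y \mid \varnothing) = \inf_{f \in V} \mathbb{E}\bigl[-\log_2 f[\varnothing](Y)\bigr]
% H_V(Y | X): uncertainty about Y when predictors in V may condition on X.
H_V(Y \mid X) = \inf_{f \in V} \mathbb{E}\bigl[-\log_2 f[X](Y)\bigr]
% Usable information X provides about Y:
I_V(X \to Y) = H_V(Y \mid \varnothing) - H_V(Y \mid X)
```

With base-2 logarithms, \(I_V\) is measured in bits, which is what puts interventions, contexts, and models on the common scale the paper describes.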
We introduce mecha-nudges: changes to how choices are presented that systematically influence AI agents without degrading the decision environment for humans.
Conceptual/definitional contribution made in the paper (novel concept introduced by authors).