Evidence (6869 claims)
- Adoption (8570 claims)
- Productivity (7631 claims)
- Governance (6869 claims)
- Human-AI Collaboration (6491 claims)
- Org Design (4175 claims)
- Innovation (4114 claims)
- Labor Markets (3566 claims)
- Skills & Training (2966 claims)
- Inequality (2066 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Governance
Real-world deployment will require representative data coverage and online adaptation despite the method’s robustness mechanisms.
Authors' discussion/limitations section: DeePC theoretically requires persistently exciting (representative) trajectories, and the authors recommend online adaptation and continual data collection for deployment.
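In DeePC (Data-enabled Predictive Control), the data requirement comes from Willems' fundamental lemma: an input trajectory is persistently exciting of order L if its depth-L block Hankel matrix has full row rank, and DeePC needs this for L = T_ini + N + n (past window, prediction horizon, and state dimension). The minimal NumPy sketch below checks that rank condition. It is a generic illustration of the condition, not the paper's code, and the function names are hypothetical.

```python
import numpy as np


def block_hankel(u: np.ndarray, depth: int) -> np.ndarray:
    """Depth-`depth` block Hankel matrix of a signal with T samples and m channels.

    Column j stacks u[j], u[j+1], ..., u[j+depth-1], giving m*depth rows
    and T - depth + 1 columns.
    """
    u = np.asarray(u, dtype=float)
    u = u.reshape(len(u), -1)  # a 1-D input of shape (T,) becomes (T, 1)
    T = u.shape[0]
    cols = T - depth + 1
    if depth < 1 or cols < 1:
        raise ValueError("depth must be between 1 and the trajectory length")
    return np.column_stack([u[j:j + depth].reshape(-1) for j in range(cols)])


def is_persistently_exciting(u: np.ndarray, order: int) -> bool:
    """True if the depth-`order` Hankel matrix of u has full row rank.

    Full row rank needs at least as many columns as rows, i.e.
    T >= (m + 1) * order - 1 samples; shorter records always fail.
    """
    H = block_hankel(u, order)
    return bool(np.linalg.matrix_rank(H) == H.shape[0])
```

A nonzero constant input fails this test for any order of 2 or more (its Hankel matrix has rank 1). This is one concrete sense in which the recorded data must be "representative" before the method's guarantees apply.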
Agent performance degrades markedly as environment complexity, stochasticity, and non-stationarity increase, revealing core limitations of current LLM-based agents for long-horizon, multi-factor decision problems.
Experimental results across progressively harder RetailBench environments showing performance falloff for multiple LLMs under increased task complexity and non-stationarity.
Behavioral memorization probe (TS‑Guessing) signaled memorization above chance for 72.5% of prompts across all models and items.
Experiment 3: the TS‑Guessing behavioral probe applied exhaustively to all 513 MMLU questions across six models (513 × 6 = 3,078 prompts); statistical thresholds classified above-chance memorization signals, flagging 72.5% of prompts.
Paraphrase / indirect-reference diagnostic: on a 100-question subset, average accuracy dropped by 7.0 percentage points under indirect referencing.
Experiment 2: a paraphrase/indirect-reference diagnostic applied to a 100-question subset of MMLU; the difference between accuracy on the original and the paraphrased questions averaged 7.0 percentage points.
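A drop of 7.0 percentage points is an absolute difference in accuracy (for example, 80% to 73%), not a 7% relative decline. The minimal Python sketch below shows the computation. It assumes, since the summary does not say, that the average is taken over models, each scored on the same 100-question subset. The function names and inputs are hypothetical.

```python
def mean_pp_drop(original_acc: dict[str, float], paraphrased_acc: dict[str, float]) -> float:
    """Mean accuracy drop in percentage points.

    Both dicts map a model name to accuracy as a fraction in [0, 1].
    Assumption: the average is taken over models, which the summary does not state.
    """
    drops = [100 * (original_acc[m] - paraphrased_acc[m]) for m in original_acc]
    return sum(drops) / len(drops)


def relative_drop(original: float, paraphrased: float) -> float:
    """Relative decline in percent, included only for contrast with the pp measure."""
    return 100 * (original - paraphrased) / original
```

With an original accuracy of 80%, a 7.0-point drop corresponds to a relative decline of 8.75%, so the two units should not be used interchangeably.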
STEM items show higher lexical contamination (18.1%) than the overall rate (13.8%).
Category-level results from Experiment 1 (lexical matching) on the MMLU dataset (513 questions), aggregated by subject domain to compute an 18.1% contamination rate for STEM categories.
Overall lexical contamination: 13.8% of MMLU items show evidence of exposure in training data.
Experiment 1: a lexical contamination detection pipeline that searched public corpora from the models' training era and the open web for literal or near-literal occurrences of the 513 MMLU questions and answers; per-item contamination flags were aggregated to produce the 13.8% figure.
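The summary does not specify the matching rule. One common way to operationalize "literal or near-literal occurrence" is word n-gram overlap between a benchmark item and a corpus document. The minimal sketch below works under that assumption. The n-gram length (8) and overlap threshold (0.5) are hypothetical parameters, and the "corpus" is an in-memory list of strings rather than a web-scale index.

```python
import re


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams; punctuation is dropped so near-literal copies still match."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(item: str, corpus: list[str], n: int = 8, threshold: float = 0.5) -> bool:
    """Flag an item if any corpus document contains at least `threshold` of its n-grams."""
    item_grams = ngrams(item, n)
    if not item_grams:
        return False  # item shorter than n words: this rule cannot judge it
    for doc in corpus:
        overlap = len(item_grams & ngrams(doc, n)) / len(item_grams)
        if overlap >= threshold:
            return True
    return False


def contamination_rate(items: list[str], corpus: list[str], **kwargs) -> float:
    """Share of items flagged: per-item flags aggregated into a single rate."""
    if not items:
        raise ValueError("no items to score")
    flags = [is_contaminated(item, corpus, **kwargs) for item in items]
    return sum(flags) / len(flags)
```

Lowering `n` or `threshold` catches looser near-copies at the cost of more false flags. An audit over web-scale corpora would need a prebuilt n-gram index rather than this linear scan.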
Public leaderboards overstate modern LLM capabilities because substantial portions of benchmark QA items appear in (or are memorized from) training data, inflating measured accuracy.
Multi-method contamination audit across six frontier LLMs (GPT-4o, GPT-4o-mini, DeepSeek-R1, DeepSeek-V3, Llama-3.3-70B, Qwen3-235B) evaluated on the MMLU benchmark (513 questions, 57 subjects), using lexical matching, paraphrase sensitivity, and behavioral memorization probes that together show systematic leakage.
Proactive AI at national scale amplifies concerns around transparency, accountability, privacy, and potential misuse, necessitating robust regulatory and ethical frameworks.
Normative and ethical analysis in the paper, supported by general literature on large-scale AI governance; no empirical assessment of regulatory effectiveness in Russia included.
Randomized controlled trials and longitudinal evaluations are scarce, and few studies measure patient-relevant outcomes or economic impacts.
Literature synthesis noting scarcity of RCTs and long-term observational studies, and absence of widespread patient-outcome and cost-effectiveness evaluations in existing publications.
Many published studies focus on standalone algorithm accuracy rather than clinician–AI joint performance in routine workflows.
Review of the literature categorizing study designs (preponderance of algorithm development/validation studies, fewer reader-in-the-loop, simulation, or deployment studies).
Ethical and legal issues—patient privacy, algorithmic bias, intellectual property, and equitable access—pose risks to AI deployment in drug development.
Ethics and legal analyses, policy reports, and documented case examples collated in the review that identify these recurring concerns.
Regulatory uncertainty about validation standards and liability for AI tools raises investment risk and may slow deployment.
Regulatory and policy reports included in the narrative review describing evolving standards and open questions about validation, explainability, and liability for ML-based tools.
Adoption of AI in drug R&D requires high upfront investment in data curation, compute infrastructure, and specialized talent.
Industry reports and economic analyses summarized in the review reporting capital and operational needs for building AI capabilities; qualitative synthesis rather than quantitative costing across firms.
Limited transparency and interpretability of many AI algorithms (black-box models) complicate clinical and regulatory trust and adoption.
Regulatory reports, methodological critiques, and case examples in the review highlighting interpretability concerns and their impact on clinical/regulatory acceptance.
Performance of AI models in drug R&D depends on large, high-quality, and representative biomedical datasets; dataset bias or gaps substantially undermine model performance and generalizability.
Methodological literature and case studies cited in the review documenting failures or limited generalization when training data are biased, sparse, or non-representative; thematic synthesis rather than pooled quantification.
High-quality, standardized, interoperable data (clean, annotated, connected across modalities) is a critical limiting factor for translating AI capability into sustained impact.
Conceptual argument grounded in domain knowledge, made in the editorial; it includes no empirical measurement of the causal effect of data quality.
The paper's evidence base is limited: the projects studied are early-stage with little longitudinal outcome data, and the analysis depends on publicly available project information that may be incomplete or biased.
Methods and limitations explicitly stated in the paper (qualitative review; reliance on secondary sources; two case studies; absence of large-scale quantitative evaluation).
Data protection and privacy (especially sensitive health data) complicate open-data DAO models.
Conceptual analysis referencing privacy/data-protection concerns for health data (e.g., GDPR-like regimes); no empirical evaluation of privacy breaches within DAOs provided.
Significant barriers remain for DAOs in pharma: regulatory uncertainty about tokenized securities, IP fractionalization, and clinical data sharing.
Legal/regulatory analysis and literature synthesis highlighting unclear classifications and open regulatory questions; no new regulatory rulings provided.
Pharmaceutical R&D faces rising costs, long approval timelines, supply-chain inefficiencies, and low patient involvement.
Literature review and synthesis of well-documented industry challenges cited in the paper (secondary sources); no new primary data presented in this study.
There is limited reporting on privacy safeguards, model interpretability, and external validity in the reviewed studies.
Review observed sparse reporting on privacy protections, interpretability analyses and external validation across included studies.
Misclassification risks (false positives and false negatives) are a common limitation and can harm consumers by incorrectly restricting access or by failing to detect harm.
Review notes model error rates reported via precision/recall and AUC; discusses harms from false positives/negatives as a recurrent limitation in the literature.
Privacy and ethical concerns are substantial: continuous monitoring and sensitive behavioural inference raise privacy, surveillance, and misuse risks.
Multiple included studies and the review discussion explicitly identify privacy, ethical, and potential misuse concerns with continuous monitoring and behavioural inference.
Advanced technologies' complexity and lack of explainability create risks for audit reliability and professional judgement.
Findings from literature synthesis and professional/regulatory perspectives included in the review; presented as an identified risk/challenge rather than quantified effect.
Audit 5.0 introduces key challenges: data quality and integration issues, complexity and explainability of advanced technologies, regulatory and ethical uncertainty, and skills shortages combined with cultural resistance.
Systematic literature review and synthesis of professional standards and regulatory perspectives; assertions based on reviewed literature rather than a single empirical dataset.
At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best.
Question-level analysis from the randomized experiment comparing cases where chatbot suggestions were incorrect versus control; paper reports a ~66% reduction in accuracy on easy questions when chatbot suggestions were incorrect (exact denominators and statistics not provided in the excerpt).
Common barriers to ERM adoption in MSMEs include resource constraints and lack of expertise.
Findings from the literature review identifying determinants and barriers reported across studies (survey and qualitative studies commonly cited in such reviews); specific sample sizes/methods not provided in the summary.
MSMEs are particularly vulnerable to external shocks because of limited financial resources, weak internal controls, and heavy dependence on owner-managers’ intuition.
Background literature summarized in the review describing common structural and governance characteristics of MSMEs; drawn from multiple sources in the literature (specific studies not cited in the summary).
The article identifies and lays out several concerns regarding the government's approach to regulating AI.
Analytical critique presented in the paper (legal/policy analysis summarizing potential regulatory shortcomings). Based on the author's review and argumentation rather than primary empirical data.
Environmental regulations weaken the beneficial influence of generative AI on a company's ESG performance.
Moderation/interaction tests in the panel-data econometric model using measures of environmental regulation (on the same 2012–2024 Chinese A-share firm sample) showing a statistically significant negative interaction effect.
Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro‑enterprises.
Cross‑study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.
Entrenched societal inequities mean that women and girls are often disproportionately held back from achieving their potential.
Broad claim referencing societal inequities and their effects on women and girls; stated in the introduction without specific empirical citations in the excerpt.
Significant challenges persist for AI-enhanced GS-BESS deployment, including limited data availability, poor model generalization, high computational requirements, scalability issues, and regulatory gaps.
Barriers and limitations identified across the literature as reported in this systematic review (PRISMA-based synthesis). The excerpt does not enumerate which studies reported each barrier or provide prevalence statistics.
A preregistered, nationally representative replication experiment in the United States (N = 1,200) replicates the causal finding that a labor-replacing (vs. labor-creating) AI frame reduces willingness to politically engage with future AI developments.
Preregistered randomized experiment (nationally representative US sample, N = 1,200) replicating the UK manipulation and measuring willingness to engage politically regarding AI.
A preregistered, nationally representative experiment in the United Kingdom (N = 1,202) shows that exposure to a labor-replacing (vs. labor-creating) AI frame causally reduces trust in democracy.
Preregistered randomized experiment (nationally representative UK sample, N = 1,202) manipulating AI framing (labor-replacing vs. labor-creating) and measuring trust/satisfaction with democratic institutions.
Large-scale survey data indicate that the public tends to view AI as labor-replacing rather than labor-creating.
Cross-sectional survey (N = 37,079 respondents across 38 European countries); descriptive analysis of responses about AI's labor market impact.
Only 12% of gig workers participate in retirement savings programs.
Survey and administrative measures of retirement-savings participation among gig workers in the 24-country sample.
Only 23% of gig workers report access to employer-provided health insurance.
Self-reported benefits coverage from labor force surveys and linked administrative records for gig workers across the 24 OECD countries (2015–2025).
The environmental footprint of healthcare systems is growing, and persistent inequities in access and outcomes have intensified calls for procurement reform.
Contemporary literature review and synthesis of sector reports and studies documenting healthcare emissions/footprint and health inequities (no original empirical data reported in this paper).
There exists a systemic governance vacuum around GenAI, including gaps in privacy, accountability, and intellectual property protections.
Authors' synthesis of governance-related gaps reported across the 28 secondary studies and research agendas in the review.
Societal and ethical risks—such as bias, misuse, and skill erosion—constrain GenAI adoption.
Themes synthesized from the reviewed literature (28 papers) reporting societal and ethical concerns associated with GenAI deployment.
Technical unreliability—manifesting as hallucinations and performance drift—is a major constraint on GenAI adoption.
Recurring identification of technical reliability issues (hallucinations, performance drift) in the 28 reviewed papers and authors' aggregation of technical risks.
Adoption of GenAI is constrained by multiple interrelated challenges.
Cross-paper synthesis from the systematic review of 28 studies identifying recurring barriers and constraints reported in the literature.
Ongoing issues remain, such as data access, model transparency, ethical concerns, and varying relevance across Global North and Global South contexts.
Critical synthesis within the review drawing on discussions and critiques in the literature about barriers and ethical challenges; based on reported limitations and regional comparisons in reviewed studies (no numerical breakdown provided).
Human judgment is constrained by bounded rationality, cognitive biases, and information-processing limitations.
Cited as established findings from prior research across decision sciences and related fields (extensive literature evidence referenced; no new empirical data in this paper's abstract).
Key implementation challenges include data quality and integration, model interpretability, cybersecurity and privacy, regulatory/compliance uncertainty, skills gaps among accounting professionals, and implementation costs.
Identified by the paper through literature review and practitioner reports; these are presented as recurring barriers rather than quantified with a specific sample.
Many studies on serious-game DSTs are small-scale or experimental, and long-term impact data on behavioral change and emissions outcomes are sparse, limiting generalizability.
Review of the literature summarized in the chapter showing predominance of case studies, prototypes, and short-term evaluations rather than longitudinal or large-sample studies.
Ensuring scientific validity of game models, scaling co-design processes, measuring real-world behavioral change, and aligning incentives (policy/subsidies, markets) are remaining challenges to using serious games for DST uptake.
Chapter discussion of limitations and gaps identified in the reviewed literature; absence or sparsity of long-term validation studies and large-scale co-design implementations documented in existing research.
Current uptake of DSTs for net zero remains limited because of issues of trust, usability, lack of evidence linking actions to farm profitability, and poor integration into farmer workflows.
Literature synthesis, qualitative interviews and surveys, case studies documenting low adoption and barriers; multiple practice reports and studies cited in the chapter. Many studies report limited or uneven adoption across contexts.
Using LLM participants without rigorous validation can bias external validity and causal inference in economic research.
Review documents cognitive misalignments and distortions that can bias estimated behaviors, preferences, or treatment effects; authors highlight this as a risk.