Evidence (2608 claims)

Claim counts by category:
- Adoption: 7395
- Productivity: 6507
- Governance: 5877
- Human-AI Collaboration: 5157
- Innovation: 3492
- Org Design: 3470
- Labor Markets: 3224
- Skills & Training: 2608
- Inequality: 1835
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Skills & Training
Each claim below is paired with a note describing its evidence base.
Across heterogeneous learners, a common broadcast curriculum can be slower than personalized instruction by a factor linear in the number of learner types.
Theoretical comparative result in the model (analysis of broadcast vs. personalized curricula across heterogeneous learner types; the abstract states a slowdown factor linear in the number of types).
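As a toy illustration of the linear factor (a sketch under stated assumptions, not the paper's model: it assumes each learner type needs a disjoint block of instruction, and that a broadcast curriculum must serially cover every block while a personalized one covers only the learner's own):

```python
def broadcast_time(num_types: int, per_type_units: int) -> int:
    # Broadcast: every learner sits through the material for all learner types.
    return num_types * per_type_units

def personalized_time(per_type_units: int) -> int:
    # Personalized: each learner covers only their own type's material.
    return per_type_units

k, T = 5, 10  # hypothetical: 5 learner types, 10 units of material each
slowdown = broadcast_time(k, T) / personalized_time(T)
print(slowdown)  # factor equals k = 5.0
```

Under these assumptions the slowdown factor is exactly the number of learner types, matching the abstract's "factor linear in the number of types."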
Significant limitations emerged in case law citations, with most cited cases being non-existent or incorrectly referenced.
Authors' review of the case citations produced by the four AI engines for the single transcript, finding many citations were fabricated or misreferenced.
Initial adaptation challenges to AI integration were identified among employees.
Participants in semi-structured interviews (n=12) reported initial difficulties adapting to AI tools; themes relating to early adaptation challenges were coded.
There is a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence.
Conceptual/theoretical claim derived from the framework and discussion in the paper (argument and mathematical framing), no empirical sample or longitudinal data presented in the excerpt.
Rather than broad job losses, evidence points to a reallocation at the entry level: AI automates tasks typically assigned to junior staff, shifting the nature of entry-level roles.
Synthesis of firm- and task-level empirical studies reported in the brief documenting automation of routine/junior tasks and changes in job-task composition; specific sample sizes vary by cited study and are not provided in the brief.
The gap between informal natural language requirements and precise program behavior (the 'intent gap') has always plagued software engineering, but AI-generated code amplifies it to an unprecedented scale.
Conceptual claim and argumentation in the paper; presented as an observed escalation in the scale of the existing 'intent gap' due to AI code generation. No quantitative evidence or sample size given in the excerpt.
Some declines (in self-efficacy and meaningfulness) from passive AI use persist after participants return to manual work.
Within-experiment assessment of outcomes after participants returned to manual (no-AI) tasks following the AI-use manipulation in the pre-registered experiment (N = 269); reported persistent reductions in self-efficacy and meaningfulness for the passive condition.
Passive use of AI reduces perceived meaningfulness of work.
Pre-registered experiment (N = 269) with self-reported measure of work meaningfulness; passive-copy condition showed lower meaningfulness ratings than No-AI and Active-collaboration conditions.
Passive use of AI reduces psychological ownership of the produced outputs.
Same pre-registered experiment (N = 269). Participants in the passive-copy AI condition reported lower psychological ownership of their outputs (self-report scales) relative to No-AI and Active-collaboration conditions.
Passive use of AI (copying AI-generated output) reduces workers' self-efficacy.
Pre-registered between-subjects experiment (N = 269) using occupation-specific writing tasks. Participants assigned to a passive-copy AI condition reported lower self-efficacy (self-reported confidence to complete tasks without AI) compared to the No-AI (manual) and Active-collaboration conditions.
Provider incentives may be misaligned (e.g., optimizing for engagement or test performance instead of durable learning), requiring contracts, regulation, or purchaser design to align incentives.
Consensus from interdisciplinary workshop (50 scholars) highlighting incentive risks and market-design considerations; descriptive, not empirical.
Extensive learner data needed to personalize AI feedback raises privacy and data-governance concerns (consent, storage, usage).
Qualitative consensus from workshop participants (50 scholars) noting data-collection requirements and governance risks; no empirical governance studies included.
Automated feedback may not capture pedagogical nuances expert teachers use (motivation, socio-emotional cues, complex reasoning), limiting pedagogical fit.
Expert syntheses from the workshop of 50 scholars highlighting limits of automation relative to expert teacher judgment; no empirical comparisons presented.
AI-generated feedback can be incorrect, misleading, or misaligned with learning objectives; assessing feedback quality is nontrivial.
Repeated concern raised across workshop participants (50 scholars) in qualitative synthesis; noted as a substantive risk and open challenge rather than empirically quantified here.
Adoption requires hardware (VR headsets, capable GPUs) and integration effort, implying upfront capital expenditure for labs/observatories.
Paper explicitly notes hardware requirements (VR headsets, capable GPUs) and integration effort as part of adoption considerations; common-sense assessment of required capital.
When identical replies are labeled as coming from AI rather than from a human, recipients report feeling less heard and less validated (an attribution effect).
Controlled attribution labeling experiment within the study: identical replies presented with different source labels (AI vs. human) and recipient-rated perceptions of being heard/validated measured.
There are limited randomized controlled trials or longitudinal evaluations; few studies measure patient-relevant outcomes or economic impacts.
Literature synthesis noting scarcity of RCTs and long-term observational studies, and absence of widespread patient-outcome and cost-effectiveness evaluations in existing publications.
Many published studies focus on standalone algorithm accuracy rather than clinician–AI joint performance in routine workflows.
Review of the literature categorizing study designs (preponderance of algorithm development/validation studies, fewer reader-in-the-loop, simulation, or deployment studies).
Ethical and legal issues—patient privacy, algorithmic bias, intellectual property, and equitable access—pose risks to AI deployment in drug development.
Ethics and legal analyses, policy reports, and documented case examples collated in the review that identify these recurring concerns.
Regulatory uncertainty about validation standards and liability for AI tools raises investment risk and may slow deployment.
Regulatory and policy reports included in the narrative review describing evolving standards and open questions about validation, explainability, and liability for ML-based tools.
Adoption of AI in drug R&D requires high upfront investment in data curation, compute infrastructure, and specialized talent.
Industry reports and economic analyses summarized in the review reporting capital and operational needs for building AI capabilities; qualitative synthesis rather than quantitative costing across firms.
Limited transparency and interpretability of many AI algorithms (black-box models) complicate clinical and regulatory trust and adoption.
Regulatory reports, methodological critiques, and case examples in the review highlighting interpretability concerns and their impact on clinical/regulatory acceptance.
Performance of AI models in drug R&D depends on large, high-quality, and representative biomedical datasets; dataset bias or gaps substantially undermine model performance and generalizability.
Methodological literature and case studies cited in the review documenting failures or limited generalization when training data are biased, sparse, or non-representative; thematic synthesis rather than pooled quantification.
Predictions from AI depend on data quality and coverage and still require experimental (wet-lab) validation.
Discussion of early failures and limits in case studies and expert observations within the narrative review; methodological argument about dependence of ML models on input data.
High-quality, standardized, interoperable data (clean, annotated, connected across modalities) is a critical limiting factor for translating AI capability into sustained impact.
Conceptual emphasis and domain knowledge argument in the editorial; no empirical measurement of data quality's causal effect included.
At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best.
Question-level analysis from the randomized experiment comparing cases where chatbot suggestions were incorrect versus control; paper reports a ~66% reduction in accuracy on easy questions when chatbot suggestions were incorrect (exact denominators and statistics not provided in the excerpt).
Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro‑enterprises.
Cross‑study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.
Only 24.4% of at-risk workers have viable transition pathways, where 'viable' is defined as sharing at least 3 skills and achieving at least 50% skill transfer.
Analysis of job-to-job transitions on the validated knowledge graph using an operational definition of viable pathways (>=3 shared skills and >=50% skill transfer); proportion of at-risk workers meeting that criterion reported as 24.4% (underlying at-risk worker count not given in the excerpt).
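The operational viability criterion above can be sketched as a simple predicate over skill sets. The set-overlap measure of skill transfer and the example skill sets below are illustrative assumptions, not the paper's knowledge-graph implementation:

```python
def is_viable_transition(source_skills: set[str], target_skills: set[str],
                         min_shared: int = 3, min_transfer: float = 0.5) -> bool:
    """Viability per the paper's operational definition: at least
    `min_shared` shared skills AND at least `min_transfer` of the source
    job's skills carrying over to the target job. Skill transfer is
    approximated here as set overlap."""
    if not source_skills:
        return False
    shared = source_skills & target_skills
    transfer_ratio = len(shared) / len(source_skills)
    return len(shared) >= min_shared and transfer_ratio >= min_transfer

# Illustrative example with made-up skill sets:
cashier = {"customer service", "cash handling", "inventory", "scheduling"}
clerk = {"customer service", "inventory", "scheduling", "data entry"}
print(is_viable_transition(cashier, clerk))  # 3 shared, 0.75 transfer -> True
```

Note that both thresholds bind: a transition with three shared skills still fails if those skills are a small fraction of the source job's full skill set.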
20.9% of jobs in the dataset face high automation risk.
Risk classification applied to the jobs represented in the knowledge graph (sample size: 9,978 job postings); proportion of jobs labeled as 'high automation risk' is reported as 20.9%.
AI notably reduces customer stability in sports enterprises (SE).
Empirical estimation using the DML model on the same panel dataset of 45 Chinese listed SEs (2012–2023); authors report a statistically significant negative effect of AI on customer stability.
The sample is limited to Chinese A-share-listed design enterprises (2014–2023), which may limit generalizability to small and medium-sized enterprises (SMEs) or firms in other countries/regions.
Study sample description: A-share-listed design-oriented enterprises in China between 2014 and 2023; authors explicitly note this as a limitation.
Using TFP as a proxy for project efficiency aggregates effects at the firm level and therefore lacks micro-level insight into specific project workflows or design iteration processes.
Methodological limitation acknowledged in the paper: TFP is used as a firm-level proxy and the dataset does not include micro-level project workflow or iteration logs.
Human judgment is constrained by bounded rationality, cognitive biases, and information-processing limitations.
Cited as established findings from prior research across decision sciences and related fields (extensive literature evidence referenced; no new empirical data in this paper's abstract).
Ireland exhibits the largest gender gap in advanced digital task use: approximately 44% of men versus 18% of women perform advanced digital tasks — a 26 percentage point gap, close to double the European average.
Country-level descriptive statistics from ESJS for Ireland reporting shares of men and women performing advanced digital tasks. (Exact Irish sample size not provided in the excerpt.)
Across Europe, women are around 15 percentage points less likely than men to perform advanced digital tasks in their jobs.
Empirical analysis of the European Skills and Jobs Survey (ESJS) (Cedefop, 2021) using regression-based estimates and descriptive statistics across European countries. (Exact sample size and country count not provided in the excerpt.)
AI substitutes many routine tasks, including both manual and cognitive/rule-based activities, disproportionately affecting middle-skill occupations.
Task-based substitution reasoning within SBTC framework and cross-sectoral task analysis. The paper provides conceptual synthesis rather than presenting new microdata or quantified task-level estimates.
Key implementation challenges include data quality and integration, model interpretability, cybersecurity and privacy, regulatory/compliance uncertainty, skills gaps among accounting professionals, and implementation costs.
Identified by the paper through literature review and practitioner reports; these are presented as recurring barriers rather than quantified with a specific sample.
Nearby business closures increased perceived impediments to growth, amplifying pessimism via local exposure (social contagion effect).
Empirical comparison of perceived impediments to growth across variation in local exposure to nearby business closures (survey measures of local closures correlated with respondents' perceived impediments), using the cross-country survey sample.
Two regimes emerge: an inequality-decreasing regime when AI behaves like a broadly available commodity technology or when labor-market institutions share rents widely (high rent-sharing elasticity ξ), and an inequality-increasing regime when neither condition holds.
Model regime characterization and calibrated counterfactuals showing falling wage dispersion and ΔGini under commodity-like AI assumptions or higher rent-sharing elasticity.
Generative AI compresses within-task skill differences (reduces dispersion of individual task performance).
Theoretical task-based model and calibrated quantitative simulations (Method of Simulated Moments matching six empirical moments) showing reductions in within-task performance dispersion after introducing AI technology.
No evaluated program reported Kirkpatrick‑Barr level‑4 outcomes (organizational change, patient outcomes, or sustained metacognitive mastery).
Reviewers mapped reported outcomes from all 27 included programs and found none that demonstrated organizational-level impacts or patient‑level outcomes (level 4).
Because the design is cross-sectional and sampling purposive/geographically constrained, causal inference and generalizability are limited.
Authors' stated limitations in the summary: cross-sectional design and purposive, geographically constrained sample (Karnataka, India).
Workplace stress is associated with lower employee retention.
PLS-SEM analysis on a cross-sectional survey of N = 350 pharmaceutical workers in Karnataka, India (purposive sampling). Reported direct path: Stress → Retention, β = 0.321, p < 0.001. (Note: the paper interprets this as stress reducing retention; sign/coding conventions of the variables are not detailed in the summary.)
Automated compliance and credentialing systems raise governance issues (auditability, appeals mechanisms) and risk incorrect automated deregistration if not properly governed.
Governance and algorithmic-risk discussion in the paper; logical argumentation rather than case-based evidence.
The paper models career progression as a continuous function and treats certification gaps as discontinuities that impede labour-market mobility.
Mathematical/conceptual modeling described in the methods (career-progression-as-continuous-function approach); this is a modeling choice reported in the paper rather than an empirical finding.
There is limited long-term impact evidence and few system-level assessments of AI in developing-country agriculture.
Authors' methodological caveat based on the temporal scope and types of studies available in the >60-study review.
The evidence base is skewed toward pilots and high‑performer contexts; there is a lack of long‑panel, multi‑project longitudinal studies to validate typical returns and scalability.
Authors' assessment of evidence types in the 160 studies: mix of conceptual papers, case studies, pilots, and only limited larger empirical evaluations.
Opacity, bias, and errors in AI systems demand auditing, standards, and governance (algorithmic accountability) to ensure trustworthy assessment.
Synthesis of literature on algorithmic bias and accountability plus policy analysis recommending audits and standards; supported by country cases that discuss governance concerns.
Student data used by AI vendors raises risks around consent, reuse, commercial exploitation, and other data-privacy concerns.
Policy analysis and literature on data governance, privacy law debates; examples from national policy documents in the comparative cases. No original data on breaches or misuse presented.
Limitations of the study include reliance on self-reported perceptions (subject to response and survivorship bias), lack of experimental/causal identification, potential non-representative sample, and cross-sectional design limiting inference about long-term productivity effects.
Authors' stated limitations in the paper summary.