Evidence (2608 claims)

Claim counts by category:
- Adoption: 7395
- Productivity: 6507
- Governance: 5877
- Human-AI Collaboration: 5157
- Innovation: 3492
- Org Design: 3470
- Labor Markets: 3224
- Skills & Training: 2608
- Inequality: 1835
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Skills & Training
Each claim below is paired with a note describing its evidence base.
Across heterogeneous learners, a common broadcast curriculum can be slower than personalized instruction by a factor linear in the number of learner types.
Theoretical comparative result in the model (analysis of broadcast vs. personalized curricula across heterogeneous learner types; the abstract states a slowdown factor linear in the number of types).
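As a toy illustration of the linear factor (a sketch under stated assumptions, not the paper's model: it assumes each learner type needs a disjoint block of instruction, and that a broadcast curriculum must serially cover every block while a personalized one covers only the learner's own):

```python
def broadcast_time(num_types: int, per_type_units: int) -> int:
    # Broadcast: every learner sits through the material for all learner types.
    return num_types * per_type_units

def personalized_time(per_type_units: int) -> int:
    # Personalized: each learner covers only their own type's material.
    return per_type_units

k, T = 5, 10  # hypothetical: 5 learner types, 10 units of material each
slowdown = broadcast_time(k, T) / personalized_time(T)
print(slowdown)  # factor equals k = 5.0
```

Under these assumptions the slowdown factor is exactly the number of learner types, matching the abstract's "factor linear in the number of types."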
Significant limitations emerged in case law citations, with most cited cases being non-existent or incorrectly referenced.
Authors' review of the case citations produced by the four AI engines for the single transcript, finding many citations were fabricated or misreferenced.
Initial adaptation challenges to AI integration were identified among employees.
Participants in semi-structured interviews (n=12) reported initial difficulties adapting to AI tools; themes relating to early adaptation challenges were coded.
There is a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence.
Conceptual/theoretical claim derived from the framework and discussion in the paper (argument and mathematical framing), no empirical sample or longitudinal data presented in the excerpt.
Rather than broad job losses, evidence points to a reallocation at the entry level: AI automates tasks typically assigned to junior staff, shifting the nature of entry-level roles.
Synthesis of firm- and task-level empirical studies reported in the brief documenting automation of routine/junior tasks and changes in job-task composition; specific sample sizes vary by cited study and are not provided in the brief.
The gap between informal natural language requirements and precise program behavior (the 'intent gap') has always plagued software engineering, but AI-generated code amplifies it to an unprecedented scale.
Conceptual claim and argumentation in the paper; presented as an observed escalation in the scale of the existing 'intent gap' due to AI code generation. No quantitative evidence or sample size given in the excerpt.
Some declines (in self-efficacy and meaningfulness) from passive AI use persist after participants return to manual work.
Within-experiment assessment of outcomes after participants returned to manual (no-AI) tasks following the AI-use manipulation in the pre-registered experiment (N = 269); reported persistent reductions in self-efficacy and meaningfulness for the passive condition.
Passive use of AI reduces perceived meaningfulness of work.
Pre-registered experiment (N = 269) with self-reported measure of work meaningfulness; passive-copy condition showed lower meaningfulness ratings than No-AI and Active-collaboration conditions.
Passive use of AI reduces psychological ownership of the produced outputs.
Same pre-registered experiment (N = 269). Participants in the passive-copy AI condition reported lower psychological ownership of their outputs (self-report scales) relative to No-AI and Active-collaboration conditions.
Passive use of AI (copying AI-generated output) reduces workers' self-efficacy.
Pre-registered between-subjects experiment (N = 269) using occupation-specific writing tasks. Participants assigned to a passive-copy AI condition reported lower self-efficacy (self-reported confidence to complete tasks without AI) compared to the No-AI (manual) and Active-collaboration conditions.
Provider incentives may be misaligned (e.g., optimizing for engagement or test performance instead of durable learning), requiring contracts, regulation, or purchaser design to align incentives.
Consensus from interdisciplinary workshop (50 scholars) highlighting incentive risks and market-design considerations; descriptive, not empirical.
Extensive learner data needed to personalize AI feedback raises privacy and data-governance concerns (consent, storage, usage).
Qualitative consensus from workshop participants (50 scholars) noting data-collection requirements and governance risks; no empirical governance studies included.
Automated feedback may not capture pedagogical nuances expert teachers use (motivation, socio-emotional cues, complex reasoning), limiting pedagogical fit.
Expert syntheses from the workshop of 50 scholars highlighting limits of automation relative to expert teacher judgment; no empirical comparisons presented.
AI-generated feedback can be incorrect, misleading, or misaligned with learning objectives; assessing feedback quality is nontrivial.
Repeated concern raised across workshop participants (50 scholars) in qualitative synthesis; noted as a substantive risk and open challenge rather than empirically quantified here.
Adoption requires hardware (VR headsets, capable GPUs) and integration effort, implying upfront capital expenditure for labs/observatories.
Paper explicitly notes hardware requirements (VR headsets, capable GPUs) and integration effort as part of adoption considerations; common-sense assessment of required capital.
When identical replies are labeled as coming from AI rather than from a human, recipients report feeling less heard and less validated (an attribution effect).
Controlled attribution labeling experiment within the study: identical replies presented with different source labels (AI vs. human) and recipient-rated perceptions of being heard/validated measured.
There are limited randomized controlled trials or longitudinal evaluations; few studies measure patient-relevant outcomes or economic impacts.
Literature synthesis noting scarcity of RCTs and long-term observational studies, and absence of widespread patient-outcome and cost-effectiveness evaluations in existing publications.
Many published studies focus on standalone algorithm accuracy rather than clinician–AI joint performance in routine workflows.
Review of the literature categorizing study designs (preponderance of algorithm development/validation studies, fewer reader-in-the-loop, simulation, or deployment studies).
Ethical and legal issues—patient privacy, algorithmic bias, intellectual property, and equitable access—pose risks to AI deployment in drug development.
Ethics and legal analyses, policy reports, and documented case examples collated in the review that identify these recurring concerns.
Regulatory uncertainty about validation standards and liability for AI tools raises investment risk and may slow deployment.
Regulatory and policy reports included in the narrative review describing evolving standards and open questions about validation, explainability, and liability for ML-based tools.
Adoption of AI in drug R&D requires high upfront investment in data curation, compute infrastructure, and specialized talent.
Industry reports and economic analyses summarized in the review reporting capital and operational needs for building AI capabilities; qualitative synthesis rather than quantitative costing across firms.
Limited transparency and interpretability of many AI algorithms (black-box models) complicate clinical and regulatory trust and adoption.
Regulatory reports, methodological critiques, and case examples in the review highlighting interpretability concerns and their impact on clinical/regulatory acceptance.
Performance of AI models in drug R&D depends on large, high-quality, and representative biomedical datasets; dataset bias or gaps substantially undermine model performance and generalizability.
Methodological literature and case studies cited in the review documenting failures or limited generalization when training data are biased, sparse, or non-representative; thematic synthesis rather than pooled quantification.
Predictions from AI depend on data quality and coverage and still require experimental (wet-lab) validation.
Discussion of early failures and limits in case studies and expert observations within the narrative review; methodological argument about dependence of ML models on input data.
High-quality, standardized, interoperable data (clean, annotated, connected across modalities) is a critical limiting factor for translating AI capability into sustained impact.
Conceptual emphasis and domain knowledge argument in the editorial; no empirical measurement of data quality's causal effect included.
At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best.
Question-level analysis from the randomized experiment comparing cases where chatbot suggestions were incorrect versus control; paper reports a ~66% reduction in accuracy on easy questions when chatbot suggestions were incorrect (exact denominators and statistics not provided in the excerpt).
Gaps in infrastructure readiness, digital awareness, and inclusive policy frameworks hinder equitable AI adoption among micro‑enterprises.
Cross‑study synthesis of barriers identified across the 55 included articles; infrastructural, awareness, and policy barriers are explicitly reported as recurring themes.
Only 24.4% of at-risk workers have viable transition pathways, where 'viable' is defined as sharing at least 3 skills and achieving at least 50% skill transfer.
Analysis of job-to-job transitions on the validated knowledge graph using an operational definition of viable pathways (>=3 shared skills and >=50% skill transfer); proportion of at-risk workers meeting that criterion reported as 24.4% (underlying at-risk worker count not given in the excerpt).
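The operational viability criterion above can be sketched as a simple predicate over skill sets. The set-overlap measure of skill transfer and the example skill sets below are illustrative assumptions, not the paper's knowledge-graph implementation:

```python
def is_viable_transition(source_skills: set[str], target_skills: set[str],
                         min_shared: int = 3, min_transfer: float = 0.5) -> bool:
    """Viability per the paper's operational definition: at least
    `min_shared` shared skills AND at least `min_transfer` of the source
    job's skills carrying over to the target job. Skill transfer is
    approximated here as set overlap."""
    if not source_skills:
        return False
    shared = source_skills & target_skills
    transfer_ratio = len(shared) / len(source_skills)
    return len(shared) >= min_shared and transfer_ratio >= min_transfer

# Illustrative example with made-up skill sets:
cashier = {"customer service", "cash handling", "inventory", "scheduling"}
clerk = {"customer service", "inventory", "scheduling", "data entry"}
print(is_viable_transition(cashier, clerk))  # 3 shared, 0.75 transfer -> True
```

Note that both thresholds bind: a transition with three shared skills still fails if those skills are a small fraction of the source job's full skill set.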
20.9% of jobs in the dataset face high automation risk.
Risk classification applied to the jobs represented in the knowledge graph (sample size: 9,978 job postings); proportion of jobs labeled as 'high automation risk' is reported as 20.9%.
AI notably reduces customer stability in sports enterprises (SE).
Empirical estimation using the DML model on the same panel dataset of 45 Chinese listed SEs (2012–2023); authors report a statistically significant negative effect of AI on customer stability.
The sample is limited to Chinese A-share-listed design enterprises (2014–2023), which may limit generalizability to small and medium-sized enterprises (SMEs) or firms in other countries/regions.
Study sample description: A-share-listed design-oriented enterprises in China between 2014 and 2023; authors explicitly note this as a limitation.
Using TFP as a proxy for project efficiency aggregates effects at the firm level and therefore lacks micro-level insight into specific project workflows or design iteration processes.
Methodological limitation acknowledged in the paper: TFP is used as a firm-level proxy and the dataset does not include micro-level project workflow or iteration logs.
Human judgment is constrained by bounded rationality, cognitive biases, and information-processing limitations.
Cited as established findings from prior research across decision sciences and related fields (extensive literature evidence referenced; no new empirical data in this paper's abstract).
Ireland exhibits the largest gender gap in advanced digital task use: approximately 44% of men versus 18% of women perform advanced digital tasks — a 26 percentage point gap, close to double the European average.
Country-level descriptive statistics from ESJS for Ireland reporting shares of men and women performing advanced digital tasks. (Exact Irish sample size not provided in the excerpt.)
Across Europe, women are around 15 percentage points less likely than men to perform advanced digital tasks in their jobs.
Empirical analysis of the European Skills and Jobs Survey (ESJS) (Cedefop, 2021) using regression-based estimates and descriptive statistics across European countries. (Exact sample size and country count not provided in the excerpt.)
AI substitutes many routine tasks, including both manual and cognitive/rule-based activities, disproportionately affecting middle-skill occupations.
Task-based substitution reasoning within SBTC framework and cross-sectoral task analysis. The paper provides conceptual synthesis rather than presenting new microdata or quantified task-level estimates.
Key implementation challenges include data quality and integration, model interpretability, cybersecurity and privacy, regulatory/compliance uncertainty, skills gaps among accounting professionals, and implementation costs.
Identified by the paper through literature review and practitioner reports; these are presented as recurring barriers rather than quantified with a specific sample.
Nearby business closures increased perceived impediments to growth, amplifying pessimism via local exposure (social contagion effect).
Empirical comparison of perceived impediments to growth across variation in local exposure to nearby business closures (survey measures of local closures correlated with respondents' perceived impediments), using the cross-country survey sample.
Two regimes emerge: an inequality-decreasing regime when AI behaves like a broadly available commodity technology or when labor-market institutions share rents widely (high rent-sharing elasticity ξ), and an inequality-increasing regime when neither condition holds.
Model regime characterization and calibrated counterfactuals showing falling wage dispersion and ΔGini under commodity-like AI assumptions or higher rent-sharing elasticity.
Generative AI compresses within-task skill differences (reduces dispersion of individual task performance).
Theoretical task-based model and calibrated quantitative simulations (Method of Simulated Moments matching six empirical moments) showing reductions in within-task performance dispersion after introducing AI technology.
No evaluated program reported Kirkpatrick‑Barr level‑4 outcomes (organizational change, patient outcomes, or sustained metacognitive mastery).
Reviewers mapped reported outcomes from all 27 included programs and found none that demonstrated organizational-level impacts or patient‑level outcomes (level 4).
Because the design is cross-sectional and sampling purposive/geographically constrained, causal inference and generalizability are limited.
Authors' stated limitations in the summary: cross-sectional design and purposive, geographically constrained sample (Karnataka, India).
Workplace stress is associated with lower employee retention.
PLS-SEM analysis on a cross-sectional survey of N = 350 pharmaceutical workers in Karnataka, India (purposive sampling). Reported direct path: Stress → Retention, β = 0.321, p < 0.001. (Note: the paper interprets this as stress reducing retention; sign/coding conventions of the variables are not detailed in the summary.)
Automated compliance and credentialing systems raise governance issues (auditability, appeals mechanisms) and risk incorrect automated deregistration if not properly governed.
Governance and algorithmic-risk discussion in the paper; logical argumentation rather than case-based evidence.
The paper models career progression as a continuous function and treats certification gaps as discontinuities that impede labour-market mobility.
Mathematical/conceptual modeling described in the methods (career-progression-as-continuous-function approach); this is a modeling choice reported in the paper rather than an empirical finding.
There is limited long-term impact evidence and few system-level assessments of AI in developing-country agriculture.
Authors' methodological caveat based on the temporal scope and types of studies available in the >60-study review.
The evidence base is skewed toward pilots and high‑performer contexts; there is a lack of long‑panel, multi‑project longitudinal studies to validate typical returns and scalability.
Authors' assessment of evidence types in the 160 studies: mix of conceptual papers, case studies, pilots, and only limited larger empirical evaluations.
Opacity, bias, and errors in AI systems demand auditing, standards, and governance (algorithmic accountability) to ensure trustworthy assessment.
Synthesis of literature on algorithmic bias and accountability plus policy analysis recommending audits and standards; supported by country cases that discuss governance concerns.
Student data used by AI vendors raises risks around consent, reuse, commercial exploitation, and other data-privacy concerns.
Policy analysis and literature on data governance, privacy law debates; examples from national policy documents in the comparative cases. No original data on breaches or misuse presented.
Limitations of the study include reliance on self-reported perceptions (subject to response and survivorship bias), lack of experimental/causal identification, potential non-representative sample, and cross-sectional design limiting inference about long-term productivity effects.
Authors' stated limitations in the paper summary.