Evidence (2608 claims)
Claim counts by topic:

| Topic | Claims |
|---|---|
| Adoption | 7395 |
| Productivity | 6507 |
| Governance | 5877 |
| Human-AI Collaboration | 5157 |
| Innovation | 3492 |
| Org Design | 3470 |
| Labor Markets | 3224 |
| Skills & Training | 2608 |
| Inequality | 1835 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
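As a minimal sketch of how such a matrix can be tabulated (assuming a flat per-claim export with hypothetical `outcome` and `direction` columns, not the dashboard's actual schema):

```python
import pandas as pd

# Hypothetical flat export of individual claims; column names are assumptions.
claims = pd.DataFrame({
    "outcome":   ["Firm Productivity", "Error Rate", "Firm Productivity", "Error Rate"],
    "direction": ["Positive", "Negative", "Mixed", "Positive"],
})

# Cross-tabulate outcome category against direction of finding,
# adding a Total margin as in the matrix above.
matrix = pd.crosstab(claims["outcome"], claims["direction"],
                     margins=True, margins_name="Total")
print(matrix)
```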
Active filter: Skills & Training
The study was conducted by the Mohammed bin Rashid School of Government’s Future of Government Center, in collaboration with global AI pioneers.
Authorship and collaboration statement in the report.
The report highlights the key findings of a field study covering ten Arab countries to explore the realities and challenges of AI governance.
Report statement describing the geographic scope of the field study (explicitly: ten Arab countries).
The recommendations are based on regional research that included hundreds of leaders active in the AI domains, from the public and private sectors.
Report statement claiming participant base of the underlying research (described as 'hundreds of leaders').
Zero-shot baselines and standard retrieval stagnate around 50-60% accuracy across model generations on the graduate-level final exam.
Pilot study reported on a full graduate-level final exam comparing zero-shot and standard retrieval baselines across model generations; reported accuracy range given as ~50-60%. Exact number of exam questions or models compared not stated.
The cooperative video game KeyWe, with a scripted agent, served as a valid testbed for studying human-agent teamwork and the effects of the training intervention.
Methodological choice: KeyWe was used as the experimental environment and the agent behavior was scripted for consistency; all behavioral and performance measures were collected within this setting.
Half of the participants received the teamwork training and half did not (between-subjects comparison).
Experimental design description: participants were split into trained and untrained groups (50/50).
The model yields two limits on the speed of learning and adoption: a structural limit determined by prerequisite reachability and an epistemic limit determined by uncertainty about the target.
Theoretical result stated in the paper (model-derived identification of two distinct limiting factors on learning speed).
Teaching is modeled as sequential communication with a latent target.
Modeling assumption explicitly stated in the paper (formalization of teaching in the theoretical framework).
The paper models the learner as a mind: an abstract learning system characterized by a prerequisite structure over concepts.
Modeling assumption explicitly stated in the paper (definition of the 'mind' in the theoretical model).
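The paper's formal definitions are not reproduced in this summary. Purely as an illustrative sketch of the two limits (all names, structures, and update rules below are assumptions, not the paper's model), a prerequisite graph caps how quickly concepts can be reached, while uncertainty about a latent target caps how quickly sequential messages can narrow the hypothesis space:

```python
import random

# Toy prerequisite structure: concept -> set of prerequisite concepts.
prereqs = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

def structural_steps(target, prereqs):
    """Minimum teaching rounds if each round can teach every concept
    whose prerequisites are already known (structural limit)."""
    known, steps = set(), 0
    while target not in known:
        teachable = {c for c, pre in prereqs.items() if pre <= known and c not in known}
        if not teachable:
            raise ValueError("target unreachable")
        known |= teachable
        steps += 1
    return steps

def epistemic_steps(n_candidates, seed=0):
    """Rounds of informative messages needed to single out a latent target
    among n_candidates hypotheses (epistemic limit, roughly log2 of the set)."""
    rng = random.Random(seed)
    candidates = list(range(n_candidates))
    target = rng.choice(candidates)
    steps = 0
    while len(candidates) > 1:
        # Each message lets the learner discard roughly half of the remaining hypotheses.
        mid = len(candidates) // 2
        candidates = candidates[:mid] if target < candidates[mid] else candidates[mid:]
        steps += 1
    return steps

print(structural_steps("D", prereqs))  # 3 rounds: A, then {B, C}, then D
print(epistemic_steps(16))             # 4 rounds for 16 candidate targets
```

In this toy picture, whichever of the two step counts is larger acts as the binding constraint on learning speed.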
This Article presents the results of an experiment in which a transcript of a hypothetical client interview involving potential disability discrimination, retaliation, and wrongful termination claims was submitted to each AI system, with prompts requesting identification and assessment of viable legal theories.
Methodological description of the experiment: one hypothetical client interview transcript fed to each of four AI engines with prompts to identify and assess legal theories.
Despite fears of mass unemployment, aggregate labor-market data through 2025 show limited disruption from generative AI.
Review of aggregate employment and labor-market studies and macro-level data through 2025 cited in the brief; methods include analyses of employment statistics and macro labor indicators (no single sample size reported).
Open research challenges that define the research agenda include scaling beyond benchmarks, achieving compositionality over changes, metrics for validating specifications, handling rich logics, and designing human-AI specification interactions.
Authors' explicit enumeration of open problems and a proposed multi-disciplinary research agenda; presented as expert opinion rather than empirical finding.
Self-concordance did not mediate the AI-over-questionnaire effect on goal progress.
Preplanned mediation model reported in the paper found no evidence that self-concordance mediated the AI vs questionnaire effect on goal progress; reported as non-significant in the preregistered analysis.
Compared with the matched written-reflection questionnaire, the AI did not significantly improve overall goal progress.
Preplanned comparison within the preregistered RCT; the difference between the AI and written-reflection conditions on overall goal progress at the two-week follow-up was reported as non-significant (exact p-value not given in the summary).
We conducted a preregistered three-arm randomized controlled trial (RCT) comparing an AI career coach ('Leon,' powered by Claude Sonnet), a matched structured written questionnaire, and a no-support control.
Preregistered RCT reported in the paper; three arms as described; total sample size N = 517; participants randomized to AI coach, written-reflection questionnaire, or no-support control; outcomes assessed at two-week follow-up.
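As a minimal sketch of the design only (simulated placeholder data, not the authors' analysis code), randomization of the 517 participants into the three arms and the preplanned AI-versus-questionnaire contrast could look like this:

```python
import numpy as np

rng = np.random.default_rng(2024)

# Near-balanced assignment of N = 517 participants to the three arms.
arms = np.resize(np.array(["ai_coach", "questionnaire", "control"]), 517)
assignment = rng.permutation(arms)

# Simulated placeholder outcome: goal progress at the two-week follow-up.
progress = rng.normal(loc=5.0, scale=2.0, size=517)

# Preplanned contrast: AI coach vs matched written questionnaire.
diff = (progress[assignment == "ai_coach"].mean()
        - progress[assignment == "questionnaire"].mean())
print(f"AI minus questionnaire mean goal progress: {diff:.2f}")
```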
Research agenda: empirical microdata on managerial time use, task-level automation, performance outcomes, and wage impacts are needed to quantify substitution versus complementarity and to evaluate human-in-the-loop designs' effects on firm performance and distributional outcomes.
Explicit methodological recommendation within the paper; identifies gaps due to the paper's conceptual (non-empirical) approach.
Practical recommendations for firms and policymakers include investing in training for AI curation/evaluation/coordination, experimenting with decentralised decision rights and governance safeguards, and monitoring competitive dynamics related to model/platform providers.
Policy and practitioner takeaways explicitly presented in the discussion/implications sections, deriving from the conceptual framework and mapped literature.
The paper recommends a research agenda for AI economists: causal microeconometric studies (DiD, IVs, RCTs), structural models with hybrid human–AI agents, measurement work on GenAI use, distributional analysis and policy evaluation.
Explicit recommendations listed in the implications and research agenda sections; logical follow‑on from bibliometric findings about gaps in causal and measurement evidence.
Bibliometric mapping profiles the intellectual structure and evolution of the field but does not establish causal effects of GenAI on organisational outcomes.
Methodological limitation explicitly stated in the paper; bibliometric approach (co‑word, citation, thematic mapping) is descriptive and historical in scope.
Co‑word and thematic analyses reveal six coherent conceptual clusters that bridge technical AI topics (e.g., LLMs, GANs) with managerial themes (e.g., autonomy, coordination, decision‑making).
Thematic mapping and co‑word network analysis performed on the 212‑paper corpus; identification of six clusters reported in results.
Bibliometric and conceptual tools (VOSviewer, Bibliometrix) were used to identify performance trends, co‑word structures, thematic maps, and conceptual evolution in the GenAI–organisation literature.
Methods section: use of VOSviewer for network visualization and Bibliometrix for bibliometric statistics, co‑word analysis, thematic mapping and Sankey thematic evolution.
The study analysed a corpus of 212 Scopus‑indexed publications covering 2018–2025 to map emergent literature on Generative AI and organisational change.
Bibliometric dataset constructed from Scopus; sample size = 212 peer‑reviewed articles; time window 2018–2025; analyses performed with Bibliometrix and VOSviewer.
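The study itself relied on VOSviewer and Bibliometrix for these steps. As a rough sketch of the co-word step only (the keyword lists below are invented, not drawn from the 212-paper corpus), co-occurrence counts over author keywords form the weighted network that such tools then cluster:

```python
from itertools import combinations
from collections import Counter

# Invented author-keyword lists standing in for a Scopus corpus.
papers = [
    ["generative ai", "llm", "decision-making"],
    ["generative ai", "gan", "coordination"],
    ["llm", "autonomy", "decision-making"],
]

# Co-word analysis counts how often two keywords appear in the same paper;
# the resulting weighted edge list is what clustering tools operate on.
cooccurrence = Counter()
for keywords in papers:
    for a, b in combinations(sorted(set(keywords)), 2):
        cooccurrence[(a, b)] += 1

for pair, weight in cooccurrence.most_common(5):
    print(pair, weight)
```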
Research agenda: causal studies (panel data, quasi-experiments) are needed to estimate effects of AI exposure on employment outcomes and to evaluate retraining/income-support interventions for pre-retirement populations.
Authors’ stated recommendation based on limits of cross-sectional regression results from the n=889 survey and the identified need to move from association to causation.
Study limitations: cross-sectional design, self-reported intentions, potential unobserved confounders, and limited generalizability because the sample covers only three cities (Beijing, Guangzhou, Lanzhou).
Explicit methodological statements in the paper describing data and design: cross-sectional survey of 889 respondents from three cities and reliance on self-reported employment intentions.
Outcomes reported are primarily self-reported psychological measures rather than objective productivity metrics.
Paper reports measurement instruments focused on self-reported self-efficacy, psychological ownership, meaningfulness, and enjoyment/satisfaction; no primary objective productivity metrics reported.
The experiment was pre-registered, used occupation-specific writing tasks, and employed a between-subjects design with three conditions (No-AI, Passive AI, Active collaboration).
Study design reported in the paper: pre-registration statement, N = 269, between-subjects assignment to three conditions using occupation-specific writing tasks.
Active, collaborative AI use preserves perceived meaningfulness of work at levels comparable to independent work and does not produce the lasting psychological costs seen with passive use.
Pre-registered experiment (N = 269) with post-manipulation and post-return measures; Active-collaboration condition matched No-AI on meaningfulness and showed no persistent declines after returning to manual tasks.
Active, collaborative AI use preserves psychological ownership of outputs at levels comparable to independent work.
Pre-registered experiment (N = 269); Active-collaboration condition reported ownership levels similar to No-AI condition on self-report scales.
Active, collaborative AI use (human drafts first, then uses AI to refine) preserves self-efficacy at levels comparable to independent (no-AI) work.
Pre-registered experiment (N = 269) comparing Active-collaboration and No-AI conditions; no statistically meaningful differences in self-efficacy between them (self-reported measures).
The work is qualitative and exploratory — presenting naturalistic phenomena rather than causal empirical estimates, and is intended to be hypothesis-generating rather than definitive.
Methodology explicitly stated: naturalistic, qualitative daily observations over one month across multiple platforms; comparative observational documentation without experimental manipulation or causal identification.
Results are from role-play contexts and short-term interventions; economic estimates of benefit require validation in field settings, across diverse populations, and with different LLMs.
Authors' caveats and limitations stated in the paper noting external validity concerns and the experimental context (role-play, short-term follow-up).
Outcome measures included alignment to the normative taxonomy (coding/automated), recipient-rated perceptions of being heard/validated, and blinded empathy judgments.
Methods section description listing primary and secondary outcomes used in the trial and evaluations.
A data-driven taxonomy was derived mapping common idiomatic empathic moves (e.g., validation, perspective-taking, emotional labeling, offers of support) used in naturalistic support conversations.
Textual analysis of the collected corpus (33,938 messages) produced an operational taxonomy of idiomatic empathic expressions used in the role-play dialogues.
The Lend an Ear platform collected a large conversational corpus: 33,938 messages across 2,904 conversations with 968 participants.
Dataset description reported in the paper specifying counts of participants, conversations, and messages used to build and analyze communication patterns.
Key empirical metrics introduced and used are: AI adoption rates (sector-level intensity), Skill shift index, Hybrid job share, and employment levels/net changes by sector.
Methods description listing the constructed metrics used in the simulated dataset and subsequent analyses (definitions and calculation procedures provided in the paper).
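The paper supplies its own definitions and calculation procedures for these metrics. The sketch below shows one plausible way sector-level versions could be computed from a toy panel; every column name and formula here is an assumption, not the paper's specification:

```python
import pandas as pd

# Toy sector-level panel; columns and formulas are illustrative assumptions.
panel = pd.DataFrame({
    "sector":            ["Finance", "Finance", "Retail", "Retail"],
    "year":              [2020, 2024, 2020, 2024],
    "ai_adopters":       [120, 310, 40, 95],
    "firms":             [400, 420, 500, 510],
    "hybrid_jobs":       [15_000, 42_000, 3_000, 9_000],
    "total_jobs":        [900_000, 910_000, 1_200_000, 1_180_000],
    "ai_skill_postings": [0.08, 0.21, 0.02, 0.06],  # share of postings requiring AI skills
})

# Sector-level adoption intensity and hybrid job share.
panel["ai_adoption_rate"] = panel["ai_adopters"] / panel["firms"]
panel["hybrid_job_share"] = panel["hybrid_jobs"] / panel["total_jobs"]

# One possible "skill shift index": change in the AI-skill posting share per sector.
shift = (panel.pivot(index="sector", columns="year", values="ai_skill_postings")
              .pipe(lambda d: d[2024] - d[2020])
              .rename("skill_shift_index"))

# Net employment change by sector over the same window.
employment_change = (panel.pivot(index="sector", columns="year", values="total_jobs")
                          .pipe(lambda d: d[2024] - d[2020])
                          .rename("net_employment_change"))

print(panel[["sector", "year", "ai_adoption_rate", "hybrid_job_share"]])
print(pd.concat([shift, employment_change], axis=1))
```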
The study's main limitations include reliance on a simulated dataset rather than exhaustive administrative microdata, literature limited to selected publishers/years, and correlational (not causal) identification of some effects.
Authors' explicitly stated limitations in the paper's methods and discussion sections describing data choices (simulated dataset, selected publishers 2020–2024) and the observational/correlational nature of several analyses.
This work is conceptual/theoretical and reports no original empirical dataset; it explicitly calls for mixed-methods empirical validation (case studies, field experiments, longitudinal studies), measurement development, and multi-level data collection.
Explicit methodological statement in the paper describing its nature as a theoretical synthesis and listing empirical needs; no empirical sample provided.
Four autonomous agents were benchmarked on the same fresh CTF challenge set alongside human teams.
Benchmarking experiment described in the study: four autonomous AI agents evaluated on the identical fresh challenge set used in the live onsite CTF.
The study's empirical base consists of 40 semi-structured interviews with cross-industry project practitioners in the UK, analyzed using thematic qualitative methods.
Stated data and methods in the paper: sample size (40), interview method, cross-industry sampling, and thematic analysis.
Limitation: Implementation heterogeneity — the costs and feasibility of the recommended HR changes vary by context and may affect generalisability.
Explicit limitation acknowledged in the paper; drawn from theoretical reasoning about contextual heterogeneity and practitioner variability.
Limitation: The framework is conceptual and requires empirical validation across sectors, firm sizes and AI‑intensity levels.
Explicit limitation acknowledged by the authors; based on the paper's method (theoretical synthesis, no original data).
The paper generates empirically testable propositions (e.g., how leader practices affect AI adoption speed, task reallocation, productivity, error rates, employee well‑being and turnover) and suggests natural‑experiment settings for evaluation.
Stated methodological output of the conceptual synthesis; the paper lists candidate empirical tests and research opportunities but contains no original empirical tests.
The available evidence consists mainly of promising empirical studies and case studies, but there are few long-run, generalized ROI or productivity estimates; results are heterogeneous across therapeutic areas.
Self-described limitation of the narrative review: heterogeneity of study designs and outcomes precluded pooled quantitative estimates and long-run ROI assessment.
AI applications span the full drug development pipeline, including target discovery, in silico screening and de novo design, preclinical safety models, clinical trial design and patient selection/monitoring, and post-marketing surveillance.
Comprehensive literature synthesis across preclinical, clinical, and post-marketing sources in the narrative review summarizing documented uses across these stages.
Current evidence is illustrative rather than systematic; there is a lack of long-run, quantitative measures of AI’s effect on late-stage clinical outcomes in the literature reviewed.
Explicit methodological statement in the paper: study is an expert/opinion synthesis and narrative review with no new causal econometric estimates or primary experimental data.
Suggested metrics for researchers and investors to monitor include R&D cycle time, cost per IND/NDA, proportion of projects using AI, success rates at development stages, market concentration measures, and investment flows into AI-enabled biotech vs incumbents.
Recommendations made in the Implications section as metrics to watch; no empirical tracking or baseline measures provided.
Limitations of the analysis include limited empirical validation of archetypes or impacts and potential selection bias toward prominent firms and technologies.
Explicit limitations stated in the Data & Methods section of the paper.
The paper is an editorial/conceptual synthesis rather than a primary empirical study: it uses qualitative analysis and illustrative examples, and reports no new quantitative estimates.
Explicit statement in the Data & Methods section of the paper describing document type, approach, evidence base, and limitations.
Ethical oversight and governance (addressing bias, consent, downstream risks) are critical constraints that must be addressed for AI to generate sustained benefits.
Normative synthesis referencing common ethical concerns; no empirical evaluation of oversight mechanisms in the paper.
Transparency and auditability for model behavior, provenance, and decisions are essential for trustworthy deployment and regulatory acceptance.
Policy and governance synthesis drawing on regulatory dynamics; no empirical study of regulatory outcomes included.