Evidence (7953 claims)

Claims by topic (topics overlap, so the per-topic counts sum to more than the 7953 total):

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding. Some row totals exceed the sum of the four direction columns shown.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
The study's main limitations include reliance on a simulated dataset rather than exhaustive administrative microdata, a literature base limited to selected publishers and years, and correlational (not causal) identification of some effects.
Authors' explicitly stated limitations in the paper's methods and discussion sections describing data choices (simulated dataset, selected publishers 2020–2024) and the observational/correlational nature of several analyses.
Further research is needed: randomized controlled trials, long-term impact measurement (earnings, employment stability, skill accumulation), distributional analysis, and model audits for bias.
Authors' stated research agenda and recommendations; not an empirical finding but a methodological recommendation following the pilot.
The authors explicitly note limitations: the study focuses on prediction (not causation), results are sensitive to data quality, workforce records may contain biases, and practical constraints like privacy and deployment complexity limit direct operational adoption.
Limitations section described by the authors listing prediction-versus-causation distinction, sensitivity to data quality, potential biases, privacy concerns, and deployment complexity.
The study used a reproducible modeling pipeline (data cleaning, feature engineering, model training and tuning, systematic evaluation) applied to several freely available workforce datasets to enable replication.
Methods section describes a reproducible workflow including preprocessing steps, engineered features, hyperparameter tuning for each model class, cross-validation, and use of publicly available datasets.
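For readers who want a concrete picture of such a workflow, here is a minimal sketch (illustrative only, not the paper's code; the dataset path, column names, and model grid are hypothetical placeholders):

```python
# Minimal sketch of a reproducible tabular-ML workflow: preprocessing,
# feature handling, hyperparameter tuning, and cross-validated evaluation.
# The CSV path and column names are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("workforce.csv")  # hypothetical public workforce dataset
X, y = df.drop(columns=["attrition"]), df["attrition"]

numeric = ["age", "tenure_years", "salary"]        # hypothetical columns
categorical = ["industry", "education_level"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

pipe = Pipeline([("prep", preprocess),
                 ("model", GradientBoostingClassifier(random_state=0))])

# Systematic tuning + evaluation: each fold re-fits the whole pipeline,
# so preprocessing never leaks information from validation folds.
search = GridSearchCV(
    pipe,
    param_grid={"model__n_estimators": [100, 300],
                "model__learning_rate": [0.05, 0.1]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Fixing the random seeds and fitting all preprocessing inside the cross-validated pipeline is what makes this kind of workflow replicable on the same public data.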
This work is conceptual/theoretical and reports no original empirical dataset; it explicitly calls for mixed-methods empirical validation (case studies, field experiments, longitudinal studies), measurement development, and multi-level data collection.
Explicit methodological statement in the paper describing its nature as a theoretical synthesis and listing empirical needs; no empirical sample provided.
Four autonomous agents were benchmarked on the same fresh capture-the-flag (CTF) challenge set alongside human teams.
Benchmarking experiment described in the study: four autonomous AI agents evaluated on the identical fresh challenge set used in the live onsite CTF.
Data and methods: the study used an online experiment with 861 online-retail employees performing short-duration, virtual, task-focused collaborations; analyses focused on direct effects, moderation (emotion and partner type), mediation (service empathy), and moderated-mediation.
Methods description in the paper specifying design, sample size (n = 861), task context (temporary virtual teamwork), and analytic approach (hypothesis tests including moderation and mediation analyses).
Teamwork partner type (human vs AI) has no direct, significant effect on collaboration proficiency for temporary virtual tasks.
Online experiment with employees in the online-retail industry (n = 861). Hypothesis testing showed no significant main effect of partner type on the outcome variable 'collaboration proficiency' in the reported analyses.
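For readers unfamiliar with this analytic pattern, a minimal sketch of the moderation and mediation logic (not the authors' code; variable names and the simplified indirect-effect calculation are illustrative):

```python
# Sketch of moderation/mediation testing: OLS with an interaction term for
# moderation, and a product-of-coefficients mediation check. The data file
# and variable names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("experiment.csv")  # hypothetical: n = 861 participants
# partner_ai: 1 = AI partner, 0 = human; emotion, empathy, proficiency: scales

# Direct effect of partner type, plus moderation by emotion
mod = smf.ols("proficiency ~ partner_ai * emotion", data=df).fit()
print(mod.summary().tables[1])

# Mediation via service empathy: a-path and b-path regressions
a_path = smf.ols("empathy ~ partner_ai", data=df).fit()
b_path = smf.ols("proficiency ~ empathy + partner_ai", data=df).fit()
indirect = a_path.params["partner_ai"] * b_path.params["empathy"]
print("indirect effect (a*b):", round(indirect, 3))
```

In published work the indirect effect would typically be given bootstrap confidence intervals; the sketch shows only the structure of the a-path/b-path logic.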
Empirical strategy: panel regressions with a quadratic AI specification and interaction terms, controlling for firm covariates, with fixed effects and robustness checks (alternative measures, sub-samples).
Methods section description: panel regressions including AI and AI^2, interactions for moderators, controls, fixed effects, and robustness analyses reported in the paper.
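A generic form of such a specification, in illustrative notation (not necessarily the paper's exact model):

```latex
Y_{it} = \beta_1 \, AI_{it} + \beta_2 \, AI_{it}^{2}
       + \beta_3 \,(AI_{it} \times M_{it})
       + \gamma' X_{it} + \alpha_i + \delta_t + \varepsilon_{it}
```

where $Y_{it}$ is the outcome for firm $i$ in year $t$, $M_{it}$ a moderator, $X_{it}$ a vector of firm controls, and $\alpha_i$, $\delta_t$ firm and year fixed effects; a significant $\beta_2$ indicates curvature in the AI-outcome relationship rather than a purely linear effect.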
Data/sample claim: the empirical analysis uses a panel of 2,575 Chinese listed firms observed from 2013 to 2023.
Paper-stated sample description (panel dataset covering 2013–2023, N = 2,575 firms).
The paper recommends an empirical research agenda including field experiments comparing teams with and without AI mediation, structural models of labor supply and wages under reduced language frictions, microdata analysis of adopters, and measurement studies for coordination costs and mediated-action reliability.
Explicit recommendations and research agenda stated in the paper; this is a descriptive claim about the paper's content rather than an empirical finding.
The paper's primary approach is conceptual/theoretical development and agenda-setting; it does not report large-scale empirical or experimental data.
Explicit methods statement in the paper: synthesis, illustrative examples, framework development; absence of reported empirical sample or experiments.
The study's empirical base consists of 40 semi-structured interviews with cross-industry project practitioners in the UK, analyzed using thematic qualitative methods.
Stated data and methods in the paper: sample size (40), interview method, cross-industry sampling, and thematic analysis.
Limitation: Implementation heterogeneity — the costs and feasibility of the recommended HR changes vary by context and may affect generalisability.
Explicit limitation acknowledged in the paper; drawn from theoretical reasoning about contextual heterogeneity and practitioner variability.
Limitation: The framework is conceptual and requires empirical validation across sectors, firm sizes and AI‑intensity levels.
Explicit limitation acknowledged by the authors; based on the paper's method (theoretical synthesis, no original data).
The paper generates empirically testable propositions (e.g., how leader practices affect AI adoption speed, task reallocation, productivity, error rates, employee well‑being and turnover) and suggests natural‑experiment settings for evaluation.
Stated methodological output of the conceptual synthesis; the paper lists candidate empirical tests and research opportunities but contains no original empirical tests.
Typical methods include deep learning for property prediction and representation learning, protein-structure modelling tools, generative models for de novo design, NLP for knowledge extraction, and in silico ADME/Tox models integrated with traditional computational chemistry.
Methodological survey in the paper listing these approaches and examples of their application.
Commonly used data types in AI-driven drug discovery include biochemical/binding assay data, protein structural data, HTS results, ADME/Tox and PK datasets, omics/phenotypic readouts, and scientific literature/patents.
Cataloguing of data sources used across studies and company pipelines described in the paper.
AI became widely adopted in pharmaceutical discovery during the 2010s, driven by greater compute, larger datasets, and advances in deep learning.
Historical overview and trend analysis in the paper referencing increased compute availability, growth in public and proprietary datasets, and the rise of deep-learning publications and tools over the 2010s.
The available evidence consists mainly of promising empirical studies and case studies, but there are few long-run, generalized ROI or productivity estimates; results are heterogeneous across therapeutic areas.
Self-described limitation of the narrative review: heterogeneity of study designs and outcomes precluded pooled quantitative estimates and long-run ROI assessment.
AI applications span the full drug development pipeline, including target discovery, in silico screening and de novo design, preclinical safety models, clinical trial design and patient selection/monitoring, and post-marketing surveillance.
Comprehensive literature synthesis across preclinical, clinical, and post-marketing sources in the narrative review summarizing documented uses across these stages.
Current evidence is illustrative rather than systematic; there is a lack of long-run, quantitative measures of AI’s effect on late-stage clinical outcomes in the literature reviewed.
Explicit methodological statement in the paper: study is an expert/opinion synthesis and narrative review with no new causal econometric estimates or primary experimental data.
Suggested metrics for researchers and investors to monitor include R&D cycle time, cost per IND/NDA, proportion of projects using AI, success rates at development stages, market concentration measures, and investment flows into AI-enabled biotech vs incumbents.
Recommendations made in the Implications section as metrics to watch; no empirical tracking or baseline measures provided.
Limitations of the analysis include limited empirical validation of archetypes or impacts and potential selection bias toward prominent firms and technologies.
Explicit limitations stated in the Data & Methods section of the paper.
The paper is an editorial/conceptual synthesis rather than a primary empirical study: it uses qualitative analysis and illustrative examples, and reports no new quantitative estimates.
Explicit statement in the Data & Methods section of the paper describing document type, approach, evidence base, and limitations.
Ethical oversight and governance (addressing bias, consent, downstream risks) are critical constraints that must be addressed for AI to generate sustained benefits.
Normative synthesis referencing common ethical concerns; no empirical evaluation of oversight mechanisms in the paper.
Transparency and auditability for model behavior, provenance, and decisions are essential for trustworthy deployment and regulatory acceptance.
Policy and governance synthesis drawing on regulatory dynamics; no empirical study of regulatory outcomes included.
Rigorous model validation and reproducibility across datasets and settings are necessary constraints for successful AI deployment.
Normative claim in the editorial based on reproducibility concerns in ML and biomedical research; no reported validation trials within the paper.
The paper is primarily discursive and invitational: it opens a dialogue and proposes a research agenda rather than providing definitive empirical answers.
Stated methodological stance and limits: conceptual/philosophical analysis, interdisciplinary literature synthesis, qualitative/illustrative examples, and explicit note of no systematic empirical evaluation.
Operators and regulators should prioritize independent model audits, disclosure of data use, fairness/error rates, and field experiments to quantify causal impacts and heterogeneous effects.
Policy recommendations and research priorities summarized in the review based on identified methodological and governance gaps.
Research gaps include the need for robust causal evaluations (RCTs, field experiments), standardized metrics, transparency/interpretability, fairness analysis, and cross‑jurisdictional studies.
Review's recommendations and identified gaps, noting scarcity of RCTs/longitudinal work and calls for standardized outcomes and fairness checks.
Heterogeneous study designs, outcomes, and measures across the literature hinder quantitative meta‑analysis and synthesis of effectiveness.
Review states heterogeneity of designs and outcome measures as a limitation preventing meta‑analysis.
Typical data used in studies are platform behavioural logs (bets, stakes, timestamps, session durations), account metadata, and in some cases limited self‑report measures.
Review summary of data sources across included studies listing platform logs and metadata as primary inputs to algorithms.
Evaluation approaches in the reviewed literature varied widely, with many studies using retrospective accuracy metrics (AUC, precision/recall) rather than causal impact measures on harm reduction.
Methods synthesis in review: prevalence of supervised/unsupervised ML with retrospective performance reporting; few RCTs or field experiments reported.
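To make the distinction concrete, here is a minimal sketch of the retrospective-metrics pattern (synthetic labels and scores, not drawn from any included study):

```python
# Sketch of the retrospective evaluation pattern common in this literature:
# scoring held-out predictions with AUC and precision/recall rather than
# measuring causal impact on harm. All values below are synthetic.
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                      # hypothetical at-risk labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.65, 0.5, 0.9]   # model risk scores
y_pred = [int(s >= 0.5) for s in y_score]               # fixed decision threshold

auc = roc_auc_score(y_true, y_score)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
print(f"AUC={auc:.2f} precision={prec:.2f} recall={rec:.2f}")
# A strong AUC on historical logs says nothing about whether acting on these
# scores reduces gambling harm; that requires an RCT or field experiment.
```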
Four primary application areas were identified: (1) behavioural monitoring and feedback, (2) predictive risk modelling, (3) decision support and AI classifiers, and (4) limit‑setting and self‑exclusion tools.
Thematic synthesis of included studies categorizing described applications into four main areas (review taxonomy).
Searches were performed in Web of Science, PubMed, Scopus, EBSCO and IEEE, plus manual searches, following PRISMA guidelines.
Methods section of the review specifying databases searched and PRISMA-guided review process.
The review included 68 empirical and methodological studies on deep technologies in online gambling.
Systematic review following PRISMA; searches of Web of Science, PubMed, Scopus, EBSCO, IEEE and manual searching produced 68 included studies (count reported in paper).
The collection includes a mix of methodological papers, empirical applications demonstrating ecological insight, and translational work focused on policy or conservation practice.
Study-types categorization provided in the paper (descriptive tally/characterization of the kinds of contributions in the collection).
Methods in the collection span from automated image and signal processing for routine tasks to integrated modelling that couples ecological theory with data‑driven methods.
Methods-scope summary in the paper describing the range of AI/ML approaches used across the collection (descriptive across studies).
The collection uses large ecological observational datasets such as camera‑trap imagery, sensor streams, biodiversity surveys, and other high‑volume ecological monitoring data.
Data & methods section listing the data types represented across the reviewed papers (descriptive inventory of dataset types used in the collection).
Recommendation (research): Future research should link AI adoption to objective performance metrics (profitability, default rates, processing times) and use longitudinal or quasi-experimental designs to identify causal effects.
Authors' suggested research directions noted in the summary, motivated by limitations of cross-sectional, self-reported data.
The summary omits important reporting details: p-values, standard errors, model control variables, and exact variable operationalizations.
Explicit reporting gap noted in the paper summary (absence of p-values, SEs, controls, and operationalization details).
Because the data are cross-sectional and self-reported, the design limits causal inference about AI adoption causing the observed outcomes.
Study design (cross-sectional survey, self-reported measures) and explicit limitation noted in the paper summary.
Key measures are self-reported Likert scales for AI adoption/usage and the dependent outcomes (financial decision-making efficiency, operational efficiency, financial resilience, and AI-based analytics effectiveness).
Measurement description in Methods: independent and dependent variables reported as self-reported Likert measures collected in the cross-sectional survey.
The study is a cross-sectional quantitative survey of 312 professionals in banks, fintechs, and financial service firms.
Study design and sample description reported in Data & Methods; sample size explicitly given as N = 312 and composition described as professionals across financial institutions, fintech organizations, and financial service companies.
The SKILL.md used in the with-skill condition encodes workflow logic, API patterns, and business rules as portable domain guidance for agents.
Paper description of the with-skill intervention specifying the content and intended role of SKILL.md.
We evaluated open-weight models under two conditions: baseline (generic agent with tool access but no domain guidance) and with-skill (agent augmented with a portable SKILL.md document encoding workflow logic, API patterns, and business rules).
Experimental design in paper describing the two agent conditions; SKILL.md described as the injected domain guidance artifact.
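A minimal sketch of how such a two-condition setup could be wired (SKILL.md is the paper's artifact name, but this harness and its prompt-assembly logic are hypothetical, not the released framework):

```python
# Sketch of the baseline vs with-skill conditions: the only difference is
# whether the portable SKILL.md guidance is appended to the system prompt.
from pathlib import Path

BASE_PROMPT = "You are a telecom operations agent with tool access."

def build_system_prompt(with_skill: bool) -> str:
    """Baseline agents get only the generic prompt; with-skill agents are
    augmented with the SKILL.md domain guidance (workflow logic, API
    patterns, business rules)."""
    if not with_skill:
        return BASE_PROMPT
    skill = Path("SKILL.md").read_text()
    return f"{BASE_PROMPT}\n\n# Domain guidance\n{skill}"

for condition in (False, True):
    prompt = build_system_prompt(with_skill=condition)
    print(f"condition={'with-skill' if condition else 'baseline'}, "
          f"prompt length={len(prompt)}")
```

Keeping the model, tools, and scenarios identical across conditions isolates the contribution of the injected domain guidance.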
Each scenario is grounded in live mock API servers with seeded production-representative data, MCP tool interfaces, and deterministic evaluation rubrics combining response content checks, tool-call verification, and database state assertions.
Methods/benchmark design described in paper specifying environment: live mock APIs, seeded data, MCP tool interfaces, and deterministic evaluation combining content checks, tool-call verification, and DB assertions.
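A minimal sketch of what a deterministic rubric combining those three check types could look like (the scenario structure, tool name, and helper types below are hypothetical, not the released framework):

```python
# Sketch of a deterministic grading rubric: response-content checks,
# tool-call verification, and database state assertions, all combined
# into a single pass/fail result.
from dataclasses import dataclass

@dataclass
class Rubric:
    required_phrases: list[str]                   # must appear in the reply
    expected_tool_calls: list[tuple[str, dict]]   # (tool name, required args)
    expected_db_state: dict                       # rows the mock DB must hold

def grade(rubric: Rubric, reply: str, tool_calls, db) -> dict:
    content_ok = all(p.lower() in reply.lower()
                     for p in rubric.required_phrases)
    calls_ok = all(
        any(name == c_name and args.items() <= c_args.items()
            for c_name, c_args in tool_calls)
        for name, args in rubric.expected_tool_calls)
    db_ok = all(db.get(k) == v for k, v in rubric.expected_db_state.items())
    return {"content": content_ok, "tools": calls_ok, "db": db_ok,
            "pass": content_ok and calls_ok and db_ok}

# Hypothetical scenario: tool name loosely modeled on a TMF622 ordering API.
rubric = Rubric(["order created"],
                [("TMF622_createOrder", {"state": "acknowledged"})],
                {"order:42": "acknowledged"})
print(grade(rubric, "Order created and acknowledged.",
            [("TMF622_createOrder", {"state": "acknowledged", "id": 42})],
            {"order:42": "acknowledged"}))
```

Because every check is an exact string, argument, or state comparison, two graders (or two runs) produce identical scores, which is what makes the rubric deterministic.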
SKILLS comprises 37 telecom operations scenarios spanning 8 TM Forum Open API domains (TMF620, TMF621, TMF622, TMF628, TMF629, TMF637, TMF639, TMF724).
Framework specification in the paper; explicit statement of scenario count (37) and list of 8 TMF Open API domains.
We introduce SKILLS (Structured Knowledge Injection for LLM-driven Service Lifecycle operations), a benchmark framework for telecom operations.
Paper describes the design and release of the SKILLS benchmark framework as the contribution; methods section outlines framework components and usage.