Evidence (2340 claims)
Claims by topic:
- Adoption: 5267
- Productivity: 4560
- Governance: 4137
- Human-AI Collaboration: 3103
- Labor Markets: 2506
- Innovation: 2354
- Org Design: 2340
- Skills & Training: 1945
- Inequality: 1322
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Org Design
The observed behaviors stem from a root cause: current models are trained as monolithic agents, so splitting them into director/worker roles conflicts with their training distribution. Keeping each model close to its trained mode (text generation for the manager, tool use for the worker) and externalizing the organizational structure to code lets the pipeline succeed.
Qualitative analysis and interpretation of experimental results and pipeline design choices reported in the paper (comparison of different pipeline structures and model modes).
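As an illustration of "externalizing organizational structure to code," here is a minimal sketch (not the paper's implementation): the director/worker split lives in plain Python, and each model is only asked to do what it was trained for. `llm_generate` and `llm_tool_call` are hypothetical stand-ins for any text-completion and tool-use API.

```python
def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in: text-only completion from the 'manager' model."""
    raise NotImplementedError

def llm_tool_call(task: str) -> str:
    """Hypothetical stand-in: tool-using execution by the 'worker' model."""
    raise NotImplementedError

def run_pipeline(goal: str) -> list[str]:
    # The manager stays in its trained mode: it only generates text
    # (a plan, one subtask per line); it never "commands" another agent.
    plan = llm_generate(f"Break this goal into ordered subtasks:\n{goal}")
    results = []
    for subtask in filter(None, map(str.strip, plan.splitlines())):
        # The worker stays in its trained mode: tool use on a single task.
        results.append(llm_tool_call(subtask))
    # The director/worker relationship is enforced here, in ordinary code,
    # rather than asked of either model in-context.
    return results
```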
The paper provides supporting empirical evidence spanning frontier laboratory dynamics, post-training alignment evolution, and the rise of sovereign AI as a geopolitical selection pressure.
Empirical/observational sections in the paper that the authors state cover those three areas (specific datasets, experiments, or case studies are referenced in the text but not quantified in the abstract).
Macroeconomic effects remain hard to observe because of a 'productivity J-curve': firms often must invest in organizational changes first and only later realize measurable financial/productivity gains from AI.
Conceptual synthesis supported by firm-level case studies and empirical papers in the reviewed literature indicating implementation lags; the brief frames this as an interpretation of mixed short-run macro evidence rather than a single causal estimate.
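A toy numeric sketch of the J-curve mechanism (all numbers invented for illustration): up-front organizational investment makes measured net gains dip before lagged AI-driven gains dominate.

```python
# Year-by-year toy cash flows: reorganization costs come first,
# AI-related gains arrive with a lag, so net gains trace a "J".
org_investment = [100, 100, 0, 0, 0]     # up-front organizational change
ai_gains       = [0, 20, 60, 90, 110]    # gains realized with a lag

net = [g - c for g, c in zip(ai_gains, org_investment)]
print(net)   # [-100, -80, 60, 90, 110]: down first, up later
```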
The success of regulatory sandboxes ultimately depends on sound institutional safeguards, proportionality, and alignment with broader policy objectives.
Normative conclusion derived from the paper's analytical framework and comparative lessons (no empirical validation reported in the abstract).
Organisational rules, regulatory constraints, and transparency requirements materially shape micro-level human–AI interactions and can alter adoption incentives and accountability outcomes.
Conceptual governance argument linking institutional constraints to human–AI design choices; theoretical reasoning, no empirical policy evaluation provided.
Potential productivity gains from automating routine informational tasks are conditional: net gains depend on managerial capacity to integrate AI outputs into systemic decision-making and on governance structures.
Conceptual conditional claim derived from integration of systems thinking and algorithmic optimisation literatures; no empirical measurement of productivity effects.
Information-processing and optimisation tasks exhibit clear substitution pressure (they are the most automatable), whereas relational and normative tasks remain complementary to human labour.
Theory-driven claim combining managerial role analysis with general automation/complementarity logic from AI economics; conceptual prediction without empirical quantification.
Human–algorithm architectures can take three forms—augment (assist), displace (replace), or reconfigure (redistribute) cognitive tasks—and their design depends on organisational design, regulation, and decision-structure rules.
Taxonomic conceptualization derived from cross-framework analysis; prescriptive mapping rather than empirical classification; no sample.
Interpersonal coordination roles (disturbance handler, liaison, leader) retain strong human elements (influence, ethics, legitimacy) that are difficult to fully algorithmise.
Conceptual argument based on the nature of relational and legitimacy-based tasks within Mintzberg’s framework and limits of algorithmic substitution; theoretical only.
Entrepreneurial and disturbance-handling roles become hybrid decision zones requiring integrated strategic and computational reasoning (modelling, simulation, anomaly detection plus contextual interpretation and values-based trade-offs).
Analytical synthesis of role demands and computational affordances; cross-framework analysis producing a hybrid strategic–computational characterization; no primary data.
Roles that rely on relational intelligence, ethical judgement, and influence (leader, liaison, figurehead, negotiator) remain primarily strategic but are increasingly supported by predictive and diagnostic analytics.
Role-specific effects derived from cross-framework conceptual mapping (Mintzberg roles × computational thinking); theoretical argumentation rather than empirical measurement.
AI systematically reconfigures managerial work by augmenting, displacing, or reconfiguring cognitive tasks across Mintzberg’s ten managerial roles.
Conceptual synthesis and comparative role mapping integrating Mintzberg’s ten managerial roles with Senge’s Five Disciplines and computational thinking; theoretical analysis only (no primary empirical data; no sample).
Hybrid norms combined with AI platforms lower coordination costs and may encourage more decentralized or platform‑based organizational structures, changing the premium on co‑location.
Theoretical integration of organizational economics and digital platform literature; supported by conceptual examples but no firm‑level causal analysis in the paper.
Differential access to informal learning and sponsorship in hybrid settings can produce long‑term human‑capital inequalities; AI-based mentoring and visibility tools may partially mitigate these gaps but risk biased recommendations if trained on skewed data.
Synthesis of literature on mentorship, social capital, and algorithmic bias; illustrative case examples rather than empirical evaluation of AI mentoring systems.
Geographic dispersion plus AI-enabled remote hiring can widen the labor supply for firms, potentially compressing wages for some roles while raising returns to digital-collaboration skills.
Economic reasoning and literature review on remote hiring and labor supply effects; the paper offers conceptual arguments rather than presenting empirical wage-impact estimates.
Automation of routine tasks may shift task content toward relational and creative work, areas where hybrid arrangements influence social capital accumulation.
Theoretical argument combining automation literature with sociological perspectives on social capital; no direct empirical measurement or longitudinal data in the paper.
Hybrid work complicates traditional productivity metrics, making AI-driven analytics and monitoring tools more attractive but creating trade-offs between measurement accuracy, privacy, and employee trust.
Conceptual argument synthesizing literature on measurement, monitoring, and AI tools; no empirical evaluation of specific tools or datasets in the paper.
Sustaining productivity and organizational culture under hybrid arrangements depends crucially on leadership practices—trust, communication, and fairness—and on inclusive policies that explicitly manage equity, well‑being, and flexibility.
Comparative case illustrations and management literature integration; recommendations derived from secondary sources and theoretical argumentation rather than controlled empirical testing.
Dispersed work alters identity construction, belonging, and social cohesion; digital interactions reshape workplace rituals and norms.
Sociological literature synthesis and qualitative case illustrations emphasizing identity and ritual processes; no longitudinal or quantitative measures provided in the paper.
The paper proposes an 'algorithmic workplace' framework emphasising hybrid agency (agents composed of humans plus GenAI), decentralised decision processes, and erosion of rigid managerial boundaries.
Conceptual synthesis derived from thematic mapping, co‑word analysis and interpretive discussion of the mapped literature; framework presented as the article's conceptual contribution.
Passive AI use produced an initial increase in enjoyment/satisfaction that reversed once participants returned to manual work.
Pre-registered experiment (N = 269) measuring enjoyment/satisfaction before and after a return to manual work; the passive-copy condition showed short-term gains in enjoyment/satisfaction that declined once participants resumed manual tasks.
Realizing NLP value in banks requires organizational investments (data pipelines, model deployment, CRM integration) and complementarity between AI tools and managerial/IT capabilities; returns will depend on these complementarities.
Conceptual implication derived from review of applied/engineering papers and literature on technology complementarities; not directly estimated empirically in the review.
Automated tax-preparation and filing could increase compliance rates but also make tax bases more sensitive to automated tax-optimization strategies, requiring updated regulatory oversight and audit tools.
Paper's policy and economic implications section combining case-based observations and literature; presented as plausible outcomes rather than measured effects.
Regulatory design acts as an economic instrument that can balance social value from AI with protection of rights, affecting social welfare, public trust, and long-term adoption rates.
Normative synthesis combining legal and economic reasoning; suggested as a theoretical mechanism rather than empirically validated within the paper.
Automation of routine administrative tasks may reduce demand for certain clerical roles while increasing demand for oversight, auditing, and legal-technical expertise, altering public-sector labor composition and retraining needs.
Qualitative labor-market reasoning based on task-based automation literature and the administrative context; no field labor-data or sample provided.
Current LLMs produce deep, reliable reasoning mainly in domains with rigorous, pre-existing abstractions (mathematics, programming) and underperform in domains that lack such formal abstractions.
Performance comparisons and observed patterns referenced qualitatively (e.g., better behavior on math and code tasks) drawn from existing literature and practitioner reports; the paper does not present new controlled benchmark experiments.
Cooperation with the AI is sustained mainly through conditional rule-based strategies rather than through trust-building, emotional, and social channels.
Synthesis of behavioral trajectories (cooperation plateauing below human–human levels), strategy-estimation results (prevalence of rule-based strategies such as Grim Trigger), and chat-content analysis (more explicit commitments, fewer social/emotional messages) from the laboratory experiment (human–AI n = 126) and comparison to human–human benchmark (n = 108).
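For reference, the Grim Trigger strategy named in the strategy-estimation results is simple to state in code (a minimal sketch; the experiment's estimation procedure is not shown): cooperate until the partner defects once, then defect forever.

```python
def grim_trigger(partner_history: list[str]) -> str:
    """Return 'C' (cooperate) or 'D' (defect) given the partner's past moves."""
    return 'D' if 'D' in partner_history else 'C'

# Example: a single defection in round 2 triggers permanent defection.
history = []
for partner_move in ['C', 'D', 'C', 'C']:
    my_move = grim_trigger(history)
    history.append(partner_move)
# my moves across rounds: C, C, D, D
```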
When allowed repeated communication with the AI, human subjects remain behaviorally dispersed and do not converge to a single dominant strategy.
Strategy-estimation results for the human–AI repeated-chat treatment (from the experiment, n = 126) showing heterogeneous assignment across strategy classes and lack of convergence over time.
Increasing benign-agent count and agent stubbornness are practical levers for improving robustness, but both carry costs: added compute/operational cost for scaling agents, and degraded consensus/coordination when stubbornness is high.
Argument supported by simulation results showing improved robustness with more agents or higher stubbornness, combined with discussion of computational cost (scaling) and observed consensus degradation; computational cost is presented as conceptual/operational reasoning rather than quantified in the summary.
Naïvely lowering trust weights assigned to suspected adversaries can limit adversarial influence but may also hinder cooperation and reduce task performance.
Simulations manipulating fixed trust weights and observing tradeoffs between reduced adversarial sway and decreased cooperative task performance/convergence; conceptual analysis of the tradeoff is provided.
Raising agents' innate stubbornness (peer resistance) reduces susceptibility to adversarial manipulation but impairs the network's ability to reach consensus or coordinate effectively.
Combined theoretical reasoning from the Friedkin-Johnsen (FJ) model (stubbornness is the weight an agent places on its innate opinion) and simulation experiments varying stubbornness parameters; measured outcomes include adversarial influence and measures of convergence/coordination or task performance.
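A minimal simulation sketch of the FJ dynamic underlying these claims (network size, trust weights, and the adversary model are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def fj_step(x, x0, W, s):
    # FJ update: each agent mixes peer influence (via row-stochastic trust
    # matrix W) with its innate opinion x0, weighted by stubbornness s.
    return (1 - s) * (W @ x) + s * x0

rng = np.random.default_rng(0)
n, n_adv = 10, 2                        # 8 benign agents, 2 adversaries
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)       # row-stochastic trust weights

x0 = rng.uniform(-0.2, 0.2, n)          # benign innate opinions near 0
x0[:n_adv] = 1.0                        # adversaries push an extreme view

s = np.full(n, 0.3)                     # benign stubbornness (the lever)
s[:n_adv] = 1.0                         # adversaries never move

x = x0.copy()
for _ in range(200):
    x = fj_step(x, x0, W, s)

# Raising benign `s` pulls final opinions toward x0 (less adversarial sway)
# but also increases dispersion, i.e. weaker consensus.
print(x.round(3))
```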
Investments in interpretability that aim to fully 'rule‑ify' LLM competence may have diminishing returns; economic value may be better captured by research into robust behavioral evaluation, stress testing, and hybrid human‑AI workflows, while partial interpretability remains valuable.
R&D allocation and interpretability economics argument built on the central thesis; suggestion rather than empirical finding.
The paper challenges a purely rule‑based view of scientific explanation: some explanatory power will remain in implicit model structure rather than explicit rules.
Philosophical/epistemological argument based on the main thesis about tacit competence; no empirical validation.
Liability regimes and penalties should account for limits of enforced compliance and false positives/negatives from probabilistic policy evaluations.
Normative/economic discussion in the paper highlighting probabilistic outputs of the Policy function and calibration challenges; no empirical validation.
Firms will trade off compliance strictness against service quality (task completion rates), creating an economic tradeoff that shapes market offerings (e.g., safer-but-slower vs. faster-but-riskier agents).
Economic reasoning and conceptual models in the paper; suggested objective balancing task completion and legal/reputational costs; no empirical market data.
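One way to make the suggested objective concrete, as a hedged sketch (the functional forms and numbers below are illustrative assumptions, not the paper's calibration):

```python
# A firm chooses compliance strictness tau in [0, 1] to maximize the expected
# value of completed tasks minus expected legal/reputational penalties.

def expected_profit(tau, value=1.0, penalty=2.0):
    completion_rate = 1.0 - 0.6 * tau ** 2   # stricter policy blocks more tasks
    violation_rate = 0.3 * (1.0 - tau)       # stricter policy violates less
    return value * completion_rate - penalty * violation_rate

# Scanning strictness levels traces the safer-but-slower vs
# faster-but-riskier frontier; with these numbers the optimum is interior.
best = max((expected_profit(t / 10), t / 10) for t in range(11))
print(f"profit={best[0]:.2f} at strictness tau={best[1]:.1f}")
```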
The economic value of deploying DeePC-based controllers depends critically on representativeness of training data and the costs of online adaptation and safety verification.
Authors' deployment-risk analysis and discussion of trade-offs (qualitative), grounded in methodological requirements of DeePC (need for representative, persistently exciting data and safeguards).
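To make the data requirement concrete: DeePC builds its predictor from Hankel matrices of recorded trajectories, and the recorded input must be persistently exciting, i.e. its Hankel matrix must have full row rank. A minimal check, with an illustrative signal and order (not the paper's data):

```python
import numpy as np

def hankel(u, depth):
    """Hankel matrix of a scalar input sequence u with `depth` rows."""
    cols = len(u) - depth + 1
    return np.column_stack([u[i:i + depth] for i in range(cols)])

def persistently_exciting(u, order):
    H = hankel(np.asarray(u, float), order)
    return np.linalg.matrix_rank(H) == order   # full row rank required

rng = np.random.default_rng(1)
rich = rng.standard_normal(50)            # noisy input: rich excitation
poor = np.ones(50)                        # constant input: rank-1 Hankel

print(persistently_exciting(rich, 8))     # True  -> usable for DeePC
print(persistently_exciting(poor, 8))     # False -> predictor ill-posed
```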
System-level improvements from the controller do not imply uniform spatial/temporal benefits—distributional effects may favor certain routes or neighborhoods.
Authors' discussion and caution about distributional effects and equity; possibly supported by spatial analyses in simulation (qualitative discussion in paper).
Quantitative comparisons across tested models show a systematically nonzero Misapplication Rate (MR) even in settings where the Appropriate Application Rate (AAR) is high.
Aggregated MR and AAR statistics reported for multiple frontier models across the benchmark showing co‑occurrence of high AAR and nontrivial MR.
Prompt‑based defensive instructions (explicitly instructing models to suppress preferences where inappropriate) reduce misapplication but fail to fully eliminate it.
Ablation experiments adding prompt‑based safety/defenses to model inputs and measuring MR and AAR; defenses produced reductions in MR but residual misapplication remained.
Attempts to mitigate misapplication with stronger reasoning prompts (e.g., chain‑of‑thought) reduce Misapplication Rate but do not eliminate it.
Ablation applying reasoning prompts and chain‑of‑thought style instructions to models, comparing MR before and after; reported reductions in MR but persistence of non‑zero MR across scenarios.
Models that more faithfully enforce stored preferences achieve higher Appropriate Application Rate (AAR) but also systematically have higher Misapplication Rate (MR), indicating a trade‑off between correct personalization and harmful over‑application.
Ablation experiments varying strength of preference encoding and measuring resulting AAR and MR per model; quantitative comparisons across models showing positive correlation between stronger preference adherence and both higher AAR and higher MR.
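As a concrete reading of the two metrics, here is a minimal sketch of how AAR and MR could be computed from labeled episodes (field names are hypothetical; the benchmark's exact scoring is not shown):

```python
def aar_mr(episodes):
    appropriate = [e for e in episodes if e["should_apply"]]
    inappropriate = [e for e in episodes if not e["should_apply"]]
    # AAR: stored preference applied where it should be.
    aar = sum(e["applied"] for e in appropriate) / len(appropriate)
    # MR: stored preference applied where it should NOT be (over-application).
    mr = sum(e["applied"] for e in inappropriate) / len(inappropriate)
    return aar, mr

episodes = [
    {"should_apply": True,  "applied": True},
    {"should_apply": True,  "applied": True},
    {"should_apply": False, "applied": True},   # misapplication
    {"should_apply": False, "applied": False},
]
print(aar_mr(episodes))   # (1.0, 0.5): high AAR can coexist with high MR
```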
Reducing payrolls raises short-term firm profitability but reduces aggregate household income and consumption.
Macroeconomic accounting and labor-demand theory combined with historical examples of payroll reductions; argument is theoretical/conceptual rather than estimated with new aggregate time-series regression evidence.
Reviving model-based central planning tools (ISB+NDMS) risks political-economy problems and requires evaluation of efficiency and flexibility compared to market coordination.
Analytic discussion and normative argument in the paper; no empirical comparative study provided.
Russia's digitalization and adoption of AI/Big Data are reshaping the country's socio-economic infrastructure in multifaceted and systemic ways.
Qualitative analysis of national strategies and policy documents plus the author's expert assessments; no sample size or statistical testing reported.
Theoretical framing: an attention-based view (ABV) and a dual-agent model capture two opposing mechanisms—(1) human attention gain from initial AI–human collaboration and (2) AI attention shift under deep embedding—that jointly generate the inverted U-shaped AI–ECSR relationship.
The paper develops and presents ABV and a dual-agent theoretical model to explain observed empirical patterns; model predictions align qualitatively with regression results and heterogeneity tests.
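For concreteness, an inverted-U relationship of this kind is commonly tested with a quadratic specification of roughly this form (an illustrative textbook specification, not necessarily the paper's exact model):

$$\mathrm{ECSR}_{it} = \beta_0 + \beta_1\,\mathrm{AI}_{it} + \beta_2\,\mathrm{AI}_{it}^{2} + \gamma^{\top} X_{it} + \varepsilon_{it}$$

where an inverted U requires $\beta_1 > 0$ and $\beta_2 < 0$, with the turning point at $\mathrm{AI}^{*} = -\beta_1 / (2\beta_2)$; in the dual-agent framing, the attention-gain mechanism dominates below this point and the attention-shift mechanism above it.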
Trust calibration influences project performance outcomes: organizations tend toward metric-driven evaluation of AI outputs and use AI to strategically augment human expertise, but miscalibration risks overreliance or inappropriate metric focus that can harm performance.
Based on participants' reported experiences in the 40 interviews and interpretive thematic analysis linking trust practices to observed/perceived performance consequences (shift to metric-based evaluation, strategic use, and noted risks).
Trust calibration shapes collaboration patterns, including delegation of oversight to systems or specialists, changes in communication networks (who talks to whom), and erosion of informal ad hoc communications used previously for tacit coordination.
Observed in interview narratives (40 interviews) and thematic coding showing repeated reports of shifted oversight roles, altered communication pathways, and reduced informal coordination after AI integration.
Trust calibration is produced and maintained through ongoing boundary work between humans and machines (i.e., teams continuously negotiate which inputs/responsibilities are treated as human versus machine).
Derived from participants' accounts in the 40 interviews and thematic analysis documenting repeated examples of role negotiation and boundary-setting between people and AI systems during project routines.
Trust in AI within project-based work is situational and socially distributed across team members, rather than a stable individual attitude.
The claim is based on thematic qualitative analysis of 40 semi-structured interviews with project professionals across multiple industries in the UK. Interview data showed variation in how different team members described their trust in systems depending on role, task, and context.
Explicit governance reduces negative externalities (bias, privacy breaches, loss of trust) but entails compliance costs that should be factored into adoption and diffusion models.
Conceptual claim synthesizing trade‑off arguments from governance and risk literatures and practitioner examples; not measured empirically in the paper.