The Commonplace

Evidence (7953 claims)

Adoption: 5539 claims
Productivity: 4793 claims
Governance: 4333 claims
Human-AI Collaboration: 3326 claims
Labor Markets: 2657 claims
Innovation: 2510 claims
Org Design: 2469 claims
Skills & Training: 2017 claims
Inequality: 1378 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome | Positive | Negative | Mixed | Null | Total
Other | 402 | 112 | 67 | 480 | 1076
Governance & Regulation | 402 | 192 | 122 | 62 | 790
Research Productivity | 249 | 98 | 34 | 311 | 697
Organizational Efficiency | 395 | 95 | 70 | 40 | 603
Technology Adoption Rate | 321 | 126 | 73 | 39 | 564
Firm Productivity | 306 | 39 | 70 | 12 | 432
Output Quality | 256 | 66 | 25 | 28 | 375
AI Safety & Ethics | 116 | 177 | 44 | 24 | 363
Market Structure | 107 | 128 | 85 | 14 | 339
Decision Quality | 177 | 76 | 38 | 20 | 315
Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209
Employment Level | 77 | 34 | 80 | 9 | 202
Skill Acquisition | 92 | 33 | 40 | 9 | 174
Innovation Output | 120 | 12 | 23 | 12 | 168
Firm Revenue | 98 | 34 | 22 | – | 154
Consumer Welfare | 73 | 31 | 37 | 7 | 148
Task Allocation | 84 | 16 | 33 | 7 | 140
Inequality Measures | 25 | 77 | 32 | 5 | 139
Regulatory Compliance | 54 | 63 | 13 | 3 | 133
Error Rate | 44 | 51 | 6 | – | 101
Task Completion Time | 88 | 5 | 4 | 3 | 100
Training Effectiveness | 58 | 12 | 12 | 16 | 99
Worker Satisfaction | 47 | 32 | 11 | 7 | 97
Wages & Compensation | 53 | 15 | 20 | 5 | 93
Team Performance | 47 | 12 | 15 | 7 | 82
Automation Exposure | 24 | 22 | 9 | 6 | 62
Job Displacement | 6 | 38 | 13 | – | 57
Hiring & Recruitment | 41 | 4 | 6 | 3 | 54
Developer Productivity | 34 | 4 | 3 | 1 | 42
Social Protection | 22 | 10 | 6 | 2 | 40
Creative Output | 16 | 7 | 5 | 1 | 29
Labor Share of Income | 12 | 5 | 9 | – | 26
Skill Obsolescence | 3 | 20 | 2 | – | 25
Worker Turnover | 10 | 12 | 3 | – | 25

– : cell left blank in the source (its original column position could not be recovered).
Claim: DPS was empirically evaluated across diverse reasoning domains (mathematical reasoning, planning, and visual geometry) to test generality.
Evidence: The paper reports experiments on these three task categories, which are listed as the evaluated tasks in the methods/experiments section.
Confidence: high | Direction: null result | Paper: "Dynamics-Predictive Sampling for Active RL Finetuning of Lar..." | Outcome: task domains evaluated (mathematics, planning, visual-geometry)

Claim: DPS uses the inferred per-prompt state distributions as a predictive prior to select the prompts estimated to be most informative, avoiding exhaustive candidate rollouts for filtering.
Evidence: Method and selection mechanism described: predictive-prior ranking/filtering replaces rollout-heavy candidate evaluation. (Procedure described in the paper; empirical comparisons reported.)
Confidence: high | Direction: null result | Paper: "Dynamics-Predictive Sampling for Active RL Finetuning of Lar..." | Outcome: selection of prompts (number of candidate rollouts avoided)

Claim: Dynamics-Predictive Sampling (DPS) models each prompt's "extent of solving" under the current policy as a latent state in a dynamical system (a hidden Markov model) and performs online Bayesian inference on historical rollout reward signals to estimate that state.
Evidence: Methodological description in the paper: DPS uses an HMM representation of per-prompt solving progress and applies online Bayesian updates using past rollout rewards. (No numerical sample size is needed for this modeling claim.)
Confidence: high | Direction: null result | Paper: "Dynamics-Predictive Sampling for Active RL Finetuning of Lar..." | Outcome: inferred latent state distribution / predicted expected learning progress per pr...

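The latent-state inference described in this cluster can be sketched as a forward filter on a two-state HMM. Everything concrete below (the two states "not yet solved"/"solved", the transition matrix, the emission probabilities, and the reward history) is an illustrative assumption, not the paper's actual model or parameters:

```python
import numpy as np

# Hypothetical two-state HMM over training steps for a single prompt,
# with Bernoulli rollout rewards as noisy emissions of the latent
# "extent of solving". All numbers are invented for illustration.

T = np.array([[0.90, 0.10],    # P(next state | current); rows = from-state
              [0.05, 0.95]])
emit = np.array([0.1, 0.8])    # P(reward = 1 | state)

def filter_step(belief, reward):
    """One online Bayesian update: predict with the transition model,
    then condition on the observed rollout reward."""
    predicted = belief @ T
    like = emit if reward == 1 else 1.0 - emit
    posterior = predicted * like
    return posterior / posterior.sum()

belief = np.array([0.5, 0.5])      # prior over the latent solving state
for r in [0, 0, 1, 1]:             # historical rollout rewards for this prompt
    belief = filter_step(belief, r)
print(belief)                      # posterior P(state) after the reward history
```

A selection rule in the spirit of DPS might then rank prompts by the learning progress their posteriors predict, instead of spending candidate rollouts on every prompt.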
Claim: The paper does not present large-scale empirical validation; its evidence is primarily theoretical exposition, a constructed illustrative example, and a literature survey.
Evidence: Explicit description of methods and data in the paper (analysis type: theoretical exposition + illustrative example; no experimental sample reported).
Confidence: high | Direction: null result | Paper: "Ergodicity in reinforcement learning" | Outcome: presence/absence of empirical experiments or sample-based validation

Claim: Local stochastic fluctuations can undo early discovery leads, preventing transient superiority from becoming permanent unless additional asymmetries intervene.
Evidence: Dynamical analysis of the monopolization stage in the model, with simulation trajectories showing reversal or loss of early leads in symmetric interaction regimes; theoretical demonstration that fluctuations can destabilize early footholds.
Confidence: high | Direction: null result | Paper: "Macroscopic Dominance from Microscopic Extremes: Symmetry Br..." | Outcome: persistence of local leads over time (probability of lead reversal due to stocha...

Claim: Transient superiority (finding resources faster) does not by itself stabilize a system-wide monopoly; early leads are fragile and can be undone by local stochastic fluctuations.
Evidence: Analysis of monopolization dynamics and absorbing-state stability within the stochastic spatial model, plus numerical simulations showing that symmetric interaction scenarios do not produce robust absorbing monopolies. This is model-based (no empirical validation).
Confidence: high | Direction: null result | Paper: "Macroscopic Dominance from Microscopic Extremes: Symmetry Br..." | Outcome: long-term persistence/probability of absorbing (system-wide monopoly) state give...

Claim: The authors recommend specific measurement metrics and empirical research priorities (e.g., MAPE, stockout frequency, inventory turns, lead times, fill rates, total supply chain cost, service-level volatility, resilience measures; causal designs such as difference-in-differences or randomized interventions).
Evidence: Explicit recommendations in the paper's measurement and research-agenda sections.
Confidence: high | Direction: null result | Paper: "Optimizing integrated supply planning in logistics: Bridging..." | Outcome: listed supply-chain performance and resilience metrics

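Two of the recommended metrics are simple enough to compute directly. A sketch on invented numbers; the formulas (mean absolute percentage error, annual COGS over average inventory) are standard, the data are assumptions:

```python
import numpy as np

# Illustrative computation of MAPE and inventory turns on made-up data.
actual   = np.array([100.0, 120.0, 80.0, 150.0])   # observed demand
forecast = np.array([110.0, 115.0, 90.0, 140.0])   # forecast demand

mape = np.mean(np.abs((actual - forecast) / actual)) * 100  # percent error

cogs = 1_200_000.0          # annual cost of goods sold (assumed)
avg_inventory = 200_000.0   # average inventory value (assumed)
inventory_turns = cogs / avg_inventory

print(round(mape, 2), inventory_turns)  # → 8.33 6.0
```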
Claim: The study's small sample size and qualitative design limit external generalizability and prevent causal effect-size estimation; potential selection and reporting biases exist due to purposive sampling and interview-based data.
Evidence: The authors explicitly state these limitations in the paper's limitations section.
Confidence: high | Direction: null result | Paper: "Optimizing integrated supply planning in logistics: Bridging..." | Outcome: external generalizability and causal inference capability

Claim: The study is a qualitative multi-case study of five medium-to-large organizations, using semi-structured interviews across procurement, production planning, inventory management, and distribution, analyzed via cross-case comparison.
Evidence: Methods-section description provided by the authors (sample size n = 5, sectors, interview-based primary data, cross-case analysis).
Confidence: high | Direction: null result | Paper: "Optimizing integrated supply planning in logistics: Bridging..." | Outcome: process-level, qualitative insights into ISP implementation

Claim: There is limited empirical causal evidence linking specific explanation types to long-term outcomes (safety, fairness, economic performance) in real-world deployments.
Evidence: Meta-level finding of the review: the authors report gaps in the literature, with few causal or longitudinal studies of explanation interventions in deployed, high-stakes settings.
Confidence: high | Direction: null result | Paper: "Explainable AI in High-Stakes Domains: Improving Trust, Tran..." | Outcome: evidence availability for causal effects on safety, fairness, economic performan...

Claim: The literature groups explainability impacts along three linked dimensions: user trust, ethical governance, and organizational accountability.
Evidence: Analytical result of the review's thematic coding and synthesis across interdisciplinary literature (categorization derived from the reviewed corpus).
Confidence: high | Direction: null result | Paper: "Explainable AI in High-Stakes Domains: Improving Trust, Tran..." | Outcome: categorization structure of explainability impacts (three-dimension taxonomy)

Claim: The paper is primarily theoretical and prescriptive: it synthesizes literature and proposes a framework and design guidelines rather than reporting large-scale empirical datasets or causal identification of economic outcomes.
Evidence: Meta-claim about the paper's methods, explicitly stated in the Data & Methods summary and based on the paper's methodological description.
Confidence: high | Direction: null result | Paper: "Toward a science of human–AI teaming for decision-making: A ..." | Outcome: presence/absence of empirical datasets or causal identification studies in the p...

Claim: Key measurable outcomes for assessing human–AI teams include accuracy/efficiency, robustness to novel cases, decision consistency, trust/misuse rates, training costs, and inequity indicators.
Evidence: Prescriptive list of metrics offered by the authors as part of the research agenda and evaluation guidance; not empirically derived from a dataset in the paper.
Confidence: high | Direction: null result | Paper: "Toward a science of human–AI teaming for decision-making: A ..." | Outcome: accuracy, efficiency, robustness, consistency, trust/misuse rates, training cost...

Claim: Empirical evaluation strategies for human–AI teams should include randomized interventions, field trials, lab experiments, phased rollouts (difference-in-differences), and structural models that allow interaction terms between human skill and AI quality.
Evidence: Methodological recommendation in the paper; suggested study designs rather than implemented analyses.
Confidence: high | Direction: null result | Paper: "Toward a science of human–AI teaming for decision-making: A ..." | Outcome: appropriate empirical identification of team-level complementarities and causal ...

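The phased-rollout design mentioned above reduces, in its simplest form, to a difference-in-differences comparison: the pre/post change for treated teams minus the pre/post change for not-yet-treated teams. A minimal sketch on fabricated productivity numbers:

```python
import numpy as np

# Two-group, two-period DiD on invented team-productivity scores.
treated_pre,  treated_post = np.array([10., 11., 9.]),  np.array([14., 15., 13.])
control_pre,  control_post = np.array([10., 10., 11.]), np.array([11., 12., 11.])

# Estimated effect under the parallel-trends assumption.
did = ((treated_post.mean() - treated_pre.mean())
       - (control_post.mean() - control_pre.mean()))
print(did)  # → 3.0
```

In a staggered rollout, the same contrast is usually run as a regression with unit and period fixed effects, which also accommodates covariates and clustered standard errors.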
Claim: Research priorities include empirical measurement of task-level automation rates, firm and industry productivity effects, wage impacts across occupations, and diffusion patterns.
Evidence: The paper's stated research agenda and identification of measurement gaps, based on a methodological critique of the current evidence base.
Confidence: high | Direction: null result | Paper: "How AI Will Transform the Daily Life of a Techie within 5 Ye..." | Outcome: future empirical research outputs on automation rates, productivity, wage impact...

Claim: Measuring these productivity gains will be challenging because quality improvements, faster iteration, and creative outputs are harder to price and observe than lines of code.
Evidence: Methodological argument about measurement difficulty, based on conceptual considerations rather than empirical validation.
Confidence: high | Direction: null result | Paper: "How AI Will Transform the Daily Life of a Techie within 5 Ye..." | Outcome: observability and measurability of productivity gains (availability of suitable ...

Claim: Measuring AI's economic impact requires new metrics that account for decision-value uplift, reduced tail-risk exposures, and dynamic gains from continuous learning; causal identification will require experiments or staggered rollouts.
Evidence: Methodological recommendation backed by conceptual discussion of measurement challenges; no implementation of such measurement approaches is reported in the paper.
Confidence: high | Direction: null result | Paper: "Next-Generation Financial Analytics Frameworks for AI-Enable..." | Outcome: proposed measurement constructs (decision-value uplift, tail-risk reduction, lea...

Claim: Performance and evaluation should be measured using forecast accuracy, decision lift/value added, latency, and false positive/negative rates.
Evidence: Paper-prescribed evaluation metrics, presented as recommended practice rather than derived from empirical testing within the paper.
Confidence: high | Direction: null result | Paper: "Next-Generation Financial Analytics Frameworks for AI-Enable..." | Outcome: forecast accuracy, decision lift (value added), system latency, false positive/n...

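A sketch of how the prescribed error-rate and decision-lift metrics could be computed on a toy alert system. All figures (labels, predictions, and dollar values) are invented; "decision lift" is read here simply as value with the model minus value under a baseline policy:

```python
import numpy as np

# Toy confusion-matrix rates plus a simple decision-lift calculation.
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1])   # ground-truth alerts
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 1])   # model flags

fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
fpr = fp / np.sum(y_true == 0)   # false positive rate
fnr = fn / np.sum(y_true == 1)   # false negative rate

value_with_model = 125_000.0     # realized decision value (assumed)
value_baseline   = 100_000.0     # value of the status-quo policy (assumed)
decision_lift = value_with_model - value_baseline

print(fpr, fnr, decision_lift)   # → 0.25 0.25 25000.0
```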
Claim: Core AI techniques for these frameworks include supervised/unsupervised ML, NLP for unstructured text, anomaly detection for control/transaction monitoring, and reinforcement/prescriptive models for recommendations.
Evidence: Methodological claim listing standard ML/NLP/anomaly-detection techniques and prescriptive approaches; a statement of methods rather than an empirical comparison of alternatives.
Confidence: high | Direction: null result | Paper: "Next-Generation Financial Analytics Frameworks for AI-Enable..." | Outcome: method adoption/type metrics (e.g., frequency of supervised vs. unsupervised met...

Claim: Next-gen frameworks use large-scale structured sources (transactions, ledgers, KPIs) and unstructured sources (reports, news, contracts, call transcripts) to power models.
Evidence: Descriptive claim listing data types the paper recommends, presented as design input requirements rather than empirically validated data-integration projects.
Confidence: high | Direction: null result | Paper: "Next-Generation Financial Analytics Frameworks for AI-Enable..." | Outcome: data coverage and diversity (e.g., proportion of structured vs. unstructured inp...

Claim: There is a need for quantitative studies and microdata on firm-level RM practices, AI adoption, and performance outcomes to measure effect sizes and causal pathways.
Evidence: Stated research gaps and limitations in the review (lack of primary empirical quantification; heterogeneity across contexts).
Confidence: high | Direction: null result | Paper: "The Role of Risk Management as an Organizational Management ..." | Outcome: availability of quantitative evidence on RM effects (effect sizes, causal estima...

Claim: The review's conclusions are limited by reliance on published literature (potential bias toward successful implementations), a lack of primary empirical quantification (no effect sizes), and heterogeneity across organizational contexts that limits direct generalizability.
Evidence: Explicit limitations stated in the paper, summarizing its scope and method (qualitative literature review, secondary evidence only).
Confidence: high | Direction: null result | Paper: "The Role of Risk Management as an Organizational Management ..." | Outcome: generalizability and empirical precision of review findings

Claim: Heterogeneity in system designs and deployment contexts complicates cross-site comparisons.
Evidence: Limitations section, plus variation in platform architectures, degrees of automation, and governance observed across sites and reported via descriptive data and interviews.
Confidence: high | Direction: null result | Paper: "The Role of Artificial Intelligence in Healthcare Complaint ..." | Outcome: comparability across deployment sites (heterogeneity in systems and contexts)

Claim: Non-random selection of institutions limits causal inference and the external generalizability of the study's findings.
Evidence: The study's limitations explicitly state non-random site selection and heterogeneous deployments; the authors note that causal claims are constrained.
Confidence: high | Direction: null result | Paper: "The Role of Artificial Intelligence in Healthcare Complaint ..." | Outcome: generalizability and causal inference validity

Claim: The study uses a quantitative, cross-sectional, survey-based research design covering managers and educational administrators, and employs descriptive statistics, correlation, and regression analyses.
Evidence: The methods described in the summary explicitly state the research design and analytical techniques; this is a methodological claim rather than a substantive empirical finding. (Sample size not provided in the summary.)
Confidence: high | Direction: null result | Paper: "Algorithmic Trust and Managerial Effectiveness: The Role of ..." | Outcome: research design / analytic approach (methodological description)

Claim: Estimation/calibration, stability assessment, and global sensitivity methods were applied: parameters calibrated/estimated on 2016–2023 data; the equilibrium located; Jacobian eigenvalues computed for local stability; and a variance-based global sensitivity analysis performed over the parameter space.
Evidence: Methods section: description of parameter estimation/calibration, equilibrium computation, Jacobian-based stability analysis, and variance-based global sensitivity analysis.
Confidence: high | Direction: null result | Paper: "Governance of Technological Transition: A Predator-Prey Anal..." | Outcome: methodological procedures applied (estimation, stability analysis, GSA)

Claim: The main empirical conclusions are based on a short annual panel (2016–2023) and a stylized aggregate interaction model; results should be interpreted with caution due to potential omitted variables, aggregation bias, and the limited sample size.
Evidence: Explicit limitations listed in the paper: short time series (eight annual observations), national aggregate data, simplified model structure, no firm/sector heterogeneity, and possible endogeneity/measurement issues.
Confidence: high | Direction: null result | Paper: "Governance of Technological Transition: A Predator-Prey Anal..." | Outcome: validity/robustness of empirical conclusions (limitations)

Claim: The empirical analysis uses annual, national-level aggregate Chinese series for 2016–2023 as proxies for AI capital, physical capital stock, and labor compensation (wage bill).
Evidence: Data description in Data & Methods: annual Chinese aggregate series, 2016–2023 inclusive (8 annual observations); national-level aggregates with no firm-level heterogeneity modeled.
Confidence: high | Direction: null result | Paper: "Governance of Technological Transition: A Predator-Prey Anal..." | Outcome: AI capital proxy; physical capital stock; labor compensation (wage bill)

Claim: The paper models interactions among AI capital, physical capital, and labor using a Lotka–Volterra (predator–prey type) system adapted to include self-limiting (saturation) terms.
Evidence: Model specification described in Methods: a deterministic Lotka–Volterra system with added self-limitation terms for three stocks (AI capital, physical capital, labor).
Confidence: high | Direction: null result | Paper: "Governance of Technological Transition: A Predator-Prey Anal..." | Outcome: model structure / interaction specification (no single dependent variable)

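The model and stability analysis described in these entries can be sketched numerically: a three-stock Lotka–Volterra system with self-limiting terms, an interior equilibrium, and a Jacobian eigenvalue check. All parameter values below are assumptions for illustration, not the paper's calibrated 2016–2023 estimates:

```python
import numpy as np

# Hypothetical three-stock system (AI capital, physical capital, labor):
# dx_i/dt = x_i * (r_i + sum_j A_ij x_j). Negative diagonal entries of A
# are the self-limiting (saturation) terms; all values are invented.
r = np.array([0.5, 0.3, 0.2])             # intrinsic growth rates
A = np.array([[-0.40,  0.10,  0.05],
              [ 0.05, -0.30,  0.10],
              [-0.10,  0.05, -0.25]])

def rhs(x):
    """Vector field of the Lotka-Volterra system with self-limitation."""
    return x * (r + A @ x)

def jacobian(x, eps=1e-6):
    """Central-difference numerical Jacobian of the vector field at x."""
    J = np.zeros((3, 3))
    for j in range(3):
        dx = np.zeros(3)
        dx[j] = eps
        J[:, j] = (rhs(x + dx) - rhs(x - dx)) / (2 * eps)
    return J

x_star = -np.linalg.solve(A, r)           # interior equilibrium: r + A x = 0
eigs = np.linalg.eigvals(jacobian(x_star))
stable = bool(np.all(eigs.real < 0))      # locally stable iff all real parts < 0
print(np.round(x_star, 3), stable)
```

With these invented parameters the interior equilibrium is positive and locally stable; the paper's variance-based global sensitivity step would then vary r and A over a parameter space and track how such conclusions change.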
Claim: Key methodological details are missing or not reported: training/test split, cross-validation scheme, hyperparameter tuning, treatment of confounders/endogeneity, the exact definition and measurement of the outcome, and whether results were validated out-of-sample or in field trials.
Evidence: The summary lists these specific methodological elements as not provided in the paper.
Confidence: high | Direction: null result | Paper: "AI in food inequality: Leveraging artificial intelligence to..." | Outcome: methodological reporting completeness

Claim: The paper does not report (or the summary omits) the sample size and full provenance of the Indian farm dataset.
Evidence: The summary explicitly states that the sample size and full provenance of the Indian dataset are not reported.
Confidence: high | Direction: null result | Paper: "AI in food inequality: Leveraging artificial intelligence to..." | Outcome: reporting completeness for dataset (sample size/provenance)

Claim: Data sources used are FAO and Kaggle datasets for global context and a proprietary/field Indian farm dataset for modeling.
Evidence: The paper cites FAO and Kaggle for global context and uses a proprietary Indian farm-level dataset for the core modeling work (the summary notes that full provenance is not reported).
Confidence: high | Direction: null result | Paper: "AI in food inequality: Leveraging artificial intelligence to..." | Outcome: data provenance/source

Claim: The chosen ML technique is gradient-boosting regression.
Evidence: Explicit statement in the methods section that gradient-boosting regression was used for modeling.
Confidence: high | Direction: null result | Paper: "AI in food inequality: Leveraging artificial intelligence to..." | Outcome: modeling technique used

Claim: Features used in modeling include pesticide/fertilizer use, farm size, crop type, harvest date, and climatic variables.
Evidence: Predictor variables listed in the paper's modeling/methods section.
Confidence: high | Direction: null result | Paper: "AI in food inequality: Leveraging artificial intelligence to..." | Outcome: predictor variables used in the ML model (feature list)

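Since the paper's dataset is not reproducible from the summary, a hand-rolled gradient-boosting sketch on synthetic stand-ins for the listed features can at least show the technique itself (depth-1 regression stumps fit to residuals under squared loss). Every number and feature value below is invented:

```python
import numpy as np

# Synthetic stand-ins for three of the listed predictors.
rng = np.random.default_rng(0)
n = 300
fertilizer = rng.uniform(0, 50, n)       # kg/ha (assumed)
farm_size  = rng.uniform(0.5, 20, n)     # ha (assumed)
rainfall   = rng.uniform(600, 1400, n)   # mm (assumed climatic variable)
X = np.column_stack([fertilizer, farm_size, rainfall])
y = 2.0 * farm_size + 0.01 * rainfall + rng.normal(0, 1.0, n)  # synthetic yield

def fit_stump(X, residual):
    """Best single-split regression stump on the current residuals."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue
            lv, rv = residual[left].mean(), residual[~left].mean()
            sse = (((residual[left] - lv) ** 2).sum()
                   + ((residual[~left] - rv) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

# Boosting loop: each stump fits the residuals of the current ensemble.
lr, pred, stumps = 0.1, np.full(n, y.mean()), []
for _ in range(100):
    stump = fit_stump(X, y - pred)
    pred += lr * predict_stump(stump, X)
    stumps.append(stump)

r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(round(r2, 3))  # in-sample R^2 on the synthetic data
```

A library implementation (e.g., a gradient-boosting regressor from a standard ML package) follows the same residual-fitting loop with deeper trees and regularization.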
Claim: Instrumental-variable (IV) estimation is used to address the endogeneity of AI adoption and to identify causal effects on employment and wages.
Evidence: The paper states that an IV identification strategy is applied to the 38-country panel; robustness checks and alternative specifications are reported (instrument details are in the full text).
Confidence: high | Direction: null result | Paper: "Artificial Intelligence and Labor Market Transformation: Emp..." | Outcome: causal estimate identification strategy for employment and wage outcomes

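The IV logic can be sketched with a hand-rolled two-stage least squares on simulated data. The instrument and all coefficients below are invented for the sketch, since the paper's actual instruments are only described in its full text:

```python
import numpy as np

# Simulated endogeneity: an unobserved confounder u drives both AI
# adoption and employment, so OLS is biased; the instrument z shifts
# adoption but is independent of u. True causal effect = 1.5.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                               # instrument (assumed valid)
u = rng.normal(size=n)                               # unobserved confounder
adoption = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor
employment = 1.5 * adoption - 1.0 * u + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: project adoption on the instrument.
X1 = np.column_stack([np.ones(n), z])
adoption_hat = X1 @ ols(X1, adoption)

# Stage 2: regress the outcome on the fitted adoption values.
X2 = np.column_stack([np.ones(n), adoption_hat])
beta_iv = ols(X2, employment)[1]

beta_ols = ols(np.column_stack([np.ones(n), adoption]), employment)[1]
print(round(beta_ols, 2), round(beta_iv, 2))  # OLS biased; IV close to 1.5
```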
Claim: The AI Adoption Index is constructed as a composite measure combining enterprise investment in AI, AI-related patent filings, and workforce/firm surveys on AI use across 38 OECD countries (2019–2025).
Evidence: The paper's methodological description of the index construction; data sources enumerated as investment, patenting, and survey measures over the panel period.
Confidence: high | Direction: null result | Paper: "Artificial Intelligence and Labor Market Transformation: Emp..." | Outcome: AI adoption intensity (composite index)

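A composite index of the described kind is often built by standardizing each component and averaging. The sketch below uses invented country values, and equal weighting is an assumption; the paper's actual aggregation rule is not stated in the summary:

```python
import numpy as np

# Rows = countries (invented); columns = investment, patent filings,
# survey share of firms using AI (assumed units).
components = np.array([
    [2.1, 120, 0.35],
    [0.8,  40, 0.20],
    [1.5,  90, 0.30],
    [0.3,  10, 0.10],
])

# Z-score each component so units are comparable, then average
# with equal weights to get one index value per country.
z = (components - components.mean(axis=0)) / components.std(axis=0)
index = z.mean(axis=1)
print(index.round(2))
```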
Claim: The paper is entirely theoretical/analytical and does not report an empirical dataset.
Evidence: The paper's methodology section and abstract state that its primary tool is an analytical economic model; no empirical data or sample sizes are reported.
Confidence: high | Direction: null result | Paper: "Janus-Faced Technological Progress and the Arms Race in the ..." | Outcome: presence/absence of empirical dataset

Claim: The same formal framework can be interpreted as a firm-level model in which human skill investment maps onto AI/chatbot investment decisions.
Evidence: The paper provides an alternative interpretation, formally mapping agents' skill-investment choices into an analogous firm R&D/AI-capital decision problem within the same mathematical framework.
Confidence: high | Direction: null result | Paper: "Janus-Faced Technological Progress and the Arms Race in the ..." | Outcome: conceptual mapping between individual skill investment and firm AI investment (m...

Claim: There is a need for standardized metrics and measurement protocols for public-sector productivity and non-market outcomes (service quality, processing time, cost per transaction, transparency, trust).
Evidence: Methodological critique within the review pointing to heterogeneity of outcome measures across studies and calling for standardized metrics; based on synthesis of the reviewed literature.
Confidence: high | Direction: null result | Paper: "Digital Transformation and AI Adoption in Government: Evalua..." | Outcome: existence/adoption of standardized measurement protocols and consistency of repo...

Claim: Much of the literature on public-sector digital/AI interventions is descriptive or case-based; causal, quantitative evidence on net productivity effects is limited and context-dependent.
Evidence: Methodological assessment within the review noting heterogeneous study designs, reliance on secondary sources, and a lack of randomized or quasi-experimental studies; the review explicitly states this limitation.
Confidence: high | Direction: null result | Paper: "Digital Transformation and AI Adoption in Government: Evalua..." | Outcome: availability of causal quantitative estimates of productivity impacts

Claim: Research and monitoring priorities for economists include task-level analyses of substitutability/complementarity, modeling adoption as a function of regulatory costs and reimbursement incentives, and evaluating long-run welfare and distributional effects.
Evidence: Explicit research recommendations stated in the narrative review, based on gaps identified in the literature and evolving empirical questions.
Confidence: high | Direction: null result | Paper: "Will AI Replace Physicians in the Near Future? AI Adoption B..." | Outcome: research activity in recommended areas; quality of evidence informing policy

Claim: Policymakers and payers should consider liability reform, reimbursement models that reward safe human–AI collaboration, funding for independent clinical validation, and measures to prevent market concentration.
Evidence: Policy recommendations and implications derived from the narrative review's synthesis of regulatory, economic, and implementation challenges.
Confidence: high | Direction: null result | Paper: "Will AI Replace Physicians in the Near Future? AI Adoption B..." | Outcome: policy actions implemented (liability reform, reimbursement changes, funding all...

Claim: Research priorities include causal studies of AI's impacts on SME productivity, employment, and inequality in LMICs; cost-benefit analyses of financing and policy interventions; evaluation of data-governance models; and development of metrics/monitoring systems for inclusive adoption.
Evidence: The authors' identification of evidence gaps from the structured literature review, highlighting areas with insufficient causal or evaluative research.
Confidence: high | Direction: null result | Paper: "Artificial Intelligence Adoption for Sustainable Development..." | Outcome: existence and quality of targeted causal and evaluative research on AI in LMIC S...

Claim: Empirical causal evidence on the long-run welfare, distributional, and labor effects of AI in LMIC SMEs remains thin.
Evidence: Gap identified through the structured review: few causal studies (e.g., RCTs, natural experiments) address long-run effects in LMIC SME contexts.
Confidence: high | Direction: null result | Paper: "Artificial Intelligence Adoption for Sustainable Development..." | Outcome: availability of causal evidence on welfare, distributional effects, and labor ou...

Claim: Heterogeneity in SME types and sectors limits the generalizability of findings about AI adoption and impacts.
Evidence: A methodological limitation noted by the authors: the evidence base spans diverse firm sizes, sectors, and contexts, constraining broad generalization.
Confidence: high | Direction: null result | Paper: "Artificial Intelligence Adoption for Sustainable Development..." | Outcome: generalizability of reviewed findings across SMEs and sectors

Claim: The theoretical framing integrates the Resource-Based View (RBV), Dynamic Capabilities (DC), Technology–Organization–Environment (TOE), and Diffusion of Innovation (DOI) frameworks to explain how firm resources, learning capacity, and organizational and environmental factors shape AI adoption.
Evidence: Conceptual synthesis performed as part of the literature review; the integration is based on existing theoretical literature rather than primary empirical testing.
Confidence: high | Direction: null result | Paper: "Artificial Intelligence Adoption for Sustainable Development..." | Outcome: explanatory scope for AI adoption drivers (theoretical coherence rather than an ...

Claim: The systematic review followed the PRISMA protocol and analyzed a corpus of 103 items (peer-reviewed articles and institutional reports) published 2010–2024.
Evidence: Explicit methodological statement in the paper describing PRISMA use and the corpus size/timeframe.
Confidence: high | Direction: null result | Paper: "Models, applications, and limitations of the responsible ado..." | Outcome: review methodology and corpus characteristics (sample size, timeframe)

Claim: Further longitudinal cost-benefit studies, scalability benchmarks, and cross-domain trials are needed to determine when on-prem RAG is the dominant economic choice.
Evidence: The paper's research and evaluation recommendations call for additional longitudinal and cross-domain empirical work; presented as a recommendation rather than an empirical finding.
Confidence: high | Direction: null result | Paper: "An Empirical Study on the Feasibility Analysis of On-Premise..." | Outcome: need for further empirical evidence (longitudinal cost-benefit, scalability, cro...

Claim: The paper's relevance/usefulness claims rest on human-in-the-loop judgments rather than solely on synthetic benchmarks.
Evidence: The methods description explicitly states that human evaluation by domain experts was used alongside quantitative benchmarks.
Confidence: high | Direction: null result | Paper: "An Empirical Study on the Feasibility Analysis of On-Premise..." | Outcome: evaluation method (use of human expert judgments vs. synthetic benchmarks)

Claim: Research gaps remain in quantifying welfare gains from specific AI applications in extraction (productivity, safety, emissions), evaluating the cost-effectiveness of policy bundles, and estimating dynamic returns to data ecosystems and human capital.
Evidence: Gaps identified from the literature and data coverage in the comparative analysis; the authors call for future empirical and modelling work.
Confidence: high | Direction: null result | Paper: "ADOPTION OF ARTIFICIAL INTELLIGENCE IN THE RUSSIAN EXTRACTIV..." | Outcome: magnitude of welfare gains from AI applications; cost-effectiveness metrics for ...