Evidence (14,055 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

We re-recruited 530 participants from 52 countries two years after they gave their preferences in the PRISM dataset to evaluate personalised and non-personalised language models in blinded multi-turn conversations (large-scale within-subject experiment).

Study methodology reported in paper: within-subject experiment, re-recruitment of 530 participants from 52 countries, blinded multi-turn conversations comparing models.

high null result PRISM-X: Experiments on Personalised Fine-Tuning with Human ... experimental sample composition and study design

Personalisation is a standard feature of conversational AI systems used by millions; yet, the efficacy of personalisation methods is often evaluated in academic research using simulated users rather than real people.

Authors' literature and field observation stated in introduction; contextual claim about common practice in academic evaluations (no numeric experiment reported for this claim).

high null result PRISM-X: Experiments on Personalised Fine-Tuning with Human ... prevalence of simulation-based evaluation in academic research

The study includes Natural Language Processing (NLP) analysis of 5 million consumer contacts.

Methodological statement in the paper specifying the NLP data volume.

high null result Augmented Intelligence: Resolving the AI integration-obsoles... methodological sample (NLP consumer contacts)

The study includes surveys of 800 marketers.

Methodological statement in the paper specifying the survey sample size.

high null result Augmented Intelligence: Resolving the AI integration-obsoles... methodological sample (marketer survey)

The study includes AI adoption audits from 120 organizations.

Methodological statement in the paper specifying the audits sample size.

high null result Augmented Intelligence: Resolving the AI integration-obsoles... methodological sample (AI adoption audits)

LLM-generated solutions contain roughly the same number of ideas as participant-generated solutions.

Comparative analysis of idea counts within solutions reported in the paper; phrased as 'roughly the same number of ideas' (no numeric effect size provided in the abstract).

high null result "Like Taking the Path of Least Resistance": Exploring the Im... number of ideas per solution

The findings are consolidated via the AI Engineering Integration Framework and the Skills Transition Risk Matrix, which provide guidelines for strategically harnessing AI while safeguarding the Engineering profession.

Paper reports development of two conceptual/practical tools (framework and matrix) as outputs of the study; no validation details provided in abstract.

high null result The AI-engineering imperative - Navigating synergy and obsol... existence of the AI Engineering Integration Framework and Skills Transition Risk...

Case studies were performed covering five major industries.

Paper's reported methodology (number of case studies stated in abstract).

high null result The AI-engineering imperative - Navigating synergy and obsol... number of industry case studies

A Delphi study was conducted with 40 global experts.

Paper's reported methodology (Delphi sample explicitly stated in abstract).

high null result The AI-engineering imperative - Navigating synergy and obsol... Delphi panel size (experts consulted)

A comprehensive mixed-methods study was conducted, incorporating a survey of 320 organizations.

Paper's reported methodology (survey sample explicitly stated in abstract).

high null result The AI-engineering imperative - Navigating synergy and obsol... survey sample size (organizations surveyed)

AwareLLM was evaluated in a user study with 20 participants, compared to a standard LLM assistant across multiple tasks.

Experimental methods statement in paper; explicitly reports a user study and sample size.

high null result AwareLLM: A Proactive Multimodal Ecosystem for Personalized ... evaluation study (design)

Using an agent-based simulation of a multi-SKU convenience store environment, the study evaluates deployment efficiency, inventory responsiveness, and managerial cognitive reallocation.

Methodological claim: the paper reports an agent-based simulation experiment in a multi-SKU convenience store context; details such as number of simulations, parameter settings, or statistical results are not provided in the excerpt.

high null result From Configuration to Cognition: A Self-Configuring Agentic ... deployment efficiency; inventory responsiveness; managerial cognitive reallocati...

The dominant paradigm for AI agents is an "on-the-fly" loop in which agents synthesize plans and execute actions within seconds or minutes in response to user prompts.

Statement in paper presenting a characterization of current AI agent design; conceptual/observational claim with no empirical data or sample reported.

high null result Engineering Robustness into Personal Agents with the AI Work... paradigm_adoption

Persistent data gaps—especially concerning worker-level outcomes, informal labor, and non-Anglophone markets—warrant urgent research investment.

Authors' assessment based on scope of included studies and acknowledged limitations in observation windows and geographic/labor-form coverage.

high null result Creation, validation, obsolescence: observed evidence of AI-... availability of data on worker-level outcomes, informal labor, and non-Anglophon...

Following PRISMA 2020 guidelines, we systematically searched six academic databases (Scopus, Web of Science, EconLit, SSRN, IEEE Xplore, Google Scholar) for empirical studies documenting observed—not predicted—labor market changes since 2020; from 1,847 initial records, 94 studies meeting inclusion criteria were retained for qualitative synthesis and 42 for quantitative data extraction.

Methods: systematic literature search following PRISMA 2020 across six named databases; initial records = 1,847; retained = 94 for qualitative synthesis, 42 for quantitative extraction.

high null result Creation, validation, obsolescence: observed evidence of AI-... systematic_review_search_and_screen_counts (initial records; studies retained)

We thematically analysed twelve semi-structured interviews with SME owners and managers conducted in early 2025 using Atlas.ti, yielding 19 codes grouped into six categories.

Methods statement in the paper describing qualitative sample and analysis procedures.

high null result Artificial Intelligence, Social Capital, and Sustainable Emp... qualitative_analysis_results (codes/categories)

We examine the interplay between AI adoption, social capital formation, workforce dynamics, and sustainable development in Eastern Macedonia and Thrace (EMT), one of the EU's least developed regions.

Study context and scope as stated in the paper; empirical work conducted in EMT.

high null result Artificial Intelligence, Social Capital, and Sustainable Emp... regional_AI_adoption_and_social_capital_interplay

Research has concentrated on advanced urban economies, leaving underexplored the implications of AI for peripheral small and medium-sized enterprises (SMEs) operating under weak human capital, thin digital infrastructure, and constrained social capital.

Statement in the paper contrasting existing research focus (advanced urban economies) with a lack of attention to peripheral SMEs; no empirical sample size for this bibliographic claim reported in the excerpt.

high null result Artificial Intelligence, Social Capital, and Sustainable Emp... research_coverage_of_peripheral_SMEs

Once functional deployment and operational investment are controlled for, worker-task use is not associated with employment declines.

Multivariate regression results reported in the paper using BTOS AI supplement data showing the coefficient on worker-task use becomes statistically indistinguishable from zero after controlling for functional deployment and operational investment; exact model details and sample size not provided in excerpt.

high null result The Microstructure of AI Diffusion: Evidence from Firms, Bus... association between worker-task AI use and employment change conditional on othe...

This study conducts an empirical analysis using data on industrial robots from the International Federation of Robotics (IFR) and panel data from 14 sub-sectors of China's manufacturing industry.

Statement in paper describing data and methods: use of IFR robot data combined with panel data covering 14 manufacturing sub-sectors (panel regression framework implied).

high null result Research on the impact of industrial robot application on th... data and sample composition (use of IFR robot data and panel of 14 sub-sectors)

The location of the Pareto frontier depends only on population characteristics, utility functions and the fairness score, but not on the technical design of the algorithm — the findings hold for pre-processing, in-processing, and post-processing approaches alike.

Theoretical proof/argument demonstrating that the Pareto frontier characterization is a function of distributions, utilities and fairness metric, independent of algorithmic implementation approach (pre-, in-, post-processing).

high null result Fairness vs Performance: Characterizing the Pareto Frontier ... dependence of Pareto frontier location on algorithmic design

Under the Brier score specifically, with type-independent inflation cost, the second-best welfare equals the first-best welfare (welfare equivalence).

Analytical result/proof specialized to the Brier score and the assumption of type-independent inflation costs; comparative welfare analysis in the model.

high null result The Endogeneity of Miscalibration: Impossibility and Escape ... principal welfare (second-best vs. first-best) under Brier scoring and type-inde...
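
For readers unfamiliar with the scoring rule named in this claim, the following is a minimal sketch of the standard binary Brier loss and of why it rewards truthful reporting. It does not reproduce the paper's model: the inflation cost and the principal's welfare are not represented, and the function name, the true probability of 0.3, and the report grid are illustrative assumptions.

```python
def expected_brier_loss(report, true_prob):
    """Expected Brier loss of announcing `report` when the event
    actually occurs with probability `true_prob`."""
    # Event occurs (prob. true_prob): loss (1 - report)^2.
    # Event does not occur: loss (0 - report)^2.
    return true_prob * (1.0 - report) ** 2 + (1.0 - true_prob) * report ** 2


# Hypothetical true probability and a grid of candidate reports.
true_prob = 0.3
grid = [i / 100 for i in range(101)]

# The Brier score is a proper scoring rule, so the loss-minimising
# report on the grid coincides with the true probability.
best_report = min(grid, key=lambda r: expected_brier_loss(r, true_prob))
```

On this grid the minimiser is the truthful report, which is the baseline against which any incentive to inflate would be measured.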

Return forecasts are translated into long–short portfolios to assess economic performance.

Stated evaluation approach: conversion of predicted returns into long–short portfolios for economic/performance assessment.

high null result Optimizing stock market prediction and stock trading strateg... economic performance of long–short portfolios constructed from forecasts
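
A minimal sketch of one common way to turn return forecasts into a long-short portfolio, assuming an equal-weighted sort into top and bottom quantiles. The function names, the 20% quantile, and the toy assets are illustrative assumptions; the excerpt does not state the paper's actual portfolio construction.

```python
def long_short_weights(forecasts, quantile=0.2):
    """Equal-weighted long-short weights from {asset: forecast return}.

    Longs the top `quantile` of assets by forecast and shorts the bottom
    `quantile`; the long leg sums to +1 and the short leg to -1.
    """
    ranked = sorted(forecasts, key=forecasts.get)  # ascending by forecast
    n = max(1, int(len(ranked) * quantile))        # assets per leg
    if 2 * n > len(ranked):
        raise ValueError("not enough assets for two non-overlapping legs")
    weights = {asset: 0.0 for asset in ranked}
    for asset in ranked[:n]:       # lowest forecasts form the short leg
        weights[asset] = -1.0 / n
    for asset in ranked[-n:]:      # highest forecasts form the long leg
        weights[asset] = 1.0 / n
    return weights


def portfolio_return(weights, realized):
    """Realized return of the zero-investment portfolio."""
    return sum(w * realized[asset] for asset, w in weights.items())


# Hypothetical forecasts and realized returns for five assets.
forecasts = {"A": 0.03, "B": -0.01, "C": 0.00, "D": 0.02, "E": -0.02}
realized = {"A": 0.02, "B": 0.01, "C": 0.00, "D": 0.01, "E": -0.03}
weights = long_short_weights(forecasts, quantile=0.2)
spread = portfolio_return(weights, realized)
```

The spread return isolates whether the forecast ranking carries economic value, independent of the overall market direction.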

The analysis is based on 30 market, liquidity, valuation, profitability, technical and risk factors and compares linear models, tree-based machine learning and deep learning architectures (including GRU, LSTM and Transformer) within a rolling-window forecasting framework.

Description of empirical design: use of 30 factor variables and explicit listing of model families (linear, tree-based, GRU, LSTM, Transformer) and use of a rolling-window forecasting setup.

high null result Optimizing stock market prediction and stock trading strateg... model comparison across 30 factors within rolling-window forecasting
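
The rolling-window setup mentioned in this claim can be sketched as follows, assuming a fixed-length training window that slides forward one observation at a time. The window sizes, the toy series, and the window-mean "model" are placeholders; the paper's own window length and models (linear, tree-based, GRU, LSTM, Transformer) are not reproduced here.

```python
def rolling_windows(n_obs, train_size, test_size=1, step=1):
    """Yield (train_indices, test_indices) for a fixed-length rolling window."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


def mean_forecast(history):
    """Placeholder model: forecast the next value as the window mean."""
    return sum(history) / len(history)


# Hypothetical series; each forecast uses only data before its target date.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
forecasts = []
for train, test in rolling_windows(len(series), train_size=3):
    window = [series[i] for i in train]
    forecasts.append((test[0], mean_forecast(window)))
```

Because every test index lies strictly after its training window, the evaluation avoids look-ahead bias by construction.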

We introduce the weighted evaluation index (WEI), a finance-specific performance metric that integrates prediction accuracy with market adaptability.

Methodological contribution stated in the paper: introduction of a new performance metric called WEI described as integrating accuracy and market adaptability.

high null result Optimizing stock market prediction and stock trading strateg... performance evaluation metric (WEI)

We introduce the Diff-RMSE method for nonlinear factor identification.

Methodological contribution stated in the paper: introduction of a new method named 'Diff-RMSE' for identifying nonlinear factors.

high null result Optimizing stock market prediction and stock trading strateg... method for nonlinear factor identification

The study uses A-share market data from 2013 to 2024 with equity and firm-characteristic data available from databases such as RESSET and CSMAR for more than 5,000 listed firms.

Empirical dataset description in the paper: time period 2013–2024, sources named (RESSET, CSMAR), and statement 'more than 5,000 listed firms'.

high null result Optimizing stock market prediction and stock trading strateg... dataset coverage (time span and number of firms)

The synthesis covers research and practitioner guidance from the years 2023–2025.

Methods statement specifying the temporal scope of sources used for the synthesis.

high null result Collaborative Intelligence in Accounting: A Human + AI Compl... other

This paper synthesizes recent research and practitioner guidance (2023–2025) to develop a practical model for designing human–AI collaboration in the financial reporting function (controllership).

Methods section declaration describing scope and approach (literature/practitioner guidance synthesis covering 2023–2025).

high null result Collaborative Intelligence in Accounting: A Human + AI Compl... organizational_efficiency

We conducted a controlled experiment comparing traditional task-splitting methods with AI-assisted approaches using GitLab Duo.

Methodological statement in the paper reporting a controlled experiment using GitLab Duo; sample size not stated in the provided summary.

high null result Splitting User Stories Into Tasks with AI -- A Foe or an All... method comparison (experimental design)

We audited 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central.

Direct data collection and audit described in the paper: dataset of 111,000,000 references from 2,500,000 papers across the four named preprint/repository sources.

high null result LLM hallucinations in the wild: Large-scale evidence from no... number of references audited / dataset coverage

Future research should test these findings across different institutional contexts, particularly European economies.

Paper's stated limitations and suggestions for future research.

high null result The Inverted-U Relationship Between AI and Corporate Innovat... recommendation for external validation across contexts

The analysis employs fixed-effects models, U-tests, bootstrap mediation, and patent text similarity analysis.

Methods statement listing econometric and text-analytic techniques used in the paper.

high null result The Inverted-U Relationship Between AI and Corporate Innovat... methods_used

The study uses a sample of 25,204 firm-year observations from Chinese A-share manufacturing companies (2010–2023).

Paper statement of sample and period; descriptive sample construction (firm-year observations = 25,204).

high null result The Inverted-U Relationship Between AI and Corporate Innovat... sample_description

The empirical analysis is based on Chinese A-share listed firms observed from 2012 to 2024 and uses a difference-in-differences (DID) identification strategy.

Study description in the paper's methods/abstract specifying sample period (2012–2024), population (Chinese A-share listed firms), and methodology (DID).

high null result Government-Guided Funds and Corporate Digital–Intelligent Tr... study design / data sample

We validate the framework empirically on five benchmarks (MATH, MMLU, TriviaQA, SimpleQA, LiveCodeBench) across eight models from five providers.

Empirical experiments reported in the paper using five named datasets and eight models from five providers (experimental evaluation / benchmarking).

high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... empirical validation of theoretical framework via experiments on benchmarks

For k-model cascades, first-order conditions imply a single shadow price that equalizes marginal quality-per-cost across stage boundaries.

Analytical derivation of first-order conditions for k-stage cascades within the decision-theoretic constrained-optimization framework presented in the paper.

high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... marginal quality-per-cost equality across cascade stages (first-order optimality...

Given a pool of k models, the frontier achievable by deterministic two-model threshold cascades is the pointwise envelope over choose(k,2) pairwise cascades, with switching points where the optimal pair changes.

Theoretical characterization/derivation in the paper (mathematical result about deterministic two-model threshold cascades and combinatorial envelope over pairwise cascades).

high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... achievable cost-quality frontier for a k-model pool under deterministic two-mode...
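
The envelope construction in this claim can be sketched as a pointwise maximum over all C(k, 2) model pairs. This is a sketch under stated assumptions: the (cost, quality) points are invented, and `toy_frontier` uses linear interpolation purely as a stand-in for a pair's frontier, which in the paper would come from sweeping a deterministic confidence threshold.

```python
from itertools import combinations

# Hypothetical (cost, quality) operating points for three models.
POINTS = {"small": (1.0, 0.60), "medium": (3.0, 0.80), "large": (9.0, 0.90)}


def toy_frontier(a, b, budget):
    """Placeholder pairwise frontier: interpolate between the two models."""
    (c_lo, q_lo), (c_hi, q_hi) = sorted([POINTS[a], POINTS[b]])
    if budget < c_lo:
        return None            # cannot afford even the cheaper model
    if budget >= c_hi:
        return q_hi
    share = (budget - c_lo) / (c_hi - c_lo)
    return q_lo + share * (q_hi - q_lo)


def pairwise_envelope(models, pair_frontier, budgets):
    """Pointwise maximum over all C(k, 2) pairwise frontiers.

    Returns a list of (budget, best_quality, best_pair).
    """
    envelope = []
    for budget in budgets:
        best_q, best_pair = None, None
        for pair in combinations(models, 2):
            q = pair_frontier(pair[0], pair[1], budget)
            if q is not None and (best_q is None or q > best_q):
                best_q, best_pair = q, pair
        envelope.append((budget, best_q, best_pair))
    return envelope


envelope = pairwise_envelope(list(POINTS), toy_frontier, [2.0, 6.0])
```

In this toy setup the best pair changes between the two budgets, which is the kind of switching point the claim describes.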

Reciprocal shadow prices link the budget-constrained and quality-constrained formulations of the cascade optimization.

Analytical derivation in the decision-theoretic framework using constrained optimization and duality presented in the paper.

high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... relationship between budget- and quality-constrained optimization formulations (...

For a two-model cascade, the cost-quality frontier is piecewise concave on decreasing-benefit regions of the confidence support.

Theoretical development in a decision-theoretic framework using constrained optimization and duality; proven properties for the two-model case reported in the paper (analytical result).

high null result Is Escalation Worth It? A Decision-Theoretic Characterizatio... shape of the cost-quality frontier (concavity properties) for two-model cascades

These results are robust to alternative model specifications, including different lag lengths and forecast horizons.

Robustness checks reported in the paper: re-estimation of TVP-VAR with alternative lag lengths and forecast horizons producing consistent qualitative results.

high null result Artificial Intelligence and Financial Market Connectedness: ... stability of connectedness findings across model specifications

The emergence of generative AI is not associated with a uniform increase in financial connectedness.

Empirical TVP-VAR analysis comparing connectedness measures before and after the emergence of generative AI (paper compares connectedness over the sample period and reports no uniform increase).

high null result Artificial Intelligence and Financial Market Connectedness: ... level of financial connectedness

This study uses daily data from January 2021 to December 2025 to analyze spillover dynamics among AI-related equities, cryptocurrencies, and traditional financial assets within a time-varying parameter vector autoregression (TVP-VAR) framework.

Statement of data frequency and sample period plus description of methodology (TVP-VAR) in the paper; empirical analysis applied to specified asset groups.

high null result Artificial Intelligence and Financial Market Connectedness: ... spillover dynamics / connectedness among asset classes

Under standard smoothness and finite variance conditions, SGD is minimax optimal for finding stationary points measured by l2-norms, thereby fundamentally precluding any complexity gains for sign-based methods in standard settings.

Theoretical statement based on prior minimax optimality results for SGD under standard smoothness and finite-variance assumptions (as cited/used in the paper). No new experiment; relies on worst-case lower-bound theory.

high null result When and Why SignSGD Outperforms SGD: A Theoretical Study Ba... minimax optimality for finding l2-norm stationary points (optimization complexit...

The boundaries (critical thresholds) separating the tax regimes are derived from the workers' budget constraint.

Analytic derivation in the paper showing that constraints coming from the workers' budget constraint produce critical values of τ_ai and τ_f that determine transitions between the three regimes.

high null result The Economic Singularity: Core Mathematical Model critical_thresholds for tax parameters

The model features quadratic self-amplification in both AI capability (λ A^2) and financial capital (γ_F K_f^2), coupled through investment flows.

Model specification and equations in the paper showing terms λ A^2 for AI capability growth and γ_F K_f^2 for financial capital growth, with explicit investment flow terms linking AI and financial capital.

high null result The Economic Singularity: Core Mathematical Model model_dynamics (self-amplification terms)
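
To illustrate what a quadratic self-amplification term implies, the sketch below integrates the isolated equation dA/dt = λ A^2, whose exact solution A(t) = A0 / (1 − λ A0 t) diverges at the finite time t* = 1 / (λ A0). This deliberately omits the financial-capital term γ_F K_f^2 and the investment-flow coupling, which the excerpt does not specify; the parameter values are hypothetical.

```python
def closed_form(a0, lam, t):
    """Exact solution of dA/dt = lam * A**2 with A(0) = a0,
    valid for t < 1 / (lam * a0)."""
    return a0 / (1.0 - lam * a0 * t)


def euler(a0, lam, t_end, steps):
    """Forward-Euler integration of dA/dt = lam * A**2."""
    dt = t_end / steps
    a = a0
    for _ in range(steps):
        a += dt * lam * a * a
    return a


a0, lam = 1.0, 0.5                 # hypothetical initial level and rate
blow_up_time = 1.0 / (lam * a0)    # solution diverges as t approaches this
exact = closed_form(a0, lam, 1.0)
approx = euler(a0, lam, 1.0, steps=100_000)
```

Unlike exponential growth, which stays finite at every finite time, the quadratic term yields a finite-time singularity even in this one-variable case.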

The study uses a panel dataset of 35,347 firm-year observations from 2010 to 2023.

Reported sample description in the paper: panel dataset covering 2010–2023 with 35,347 firm-year observations.

high null result When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... N/A (sample description)

AI-assisted decision-making paradigms do not have a significant direct effect on task performance.

Experimental study of 59 pre-service teachers using a two-factor mixed design (between-subjects: AI-assisted decision-making paradigms; within-subjects: human-AI consistency). Data analyzed with Bayesian cumulative link mixed model and structural equation modeling; authors report no significant direct effect.

high null result Shaping Human-AI Collaboration in Education: Effects of AI-A... task performance

In the U.S., no single 'AI Act' has passed (as of 2026).

Stated in the paper as a factual legal/policy status; this is verifiable via legislative records and is presented without an underlying sample (paper cites status as of 2026).

high null result Emerging AI Trends passage of a comprehensive federal 'AI Act' in the U.S.

The authors ran a within-subjects study comparing authoring audio descriptions (AD) from scratch against editing AI drafts of varying quality.

Explicit methodological statement in the paper (within-subjects study design); sample size not reported in the excerpt.

high null result Making AI Drafts Count: A Quality Threshold in Audio Descrip... comparison of authoring modes
