Evidence (7953 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	402	112	67	480	1076
Governance & Regulation	402	192	122	62	790
Research Productivity	249	98	34	311	697
Organizational Efficiency	395	95	70	40	603
Technology Adoption Rate	321	126	73	39	564
Firm Productivity	306	39	70	12	432
Output Quality	256	66	25	28	375
AI Safety & Ethics	116	177	44	24	363
Market Structure	107	128	85	14	339
Decision Quality	177	76	38	20	315
Fiscal & Macroeconomic	89	58	33	22	209
Employment Level	77	34	80	9	202
Skill Acquisition	92	33	40	9	174
Innovation Output	120	12	23	12	168
Firm Revenue	98	34	22	—	154
Consumer Welfare	73	31	37	7	148
Task Allocation	84	16	33	7	140
Inequality Measures	25	77	32	5	139
Regulatory Compliance	54	63	13	3	133
Error Rate	44	51	6	—	101
Task Completion Time	88	5	4	3	100
Training Effectiveness	58	12	12	16	99
Worker Satisfaction	47	32	11	7	97
Wages & Compensation	53	15	20	5	93
Team Performance	47	12	15	7	82
Automation Exposure	24	22	9	6	62
Job Displacement	6	38	13	—	57
Hiring & Recruitment	41	4	6	3	54
Developer Productivity	34	4	3	1	42
Social Protection	22	10	6	2	40
Creative Output	16	7	5	1	29
Labor Share of Income	12	5	9	—	26
Skill Obsolescence	3	20	2	—	25
Worker Turnover	10	12	—	3	25

Across benchmarks, agents consistently rediscover known hardware optimization patterns without domain-specific training.

Qualitative and empirical observations across the 12 evaluated benchmarks, reporting that agents found recognized hardware optimization patterns despite having no hardware-specific training.

medium positive Agent Factories for High Level Synthesis: How Far Can Genera... discovery of known hardware optimization patterns by agents

This work demonstrates the technical feasibility of scalable, AI-augmented quality assessment for early childhood education and lays a foundation for continuous, inclusive AI-assisted evaluation enabling systemic improvement and equitable growth.

Overall results of dataset release, Interaction2Eval performance (agreement), and deployment efficiency reported in the paper; used by the authors to argue broader feasibility and potential systemic impact.

medium positive When AI Meets Early Childhood Education: Large Language Mode... feasibility and systemic impact of AI-augmented assessment

AI-assisted monitoring could shift assessment practice from annual expert audits to monthly AI-assisted monitoring with targeted human oversight.

Authors' synthesis combining dataset-scale results, Interaction2Eval performance (agreement), and deployment efficiency gains to argue feasibility of more frequent monitoring.

medium positive When AI Meets Early Childhood Education: Large Language Mode... frequency of quality monitoring (audit cadence)

These findings provide quantitative foundations for AI capability-threshold governance.

Synthesis/interpretation of model results and empirical validation described in the paper (recommendation/implication).

medium positive The enrichment paradox: critical capability thresholds and i... usefulness of model results for governance design

Digital transformation enhances the relational embeddedness among cities, and this enhanced relational embeddedness facilitates improved outcomes in collaborative innovation (mediating mechanism).

Mediation analysis / network metric analysis using city-level relational embeddedness measures computed from patent collaboration networks and digital transformation indicators from A-share listed companies (2011–2021).

medium positive How Does Digital Transformation Affect Cross-Regional Collab... relational embeddedness among cities and its mediating effect on collaborative i...

The work advances theory on human performance in complex negotiations and offers validated design guidance for interactive systems.

Authors' stated contributions: theoretical advancement and validated design guidance, grounded in the presented empirical results and the validated visualization tested in the N=32 experiment.

medium positive From Overload to Convergence: Supporting Multi-Issue Human-A... theoretical insight and design guidance validity

Robust arbitrage strategies remain profitable even when generalized across different domains (claim reiteration emphasizing cross-domain profitability and robustness).

Repeated/strengthened claim in the paper referencing multiple experiments and robustness checks across domains.

medium positive Computational Arbitrage in AI Model Markets cross-domain profitability of arbitrage strategies

An arbitrageur can efficiently allocate inference budget across providers to undercut the market, creating a competitive offering with no model-development risk.

Methodological description and empirical demonstration in the paper showing arbitrageur strategies that allocate inference budget across multiple providers to create a competitive service without incurring model-development risk.

medium positive Computational Arbitrage in AI Model Markets ability to undercut market prices and create competitive offering without model ...

Arbitrage reduces market segmentation and facilitates market entry for smaller model providers by enabling earlier revenue capture.

Reported analysis and/or experiments suggesting arbitrage homogenizes offerings (reduces segmentation) and allows smaller providers to capture revenue earlier through arbitrage-enabled routes.

medium positive Computational Arbitrage in AI Model Markets market segmentation and ease of market entry for smaller model providers

Robust arbitrage strategies that generalize across different domains remain profitable.

Reported experiments indicating that arbitrage strategies generalized beyond the primary SWE-bench domain and still yielded profit (authors state robust strategies remain profitable across domains).

medium positive Computational Arbitrage in AI Model Markets profitability of arbitrage strategies across multiple domains

Arbitrage is viable in AI model markets (we empirically demonstrate the viability of arbitrage and illustrate its economic consequences).

Empirical experiments and analyses presented in the paper (case study on SWE-bench and additional experiments on arbitrage strategies).

medium positive Computational Arbitrage in AI Model Markets viability/profitability and economic impact of arbitrage strategies

The paper introduces the Distributed Human Data Engine (DHDE), a socio-technical framework previously validated in biological crisis management, and adapts it for regional economic flow optimization.

Author statement describing the DHDE and asserting prior validation in biological crisis management; adaptation described in paper (methodological description).

medium positive Engineering Distributed Governance for Regional Prosperity: ... methodological/framework adaptation

The ACT represents the first open-source effort to consolidate data on Africa's evolving HPC landscape, aiming to encourage more transparency from local AI stakeholders and facilitate broader access for AI developers.

Authors' characterization of ACT as a novel, open-source consolidation; assertion based on literature/tools review performed by the authors and on the tool's stated goals.

medium positive Take the Train: Africa at the Crossroad of Modern AI transparency and access to HPC resources for AI developers

This systematic framework can help predict at a detailed level where today's AI systems can and cannot be used and how future AI capabilities may change this.

Interpretive/utility claim: authors argue that the ontology plus classification results serve as rough predictive tools for AI applicability across work activities.

medium positive Where can AI be used? Insights from a deep ontology of work ... predictive usefulness of the ontology for AI applicability across tasks

EnterpriseLab provides enterprises a practical path to deploying capable, privacy-preserving agents without compromising operational capability.

Conclusion drawn by the authors based on the platform design and the reported empirical results (performance parity with GPT-4o, cost reductions, benchmark robustness). The abstract offers this as a high-level takeaway rather than a quantified empirical claim.

medium positive EnterpriseLab: A Full-Stack Platform for developing and depl... practicality of enterprise deployment balancing capability, privacy, and operati...

Training humans to develop teamwork competencies, independent from task training, can enhance collaboration and performance in human-agent teams (HATs).

Overall experimental findings in KeyWe: task-independent teamwork training (<30 min) was associated with higher delegation, more strategy-based assignment, and better performance under difficulty for trained teams compared to controls.

medium positive Teaming Up With an AI Agent: Training Humans to Develop Huma... collaboration_and_performance_in_HATs (composite claim based on delegation, assi...

Trained teams demonstrated resilience by achieving higher task performance when the game difficulty increased.

Performance comparison under increased difficulty in the KeyWe game between teams with trained humans and teams without training; task performance measured (score or completion metric) showed trained teams performed better under harder conditions.

medium positive Teaming Up With an AI Agent: Training Humans to Develop Huma... task_performance_under_increased_difficulty

This pattern suggests that AI search may make hotel discovery less exclusively controlled by commission-based intermediaries (OTAs).

Interpretation/inference from the observed higher non-OTA citation shares for experiential queries in the audited Google Gemini sample; not a direct measurement of market outcomes such as bookings or commissions.

medium positive The End of Rented Discovery: How AI Search Redistributes Pow... degree of intermediary (OTA) control over hotel discovery

The results contribute to literature arguing that cloud-based GenAI is a source of enterprise value creation rather than merely an experimental technology.

Paper's stated addition to the existing literature based on the combined empirical and theoretical findings.

medium positive Measuring Business ROI of Generative AI Adoption on Azure Cl... enterprise value creation via GenAI

When compared to baseline approaches, the ARL-based model's accuracy in revenue and price optimization decreased by less than 20%, indicating that it can adapt and optimize pricing techniques in intricate, cutthroat markets.

Reported experimental comparison versus baselines (fixed/rule-based and cost-plus); specific metrics, dataset size, and whether 'decrease' refers to error or accuracy are not clarified in the excerpt.

medium positive The Application of Adaptive Reinforcement Learning in Dynami... accuracy in revenue and price optimization

Our results substantiate the potential of large language models as a foundational pillar for high-fidelity, scalable decision simulation and subsequent analysis in the real economy, based on a foundational database.

High-level conclusion drawn from the paper's experiments and methodological contributions; generalization claim asserting LLMs' potential as foundational tools for scalable, high-fidelity decision simulation.

medium positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... potential of LLMs for high-fidelity, scalable decision simulation

Experiments demonstrate that our framework achieves improved simulation stability compared to existing economic and financial LLM simulation baselines.

Empirical claim: experiments vs. baselines showing improved simulation stability (paper statement that framework improved simulation stability, without quantitative details in the excerpt).

medium positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... simulation stability

Experiments demonstrate that our framework achieves significant improvements in purchase quantity prediction compared to existing economic and financial LLM simulation baselines.

Empirical claim: experiments comparing MALLES against existing baselines; paper reports 'significant improvements' in purchase quantity prediction (no numerical values provided in the excerpt).

medium positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... purchase quantity prediction accuracy

Experiments demonstrate that our framework achieves significant improvements in product selection accuracy compared to existing economic and financial LLM simulation baselines.

Empirical claim: experiments comparing MALLES against existing economic and financial LLM simulation baselines; paper reports 'significant improvements' in product selection accuracy (no numerical values provided in the excerpt).

medium positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... product selection accuracy

This preference-learning approach enables the models to internalize and transfer latent consumer preference patterns, thereby mitigating the data sparsity issues prevalent in individual categories.

Claim based on the paper's reported approach: cross-category post-training and transfer of latent preferences; supported by experiments (paper states mitigation of data sparsity).

medium positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... mitigation of data sparsity through cross-category preference transfer

Orchestrated systems of smaller, domain-adapted models can mathematically outperform frontier generalist models in most institutional deployment environments.

Formal conditions and comparative analysis derived in the paper plus referenced/claimed empirical support across several domains (frontier lab dynamics, alignment evolution, sovereign AI pressures).

medium positive Punctuated Equilibria in Artificial Intelligence: The Instit... relative institutional performance (smaller domain models vs. frontier generalis...

Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and 94% of autonomous cases.

Intervention experiments in Study 2 where metadata redaction and explicit instructions were applied to interactive assistants (e.g., GitHub Copilot) and autonomous agents (e.g., Claude Code); reported full restoration for interactive and 94% for autonomous.

medium positive Measuring and Exploiting Confirmation Bias in LLM-Assisted S... restoration of vulnerability detection (post-intervention detection rate)

An increasing number of enterprises are using the label of artificial intelligence merely as a cosmetic embellishment in their annual reports (the phenomenon of 'AI washing' is spreading).

Framing/background claim in the paper's introduction/abstract; implied support from the semantic analysis of annual report texts across Chinese A-share firms over 2006–2024.

medium positive The Spillover Effects of Peer AI Rinsing on Corporate Green ... prevalence/trend of AI washing in annual reports

There are ethical imperatives of fairness and transparency in automated wealth management, and the paper proposes a roadmap toward sustainable and interpretable financial AI.

Normative analysis and proposed roadmap described in the paper; the excerpt does not provide operationalized fairness metrics, interpretability methods, or evaluation results.

medium positive Deep Reinforcement Learning for Dynamic Portfolio Optimizati... ethical compliance measures (fairness, transparency, interpretability) for autom...

In environments characterized by high-frequency data, non-linear dependencies, and stochastic market regimes, autonomous DRL agents can learn optimal sequential decision-making policies that offer a compelling alternative to static or rule-based allocation strategies.

Argument based on theoretical suitability of DRL for sequential decision problems and the paper's system-level investigation; excerpt does not report specific experimental datasets, sample sizes, benchmarks, or performance metrics.

medium positive Deep Reinforcement Learning for Dynamic Portfolio Optimizati... policy optimality / portfolio performance in complex market environments (implie...

The integration of Deep Reinforcement Learning (DRL) into portfolio management represents a significant evolution from classical Mean-Variance Optimization and modern econometric frameworks.

Conceptual comparison and synthesis presented in the paper; no empirical sample size or experimental results are provided in the excerpt to quantify the degree of improvement.

medium positive Deep Reinforcement Learning for Dynamic Portfolio Optimizati... methodological advancement in portfolio management (shift from static optimizati...

Blindfolding (anonymizing identifiers) allows verification of whether meaningful predictive signals persist (i.e., predictions reflect legitimate patterns rather than pre-trained recall of tickers).

Combined methodological-and-result claim: approach described (anonymization) plus stated objective and reported validation (negative controls and reported Sharpe under anonymization). Specific experimental protocol and quantitative results isolating the effect of anonymization are not provided in the excerpt.

medium positive Can Blindfolded LLMs Still Trade? An Anonymization-First Fra... persistence of predictive signal after anonymization (signal legitimacy)

On 2025 year-to-date (through 2025-08-01), the system achieved Sharpe 1.40 +/- 0.22 across 20 random seeds.

Backtest/performance claim: reported Sharpe ratio with reported uncertainty and a sample size of 20 seeds; time window specified as 2025 YTD through 2025-08-01. No further details on portfolio construction, leverage, transaction costs, or benchmark adjustment provided in the excerpt.

medium positive Can Blindfolded LLMs Still Trade? An Anonymization-First Fra... Sharpe ratio (mean and +/- presumably standard error or standard deviation) over...
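
A figure such as "Sharpe 1.40 +/- 0.22 across 20 random seeds" is usually produced by computing one Sharpe ratio per seeded run and then summarizing the spread. The sketch below illustrates that arithmetic only. It is not the paper's pipeline: the daily return frequency, the 252-period annualization, the zero risk-free rate, and the synthetic returns are all assumptions, and the excerpt does not say whether the reported "+/-" is a standard deviation or a standard error, so both are computed.

```python
import math
import random
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period excess returns.

    Assumes the risk-free rate is already subtracted (or is zero).
    """
    mean = statistics.fmean(returns)
    std = statistics.stdev(returns)  # sample standard deviation
    return mean / std * math.sqrt(periods_per_year)


def summarize_across_seeds(sharpes):
    """Return (mean, sample std, standard error) of per-seed Sharpe ratios."""
    mean = statistics.fmean(sharpes)
    std = statistics.stdev(sharpes)
    return mean, std, std / math.sqrt(len(sharpes))


# Synthetic stand-in for 20 seeded backtests (hypothetical data, not the paper's).
per_seed = []
for seed in range(20):
    rng = random.Random(seed)
    daily_returns = [rng.gauss(0.001, 0.01) for _ in range(150)]
    per_seed.append(sharpe_ratio(daily_returns))

mean, std, sem = summarize_across_seeds(per_seed)
print(f"Sharpe {mean:.2f} +/- {std:.2f} (std), standard error {sem:.2f}, n={len(per_seed)}")
```

With 20 seeds the standard error is the standard deviation divided by about 4.5, so the two readings of "+/-" differ materially when comparing against a benchmark.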

Regulatory sandboxes offer a flexible and innovation-friendly governance model compared to traditional command-and-control mechanisms.

Normative and comparative analysis within a law & economics framework; no empirical performance data reported in the abstract.

medium positive Experimentalism beyond ex ante regulation: A law and economi... flexibility of governance and degree of innovation-friendliness

Comparative insights from FinTech identify the institutional design features necessary to ensure the effectiveness and resilience of regulatory sandboxes.

Comparative case-based reasoning drawing on FinTech regulatory sandbox experience (abstract does not report number or selection of cases).

medium positive Experimentalism beyond ex ante regulation: A law and economi... presence and performance of institutional design features (effectiveness/resilie...

AI regulatory sandboxes may correct specific government failures, including regulatory capture, rent-seeking, and knowledge gaps.

Analytical claims supported by comparative reasoning (FinTech examples) and economic analysis of government failure; no empirical testing or sample size reported in the abstract.

medium positive Experimentalism beyond ex ante regulation: A law and economi... incidence/severity of government failures such as regulatory capture, rent-seeki...

AI regulatory sandboxes facilitate iterative regulatory learning while promoting responsible AI innovation.

Theoretical argument using experimentalist governance concepts and law & economics reasoning; comparative insights referenced but no empirical sample detailed in the abstract.

medium positive Experimentalism beyond ex ante regulation: A law and economi... degree of regulatory learning and indicators of responsible AI innovation

AI regulatory sandboxes can reduce negative externalities associated with AI deployment.

Conceptual and economic analysis in the paper (no empirical quantification or sample size reported in the abstract).

medium positive Experimentalism beyond ex ante regulation: A law and economi... magnitude/frequency of negative externalities (e.g., harms from AI systems)

AI regulatory sandboxes can mitigate information asymmetries between regulators and firms.

Analytical application of an economic analysis of law framework; theoretical argumentation rather than reported empirical measurement in the abstract.

medium positive Experimentalism beyond ex ante regulation: A law and economi... level of information asymmetry between regulators and AI firms

A well-established legal framework for data privacy (e.g., PIPL) enhances the benefits of big data for corporate performance.

Inference drawn from the observed stronger positive big-data effect on firm value after PIPL implementation, as reported by the paper's moderation analysis.

medium positive How Big Data Enhances Firm Value Under Data Privacy Regulati... firm performance / firm value

Robust sensitivity tests confirm the main findings, indicating that the results are not driven by model specification or sample selection.

Paper reports multiple robustness/sensitivity checks (unspecified in summary) that the authors state produce consistent results supporting the primary conclusions.

medium positive How Big Data Enhances Firm Value Under Data Privacy Regulati... firm value

The positive impact of big data on firm performance is strengthened following the implementation of China's Personal Information Protection Law (PIPL).

Moderation/interacted-specification analysis in the paper comparing pre- and post-PIPL periods (or interacting big-data measure with a PIPL indicator), showing a larger positive effect on firm value after PIPL implementation.

medium positive How Big Data Enhances Firm Value Under Data Privacy Regulati... firm value / firm performance

The positive effect of big data on firm value operates through improving operational efficiency and reducing costs.

Mechanism analysis reported in the paper indicating mediation/channel tests where big data adoption is associated with measures of operational efficiency and cost reductions, which in turn relate to higher firm value.

medium positive How Big Data Enhances Firm Value Under Data Privacy Regulati... operational efficiency; operating costs; firm value

Big data application significantly improves firm value.

Results from fixed-effects regressions on the 2007–2021 panel showing a statistically significant positive coefficient for the big-data keyword-frequency measure on firm value (paper reports significance and effect direction).

medium positive How Big Data Enhances Firm Value Under Data Privacy Regulati... firm value

It is optimal to start taxing AI when cognitive workers start to consider switching to manual jobs.

Analytical result derived from the extended dynamic taxation model and its comparative-static/optimal-policy analysis; the timing rule for introducing an AI tax follows from the model's equilibrium conditions and welfare optimization.

medium positive Workers' Incentives and the Optimal Taxation of AI optimal timing of initiating taxation on AI (triggered by cognitive workers' inc...

The model implies testable governance diagnostics linking latent fragility to observable patterns: recorded dissent (anonymous vs. formal voting gaps), scenario-set diversity, pipeline and method concentration, and anchor lag.

Theoretical mapping from model primitives and observable quantities to proposed diagnostics; the paper enumerates observable patterns that should correlate with model-implied fragility. This is a theoretical implication rather than an empirically validated claim.

medium positive Cohesion as Concentration: Exclusion-Driven Fragility in Fin... observable diagnostics (recorded dissent patterns, voting gaps, scenario diversi...

The clearest added value of AI over structured self-reflection lies in increasing felt accountability.

Based on RCT comparisons showing no significant AI advantage over the written-reflection questionnaire on overall goal progress, but showing higher perceived social accountability in the AI condition and a significant mediation of the AI effect on progress via perceived accountability (indirect effect = 0.15, 95% CI [0.04, 0.31]).

medium positive AI-Assisted Goal Setting Improves Goal Progress Through Soci... perceived social accountability and resulting goal progress
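
An indirect effect with a confidence interval, like the one quoted above, is commonly estimated as the product of two regression slopes (condition to mediator, and mediator to outcome holding condition fixed) with a bootstrap interval. The following is a minimal sketch of that general technique on synthetic data. The entry does not say how the authors computed their interval, so the percentile bootstrap, the variable names, and the simulated effect sizes are assumptions made for illustration.

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x (model includes an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """Product a*b: a = slope of M on X, b = slope of Y on M controlling for X."""
    a = slope(x, m)
    # Frisch-Waugh: the partial slope of M equals the slope of
    # residualized Y on residualized M (both residualized on X).
    b = slope(residuals(x, m), residuals(x, y))
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # skip degenerate resamples with one condition only
            continue
        ms = [m[i] for i in idx]
        ys = [y[i] for i in idx]
        estimates.append(indirect_effect(xs, ms, ys))
    estimates.sort()
    k = len(estimates)
    return estimates[int(k * alpha / 2)], estimates[max(int(k * (1 - alpha / 2)) - 1, 0)]


# Synthetic data (hypothetical): x = condition, m = accountability, y = progress.
rng = random.Random(42)
x = [i % 2 for i in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
low, high = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

An interval that excludes zero is what is typically reported as a significant mediation.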

AI-assisted goal setting can improve short-term (two-week) goal progress.

Aggregate interpretation based on the RCT finding that the AI condition outperformed the no-support control on two-week goal progress (d = 0.33, p = .016); two-week follow-up window specified in study.

medium positive AI-Assisted Goal Setting Improves Goal Progress Through Soci... short-term goal progress (self-reported at two weeks)
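
For readers interpreting the reported effect size, Cohen's d is the difference between two group means divided by a pooled standard deviation. The sketch below shows the standard pooled-variance formula on made-up scores. The data are hypothetical, and the entry does not state which variant of d the authors used.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    v1 = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    v2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.fmean(treatment) - statistics.fmean(control)) / pooled_sd


# Hypothetical two-week goal-progress scores, for illustration only.
ai_condition = [5, 6, 4, 7, 5, 6]
no_support_control = [4, 5, 4, 6, 5, 5]
print(f"d = {cohens_d(ai_condition, no_support_control):.2f}")
```

By the usual rule of thumb, a d near 0.3 sits between a small (0.2) and a medium (0.5) effect.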

The AI increased perceived social accountability relative to the written-reflection questionnaire.

Reported comparison from the RCT showing higher perceived social accountability in the AI condition versus the written-reflection condition; measured via self-report scales at follow-up (exact scale and statistics reported in paper).

medium positive AI-Assisted Goal Setting Improves Goal Progress Through Soci... perceived social accountability (self-report)

JobMatchAI provides factor-wise explanations through resume-driven search workflows.

Paper states that the system gives factor-wise explanations and ties them to resume-driven workflows; the excerpt references interpretable reranking and demo artifacts but does not include user study or explanation-faithfulness metrics.

medium positive JobMatchAI An Intelligent Job Matching Platform Using Knowle... explainability: factor-wise explanations presented to users within resume-driven...
