Evidence (4793 claims)
- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Productivity
The analysis has specific implications for healthcare leadership and procurement (e.g., both should weigh incentive and risk-allocation effects, not just task optimisation).
Authors' conclusions/recommendations drawn from the theoretical analysis and typology (prescriptive claim in the paper; no empirical evaluation reported in the abstract).
Autonomous coding agents, able to create branches, open pull requests, and perform code reviews, now actively contribute to real-world projects.
Empirical observations reported in the dataset and study showing agent-originated branches, PRs, and review actions in open-source projects (paper asserts these actions occurred in real projects).
Workplace organization (W) materially modifies the augmentation function so that two firms with identical technology investments can realize 'radically different' augmentation outcomes.
Conceptual claim supported by the paper's theoretical model (phi(D,W)) and cited empirical illustration (Colombia EDIT survey interaction result).
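The moderating role of workplace organization in the paper's phi(D,W) augmentation function can be sketched numerically. The multiplicative-complementarity form and all parameter values below are illustrative assumptions, not the paper's specification:

```python
# Illustrative sketch of an augmentation function phi(D, W) in which workplace
# organization W moderates the return on technology investment D.
# The functional form and all numbers are assumptions for illustration only.

def phi(D, W, base=0.1, complementarity=2.0):
    """Augmentation outcome from technology investment D and workplace organization W."""
    return D * (base + complementarity * W)

# Two firms with identical technology investment D = 1.0 ...
firm_a = phi(1.0, W=0.9)   # strong complementary workplace practices
firm_b = phi(1.0, W=0.1)   # weak complementary workplace practices
print(firm_a, firm_b)      # ... realize radically different augmentation outcomes
```

Under these toy values the two firms' outcomes differ by more than 6x despite identical investment, which is the qualitative pattern the claim describes.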
Because other AI systems exhibit similar scaling-law economics, the mechanisms identified extend beyond computer vision, reinforcing that partial automation is often the economically rational long-run outcome, not merely a transitional phase.
Theoretical argument generalized from scaling-law evidence in the paper; no additional cross-domain empirical evidence reported in the summary.
These findings support the practical value of structured intent representation as a robust, protocol-like communication layer for human-AI interaction.
Aggregate interpretation of the experimental results (cross-language variance reduction, model compensation pattern, equivalence of structured frameworks, and user-study improvements).
We further provide initial evidence that this AI-for-AI paradigm can transfer beyond the AI stack through experiments in mathematics and biomedicine.
Reported preliminary experiments in mathematics and biomedicine intended to test transfer beyond the AI development stack.
To our knowledge, ASI-Evolve is the first unified framework to demonstrate AI-driven discovery across three central components of AI development: data, architectures, and learning algorithms.
Authors' claim of primacy based on reported experiments demonstrating AI-driven discovery in pretraining data curation, neural architecture design, and reinforcement learning algorithm design.
The growth of digital platforms contributes to the decentralization of job creation.
Paper cites contemporary data on the growth of digital platforms as part of its analysis (no specific platform-level datasets or sample sizes cited in the abstract).
The paper's predictions are consistent with practitioner reports.
Authors claim qualitative consistency with practitioner reports (no systematic survey/sample size provided in the provided text).
The paper's predictions are consistent with empirical observations from scientific productivity data.
Authors state they compare model predictions to scientific productivity data (no sample sizes or dataset details provided in the provided text).
The paper's predictions are consistent with empirical observations from AI coding benchmarks.
Authors state they compare model predictions to AI coding benchmark results (no sample sizes or specific benchmarks reported in the provided text).
An AI planner that combines static analysis with AI instructions can create migration plans for very complex code components; an orchestrator and coders then reliably follow those plans using AI-generated, example-based playbooks.
Methodological description and reported demonstrations in the paper (planner + orchestrator + coders following playbooks); no numeric sample size reported in abstract.
With experience, users issue more targeted queries and engage more deeply with supporting citations.
Longitudinal analysis of user behavior in the Asta dataset showing changes over time/with experience: increased use of targeted queries and higher engagement (clicks/inspect actions) with citations.
Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways.
Interaction-log analysis showing patterns of revisits, non-linear navigation between generated outputs and cited evidence within sessions in the Asta dataset.
Users treat the system as a collaborative research partner, delegating tasks such as drafting content and identifying research gaps.
Qualitative and quantitative analysis of interaction logs in the Asta dataset showing user behaviors where the system is used to draft content and identify gaps (examples and aggregated counts described in paper).
Users submit longer and more complex queries than in traditional search.
Comparative analysis of query length/complexity in the Asta Interaction Dataset (>200,000 queries) versus traditional search baselines (as reported in the paper); measurement of query length and complexity metrics across logs.
ASR-assisted transcription offers a practical pathway toward scalable, technology-supported documentation of endangered languages.
Authors' interpretive conclusion based on the corpus creation, ASR model performance (CER ~15%), and reported reductions in transcription time/cognitive load; presented as a recommendation/implication rather than a directly measured outcome.
ASR integration can substantially reduce cognitive load for transcribers.
Paper reports evaluation of ASR assistance including cognitive-load outcomes (authors claim cognitive load is reduced); details of measurement instrument, sample size, and statistical results are not given in the abstract.
ASR integration can substantially reduce transcription time.
Paper reports an evaluation of the impact of ASR assistance on the efficiency of speech transcription (comparison of ASR-assisted vs manual transcription). The abstract asserts a substantial reduction in transcription time but does not provide numeric details in the provided text.
Analysis of agentic investment firm operational models demonstrates 50-70% cost reductions while maintaining fiduciary standards.
Internal analysis/modeling of agentic investment firm operational models reported by the authors; paper states the 50–70% cost reduction result but provides no sample size or detailed empirical validation in the provided text.
The proposed system architectures and findings provide practical implications for future development of agentic AI systems for engineering design.
Concluding/implicational claim based on the methods and experimental findings reported in the paper (battery pack design experiments); no empirical test of 'practical implications' is provided in the excerpt.
Fostering digital transformation alongside workforce reskilling and innovation-ecosystem development is essential for sustainable industrial growth and strengthening Kazakhstan’s global economic position.
Policy and strategic recommendations based on the study's empirical results, case studies, and macro-level index comparisons.
Digital transformation combined with workforce retraining optimizes labor costs and enhances productivity.
Synthesis of enterprise-level case examples and aggregated regression/correlation findings at industry and national levels that link digitalization and retraining programs to labor-cost and productivity indicators.
Overall, the DRL framework enhances traffic capacity and fuel efficiency without compromising safety.
Aggregate interpretation of simulation results comparing DRL-based AV control to IDM across capacity, fuel efficiency, and safety metrics within the simulated scenarios. Specific safety metrics and sample sizes are not described in the claim text.
These results establish agent scaling as a practical and effective axis for HLS optimization.
Synthesis/interpretation of empirical results (including mean 8.27× speedup and per-benchmark gains) reported in the paper.
Across benchmarks, agents consistently rediscover known hardware optimization patterns without domain-specific training.
Qualitative and empirical observations across the 12 evaluated benchmarks reporting that agents found recognized hardware optimization patterns despite no hardware-specific training.
This work demonstrates the technical feasibility of scalable, AI-augmented quality assessment for early childhood education and lays a foundation for continuous, inclusive AI-assisted evaluation enabling systemic improvement and equitable growth.
Overall results of dataset release, Interaction2Eval performance (agreement), and deployment efficiency reported in the paper; used by the authors to argue broader feasibility and potential systemic impact.
AI-assisted monitoring could shift assessment practice from annual expert audits to monthly AI-assisted monitoring with targeted human oversight.
Authors' synthesis combining dataset-scale results, Interaction2Eval performance (agreement), and deployment efficiency gains to argue feasibility of more frequent monitoring.
The work advances theory on human performance in complex negotiations and offers validated design guidance for interactive systems.
Authors' stated contributions: theoretical advancement and validated design guidance, grounded in the presented empirical results and the validated visualization tested in the N=32 experiment.
The paper introduces the Distributed Human Data Engine (DHDE), a socio-technical framework previously validated in biological crisis management, and adapts it for regional economic flow optimization.
Author statement describing the DHDE and asserting prior validation in biological crisis management; adaptation described in paper (methodological description).
This systematic framework can help predict at a detailed level where today's AI systems can and cannot be used and how future AI capabilities may change this.
Interpretive/utility claim: authors argue that the ontology plus classification results serve as rough predictive tools for AI applicability across work activities.
EnterpriseLab provides enterprises a practical path to deploying capable, privacy-preserving agents without compromising operational capability.
Conclusion drawn by the authors based on the platform design and the reported empirical results (performance parity with GPT-4o, cost reductions, benchmark robustness). The abstract offers this as a high-level takeaway rather than a quantified empirical claim.
The results contribute to literature arguing that cloud-based GenAI is a source of enterprise value creation rather than merely an experimental technology.
Paper's stated addition to the existing literature based on the combined empirical and theoretical findings.
When compared to baseline approaches, the ARL-based model's accuracy in revenue and price optimization decreased by less than 20%, indicating that it can adapt and optimize pricing strategies in complex, highly competitive markets.
Reported experimental comparison versus baselines (fixed/rule-based and cost-plus); specific metrics, dataset size, and whether 'decrease' refers to error or accuracy are not clarified in the excerpt.
Our results substantiate the potential of large language models as a foundational pillar for high-fidelity, scalable decision simulation and subsequent analysis of the real economy, grounded in a foundational database.
High-level conclusion drawn from the paper's experiments and methodological contributions; generalization claim asserting LLMs' potential as foundational tools for scalable, high-fidelity decision simulation.
Experiments demonstrate that our framework achieves improved simulation stability compared to existing economic and financial LLM simulation baselines.
Empirical claim: experiments vs. baselines showing improved simulation stability (paper statement that framework improved simulation stability, without quantitative details in the excerpt).
Experiments demonstrate that our framework achieves significant improvements in purchase quantity prediction compared to existing economic and financial LLM simulation baselines.
Empirical claim: experiments comparing MALLES against existing baselines; paper reports 'significant improvements' in purchase quantity prediction (no numerical values provided in the excerpt).
Experiments demonstrate that our framework achieves significant improvements in product selection accuracy compared to existing economic and financial LLM simulation baselines.
Empirical claim: experiments comparing MALLES against existing economic and financial LLM simulation baselines; paper reports 'significant improvements' in product selection accuracy (no numerical values provided in the excerpt).
This preference-learning approach enables the models to internalize and transfer latent consumer preference patterns, thereby mitigating the data sparsity issues prevalent in individual categories.
Claim based on the paper's reported approach: cross-category post-training and transfer of latent preferences; supported by experiments (paper states mitigation of data sparsity).
A well-established legal framework for data privacy (e.g., PIPL) enhances the benefits of big data for corporate performance.
Inference drawn from the observed stronger positive big-data effect on firm value after PIPL implementation, as reported by the paper's moderation analysis.
Robust sensitivity tests confirm the main findings, indicating that the results are not driven by model specification or sample selection.
Paper reports multiple robustness/sensitivity checks (unspecified in summary) that the authors state produce consistent results supporting the primary conclusions.
The positive impact of big data on firm performance is strengthened following the implementation of China's Personal Information Protection Law (PIPL).
Moderation/interacted-specification analysis in the paper comparing pre- and post-PIPL periods (or interacting big-data measure with a PIPL indicator), showing a larger positive effect on firm value after PIPL implementation.
The positive effect of big data on firm value operates through improving operational efficiency and reducing costs.
Mechanism analysis reported in the paper indicating mediation/channel tests where big data adoption is associated with measures of operational efficiency and cost reductions, which in turn relate to higher firm value.
Big data application significantly improves firm value.
Results from fixed-effects regressions on the 2007–2021 panel showing a statistically significant positive coefficient for the big-data keyword-frequency measure on firm value (paper reports significance and effect direction).
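The moderation specification behind the PIPL findings above can be sketched as an interaction regression. The variable names, coefficients, and simulated data below are illustrative assumptions and do not reproduce the paper's dataset; firm and year fixed effects are omitted for brevity:

```python
# Hedged sketch of the moderation (interaction) specification: firm value
# regressed on a big-data measure, a post-PIPL indicator, and their
# interaction. All values here are simulated assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 750                                            # stand-in firm-year observations
big_data = rng.poisson(3, n).astype(float)         # keyword-frequency proxy
post_pipl = rng.integers(0, 2, n).astype(float)    # 1 if observation is post-PIPL
# Simulate a stronger big-data effect after PIPL (coefficients are assumptions)
firm_value = 0.05 * big_data + 0.08 * big_data * post_pipl + rng.normal(0, 0.1, n)

# Columns: intercept, big-data main effect, PIPL main effect, interaction
X = np.column_stack([np.ones(n), big_data, post_pipl, big_data * post_pipl])
beta, *_ = np.linalg.lstsq(X, firm_value, rcond=None)
print(beta[3])  # positive interaction coefficient = stronger post-PIPL effect
```

A positive and significant interaction coefficient is what the paper interprets as PIPL strengthening the big-data effect on firm value.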
It is optimal to begin taxing AI at the point when cognitive workers start considering a switch to manual jobs.
Analytical result derived from the extended dynamic taxation model and its comparative-static/optimal-policy analysis; the timing rule for introducing an AI tax follows from the model's equilibrium conditions and welfare optimization.
For AI agent tool design, surfacing contextual information outperforms prescribing procedural workflows.
Authors' conclusion drawn from the suite of experiments (GraphRAG vs TDD prompting vs auto-improvement) showing better regression reduction and/or resolution when contextual information is surfaced.
An autonomous auto-improvement loop raised resolution from 12% to 60% on a 10-instance subset with 0% regression.
Reported experiment on a 10-instance subset where an auto-improvement loop was applied (numbers provided in the excerpt).
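The general shape of an autonomous auto-improvement loop of the kind reported above can be sketched as attempt-evaluate-refine. The task, refinement rule, and success condition below are toy assumptions, not the paper's system:

```python
# Generic sketch of an auto-improvement loop: attempt a task, and on failure
# feed the error signal back to refine the strategy before retrying.
# Everything here is a toy assumption for illustration.

def auto_improve(task, attempt, refine, max_rounds=5):
    """Run attempt(); on failure, refine the strategy and retry."""
    strategy = {}
    for round_no in range(max_rounds):
        ok, feedback = attempt(task, strategy)
        if ok:
            return True, round_no
        strategy = refine(strategy, feedback)  # fold the failure signal back in
    return False, max_rounds

# Toy example: the attempt succeeds once enough hints have accumulated
def attempt(task, strategy):
    hints = strategy.get("hints", 0)
    return hints >= 3, f"need more context ({hints} hints so far)"

def refine(strategy, feedback):
    return {"hints": strategy.get("hints", 0) + 1}

solved, rounds = auto_improve("fix-issue", attempt, refine)
print(solved, rounds)  # True 3
```

The reported 12%-to-60% gain suggests the refinement signal compounds across rounds, which is what the loop structure above captures in miniature.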
Smaller models benefit more from contextual information (which tests to verify) than from procedural instructions (how to do TDD).
Inferred from comparative results across models (Qwen3-Coder 30B vs Qwen3.5-35B-A3B) and interventions (contextual test-surfacing vs TDD prompting) reported in the paper.
When deployed as an agent skill, GraphRAG improved resolution from 24% to 32%.
Empirical comparison reported in the evaluation on SWE-bench Verified (same experimental context as above).
TDAD's GraphRAG workflow reduced test-level regressions by 70% (from 6.08% to 1.82%).
Empirical result reported from the SWE-bench Verified evaluation using the GraphRAG workflow (sample details: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances as reported).
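The 70% figure is consistent with the two reported regression rates; a quick arithmetic check:

```python
# Check that the reported rates imply the stated relative reduction:
# a drop from a 6.08% to a 1.82% test-level regression rate.
before, after = 6.08, 1.82
relative_reduction = (before - after) / before
print(round(relative_reduction * 100))  # 70
```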