Evidence (4793 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	402	112	67	480	1076
Governance & Regulation	402	192	122	62	790
Research Productivity	249	98	34	311	697
Organizational Efficiency	395	95	70	40	603
Technology Adoption Rate	321	126	73	39	564
Firm Productivity	306	39	70	12	432
Output Quality	256	66	25	28	375
AI Safety & Ethics	116	177	44	24	363
Market Structure	107	128	85	14	339
Decision Quality	177	76	38	20	315
Fiscal & Macroeconomic	89	58	33	22	209
Employment Level	77	34	80	9	202
Skill Acquisition	92	33	40	9	174
Innovation Output	120	12	23	12	168
Firm Revenue	98	34	22	—	154
Consumer Welfare	73	31	37	7	148
Task Allocation	84	16	33	7	140
Inequality Measures	25	77	32	5	139
Regulatory Compliance	54	63	13	3	133
Error Rate	44	51	6	—	101
Task Completion Time	88	5	4	3	100
Training Effectiveness	58	12	12	16	99
Worker Satisfaction	47	32	11	7	97
Wages & Compensation	53	15	20	5	93
Team Performance	47	12	15	7	82
Automation Exposure	24	22	9	6	62
Job Displacement	6	38	13	—	57
Hiring & Recruitment	41	4	6	3	54
Developer Productivity	34	4	3	1	42
Social Protection	22	10	6	2	40
Creative Output	16	7	5	1	29
Labor Share of Income	12	5	9	—	26
Skill Obsolescence	3	20	2	—	25
Worker Turnover	10	12	—	3	25

Productivity

Validation-stage issue load falls from 8.03 to 2.09 issues per 100 tasks across the portfolio as platform versions progress.

Observed outcomes from the retrospective field study on three programs; validation-stage issues counted and normalized per 100 tasks across delivery configurations.

high positive Orchestrating Human-AI Software Delivery: A Retrospective Lo... validation-stage issues per 100 tasks

Modeled senior-equivalent effort falls from 1080.0 to 139.5 SEE-days under the platform configurations studied.

Modeled senior-equivalent effort computed from the study's staffing scenarios and observed outputs across the three real programs.

high positive Orchestrating Human-AI Software Delivery: A Retrospective Lo... senior-equivalent effort (SEE-days)

Modeled raw effort falls from 1080.0 to 232.5 person-days under the platform configurations studied (baseline -> V4 aggregate).

Modeled outcomes computed from observed task volumes and explicit staffing scenarios in the retrospective longitudinal field study covering three real programs.

high positive Orchestrating Human-AI Software Delivery: A Retrospective Lo... raw effort (person-days)

Portfolio totals move from 36.0 to 9.3 summed project-weeks under baseline staffing assumptions (across the three studied programs and five delivery configurations).

Retrospective longitudinal field study of the Chiron platform applied to three real software modernization programs (COBOL banking migration ~30k LOC, accounting modernization ~400k LOC, .NET/Angular mortgage modernization ~30k LOC); observed and modeled outcomes were aggregated to produce portfolio totals under explicit staffing scenarios.

high positive Orchestrating Human-AI Software Delivery: A Retrospective Lo... summed project-weeks (portfolio time)
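The four claims above each report a baseline and a final value but no relative change. As a rough reading aid, the sketch below derives the implied reduction from those endpoints only. The function name and dictionary keys are illustrative, and the percentages are not figures stated by the study.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional reduction from `before` to `after`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (before - after) / before


# (baseline, final) pairs copied from the four claims above.
reported = {
    "validation-stage issues per 100 tasks": (8.03, 2.09),
    "senior-equivalent effort (SEE-days)": (1080.0, 139.5),
    "raw effort (person-days)": (1080.0, 232.5),
    "summed project-weeks": (36.0, 9.3),
}

# Derived values; the source reports only the endpoints.
reductions = {
    name: relative_reduction(before, after)
    for name, (before, after) in reported.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1%} below baseline")
```

Three of these figures are modeled under the study's staffing scenarios rather than observed, so the derived reductions inherit those assumptions.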

Reinforcement learning (post-training) on our corpus improves downstream embodied manipulation performance.

Downstream evaluation described in the paper showing improved performance on embodied manipulation tasks after RL post-training on MultihopSpatial-Train.

high positive MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... embodied manipulation task performance

Reinforcement learning (post-training) on our MultihopSpatial-Train corpus enhances intrinsic VLM spatial reasoning.

Experimental intervention: RL-based post-training on the authors' training corpus followed by evaluation on intrinsic spatial reasoning benchmarks (described in the paper).

high positive MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... intrinsic spatial reasoning performance of VLMs

We provide MultihopSpatial-Train, a dedicated large-scale training corpus intended to foster spatial intelligence in VLMs.

Dataset/resource contribution described in the paper (existence and intended use of MultihopSpatial-Train).

high positive MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... training resource availability for spatial intelligence

We propose Acc@50IoU, a complementary metric that simultaneously evaluates reasoning and visual grounding by requiring both answer selection and precise bounding box prediction.

Methodological contribution in the paper defining the Acc@50IoU metric and its intended use to measure combined answer correctness and bounding-box IoU >= 0.5.

high positive MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... combined answer accuracy and box localization (reasoning + visual grounding)
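One plausible reading of Acc@50IoU, based only on the description above, is that a sample counts as correct when the selected answer matches and the predicted box overlaps the reference box with IoU >= 0.5. The sketch below follows that reading. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the sample data are assumptions, not the paper's reference implementation.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def acc_at_50_iou(
    pred_answers: Sequence[str],
    gold_answers: Sequence[str],
    pred_boxes: Sequence[Box],
    gold_boxes: Sequence[Box],
    threshold: float = 0.5,
) -> float:
    """Share of samples with a correct answer AND box IoU >= threshold."""
    n = len(gold_answers)
    if n == 0 or not (len(pred_answers) == len(pred_boxes) == len(gold_boxes) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for pa, ga, pb, gb in zip(pred_answers, gold_answers, pred_boxes, gold_boxes)
        if pa == ga and iou(pb, gb) >= threshold
    )
    return hits / n


# Hypothetical samples: only the first has both a correct answer and a tight box.
score = acc_at_50_iou(
    pred_answers=["A", "B", "C"],
    gold_answers=["A", "B", "D"],
    pred_boxes=[(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)],
    gold_boxes=[(0, 0, 10, 10), (5, 5, 15, 15), (0, 0, 10, 10)],
)
print(f"Acc@50IoU = {score:.3f}")
```

Requiring both conditions means a model cannot score by guessing the answer without localizing the object, which is the complementarity the claim describes.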

We introduce MultihopSpatial, a comprehensive benchmark designed for multi-hop and compositional spatial reasoning, featuring 1- to 3-hop complex queries across diverse spatial perspectives.

Dataset/benchmark construction described in the paper (design and scope of MultihopSpatial).

high positive MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... ability to evaluate multi-hop and compositional spatial reasoning

Spatial reasoning is foundational for Vision-Language Models (VLMs), particularly when deployed as Vision-Language-Action (VLA) agents in physical environments.

Conceptual/introductory statement in the paper motivating the work (literature-based argument about VLMs and VLA agents).

high positive MultihopSpatial: Multi-hop Compositional Spatial Reasoning B... spatial reasoning capability as a foundational requirement

The findings position AI not merely as an operational tool but as a strategic orchestrator of regenerative production systems, offering a clear roadmap for accelerating circular transitions in line with the Sustainable Development Goals.

Conclusions drawn from the mixed-methods review (bibliometric analysis of 196 articles and systematic review of 104 studies) as reported in the abstract.

high positive Artificial intelligence as a catalyst for the circular econo... role of AI in enabling/regenerating production systems and accelerating circular...

Artificial intelligence is emerging as a powerful driver of the circular economy (CE), enabling production systems to become more resource-efficient, less waste-intensive and strategically aligned with sustainability goals.

Mixed-methods assessment combining bibliometric network analysis (196 peer-reviewed articles, 2023–2024) and a systematic review of 104 studies, as reported in the abstract.

high positive Artificial intelligence as a catalyst for the circular econo... resource efficiency and waste intensity of production systems

AI can reduce production scrap by as much as 30% in documented cases.

Systematic review of studies (paper reports a systematic review of 104 studies); the abstract cites documented cases showing up to 30% reduction in production scrap.

high positive Artificial intelligence as a catalyst for the circular econo... production scrap (waste generated during production)

AI can increase resource-efficiency metrics by up to 25% in documented cases.

Systematic review of studies (paper reports a systematic review of 104 studies); the abstract states documented cases showing up to 25% increases in resource-efficiency metrics.

high positive Artificial intelligence as a catalyst for the circular econo... resource-efficiency metrics

GenAI implementations that are strategically deployed in managed Azure cloud infrastructure provide a positive ROI over time when aligned with business processes, enterprise architecture, and performance metrics.

Conclusion drawn from the paper's mixed-method analysis (quantitative ROI modelling, cost–benefit analysis, and case study synthesis).

high positive Measuring Business ROI of Generative AI Adoption on Azure Cl... Return on Investment (ROI) over time

Close coupling among Azure OpenAI Service, Azure Machine Learning, and cost governance tooling (FinOps) significantly decreases overall cost of ownership and enhances scalability and compliance.

Architectural analysis of Azure-native GenAI services and cost/governance tooling reported in the paper.

high positive Measuring Business ROI of Generative AI Adoption on Azure Cl... overall cost of ownership, scalability, compliance

Measurable ROI from GenAI on Azure is mainly driven by improvements in productivity, optimization of operational costs, faster decision making, and increased speed of innovation across business functions.

Reported results from the paper's mixed-method study combining quantitative ROI modelling and cost–benefit analysis plus qualitative synthesis of secondary enterprise case studies.

high positive Measuring Business ROI of Generative AI Adoption on Azure Cl... business Return on Investment (ROI) driven by productivity, cost optimization, d...

Microsoft Azure has become one of the first enterprise-scale platforms facilitating GenAI-driven change.

Statement in the paper's abstract asserting Azure's market position as an early enterprise-scale platform for GenAI.

high positive Measuring Business ROI of Generative AI Adoption on Azure Cl... enterprise-scale platform adoption

Our empirics demonstrate that self-evolving AI offers a scalable and interpretable paradigm.

Empirical results on the U.S. equity market are cited as evidence; the paper claims scalability and interpretability based on those empirical demonstrations and the architecture of the system.

high positive Beyond Prompting: An Autonomous Framework for Systematic Fac... scalability and interpretability of the AI-driven investing approach

Applying this methodology to the U.S. equity market, long-short portfolios formed on the simple linear combination of signals deliver a return of 59.53% (annualized).

Empirical backtest/application to the U.S. equity market reported in the paper; specific annualized return percentage is provided. Sample period, universe, and number of observations not stated in the excerpt.

high positive Beyond Prompting: An Autonomous Framework for Systematic Fac... annualized portfolio return

Applying this methodology to the U.S. equity market, long-short portfolios formed on the simple linear combination of signals deliver an annualized Sharpe ratio of 3.11.

Empirical backtest/application to the U.S. equity market reported in the paper; specific performance metric (annualized Sharpe) is provided. Sample period, universe, and number of observations not stated in the excerpt.

high positive Beyond Prompting: An Autonomous Framework for Systematic Fac... portfolio Sharpe ratio
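For readers who want to relate the two reported figures, the sketch below shows a common way to annualize a Sharpe ratio and the annual volatility implied by the reported return and Sharpe ratio. It assumes the 59.53% return is already an excess return (as is typical for self-financing long-short portfolios) and that returns are monthly. The excerpt states neither, so the implied volatility is only indicative.

```python
import math
import statistics
from typing import Sequence


def annualized_sharpe(
    returns: Sequence[float],
    periods_per_year: int = 12,          # assumption: monthly returns
    risk_free_per_period: float = 0.0,   # assumption: returns are already excess
) -> float:
    """Mean excess return over its sample std. dev., scaled by sqrt(periods per year)."""
    excess = [r - risk_free_per_period for r in returns]
    stdev = statistics.stdev(excess)
    if stdev == 0:
        raise ValueError("returns have zero variance")
    return statistics.fmean(excess) / stdev * math.sqrt(periods_per_year)


def implied_annual_volatility(annual_excess_return: float, sharpe: float) -> float:
    """Volatility consistent with a given annualized excess return and Sharpe ratio."""
    return annual_excess_return / sharpe


# Figures reported in the claims above.
implied_vol = implied_annual_volatility(0.5953, 3.11)
print(f"implied annualized volatility: {implied_vol:.1%}")

# Hypothetical monthly returns, used only to exercise the function.
toy_sharpe = annualized_sharpe([0.01, 0.03, 0.02, 0.04])
print(f"toy annualized Sharpe: {toy_sharpe:.2f}")
```

Under these assumptions the reported figures imply an annualized volatility of roughly 19%. Because the sample period and universe are not given, this cannot be checked against the paper.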

To mitigate data snooping biases, the closed-loop system imposes strict empirical discipline through out-of-sample validation and economic rationale requirements.

Description of model validation protocol in the paper (use of out-of-sample validation and economic rationale filters); supports claim that these steps are used to reduce data-snooping risk.

high positive Beyond Prompting: An Autonomous Framework for Systematic Fac... mitigation of data-snooping bias (robustness of signals)

The approach operationalizes the model as a self-directed engine that endogenously formulates interpretable trading signals (rather than relying on sequential manual prompts).

Methodological description and implementation details in the paper describing how the model generates signals autonomously and interpretable outputs; empirical example applied to U.S. equity market is referenced to illustrate operation.

high positive Beyond Prompting: An Autonomous Framework for Systematic Fac... interpretability and autonomy of generated trading signals

We develop an autonomous framework for systematic factor investing via agentic AI.

Statement of methodological contribution in the paper (framework description); no sample size or empirical test required for the descriptive claim.

high positive Beyond Prompting: An Autonomous Framework for Systematic Fac... autonomy of investment framework (methodological capability)

The technology particularly benefits less experienced practitioners by providing comprehensive starting points for legal research, while experienced attorneys can use it for quality control and initial drafts.

Authors' interpretation of AI outputs from the experiment and reasoning about how those outputs map onto different practitioner needs (qualitative judgment).

high positive Robot Wingman: Using AI to Assess an Employment Termination benefit to practitioners (training/assistance, drafting, quality control)

The analysis reveals AI’s potential to transform law firm economics by dramatically reducing research time while maintaining analytical quality, though careful attorney oversight remains essential.

Inference from the experimental finding that four AI systems produced substantive analysis comparable to junior-associate work on one transcript and the stated observation about traditional research time (8–40 hours); authors' qualitative judgment about economic implications and need for oversight.

high positive Robot Wingman: Using AI to Assess an Employment Termination law firm economics (research time reduction and analytical quality)

Statutory and regulatory citations proved generally accurate and useful.

Authors' examination of statutory and regulatory references produced by the four AI engines in the experiment, judged to be generally correct and helpful.

high positive Robot Wingman: Using AI to Assess an Employment Termination accuracy/usability of statutory and regulatory citations

All four engines successfully spotted legal issues, assessed claim strengths and weaknesses, and suggested follow-up investigation—tasks that traditionally required eight to forty hours of junior attorney research time.

Observed outputs from the four AI engines on the single transcript showing issue-spotting, strengths/weaknesses assessment, and suggested follow-ups; comparison to typical junior attorney research time (stated as 8–40 hours).

high positive Robot Wingman: Using AI to Assess an Employment Termination issue-spotting and assessment quality; implied time savings relative to traditio...

Contemporary generative AI performs sophisticated legal analysis comparable to experienced associates, correctly identifying major employment law claims including ADA violations, Title VII discrimination, OSHA retaliation, FMLA interference, and workers’ compensation retaliation.

Qualitative assessment of outputs from the four AI engines applied to the single hypothetical transcript; comparison against expected legal claims (authors' judgment that outputs matched those an experienced associate would produce).

high positive Robot Wingman: Using AI to Assess an Employment Termination ability to identify relevant legal claims and assess them

Four major generative AI engines—DeepSeek, Claude, ChatGPT, and Grok—are useful legal analysis tools for employment law practitioners.

Experimental evaluation in which a single hypothetical client interview transcript was submitted to each of the four AI systems and their outputs were assessed by the authors.

high positive Robot Wingman: Using AI to Assess an Employment Termination usefulness of AI as legal analysis tools (quality of analysis/output)

Policy recommendations: increase investment in AI research and expansion; promote AI-driven robotics in key sectors; provide targeted skilling programs for elderly workers; invest in digital infrastructure and the ageing industry; and leverage and develop elderly human capital to support inclusive and sustainable economic development.

Paper discussion/conclusion draws policy implications based on empirical finding that AI adoption mitigates negative ageing effects on GDP growth.

high positive Nonlinear effects of ageing population and AI on China’s GDP... policy actions to manage ageing-related economic challenges

Robustness checks using the old-age dependency ratio as the proxy for ageing deliver consistent results.

Paper reports robustness verification: replacing the primary ageing measure with the old-age dependency ratio yields similar threshold/mitigation findings.

high positive Nonlinear effects of ageing population and AI on China’s GDP... GDP growth (robustness of ageing effect and AI mitigation)

When AI adoption (industrial robot penetration) surpasses a critical threshold, the negative effect of ageing on GDP growth is significantly mitigated.

Threshold interaction result from panel threshold regression: AI adoption (robot penetration) as threshold variable; paper reports that beyond a critical robot-adoption threshold the negative ageing–GDP relationship is significantly weakened.

high positive Nonlinear effects of ageing population and AI on China’s GDP... GDP growth (mitigation of negative ageing effect by AI adoption)

Organizational support and continuous learning are important to maximize the benefits of AI integration in startup environments.

Conclusions drawn from thematic analysis of interviews with 12 startup employees emphasizing need for organizational support and ongoing learning.

high positive AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... role of organizational support and continuous learning in realizing AI benefits

AI functions as a workforce augmentation tool that enhances human capabilities rather than replacing employees.

Reported perceptions from 12 startup employees in semi-structured interviews; thematic coding indicated view of AI as augmentation rather than replacement.

high positive AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... AI role relative to job displacement (augmentation vs replacement)

Most employees demonstrated progressive adjustment and competency improvement over time after initial adaptation.

Interview data from 12 startup employees with thematic analysis indicating progressive adjustment and competency gains over time.

high positive AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... progressive adjustment and competency improvement over time

AI improves employee performance by supporting more accurate decision-making and increasing work effectiveness and output quality.

Findings from semi-structured interviews of 12 startup employees, analyzed via thematic coding and frequency scoring, reporting improved decision accuracy and output quality with AI support.

high positive AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... decision-making accuracy, work effectiveness, output quality

AI integration contributes to competency development, particularly in digital literacy, analytical thinking, and adaptive learning.

Qualitative semi-structured interviews with 12 startup employees; thematic coding highlighted competencies (digital literacy, analytical thinking, adaptive learning).

high positive AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... competency development (digital literacy, analytical thinking, adaptive learning...

AI significantly enhances employee productivity by accelerating task completion, reducing manual workload, and improving workflow efficiency.

Qualitative study using semi-structured interviews with 12 startup employees; data analyzed with thematic coding, frequency scoring, and visualized analysis.

high positive AI-AUGMENTED WORKFORCE: THE IMPACT OF ARTIFICIAL INTELLIGENC... employee productivity (task completion speed, manual workload, workflow efficien...

Experiments highlight a reward structure that balances income, profit, efficiency, fairness, and customer retention, moving beyond income-only goals.

Experimental design / reward engineering reported in paper; claim supported by experiments (no quantitative metrics or sample size given in excerpt).

high positive The Application of Adaptive Reinforcement Learning in Dynami... reward structure balancing multiple objectives (income, profit, efficiency, fair...

Training strength is validated by benchmarking against fixed rule-based and cost-plus pricing baselines in controlled experiments.

Paper reports controlled experiments benchmarking ARL models against fixed/rule-based and cost-plus baselines; specific experimental design and sample sizes not provided in excerpt.

high positive The Application of Adaptive Reinforcement Learning in Dynami... relative performance of ARL training vs. baselines (validation/benchmarking outc...

Inventory challenges are addressed by utilizing a curated dataset that has been enhanced through feature engineering, transformation, and systematic cleaning, providing reliable inputs for training.

Methodological claim about dataset curation and preprocessing used to train ARL agents; no dataset size or quantitative validation reported in excerpt.

high positive The Application of Adaptive Reinforcement Learning in Dynami... quality/reliability of training inputs with respect to inventory representation

Profitability in a dynamic marketplace is enhanced through an Adaptive Reinforcement Learning (ARL)-based pricing framework that utilizes Q-Learning and Deep Q-Networks (DQN) for real-time optimization in response to changing market conditions, competition, and inventory levels.

Paper proposes and experiments with an ARL-based pricing framework (methods include Q-Learning and DQN); validation claimed via benchmarking/controlled experimentation against baselines (details not provided in excerpt).

high positive The Application of Adaptive Reinforcement Learning in Dynami... profitability and pricing optimization in dynamic markets
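The excerpt names Q-Learning as one of the methods but gives no implementation details. The toy sketch below illustrates tabular Q-learning for price selection in general terms. The two demand regimes, the linear demand curve, the price grid, the deterministic regime alternation, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List

PRICES = [5.0, 10.0, 15.0, 20.0]      # hypothetical price points (actions)
BASE_DEMAND = {0: 20.0, 1: 40.0}      # hypothetical demand intercepts: 0 = low, 1 = high


def revenue(state: int, price: float) -> float:
    """Toy linear demand: units sold fall by one per unit of price."""
    units = max(0.0, BASE_DEMAND[state] - price)
    return price * units


def train_q_table(
    steps: int = 20_000,
    alpha: float = 0.1,     # learning rate
    gamma: float = 0.9,     # discount factor
    epsilon: float = 0.2,   # exploration probability
    seed: int = 0,
) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0] * len(PRICES) for _ in BASE_DEMAND]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection over the price grid.
        if rng.random() < epsilon:
            action = rng.randrange(len(PRICES))
        else:
            action = max(range(len(PRICES)), key=lambda a: q[state][a])
        reward = revenue(state, PRICES[action])
        next_state = 1 - state  # toy dynamics: regimes simply alternate
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_prices(q: List[List[float]]) -> Dict[int, float]:
    """Price with the highest learned value in each demand regime."""
    return {
        s: PRICES[max(range(len(PRICES)), key=lambda a: q[s][a])]
        for s in range(len(q))
    }


q_table = train_q_table()
policy = greedy_prices(q_table)
print(policy)
```

By construction, the revenue-maximizing prices in this toy market are 10 in the low-demand regime and 20 in the high-demand regime, so the learned policy can be checked by hand. A DQN would replace the table with a neural network, which this sketch does not attempt.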

Dynamic pricing is crucial for maximizing revenue and maintaining competitiveness in markets with fluctuating demand, perishable goods, and diverse customer preferences.

Conceptual claim stated in paper's introduction/motivation; no empirical sample or experiment specified in the statement.

high positive The Application of Adaptive Reinforcement Learning in Dynami... maximizing revenue and maintaining competitiveness

In the long term, big data promotes sustained improvements in individuals’ welfare.

Theoretical long-run growth analysis in the model showing that sustained data sharing leads to long-run welfare improvements (analytic/model-based, no empirical/sample data).

high positive Study on the impact of big data sharing on individuals’ welf... long-term growth of individuals' welfare

There exists an optimal level of data (big data) sharing that achieves the best balance between economic development and privacy, thereby maximizing individuals' welfare.

Analytical optimization within the theoretical macro model: model yields an interior optimum for data-sharing intensity that trades off economic gains and privacy costs (derivation/analytical result; no empirical test).

high positive Study on the impact of big data sharing on individuals’ welf... individuals' welfare maximization via optimal data-sharing level

Structured intent representations (PPS) can improve alignment and usability in human–AI interaction, especially in tasks where user intent is inherently ambiguous.

Synthesis of experimental findings (rendered PPS better on goal_alignment overall, task-dependent gains concentrated in high-ambiguity business tasks) and the preliminary user survey.

high positive Evaluating 5W3H Structured Prompting for Intent Alignment in... alignment_and_usability

A preliminary retrospective survey (N = 20) suggests a 66.1% reduction in follow-up prompts required, from 3.33 to 1.13 rounds, when using PPS.

Authors report a small retrospective survey of N = 20 respondents comparing number of follow-up prompt rounds required before vs after adopting PPS (self-reported).

high positive Evaluating 5W3H Structured Prompting for Intent Alignment in... number_of_follow-up_prompt_rounds_required

We introduce goal_alignment, a user-intent-centered evaluation dimension, and find that natural-language-rendered PPS outperforms both simple prompts and raw PPS JSON on this metric.

Experimental comparison across the three prompt conditions using the goal_alignment evaluation dimension applied to the collected outputs (540 outputs across 60 tasks and 3 models), as judged by an LLM judge.

high positive Evaluating 5W3H Structured Prompting for Intent Alignment in... goal_alignment

We propose a multi-agent discussion framework wherein specialized agents collaboratively process extensive product information, distributing cognitive load to alleviate single-agent attention bottlenecks and capturing critical decision factors through structured dialogue.

Method description: multi-agent discussion architecture described and implemented; claimed to distribute cognitive load and reduce single-agent attention bottlenecks (design + reported behavior).

high positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... reduction of single-agent attention bottlenecks / distributed processing of prod...
