Evidence (4793 claims)
| Category | Claims |
|---|---|
| Adoption | 5539 |
| Productivity | 4793 |
| Governance | 4333 |
| Human-AI Collaboration | 3326 |
| Labor Markets | 2657 |
| Innovation | 2510 |
| Org Design | 2469 |
| Skills & Training | 2017 |
| Inequality | 1378 |
Evidence Matrix
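Claim counts by outcome category and direction of finding. A dash marks a cell with no claims; in some rows the Total exceeds the sum of the four direction columns.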
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Productivity
Validation-stage issue load falls from 8.03 to 2.09 issues per 100 tasks across the portfolio as platform versions progress.
Observed outcomes from the retrospective field study on three programs; validation-stage issues counted and normalized per 100 tasks across delivery configurations.
Modeled senior-equivalent effort falls from 1080.0 to 139.5 SEE-days under the platform configurations studied.
Modeled senior-equivalent effort computed from the study's staffing scenarios and observed outputs across the three real programs.
Modeled raw effort falls from 1080.0 to 232.5 person-days under the platform configurations studied (baseline -> V4 aggregate).
Modeled outcomes computed from observed task volumes and explicit staffing scenarios in the retrospective longitudinal field study covering three real programs.
Portfolio totals move from 36.0 to 9.3 summed project-weeks under baseline staffing assumptions (across the three studied programs and five delivery configurations).
Retrospective longitudinal field study of the Chiron platform applied to three real software modernization programs (COBOL banking migration ~30k LOC, accounting modernization ~400k LOC, .NET/Angular mortgage modernization ~30k LOC); observed and modeled outcomes were aggregated to produce portfolio totals under explicit staffing scenarios.
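The before/after figures above can be restated as relative reductions. The short calculation below is our own arithmetic on the baseline and final values quoted in the Chiron claims, not a figure reported by the study; the dictionary keys are only labels.

```python
# Baseline and final values copied from the Chiron claims above.
CHIRON_METRICS = {
    "validation issues per 100 tasks": (8.03, 2.09),
    "senior-equivalent effort (SEE-days)": (1080.0, 139.5),
    "raw effort (person-days)": (1080.0, 232.5),
    "portfolio project-weeks": (36.0, 9.3),
}


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


for name, (before, after) in CHIRON_METRICS.items():
    print(f"{name}: {percent_reduction(before, after):.1f}% lower")
```

Run as-is, this prints reductions of 74.0%, 87.1%, 78.5% and 74.2% for the four metrics.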
Reinforcement learning (post-training) on our corpus improves downstream embodied manipulation performance.
Downstream evaluation described in the paper showing improved performance on embodied manipulation tasks after RL post-training on MultihopSpatial-Train.
Reinforcement learning (post-training) on our MultihopSpatial-Train corpus enhances intrinsic VLM spatial reasoning.
Experimental intervention: RL-based post-training on the authors' training corpus followed by evaluation on intrinsic spatial reasoning benchmarks (described in the paper).
We provide MultihopSpatial-Train, a dedicated large-scale training corpus intended to foster spatial intelligence in VLMs.
Dataset/resource contribution described in the paper (existence and intended use of MultihopSpatial-Train).
We propose Acc@50IoU, a complementary metric that simultaneously evaluates reasoning and visual grounding by requiring both answer selection and precise bounding box prediction.
Methodological contribution in the paper defining the Acc@50IoU metric and its intended use to measure combined answer correctness and bounding-box IoU >= 0.5.
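A minimal sketch of how Acc@50IoU could be scored, following the definition above: an item counts only if the selected answer is correct and the predicted bounding box has IoU >= 0.5 with the ground-truth box. It assumes one selected option and one box per item, with boxes given as (x_min, y_min, x_max, y_max); the `Item` type and function names are illustrative, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Box format (x_min, y_min, x_max, y_max) is an assumption for this sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Item:
    answer: str  # selected (or gold) multiple-choice option
    box: Box     # predicted (or gold) bounding box


def acc_at_50_iou(preds: List[Item], golds: List[Item], threshold: float = 0.5) -> float:
    """Share of items whose answer is correct AND whose box reaches IoU >= threshold."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    hits = sum(p.answer == g.answer and iou(p.box, g.box) >= threshold
               for p, g in zip(preds, golds))
    return hits / len(golds)


golds = [Item("A", (0, 0, 10, 10))] * 3
preds = [
    Item("A", (0, 0, 10, 10)),   # right answer, perfect box: counts
    Item("A", (5, 5, 15, 15)),   # right answer, IoU = 25/175 (about 0.14): does not count
    Item("B", (0, 0, 10, 10)),   # wrong answer, perfect box: does not count
]
print(acc_at_50_iou(preds, golds))  # 1 of 3 items counts
```

The second and third predictions show why the metric is stricter than answer accuracy alone: each would pass one of the two checks but not both.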
We introduce MultihopSpatial, a comprehensive benchmark designed for multi-hop and compositional spatial reasoning, featuring 1- to 3-hop complex queries across diverse spatial perspectives.
Dataset/benchmark construction described in the paper (design and scope of MultihopSpatial).
Spatial reasoning is foundational for Vision-Language Models (VLMs), particularly when deployed as Vision-Language-Action (VLA) agents in physical environments.
Conceptual/introductory statement in the paper motivating the work (literature-based argument about VLMs and VLA agents).
The findings position AI not merely as an operational tool but as a strategic orchestrator of regenerative production systems, offering a clear roadmap for accelerating circular transitions in line with the Sustainable Development Goals.
Conclusions drawn from the mixed-methods review (bibliometric analysis of 196 articles and systematic review of 104 studies) as reported in the abstract.
Artificial intelligence is emerging as a powerful driver of the circular economy (CE), enabling production systems to become more resource-efficient, less waste-intensive and strategically aligned with sustainability goals.
Mixed-methods assessment combining bibliometric network analysis (196 peer-reviewed articles, 2023–2024) and a systematic review of 104 studies, as reported in the abstract.
AI can reduce production scrap by as much as 30% in documented cases.
Systematic review of studies (paper reports a systematic review of 104 studies); the abstract cites documented cases showing up to 30% reduction in production scrap.
AI can increase resource-efficiency metrics by up to 25% in documented cases.
Systematic review of studies (paper reports a systematic review of 104 studies); the abstract states documented cases showing up to 25% increases in resource-efficiency metrics.
GenAI implementations that are strategically deployed in managed Azure cloud infrastructure provide a positive ROI over time when aligned with business processes, enterprise architecture, and performance metrics.
Conclusion drawn from the paper's mixed-method analysis (quantitative ROI modelling, cost–benefit analysis, and case study synthesis).
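"Positive ROI over time" is often tracked as cumulative ROI, (cumulative benefit - cumulative cost) / cumulative cost, which starts negative because of setup costs and turns positive once recurring benefits outweigh them. The sketch below assumes that definition and uses hypothetical monthly figures; the paper's actual ROI model and inputs are not given here.

```python
from typing import List, Optional


def cumulative_roi(upfront: float, run_cost: float, benefit: float, periods: int) -> List[float]:
    """Cumulative ROI after each period: (total benefit - total cost) / total cost."""
    out = []
    for t in range(1, periods + 1):
        total_cost = upfront + run_cost * t
        total_benefit = benefit * t
        out.append((total_benefit - total_cost) / total_cost)
    return out


def payback_period(rois: List[float]) -> Optional[int]:
    """First period (1-based) at which cumulative ROI is no longer negative."""
    for t, roi in enumerate(rois, start=1):
        if roi >= 0:
            return t
    return None


# Hypothetical monthly figures, not taken from the paper.
rois = cumulative_roi(upfront=120_000, run_cost=10_000, benefit=25_000, periods=12)
print(payback_period(rois), round(rois[-1], 2))  # 8 0.25
```

With these inputs the deployment breaks even in month 8 and reaches a cumulative ROI of 25% after a year; if the recurring benefit never exceeds the run cost, `payback_period` returns `None`.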
Close coupling among Azure OpenAI Service, Azure Machine Learning, and cost governance tooling (FinOps) significantly decreases overall cost of ownership and enhances scalability and compliance.
Architectural analysis of Azure-native GenAI services and cost/governance tooling reported in the paper.
Measurable ROI from GenAI on Azure is mainly driven by improvements in productivity, optimization of operational costs, faster decision making, and increased speed of innovation across business functions.
Reported results from the paper's mixed-method study combining quantitative ROI modelling and cost–benefit analysis plus qualitative synthesis of secondary enterprise case studies.
Microsoft Azure has become one of the first enterprise-scale platforms facilitating GenAI-driven change.
Statement in the paper's abstract asserting Azure's market position as an early enterprise-scale platform for GenAI.
Our empirics demonstrate that self-evolving AI offers a scalable and interpretable paradigm.
Empirical results on the U.S. equity market are cited as evidence; the paper claims scalability and interpretability based on those empirical demonstrations and the architecture of the system.
Applying this methodology to the U.S. equity market, long-short portfolios formed on the simple linear combination of signals deliver a return of 59.53% (annualized).
Empirical backtest/application to the U.S. equity market reported in the paper; specific annualized return percentage is provided. Sample period, universe, and number of observations not stated in the excerpt.
Applying this methodology to the U.S. equity market, long-short portfolios formed on the simple linear combination of signals deliver an annualized Sharpe ratio of 3.11.
Empirical backtest/application to the U.S. equity market reported in the paper; specific performance metric (annualized Sharpe) is provided. Sample period, universe, and number of observations not stated in the excerpt.
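A minimal sketch of the portfolio-formation step described in the two claims above: combine signals linearly, sort stocks each period, go long the top group and short the bottom group, then annualize. Equal signal weights, quintile sorts (`frac=0.2`), monthly rebalancing (`periods_per_year=12`) and a zero risk-free rate are all our assumptions; the sketch does not reproduce the reported 59.53% return or 3.11 Sharpe ratio.

```python
import numpy as np
from typing import Tuple


def combined_score(signals: np.ndarray) -> np.ndarray:
    """Equal-weight linear combination of cross-sectionally z-scored signals.

    `signals` has shape (periods, stocks, n_signals); the result has shape (periods, stocks).
    Equal weights are an assumption; the paper's weights are not given in the excerpt.
    """
    mu = signals.mean(axis=1, keepdims=True)
    sd = signals.std(axis=1, keepdims=True)
    z = (signals - mu) / np.where(sd > 0, sd, 1.0)   # guard against constant signals
    return z.mean(axis=2)


def long_short_returns(score: np.ndarray, next_ret: np.ndarray, frac: float = 0.2) -> np.ndarray:
    """Per period: equal-weight long the top `frac` of stocks by score, short the bottom `frac`."""
    k = max(1, int(score.shape[1] * frac))
    order = np.argsort(score, axis=1)                 # ascending score within each period
    rows = np.arange(score.shape[0])[:, None]
    long_leg = next_ret[rows, order[:, -k:]].mean(axis=1)
    short_leg = next_ret[rows, order[:, :k]].mean(axis=1)
    return long_leg - short_leg


def annualize(port: np.ndarray, periods_per_year: int = 12) -> Tuple[float, float]:
    """Annualized arithmetic mean return and Sharpe ratio (risk-free rate taken as zero)."""
    mean, sd = port.mean(), port.std(ddof=1)
    return float(mean * periods_per_year), float(mean / sd * np.sqrt(periods_per_year))
```

Given `signals` of shape (periods, stocks, n_signals) and next-period returns `next_ret` of shape (periods, stocks), `annualize(long_short_returns(combined_score(signals), next_ret))` returns the annualized mean return and Sharpe ratio of the long-short series.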
To mitigate data snooping biases, the closed-loop system imposes strict empirical discipline through out-of-sample validation and economic rationale requirements.
Description of model validation protocol in the paper (use of out-of-sample validation and economic rationale filters); supports claim that these steps are used to reduce data-snooping risk.
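One way to encode the two filters named above is a gate that every candidate signal must pass before it is kept. The sketch assumes a chronological in-sample/out-of-sample split, a minimum out-of-sample Sharpe ratio, and a non-blank rationale string; the split fraction, threshold and `CandidateSignal` type are hypothetical, since the paper's exact protocol is not given in the excerpt.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class CandidateSignal:
    name: str
    returns: np.ndarray  # per-period long-short returns, in time order
    rationale: str       # economic story for the signal; blank means none was given


def sharpe(x: np.ndarray, periods_per_year: int = 12) -> float:
    sd = x.std(ddof=1)
    return float(x.mean() / sd * np.sqrt(periods_per_year)) if sd > 0 else 0.0


def passes_gate(c: CandidateSignal, in_sample_frac: float = 0.7,
                min_oos_sharpe: float = 0.5) -> bool:
    """Keep a signal only if it has a stated rationale and still works out of sample."""
    if not c.rationale.strip():
        return False                                  # economic-rationale requirement
    cut = int(round(len(c.returns) * in_sample_frac))
    in_s, out_s = c.returns[:cut], c.returns[cut:]    # chronological split, no shuffling
    return sharpe(in_s) > 0 and sharpe(out_s) >= min_oos_sharpe
```

Splitting by time rather than at random matters here: a shuffled split would leak future information into the in-sample period and weaken the protection against data snooping.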
The approach operationalizes the model as a self-directed engine that endogenously formulates interpretable trading signals (rather than relying on sequential manual prompts).
Methodological description and implementation details in the paper describing how the model generates signals autonomously and interpretable outputs; empirical example applied to U.S. equity market is referenced to illustrate operation.
We develop an autonomous framework for systematic factor investing via agentic AI.
Statement of methodological contribution in the paper (framework description); no sample size or empirical test required for the descriptive claim.
The technology particularly benefits less experienced practitioners by providing comprehensive starting points for legal research, while experienced attorneys can use it for quality control and initial drafts.
Authors' interpretation of AI outputs from the experiment and reasoning about how those outputs map onto different practitioner needs (qualitative judgment).
The analysis reveals AI’s potential to transform law firm economics by dramatically reducing research time while maintaining analytical quality, though careful attorney oversight remains essential.
Inference from the experimental finding that four AI systems produced substantive analysis comparable to junior-associate work on one transcript and the stated observation about traditional research time (8–40 hours); authors' qualitative judgment about economic implications and need for oversight.
Statutory and regulatory citations proved generally accurate and useful.
Authors' examination of statutory and regulatory references produced by the four AI engines in the experiment, judged to be generally correct and helpful.
All four engines successfully spotted legal issues, assessed claim strengths and weaknesses, and suggested follow-up investigation—tasks that traditionally required eight to forty hours of junior attorney research time.
Observed outputs from the four AI engines on the single transcript showing issue-spotting, strengths/weaknesses assessment, and suggested follow-ups; comparison to typical junior attorney research time (stated as 8–40 hours).
Contemporary generative AI performs sophisticated legal analysis comparable to experienced associates, correctly identifying major employment law claims including ADA violations, Title VII discrimination, OSHA retaliation, FMLA interference, and workers’ compensation retaliation.
Qualitative assessment of outputs from the four AI engines applied to the single hypothetical transcript; comparison against expected legal claims (authors' judgment that outputs matched those an experienced associate would produce).
Four major generative AI engines—DeepSeek, Claude, ChatGPT, and Grok—are useful legal analysis tools for employment law practitioners.
Experimental evaluation in which a single hypothetical client interview transcript was submitted to each of the four AI systems and their outputs were assessed by the authors.
Policy recommendations: increase investment in AI research and expansion; promote AI-driven robotics in key sectors; provide targeted skilling programs for elderly workers; invest in digital infrastructure and the ageing industry; and leverage and develop elderly human capital to support inclusive and sustainable economic development.
Paper discussion/conclusion draws policy implications based on empirical finding that AI adoption mitigates negative ageing effects on GDP growth.
Robustness checks using the old-age dependency ratio as the proxy for ageing deliver consistent results.
Paper reports robustness verification: replacing the primary ageing measure with the old-age dependency ratio yields similar threshold/mitigation findings.
When AI adoption (industrial robot penetration) surpasses a critical threshold, the negative effect of ageing on GDP growth is significantly mitigated.
Threshold interaction result from panel threshold regression: AI adoption (robot penetration) as threshold variable; paper reports that beyond a critical robot-adoption threshold the negative ageing–GDP relationship is significantly weakened.
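The threshold result can be illustrated with a simplified single-threshold fixed-effects panel model in the spirit of Hansen's (1999) panel threshold regression, a common way to estimate such models. This is an assumption about the method: the paper's controls, number of thresholds and significance test are not reproduced, and the variable names (`ageing`, `robots`, `unit`) are hypothetical.

```python
import numpy as np
from typing import Tuple


def within(x: np.ndarray, unit: np.ndarray) -> np.ndarray:
    """Fixed-effects (within) transformation: subtract each unit's mean."""
    out = np.array(x, dtype=float)
    for u in np.unique(unit):
        m = unit == u
        out[m] -= out[m].mean(axis=0)
    return out


def fit_single_threshold(y: np.ndarray, ageing: np.ndarray, robots: np.ndarray,
                         unit: np.ndarray, trim: float = 0.15) -> Tuple[float, np.ndarray]:
    """Grid-search gamma minimizing the SSR of
    y_it = mu_i + b1*ageing_it*1(robots_it <= gamma) + b2*ageing_it*1(robots_it > gamma) + e_it.
    Returns (gamma_hat, [b1_hat, b2_hat])."""
    lo, hi = np.quantile(robots, [trim, 1 - trim])
    grid = np.unique(robots[(robots >= lo) & (robots <= hi)])  # trimmed candidate thresholds
    y_w = within(y, unit)
    best_ssr, best_gamma, best_beta = np.inf, None, None
    for g in grid:
        below = robots <= g
        X_w = within(np.column_stack([ageing * below, ageing * ~below]), unit)
        beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
        ssr = float(np.sum((y_w - X_w @ beta) ** 2))
        if ssr < best_ssr:
            best_ssr, best_gamma, best_beta = ssr, float(g), beta
    return best_gamma, best_beta
```

In this framing, "mitigation above the threshold" corresponds to `b2` being less negative than `b1`. Before interpreting `gamma_hat`, a test that the threshold effect is significant (Hansen uses a bootstrap for this) would still be needed.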
Organizational support and continuous learning are important to maximize the benefits of AI integration in startup environments.
Conclusions drawn from thematic analysis of interviews with 12 startup employees emphasizing need for organizational support and ongoing learning.
AI functions as a workforce augmentation tool that enhances human capabilities rather than replacing employees.
Reported perceptions from 12 startup employees in semi-structured interviews; thematic coding indicated view of AI as augmentation rather than replacement.
Most employees demonstrated progressive adjustment and competency improvement over time after initial adaptation.
Interview data from 12 startup employees with thematic analysis indicating progressive adjustment and competency gains over time.
AI improves employee performance by supporting more accurate decision-making and increasing work effectiveness and output quality.
Findings from semi-structured interviews of 12 startup employees, analyzed via thematic coding and frequency scoring, reporting improved decision accuracy and output quality with AI support.
AI integration contributes to competency development, particularly in digital literacy, analytical thinking, and adaptive learning.
Qualitative semi-structured interviews with 12 startup employees; thematic coding highlighted competencies (digital literacy, analytical thinking, adaptive learning).
AI significantly enhances employee productivity by accelerating task completion, reducing manual workload, and improving workflow efficiency.
Qualitative study using semi-structured interviews with 12 startup employees; data analyzed with thematic coding, frequency scoring, and visualized analysis.
Experiments highlight a reward structure that balances income, profit, efficiency, fairness, and customer retention, moving beyond income-only objectives.
Experimental design / reward engineering reported in paper; claim supported by experiments (no quantitative metrics or sample size given in excerpt).
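A weighted sum is one common way to fold several objectives into a single reward. The sketch below assumes that form; the component definitions, scales and weights are hypothetical, since the excerpt does not give the paper's reward formula.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    income: float      # revenue earned this step
    profit: float      # revenue minus cost
    efficiency: float  # e.g. share of inventory sold before spoilage, in [0, 1]
    fairness: float    # e.g. 1 - relative gap to a reference price, in [0, 1]
    retention: float   # e.g. share of returning customers, in [0, 1]


# Hypothetical weights (they sum to 1); not taken from the paper.
WEIGHTS = {"income": 0.2, "profit": 0.35, "efficiency": 0.15, "fairness": 0.15, "retention": 0.15}


def composite_reward(o: StepOutcome, income_scale: float, profit_scale: float) -> float:
    """Weighted sum of objectives, with the money terms scaled to roughly [0, 1]."""
    return (
        WEIGHTS["income"] * o.income / income_scale
        + WEIGHTS["profit"] * o.profit / profit_scale
        + WEIGHTS["efficiency"] * o.efficiency
        + WEIGHTS["fairness"] * o.fairness
        + WEIGHTS["retention"] * o.retention
    )


o1 = StepOutcome(income=1000, profit=300, efficiency=0.5, fairness=0.2, retention=0.3)
o2 = StepOutcome(income=800, profit=280, efficiency=0.8, fairness=0.9, retention=0.8)
print(composite_reward(o1, 1000, 400), composite_reward(o2, 1000, 400))
```

An income-only reward would prefer `o1`, while the composite reward prefers `o2` (about 0.78 versus 0.6125), which is the kind of trade-off a multi-objective reward is meant to capture.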
The trained models' strength is validated in controlled experiments by benchmarking against fixed and rule-based models and against cost-plus pricing.
Paper reports controlled experiments benchmarking ARL models against fixed/rule-based and cost-plus baselines; specific experimental design and sample sizes not provided in excerpt.
Inventory challenges are addressed by utilizing a curated dataset that has been enhanced through feature engineering, transformation, and systematic cleaning, providing reliable inputs for training.
Methodological claim about dataset curation and preprocessing used to train ARL agents; no dataset size or quantitative validation reported in excerpt.
Profitability in a dynamic marketplace is enhanced through an Adaptive Reinforcement Learning (ARL)-based pricing framework that utilizes Q-Learning and Deep Q-Networks (DQN) for real-time optimization in response to changing market conditions, competition, and inventory levels.
Paper proposes and experiments with an ARL-based pricing framework (methods include Q-Learning and DQN); validation claimed via benchmarking/controlled experimentation against baselines (details not provided in excerpt).
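Tabular Q-learning, one of the two methods the framework names, can be sketched on a toy single-product problem in which the state is (period, units left in stock) and the actions are a few discrete prices. The demand curve, price grid, unit cost and hyperparameters are hypothetical; a DQN variant would replace the table `q` with a neural network mapping the state to action values.

```python
import numpy as np

# Hypothetical environment; none of these numbers come from the paper.
PRICES = np.array([5.0, 8.0, 11.0, 14.0])  # discrete price actions
UNIT_COST = 4.0


def sample_demand(price: float, rng: np.random.Generator) -> int:
    """Linear expected demand (20 - 1.2 * price) with Poisson noise."""
    return int(rng.poisson(max(0.0, 20.0 - 1.2 * price)))


def train_q_pricing(episodes: int = 2000, horizon: int = 10, start_stock: int = 100,
                    alpha: float = 0.1, gamma: float = 0.95, eps: float = 0.1,
                    seed: int = 0) -> np.ndarray:
    """Tabular Q-learning over states (period, units in stock) and price actions."""
    rng = np.random.default_rng(seed)
    # q[t, stock, a]; row t == horizon is never updated and acts as a zero terminal value.
    q = np.zeros((horizon + 1, start_stock + 1, len(PRICES)))
    for _ in range(episodes):
        stock = start_stock
        for t in range(horizon):
            if rng.random() < eps:                      # explore
                a = int(rng.integers(len(PRICES)))
            else:                                       # exploit
                a = int(np.argmax(q[t, stock]))
            sold = min(stock, sample_demand(PRICES[a], rng))
            reward = sold * (PRICES[a] - UNIT_COST)     # per-period profit
            nxt = stock - sold
            target = reward + gamma * q[t + 1, nxt].max()
            q[t, stock, a] += alpha * (target - q[t, stock, a])
            stock = nxt
    return q


def greedy_price(q: np.ndarray, t: int, stock: int) -> float:
    """Price the learned table recommends in a given period and stock level."""
    return float(PRICES[int(np.argmax(q[t, stock]))])
```

Because the state includes remaining stock, `greedy_price` can recommend different prices for the same period at different inventory levels, which is the inventory-responsive behavior the claim describes.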
Dynamic pricing is crucial for maximizing revenue and maintaining competitiveness in markets with fluctuating demand, perishable goods, and diverse customer preferences.
Conceptual claim stated in paper's introduction/motivation; no empirical sample or experiment specified in the statement.
In the long term, big data promotes sustained improvements in individuals’ welfare.
Theoretical long-run growth analysis in the model showing that sustained data sharing leads to long-run welfare improvements (analytic/model-based, no empirical/sample data).
There exists an optimal level of data (big data) sharing that achieves the best balance between economic development and privacy, thereby maximizing individuals' welfare.
Analytical optimization within the theoretical macro model: model yields an interior optimum for data-sharing intensity that trades off economic gains and privacy costs (derivation/analytical result; no empirical test).
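The existence of an interior optimum can be illustrated with a stylized welfare function. The functional form below, a concave gain A ln(1+s) from data-sharing intensity s minus a quadratic privacy cost (c/2)s², is our assumption for illustration, not the paper's model.

```latex
\begin{aligned}
W(s) &= A\ln(1+s) - \tfrac{c}{2}\,s^{2}, \qquad s \ge 0,\ A, c > 0,\\
W'(s) &= \frac{A}{1+s} - c\,s = 0 \;\Longrightarrow\; c\,s^{2} + c\,s - A = 0,\\
s^{*} &= \frac{-1 + \sqrt{1 + 4A/c}}{2} > 0, \qquad W''(s) = -\frac{A}{(1+s)^{2}} - c < 0.
\end{aligned}
```

Because W is strictly concave, s* is the unique maximizer. For example, with A = 2 and c = 1 the optimum is s* = (-1 + 3)/2 = 1.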
Structured intent representations (PPS) can improve alignment and usability in human–AI interaction, especially in tasks where user intent is inherently ambiguous.
Synthesis of experimental findings (rendered PPS scored better on goal_alignment overall, with task-dependent gains concentrated in high-ambiguity business tasks) and the preliminary user survey.
A preliminary retrospective survey (N = 20) suggests a 66.1% reduction in follow-up prompts required, from 3.33 to 1.13 rounds, when using PPS.
Authors report a small retrospective survey of N = 20 respondents comparing number of follow-up prompt rounds required before vs after adopting PPS (self-reported).
We introduce goal_alignment, a user-intent-centered evaluation dimension, and find that natural-language-rendered PPS outperforms both simple prompts and raw PPS JSON on this metric.
Experimental comparison across the three prompt conditions using the goal_alignment evaluation dimension applied to the collected outputs (540 outputs across 60 tasks and 3 models), as judged by an LLM judge.
We propose a multi-agent discussion framework wherein specialized agents collaboratively process extensive product information, distributing cognitive load to alleviate single-agent attention bottlenecks and capturing critical decision factors through structured dialogue.
Method description: multi-agent discussion architecture described and implemented; claimed to distribute cognitive load and reduce single-agent attention bottlenecks (design + reported behavior).
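The division of labor described above can be sketched with stub agents in place of model calls. Everything here is hypothetical: the roles, aspect keys and product facts are illustrative, and a real agent would generate its turn with a language model conditioned on the shared transcript rather than restating facts.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Facts = Dict[str, List[str]]  # aspect -> product facts (hypothetical layout)


@dataclass
class SpecialistAgent:
    """Hypothetical specialist that only reads the product facts for its own aspects."""
    role: str
    aspects: FrozenSet[str]

    def read(self, facts: Facts) -> List[str]:
        # This agent's slice of the product information, i.e. its share of the load.
        return [f for a in sorted(self.aspects) for f in facts.get(a, [])]

    def speak(self, facts: Facts, transcript: List[str]) -> str:
        # Stub turn: an LLM-backed agent would condition on `transcript`;
        # this one ignores it and just restates its own decision factors.
        own = self.read(facts)
        return f"{self.role}: " + ("; ".join(own) if own else "nothing to add")


def discuss(agents: List[SpecialistAgent], facts: Facts, rounds: int = 1) -> List[str]:
    """Structured turn-taking: each round, every agent adds one turn to a shared transcript."""
    transcript: List[str] = []
    for _ in range(rounds):
        for agent in agents:
            transcript.append(agent.speak(facts, transcript))
    return transcript


facts = {
    "price": ["$499", "20% off this week"],
    "specs": ["16 GB RAM"],
    "reviews": ["4.6/5 average rating"],
}
agents = [
    SpecialistAgent("Pricing analyst", frozenset({"price"})),
    SpecialistAgent("Spec expert", frozenset({"specs"})),
    SpecialistAgent("Review reader", frozenset({"reviews"})),
]
for turn in discuss(agents, facts):
    print(turn)
```

Each agent reads only part of the product information, so no single agent has to attend to all of it; the shared transcript is where the decision factors from the different slices are brought together.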