Evidence (13870 claims)
Adoption: 8467 claims
Productivity: 7558 claims
Governance: 6805 claims
Human-AI Collaboration: 6363 claims
Org Design: 4132 claims
Innovation: 4065 claims
Labor Markets: 3526 claims
Skills & Training: 2945 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 196 | 98 | 892 | 1984 |
| Governance & Regulation | 817 | 394 | 188 | 121 | 1544 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 627 | 233 | 123 | 96 | 1088 |
| Research Productivity | 411 | 123 | 56 | 332 | 933 |
| Output Quality | 467 | 178 | 59 | 47 | 751 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 167 | 122 | 24 | 496 |
| Task Allocation | 207 | 64 | 71 | 32 | 379 |
| Skill Acquisition | 165 | 59 | 60 | 17 | 301 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 52 | 107 | 13 | 279 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 150 | 48 | 26 | 3 | 227 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 63 | 20 | 12 | 184 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 93 | 21 | 13 | 19 | 148 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Creative Output | 31 | 17 | 7 | 3 | 59 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
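
Because the categories differ widely in size, raw counts are easier to compare once converted to shares. The snippet below is a minimal sketch using four rows copied from the table; the `Row` tuple and the `shares` helper are illustrative names, not part of the source. Shares are computed over the four listed directions rather than the Total column, because in some rows the Total exceeds their sum (for example Other: 749 + 196 + 98 + 892 = 1935 against a listed total of 1984).

```python
from typing import NamedTuple


class Row(NamedTuple):
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int


# Four rows copied from the Evidence Matrix above.
ROWS = [
    Row("Firm Productivity", 435, 55, 88, 20),
    Row("Inequality Measures", 44, 122, 49, 6),
    Row("Job Displacement", 12, 80, 20, 1),
    Row("Skill Obsolescence", 5, 46, 6, 1),
]


def shares(row: Row) -> dict[str, float]:
    """Return each direction's share of the row's four directional counts."""
    counts = {
        "positive": row.positive,
        "negative": row.negative,
        "mixed": row.mixed,
        "null": row.null,
    }
    total = sum(counts.values())
    return {direction: n / total for direction, n in counts.items()}


for row in ROWS:
    s = shares(row)
    print(f"{row.outcome:<20} positive {s['positive']:.1%}  negative {s['negative']:.1%}")
```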
EnterpriseLab provides a modular environment exposing enterprise applications via a Model Context Protocol, enabling seamless integration of proprietary and open-source tools.
Feature/design claim in paper; supported by implementation details of the 'Model Context Protocol' and reported integration capabilities in the platform description.
We introduce EnterpriseLab, a full-stack platform that unifies tool integration, data generation, and training into a closed-loop framework.
System/design claim describing the contribution of the paper (platform implementation and architecture); supported by the paper's implementation description rather than independent validation.
AIGQ overcomes limitations of traditional HintQ methods (shallow semantics, poor cold-start performance, and low serendipity) that arise from reliance on ID-based matching and co-click heuristics.
Claimed comparative advantage in the abstract; implied support from the paper's offline and online experiments but no detailed quantitative comparisons provided in the abstract.
Extensive offline evaluations and large-scale online A/B experiments on Taobao demonstrate that AIGQ consistently delivers substantial improvements in key business metrics across platform effectiveness and user engagement.
Empirical claim supported by unspecified offline evaluations and large-scale online A/B testing on Taobao as stated in the abstract. The abstract does not report sample sizes, metric names, or numerical effect sizes.
A hybrid offline-online deployment architecture composed of AIGQ-Direct (nearline personalized user-to-query generation) and AIGQ-Think (reasoning-enhanced trigger-to-query mappings) enables meeting strict real-time and low-latency requirements while enriching interest diversity.
System/architecture description in the paper; the abstract states the two-component architecture and its intended operational benefits (real-time/low-latency and increased diversity). The paper references large-scale online deployment and experiments but no concrete latency numbers in the abstract.
IL-GRPO is enhanced by a model-based reward from the online click-through rate (CTR) ranking model.
Methodological detail in the paper: inclusion of a model-based reward signal derived from an online CTR ranking model to augment the policy optimization; described in abstract as part of IL-GRPO's design.
Interest-aware List Group Relative Policy Optimization (IL-GRPO) is a novel policy gradient algorithm with a dual-component reward mechanism that jointly optimizes individual query relevance and global list properties.
Algorithmic contribution described in the paper (policy gradient design and dual-component reward). The abstract states this design and that it is used in experiments; no numeric effect sizes provided in the abstract.
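The group-relative idea behind GRPO-style methods can be sketched in a few lines. This is not the paper's IL-GRPO implementation: the abstract gives no formulas, so the weighted sum of a per-query relevance score and a list-level score, the weight `alpha`, the scores themselves, and all function names below are assumptions made for illustration. What the sketch does show is the usual group-relative step, in which each sampled list's reward is standardized against the other lists sampled for the same user.

```python
from statistics import mean, pstdev


def combined_reward(query_scores: list[float], list_score: float, alpha: float = 0.5) -> float:
    """Hypothetical dual-component reward: mean per-query relevance blended with a list-level score."""
    return alpha * mean(query_scores) + (1.0 - alpha) * list_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one sampled group (GRPO-style advantage)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three candidate query lists sampled for the same user (illustrative scores only).
group = [
    combined_reward([0.9, 0.8, 0.7], list_score=0.2),
    combined_reward([0.6, 0.6, 0.6], list_score=0.9),
    combined_reward([0.3, 0.2, 0.4], list_score=0.4),
]
advantages = group_relative_advantages(group)
print(advantages)
```

Lists that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. No separate value network is needed for this step.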
Interest-Aware List Supervised Fine-Tuning (IL-SFT) is a list-level supervised learning approach that constructs training samples through session-aware behavior aggregation and interest-guided re-ranking to faithfully model nuanced user intent.
Methodological description in the paper: definition of IL-SFT and its training sample construction; supported implicitly by offline evaluations and downstream experiments referenced in the paper (no sample size or numeric results given in abstract).
AIGQ is the first end-to-end generative framework for the HintQ (pre-search query recommendation) scenario.
Explicit novelty/assertion in the paper's introduction/abstract claiming AIGQ as the first end-to-end generative framework for HintQ; no numerical experiment used to support the 'first' claim (methodological/positioning claim).
Organizations can design more effective recruitment strategies by signaling AI adoption to increase attractiveness to prospective applicants.
Practical implication drawn from the combined experimental findings (Study 1 N = 145; Study 2 N = 240; total N = 385) showing AI-adoption signals increase organizational attractiveness via perceived innovation ability, particularly for applicants with high AI self-efficacy.
Conceptualizing AI adoption as an organizational signal extends signaling theory to the context of technology-infused recruitment.
Theoretical argumentation in the paper, supported by the two experimental studies (Study 1 and Study 2) that test signaling mechanisms in recruitment contexts.
The positive indirect effect of AI-adoption signals on organizational attractiveness via perceived innovation ability is stronger for job seekers with high AI self-efficacy (Study 2 moderated mediation).
Study 2: moderated mediation model showing AI self-efficacy moderates the mediated relationship; sample size N = 240; participants were active job seekers.
Perceived innovation ability mediates the positive association between AI-adoption signals and organizational attractiveness (Study 2).
Study 2: moderated mediation analysis in an experiment recruiting active job seekers; sample size N = 240; mediation of AI-signal -> perceived innovation ability -> organizational attractiveness was validated.
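A moderated mediation of this kind is commonly written as a pair of regressions. The block below is a generic sketch, not the paper's specification: it assumes the moderator acts on the first stage (signal to perceived innovation ability), which the summary does not confirm. Here $X$ is the AI-adoption signal, $M$ perceived innovation ability, $W$ AI self-efficacy, and $Y$ organizational attractiveness; the coefficient names are illustrative.

```latex
\begin{align}
M &= a_0 + a_1 X + a_2 W + a_3 XW + \varepsilon_M \\
Y &= c_0 + c' X + b M + \varepsilon_Y \\
\text{indirect effect}(W) &= (a_1 + a_3 W)\, b
\end{align}
```

Under this assumed form, a positive $a_3$ (with positive $b$) makes the indirect effect grow with $W$, which is the pattern the claim describes for job seekers with high AI self-efficacy.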
AI-adoption signals are significantly positively associated with organizational attractiveness (Study 1).
Study 1: scenario-based experiment comparing AI-adoption signal vs no-signal conditions; sample size N = 145.
The paper reports details from a 100% deployment of DRL with policy regularizations on Alibaba's e-commerce platform, Tmall.
Direct statement in the abstract claiming full deployment across Tmall; implies a real-world, company-scale deployment but the abstract provides no operational metrics or counts.
Imposing policy regularizations improves the final performance of several DRL methods for inventory management.
Empirical claim supported by the paper's synthetic experiments and reported production deployment on Alibaba/Tmall (as stated in the abstract); no quantitative effect sizes provided in the abstract.
Imposing policy regularizations, grounded in classical inventory concepts such as 'Base Stock', can significantly accelerate hyperparameter tuning for DRL methods.
Paper reports synthetic experiments and a production deployment (Alibaba/Tmall) where policy regularizations were applied; abstract claims acceleration in hyperparameter tuning but does not report numeric tuning-time metrics in the abstract.
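For readers unfamiliar with the classical concept named here, a base-stock policy orders just enough to bring the inventory position back up to a fixed level. The sketch below shows that rule together with one possible way to turn it into a regularizer: a squared penalty on the gap between a learned order quantity and the base-stock order. The penalty form, the weight `lam`, the example numbers, and the function names are all assumptions; the abstract does not say how the paper defines its regularization.

```python
def base_stock_order(inventory_position: float, base_stock_level: float) -> float:
    """Classical base-stock rule: order up to the target level, never a negative amount."""
    return max(0.0, base_stock_level - inventory_position)


def regularized_loss(
    rl_loss: float,
    policy_order: float,
    inventory_position: float,
    base_stock_level: float,
    lam: float = 0.1,
) -> float:
    """Hypothetical regularizer: penalize deviation of the learned order from the base-stock order."""
    target = base_stock_order(inventory_position, base_stock_level)
    return rl_loss + lam * (policy_order - target) ** 2


# Illustrative numbers only.
print(base_stock_order(inventory_position=30.0, base_stock_level=100.0))
print(regularized_loss(rl_loss=1.0, policy_order=50.0,
                       inventory_position=30.0, base_stock_level=100.0))
```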
Deep Reinforcement Learning (DRL) provides a general-purpose methodology for training inventory policies that can leverage big data and compute.
Argument/assertion made in the paper's introduction/abstract (conceptual claim about DRL capabilities); no empirical sample or quantitative test reported in the abstract.
Human-replacing technologies have a strategic role in enhancing industrial productivity and ensuring the long-term resilience of Ukraine’s mining and metallurgical sector amid workforce shortages and structural labour-market changes due to war and demographic decline.
Integrated sectoral assessment in the paper combining current context (workforce shortages, structural changes), literature on technology-driven productivity/resilience, and industry-specific considerations; presented as a high-level conclusion.
Integrating ergonomic assessments and human–systems–interaction approaches into automation projects is important to prevent cognitive overload, occupational stress and operational risks for control-room operators.
Recommendation and emphasis in the paper, supported by references to ergonomics and human-factors literature; presented as a preventive/mitigative approach rather than a quantified empirical result for the sector.
Successful technological modernization requires continuous investment in human capital, reskilling and the development of digital and engineering competencies.
Policy/recommendation based on the paper's synthesis of the sector analysis and literature on skill requirements and technology adoption; not presented as an original empirical estimate in the summary.
Higher robot density is associated with productivity gains, particularly in low-robotized sectors such as Ukraine’s mining and metallurgical industry.
Empirical evidence cited from international and industry-specific studies reviewed in the paper (literature review/meta-analytic style evidence); no Ukraine-specific causal estimate with sample size reported in the summary.
Human-replacing technologies also have an indirect impact on productivity by increasing total factor productivity (TFP).
Analytical argumentation in the paper supported by references to empirical studies showing TFP effects of automation/digitalization; literature synthesis rather than a new econometric estimate presented for Ukraine.
Human-replacing technologies (mechanization, automation, robotization, digitalization and AI-augmentation) make a direct contribution to labour productivity growth in Ukraine's mining and metallurgical sector.
Sectoral analysis and synthesis in the paper drawing on empirical international and industry-specific studies; literature review of productivity impacts of mechanization/automation/robotization/digitalization/AI in industrial contexts.
Industrial intelligence and the digital economy can be leveraged as a 'dual engine' to boost regional TFCP and advance high-quality green and low-carbon economic development, supporting differentiated regional coordination policies.
Synthesis/implication drawn from the paper's empirical findings (SDM results on 30 provinces, 2010–2023) showing positive total/spillover effects and regional heterogeneity.
Green finance has a positive but statistically insignificant effect on regional TFCP.
Coefficient on green finance control variable in the Spatial Durbin Model (30 provinces, 2010–2023) is positive but not statistically significant.
The digital economy presents different regional driving patterns: a 'local-spillover dual drive' in the east, a 'local-dominated drive' in the central region, and a 'spillover-dominated drive' in the west.
Regional/subsample Spatial Durbin Model estimates for digital economy variables across east, central, and west subsamples (30 provinces, 2010–2023) with reported direct and indirect effects.
The digital economy exerts a significantly positive direct effect on local TFCP and a strong positive spatial spillover effect, forming a 'local driving + spatial radiation' promotion pattern.
Spatial Durbin Model estimates on panel data (30 provinces, 2010–2023) showing statistically significant positive direct and indirect (spillover) coefficients for digital economy variables.
Regional TFCP shows significant positive spatial autocorrelation.
Spatial analysis (Spatial Durbin Model and spatial statistics) applied to panel of 30 provincial-level regions; reported significant spatial autocorrelation (e.g., positive Moran's I implied).
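The direct and spillover effects cited in these claims come from the structure of the Spatial Durbin Model. The block below gives the generic textbook form, not the paper's exact specification: the spatial weight matrix $W$, the control variables, and the inclusion of province and year effects ($\mu$, $\lambda_t$) are assumptions, since the summary does not report them. Here $y_t$ is the vector of TFCP values for the $n = 30$ provinces in year $t$.

```latex
\begin{align}
y_t &= \rho W y_t + X_t \beta + W X_t \theta + \mu + \lambda_t \iota_n + \varepsilon_t \\
\frac{\partial y}{\partial x_k^{\top}} &= (I_n - \rho W)^{-1} (\beta_k I_n + \theta_k W) \\
I_{\text{Moran}} &= \frac{n}{\sum_i \sum_j w_{ij}} \cdot
  \frac{\sum_i \sum_j w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_i (y_i - \bar{y})^2}
\end{align}
```

In the second line, the direct effect of variable $k$ is the average of the diagonal entries of the matrix and the indirect (spillover) effect is the average of the off-diagonal row sums. A Moran's I above its expected value under no spatial dependence indicates positive spatial autocorrelation.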
Across 378 hardware-validated experiments, concise human-expert skills with structured expert knowledge enable near-perfect success rates across platforms.
Reported experimental results: 378 hardware-validated experiments across platforms comparing agent configurations; finding reported that human-expert skills produce near-perfect success rates (no numeric success rate provided in excerpt).
Large language models (LLMs) and agentic systems have shown promise for automated software development.
Statement in paper referencing prior successes of LLMs and agentic systems for automated software development (no empirical data reported in this excerpt).
Trained participants assigned tasks to the agent by defining strategies more often than participants who did not receive teamwork training.
Behavioral measure in experiment (frequency of assigning tasks using defined strategies) comparing trained vs. untrained participants in the KeyWe game with a scripted agent.
Participants who received the training delegated a higher percentage of tasks to the agent than participants who did not receive teamwork training.
Between-subjects comparison in KeyWe testbed with a scripted agent; measured percentage of tasks delegated by participants in trained vs. untrained groups.
A HAT training intervention that took less than 30 minutes was developed to train humans on seven teamwork competencies.
Study description: developed a training intervention under 30 minutes targeting seven teamwork competencies; implemented as part of the experiment.
The largest gains appear when AI is embedded in an orchestrated workflow rather than deployed as an isolated coding assistant.
Central thesis supported by comparisons across five delivery configurations (traditional baseline and V1–V4) in a retrospective longitudinal field study of the Chiron platform applied to three real software modernization programs; authors observe greater portfolio-level improvements when AI is integrated into coordinated workflows.
V3 and V4 add acceptance-criteria validation, repository-native review, and hybrid human-agent execution, simultaneously improving speed, coverage, and issue load.
Observed differences across the five delivery configurations (baseline, V1–V4) in the field study of three modernization programs; authors link feature additions in V3/V4 to measured improvements in stage durations, coverage, and validation issues.
First-release coverage rises from 77.0% to 90.5% across the portfolio as platform versions progress.
Observed first-release coverage measured in the retrospective longitudinal field study of three real modernization programs, reported as percentages across delivery configurations.
Validation-stage issue load falls from 8.03 to 2.09 issues per 100 tasks across the portfolio as platform versions progress.
Observed outcomes from the retrospective field study on three programs; validation-stage issues counted and normalized per 100 tasks across delivery configurations.
Modeled senior-equivalent effort falls from 1080.0 to 139.5 SEE-days under the platform configurations studied.
Modeled senior-equivalent effort computed from the study's staffing scenarios and observed outputs across the three real programs.
Modeled raw effort falls from 1080.0 to 232.5 person-days under the platform configurations studied (baseline -> V4 aggregate).
Modeled outcomes computed from observed task volumes and explicit staffing scenarios in the retrospective longitudinal field study covering three real programs.
Portfolio totals move from 36.0 to 9.3 summed project-weeks under baseline staffing assumptions (across the three studied programs and five delivery configurations).
Retrospective longitudinal field study of the Chiron platform applied to three real software modernization programs (COBOL banking migration ~30k LOC, accounting modernization ~400k LOC, .NET/Angular mortgage modernization ~30k LOC); observed and modeled outcomes were aggregated to produce portfolio totals under explicit staffing scenarios.
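The before/after figures in the claims above are easier to compare as relative changes. The snippet below is a small sketch that derives them from the reported numbers only; the helper name `relative_reduction` and the `REPORTED` list are illustrative, and no figure beyond those quoted above is used.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# (metric, first reported value, last reported value), as quoted in the claims above.
REPORTED = [
    ("validation issues per 100 tasks", 8.03, 2.09),
    ("modeled senior-equivalent effort (SEE-days)", 1080.0, 139.5),
    ("modeled raw effort (person-days)", 1080.0, 232.5),
    ("summed project-weeks", 36.0, 9.3),
]

for name, before, after in REPORTED:
    print(f"{name}: {relative_reduction(before, after):.1%} lower")

# Coverage is already a percentage, so its change is stated in percentage points.
coverage_gain_pp = 90.5 - 77.0
print(f"first-release coverage: +{coverage_gain_pp:.1f} percentage points")
```

The effort and project-week figures are modeled under the study's staffing scenarios, so the derived percentages inherit those assumptions.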
There is untapped potential for optimizing how artificial intelligence interacts with the labor market, and AI needs to be adapted to the specifics of national economic models.
Conclusions drawn from the envelope-model results showing heterogeneity across countries and implied gaps/opportunities for policy and adaptation; the paper emphasizes policy implications and the need for AI adaptation to national economic specifics.
Certain countries can optimally transform AI diffusion into positive domestic labor-market outcomes (economic development and realization of human capital potential): the Netherlands, France, Portugal, Italy, and Malta.
Comparative envelope-model analysis across the sample of European Union countries produced a ranking or identification of countries judged able to optimally transform AI diffusion into labor-market and human-capital results; these five countries are named in the paper.
Introducing an 'AI Engineer' occupational category could catalyze population cohesion around the already-formed vocabulary, completing the co-attractor.
Speculative policy suggestion based on the co-attractor framework and empirical observation that vocabulary exists but population cohesion is absent.
Applied to 8.2 million US resumes (2022–2026), the method correctly identifies established occupations.
Empirical application of the method to a dataset of 8.2 million US resumes spanning 2022–2026; claim that results match known/established occupations (implies validation against existing taxonomy or known labels).
The co-attractor concept enables a zero-assumption method for detecting occupational emergence from resume data, requiring no predefined taxonomy or job titles: we test vocabulary cohesion and population cohesion independently, with ablation to test whether the vocabulary is the mechanism binding the population.
Methodological claim describing the approach applied to resume data: independent tests of vocabulary cohesion and population cohesion, plus ablation experiments. Supported by the method's implementation on the resume dataset.
A genuine occupation is a self-reinforcing structure (a bipartite co-attractor) in which a shared professional vocabulary makes practitioners cohesive as a group, and the cohesive group sustains the vocabulary.
Theoretical/conceptual proposal introduced by the authors as the defining mechanism for occupational emergence; motivates the detection method.
Occupations form and evolve faster than classification systems can track.
Argument supported by the paper's analysis approach and motivating observation; asserted as motivation for developing a detection method. No specific numerical test reported in the excerpt beyond the large resume dataset.
The effect is amplified in Japanese, where experiential queries draw 62.1% non-OTA citations compared to 50.0% in English.
Subset analysis by language within the audited sample comparing non-OTA citation shares for experiential queries in Japanese vs English; percentages reported in paper.
Experiential queries draw 55.9% of their citations from non-OTA sources, compared to 30.8% for transactional queries — a 25.1 percentage-point gap (p < 5 × 10^{-20}).
Quantitative comparison of citation-source types in the audited sample (1,357 citations across 156 queries), classifying queries as 'experiential' vs 'transactional' and computing share of citations from non-OTA sources; reported p-value indicates statistical test of difference.