Evidence (4114 claims)
Adoption
8570 claims
Productivity
7631 claims
Governance
6869 claims
Human-AI Collaboration
6491 claims
Org Design
4175 claims
Innovation
4114 claims
Labor Markets
3566 claims
Skills & Training
2966 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
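As a reading aid, the counts in a row can be converted to shares by direction of finding. The sketch below is illustrative only: the function name `direction_shares` is hypothetical. Shares are computed over the four direction columns rather than the Total column, because in several rows the Total exceeds their sum (for example Other: 758 + 199 + 100 + 900 = 1957 against a Total of 2007) and the page does not say why.

```python
def direction_shares(positive, negative, mixed, null):
    """Return each direction's share of the classified claims in one row.

    Assumption: the denominator is the sum of the four direction columns,
    not the table's Total column, which is sometimes larger.
    """
    counts = {
        "positive": positive,
        "negative": negative,
        "mixed": mixed,
        "null": null,
    }
    classified = sum(counts.values())
    if classified == 0:
        return {name: 0.0 for name in counts}
    return {name: value / classified for name, value in counts.items()}


# Rows copied from the matrix above.
firm_productivity = direction_shares(435, 57, 88, 20)
job_displacement = direction_shares(12, 80, 20, 1)

print(round(firm_productivity["positive"], 3))  # 0.725
print(round(job_displacement["negative"], 3))   # 0.708
```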
Innovation
In the retail stage, LLMs set retail prices and generate marketing slogans, which are presented to buyers for purchase through a role-based attention mechanism.
Methodological description of the retail-stage tasks and the role-based attention mechanism used to present offers to buyers.
In the procurement stage, LLMs bid for limited inventory in budget-constrained auctions.
Design specification of the benchmark describing procurement-stage mechanics (auction/bidding mechanism, budget constraints).
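The claim above does not specify the auction format, so the following is only a minimal sketch of one way a budget-constrained auction over limited inventory could work. The sealed-bid, pay-your-bid rule, the `Bid` fields, and the `allocate` function are all assumptions made for illustration and are not taken from Market-Bench.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    agent: str         # hypothetical retailer identifier
    unit_price: float  # price offered per unit
    quantity: int      # units requested
    budget: float      # maximum total spend for this agent


def allocate(bids, inventory):
    """Serve bids from highest unit price down until inventory runs out.

    Assumed rule: each winner pays its own bid price, and no agent is
    allocated more units than its budget can cover.
    """
    allocation = {}
    remaining = inventory
    for bid in sorted(bids, key=lambda b: b.unit_price, reverse=True):
        if remaining <= 0 or bid.unit_price <= 0:
            break
        affordable = int(bid.budget // bid.unit_price)
        units = min(bid.quantity, affordable, remaining)
        if units > 0:
            allocation[bid.agent] = (units, units * bid.unit_price)
            remaining -= units
    return allocation, remaining


bids = [
    Bid("retailer_a", unit_price=5.0, quantity=10, budget=30.0),
    Bid("retailer_b", unit_price=4.0, quantity=10, budget=100.0),
]
result, left_over = allocate(bids, inventory=12)
print(result)     # {'retailer_a': (6, 30.0), 'retailer_b': (6, 24.0)}
print(left_over)  # 0
```

In this example retailer_a bids the higher price but its budget caps it at 6 units, so the remaining 6 units go to retailer_b.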
We construct a configurable multi-agent supply chain economic model where LLMs act as retailer agents responsible for procuring and retailing merchandise.
Methodological description in the paper detailing the simulated multi-agent supply chain environment and the role of LLMs as retailer agents.
We introduce Market-Bench, a comprehensive benchmark that evaluates the capabilities of LLMs in economically-relevant tasks through economic and trade competition.
Paper describes the design and release of Market-Bench as a benchmark/testbed (methodological contribution).
Analyses use fixed-effects regression and structural equation modeling (SEM) on panel data from OECD countries.
Methods statement in the paper indicating use of fixed-effects and SEM applied to OECD-country panel data.
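For readers unfamiliar with fixed-effects estimation, the sketch below shows the within (demeaning) transformation for a single regressor. It is a generic textbook illustration, not the paper's specification: the function name `within_estimator` is hypothetical, the panel is synthetic, and the SEM step is not covered.

```python
from collections import defaultdict


def within_estimator(rows):
    """Fixed-effects slope for one regressor.

    rows: iterable of (entity, x, y). Each entity's own mean is subtracted
    from x and y, which removes time-invariant entity effects, and the
    slope is then estimated by least squares on the demeaned data.
    """
    by_entity = defaultdict(list)
    for entity, x, y in rows:
        by_entity[entity].append((x, y))

    sum_xy = 0.0
    sum_xx = 0.0
    for observations in by_entity.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sum_xy += (x - mean_x) * (y - mean_y)
            sum_xx += (x - mean_x) ** 2

    if sum_xx == 0.0:
        raise ValueError("x has no within-entity variation")
    return sum_xy / sum_xx


# Synthetic panel (not real data): y = entity effect + 2 * x,
# with entity effects of 10 for "A" and 50 for "B".
panel = [
    ("A", 1.0, 12.0), ("A", 2.0, 14.0), ("A", 3.0, 16.0),
    ("B", 1.0, 52.0), ("B", 2.0, 54.0), ("B", 4.0, 58.0),
]
beta = within_estimator(panel)
print(round(beta, 6))  # 2.0
```

The large gap between the two entity effects does not affect the estimate, which is the point of the transformation.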
This paper provides the first cross-country empirical validation of AI-augmented scientific evaluation systems.
Authors' stated novelty claim that prior work lacked cross-country empirical quantification and that their OECD panel study is the first such validation.
A one standard deviation increase in AIRC is associated with an 18–25% increase in scientific productivity.
Reported point estimate/range from regression/SEM results linking a 1 SD change in the constructed AIRC to productivity outcomes in the OECD panel.
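The excerpt does not state how the 18–25% range was derived. If one assumes a log-linear specification, with log productivity regressed on standardized AIRC, the range corresponds to a coefficient band that can be computed as below. Both the specification and the helper names are assumptions made for illustration.

```python
import math


def pct_change_from_log_coef(beta):
    """Proportional change in the outcome for a one-unit rise in the regressor,
    assuming the outcome enters the regression in logs."""
    return math.exp(beta) - 1.0


def log_coef_from_pct_change(pct):
    """Inverse mapping: the log-point coefficient implied by a proportional change."""
    return math.log(1.0 + pct)


low = log_coef_from_pct_change(0.18)
high = log_coef_from_pct_change(0.25)
print(round(low, 3), round(high, 3))  # 0.166 0.223
```

Under this assumption, an 18–25% increase per standard deviation would correspond to a coefficient of roughly 0.17 to 0.22 log points.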
AI-assisted evaluation significantly enhances scientific productivity.
Fixed-effects regression and structural equation modeling (SEM) applied to panel data from OECD countries; reported association between AIRC and research output.
We construct a novel AI Review Capability Index (AIRC).
Paper reports creation of a new composite index (AIRC) to measure national-level AI capability in peer review; constructed and applied to panel data from OECD countries.
Ablation experiments and scalability analysis verify the effectiveness of each core module of HGA-MADDPG.
Ablation study and scalability analysis reported in the paper; experiments removing or altering core modules and reporting comparative performance.
HGA-MADDPG maintains a cost reduction rate of 21.5% in a 120-node ultra-large-scale supply chain.
Scalability experiments reported in the paper on a 120-node simulated supply chain; reported cost reduction rate of 21.5% for HGA-MADDPG.
In the same extreme scenario of triple perturbation, HGA-MADDPG achieves a recovery time of 58 hours, outperforming existing methods.
Simulation experiments under triple perturbation reported in the paper; reported recovery time of 58 hours and stated superior performance relative to baselines.
In an extreme scenario of triple perturbation, HGA-MADDPG achieves a cost deviation rate of 29.6%, which is significantly better than existing methods.
Simulation experiments under an extreme scenario (triple perturbation) reported in the paper; comparison with existing methods and reported cost deviation rate of 29.6%.
In the same baseline scenario, HGA-MADDPG controls the stockout rate at 3.2%.
Simulation experiments reported in the paper (baseline four-level supply chain using real data), reporting a stockout rate of 3.2% for HGA-MADDPG compared to baselines.
In the same baseline scenario, HGA-MADDPG achieves a service level improvement rate of 42.8% compared with eight baseline algorithms.
Simulation experiments reported in the paper (baseline four-level supply chain using SCDL and WSN data), compared to eight baselines; reported 42.8% service level improvement.
In a baseline scenario (four-level supply chain, dynamic environment driven by real data from SCDL and WSN) and compared with eight baseline algorithms, HGA-MADDPG achieves a total cost reduction rate of 26.2%.
Simulation experiments reported in the paper: four-level supply chain baseline scenario driven by real data (SCDL and WSN), compared to eight baseline algorithms; reported aggregate result of 26.2% total cost reduction.
The paper constructs an adversarial disturbance and resilient training architecture comprising models of three disturbance types (demand mutation, node failure, transportation delay), adversarial agent injection, a dynamic environment replay buffer, and a two-stage training strategy.
Methodological description and implementation details of the training architecture and disturbance models in the paper.
An adaptive fusion weight based on marginal returns is designed to dynamically balance local and global credit.
Methodological description (design and incorporation of adaptive fusion weight in algorithm).
The algorithm quantifies the contribution of individual actions to sub-chain objectives and system-level indicators through local and global credit networks.
Methodological description and algorithm design (local and global credit networks described in the paper).
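The excerpts above do not give the formula for the adaptive fusion weight. The sketch below shows one plausible form, in which the weight on local credit is the local share of the two marginal returns. The functions `fusion_weight` and `fused_credit`, and the share-based rule itself, are assumptions for illustration and not the HGA-MADDPG definition.

```python
def fusion_weight(marginal_local, marginal_global):
    """Weight on local credit, assumed proportional to its marginal return.

    Negative marginal returns are clipped to zero. If neither signal has a
    positive marginal return, the two are weighted equally.
    """
    local = max(marginal_local, 0.0)
    global_ = max(marginal_global, 0.0)
    total = local + global_
    if total == 0.0:
        return 0.5
    return local / total


def fused_credit(local_credit, global_credit, marginal_local, marginal_global):
    """Convex combination of local and global credit using the fusion weight."""
    weight = fusion_weight(marginal_local, marginal_global)
    return weight * local_credit + (1.0 - weight) * global_credit


# Hypothetical values: local credit currently yields three times the marginal return.
print(fusion_weight(3.0, 1.0))           # 0.75
print(fused_credit(1.0, 0.0, 3.0, 1.0))  # 0.75
```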
HGA-MADDPG introduces a hierarchical graph attention mechanism to dynamically represent the state of the supply chain network topology.
Methodological description and algorithm design presented in the paper (development and implementation of the hierarchical graph attention mechanism).
The proposal outlines a phased implementation roadmap from a voluntary pilot to mandatory certification within five years.
Proposal states a phased implementation timeline moving from voluntary pilot projects to mandatory certification within a five-year period; presented as a planned roadmap rather than a demonstrated outcome.
The governance structure for IASCA will be treaty-based and include anti-capture provisions.
Proposal explicitly proposes a treaty-based governance structure and states inclusion of anti-capture provisions; this is a design/policy prescription in the document rather than evidence-based finding.
IASCA employs a zero-knowledge testing architecture that evaluates model safety through behavioural probing without accessing proprietary weights, training data, or architecture.
Proposal describes a technical design: zero-knowledge testing via behavioural probes that does not require access to model weights, training data, or architecture; presented as a design feature without empirical validation or test results in the excerpt.
The International AI Safety Certification Authority (IASCA) is an independent, internationally governed body for mandatory pre-deployment safety certification of frontier AI models.
Explicit statement in the proposal describing IASCA as an independent, internationally governed authority and its role in mandatory pre-deployment certification; conceptual design, no empirical testing or implementation reported.
The taxonomy, feasibility classification, and mechanism-to-scenario mapping provide a technical foundation for policymakers and identify the R&D investments required before hardware-level governance can support verifiable international agreements.
Authors' synthesis and policy-focused conclusions based on the taxonomy, feasibility ratings, mapping, and threat analyses presented in the paper (conceptual/prescriptive).
We present an adversary-tiered threat analysis distinguishing commercial, non-state, and nation-state actors, arguing the appropriate security standard is tamper-evident assurance analogous to IAEA verification rather than absolute tamper-proofing.
Authors' adversary-model classification and normative argument recommending tamper-evident assurance (comparative reasoning with IAEA-style verification). Qualitative policy recommendation; no empirical experiment.
We map the taxonomy onto four governance scenarios: domestic regulation, bilateral agreements, multilateral treaty verification, and industry self-regulation.
Authors' scenario mapping exercise described in the paper (conceptual mapping of mechanisms to four named governance scenarios).
For each mechanism, we provide a technical description, a feasibility rating, and an identification of adversarial vulnerabilities.
Paper's stated content and structure: per-mechanism entries including technical descriptions, feasibility ratings, and adversarial vulnerability discussion (qualitative documentation).
This paper proposes a taxonomy of 20 hardware-level governance mechanisms, organised by function (monitoring, verification, enforcement) and assessed for technical feasibility on a four-point scale from currently deployable to speculative.
Authors' methodological contribution: a constructed taxonomy enumerating 20 mechanisms and an assigned four-point feasibility rating (documentation in the paper). No external sample size; based on authors' engineering analysis.
Multimodal GeoAI studies fuse multiple geospatial data modalities to tackle urban mobility tasks including accessibility mapping, demand forecasting, and origin–destination flow prediction.
Categorization of tasks addressed by the included multimodal GeoAI studies (synthesis from the surveyed papers, n=18).
To address these challenges, the paper proposes a structured research roadmap including equity-aware loss functions, adaptive multimodal fusion pipelines, participatory and human-in-the-loop workflows, and urban data trusts.
Authors' proposed agenda and recommendations presented in the discussion/conclusion of the paper (proposal, not empirically evaluated).
The paper examines emerging techniques such as knowledge graphs, federated learning, and explainable AI that support equity-relevant insights across diverse urban contexts.
Discussion and synthesis of methodological developments in the surveyed literature (reported within the review).
The review highlights the growing use of deep learning architectures in multimodal GeoAI for urban mobility.
Observed trend reported by the authors based on the systematic review of included studies (n=18).
Integrating artificial intelligence with geographic information science and fusing heterogeneous geospatial data sources (satellite imagery, GPS trajectories, transit records, volunteered geographic information, social sensing) provides powerful tools to diagnose and address mobility disparities.
Theoretical/methodological claim supported by examples and synthesis from the surveyed literature (the paper reviews multimodal GeoAI studies that fuse such data sources).
The risk of evolution selecting for deception could be mitigated if reproduction is based on purely objective criteria, rather than human judgment.
Prescriptive implication derived from the model analysis: argument that replacing human-judged fitness with objective criteria would reduce selection for deception (theoretical reasoning, not empirical test).
Assuming bounded fitness and a fixed probability that any AI reproduces a 'locked' copy of itself, fitness concentrates on the maximum reachable value.
Formal theorem/proof within the mathematical model under the stated assumptions (bounded fitness and fixed probability of locked self-reproduction).
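The concentration result can be illustrated with a toy deterministic simulation. The setup below is an assumption made for this sketch and is not the paper's model: there is a finite set of reachable fitness levels, reproduction is proportional to fitness, a fixed fraction `p_lock` of an unlocked AI's offspring are locked copies that thereafter only copy themselves, and the remaining offspring are new designs spread uniformly over the fitness levels.

```python
def simulate(fitness, p_lock, generations):
    """Track population shares of unlocked and locked AIs by fitness level.

    Returns (mean fitness, share of the population locked at the highest level).
    Assumes `fitness` is sorted in ascending order.
    """
    n = len(fitness)
    unlocked = [1.0 / n] * n  # start with unlocked designs spread evenly
    locked = [0.0] * n
    for _ in range(generations):
        new_unlocked = [0.0] * n
        new_locked = [0.0] * n
        for i, f in enumerate(fitness):
            # Locked lineages reproduce exact copies of themselves.
            new_locked[i] += locked[i] * f
            # Unlocked parents: a fixed fraction of offspring are locked copies.
            offspring = unlocked[i] * f
            new_locked[i] += p_lock * offspring
            # The rest are new unlocked designs, spread uniformly.
            for j in range(n):
                new_unlocked[j] += (1.0 - p_lock) * offspring / n
        total = sum(new_unlocked) + sum(new_locked)
        unlocked = [x / total for x in new_unlocked]
        locked = [x / total for x in new_locked]
    mean_fitness = sum((u + l) * f for u, l, f in zip(unlocked, locked, fitness))
    return mean_fitness, locked[-1]


levels = [0.5, 1.0, 1.5, 2.0]  # hypothetical bounded fitness values
mean_fitness, top_locked_share = simulate(levels, p_lock=0.1, generations=200)
print(round(mean_fitness, 4), round(top_locked_share, 4))
```

In this toy setup the lineage locked at the highest fitness level grows fastest, so its population share approaches one and mean fitness approaches the maximum level.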
As artificial intelligence systems (AIs) are increasingly produced by recursive self-improvement, a form of evolution may emerge in which the traits of AI systems are shaped by the success of earlier AIs in designing and propagating their descendants.
Conceptual argument and motivation in the paper; development of a mathematical model of self-designing AIs to formalize this idea (theoretical, no empirical data or sample).
Compared to relationship-based debt, stable equity significantly promotes high-quality development in the high-end equipment manufacturing and new energy industries.
Comparative subgroup regression analysis on the same dataset (743 listed enterprises, 2014–2023) indicating that the coefficient for stable equity is significantly larger than that for relationship-based debt in the high-end equipment manufacturing and new energy industry subsamples.
The effects of two distinct forms of patient capital—stable equity and relationship-based debt—are more pronounced in promoting high-quality development in the new energy vehicle industry, energy conservation and environmental protection industry, biotechnology industry, new materials industry, and next-generation information technology industry.
Industry heterogeneity / subgroup analyses on the 2014–2023 panel of 743 listed firms showing stronger estimated effects of both stable equity and relationship-based debt on firm high-quality development within these specified industries.
The impact of patient capital on the high-quality development of enterprises exhibits regional heterogeneity: the high-quality development of enterprises in the central region is more sensitive to patient capital.
Subsample/regional heterogeneity analysis on the panel of 743 listed enterprises (2014–2023) comparing region-specific coefficients and finding a stronger effect in the central region.
The application of artificial intelligence enhances the positive impact of patient capital on the high-quality development of enterprises in strategic emerging industries.
Moderation analysis using the same firm panel (743 listed enterprises, 2014–2023) that includes an interaction term between patient capital and measures of AI application, with the interaction reported as positive and statistically significant.
Patient capital promotes the high-quality development of these enterprises by easing financing constraints.
Mediation analysis on panel data of 743 listed firms (2014–2023) reporting that financing-constraint indicators mediate the impact of patient capital on firm high-quality development.
Patient capital promotes the high-quality development of these enterprises by alleviating information asymmetry.
Mediation tests using firm-level panel data (743 listed enterprises, 2014–2023) that include measures of information asymmetry and show a mediating effect in the patient capital → high-quality development pathway.
Patient capital promotes the high-quality development of these enterprises by enhancing the synergy between digital and green transformation (digital-green transformation synergy).
Mediation analysis on the same panel (743 listed enterprises, 2014–2023) showing that measures of digital-green transformation synergy mediate the relationship between patient capital and firm high-quality development.
Patient capital plays a significant role in promoting the high-quality development of enterprises in strategic emerging industries.
Empirical analysis using panel data from 743 listed enterprises in China’s strategic emerging industries over 2014–2023; regression analysis reporting a statistically significant positive coefficient for patient capital on a firm-level measure of high-quality development.
Carbon emissions initially increase with the expansion of robotics manufacturing.
Panel regressions on the 277 Chinese prefecture-level cities (2008–2019) showing the left-hand (rising) portion of the inverted U-shaped relationship.
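An inverted U-shaped relationship is commonly modeled with a quadratic term, in which case the rising portion is everything to the left of the turning point. The sketch below assumes that form. The coefficients are hypothetical placeholders, not estimates from the study, and `turning_point` is an illustrative helper.

```python
def turning_point(beta_linear, beta_squared):
    """Peak of y = b0 + beta_linear * x + beta_squared * x**2.

    An inverted U requires a negative coefficient on the squared term.
    """
    if beta_squared >= 0:
        raise ValueError("an inverted U needs a negative squared-term coefficient")
    return -beta_linear / (2.0 * beta_squared)


def marginal_effect(x, beta_linear, beta_squared):
    """Slope of the quadratic at x: positive on the rising side of the curve."""
    return beta_linear + 2.0 * beta_squared * x


# Hypothetical coefficients for illustration only.
b1, b2 = 0.8, -0.1
peak = turning_point(b1, b2)
print(peak)                            # approximately 4.0
print(marginal_effect(1.0, b1, b2) > 0)  # True: emissions rising left of the peak
print(marginal_effect(6.0, b1, b2) < 0)  # True: emissions falling right of the peak
```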
Azar et al. (2023) show that monopsonistic employers have stronger incentives to automate, and US commuting zones with higher labor market concentration experienced more robot adoption.
Citation to Azar et al. (2023) empirical evidence reported in the paper.
Noy and Zhang (2023) and Brynjolfsson et al. (2025) provide emerging empirical evidence that AI can function as a labor-complementary technology when designed to do so.
Cited empirical studies referenced in the paper arguing that certain AI applications complement human labor.
Eloundou et al. (2024) predict that half of US jobs are significantly exposed to recent advances in generative AI.
Citation to Eloundou et al. (2024) empirical study reported in the paper's introduction.
Firms may not sufficiently account for non-monetary aspects (safety, meaning of work) when choosing technologies; a planner would include these non-monetary considerations in steering technological progress.
Theoretical argument and model extension in Section 6 on monetary vs non-monetary aspects of technology choices.