Evidence (13870 claims)
Adoption: 8467 claims
Productivity: 7558 claims
Governance: 6805 claims
Human-AI Collaboration: 6363 claims
Org Design: 4132 claims
Innovation: 4065 claims
Labor Markets: 3526 claims
Skills & Training: 2945 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 196 | 98 | 892 | 1984 |
| Governance & Regulation | 817 | 394 | 188 | 121 | 1544 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 627 | 233 | 123 | 96 | 1088 |
| Research Productivity | 411 | 123 | 56 | 332 | 933 |
| Output Quality | 467 | 178 | 59 | 47 | 751 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 167 | 122 | 24 | 496 |
| Task Allocation | 207 | 64 | 71 | 32 | 379 |
| Skill Acquisition | 165 | 59 | 60 | 17 | 301 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 52 | 107 | 13 | 279 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 150 | 48 | 26 | 3 | 227 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 63 | 20 | 12 | 184 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 93 | 21 | 13 | 19 | 148 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Creative Output | 31 | 17 | 7 | 3 | 59 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
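The direction columns are easier to compare as shares. The sketch below is a minimal illustration using three rows copied from the table above. The function names are hypothetical. One caveat is visible in the table itself: the four direction counts do not always add up to the Total column (for Governance & Regulation, 817 + 394 + 188 + 121 = 1520 against a Total of 1544), so the shares here are computed over the four listed counts rather than over Total. That choice is an assumption, not something the page specifies.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Inequality Measures": (44, 122, 49, 6, 221),
}


def direction_shares(positive, negative, mixed, null):
    """Share of each direction among the four listed direction counts.

    Assumption: shares are taken over the listed counts, not the Total column.
    """
    listed = positive + negative + mixed + null
    return {
        "positive": positive / listed,
        "negative": negative / listed,
        "mixed": mixed / listed,
        "null": null / listed,
    }


def unlisted(positive, negative, mixed, null, total):
    """Claims counted in Total but not in any of the four direction columns."""
    return total - (positive + negative + mixed + null)


for name, (pos, neg, mix, nul, tot) in ROWS.items():
    shares = direction_shares(pos, neg, mix, nul)
    gap = unlisted(pos, neg, mix, nul, tot)
    print(
        f"{name}: positive {shares['positive']:.1%}, "
        f"negative {shares['negative']:.1%}, gap to Total {gap}"
    )
```

Rows that contain a dash in a direction cell are left out of this sketch, since the page does not say whether a dash means zero or missing.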
An anti-leakage learning loop converts scored failures into task-agnostic edits to skills, tools, and knowledge, letting the system improve with experience without touching model weights.
Paper describes a proposed/implemented learning loop (anti-leakage) that translates scored agent failures into edits to non-weight system components (skills, tools, knowledge) and claims this enables improvement without model weight updates.
We introduce Parthenon, a self-evolving legal-agent framework that factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure.
Paper describes the design and implementation of the Parthenon framework and its modular decomposition into Model, Harness, Agent roles, Knowledge, Tools, and Skills, claiming these enable auditable traces and grounding.
Per-criterion accuracy climbs with stronger models.
Empirical comparison across model strengths reported in the Harvey LAB study (12,510 trajectories) showing per-criterion accuracy trends correlated with model strength.
The findings imply that research evaluation and science policy should adopt assessment frameworks that distinguish between recombinant and conceptual forms of creativity and recognize that different modes of AI adoption produce different types of scientific contribution.
Policy/recommendation statement grounded in the paper's empirical findings on heterogeneous creativity effects by AI research mode.
Adaptation-oriented AI research (modifying AI models for domain-specific problems) is associated with relatively higher object-based creativity.
Subgroup/heterogeneity analysis in the OpenAlex dataset classifying AI publications by research mode (Adaptation-oriented) and comparing object novelty outcomes across modes.
Tool-oriented AI research (applying existing AI models to domain tasks) is associated with the largest gains in recombinant-based creativity.
Subgroup/heterogeneity analysis in the OpenAlex dataset classifying AI publications by research mode (Tool-oriented) and comparing recombinant novelty outcomes across modes.
AI publications are 5.5 to 10.2 percentage points more likely to rank in the top creativity decile.
Reported quantitative effect from the paper comparing top-decile creativity probabilities between AI and non-AI publications in the OpenAlex sample.
AI publications are significantly more likely to achieve top-decile creativity relative to non-AI publications.
Observational statistical analysis comparing AI-labeled vs non-AI publications across novelty and impact measures using the >1M OpenAlex dataset (novelty measured as recombinant and object novelty; impact measured as 3-year and 10-year citation impact).
GIPs enhance urban industrial chain resilience by promoting industrial structure optimization.
Mechanism analysis in the study showing industrial structure optimization as a channel linking GIP implementation to improved UICR; based on the 281‑city panel and specified empirical tests.
GIPs enhance urban industrial chain resilience mainly by fostering green technological innovation.
Mechanism analysis reported in the paper identifying green technological innovation as a primary mediator through which GIPs improve UICR; based on empirical mediation/analysis within the panel and DML framework.
The resilience‑enhancing effect of GIPs is more pronounced in resource‑based cities.
Heterogeneity analysis reported in the study indicating larger GIP effects on UICR in cities classified as resource‑based; derived from the 281‑city panel analysis.
The resilience‑enhancing effect of GIPs is more pronounced in eastern cities.
Regional heterogeneity analysis reported in the paper showing stronger estimated impacts in eastern region cities within the 281‑city panel.
The resilience‑enhancing effect of GIPs is stronger in cities with stronger AI computing power.
Heterogeneity analysis in the study indicating larger GIP effects on UICR in cities with higher AI computing power measures; based on the same panel dataset and statistical methods.
The resilience‑enhancing effect of GIPs is stronger in cities with more advanced digital economies.
Heterogeneity analysis reported in the paper showing larger estimated impacts of GIPs on UICR in cities with more developed digital economy indicators; based on the 281‑city panel.
The resilience‑enhancing effect of GIPs is stronger in cities with higher openness.
Heterogeneity analysis reported in the study indicating larger estimated effects in subsamples or interaction models for cities with greater openness; based on the 281‑city panel (2005–2022).
The positive effect of GIPs on UICR is robust across alternative sample specifications, estimation algorithms, variable definitions, and controls for parallel policies.
Reported robustness checks in the study (alternative samples, estimation algorithms, variable definitions, and adjustments for parallel policies); based on same panel of 281 cities and DML framework.
The implementation of national Green Industrial Parks (GIPs) significantly improves urban industrial chain resilience (UICR).
Panel data analysis of 281 Chinese cities (2005–2022), treating establishment of national GIPs as a quasi‑natural experiment and estimating effects using a double machine learning approach. Statistical significance asserted in results.
This analysis provides practical insights for politicians and corporate strategists as they navigate significant transformations in capital, labor, and innovation.
Claim about the applied relevance of the paper's findings; presented as an asserted contribution rather than a quantified outcome.
AI facilitates a polycentric, resilient production topology.
Central theoretical claim of the paper's 'Cognitive Economic Geography' framework, linked to observed capital investment patterns and the four mechanisms identified in the empirical analysis.
This transition is evidenced by the significant relocation of high-value production to the Midwest, South, and Great Plains.
Empirical claim based on capital investment data (2018-2024) for EV battery factories, semiconductor fabs, and additive manufacturing sites showing relocation patterns toward those U.S. regions.
An empirical investigation of capital investment (2018-2024) in electric-vehicle battery factories, semiconductor fabrication facilities, and additive manufacturing sites identifies four mechanisms that facilitate a significant spatial-economic inversion.
Paper reports an empirical investigation covering capital investment in specified facility types over 2018-2024 and claims identification of four mechanisms (paper does not list numeric sample size in the provided text).
A novel spatial calculus has emerged, emphasizing the cost structures of interiors, land availability, and energy infrastructure.
Conceptual assertion in the paper, argued in relation to observed capital investment patterns (2018-2024) across EV battery, semiconductor, and additive manufacturing projects.
The American Interior is not nostalgically resurrecting antiquated factories but is instead evolving into a new, AI-driven industrial entity.
Stated thesis of the paper supported by the paper's conceptual argument and referenced empirical investigation of capital investment (2018-2024) in EV battery factories, semiconductor fabs, and additive manufacturing sites.
The deployment and online lifts demonstrate the approach's industrial value.
Paper statement linking observed online metric improvements from production deployment to industrial value; based on reported online lifts on Tmall.
Deployed via an efficient hybrid architecture, it achieves significant online lifts (+0.13% UCTR, +0.25% UCTCVR).
Online A/B (production) deployment on Tmall; reported online metric lifts: +0.13% UCTR and +0.25% UCTCVR. (No sample size or statistical significance numbers provided in the excerpt.)
Extensive experimental results on Tmall's production data show that our proposed approach has achieved better results, improving offline AUC by +1.54%.
Offline experiments reported on Tmall production data; improvement reported as +1.54% in offline AUC. (No sample size or test details given in the excerpt.)
Hierarchical prefix matching between query and item SIDs yields discriminative features that perfectly complement dense signals.
Methodological claim in paper describing feature construction and asserted complementarity with dense embeddings; paper includes experiments to evaluate overall model performance.
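One way to read "hierarchical prefix matching" is sketched below. It assumes a semantic identifier (SID) is a short tuple of codebook indices, ordered from coarse to fine, and it derives one binary feature per level. The representation, the function names, and the feature layout are illustrative assumptions; the summary above does not give the paper's exact feature construction.

```python
from typing import List, Sequence


def prefix_match_depth(query_sid: Sequence[int], item_sid: Sequence[int]) -> int:
    """Number of leading codes shared by the two SIDs (coarse to fine)."""
    depth = 0
    for q_code, i_code in zip(query_sid, item_sid):
        if q_code != i_code:
            break  # hierarchical: a mismatch ends the match, later levels are ignored
        depth += 1
    return depth


def prefix_match_features(
    query_sid: Sequence[int], item_sid: Sequence[int], levels: int
) -> List[int]:
    """One binary feature per level: 1 if all codes up to that level agree."""
    depth = prefix_match_depth(query_sid, item_sid)
    return [1 if level < depth else 0 for level in range(levels)]


# Hypothetical SIDs with three quantization levels.
print(prefix_match_features((12, 7, 3), (12, 7, 9), levels=3))  # [1, 1, 0]
print(prefix_match_features((12, 7, 3), (5, 7, 3), levels=3))   # [0, 0, 0]
```

The second call shows why the match is hierarchical: agreement at finer levels does not count once the coarsest code differs.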
We explore generative LLMs on the query side to explicitly predict item SIDs from text, resolving tail queries and intent ambiguity.
Methodological claim: use of generative large language models for predicting item SIDs from queries; paper claims this helps tail queries and intent ambiguity.
We present a query-bridged contrastive quantization approach on the item side, injecting query-item interaction supervision into Residual Quantization to actively learn relevance-aware semantic partitions.
Methodological claim describing the proposed quantization approach (algorithmic design). Supported in paper by methodological exposition; experimental validation referenced elsewhere in paper.
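For readers unfamiliar with Residual Quantization, the sketch below shows the plain encoding step only: each level picks the nearest code for whatever the previous levels left unexplained. The codebooks are fixed, hand-written toy values, and the query-bridged contrastive supervision described above is not implemented here. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to vec in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def residual_quantize(
    vec: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Encode vec as one code per level; return the codes and the final residual."""
    residual = list(vec)
    codes: List[int] = []
    for codebook in codebooks:
        k = nearest_code(residual, codebook)
        codes.append(k)
        # Subtract the chosen code so the next level quantizes what is left.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes, residual


# Toy two-level codebooks in two dimensions (illustrative values only).
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.25, -0.25)],
]

codes, residual = residual_quantize((1.25, 0.75), CODEBOOKS)
print(codes, residual)  # [1, 1] [0.0, 0.0]
```

The resulting list of codes is the kind of discrete identifier that a prefix-based feature could then compare level by level.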
We propose a Discrete Semantic Identifier Relevance Model (DSIRM) that explicitly models discrete relevance features for e-commerce search.
Methodological description in the paper presenting DSIRM (model proposal). No numerical evaluation data in this sentence; overall paper includes experiments on Tmall production data.
Network composition analysis of 8,012 workers shows all have inference-capable hardware.
Network composition analysis covering 8,012 workers; hardware capability inferred from worker-reported or probed specifications.
Experts assigned the highest responsibility for addressing these risks to general-purpose AI developers and governance actors (including governments, regulators, and standards bodies).
Delphi ratings of actor responsibility reported in paper: highest responsibility attributed to general-purpose AI developers and governance actors by 272 experts.
Policymakers in emerging economies should adopt integrated policy frameworks combining AI development incentives, labour market reform, and education strategies to ensure technological progress translates into inclusive and sustainable development.
Policy recommendation derived from the study's empirical findings and interpretation.
The empirical findings validate the core theoretical proposition of Routine-Biased Technological Change that skill-biased technological change operates through heterogeneous channels invisible at the aggregate level.
Synthesis of empirical results (skill-disaggregated effects differ, total unemployment insignificant) used to support the RBTC theoretical proposition.
Unemployment among less-educated workers shows a positive long-run relationship with sustainable development, interpreted as reflecting structural labour reallocation effects consistent with RBTC.
Long-run ARDL coefficient for less-educated workers' unemployment reported as positive in the paper; interpretive link to Routine-Biased Technological Change (RBTC) and labour reallocation.
In the long run, AI adoption contributes positively and significantly to sustainable development through productivity gains and innovation spillovers after structural adjustments are completed.
Long-run ARDL estimates reported in the paper indicating a positive and statistically significant long-run coefficient for AI adoption; theoretical interpretation invoking productivity gains and innovation spillovers.
Artificial intelligence significantly facilitates carbon mitigation.
Empirical analysis on prefecture-level panel data (2005–2023) showing AI development is associated with reductions in carbon emissions or improved carbon mitigation indicators (authors state 'significantly facilitates ... carbon mitigation').
Artificial intelligence significantly facilitates pollution reduction.
Empirical results from prefecture-level panel analysis (Guanzhong Plain, 2005–2023) report AI development is associated with reductions in pollution indicators (authors state 'significantly facilitates pollution reduction').
Artificial intelligence promotes the growth of urban ecological resilience through the channel of green technological innovation.
Mediation/mechanism analysis using prefecture-level panel data (2005–2023); authors identify green technological innovation as a significant mediating channel in the relationship between AI development and ecological resilience.
Artificial intelligence promotes the growth of urban ecological resilience through the channel of green finance.
Mediation/mechanism analysis in the paper using the same prefecture-level panel data (2005–2023); authors report that green finance is a statistically significant channel linking AI development to higher ecological resilience.
The development of artificial intelligence exerts a positive effect on ecological resilience.
Empirical analysis using prefecture-level panel data for cities in the Guanzhong Plain Urban Agglomeration (2005–2023); authors construct an urban ecological resilience index (three dimensions) and estimate the relationship between AI development and the index using panel econometric methods.
TAs remained fully in control and could use, edit, or ignore AI-generated drafts at their discretion.
Study design statement from the randomized field experiment: intervention provided AI-assisted feedback drafts to TAs after grading but kept TAs fully in control to accept, edit, or ignore drafts. 11 TAs in the course.
Qualitative findings indicate AI-assisted drafts function as editable scaffolds that lower barriers to initiating feedback rather than reducing overall effort.
Qualitative interviews conducted as part of the mixed-methods study (course included 11 TAs and 88 students); thematic/qualitative analysis reported that TAs described drafts as scaffolds that made starting feedback easier and did not simply replace TA effort.
AI-assisted feedback increases feedback length by 39.8 characters.
Randomized field experiment in the same course; comparison of feedback length between treatment and control. Reported estimate: +39.8 chars, SE=3.45, p<0.001. Student-level random assignment (n=88); 11 TAs.
AI-assisted feedback significantly increases feedback provision by 10.8 percentage points.
Randomized field experiment in a 300-level machine learning course. Student submissions (n=88) were randomly assigned to treatment (TAs received AI-assisted feedback drafts) or control. Reported estimate: +10.8 percentage points, SE=1.1, p<0.001. 11 TAs participated and could use, edit, or ignore drafts.
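The two reported estimates (+39.8 characters with SE 3.45, and +10.8 percentage points with SE 1.1) can be checked against the stated p < 0.001 with a quick calculation. The sketch below assumes a normal approximation and a 1.96 critical value. The summary does not state the paper's actual inference procedure (for example, whether standard errors are clustered by TA), so this is a plausibility check rather than a reproduction.

```python
import math


def z_and_ci(estimate: float, std_error: float, z_crit: float = 1.96):
    """Return the z-statistic and an approximate 95% confidence interval."""
    z = estimate / std_error
    half_width = z_crit * std_error
    return z, (estimate - half_width, estimate + half_width)


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


REPORTED = {
    "feedback provision (pp)": (10.8, 1.1),
    "feedback length (chars)": (39.8, 3.45),
}

for label, (estimate, std_error) in REPORTED.items():
    z, (low, high) = z_and_ci(estimate, std_error)
    print(
        f"{label}: z = {z:.2f}, 95% CI = ({low:.2f}, {high:.2f}), "
        f"p < 0.001: {two_sided_p(z) < 0.001}"
    )
```

Both z-statistics are well above the roughly 3.29 needed for a two-sided p below 0.001, which is consistent with the reported significance levels.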
A tool-augmented agentic AI method (equipped with analytical tools, structured DIKW reasoning agents, and transparent evidence chains) can automatically learn from experimental data to generate new interventions and produce superior interventions compared to Human + Chatbot co-design.
Two-stage field experiments in healthcare prescription messaging comparing Stage 1 (Human + Chatbot: 13 message variants, 444,691 patient visits) to Stage 2 (Tool-Augmented Agentic AI: 17 AI-generated variants, 248,448 patient visits).
The best AI-generated message achieved a 69.8% CTR (+6.5 percentage points over baseline).
Stage 2 field experiment in healthcare prescription messaging where AI-generated message variants were tested; result reported directly in paper.
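The reported figures imply a baseline CTR and a relative lift that the summary does not spell out. The sketch below assumes that "+6.5 percentage points over baseline" is an absolute difference in CTR; the function name is hypothetical.

```python
def baseline_and_relative_lift(best_ctr_pct: float, lift_pp: float):
    """Back out the baseline CTR (in percent) and the relative lift it implies."""
    baseline_pct = best_ctr_pct - lift_pp
    relative_lift = lift_pp / baseline_pct
    return baseline_pct, relative_lift


baseline, relative = baseline_and_relative_lift(69.8, 6.5)
print(f"implied baseline CTR: {baseline:.1f}%")  # implied baseline CTR: 63.3%
print(f"relative lift: {relative:.1%}")          # relative lift: 10.3%
```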
We will open-source all evaluation code, tasks, and data at https://github.com/mrwwk/DeskCraft.
Author statement promising release of code, tasks, and data (stated in abstract).
GPT-5.4 reaches 27.6% on interactive tasks.
Author-reported benchmark result for GPT-5.4 on interactive tasks from the evaluation (reported in abstract); presumably measured across the evaluation tasks.
GPT-5.4 reaches 31.6% on standard tasks.
Author-reported benchmark result for GPT-5.4 on standard tasks from the evaluation (reported in abstract); presumably measured across the evaluation tasks.