Evidence (13870 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	196	98	892	1984
Governance & Regulation	817	394	188	121	1544
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	627	233	123	96	1088
Research Productivity	411	123	56	332	933
Output Quality	467	178	59	47	751
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	167	122	24	496
Task Allocation	207	64	71	32	379
Skill Acquisition	165	59	60	17	301
Innovation Output	203	27	43	18	292
Employment Level	105	52	107	13	279
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	150	48	26	3	227
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	63	20	12	184
Error Rate	69	92	10	2	173
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	93	21	13	19	148
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Creative Output	31	17	7	3	59
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

An anti-leakage learning loop converts scored failures into task-agnostic edits to skills, tools, and knowledge, letting the system improve with experience without touching model weights.

Paper describes a proposed/implemented learning loop (anti-leakage) that translates scored agent failures into edits to non-weight system components (skills, tools, knowledge) and claims this enables improvement without model weight updates.

high positive Parthenon Law: A Self-Evolving Legal-Agent Framework system improvement via edits to skills/tools/knowledge (no model weight changes)

We introduce Parthenon, a self-evolving legal-agent framework that factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure.

Paper describes the design and implementation of the Parthenon framework and its modular decomposition into Model, Harness, Agent roles, Knowledge, Tools, and Skills, claiming these enable auditable traces and grounding.

high positive Parthenon Law: A Self-Evolving Legal-Agent Framework auditability (source traceability), date/number grounding, deliverable complianc...

Per-criterion accuracy climbs with stronger models.

Empirical comparison across model strengths reported in the Harvey LAB study (12,510 trajectories) showing per-criterion accuracy trends correlated with model strength.

high positive Parthenon Law: A Self-Evolving Legal-Agent Framework per-criterion accuracy

The findings imply that research evaluation and science policy should adopt assessment frameworks that distinguish between recombinant and conceptual forms of creativity and recognize that different modes of AI adoption produce different types of scientific contribution.

Policy/recommendation statement grounded in the paper's empirical findings on heterogeneous creativity effects by AI research mode.

high positive Does Artificial Intelligence Advance Science? policy recommendation for research evaluation frameworks

Adaptation-oriented AI research (modifying AI models for domain-specific problems) is associated with relatively higher object-based creativity.

Subgroup/heterogeneity analysis in the OpenAlex dataset classifying AI publications by research mode (Adaptation-oriented) and comparing object novelty outcomes across modes.

high positive Does Artificial Intelligence Advance Science? object-based novelty/creativity

Tool-oriented AI research (applying existing AI models to domain tasks) is associated with the largest gains in recombinant-based creativity.

Subgroup/heterogeneity analysis in the OpenAlex dataset classifying AI publications by research mode (Tool-oriented) and comparing recombinant novelty outcomes across modes.

high positive Does Artificial Intelligence Advance Science? recombinant-based novelty/creativity

AI publications have a 5.5 to 10.2 percentage point higher likelihood to rank in the top creativity decile.

Reported quantitative effect from the paper comparing top-decile creativity probabilities between AI and non-AI publications in the OpenAlex sample.

high positive Does Artificial Intelligence Advance Science? increase in probability of being top-decile creative

AI publications are significantly more likely to achieve top-decile creativity relative to non-AI publications.

Observational statistical analysis comparing AI-labeled vs non-AI publications across novelty and impact measures using the >1M OpenAlex dataset (novelty measured as recombinant and object novelty; impact measured as 3-year and 10-year citation impact).

high positive Does Artificial Intelligence Advance Science? likelihood of ranking in top creativity decile

GIPs enhance urban industrial chain resilience by promoting industrial structure optimization.

Mechanism analysis in the study showing industrial structure optimization as a channel linking GIP implementation to improved UICR; based on the 281‑city panel and specified empirical tests.

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience (mechanism: industrial structure optimization)

GIPs enhance urban industrial chain resilience mainly by fostering green technological innovation.

Mechanism analysis reported in the paper identifying green technological innovation as a primary mediator through which GIPs improve UICR; based on empirical mediation/analysis within the panel and DML framework.

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience (mechanism: green technological innovation)

The resilience‑enhancing effect of GIPs is more pronounced in resource‑based cities.

Heterogeneity analysis reported in the study indicating larger GIP effects on UICR in cities classified as resource‑based; derived from the 281‑city panel analysis.

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience (heterogeneity: resource‑based cities)

The resilience‑enhancing effect of GIPs is more pronounced in eastern cities.

Regional heterogeneity analysis reported in the paper showing stronger estimated impacts in eastern region cities within the 281‑city panel.

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience (regional heterogeneity: eastern cities)

The resilience‑enhancing effect of GIPs is stronger in cities with stronger AI computing power.

Heterogeneity analysis in the study indicating larger GIP effects on UICR in cities with higher AI computing power measures; based on the same panel dataset and statistical methods.

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience (effect heterogeneity by AI computing power)

The resilience‑enhancing effect of GIPs is stronger in cities with more advanced digital economies.

Heterogeneity analysis reported in the paper showing larger estimated impacts of GIPs on UICR in cities with more developed digital economy indicators; based on the 281‑city panel.

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience (effect heterogeneity by digital economy level...

The resilience‑enhancing effect of GIPs is stronger in cities with higher openness.

Heterogeneity analysis reported in the study indicating larger estimated effects in subsamples or interaction models for cities with greater openness; based on the 281‑city panel (2005–2022).

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience (effect heterogeneity by city openness)

The positive effect of GIPs on UICR is robust across alternative sample specifications, estimation algorithms, variable definitions, and controls for parallel policies.

Reported robustness checks in the study (alternative samples, estimation algorithms, variable definitions, and adjustments for parallel policies); based on same panel of 281 cities and DML framework.

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience

The implementation of national Green Industrial Parks (GIPs) significantly improves urban industrial chain resilience (UICR).

Panel data analysis of 281 Chinese cities (2005–2022), treating establishment of national GIPs as a quasi‑natural experiment and estimating effects using a double machine learning approach. Statistical significance asserted in results.

high positive Does green industrialization enhance urban industrial chain ... urban industrial chain resilience

This analysis provides practical insights for politicians and corporate strategists as they navigate significant transformations in capital, labor, and innovation.

Claim about the applied relevance of the paper's findings; presented as an asserted contribution rather than a quantified outcome.

high positive The cognitive heartland: A foundational framework for AI-dri... policy and strategic decision utility (usefulness of findings to policymakers an...

AI facilitates a polycentric, resilient production topology.

Central theoretical claim of the paper's 'Cognitive Economic Geography' framework, linked to observed capital investment patterns and the four mechanisms identified in the empirical analysis.

high positive The cognitive heartland: A foundational framework for AI-dri... formation of polycentric (multiple regional centers) and resilient production ne...

This transition is evidenced by the significant relocation of high-value production to the Midwest, South, and Great Plains.

Empirical claim based on capital investment data (2018-2024) for EV battery factories, semiconductor fabs, and additive manufacturing sites showing relocation patterns toward those U.S. regions.

high positive The cognitive heartland: A foundational framework for AI-dri... geographic relocation of high-value production / capital investment flows

An empirical investigation of capital investment (2018-2024) in electric-vehicle battery factories, semiconductor fabrication facilities, and additive manufacturing sites identifies four bled mechanisms that facilitate a significant spatial-economic inversion.

Paper reports an empirical investigation covering capital investment in specified facility types over 2018-2024 and claims identification of four mechanisms (paper does not list numeric sample size in the provided text).

high positive The cognitive heartland: A foundational framework for AI-dri... mechanisms enabling spatial-economic inversion (qualitative identification from ...

A novel spatial calculus has emerged, emphasizing the cost structures of interiors, land availability, and energy infrastructure.

Conceptual assertion in the paper, argued in relation to observed capital investment patterns (2018-2024) across EV battery, semiconductor, and additive manufacturing projects.

high positive The cognitive heartland: A foundational framework for AI-dri... relative importance of interior cost factors (land, energy, cost structures) in ...

The American Interior is not nostalgically resurrecting antiquated factories but is instead evolving into a new, AI-driven industrial entity.

Stated thesis of the paper supported by the paper's conceptual argument and referenced empirical investigation of capital investment (2018-2024) in EV battery factories, semiconductor fabs, and additive manufacturing sites.

high positive The cognitive heartland: A foundational framework for AI-dri... emergence of AI-driven industrial activity in the American Interior (location an...

The deployment and online lifts demonstrate the approach's industrial value.

Paper statement linking observed online metric improvements from production deployment to industrial value; based on reported online lifts on Tmall.

high positive DSIRM: Learning Query-Bridged Discrete Semantic Identifiers ... industrial value as evidenced by online metric improvements

Deployed via an efficient hybrid architecture, it achieves significant online lifts (+0.13% UCTR, +0.25% UCTCVR).

Online A/B (production) deployment on Tmall; reported online metric lifts: +0.13% UCTR and +0.25% UCTCVR. (No sample size or statistical significance numbers provided in the excerpt.)

high positive DSIRM: Learning Query-Bridged Discrete Semantic Identifiers ... user click-through rate (UCTR) and user click-to-conversion rate (UCTCVR)

Extensive experimental results on Tmall's production data show that our proposed approach has achieved better results, improving offline AUC by +1.54%.

Offline experiments reported on Tmall production data; improvement reported as +1.54% in offline AUC. (No sample size or test details given in the excerpt.)

high positive DSIRM: Learning Query-Bridged Discrete Semantic Identifiers ... offline AUC

Hierarchical prefix matching between query and item SIDs yields discriminative features that perfectly complement dense signals.

Methodological claim in paper describing feature construction and asserted complementarity with dense embeddings; paper includes experiments to evaluate overall model performance.

high positive DSIRM: Learning Query-Bridged Discrete Semantic Identifiers ... discriminative features complementary to dense signals

We explore generative LLMs on the query side to explicitly predict item SIDs from text, resolving tail queries and intent ambiguity.

Methodological claim: use of generative large language models for predicting item SIDs from queries; paper claims this helps tail queries and intent ambiguity.

high positive DSIRM: Learning Query-Bridged Discrete Semantic Identifiers ... resolution of tail queries and reduction of intent ambiguity

We present a query-bridged contrastive quantization approach on the item side, injecting query-item interaction supervision into Residual Quantization to actively learn relevance-aware semantic partitions.

Methodological claim describing the proposed quantization approach (algorithmic design). Supported in paper by methodological exposition; experimental validation referenced elsewhere in paper.

high positive DSIRM: Learning Query-Bridged Discrete Semantic Identifiers ... relevance-aware semantic partitions learned via supervised quantization

We propose a Discrete Semantic Identifier Relevance Model (DSIRM) that explicitly models discrete relevance features for e-commerce search.

Methodological description in the paper presenting DSIRM (model proposal). No numerical evaluation data in this sentence; overall paper includes experiments on Tmall production data.

high positive DSIRM: Learning Query-Bridged Discrete Semantic Identifiers ... explicit modeling of discrete relevance features

Network composition analysis of 8,012 workers shows all have inference-capable hardware.

Network composition analysis covering 8,012 workers; hardware capability inferred from worker-reported or probed specifications.

high positive The Usefulness Gap in Proof-of-Useful-Work: An Empirical Stu... presence of inference-capable hardware among network workers

Experts assigned the highest responsibility for addressing these risks to general-purpose AI developers and governance actors (including governments, regulators, and standards bodies).

Delphi ratings of actor responsibility reported in paper: highest responsibility attributed to general-purpose AI developers and governance actors by 272 experts.

high positive Prioritization of Risks from Artificial Intelligence: A Delp... actor responsibility attribution

Policymakers in emerging economies should adopt integrated policy frameworks combining AI development incentives, labour market reform, and education strategies to ensure technological progress translates into inclusive and sustainable development.

Policy recommendation derived from the study's empirical findings and interpretation.

high positive Artificial Intelligence, Disaggregated Unemployment, And Sus... policy recommendation aimed at improving inclusive and sustainable development o...

The empirical findings validate the core theoretical proposition of Routine-Biased Technological Change that skill-biased technological change operates through heterogeneous channels invisible at the aggregate level.

Synthesis of empirical results (skill-disaggregated effects differ, total unemployment insignificant) used to support the RBTC theoretical proposition.

high positive Artificial Intelligence, Disaggregated Unemployment, And Sus... support for RBTC theoretical proposition (heterogeneous channels of skill-biased...

Unemployment among less-educated workers shows a positive long-run relationship with sustainable development, interpreted as reflecting structural labour reallocation effects consistent with RBTC.

Long-run ARDL coefficient for less-educated workers' unemployment reported as positive in the paper; interpretive link to Routine-Biased Technological Change (RBTC) and labour reallocation.

high positive Artificial Intelligence, Disaggregated Unemployment, And Sus... sustainable development (effect of less-educated workers' unemployment)

In the long run, AI adoption contributes positively and significantly to sustainable development through productivity gains and innovation spillovers after structural adjustments are completed.

Long-run ARDL estimates reported in the paper indicating a positive and statistically significant long-run coefficient for AI adoption; theoretical interpretation invoking productivity gains and innovation spillovers.

high positive Artificial Intelligence, Disaggregated Unemployment, And Sus... sustainable development (long-run effect of AI adoption)

Artificial intelligence significantly facilitates carbon mitigation.

Empirical analysis on prefecture-level panel data (2005–2023) showing AI development is associated with reductions in carbon emissions or improved carbon mitigation indicators (authors state 'significantly facilitates ... carbon mitigation').

high positive The impact of artificial intelligence on urban ecological re... carbon emissions / carbon mitigation

Artificial intelligence significantly facilitates pollution reduction.

Empirical results from prefecture-level panel analysis (Guanzhong Plain, 2005–2023) report AI development is associated with reductions in pollution indicators (authors state 'significantly facilitates pollution reduction').

high positive The impact of artificial intelligence on urban ecological re... pollution levels / pollution reduction

Artificial intelligence promotes the growth of urban ecological resilience through the channel of green technological innovation.

Mediation/mechanism analysis using prefecture-level panel data (2005–2023); authors identify green technological innovation as a significant mediating channel in the relationship between AI development and ecological resilience.

high positive The impact of artificial intelligence on urban ecological re... urban ecological resilience (composite index); green technological innovation as...

Artificial intelligence promotes the growth of urban ecological resilience through the channel of green finance.

Mediation/mechanism analysis in the paper using the same prefecture-level panel data (2005–2023); authors report that green finance is a statistically significant channel linking AI development to higher ecological resilience.

high positive The impact of artificial intelligence on urban ecological re... urban ecological resilience (composite index); green finance as mediator

The development of artificial intelligence exerts a positive effect on ecological resilience.

Empirical analysis using prefecture-level panel data for cities in the Guanzhong Plain Urban Agglomeration (2005–2023); authors construct an urban ecological resilience index (three dimensions) and estimate the relationship between AI development and the index using panel econometric methods.

high positive The impact of artificial intelligence on urban ecological re... urban ecological resilience (composite index)

TAs remained fully in control and could use, edit, or ignore AI-generated drafts at their discretion.

Study design statement from the randomized field experiment: intervention provided AI-assisted feedback drafts to TAs after grading but kept TAs fully in control to accept, edit, or ignore drafts. 11 TAs in the course.

high positive AI Assistance for Discretionary Work: Increasing Feedback Pr... degree of human control over AI-generated artifacts (procedural/design feature)

Qualitative findings indicate AI-assisted drafts function as editable scaffolds that lower barriers to initiating feedback rather than reducing overall effort.

Qualitative interviews conducted as part of the mixed-methods study (course included 11 TAs and 88 students); thematic/qualitative analysis reported that TAs described drafts as scaffolds that made starting feedback easier and did not simply replace TA effort.

high positive AI Assistance for Discretionary Work: Increasing Feedback Pr... perceived barriers to initiating feedback / perceived TA effort

AI-assisted feedback increases feedback length by 39.8 characters.

Randomized field experiment in the same course; comparison of feedback length between treatment and control. Reported estimate: +39.8 chars, SE=3.45, p<0.001. Student-level random assignment (n=88); 11 TAs.

high positive AI Assistance for Discretionary Work: Increasing Feedback Pr... feedback length (number of characters)

AI-assisted feedback significantly increases feedback provision by 10.8 percentage points.

Randomized field experiment in a 300-level machine learning course. Student submissions (n=88) were randomly assigned to treatment (TAs received AI-assisted feedback drafts) or control. Reported estimate: +10.8 percentage points, SE=1.1, p<0.001. 11 TAs participated and could use, edit, or ignore drafts.

high positive AI Assistance for Discretionary Work: Increasing Feedback Pr... feedback provision (whether feedback was provided)

A tool-augmented agentic AI method (equipped with analytical tools, structured DIKW reasoning agents, and transparent evidence chains) can automatically learn from experimental data to generate new interventions and produce superior interventions compared to Human + Chatbot co-design.

Two-stage field experiments in healthcare prescription messaging comparing Stage 1 (Human + Chatbot: 13 message variants, 444,691 patient visits) to Stage 2 (Tool-Augmented Agentic AI: 17 AI-generated variants, 248,448 patient visits).

high positive Beyond One-shot: AI Agents for Learning in Field Experiments performance of message interventions (measured by CTR and comparative success of...

The best AI-generated message achieved a 69.8% CTR (+6.5 percentage points over baseline).

Stage 2 field experiment in healthcare prescription messaging where AI-generated message variants were tested; result reported directly in paper.

high positive Beyond One-shot: AI Agents for Learning in Field Experiments click-through rate (CTR)

We will open-source all evaluation codes, tasks, and data at https://github.com/mrwwk/DeskCraft.

Author statement promising release of code, tasks, and data (stated in abstract).

high positive DeskCraft: Benchmarking Desktop Agents on Professional Workf... availability of benchmark artifacts (code, tasks, data) as open source

GPT-5.4 reaches 27.6% on interactive tasks.

Author-reported benchmark result for GPT-5.4 on interactive tasks from the evaluation (reported in abstract); presumably measured across the evaluation tasks.

high positive DeskCraft: Benchmarking Desktop Agents on Professional Workf... task success rate / benchmark score on interactive tasks

GPT-5.4 reaches 31.6% on standard tasks.

Author-reported benchmark result for GPT-5.4 on standard tasks from the evaluation (reported in abstract); presumably measured across the evaluation tasks.

high positive DeskCraft: Benchmarking Desktop Agents on Professional Workf... task success rate / benchmark score on standard (non-interactive) tasks

« Prev 1 2 3 … 97 98 99 … 277 278 Next »