Evidence (7953 claims)
Claim counts by topic:

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
JobMatchAI optimizes utility across skill fit, experience, location, salary, and company preferences.
Paper claims the system's objective/utility function includes these factors and that the reranking/optimization accounts for them. No optimization algorithm details, weighting, or empirical utility gains are given in the excerpt.
JobMatchAI is production-ready.
Paper explicitly describes JobMatchAI as "production-ready" and also claims a hosted website and installable package (artifacts consistent with deployment readiness). No formal certification, deployment metrics, or uptime/performance SLAs are provided in the excerpt.
For AI agent tool design, surfacing contextual information outperforms prescribing procedural workflows.
Authors' conclusion drawn from the suite of experiments (GraphRAG vs TDD prompting vs auto-improvement) showing better regression reduction and/or resolution when contextual information is surfaced.
An autonomous auto-improvement loop raised resolution from 12% to 60% on a 10-instance subset with 0% regression.
Reported experiment on a 10-instance subset where an auto-improvement loop was applied (numbers provided in the excerpt).
Smaller models benefit more from contextual information (which tests to verify) than from procedural instructions (how to do TDD).
Inferred from comparative results across models (Qwen3-Coder 30B vs Qwen3.5-35B-A3B) and interventions (contextual test-surfacing vs TDD prompting) reported in the paper.
When deployed as an agent skill, GraphRAG improved resolution from 24% to 32%.
Empirical comparison reported in the evaluation on SWE-bench Verified (same experimental context as above).
TDAD's GraphRAG workflow reduced test-level regressions by 70% (from 6.08% to 1.82%).
Empirical result reported from the SWE-bench Verified evaluation using the GraphRAG workflow (sample details: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances as reported).
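The excerpt reports the auto-improvement loop's numbers but not its algorithm. As a hedged sketch of the general pattern only, the loop below attempts a patch, runs the surfaced tests, and feeds failures back as context for the next attempt; `attempt_fix` and `run_tests` are hypothetical stand-ins, not the paper's implementation.

```python
# Hypothetical sketch of an auto-improvement loop. The stand-ins below
# simulate success once at least one round of test feedback is available;
# a real harness would call a model and execute the repo's test suite.

def run_tests(patch):
    # Stand-in: pretend to execute the surfaced tests against the patch.
    return {"passed": patch.get("fixes_bug", False), "failures": []}

def attempt_fix(issue, feedback):
    # Stand-in: a real agent would prompt a model with issue + feedback.
    return {"fixes_bug": len(feedback) >= 1}

def auto_improve(issue, max_rounds=5):
    """Iterate patch -> test -> feedback until tests pass or budget runs out."""
    feedback = []
    for round_num in range(max_rounds):
        patch = attempt_fix(issue, feedback)
        result = run_tests(patch)
        if result["passed"]:
            return patch, round_num + 1
        feedback.append(result["failures"])
    return None, max_rounds

patch, rounds = auto_improve({"id": 1})
```

With these toy stand-ins the loop converges on the second round; the design point is that each failed round enriches the context available to the next attempt.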
Partial validation against observed AIS vessel behavior shows PIER is consistent with the fastest real transits while exhibiting 23.1× lower variance.
Comparison between PIER trajectories and observed fastest transits in AIS data (details in paper); reported relative variance reduction of 23.1×.
PIER sharply reduces catastrophic fuel waste: great-circle routing produces extreme fuel consumption (>1.5× the median) in 4.8% of voyages, while PIER cuts this to 0.5% (a roughly 9-fold reduction).
Analysis on the same 2023 AIS validation dataset across seven Gulf of Mexico routes (840 episodes per method) comparing distribution tails of voyage fuel consumption; reported incidence rates (4.8% vs 0.5%).
PIER reduces mean CO2 emissions by 10% relative to great-circle routing.
Offline evaluation using physics‑calibrated environments grounded in historical AIS data and ocean reanalysis products; validation on one full year (2023) of AIS across seven Gulf of Mexico routes with 840 episodes per method; reported mean reduction of 10% and bootstrap 95% CI for mean savings [2.9%, 15.7%].
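The bootstrap interval cited above can be illustrated with a plain percentile bootstrap over per-voyage savings. The data below are synthetic stand-ins, not the paper's 840-episode sample.

```python
import random

def bootstrap_mean_ci(samples, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic per-voyage CO2-savings fractions, centered near 10% (illustrative).
savings = [0.10 + 0.02 * ((i % 7) - 3) for i in range(100)]
lo, hi = bootstrap_mean_ci(savings)
```

A percentile bootstrap like this is one standard way to obtain an interval such as the reported [2.9%, 15.7%] without distributional assumptions; the paper's exact bootstrap scheme is not given in the excerpt.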
The system is in production at Personize.ai.
Deployment statement in the paper asserting production use at Personize.ai.
The LoCoMo result confirms that governance and schema enforcement impose no retrieval quality penalty.
Interpretation in the paper linking LoCoMo benchmark accuracy (74.8%) to the conclusion that governance/schema enforcement did not degrade retrieval quality.
Governed Memory implements a closed-loop schema lifecycle with AI-assisted authoring and automated per-property refinement.
Design description in the paper describing the closed-loop schema lifecycle and AI-assisted authoring/refinement.
Governed Memory uses reflection-bounded retrieval with entity-scoped isolation.
Design description in the paper specifying reflection-bounded retrieval and entity-scoped isolation.
Governed Memory uses tiered governance routing with progressive context delivery.
Design description in the paper listing tiered governance routing and progressive delivery as mechanisms.
Governed Memory implements a dual memory model combining open-set atomic facts with schema-enforced typed properties.
Design specification within the paper describing the dual memory model (architectural mechanism).
The paper presents Governed Memory, a shared memory and governance layer addressing the memory governance gap.
System architecture and design description in the paper (proposal of a shared memory and governance layer).
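As a hedged illustration of the dual memory model (open-set atomic facts alongside schema-enforced typed properties), the sketch below uses hypothetical field names and validation rules, not Governed Memory's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class EntityMemory:
    """Illustrative dual store: free-text atomic facts plus typed
    properties validated against a governing schema. All names here
    are hypothetical, not Governed Memory's interface."""
    schema: dict                                     # property -> expected type
    facts: list = field(default_factory=list)        # open-set atomic facts
    properties: dict = field(default_factory=dict)   # schema-enforced values

    def add_fact(self, text: str):
        # Atomic facts are unconstrained free text.
        self.facts.append(text)

    def set_property(self, name: str, value):
        # Typed properties must exist in, and type-check against, the schema.
        if name not in self.schema:
            raise KeyError(f"property {name!r} not in schema")
        if not isinstance(value, self.schema[name]):
            raise TypeError(f"{name!r} expects {self.schema[name].__name__}")
        self.properties[name] = value

mem = EntityMemory(schema={"age": int, "city": str})
mem.add_fact("prefers morning meetings")
mem.set_property("age", 34)
```

The design point this illustrates: unstructured recall stays open-ended, while anything written into the typed store must pass governance checks first.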
The results confirm the positive impact of cognitive technologies on the development of entrepreneurial opportunities and innovative activity.
Conclusion drawn from the positive estimated association (0.33 coefficient) and the observed increases in the indices between 2020 and 2024 reported in the paper.
The Cognitive Tools Index and the Market Opportunity Index were -0.42 and -0.35 in 2020 and 0.94 and 0.92 in 2024, respectively.
Reported observed/computed index values for the years 2020 and 2024 in the study (data source and aggregation method not detailed in the excerpt).
The empirical study for 2020–2024 showed that a one-standard-deviation increase in the Cognitive Tools Index is associated with an average increase of 0.33 in the Market Opportunity Index.
Estimated coefficient reported from the panel econometric model over 2020–2024 (model included lags and used instrumental approach; sample size and standard errors not provided in the excerpt).
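The 0.33 coefficient comes from a panel model; a minimal sketch of a within (fixed-effects) estimator on simulated data can show how such a slope is recovered. The true slope is set to 0.33 for illustration only; this is not the study's data, instruments, or lag structure.

```python
import random

# Simulate a small panel: unit fixed effects plus a true slope of 0.33.
rng = random.Random(42)
n_units, n_years, slope = 60, 5, 0.33

panels = []
for _ in range(n_units):
    fe = rng.gauss(0, 1)                               # unit fixed effect
    xs = [rng.gauss(0, 1) for _ in range(n_years)]     # index of cognitive tools
    ys = [slope * x + fe + rng.gauss(0, 0.05) for x in xs]
    panels.append((xs, ys))

# Demean within each unit to sweep out fixed effects, then pooled OLS.
num = den = 0.0
for xs, ys in panels:
    xbar, ybar = sum(xs) / n_years, sum(ys) / n_years
    for x, y in zip(xs, ys):
        num += (x - xbar) * (y - ybar)
        den += (x - xbar) ** 2
beta = num / den
```

The within transform removes time-invariant unit heterogeneity, so `beta` estimates the association between the two indices net of fixed effects, in the spirit of the reported 0.33.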
Pidgin significantly outperformed standard English on measures of knowledge transfer across agriculture, education, and health domains.
Aggregate analysis of questionnaire comprehension items (44-item instrument) across domain-specific modules administered to 45 participants; comparative language-performance results reported in study.
Volunteers who used proverbs and vernacular registers were incorporated into local kinship structures, granted traditional titles, and perceived as legitimate development actors rather than outsiders.
Qualitative evidence from participant observation and discourse samples collected during fieldwork; interview and questionnaire items on perceptions of volunteer legitimacy and social integration.
Agricultural techniques taught in Pidgin were nearly universally adopted by recipients.
Self-reported adoption/behavior-change items in the 44-item questionnaire and corroborating qualitative observation of agricultural practice among participants in the sample (N = 45).
Pidgin-mediated interventions achieved large comprehension gains on health messaging, exceeding 30 percentage points compared with standard English.
Quantitative comparison derived from the 44-item field questionnaire (comprehension items) administered to the 45-participant sample; reported percentage-point difference (>30 pp) in health-message comprehension by language of instruction.
Using Cameroon Pidgin English as the primary medium for Peace Corps development work produced substantially better knowledge transfer, uptake, and social legitimacy than standard English.
Mixed-methods field study of Peace Corps interventions in Cameroon's Northwest: 44-item questionnaire administered to 45 participants across agriculture, education, and health; quantitative measures of comprehension and self-reported adoption; supplemented by qualitative observation and discourse samples.
A hybrid strategic–computational framework, supported by governance mechanisms (human-in-the-loop checkpoints, escalation paths, accountability structures), is proposed to manage tensions and ensure responsible decision-making in AI-rich managerial contexts.
Synthesis-driven prescriptive framework produced by cross-framework analysis; conceptual recommendation rather than implementation evidence.
Roles oriented to information processing, optimisation, and operational precision (monitor, disseminator, resource allocator) are substantially enhanced by computational thinking (automation, optimisation, algorithmic decision-support).
Theoretical mapping of computational capabilities onto Mintzberg’s information-processing roles; conceptual reasoning without empirical validation.
AI adoption will shift fact-checking tasks (more monitoring, less rote verification), creating a need for reskilling and new roles (AI tool operators, analysts); donor and public investments should fund capacity building for local organizations.
Workforce implications inferred from interview reports about changing task mixes and the study's interpretive recommendations.
Investments should prioritize hybrid models where automation provides scale and humans handle contextual, adversarial, and legally sensitive judgments.
Recommendation based on interview findings about AI benefits and limitations and the study's interpretive synthesis.
The study distills context-sensitive best practices for fact-checking in restrictive environments, including safety protocols, local partnerships, and hybrid verification workflows.
Synthesis of findings from document analysis and interviews producing a set of recommended practices documented in the study's outputs.
AI can lower verification costs and scale reach by automating tasks such as classification, clustering, alerting, and translation.
Interview reports from platform staff and interpretive analysis identifying AI-assisted use cases for prioritization, monitoring, and translation.
Community reporting and audience-focused formats are used to improve engagement.
Platform outputs and staff interviews describing deployment of community-reporting mechanisms and tailored audience formats.
Platforms form partnerships with media outlets, academic institutions, and civil-society actors to amplify reach and secure data.
Interview accounts and organizational documents describing cross-sector partnerships and collaboration arrangements.
Transparent workflows and clear labeling are used to build credibility with audiences.
Document analysis of platform outputs and guidelines showing explicit workflow transparency and labeling practices, supported by interview statements.
Platforms emphasize local-language expertise and culturally grounded sourcing as a strategy to improve verification and credibility.
Observed practices and platform guidelines derived from document analysis and staff interviews describing the use of local-language expertise and sourcing.
Practical policy recommendation: require transparent documentation and third‑party auditing for high‑impact LLM deployments and subsidize public‑interest evaluation infrastructure.
Policy prescription supported by the paper's normative and economic analysis; no pilot implementation or empirical evaluation of the recommendation is provided.
Policy levers that can address alignment externalities include disclosure requirements (data provenance, evaluation practices), mandatory participatory evaluation for high‑impact systems, standards for auditing, procurement rules favoring participatory transparency, and liability/certification regimes.
Policy recommendation based on economic and governance reasoning and synthesis of prior regulatory proposals; no policy pilot data or impact evaluation is reported.
Economics research should develop multi‑dimensional metrics capturing welfare, distributional impacts, and autonomy rather than relying on single aggregate accuracy or safety scores.
Prescriptive recommendation grounded in critique of current benchmarking practices and theoretical desiderata; no new metric is empirically validated in the paper.
Dynamic constraints (continuous monitoring, feedback loops, and configurable safety settings that adapt post‑deployment) are preferable to static pre‑deployment-only safety fixes.
Conceptual argument and synthesis of deployment experience and monitoring literature; suggestions for operational tooling and monitoring rather than empirical evaluation.
Participatory governance, which includes varied stakeholders such as users, affected communities, domain experts, and regulators in design, evaluation, and deployment decisions, will improve alignment outcomes and legitimacy.
Theoretical and normative argument citing participatory design literature and ethical governance scholarship; paper offers procedural recommendations but no empirical trial of governance models.
Alignment should shift from static, post‑training constraints (one‑off fixes like safety filters or RLHF alone) to dynamic, participatory systems that explicitly protect pluralism, autonomy, and justice.
Normative argument and conceptual synthesis drawing on literature in AI safety, value alignment, and participatory design; prescriptive reasoning rather than original empirical results.
Investment choices in collaboration AI and digital infrastructure become central strategic decisions affecting firms' comparative advantage.
Management literature synthesis and illustrative multinational cases; argument is conceptual without firm‑level comparative empirical data presented in the paper.
AI collaboration tools (virtual assistants, meeting summarizers, asynchronous platforms) complement hybrid work by reducing coordination costs and supporting dispersed teamwork.
Conceptual integration of technology and organizational literature; supported by illustrative case examples of multinational organizations but not by new quantitative causal evidence.
Hybrid and remote work increase employee autonomy and work–life integration.
Conceptual synthesis of sociological and management literatures; supported by secondary data and illustrative case studies from multinational organizations. No primary quantitative analysis or sample size reported—based on comparative case illustrations and theoretical integration.
Tariff reductions and expanded supply channels following CAFTA contributed as secondary channels to increased third‑country agricultural imports.
Paper documents tariff changes and supply‑channel expansion as part of mechanism analysis; DID and mediator tests link tariff reductions and expanded channels to import outcomes.
CAFTA improved logistics and service frictions (e.g., storage, logistics performance) relevant to agricultural imports.
Secondary channel analysis using logistics/storage indicators and related service frictions available in the data; assessed as mediators in the DID framework.
CAFTA widened China's trading‑partner and product diversity in agricultural imports, increasing both partner and product variety from third countries.
DID estimates on partner and product diversity metrics constructed from customs import records (2000–2014); reported changes in diversity as outcomes in the paper.
A complementary‑products linkage effect is a key mechanism: expanded channels and product complementarities make sourcing non‑ASEAN goods easier and more attractive.
Mechanism analysis using product‑level and partner‑level import data (China Customs) showing increased imports of complementary products and linkages consistent with this channel in DID estimates.
The primary spillover mechanism is a 'low‑cost import experience' effect: cheaper/consistent regional sourcing lowers firms' marginal cost of engaging additional foreign suppliers, encouraging imports from third countries.
Mechanism tests using mediator variables (cost/procurement indicators) within the DID framework and firm‑level data; reported as the main channel in the paper's analysis.
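The DID framework behind these mechanism tests reduces, in the textbook 2×2 case, to differencing the treated group's change against the control group's change. The numbers below are illustrative, not the paper's customs-data estimates.

```python
# Textbook 2x2 difference-in-differences estimator.
def did(treated_pre, treated_post, control_pre, control_post):
    """DID effect = (treated change) - (control change)."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical mean log imports before/after CAFTA for treated vs. control.
effect = did(treated_pre=4.0, treated_post=4.9,
             control_pre=3.8, control_post=4.1)
```

Here the treated group rises by 0.9 log points and the control by 0.3, so the DID effect is 0.6; the paper's richer specification adds mediators (cost and procurement indicators) on top of this comparison to probe mechanisms.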
A new market will emerge for controls, certification, attestations, secure toolchains, and audited model deployments; compliance costs will shape comparative advantages among firms and countries.
Policy-market synthesis and analogies to certification markets in other regulated tech domains (qualitative).