Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Switchcraft saves over $3,600 per million queries.

Cost savings estimate reported in the paper based on the measured 84% reduction applied to a million-query baseline.

high positive Switchcraft: AI Model Router for Agentic Tool Calling monetary savings per million queries
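The savings figure and the 84% reduction can be cross-checked with simple arithmetic. The sketch below assumes that savings equal the reduction multiplied by a per-million-query baseline cost; the function name and the derived baseline are illustrative and are not figures reported in the paper. Because the claim states "over $3,600", both derived values are lower bounds.

```python
def implied_baseline(savings_per_million: float, reduction: float) -> float:
    """Baseline cost per million queries, assuming savings = reduction * baseline."""
    if not 0.0 < reduction <= 1.0:
        raise ValueError("reduction must be in (0, 1]")
    return savings_per_million / reduction


SAVINGS = 3600.0   # USD per million queries (the claim says "over $3,600")
REDUCTION = 0.84   # measured inference-cost reduction

baseline = implied_baseline(SAVINGS, REDUCTION)  # implied cost without routing
routed = baseline - SAVINGS                      # implied cost with routing

print(f"implied baseline:    ${baseline:,.2f} per million queries")
print(f"implied routed cost: ${routed:,.2f} per million queries")
```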

Switchcraft reduces inference cost by 84%.

Empirical cost analysis reported in the paper comparing inference cost with and without Switchcraft.

high positive Switchcraft: AI Model Router for Agentic Tool Calling inference cost reduction

Switchcraft's accuracy matches or exceeds the best individual model.

Empirical comparison reported in the paper between Switchcraft accuracy (82.9%) and accuracies of individual models (details summarized by authors).

high positive Switchcraft: AI Model Router for Agentic Tool Calling relative accuracy compared to individual models

Switchcraft achieves 82.9% accuracy.

Empirical evaluation results reported in the paper (accuracy metric measured on the evaluation framework).

high positive Switchcraft: AI Model Router for Agentic Tool Calling accuracy

Switchcraft operates inline, selecting the lowest-cost model subject to correctness.

Method description in the paper describing Switchcraft's operational design.

high positive Switchcraft: AI Model Router for Agentic Tool Calling model selection strategy (cost minimization constrained by correctness)
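A minimal sketch of cost minimization subject to a correctness constraint follows. It is not Switchcraft's implementation: the `Candidate` type, the `p_correct` estimate, the threshold, and the fallback rule are all hypothetical, introduced only to illustrate the selection strategy described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical model label
    cost_per_query: float  # hypothetical cost in arbitrary units
    p_correct: float       # assumed estimate that the model handles the query correctly


def route(candidates, min_p_correct=0.8):
    """Pick the cheapest candidate whose estimated correctness meets the threshold.

    If no candidate qualifies, fall back to the one most likely to be correct
    (an assumption of this sketch, not a documented behavior).
    """
    eligible = [c for c in candidates if c.p_correct >= min_p_correct]
    if eligible:
        return min(eligible, key=lambda c: c.cost_per_query)
    return max(candidates, key=lambda c: c.p_correct)


pool = [
    Candidate("small", 0.1, 0.60),
    Candidate("medium", 0.5, 0.85),
    Candidate("large", 2.0, 0.95),
]
print(route(pool).name)                      # cheapest model above the threshold
print(route(pool, min_p_correct=0.99).name)  # nothing qualifies, so fall back
```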

We present Switchcraft, the first (to the best of our knowledge) model router optimized for agentic tool calling.

Authors' stated contribution / novelty claim in the paper (method description: Switchcraft).

high positive Switchcraft: AI Model Router for Agentic Tool Calling availability of a router optimized for agentic tool calling

Hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting LLM-generated errors may reinforce existing inequities in scientific recognition.

Analysis linking hallucinated citations to characteristics of the (intended or assigned) cited authors, including measures of prominence and inferred gender, showing over-representation of prominent and male scholars among hallucinated attributions.

high positive LLM hallucinations in the wild: Large-scale evidence from no... distribution of (hallucinated) citation credit by cited-author prominence and ge...

Hallucinated references are especially pronounced among small and early-career author teams.

Analysis of hallucination prevalence by author-team characteristics (team size and author career stage) within the audited dataset.

high positive LLM hallucinations in the wild: Large-scale evidence from no... rate of hallucinated references by team size and author career stage

Hallucinated references are especially pronounced in manuscripts with linguistic signatures of AI-assisted writing.

Classification of manuscripts by linguistic features (signatures) indicative of AI-assistance and comparison of hallucination prevalence between groups.

high positive LLM hallucinations in the wild: Large-scale evidence from no... association between AI-writing linguistic signatures and presence of hallucinate...

Hallucinated references are diffusely embedded across many papers but are especially pronounced in fields with rapid AI uptake.

Cross-field comparison within the audited dataset showing higher rates of non-existent references in fields identified as having rapid AI adoption.

high positive LLM hallucinations in the wild: Large-scale evidence from no... rate/prevalence of hallucinated references by research field

We provide a conservative estimate of 146,932 hallucinated citations in 2025 alone.

Quantitative extrapolation/estimation from the audit of references in the dataset, producing an annualized (2025) conservative count.

high positive LLM hallucinations in the wild: Large-scale evidence from no... count of hallucinated citations in 2025

We find a sharp rise in non-existent references following widespread LLM adoption.

Temporal analysis of the audited references comparing prevalence of non-existent (hallucinated) citations before and after the period of widespread LLM adoption across the 111M-reference dataset.

high positive LLM hallucinations in the wild: Large-scale evidence from no... prevalence of non-existent (hallucinated) references over time

The Analysis Contract framework generalizes across domains of vibe inference through domain-specific instantiation.

Theoretical claim and conceptual generalization proposed in the paper; no cross-domain empirical tests or case studies reported.

high positive Vibe Econometrics and the Analysis Contract applicability/generalizability of the Analysis Contract across domains

The Analysis Contract, a proposed pre-commitment framework, can adapt the logic of pre-analysis plans and the Causal Roadmap to the AI-assisted setting by imposing three conditions before a causal claim is made: a method-data contract, a data audit, and a pre-commitment statement defining what would count as a disconfirming result.

Proposed methodological/framework contribution in the paper; described and motivated conceptually, without empirical validation or implementation evidence.

high positive Vibe Econometrics and the Analysis Contract governance of AI-assisted causal claims / credibility of causal claims under AI ...
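The three conditions can be pictured as a gate that must be satisfied before a causal claim is made. The sketch below is one hypothetical encoding; the class, field names, and checks are assumptions for illustration, since the paper describes the framework conceptually and reports no implementation.

```python
from dataclasses import dataclass


@dataclass
class AnalysisContract:
    method_data_contract: str   # why the chosen method fits the data at hand
    data_audit_passed: bool     # whether the data audit has been completed and passed
    disconfirming_result: str   # pre-committed statement of what would count against the claim

    def missing_conditions(self) -> list:
        """Return the names of the conditions that are not yet satisfied."""
        missing = []
        if not self.method_data_contract.strip():
            missing.append("method-data contract")
        if not self.data_audit_passed:
            missing.append("data audit")
        if not self.disconfirming_result.strip():
            missing.append("pre-commitment statement")
        return missing

    def permits_causal_claim(self) -> bool:
        """A causal claim is allowed only when all three conditions hold."""
        return not self.missing_conditions()


draft = AnalysisContract("", True, "A null effect in the placebo period.")
print(draft.permits_causal_claim(), draft.missing_conditions())
```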

The paper extends the TOE (Technology-Organization-Environment) framework by identifying an optimal AI adoption range and empirically validating the homogenization trap.

Theoretical contribution claimed in discussion linking empirical inverted-U and homogenization findings back to TOE framework.

high positive The Inverted-U Relationship Between AI and Corporate Innovat... theoretical extension of TOE framework

AI’s enabling effect on innovation is more sustainable in high-technology firms (relative to low-tech firms).

Heterogeneity analyses by firm technology intensity (high-tech vs. others) showing more sustained positive AI effects in high-tech firms.

high positive The Inverted-U Relationship Between AI and Corporate Innovat... sustainability/strength of AI’s effect on firm innovation by tech-intensity

AI’s enabling effect on innovation is more sustainable in non-state-owned firms (compared to state-owned firms).

Heterogeneity analyses by ownership type reported in the paper showing stronger/sustained positive AI–innovation effects for non-state-owned firms.

high positive The Inverted-U Relationship Between AI and Corporate Innovat... sustainability/strength of AI’s effect on firm innovation by ownership type

Firm absorptive capacity partially mediates the AI–innovation relationship.

Bootstrap mediation analysis performed on the sample indicating a partial mediation effect of absorptive capacity between AI and innovation.

high positive The Inverted-U Relationship Between AI and Corporate Innovat... role of absorptive capacity as mediator in AI → innovation pathway
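For readers unfamiliar with bootstrap mediation, the sketch below estimates an indirect effect a*b and a percentile confidence interval on synthetic data. It is a generic illustration, not the paper's specification: the data-generating process, coefficients, sample size, and function names are all assumptions, and no controls or fixed effects are included.

```python
import random
from statistics import mean


def indirect_effect(x, m, y):
    """a*b, with a the slope of M on X and b the coefficient of M in Y ~ X + M."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        estimates.append(indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    return estimates[int(n_boot * alpha / 2)], estimates[int(n_boot * (1 - alpha / 2)) - 1]


# Synthetic data: X affects Y directly (0.5) and through the mediator M (2.0 * 1.5).
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [2.0 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * xi + 1.5 * mi + rng.gauss(0, 0.5) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
lo, hi = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Mediation is called partial when the interval for the indirect effect excludes zero while the direct effect of X on Y also remains nonzero.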

The positive effect of GGFs on digital–intelligent transformation is particularly strong for firms with robust dynamic capabilities.

Heterogeneity analysis reported in the paper comparing effects across firms with differing levels of dynamic capabilities using the DID sample of Chinese A-share listed firms (2012–2024).

high positive Government-Guided Funds and Corporate Digital–Intelligent Tr... corporate digital–intelligent transformation (heterogeneous effect by dynamic ca...

The positive effect of GGFs on digital–intelligent transformation is particularly strong for firms operating in high‑tech industries.

Heterogeneity analysis reported in the paper comparing effects across industries (high‑tech vs. others) using the DID sample of Chinese A-share listed firms (2012–2024).

high positive Government-Guided Funds and Corporate Digital–Intelligent Tr... corporate digital–intelligent transformation (heterogeneous effect by industry t...

The positive effect of GGFs on digital–intelligent transformation is particularly strong in firms with high-quality internal controls.

Heterogeneity analysis reported in the paper comparing effects across firms with different internal control quality using the DID sample of Chinese A-share listed firms (2012–2024).

high positive Government-Guided Funds and Corporate Digital–Intelligent Tr... corporate digital–intelligent transformation (heterogeneous effect by internal c...

GGFs promote firms’ digital–intelligent transformation by encouraging knowledge spillovers.

Mechanism analysis reported in the paper that identifies knowledge spillovers as a channel from GGFs to firm-level digital–intelligent transformation, using the DID framework on Chinese A-share listed firms (2012–2024).

high positive Government-Guided Funds and Corporate Digital–Intelligent Tr... corporate digital–intelligent transformation (mediated by knowledge spillovers)

GGFs promote firms’ digital–intelligent transformation by transmitting policy guidance.

Mechanism analysis reported in the paper indicating a pathway from GGFs to firm transformation via policy guidance channels, based on the DID sample of Chinese A-share listed firms (2012–2024).

high positive Government-Guided Funds and Corporate Digital–Intelligent Tr... corporate digital–intelligent transformation (mediated by policy guidance transm...

GGFs promote firms’ digital–intelligent transformation by easing firms' financing constraints.

Mechanism analysis reported in the paper (mediation / pathway analysis tied to the DID framework) using the same sample of Chinese A-share listed firms (2012–2024).

high positive Government-Guided Funds and Corporate Digital–Intelligent Tr... corporate digital–intelligent transformation (mediated by financing constraints)

Government-guided funds (GGFs) significantly promote firms’ digital–intelligent transformation.

Difference-in-differences (DID) analysis applied to Chinese A-share listed firms over 2012–2024, as reported in the paper's main empirical results.

high positive Government-Guided Funds and Corporate Digital–Intelligent Tr... corporate digital–intelligent transformation
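The logic of a difference-in-differences estimate can be shown with the canonical two-group, two-period case. This is only a sketch: the paper's actual specification (controls, fixed effects, treatment timing) is not described here, and the numbers below are toy values invented for illustration, not data from the study.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Toy transformation scores (hypothetical units), before and after GGF investment.
treated_pre, treated_post = [1.0, 1.2], [2.0, 2.2]
control_pre, control_post = [1.0, 1.1], [1.4, 1.5]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"DID estimate: {effect:.2f}")
```

The estimate is interpretable as a causal effect only under the parallel-trends assumption, that is, if the treated firms would have followed the control firms' trend in the absence of treatment.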

A lightweight pre-generation router exceeds the best cascade policy on four of five datasets, mainly because it avoids the cheap model's generation cost on queries sent directly to a larger model rather than because of a stronger routing signal.

Empirical experiments across the five benchmarks showing the pre-generation router outperforms best cascade on 4/5 datasets; analysis attributing the advantage primarily to avoided generation cost rather than improved routing accuracy/signal.

high positive Is Escalation Worth It? A Decision-Theoretic Characterizatio... number of datasets where pre-generation router outperforms best cascade; driver ...
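The cost argument can be made concrete with a simple expected-cost comparison. The sketch assumes a two-model setup in which a cascade always runs the cheap model first, while a pre-generation router sends each query to exactly one model; the cost values, the escalation fraction, and the function names are hypothetical.

```python
def cascade_cost(c_small, c_large, escalate_frac):
    """Every query pays for the cheap model; escalated queries also pay for the large one."""
    return c_small + escalate_frac * c_large


def router_cost(c_small, c_large, escalate_frac, c_router=0.0):
    """Each query pays for one model only, plus an optional routing overhead."""
    return c_router + (1.0 - escalate_frac) * c_small + escalate_frac * c_large


c_small, c_large, frac = 1.0, 10.0, 0.3  # illustrative per-query costs and escalation rate
saved = cascade_cost(c_small, c_large, frac) - router_cost(c_small, c_large, frac)
print(f"cascade: {cascade_cost(c_small, c_large, frac):.2f}")
print(f"router:  {router_cost(c_small, c_large, frac):.2f}")
print(f"saved:   {saved:.2f}  (= escalate_frac * c_small when routing is free)")
```

With an identical routing signal (the same queries reach the large model), the router's advantage is exactly the cheap model's generation cost on the escalated queries, minus any routing overhead.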

Broader equity markets, proxied by the S&P 500, remain the dominant source of spillovers throughout the sample period.

Directional spillover results from the TVP-VAR indicating the S&P 500 has the largest and persistent net outward spillover contributions over the full sample.

high positive Artificial Intelligence and Financial Market Connectedness: ... dominance in net spillover contributions

AI-related equities initially act as net transmitters of shocks.

Directional spillover measures from the TVP-VAR showing AI equity group had positive net directional connectedness early in the sample.

high positive Artificial Intelligence and Financial Market Connectedness: ... net directional spillovers (net transmitter status)

The theoretical superiority of SignSGD accurately predicts its faster convergence during the pretraining of a 124M parameter GPT-2 model.

Empirical experiment reported in the paper: pretraining runs of a 124M-parameter GPT-2 model comparing SignSGD (or Muon) vs baseline SGD/variants; details (number of runs, datasets, seeds) are not provided in the abstract.

high positive When and Why SignSGD Outperforms SGD: A Theoretical Study Ba... empirical optimization/convergence speed during GPT-2 pretraining

Extending the sign operator to matrices preserves the optimal scaling with dimensionality: we provide an equivalent optimal lower bound for the Muon optimizer in the matrix domain.

Theoretical extension of the analysis to matrix-valued problems and derivation of a matching optimal lower bound for the Muon optimizer, demonstrating preserved scaling.

high positive When and Why SignSGD Outperforms SGD: A Theoretical Study Ba... optimal lower bound / scaling with dimensionality for matrix sign-based optimiza...

SignSGD effectively reduces the complexity by a factor of d under sparse noise, where d is the problem dimension (comparison of SignSGD upper bound with SGD lower bound shows a factor-d improvement).

Theoretical comparison between the derived upper bound for SignSGD and the derived lower bound for SGD within the paper, under the separable/sparse noise model and specified smoothness assumptions.

high positive When and Why SignSGD Outperforms SGD: A Theoretical Study Ba... optimization complexity (iterations/queries) to reach ℓ1-stationarity under spar...
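As a reminder of what distinguishes the two optimizers, the sketch below contrasts a plain SGD step with a SignSGD step on a list of parameters. It shows the textbook update rules only; step sizes, momentum, and the noise model analyzed in the paper are not represented, and the function names are illustrative.

```python
def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sgd_step(params, grads, lr):
    """SGD: move each coordinate in proportion to its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def signsgd_step(params, grads, lr):
    """SignSGD: move each coordinate by a fixed amount lr in the direction -sign(g)."""
    return [p - lr * sign(g) for p, g in zip(params, grads)]


params = [1.0, -2.0, 0.5]
grads = [0.3, -4.0, 0.0]  # one small, one large, one zero gradient coordinate
print(sgd_step(params, grads, lr=0.1))
print(signsgd_step(params, grads, lr=0.1))
```

Because the signed update discards gradient magnitudes, every nonzero coordinate moves by the same amount, which is the coordinate-wise behavior that the ℓ1/ℓ∞ geometry is meant to capture.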

Under this distinct problem geometry (ℓ1-stationarity, ℓ∞-smoothness, separable noise), we derive matched upper and lower bounds for SignSGD and explicitly characterize the problem class in which SignSGD provably dominates SGD.

Theoretical derivation of both upper bounds (for SignSGD) and matching lower bounds (for the problem class) presented in the paper; proofs establishing tightness.

high positive When and Why SignSGD Outperforms SGD: A Theoretical Study Ba... convergence bounds (upper and lower) for SignSGD under specified assumptions

By analyzing sign-based optimizers under ℓ1-norm stationarity, ℓ∞-smoothness, and a separable noise model, we can better capture the coordinate-wise nature of signed updates and overcome the barrier that prevents sign-based methods from outperforming SGD in standard settings.

Theoretical analysis in the paper introducing these alternative geometric/assumption settings (ℓ1-stationarity, ℓ∞-smoothness, separable noise) and deriving results under these assumptions.

high positive When and Why SignSGD Outperforms SGD: A Theoretical Study Ba... applicability of sign-based optimizer analysis and potential for improved conver...

The results imply that early intervention is urgent in AI-driven economies if extreme inequality and the loss of redistribution options are to be avoided.

Synthesis and policy discussion in the paper based on the finite-time singularity, super-exponential divergence of wealth ratios, and the policy-irreversibility result.

high positive The Economic Singularity: Core Mathematical Model policy urgency / timing of intervention

Under mild conditions, the system exhibits a finite-time singularity where AI capability, AI capital, and financial capital diverge.

Analytical dynamical-systems analysis and proofs in the paper demonstrating finite-time blow-up (singularity) of A (AI capability), K_a (AI capital), and K_f (financial capital) for parameter ranges satisfying the stated mild conditions.

high positive The Economic Singularity: Core Mathematical Model innovation output (AI capability) and financial capital levels
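A finite-time singularity is easiest to see in a one-variable textbook example. The equation below is not the paper's system for A, K_a, and K_f; it is a standard illustration, with x, a, and x_0 chosen here as placeholder symbols, of how superlinear self-reinforcing growth diverges at a finite time rather than merely growing without bound.

```latex
\[
\frac{dx}{dt} = a\,x^{2}, \qquad x(0) = x_0 > 0,\; a > 0
\quad\Longrightarrow\quad
x(t) = \frac{x_0}{1 - a\,x_0\,t},
\]
\[
x(t) \to \infty \quad \text{as} \quad t \to t^{*} = \frac{1}{a\,x_0}.
\]
```

By contrast, linear feedback (dx/dt = a x) yields exponential growth, which stays finite at every finite time.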

Users maintain a moderate level of trust in AI even when their decisions diverge from those of AI.

Reported descriptive/analytic finding from the experiment with 59 pre-service teachers indicating measured trust remained at a moderate level in inconsistent decision conditions.

high positive Shaping Human-AI Collaboration in Education: Effects of AI-A... trust in AI under decision divergence

The proportion of consistent decisions significantly moderates the impact of AI-assisted decision-making paradigms on users' confidence levels.

Moderation analysis reported in the study (N=59); authors indicate that proportion of consistent human-AI decisions significantly moderates the effect of AI-assisted decision-making paradigm on confidence.

high positive Shaping Human-AI Collaboration in Education: Effects of AI-A... users' confidence (moderation effect)

Consistency between human and AI decisions significantly enhances task performance.

Within-subject consistency manipulation in the experimental sample of 59 pre-service teachers; authors report significant positive association between proportion of consistent decisions and measured task performance.

high positive Shaping Human-AI Collaboration in Education: Effects of AI-A... task performance

Consistency between human and AI decisions significantly enhances users' confidence.

Within-subject manipulation of human-AI consistency in the study (N=59); in their models, the authors report a significant positive effect of consistency on users' confidence.

high positive Shaping Human-AI Collaboration in Education: Effects of AI-A... users' confidence

Consistency between human and AI decisions significantly enhances users' trust in AI.

Within-subject manipulation of human-AI consistency in the experiment with 59 pre-service teachers; the authors report a significant positive effect of consistency on trust, as measured and tested in their models.

high positive Shaping Human-AI Collaboration in Education: Effects of AI-A... trust in AI

When human-AI decision consistency is taken into account, AI-assisted decision-making paradigms influence task performance indirectly through a sequential psychological pathway involving users’ confidence and their trust in the AI.

Structural equation modeling on the same experimental sample (N=59) identified a significant indirect (mediated) pathway from AI-assisted paradigms → users' confidence → trust in AI → task performance; moderation by human-AI consistency was taken into account.

high positive Shaping Human-AI Collaboration in Education: Effects of AI-A... task performance (mediated effect)

Post-hoc SHAP attribution reveals that complaint recurrence and neighborhood-level statistics are stronger predictors of actionable violations than raw complaint volume.

Empirical claim based on post-hoc SHAP feature-attribution analysis applied to the paper's models; the excerpt reports a relative feature importance finding but provides no numeric effect sizes or sample counts.

high positive Scaling the Queue: Reinforcement Learning for Equitable Call... predictive importance for actionable violations (feature importance)

We formalize each domain as a Markov Decision Process (MDP) in which equitable classification coverage is a first-class reward objective.

Methodological specification in the paper asserting each operational domain was modeled as an MDP with equity-aware reward structure. No further empirical details in the excerpt.

high positive Scaling the Queue: Reinforcement Learning for Equitable Call... equitable classification coverage (as a modeled reward)

The proposed technique is designed to maximize throughput, minimize misclassification cost, and actively narrow historical equity gaps in service delivery.

Stated design objectives of the RL approach in the paper. No quantified outcomes or evaluation reported in the provided text.

high positive Scaling the Queue: Reinforcement Learning for Equitable Call... throughput; misclassification cost; historical equity gaps in service delivery

Rather than replacing human classifiers, our agents act as intelligent intake routers that learn to assign incoming complaints to action categories: escalate, batch, defer, inspect now.

Descriptive claim of agent behavior and intended design; asserts agents perform routing decisions into four action categories. No empirical performance numbers provided in the excerpt.

high positive Scaling the Queue: Reinforcement Learning for Equitable Call... complaint routing action assignment
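The four action categories and an equity-aware reward can be sketched as follows. Only the action names come from the claim above; the reward form, the weights, and the definition of the equity gap as the spread in coverage across groups are assumptions made for illustration, since the excerpt gives no formula.

```python
from enum import Enum


class Action(Enum):
    ESCALATE = "escalate"
    BATCH = "batch"
    DEFER = "defer"
    INSPECT_NOW = "inspect now"


def reward(resolved, misclassification_cost, coverage_by_group,
           cost_weight=1.0, equity_weight=1.0):
    """Hypothetical reward: throughput minus weighted cost and equity-gap penalties.

    coverage_by_group maps a group label (for example, a neighborhood) to the
    share of its complaints that were classified; the gap is max minus min.
    """
    equity_gap = max(coverage_by_group.values()) - min(coverage_by_group.values())
    return resolved - cost_weight * misclassification_cost - equity_weight * equity_gap


r = reward(resolved=5, misclassification_cost=1.0,
           coverage_by_group={"district_a": 0.9, "district_b": 0.6})
print([a.value for a in Action])
print(f"reward = {r:.2f}")
```

Placing the equity term inside the reward, rather than checking it after training, is what makes coverage a first-class objective in this kind of formulation.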

We develop an equity-centered reinforcement learning (RL) framework that augments call classification capacity across six New York City Department of Buildings operational domains (boiler safety, crane and derrick oversight, heat and hot water, housing complaint triage, scaffold safety, and Natural Area District protection).

Methodological development described in the paper; claimed application domain spans six named DOB operational areas. No evaluation metrics or sample sizes provided in the excerpt.

high positive Scaling the Queue: Reinforcement Learning for Equitable Call... call classification capacity / intake routing capability

U.S. lawmakers and agencies have advanced standards, testing, and procurement oversight related to AI as the AGI race tightens.

Reported in the paper as a synthesis of recent policy and agency activity (standards, testing programs, procurement oversight); descriptive summary rather than a quantified empirical analysis (no sample size reported).

high positive Emerging AI Trends advancement of AI-related standards, testing initiatives, and procurement oversi...

So far in 2026, agentic coding automation has advanced, with tools that enable end-to-end planning, coding, and debugging.

Asserted in the paper as an observed trend through 2026, based on examples of tooling and product announcements; presented descriptively without a stated empirical sample or controlled evaluation.

high positive Emerging AI Trends capability of agentic coding automation tools to perform end-to-end planning, co...

Milestones in 2025 also include early regulatory actions.

Reported in the paper's synthesis of 2025 events; based on review of policy developments and announcements rather than a quantitative evaluation (no sample size reported).

high positive Emerging AI Trends early regulatory actions (new rules, guidance, or enforcement steps in 2025)

Milestones in 2025 highlight the broad adoption of multimodal and agentic AI.

Stated in the paper as part of a narrative synthesis of 2025 milestones; presented as an observational summary drawing on literature, industry reports and documented deployments rather than a systematic empirical study (no sample size or statistical analysis reported).

high positive Emerging AI Trends adoption of multimodal and agentic AI
