The Commonplace

Evidence (7395 claims)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5921 claims
Human-AI Collaboration: 5192 claims
Org Design: 3497 claims
Innovation: 3492 claims
Labor Markets: 3231 claims
Skills & Training: 2608 claims
Inequality: 1842 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 738 1617
Governance & Regulation 671 334 160 99 1285
Organizational Efficiency 626 147 105 70 955
Technology Adoption Rate 502 176 98 78 861
Research Productivity 349 109 48 322 838
Output Quality 391 121 45 40 597
Firm Productivity 385 46 85 17 539
Decision Quality 277 145 63 34 526
AI Safety & Ethics 189 244 59 30 526
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 106 40 6 188
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 79 8 1 152
Regulatory Compliance 69 66 14 3 152
Training Effectiveness 82 16 13 18 131
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
Active filter: Adoption
NLP techniques improve requirements management and team collaboration by extracting intent from natural-language artifacts (tickets, specs, PRs) and reducing miscommunication.
Synthesis of prior studies in the literature review, together with survey responses indicating perceived improvement in requirements handling and communication; survey sample size not reported.
medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... perceived reduction in miscommunication / improved clarity of requirements
RAT data could be valuable for training models that better emulate human interpretive processes; firms owning such data may gain a competitive advantage.
Argument in the AI economics section; no empirical model-training experiments or market analyses provided.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity value of RAT data as training signal; competitive advantage for data-owning firm...
RATs make readable and potentially quantifiable the preparatory interpretive work that contributes to downstream outputs, with implications for labor accounting and human capital valuation.
Theoretical economic and policy discussion in the paper; no empirical measurement or case studies provided to quantify how much preparatory work is captured or its economic value.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity visibility/quantifiability of interpretive labor and potential economic valuatio...
RATs can enable collective sensemaking via shared trails and networked associations among readers.
Conceptual argument and suggested network-analysis methods; illustrated with the speculative WikiRAT use case. No group-level empirical studies reported.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity collective sensemaking artifacts (shared trails, co-read graphs, group understan...
RATs can support richer reader models (personalization and modeling of interpretive behavior) through sequence analysis, embedding/clustering of trajectories, and other analytic techniques.
Proposed analytical methods (sequence analysis, embedding/clustering, network analysis) listed in the paper; no implementation results or quantitative evaluations provided.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity reader model quality (personalization accuracy, representation of interpretive b...
RATs enable reflective practice by helping readers see and revise their own processes.
Proposed affordance in the paper based on the inspectable nature of RATs and the WikiRAT illustration; suggested as a potential use case rather than empirically demonstrated.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity changes in reflective behavior or self-revision of reading processes
RATs treat reading as a dual kind of creation: (a) creative input work that shapes future artifacts, and (b) a form of creation whose traces are valuable artifacts themselves.
Theoretical proposal and design rationale presented in the paper; illustrated via a speculative prototype (WikiRAT). No empirical validation provided.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity recognition/value assigned to reading traces as artifacts
Reading Activity Traces (RATs) reconceptualize reading — including navigation, interpretation, and curation across interconnected sources — as creative labor.
Conceptual argument in the paper; supported by theoretical framing and literature review rather than empirical data. No sample size or deployment reported.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity conceptual reclassification of reading (visibility/recognition of interpretive l...
The method lowers the technical barrier for adopting surrogates in economics by removing dependence on specialized Bayesian neural-network techniques while preserving rigorous uncertainty quantification.
Argument in Implications section: decoupling uncertainty quantification from network architecture allows use of deterministic NNs with MCMC-sampled parameter inputs; no user-study or adoption metrics provided.
medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... ease of adoption / reduction in technical complexity required to obtain UQ-prese...
The theoretical diagnostic (linking distribution mismatch to performance loss) gives practitioners a practical tool to detect when a surrogate trained on one parameter distribution will underperform after recalibration or policy changes.
Paper-provided theoretical result and suggested diagnostic use; empirical validation of the diagnostic is implied but not detailed in the summary.
medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... diagnostic effectiveness in detecting performance degradation under distribution...
This approach substantially reduces computation (training and/or evaluation wall-clock time) relative to methods that sample network weights (Bayesian NNs) or exhaustively explore parameter grids.
Computational evaluation reported in the paper includes empirical examples demonstrating substantial reductions in wall-clock training/evaluation time relative to weight-sampling or exhaustive-parameter-grid baselines (exact datasets, runtimes, and sample sizes not detailed in the summary).
medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... wall-clock training time and evaluation time
Training a deterministic neural surrogate conditioned on MCMC-drawn parameter samples reproduces the original (forward) model's uncertainty quantification while avoiding embedding parametric uncertainty inside the network weights.
Methodological description: surrogate is a deterministic NN whose inputs include parameter vectors drawn by MCMC from the model-parameter posterior; uncertainty is recovered by repeatedly evaluating the trained surrogate on those MCMC draws. Empirical examples are reported (details not provided here) showing reproduction of model uncertainty.
medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... fidelity of uncertainty quantification / posterior predictive distributions prod...
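The mechanism this claim describes is simple enough to sketch end to end. Below is a minimal toy version, assuming a one-dimensional forward model and a pre-computed set of MCMC posterior draws (toy_model and posterior_draws are illustrative stand-ins, not from the paper): a deterministic network takes (input, parameter) pairs, and uncertainty is recovered by re-evaluating it over the posterior draws.

```python
# Minimal sketch: deterministic surrogate conditioned on MCMC parameter draws.
import torch
import torch.nn as nn

def toy_model(x, theta):
    # stand-in for the "expensive" forward model
    return torch.sin(theta[..., :1] * x) + theta[..., 1:2] * x

# pretend these came from an MCMC run over the model parameters
posterior_draws = 0.5 + 0.1 * torch.randn(2000, 2)

# training data: random inputs paired with posterior parameter draws
x = torch.rand(2000, 1) * 4.0
theta = posterior_draws
y = toy_model(x, theta)

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(torch.cat([x, theta], dim=1)), y)
    loss.backward()
    opt.step()

# UQ: evaluate the deterministic surrogate on the posterior draws at a new
# input; the spread of outputs reproduces the forward model's parametric
# uncertainty without ever sampling network weights
x_new = torch.full((posterior_draws.shape[0], 1), 2.0)
with torch.no_grad():
    preds = net(torch.cat([x_new, posterior_draws], dim=1))
print(preds.mean().item(), preds.std().item())
```

The key design choice is that parametric uncertainty lives in the inputs rather than the weights, so any off-the-shelf deterministic architecture can be used.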
The proposed pipeline (CFD → CFM → CFR) forms a closed loop that can assess and improve color fidelity in T2I systems.
Paper describes end-to-end workflow: CFD provides training/validation labels for CFM; CFM produces scores and attention maps for evaluation and localization; CFR consumes CFM attention during generation to refine images. The repository contains code implementing the pipeline.
medium positive Too Vivid to Be Real? Benchmarking and Calibrating Generativ... end-to-end improvement in measured color fidelity when applying CFD-trained CFM ...
Color Fidelity Refinement (CFR) is a training-free inference-time procedure that uses CFM attention maps to adaptively modulate spatial-temporal guidance scales during generation, thereby improving color authenticity of realistic-style T2I outputs without retraining the base model.
Method description in paper: CFR uses CFM's learned attention to identify low-fidelity regions and adapt guidance strength across space and denoising steps (spatial-temporal guidance). The authors evaluate CFR on existing T2I models and report improved perceived color authenticity; no retraining of base T2I models is required (implementation and code available in the repository).
medium positive Too Vivid to Be Real? Benchmarking and Calibrating Generativ... perceived color authenticity of generated images; requirement (or not) to retrai...
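The guidance-modulation idea can be illustrated with a hedged sketch; this is not the authors' exact CFR schedule, just one plausible instantiation in which flagged regions (per an assumed CFM attention map) receive a damped guidance scale that varies over space and denoising step:

```python
# Hedged sketch of attention-modulated classifier-free guidance.
import torch

def modulated_guidance(eps_uncond, eps_cond, attn_map, step, num_steps,
                       base_scale=7.5, min_scale=3.0):
    """eps_*: (B, C, H, W) noise predictions; attn_map: (B, 1, H, W) in [0, 1]
    with 1 = likely color-fidelity error (assumed CFM output)."""
    # temporal schedule: intervene more at later (low-noise) steps, where
    # color is largely determined -- an assumption, not the paper's schedule
    t_weight = step / max(num_steps - 1, 1)
    # spatial modulation: pull guidance toward min_scale in flagged regions
    scale = base_scale - (base_scale - min_scale) * attn_map * t_weight
    return eps_uncond + scale * (eps_cond - eps_uncond)

# toy usage inside a denoising loop
B, C, H, W = 1, 4, 64, 64
attn = torch.rand(B, 1, H, W)
for step in range(50):
    eps_u, eps_c = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
    eps = modulated_guidance(eps_u, eps_c, attn, step, 50)
```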
CFM aligns better with objective color realism judgments than existing preference-trained metrics and human ratings that favor vividness.
Empirical comparisons reported in the paper: CFM scoring shows improved alignment with CFD-based color-realism labels and with evaluation criteria that prioritize photographic fidelity, outperforming preference-trained metrics and the biased patterns in human ratings (paper reports both qualitative and quantitative gains; specific numerical improvements and test set sizes are provided in the paper/repo).
medium positive Too Vivid to Be Real? Benchmarking and Calibrating Generativ... alignment with color-realism judgments / correlation with CFD ground truth
The Color Fidelity Metric (CFM) is a multimodal encoder–based metric trained on CFD to predict human-consistent judgments of color fidelity and to produce spatial attention maps that localize color-fidelity errors.
Model architecture and training procedure described: a multimodal encoder trained using CFD's ordered realism labels to output scalar fidelity scores and spatial attention maps indicating where color fidelity issues occur. Training supervision comes from CFD's ordered labels (paper includes training/validation procedures; exact training dataset splits are in the paper/repo).
medium positive Too Vivid to Be Real? Benchmarking and Calibrating Generativ... color-fidelity scalar scores and spatial attention maps (localization of color e...
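A minimal sketch of training a scalar scorer from ordered labels, assuming a pairwise margin-ranking objective (a common choice for ordered supervision; the paper's exact loss, multimodal encoder, and attention-map head are not reproduced here):

```python
# Toy pairwise-ranking trainer over precomputed image embeddings.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(scorer.parameters(), lr=1e-4)
loss_fn = nn.MarginRankingLoss(margin=0.5)

# pretend embeddings of image pairs where the first is ranked more realistic
emb_hi = torch.randn(32, 512)   # higher-ranked (more color-faithful) images
emb_lo = torch.randn(32, 512)   # lower-ranked images
target = torch.ones(32)         # "first argument should score higher"

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(scorer(emb_hi).squeeze(-1),
                   scorer(emb_lo).squeeze(-1), target)
    loss.backward()
    opt.step()
```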
Varying sample size, injecting contaminated data, and including algorithm-reconstruction tasks during training allow networks to automatically inherit those properties (e.g., multi-n behavior, robustness, algorithmic outputs).
Empirical: training regimes described include varying dataset size n, contaminated simulations, and algorithm-reconstruction tasks; experiments reportedly show networks trained with these variations exhibit corresponding behaviors at test time. Specific experimental details (ranges of n, contamination levels) are not included in the summary.
medium positive ForwardFlow: Simulation only statistical inference using dee... presence of targeted properties in trained networks (finite-sample behavior acro...
Collapsing (aggregation) layers mimic reduction to sufficient statistics and enforce the desirable structure for set-valued (permutation-invariant) inputs.
Theoretical/design claim supported by architectural description and motivation: collapsing layers aggregate across observations to produce summaries, enforcing permutation invariance; supported indirectly by empirical success in simulations. This is primarily an architectural/representational argument rather than a purely empirical result.
medium positive ForwardFlow: Simulation only statistical inference using dee... permutation-invariance and quality of summary representations (qualitative/archi...
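The architectural idea is concrete enough for a short sketch in the Deep Sets style: a per-observation encoder followed by mean-pooling over the sample axis, which makes the summary invariant to reordering the observations (layer sizes here are illustrative, not the paper's):

```python
# Sketch of a "collapsing" (aggregation) layer enforcing permutation invariance.
import torch
import torch.nn as nn

class CollapsingSummary(nn.Module):
    def __init__(self, obs_dim=1, summary_dim=8):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, summary_dim))

    def forward(self, x):                 # x: (batch, n, obs_dim)
        return self.phi(x).mean(dim=1)    # collapse over the n observations

net = CollapsingSummary()
x = torch.randn(4, 100, 1)
perm = x[:, torch.randperm(100), :]
print(torch.allclose(net(x), net(perm), atol=1e-5))  # True: order-invariant
```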
The network can learn to approximate the outputs of iterative estimation algorithms (demonstrated by learning an EM algorithm for a genetic-data estimation task).
Empirical: a genetic-data example where the network was trained (including an algorithm-reconstruction task) to approximate the EM algorithm outputs; evaluation shows qualitative/quantitative match to the iterative algorithm. Evidence is from reported experiments comparing network outputs to EM outputs (e.g., MSE between them).
medium positive ForwardFlow: Simulation only statistical inference using dee... similarity between network outputs and iterative algorithm outputs (e.g., MSE or...
Training the network with contaminated simulations yields estimators that are robust to contaminated observations at test time.
Empirical: experiments included injecting contaminated data into training simulations; evaluation measured robustness at test time under contamination and showed improved performance relative to networks not trained on contamination. Supported by reported robustness comparisons (metrics like MSE under contamination). Specific contamination rates and sample sizes are not provided in the summary.
medium positive ForwardFlow: Simulation only statistical inference using dee... robustness to contamination (estimation error / MSE under contaminated test data...
A branched neural architecture with collapsing (aggregation) layers that reduce a dataset into permutation-invariant summaries can produce parameter estimates that are exactly finite-sample (i.e., reproduce estimator outputs at finite sample sizes).
Empirical & theoretical motivation: architecture includes collapsing/aggregation layers to implement permutation-invariance and summary reduction; simulation experiments reportedly show the network reproduces reference estimator outputs at finite sample sizes (finite-sample matching). The exact experimental settings (sample sizes, number of replications) are not specified in the summary; evidence comes from simulated benchmarks and comparisons to reference estimators.
medium positive ForwardFlow: Simulation only statistical inference using dee... match to reference estimator outputs at finite sample sizes (exact equality or n...
A single “summary network” trained in a simulation-only framework can solve the inverse problem of parameter estimation for parametric models by mapping simulated datasets to parameters (minimizing MSE).
Empirical: network trained on simulated datasets (each dataset simulated conditional on a known parameter) with a mean-squared-error (MSE) loss between predicted and true parameter; evaluated on synthetic parametric benchmark problems and a genetic-data example. Specific sample sizes and number of simulations are not stated in the provided summary; evidence is based on the reported simulation experiments and benchmark comparisons.
medium positive ForwardFlow: Simulation only statistical inference using dee... parameter estimation accuracy (MSE between predicted parameter and true paramete...
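Putting the pieces together, here is a self-contained toy version of the simulation-only training loop, assuming a Gaussian location-scale model as the parametric family (the paper's models and branched architecture are not reproduced; this shows the simulate-collapse-regress pattern under an MSE loss):

```python
# Toy simulation-only estimator: dataset -> permutation-invariant summary -> parameters.
import torch
import torch.nn as nn

phi = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 16))
head = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(list(phi.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(3000):
    mu = torch.rand(64, 1) * 4 - 2                 # parameter prior
    sigma = torch.rand(64, 1) * 1.9 + 0.1
    # simulate 64 datasets of n=100 observations conditional on (mu, sigma)
    data = mu[:, None] + sigma[:, None] * torch.randn(64, 100, 1)
    summary = phi(data).mean(dim=1)                # collapsing layer
    est = head(summary)                            # predicted (mu, sigma)
    loss = nn.functional.mse_loss(est, torch.cat([mu, sigma], dim=1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```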
Practical modalities exist for efficient classical estimation of gradients for the covered loss classes: analytic gradients or unbiased estimators computed via the classical-approximation machinery, finite-difference approaches, and surrogate methods; the paper discusses sample complexity and noise considerations.
Methodological discussion in the paper outlining specific gradient estimation approaches compatible with the classical-approximation results, together with complexity/sample-complexity remarks. This is a methods/algorithmic claim supported by analysis rather than empirical benchmarks.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... efficiency/sample-complexity of gradient estimation procedures
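Of the modalities listed, the finite-difference approach is generic enough to sketch; loss_fn below is an arbitrary placeholder for any classically evaluable loss, not the paper's BSBM objective:

```python
# Central finite-difference gradient estimator for a black-box loss.
import numpy as np

def finite_diff_grad(loss_fn, theta, eps=1e-4):
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (loss_fn(theta + e) - loss_fn(theta - e)) / (2 * eps)
    return grad

theta = np.array([0.3, -1.2])
print(finite_diff_grad(lambda t: np.sum(t ** 2), theta))  # ~ [0.6, -2.4]
```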
The paper constructs a single-hyperparameter family of BSBMs that monotonically interpolates from weak expressive power up to full universality, enabling a controlled trade-off between simplicity and expressivity.
Explicit one-parameter family construction and monotonicity argument/proof in the paper showing that increasing the hyperparameter increases expressivity and approaches universality. This is a theoretical construction rather than empirical measurement.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... expressive power (as a monotone function of a single hyperparameter)
Classical hardness of exact or approximate sampling from the expanded (ancilla + postprocessing) BSBM family is preserved by relating these models to known hard linear-optical sampling tasks.
Complexity-theoretic reductions and arguments in the paper connecting the expanded BSBM constructions to established hard sampling problems in linear optics (e.g., boson sampling variants). The claim is supported by theoretical reductions rather than empirical hardness measurements.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... classical hardness of sampling (exact/approximate) from the expanded BSBM family
Universality (and therefore potential sampling hardness) can be recovered by expanding the model: adding ancillary modes and applying a constant-function postprocessing generalization restores universality while retaining efficient classical trainability.
Construction and theoretical argument in the paper: introduces ancilla modes and a constant-function postprocessing generalization (analogous to IQP-QCBM techniques), shows how these modifications increase representational power to universality, and demonstrates that the same classical-approximation machinery still allows efficient evaluation/approximation of training losses. The argument includes constructive proofs and reductions.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... generative universality and classical trainability after model expansion
Training can be done classically even when sampling from the trained BSBM is believed to be classically hard (the 'train classically, deploy quantumly' paradigm applies to BSBMs).
Argument combining two parts in the paper: (1) classical-evaluation results for losses/gradients (see above) and (2) separate hardness-of-sampling arguments showing sampling remains classically hard after training. This is a theoretical claim based on the constructions and reductions presented in the paper.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... feasibility of classical training vs. classical hardness of sampling at deployme...
Greater ROI may come from investing in better feedback models (how to use feedback) than from merely collecting richer feedback sources.
Empirical finding that feedback model choice often produced larger retrieval-quality improvements than changing the feedback source across the evaluated tasks and methods.
medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Return on investment (performance improvement per resource invested in model vs....
The study's results clarify which elements of the PRF design space are most important to prioritize in practice (i.e., prioritize feedback-model improvements over source collection in many low-resource settings).
Comparative performance gains observed in controlled experiments showing larger effect sizes from varying feedback model than from varying source, combined with cost analyses.
medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Relative impact on retrieval performance and cost-effectiveness
Across 13 low-resource BEIR tasks and five LLM PRF methods, the choice of feedback model (how feedback is applied) critically affects retrieval effectiveness.
Empirical results reported over 13 BEIR tasks using five LLM-based PRF methods, with systematic variation of feedback model.
medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Retrieval effectiveness (standard BEIR metrics)
Purely LLM-generated feedback yields the best cost-effectiveness overall (best performance per unit LLM invocation cost) for low-resource retrieval tasks.
Cost-effectiveness analysis in experiments across 13 BEIR tasks and five PRF methods that accounted for LLM invocation cost versus retrieval gains.
medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Cost-effectiveness (retrieval gains per LLM invocation cost)
Feedback model choice can have a larger impact on retrieval quality than feedback source.
Controlled experiments comparing five LLM-based PRF methods across 13 low-resource BEIR tasks, measuring retrieval effectiveness with standard BEIR metrics.
medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Retrieval effectiveness (standard BEIR retrieval metrics)
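The "purely LLM-generated feedback" cell of this design space can be sketched in a few lines, assuming BM25 retrieval via the rank_bm25 package and a placeholder generate() call standing in for any LLM provider (this illustrates the source/model distinction, not a specific method from the paper):

```python
# Toy purely-LLM pseudo-relevance feedback: expand the query with a
# generated hypothetical passage, then retrieve with the expanded query.
from rank_bm25 import BM25Okapi

def generate(prompt: str) -> str:
    # placeholder LLM call -- swap in your provider's client here
    return "hypothetical passage answering the query ..."

corpus = ["doc one text ...", "doc two text ..."]
bm25 = BM25Okapi([d.split() for d in corpus])

query = "effects of ai on productivity"
feedback = generate(f"Write a short passage that answers: {query}")
expanded = (query + " " + feedback).split()
scores = bm25.get_scores(expanded)   # rerank/retrieve with expanded query
```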
Demand will grow for hybrid specialists (quantum algorithm engineers, HPC systems integrators, middleware developers) and for domain scientists fluent in hybrid workflows, shifting skill premiums toward interdisciplinary expertise.
Labor-market inference from technology adoption and the skills required by proposed QCSC systems; qualitative only, no labor-market survey data provided.
medium positive Reference Architecture of a Quantum-Centric Supercomputer demand for specific skills, wage premiums for interdisciplinary expertise
Public investment and shared facilities can mitigate entry barriers and diffuse benefits to smaller firms and research groups.
Policy analysis and precedent from shared scientific infrastructure models; no case-study data specific to QCSC presented.
medium positive Reference Architecture of a Quantum-Centric Supercomputer access to QCSC resources by small firms/research groups, reduction in entry barr...
Tightly integrating QPUs, GPUs, and CPUs across hardware, middleware, and application layers (QCSC vision) will enable high-throughput, low-latency hybrid workflows.
Architectural design reasoning and analogies to heterogeneous co-design in classical HPC; no empirical throughput/latency measurements provided.
medium positive Reference Architecture of a Quantum-Centric Supercomputer throughput and end-to-end latency of hybrid quantum-classical workflows
A phased roadmap (offload engines → middleware-coupled heterogeneous systems → fully co-designed heterogeneous systems) and a reference architecture can remove current friction (manual orchestration, scheduling, data transfer) and materially accelerate algorithmic discovery and applied quantum utility.
Roadmap and reference architecture proposed from system decomposition and use-case requirements analysis; argument based on observed friction points from literature and early hybrid deployments; no empirical validation provided.
medium positive Reference Architecture of a Quantum-Centric Supercomputer reduction in manual orchestration, scheduling overhead, data-movement latency; i...
Quantum-Centric Supercomputing (QCSC) — integrated systems co-designing QPUs with classical HPC components and middleware — is necessary to scale hybrid quantum-classical algorithms for chemistry, materials, and other applied research.
Conceptual systems-architecture analysis and synthesis of recent quantum-simulation demonstrations and hybrid algorithms; use-case-driven analysis for chemistry and materials; no new empirical performance benchmarks presented.
medium positive Reference Architecture of a Quantum-Centric Supercomputer scalability and practicability of hybrid quantum-classical algorithm execution (...
Adoption of GNN-based, FL-coordinated beam management can provide competitive differentiation by offering more reliable NTN services in challenging geometries (e.g., low-elevation, edge coverage).
Synthesized implication from experimental results showing improved GNN performance at low elevation angles and the marketing/economic discussion in the paper; no market adoption or field-deployment evidence provided.
medium positive Federated Learning-driven Beam Management in LEO 6G Non-Terr... service reliability in challenging geometries (e.g., low-elevation coverage) and...
FL via HAPS reduces data-centralization costs (bandwidth and storage) and improves privacy compared to sending raw channel data to a central server.
Implication drawn from the FL design used: federated aggregation reduces need to backhaul raw channel samples; paper lists bandwidth/storage and privacy advantages as economic/operational implications (no quantified cost measurements provided).
medium positive Federated Learning-driven Beam Management in LEO 6G Non-Terr... backhaul bandwidth and storage requirements; privacy exposure (qualitative)
The GNN solution is lightweight enough for practical on-board or edge deployment in NTN contexts.
Paper asserts the GNN is lightweight and suitable for on-board or HAPS/edge deployment; model described as designed to be compact for constrained compute/link budgets (no exact parameter counts provided in summary).
medium positive Federated Learning-driven Beam Management in LEO 6G Non-Terr... compute/footprint suitability for on-board or edge deployment (model lightweight...
Federated learning across LEO orbital planes, coordinated via HAPS, enables efficient distributed beam selection for Non-Terrestrial Networks (NTNs).
Experimental design in the paper: federated learning paradigm with orbital-plane clients and HAPS acting as aggregation/coordination points; evaluated on beam-prediction tasks using realistic channel/beamforming datasets and distributed training (no central pooling of raw samples).
medium positive Federated Learning-driven Beam Management in LEO 6G Non-Terr... beam prediction accuracy and stability in a distributed (federated) training set...
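The coordination pattern (not the paper's GNN) can be sketched as plain FedAvg, with each orbital plane as a client and the HAPS playing aggregator, so raw channel samples never leave the plane; the two-layer classifier and the 8-way "beam" labels below are illustrative stand-ins:

```python
# FedAvg sketch: per-plane local training, HAPS-side weight averaging.
import copy
import torch
import torch.nn as nn

def local_update(model, data, labels, epochs=1):
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(model(data), labels).backward()
        opt.step()
    return model.state_dict()

global_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
# each tuple = one orbital plane's private (features, beam-index) data
planes = [(torch.randn(64, 16), torch.randint(0, 8, (64,))) for _ in range(5)]

for rnd in range(10):
    states = [local_update(global_model, x, y) for x, y in planes]
    avg = {k: torch.stack([s[k] for s in states]).mean(dim=0)
           for k in states[0]}
    global_model.load_state_dict(avg)   # HAPS-side aggregation step
```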
DPS compares favorably to standard rollout-based prompt-selection baselines across the reported metrics (rollouts required, training speed, final accuracy).
Empirical comparisons against baseline methods reported in the experiments; specific numeric comparisons and statistical details are not present in the provided summary.
medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... relative performance vs baseline on number of rollouts, training speed, and fina...
DPS creates a predictive prior that identifies informative prompts without performing exhaustive rollouts over large candidate batches.
Methodological mechanism plus empirical claim that selection operates via predictive prior and reduces candidate rollouts; supported by experiments vs rollout-filtering baselines.
medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... informativeness of selected prompts (as implied by downstream learning gains and...
The DPS inference procedure requires only historical rollout reward signals, so it adds little extra compute relative to the rollouts it avoids.
Practical considerations described in the paper: inference uses past rollout rewards; authors state the extra compute is small relative to avoided rollouts. (No quantified compute-cost ratio in the summary.)
medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... additional inference compute relative to avoided rollout compute
DPS improves final reasoning performance (final task accuracy) across evaluated domains: mathematical reasoning, planning, and visual-geometry tasks.
Empirical results reported across those benchmark domains showing improved downstream reasoning accuracy relative to baselines. (Summary does not include exact effect sizes or sample counts.)
medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... final reasoning accuracy on benchmarks (mathematics, planning, visual-geometry)
DPS speeds up RL finetuning in terms of required rollout budgets and wall-clock rollout compute.
Reported empirical findings: faster convergence of RL finetuning measured by rollout budgets and wall-clock compute on evaluated tasks. (Exact runtime metrics and sample sizes not provided in the summary.)
medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... training speed (rollout budget to convergence; wall-clock rollout compute)
Compared to standard online prompt-selection methods that rely on large candidate-batch rollouts for filtering, DPS substantially reduces the number of redundant (uninformative) rollouts.
Empirical comparisons against rollout-based filtering baselines across benchmark tasks (mathematics, planning, visual-geometry). Specific numeric savings not provided in the summary.
medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... number of rollouts (redundant rollouts avoided)
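As an illustration only (not the authors' estimator), the flavor of such a predictive prior can be sketched as a running per-prompt pass-rate estimate built from past rollout rewards, with selection favoring prompts predicted to be near a 50% pass rate, where group-based RL updates carry the most signal:

```python
# Toy history-based prompt selection: roll out only the most uncertain prompts.
import numpy as np

rng = np.random.default_rng(0)
n_prompts, history = 1000, {}

def predicted_pass_rate(pid):
    h = history.get(pid, [])
    return 0.5 if not h else float(np.mean(h))   # prior 0.5 when unseen

for step in range(100):
    candidates = rng.choice(n_prompts, size=256, replace=False)
    # distance from 0.5: small = uncertain = likely informative
    info = np.abs([predicted_pass_rate(int(p)) - 0.5 for p in candidates])
    chosen = candidates[np.argsort(info)[:32]]   # keep the 32 most uncertain
    for p in chosen:                             # roll out only the chosen ones
        reward = rng.random() < 0.4              # stand-in for a real rollout
        history.setdefault(int(p), []).append(float(reward))
```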
Structural fixes — altering environment design or policy class to ensure the induced Markov chain is ergodic (e.g., ensuring mixing/recurrence or preventing absorbing bad states) — can eliminate the ensemble/time-average gap.
Paper discussion and examples suggesting interventions to change chain structure; conceptual/theoretical proposal supported by illustrative examples (no empirical deployment studies).
medium positive Ergodicity in reinforcement learning ergodicity of induced dynamics and resulting alignment of ensemble and time-aver...
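The ensemble/time-average gap these claims turn on has a standard worked example: a multiplicative process whose ensemble mean grows every step while almost every individual trajectory decays. A short simulation makes the gap visible (numbers are the textbook coin-toss example, not from the paper):

```python
# Non-ergodic multiplicative process: ensemble mean grows, typical path decays.
import numpy as np

rng = np.random.default_rng(1)
paths = np.ones(100_000)
for _ in range(100):                      # 100 multiplicative steps
    paths *= np.where(rng.random(paths.size) < 0.5, 1.5, 0.6)

print("sample ensemble mean:", paths.mean())     # >> 1 (true mean 1.05**100)
print("median trajectory:", np.median(paths))    # << 1: typical path decays
# time-average growth factor of almost every path:
print("per-step growth:",
      np.exp(0.5 * (np.log(1.5) + np.log(0.6))))  # ~0.95 < 1
```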
Robust/adversarial and model-uncertainty methods can hedge against trajectories that lead to poor long-run behavior and thus mitigate risks from non-ergodic dynamics.
Survey of robust control and adversarial RL approaches in the paper; conceptual argument linking robustness to protection against adverse sample paths; no new empirical tests.
medium positive Ergodicity in reinforcement learning worst-case or adversarial long-run reward under uncertainty
Ergodic control and sample-path optimality formulations recast control objectives in terms of time averages or almost-sure sample-path criteria rather than ensemble expectations and are therefore appropriate for single-trajectory performance targets.
Survey and formal discussion in the paper connecting ergodic control literature to single-trajectory objectives; theoretical references summarized.
medium positive Ergodicity in reinforcement learning time-average/sample-path optimality of control policies
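The formal contrast underlying this claim can be stated compactly; the following is the standard formulation from the ergodic-control literature, not notation specific to this paper:

```latex
% Ensemble (expected discounted) objective vs. the time-average /
% sample-path objective used in ergodic control:
\[
J_{\mathrm{ens}}(\pi) \;=\; \mathbb{E}_\pi\!\Big[\sum_{t=0}^{\infty} \gamma^t r_t\Big],
\qquad
J_{\mathrm{avg}}(\pi) \;=\; \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} r_t
\quad \text{(a.s., per trajectory)}.
\]
% Under ergodicity the time average is almost surely constant and equals the
% stationary expectation; without it, optimizing the ensemble objective can
% mis-rank policies for any single trajectory.
```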