The Commonplace

Evidence (5267 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Filtered by topic: Adoption
Application to the eICU Collaborative Research Database demonstrates the practical performance of the KL-shrinkage method on a heterogeneous, multi-center clinical dataset.
Real-data empirical application using the eICU database as described in the paper; performance comparisons are reported in the paper's empirical section, but the specific dataset size and metrics are not included in this summary.
medium positive Redefining shared information: a heterogeneity-adaptive fram... empirical performance on eICU data (e.g., predictive accuracy, estimation MSE, i...
Extensive simulation studies show the KL-shrinkage estimator is robust and versatile across varying degrees and structures of heterogeneity.
Comprehensive simulation experiments reported in the paper that vary heterogeneity magnitude and structure (simulation details reported in the empirical evaluation section; exact sample sizes/configurations given in the paper).
medium positive Redefining shared information: a heterogeneity-adaptive fram... estimator performance metrics in simulations (e.g., MSE, bias, coverage) across ...
Using KL divergence as the penalty is a natural and tractable choice because KL measures relative information between distributions and leads to convenient geometric/algebraic properties.
Argumentation and mathematical exposition in the methods section explaining properties of KL divergence and demonstrating resulting tractability in algebraic derivations.
medium positive Redefining shared information: a heterogeneity-adaptive fram... tractability of derivations / geometric justification (qualitative)
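The paper's estimator is not reproduced in this summary; as a minimal illustration of why a KL penalty is algebraically convenient, the sketch below assumes Gaussian site models with a common known variance and a pooled-mean target (all choices are assumptions, not the paper's setup). In that case the KL penalty reduces to a squared distance and the penalized estimate is a closed-form convex combination of the site mean and the shared target.

```python
# Minimal illustration (not the paper's estimator): KL-penalized shrinkage of
# Gaussian site means toward a pooled target. With a common known variance,
# KL(N(mu_k, s2) || N(mu0, s2)) = (mu_k - mu0)^2 / (2 * s2), so the penalized
# site estimate has the closed form (n_k * xbar_k + lam * mu0) / (n_k + lam);
# the shared variance cancels out of the solution.
import numpy as np

rng = np.random.default_rng(0)

def kl_shrunk_means(site_data, lam):
    """Shrink each site's sample mean toward the pooled mean via a KL penalty."""
    mu0 = np.concatenate(site_data).mean()          # shared (pooled) target
    estimates = []
    for x in site_data:
        n_k, xbar_k = len(x), x.mean()
        # argmin_mu  n_k*(mu - xbar_k)^2 + lam*(mu - mu0)^2  (up to a 1/(2*s2) factor)
        estimates.append((n_k * xbar_k + lam * mu0) / (n_k + lam))
    return mu0, np.array(estimates)

# Heterogeneous sites: different true means and different sample sizes.
sites = [rng.normal(loc=m, scale=1.0, size=n) for m, n in [(0.0, 200), (0.5, 40), (1.5, 10)]]
for lam in (0.0, 5.0, 50.0):
    mu0, est = kl_shrunk_means(sites, lam)
    print(f"lambda={lam:5.1f}  pooled={mu0:+.3f}  site estimates={np.round(est, 3)}")
```

Larger lam pulls small, heterogeneous sites harder toward the pooled value, which is the basic trade-off a heterogeneity-adaptive penalty would tune.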
Inferential procedures (e.g., confidence intervals and hypothesis tests) based on the KL-shrinkage approach are asymptotically valid without assuming parameter homogeneity across datasets.
Asymptotic theoretical results in the paper establishing validity (coverage and test properties) even under heterogeneity assumptions; details in asymptotic analysis section.
medium positive Redefining shared information: a heterogeneity-adaptive fram... asymptotic coverage of confidence intervals and Type I error control of hypothes...
Lowering fixed costs via shared resources can enable more entrants and niche innovators (e.g., specialized clinical apps).
Workshop economic implications and participant assertions in breakout sessions and plenary at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... number of market entrants, emergence of niche products, diversity of suppliers
Public investment in shared data and compute as nonrival public goods will reduce duplication, lower entry barriers, and increase total R&D productivity.
Workshop implications for AI economics articulated by participants and authors as a policy recommendation; rationale stated in the summary document (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... duplication of effort, entry barriers (number of entrants), and aggregate R&D pr...
De-risk pathways from lab to clinic via reproducible benchmarks, continuous monitoring, and cross-sector collaborations (academia, industry, clinicians, regulators).
Workshop translation-focused recommendations and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... time-to-market, reproducibility metrics, and rate of successful clinical transla...
Enable safe, accountable, and resilient platforms (including virtual–physical healthcare ecosystems) to reduce translational risk.
Workshop recommendations addressing safety, resilience, and virtual–physical ecosystems from cross-disciplinary discussion at NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... measures of translational risk (failure rates in translation, incidents, safety ...
Promote scalable validation ecosystems grounded in objective, continuous measures and physics-informed models.
Workshop validation and safety theme recommendations from panels and consensus-building exercises (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... presence and scalability of validation ecosystems; reliability/robustness metric...
Develop clinic workflow–aware systems and human–AI collaboration frameworks to fit real clinical practice and decision chains.
Stated systems and workflows recommendation from expert panels and clinician participants at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... compatibility of AI-enabled systems with clinical workflows; measures of clinici...
Build shared compute infrastructures tailored to medical workloads and validation needs.
Workshop recommendation from infrastructure-themed sessions and consensus outcomes (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... existence and utilization of shared compute infrastructure for medical R&D (comp...
Sustain investment in shared, standardized data infrastructures (datasets, ontologies, benchmarks) to support medical algorithm–hardware co-design.
Workshop infrastructure call presented during breakout sessions and final recommendations at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... availability and use of standardized medical datasets/ontologies/benchmarks
Principal recommendation: shift from isolated algorithm or hardware efforts to integrated algorithm–hardware–workflow co-design for medical contexts.
Stated workshop recommendation derived from panels and cross-disciplinary consensus at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... alignment and integration of R&D efforts (degree of co-design adoption in projec...
Sustained public investment and new validation, governance, and translation ecosystems are needed to de-risk commercialization and accelerate safe, accountable clinical adoption.
Workshop principal recommendation based on qualitative synthesis of expert judgment from participants and breakout outcomes (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... commercialization risk level and speed/rate of clinical adoption
Enabling next-generation medical technologies requires a fundamental reorientation toward algorithm–hardware co-design that is clinic-aware, validated continuously, and backed by shared data and compute infrastructures.
Consensus recommendation from a two-day NSF workshop (Sept 26–27, 2024) in Pittsburgh convening interdisciplinary participants (academic researchers in algorithms and hardware, clinicians, industry leaders). Methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building. Documentation at https://sites.google.com/view/nsfworkshop.
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... successful development and clinical adoption of next-generation medical technolo...
A high-level RL agent dynamically adjusts end-effector interaction forces (contact wrench) in real time based on perception feedback of material location.
Method description: the high-level agent outputs adjustments to interaction force/wrench informed by perception of material location inside the vial; the RL algorithm and detailed observation/action representations are not specified in the summary.
medium positive Learning Adaptive Force Control for Contact-Rich Sample Scra... dynamic adjustment of interaction force/wrench and resulting task performance
A low-level Cartesian impedance controller provides stable, compliant physical interaction for contact stability during scraping.
Control architecture description: the paper uses Cartesian impedance control as the low-level controller intended to handle contact compliance and stability; empirical stability metrics are not given in the summary.
medium positive Learning Adaptive Force Control for Contact-Rich Sample Scra... contact stability / compliant interaction (as enabled by the controller)
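The paper's gains and state representation are not given in this summary; the sketch below shows a generic Cartesian impedance law of the kind described, producing a commanded wrench from pose and twist errors plus a feedforward contact-wrench target. All names, gains, and the small-angle rotational treatment are illustrative assumptions.

```python
# Illustrative Cartesian impedance law (not the paper's implementation):
# commanded wrench = stiffness * pose error + damping * velocity error + feedforward wrench.
import numpy as np

def impedance_wrench(x_des, x, xdot_des, xdot, wrench_ff,
                     stiffness=np.diag([600, 600, 600, 30, 30, 30]),
                     damping=np.diag([40, 40, 40, 2, 2, 2])):
    """Compute a 6-DoF commanded wrench [Fx, Fy, Fz, Tx, Ty, Tz].

    x_des, x       : desired / measured end-effector pose (6,), small-angle rotations
    xdot_des, xdot : desired / measured end-effector twist (6,)
    wrench_ff      : feedforward contact wrench, e.g. supplied by a higher-level policy
    """
    pose_err = x_des - x
    vel_err = xdot_des - xdot
    return stiffness @ pose_err + damping @ vel_err + wrench_ff

# Example: press roughly 5 N into the vial wall while tracking a scraping pose.
cmd = impedance_wrench(
    x_des=np.zeros(6), x=np.array([0.002, 0, -0.001, 0, 0, 0]),
    xdot_des=np.zeros(6), xdot=np.zeros(6),
    wrench_ff=np.array([0, 0, -5.0, 0, 0, 0]),
)
print(np.round(cmd, 2))
```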
The learned policy trained in simulation was successfully transferred to a real Franka Research 3 robot (sim-to-real transfer).
Training in a task-representative simulator followed by deployment on a Franka Research 3 setup in real-world scraping experiments; transfer success is asserted in the paper summary. The evaluation included five material setups on the real robot (exact number of trials per setup not specified).
medium positive Learning Adaptive Force Control for Contact-Rich Sample Scra... sim-to-real transfer success measured via real-world task performance (relative ...
An adaptive control framework that combines a low-level Cartesian impedance controller with a high-level reinforcement learning (RL) agent — guided by perception of material location — enables a robot to learn and adapt the optimal contact wrench for scraping heterogeneous samples in a constrained vial environment.
System design and experiments: the paper describes a two-level control architecture (Cartesian impedance + high-level RL) trained in a task-representative simulation and deployed on a real Franka Research 3 robot. Real-world experiments were performed in a constrained vial scraping task (details on trial counts per condition not provided in the summary).
medium positive Learning Adaptive Force Control for Contact-Rich Sample Scra... ability to learn/adapt optimal contact wrench for successful scraping (task perf...
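A minimal sketch of how the two control levels could interact, assuming the RL policy outputs an additive adjustment to the feedforward contact wrench from a perception estimate of material location. The env, policy, and impedance_ctrl interfaces below are hypothetical placeholders (a gym-like step/reset API is assumed), not the paper's software.

```python
# Sketch of the two-level loop (interfaces are hypothetical placeholders):
# a high-level policy updates the target contact wrench from perception at a
# low rate, while the low-level impedance controller runs every step.
import numpy as np

def control_episode(env, policy, impedance_ctrl, steps=500, high_level_every=25):
    obs = env.reset()
    wrench_ff = np.zeros(6)                       # current contact-wrench target
    for t in range(steps):
        if t % high_level_every == 0:
            # Perception feedback (e.g. estimated material location in the vial)
            # drives a bounded additive adjustment of the interaction wrench.
            delta = policy.act(obs)               # shape (6,)
            wrench_ff = np.clip(wrench_ff + delta, -20.0, 20.0)
        cmd = impedance_ctrl(obs["x_des"], obs["x"],
                             obs["xdot_des"], obs["xdot"], wrench_ff)
        obs, reward, done, _ = env.step(cmd)
        if done:
            break
    return obs
```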
Automation of routine SE tasks points to measurable productivity gains at team and firm levels, but quantifying them requires causal, outcome-based studies (e.g., throughput, defect rates, time-to-market).
Interpretation of literature review findings and survey-reported perceived productivity gains; no causal empirical estimates provided in the paper.
medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... potential productivity metrics (throughput, defect rates, time-to-market) — not ...
Empirical survey evidence shows generally positive perceptions of AI tools among software engineering professionals and growing adoption.
Cross-sectional survey of software engineering professionals asking about current tool usage and perceived benefits (productivity, quality, speed); absolute respondent count and sampling frame not provided in the summary.
medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... self-reported perception of AI tools and self-reported adoption rate
ML enables predictive features in software engineering: effort estimation, defect prediction, work prioritization, and risk forecasting that support Agile planning and continuous delivery.
Literature review of ML-for-SE research and practitioner survey reporting use or expectations of predictive features; specific model performance metrics or dataset sizes not reported in the summary.
medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... availability/use of predictive outputs (e.g., estimated effort, defect risk scor...
NLP techniques improve requirements management and team collaboration by extracting intent from natural-language artifacts (tickets, specs, PRs) and reducing miscommunication.
Synthesis of prior studies in the literature review and survey responses indicating perceived improvement in requirements handling and communication; survey sample size not reported.
medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... perceived reduction in miscommunication / improved clarity of requirements
RAT data could be valuable for training models that better emulate human interpretive processes; firms owning such data may gain competitive advantage.
Argument in the AI economics section; no empirical model-training experiments or market analyses provided.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity value of RAT data as training signal; competitive advantage for data-owning firm...
RATs make readable and potentially quantifiable the preparatory interpretive work that contributes to downstream outputs, with implications for labor accounting and human capital valuation.
Theoretical economic and policy discussion in the paper; no empirical measurement or case studies provided to quantify how much preparatory work is captured or its economic value.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity visibility/quantifiability of interpretive labor and potential economic valuatio...
RATs can enable collective sensemaking via shared trails and networked associations among readers.
Conceptual argument and suggested network-analysis methods; illustrated with the speculative WikiRAT use case. No group-level empirical studies reported.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity collective sensemaking artifacts (shared trails, co-read graphs, group understan...
RATs can support richer reader models (personalization and modeling of interpretive behavior) through sequence analysis, embedding/clustering of trajectories, and other analytic techniques.
Proposed analytical methods (sequence analysis, embedding/clustering, network analysis) listed in the paper; no implementation results or quantitative evaluations provided.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity reader model quality (personalization accuracy, representation of interpretive b...
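The paper names sequence analysis and embedding/clustering of reading trajectories only as candidate techniques, with no implementation. As a loose, hypothetical illustration of the clustering idea, the sketch below embeds the passages a reader visited with a simple TF-IDF model, averages them into a per-reader trajectory vector, and clusters readers; the toy trails, the embedder, and the cluster count are all arbitrary choices.

```python
# Hypothetical sketch of one analysis the paper suggests: embed each visited
# passage, average into a per-reader trajectory vector, then cluster readers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reading_trails = {
    "reader_a": ["intro to shrinkage estimators", "james-stein paradox", "empirical bayes"],
    "reader_b": ["impedance control basics", "contact-rich manipulation", "sim-to-real transfer"],
    "reader_c": ["hierarchical models", "partial pooling", "shrinkage priors"],
}

vectorizer = TfidfVectorizer().fit([p for trail in reading_trails.values() for p in trail])

def trajectory_vector(trail):
    # Mean of passage vectors; order-aware sequence models would be a refinement.
    return np.asarray(vectorizer.transform(trail).mean(axis=0)).ravel()

X = np.stack([trajectory_vector(t) for t in reading_trails.values()])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(reading_trails, labels)))
```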
RATs enable reflective practice by helping readers see and revise their own processes.
Proposed affordance in the paper based on the inspectable nature of RATs and the WikiRAT illustration; suggested as a potential use case rather than empirically demonstrated.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity changes in reflective behavior or self-revision of reading processes
RATs treat reading as a dual kind of creation: (a) creative input work that shapes future artifacts, and (b) a form of creation whose traces are valuable artifacts themselves.
Theoretical proposal and design rationale presented in the paper; illustrated via a speculative prototype (WikiRAT). No empirical validation provided.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity recognition/value assigned to reading traces as artifacts
Reading Activity Traces (RATs) reconceptualize reading — including navigation, interpretation, and curation across interconnected sources — as creative labor.
Conceptual argument in the paper; supported by theoretical framing and literature review rather than empirical data. No sample size or deployment reported.
medium positive Chasing RATs: Tracing Reading for and as Creative Activity conceptual reclassification of reading (visibility/recognition of interpretive l...
The method lowers the technical barrier for adopting surrogates in economics by removing dependence on specialized Bayesian neural-network techniques while preserving rigorous uncertainty quantification.
Argument in Implications section: decoupling uncertainty quantification from network architecture allows use of deterministic NNs with MCMC-sampled parameter inputs; no user-study or adoption metrics provided.
medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... ease of adoption / reduction in technical complexity required to obtain UQ-prese...
The theoretical diagnostic (linking distribution mismatch to performance loss) gives practitioners a practical tool to detect when a surrogate trained on one parameter distribution will underperform after recalibration or policy changes.
Paper-provided theoretical result and suggested diagnostic use; empirical validation of the diagnostic is implied but not detailed in the summary.
medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... diagnostic effectiveness in detecting performance degradation under distribution...
This approach dramatically reduces computation (training and/or evaluation wall-clock time) compared to approaches that sample network weights (Bayesian NNs) or exhaustively explore parameter grids.
Computational evaluation reported in the paper includes empirical examples demonstrating substantial reductions in wall-clock training/evaluation time relative to weight-sampling or exhaustive-parameter-grid baselines (exact datasets, runtimes, and sample sizes not detailed in the summary).
medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... wall-clock training time and evaluation time
Training a deterministic neural surrogate conditioned on MCMC-drawn parameter samples reproduces the original (forward) model's uncertainty quantification while avoiding embedding parametric uncertainty inside the network weights.
Methodological description: surrogate is a deterministic NN whose inputs include parameter vectors drawn by MCMC from the model-parameter posterior; uncertainty is recovered by repeatedly evaluating the trained surrogate on those MCMC draws. Empirical examples are reported (details not provided here) showing reproduction of model uncertainty.
medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... fidelity of uncertainty quantification / posterior predictive distributions prod...
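The paper's forward model and training details are not in this summary; the toy sketch below illustrates the general recipe under assumed shapes and an invented stand-in model: fit a deterministic network on inputs (x, theta) with theta drawn from posterior (MCMC-style) samples, then recover predictive uncertainty by re-evaluating the trained surrogate over those same draws rather than sampling network weights.

```python
# Toy illustration of the general recipe (not the paper's model): train a
# deterministic surrogate f(x, theta) on posterior draws of theta, then recover
# uncertainty by evaluating the fixed surrogate across the same draws.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
torch.manual_seed(0)

def forward_model(x, theta):            # stand-in for an expensive structural model
    return np.sin(theta[..., :1] * x) + theta[..., 1:2] * x**2

theta_draws = rng.normal([1.0, 0.3], [0.1, 0.05], size=(2000, 2))   # "MCMC" posterior draws
x_grid = rng.uniform(-2, 2, size=(2000, 1))
y = forward_model(x_grid, theta_draws)

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
inputs = torch.tensor(np.hstack([x_grid, theta_draws]), dtype=torch.float32)
targets = torch.tensor(y, dtype=torch.float32)

for epoch in range(500):                # deterministic supervised fit, no weight sampling
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(inputs), targets)
    loss.backward()
    opt.step()

# Uncertainty propagation: evaluate the fixed surrogate across the posterior draws.
x_star = torch.full((theta_draws.shape[0], 1), 0.5)
with torch.no_grad():
    preds = net(torch.cat([x_star, torch.tensor(theta_draws, dtype=torch.float32)], dim=1))
print(f"predictive mean {preds.mean().item():.3f}, predictive sd {preds.std().item():.3f}")
```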
The proposed pipeline (CFD -> CFM -> CFR) forms a closed loop that can assess and improve color fidelity in T2I systems.
Paper describes end-to-end workflow: CFD provides training/validation labels for CFM; CFM produces scores and attention maps for evaluation and localization; CFR consumes CFM attention during generation to refine images. The repository contains code implementing the pipeline.
medium positive Too Vivid to Be Real? Benchmarking and Calibrating Generativ... end-to-end improvement in measured color fidelity when applying CFD-trained CFM ...
Color Fidelity Refinement (CFR) is a training-free inference-time procedure that uses CFM attention maps to adaptively modulate spatial-temporal guidance scales during generation, thereby improving color authenticity of realistic-style T2I outputs without retraining the base model.
Method description in paper: CFR uses CFM's learned attention to identify low-fidelity regions and adapt guidance strength across space and denoising steps (spatial-temporal guidance). The authors evaluate CFR on existing T2I models and report improved perceived color authenticity; no retraining of base T2I models is required (implementation and code available in the repository).
medium positive Too Vivid to Be Real? Benchmarking and Calibrating Generativ... perceived color authenticity of generated images; requirement (or not) to retrai...
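CFR's exact modulation rule is described only at a high level here; the sketch below shows one plausible way attention-weighted, per-pixel guidance could be injected into a standard classifier-free-guidance denoising step. The modulation formula, scale range, and temporal easing are assumptions for illustration, not the paper's procedure.

```python
# Hypothetical sketch of attention-modulated classifier-free guidance (CFG):
# regions flagged by the fidelity metric's attention map get a lower guidance
# scale, which tends to reduce over-saturated, "too vivid" colors there.
# The modulation rule and constants are illustrative, not CFR's exact formula.
import numpy as np

def modulated_cfg_step(noise_uncond, noise_cond, attention_map, step_frac,
                       base_scale=7.5, min_scale=2.5):
    """Blend unconditional/conditional noise predictions with a per-pixel scale.

    noise_uncond, noise_cond : arrays of shape (C, H, W) from the denoiser
    attention_map            : (H, W) in [0, 1]; high = suspected low color fidelity
    step_frac                : fraction of denoising completed, in [0, 1]
    """
    # Spatial modulation: attended regions get weaker guidance...
    spatial = base_scale - (base_scale - min_scale) * attention_map
    # ...and the modulation is eased in over the later denoising steps (temporal part).
    scale = base_scale * (1 - step_frac) + spatial * step_frac
    return noise_uncond + scale[None] * (noise_cond - noise_uncond)

# Tiny shape check with random arrays standing in for denoiser outputs.
rng = np.random.default_rng(0)
eps_u, eps_c = rng.normal(size=(4, 64, 64)), rng.normal(size=(4, 64, 64))
attn = rng.uniform(size=(64, 64))
print(modulated_cfg_step(eps_u, eps_c, attn, step_frac=0.8).shape)
```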
CFM aligns better with objective color realism judgments than existing preference-trained metrics and human ratings that favor vividness.
Empirical comparisons reported in the paper: CFM scoring shows improved alignment with CFD-based color-realism labels and with evaluation criteria that prioritize photographic fidelity, outperforming preference-trained metrics and the biased patterns in human ratings (paper reports both qualitative and quantitative gains; specific numerical improvements and test set sizes are provided in the paper/repo).
medium positive Too Vivid to Be Real? Benchmarking and Calibrating Generativ... alignment with color-realism judgments / correlation with CFD ground truth
The Color Fidelity Metric (CFM) is a multimodal encoder–based metric trained on CFD to predict human-consistent judgments of color fidelity and to produce spatial attention maps that localize color-fidelity errors.
Model architecture and training procedure described: a multimodal encoder trained using CFD's ordered realism labels to output scalar fidelity scores and spatial attention maps indicating where color fidelity issues occur. Training supervision comes from CFD's ordered labels (paper includes training/validation procedures; exact training dataset splits are in the paper/repo).
medium positive Too Vivid to Be Real? Benchmarking and Calibrating Generativ... color-fidelity scalar scores and spatial attention maps (localization of color e...
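The summary only says CFM is trained on CFD's ordered realism labels; a common way to use ordered labels is a pairwise ranking loss, sketched below with a deliberately tiny placeholder encoder. The architecture and loss choice are assumptions, the real CFM is a multimodal encoder, and this sketch covers only the scalar-score head, not the attention maps.

```python
# Hypothetical training sketch: a scorer head on a placeholder image encoder,
# trained with a pairwise margin-ranking loss derived from ordered
# color-realism labels. Not the paper's multimodal architecture.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyScorer(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, images):                      # images: (B, 3, 32, 32)
        return self.head(self.encoder(images)).squeeze(-1)

model = ToyScorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
rank_loss = nn.MarginRankingLoss(margin=0.5)

# Pairs in which the first image carries the higher (more realistic) ordered label.
better = torch.rand(16, 3, 32, 32)
worse = torch.rand(16, 3, 32, 32)
target = torch.ones(16)                             # "first element should score higher"

for _ in range(100):
    opt.zero_grad()
    loss = rank_loss(model(better), model(worse), target)
    loss.backward()
    opt.step()
print(float(loss))
```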
Varying sample size, injecting contaminated data, and including algorithm-reconstruction tasks during training allow networks to inherit the corresponding properties automatically (multi-n behavior, robustness to contamination, reproduction of algorithmic outputs).
Empirical: training regimes described include varying dataset size n, contaminated simulations, and algorithm-reconstruction tasks; experiments reportedly show networks trained with these variations exhibit corresponding behaviors at test time. Specific experimental details (ranges of n, contamination levels) are not included in the summary.
medium positive ForwardFlow: Simulation only statistical inference using dee... presence of targeted properties in trained networks (finite-sample behavior acro...
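The summary does not give the training-regime settings; the sketch below is an invented simulator illustrating the kind of training example the claim describes, with a random dataset size n and an optional fraction of contaminated observations. All ranges and the contamination mechanism are made-up assumptions.

```python
# Illustrative simulator for the training regime the paper describes: each
# training example is a dataset simulated at a random size n, optionally with
# a contaminated fraction of observations (all ranges are invented).
import numpy as np

rng = np.random.default_rng(0)

def simulate_training_example(n_range=(20, 500), contam_prob=0.5, contam_frac=0.1):
    theta = rng.normal(0.0, 1.0)                        # parameter to recover
    n = rng.integers(*n_range)
    x = rng.normal(theta, 1.0, size=n)
    if rng.random() < contam_prob:                      # inject gross outliers
        m = max(1, int(contam_frac * n))
        x[rng.choice(n, size=m, replace=False)] = rng.normal(theta, 10.0, size=m)
    return x, theta

datasets, thetas = zip(*(simulate_training_example() for _ in range(5)))
print([len(d) for d in datasets], np.round(thetas, 2))
```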
Collapsing (aggregation) layers mimic reduction to sufficient statistics and enforce the desirable structure for set-valued (permutation-invariant) inputs.
Theoretical/design claim supported by architectural description and motivation: collapsing layers aggregate across observations to produce summaries, enforcing permutation invariance; supported indirectly by empirical success in simulations. This is primarily an architectural/representational argument rather than a purely empirical result.
medium positive ForwardFlow: Simulation only statistical inference using dee... permutation-invariance and quality of summary representations (qualitative/archi...
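The paper's branched architecture is only summarized above; the sketch below shows a generic DeepSets-style collapsing layer, assuming a per-observation encoder followed by mean and max pooling, and checks that the output is unchanged when the observations are permuted. Layer sizes and pooling choices are illustrative.

```python
# Sketch of a collapsing/aggregation layer in the DeepSets style: a per-
# observation encoder followed by pooling yields summaries that are invariant
# to reordering the observations. Sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

class CollapsingNet(nn.Module):
    def __init__(self, hidden=64, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, x):                    # x: (batch, n, 1) set of observations
        h = self.phi(x)
        summary = torch.cat([h.mean(dim=1), h.amax(dim=1)], dim=-1)   # collapse over n
        return self.rho(summary)

net = CollapsingNet()
x = torch.randn(8, 100, 1)
perm = torch.randperm(100)
print(torch.allclose(net(x), net(x[:, perm]), atol=1e-6))   # permutation invariance
```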
The network can learn to approximate the outputs of iterative estimation algorithms (demonstrated by learning an EM algorithm for a genetic-data estimation task).
Empirical: a genetic-data example where the network was trained (including an algorithm-reconstruction task) to approximate the EM algorithm outputs; evaluation shows qualitative/quantitative match to the iterative algorithm. Evidence is from reported experiments comparing network outputs to EM outputs (e.g., MSE between them).
medium positive ForwardFlow: Simulation only statistical inference using dee... similarity between network outputs and iterative algorithm outputs (e.g., MSE or...
Training the network with contaminated simulations yields estimators that are robust to contaminated observations at test time.
Empirical: experiments included injecting contaminated data into training simulations; evaluation measured robustness at test time under contamination and showed improved performance relative to networks not trained on contamination. Supported by reported robustness comparisons (metrics like MSE under contamination). Specific contamination rates and sample sizes are not provided in the summary.
medium positive ForwardFlow: Simulation only statistical inference using dee... robustness to contamination (estimation error / MSE under contaminated test data...
A branched neural architecture with collapsing (aggregation) layers that reduce a dataset to permutation-invariant summaries can produce parameter estimates that are exact at finite sample sizes (i.e., it reproduces reference estimator outputs at finite n).
Empirical & theoretical motivation: architecture includes collapsing/aggregation layers to implement permutation-invariance and summary reduction; simulation experiments reportedly show the network reproduces reference estimator outputs at finite sample sizes (finite-sample matching). The exact experimental settings (sample sizes, number of replications) are not specified in the summary; evidence comes from simulated benchmarks and comparisons to reference estimators.
medium positive ForwardFlow: Simulation only statistical inference using dee... match to reference estimator outputs at finite sample sizes (exact equality or n...
A single “summary network” trained in a simulation-only framework can solve the inverse problem of parameter estimation for parametric models by mapping simulated datasets to parameters (minimizing MSE).
Empirical: network trained on simulated datasets (each dataset simulated conditional on a known parameter) with a mean-squared-error (MSE) loss between predicted and true parameter; evaluated on synthetic parametric benchmark problems and a genetic-data example. Specific sample sizes and number of simulations are not stated in the provided summary; evidence is based on the reported simulation experiments and benchmark comparisons.
medium positive ForwardFlow: Simulation only statistical inference using dee... parameter estimation accuracy (MSE between predicted parameter and true paramete...
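Complementing the aggregation sketch above, the loop below is a minimal simulation-only training illustration under an assumed toy model (a Gaussian mean with fixed dataset size): simulate (theta, dataset) pairs, map each dataset to a predicted theta, and minimize the MSE between predicted and true parameters. Model, sizes, and training length are all invented.

```python
# Minimal simulation-only training sketch (toy Gaussian-mean model): simulate
# (theta, dataset) pairs, predict theta from each dataset, minimise MSE.
import torch
import torch.nn as nn

torch.manual_seed(0)

class SummaryNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):                         # x: (batch, n, 1)
        return self.rho(self.phi(x).mean(dim=1))  # collapse the sample dimension

net = SummaryNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    theta = torch.randn(128, 1)                            # draw parameters
    data = theta[:, None, :] + torch.randn(128, 50, 1)     # simulate datasets given theta
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(data), theta)
    loss.backward()
    opt.step()

# At test time the trained network acts as the estimator; here theta_hat
# should land close to the sample mean of the test dataset.
test = 0.7 + torch.randn(1, 50, 1)
print(float(net(test)))
```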
Practical modalities exist for efficient classical estimation of gradients for the covered loss classes: using the classical-approximation machinery to compute analytic gradients or unbiased estimators, finite-difference approaches, and surrogate methods; the paper discusses sample complexity and noise considerations.
Methodological discussion in the paper outlining specific gradient estimation approaches compatible with the classical-approximation results, together with complexity/sample-complexity remarks. This is a methods/algorithmic claim supported by analysis rather than empirical benchmarks.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... efficiency/sample-complexity of gradient estimation procedures
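Of the modalities listed, only the generic finite-difference one lends itself to a short, library-free sketch; the snippet below shows central finite differences applied to a stand-in loss (the loss function, step size, and optimizer are illustrative assumptions, not the paper's classical-approximation machinery).

```python
# Generic central finite-difference gradient estimator, one of the gradient
# modalities mentioned; the quadratic loss below is a stand-in for a
# classically approximated training loss.
import numpy as np

def finite_difference_grad(loss_fn, params, eps=1e-4):
    grad = np.zeros_like(params)
    for i in range(params.size):
        shift = np.zeros_like(params)
        shift[i] = eps
        grad[i] = (loss_fn(params + shift) - loss_fn(params - shift)) / (2 * eps)
    return grad

def toy_loss(p):                          # stand-in for the true training loss
    return np.sum((p - np.array([0.3, -1.2, 0.8])) ** 2)

p = np.zeros(3)
for _ in range(200):                      # plain gradient descent on the estimates
    p -= 0.1 * finite_difference_grad(toy_loss, p)
print(np.round(p, 3))
```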
The paper constructs a single-hyperparameter family of BSBMs that monotonically interpolates from weak expressive power up to full universality, enabling a controlled trade-off between simplicity and expressivity.
Explicit one-parameter family construction and monotonicity argument/proof in the paper showing that increasing the hyperparameter increases expressivity and approaches universality. This is a theoretical construction rather than empirical measurement.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... expressive power (as a monotone function of a single hyperparameter)
Classical hardness of exact or approximate sampling from the expanded (ancilla + postprocessing) BSBM family is preserved by relating these models to known hard linear-optical sampling tasks.
Complexity-theoretic reductions and arguments in the paper connecting the expanded BSBM constructions to established hard sampling problems in linear optics (e.g., boson sampling variants). The claim is supported by theoretical reductions rather than empirical hardness measurements.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... classical hardness of sampling (exact/approximate) from the expanded BSBM family
Universality (and therefore potential sampling hardness) can be recovered by expanding the model: adding ancillary modes and applying a constant-function postprocessing generalization restores universality while retaining efficient classical trainability.
Construction and theoretical argument in the paper: introduces ancilla modes and a constant-function postprocessing generalization (analogous to IQP-QCBM techniques), shows how these modifications increase representational power to universality, and demonstrates that the same classical-approximation machinery still allows efficient evaluation/approximation of training losses. The argument includes constructive proofs and reductions.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... generative universality and classical trainability after model expansion
Training can be done classically even when sampling from the trained BSBM is believed to be classically hard (the 'train classically, deploy quantumly' paradigm applies to BSBMs).
Argument combining two parts in the paper: (1) classical-evaluation results for losses/gradients (see above) and (2) separate hardness-of-sampling arguments showing sampling remains classically hard after training. This is a theoretical claim based on the constructions and reductions presented in the paper.
medium positive Universality of Classically Trainable, Quantum-Deployed Boso... feasibility of classical training vs. classical hardness of sampling at deployme...
Greater ROI may come from investing in better feedback models (how feedback is used) than from solely collecting richer feedback sources.
Empirical finding that feedback model choice often produced larger retrieval-quality improvements than changing the feedback source across the evaluated tasks and methods.
medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Return on investment (performance improvement per resource invested in model vs....