Evidence (4560 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	378	106	59	455	1007
Governance & Regulation	379	176	116	58	739
Research Productivity	240	96	34	294	668
Organizational Efficiency	370	82	63	35	553
Technology Adoption Rate	296	118	66	29	513
Firm Productivity	277	34	68	10	394
AI Safety & Ethics	117	177	44	24	364
Output Quality	244	61	23	26	354
Market Structure	107	123	85	14	334
Decision Quality	168	74	37	19	301
Fiscal & Macroeconomic	75	52	32	21	187
Employment Level	70	32	74	8	186
Skill Acquisition	89	32	39	9	169
Firm Revenue	96	34	22	—	152
Innovation Output	106	12	21	11	151
Consumer Welfare	70	30	37	7	144
Regulatory Compliance	52	61	13	3	129
Inequality Measures	24	68	31	4	127
Task Allocation	75	11	29	6	121
Training Effectiveness	55	12	12	16	96
Error Rate	42	48	6	—	96
Worker Satisfaction	45	32	11	6	94
Task Completion Time	78	5	4	2	89
Wages & Compensation	46	13	19	5	83
Team Performance	44	9	15	7	76
Hiring & Recruitment	39	4	6	3	52
Automation Exposure	18	17	9	5	50
Job Displacement	5	31	12	—	48
Social Protection	21	10	6	2	39
Developer Productivity	29	3	3	1	36
Worker Turnover	10	12	—	3	25
Skill Obsolescence	3	19	2	—	24
Creative Output	15	5	3	1	24
Labor Share of Income	10	4	9	—	23

Productivity

In the same co-authoring workflow, intermediate radiologists improve their report quality toward senior-level performance when assisted by CBCTRepD.

The paper reports comparative analyses across experience levels and states that intermediate radiologists approached senior-level quality with AI assistance. (Exact metrics, reviewer counts, and quantitative effect sizes are not specified in the provided text.)

medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Final report quality for intermediate radiologists in a co-authoring workflow

When used in a radiologist–AI co-authoring workflow, CBCTRepD consistently improves report quality for novice radiologists, bringing their reports toward intermediate-level quality.

Collaborative evaluation reported in the paper comparing radiologist-edited AI drafts across experience tiers; authors state novices improved toward intermediate-level reporting when using the system. (Details such as number of novice readers, magnitude of improvement, and statistical significance are not provided in the summary.)

medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Final report quality for novice radiologists in a co-authoring workflow

Under a multi-level clinical evaluation (automatic metrics plus radiologist/clinician review), raw AI-generated draft reports from CBCTRepD achieve writing quality and standardization comparable to intermediate radiologists.

Evaluation described as multi-level and clinically grounded, combining automatic text/clinical metrics and radiologist/clinician review; the paper reports a comparison between AI drafts and radiologists stratified by experience (novice, intermediate, senior). (Specific sample sizes of reviewers, statistical tests, and numerical effect sizes are not provided in the supplied summary.)

medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Writing quality and standardization of draft reports (AI drafts vs intermediate ...

A high-level RL agent dynamically adjusts end-effector interaction forces (contact wrench) in real time based on perception feedback of material location.

Method description: the high-level agent outputs adjustments to interaction force/wrench informed by perception of material location inside the vial; the RL algorithm and detailed observation/action representations are not specified in the summary.

medium positive Learning Adaptive Force Control for Contact-Rich Sample Scra... dynamic adjustment of interaction force/wrench and resulting task performance

A low-level Cartesian impedance controller provides stable, compliant physical interaction for contact stability during scraping.

Control architecture description: the paper uses Cartesian impedance control as the low-level controller intended to handle contact compliance and stability; empirical stability metrics are not given in the summary.

medium positive Learning Adaptive Force Control for Contact-Rich Sample Scra... contact stability / compliant interaction (as enabled by the controller)
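As a reference point for the low-level layer, the textbook Cartesian impedance law treats the end effector as a virtual spring-damper and commands F = K(x_d - x) + D(xdot_d - xdot). The sketch below is a minimal, translation-only version with assumed diagonal gains. The gain values, the function name `impedance_wrench`, and the compliant z-axis are hypothetical. The paper's actual controller, including any orientation terms and the mapping tau = J^T F to joint torques, is not specified in the summary.

```python
import numpy as np

def impedance_wrench(x, x_dot, x_des, x_dot_des, K, D):
    """Textbook Cartesian impedance law (translation only, hypothetical gains).

    Returns F = K (x_des - x) + D (x_dot_des - x_dot): a virtual
    spring-damper that pulls the end effector toward x_des.
    """
    return K @ (x_des - x) + D @ (x_dot_des - x_dot)

# Hypothetical gains: stiff in x/y, compliant along z (the scraping direction)
K = np.diag([800.0, 800.0, 200.0])  # stiffness, N/m
D = np.diag([40.0, 40.0, 20.0])     # damping, N*s/m

x = np.array([0.50, 0.00, 0.30])      # current position (m)
x_dot = np.zeros(3)                   # at rest
x_des = np.array([0.50, 0.00, 0.29])  # setpoint 1 cm lower
F = impedance_wrench(x, x_dot, x_des, np.zeros(3), K, D)
print(F)  # roughly 2 N pushing along -z
```

In a two-level scheme, a high-level policy could shape the contact wrench by adjusting quantities such as `x_des` or `K`; which quantities the paper's RL agent actually outputs is not stated here.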

The learned policy trained in simulation was successfully transferred to a real Franka Research 3 robot (sim-to-real transfer).

Training in a task-representative simulator followed by deployment on a Franka Research 3 setup in real-world scraping experiments; transfer success is asserted in the paper summary. The evaluation included five material setups on the real robot (exact number of trials per setup not specified).

medium positive Learning Adaptive Force Control for Contact-Rich Sample Scra... sim-to-real transfer success measured via real-world task performance (relative ...

An adaptive control framework that combines a low-level Cartesian impedance controller with a high-level reinforcement learning (RL) agent — guided by perception of material location — enables a robot to learn and adapt the optimal contact wrench for scraping heterogeneous samples in a constrained vial environment.

System design and experiments: the paper describes a two-level control architecture (Cartesian impedance + high-level RL) trained in a task-representative simulation and deployed on a real Franka Research 3 robot. Real-world experiments were performed in a constrained vial scraping task (details on trial counts per condition not provided in the summary).

medium positive Learning Adaptive Force Control for Contact-Rich Sample Scra... ability to learn/adapt optimal contact wrench for successful scraping (task perf...

Automation of routine SE tasks suggests measurable productivity gains at team and firm levels, but quantification requires causal, outcome-based studies (e.g., throughput, defect rates, time-to-market).

Interpretation of literature review findings and survey-reported perceived productivity gains; no causal empirical estimates provided in the paper.

medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... potential productivity metrics (throughput, defect rates, time-to-market) — not ...

Empirical survey evidence shows generally positive perceptions of AI tools among software engineering professionals and growing adoption.

Cross-sectional survey of software engineering professionals asking about current tool usage and perceived benefits (productivity, quality, speed); absolute respondent count and sampling frame not provided in the summary.

medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... self-reported perception of AI tools and self-reported adoption rate

ML enables predictive features in software engineering: effort estimation, defect prediction, work prioritization, and risk forecasting that support Agile planning and continuous delivery.

Literature review of ML-for-SE research and practitioner survey reporting use or expectations of predictive features; specific model performance metrics or dataset sizes not reported in the summary.

medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... availability/use of predictive outputs (e.g., estimated effort, defect risk scor...

NLP techniques improve requirements management and team collaboration by extracting intent from natural-language artifacts (tickets, specs, PRs) and reducing miscommunication.

Synthesis of prior studies in the literature review and survey responses indicating perceived improvement in requirements handling and communication; survey sample size not reported.

medium positive Artificial Intelligence as a Catalyst for Innovation in Soft... perceived reduction in miscommunication / improved clarity of requirements

Reading Activity Trace (RAT) data could be valuable for training models that better emulate human interpretive processes; firms that own such data may gain a competitive advantage.

Argument in the AI economics section; no empirical model-training experiments or market analyses provided.

medium positive Chasing RATs: Tracing Reading for and as Creative Activity value of RAT data as training signal; competitive advantage for data-owning firm...

RATs make readable and potentially quantifiable the preparatory interpretive work that contributes to downstream outputs, with implications for labor accounting and human capital valuation.

Theoretical economic and policy discussion in the paper; no empirical measurement or case studies provided to quantify how much preparatory work is captured or its economic value.

medium positive Chasing RATs: Tracing Reading for and as Creative Activity visibility/quantifiability of interpretive labor and potential economic valuatio...

RATs can enable collective sensemaking via shared trails and networked associations among readers.

Conceptual argument and suggested network-analysis methods; illustrated with the speculative WikiRAT use case. No group-level empirical studies reported.

medium positive Chasing RATs: Tracing Reading for and as Creative Activity collective sensemaking artifacts (shared trails, co-read graphs, group understan...

RATs can support richer reader models (personalization and modeling of interpretive behavior) through sequence analysis, embedding/clustering of trajectories, and other analytic techniques.

Proposed analytical methods (sequence analysis, embedding/clustering, network analysis) listed in the paper; no implementation results or quantitative evaluations provided.

medium positive Chasing RATs: Tracing Reading for and as Creative Activity reader model quality (personalization accuracy, representation of interpretive b...

RATs enable reflective practice by helping readers see and revise their own processes.

Proposed affordance in the paper based on the inspectable nature of RATs and the WikiRAT illustration; suggested as a potential use case rather than empirically demonstrated.

medium positive Chasing RATs: Tracing Reading for and as Creative Activity changes in reflective behavior or self-revision of reading processes

RATs treat reading as a dual kind of creation: (a) creative input work that shapes future artifacts, and (b) a form of creation whose traces are valuable artifacts themselves.

Theoretical proposal and design rationale presented in the paper; illustrated via a speculative prototype (WikiRAT). No empirical validation provided.

medium positive Chasing RATs: Tracing Reading for and as Creative Activity recognition/value assigned to reading traces as artifacts

Reading Activity Traces (RATs) reconceptualize reading — including navigation, interpretation, and curation across interconnected sources — as creative labor.

Conceptual argument in the paper; supported by theoretical framing and literature review rather than empirical data. No sample size or deployment reported.

medium positive Chasing RATs: Tracing Reading for and as Creative Activity conceptual reclassification of reading (visibility/recognition of interpretive l...

The method lowers the technical barrier for adopting surrogates in economics by removing dependence on specialized Bayesian neural-network techniques while preserving rigorous uncertainty quantification.

Argument in Implications section: decoupling uncertainty quantification from network architecture allows use of deterministic NNs with MCMC-sampled parameter inputs; no user-study or adoption metrics provided.

medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... ease of adoption / reduction in technical complexity required to obtain UQ-prese...

The theoretical diagnostic (linking distribution mismatch to performance loss) gives practitioners a practical tool to detect when a surrogate trained on one parameter distribution will underperform after recalibration or policy changes.

Paper-provided theoretical result and suggested diagnostic use; empirical validation of the diagnostic is implied but not detailed in the summary.

medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... diagnostic effectiveness in detecting performance degradation under distribution...

Conditioning a deterministic surrogate on MCMC-drawn parameter samples dramatically reduces computation (training and/or evaluation wall-clock time) compared with approaches that sample network weights (Bayesian NNs) or exhaustively explore parameter grids.

Computational evaluation reported in the paper includes empirical examples demonstrating substantial reductions in wall-clock training/evaluation time relative to weight-sampling or exhaustive-parameter-grid baselines (exact datasets, runtimes, and sample sizes not detailed in the summary).

medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... wall-clock training time and evaluation time

Training a deterministic neural surrogate conditioned on MCMC-drawn parameter samples reproduces the original (forward) model's uncertainty quantification while avoiding embedding parametric uncertainty inside the network weights.

Methodological description: surrogate is a deterministic NN whose inputs include parameter vectors drawn by MCMC from the model-parameter posterior; uncertainty is recovered by repeatedly evaluating the trained surrogate on those MCMC draws. Empirical examples are reported (details not provided here) showing reproduction of model uncertainty.

medium positive MCMC Informed Neural Emulators for Uncertainty Quantificatio... fidelity of uncertainty quantification / posterior predictive distributions prod...
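The workflow in this claim can be sketched end to end on a toy problem. Everything below is assumed for illustration: the forward model `forward` (a * exp(-b * x)), the Gaussian log-posterior, and the random-walk Metropolis settings. In place of a neural network, the sketch uses a deterministic polynomial least-squares regressor (`poly_features`), swapped in to keep the code NumPy-only. The structure is the point. Parameter vectors drawn by MCMC are inputs to a deterministic surrogate, and uncertainty comes from re-evaluating that surrogate on every draw rather than from random network weights.

```python
import numpy as np
from itertools import combinations_with_replacement

def forward(x, theta):
    """Hypothetical 'expensive' model y = a * exp(-b * x), theta rows = (a, b)."""
    return theta[:, 0] * np.exp(-theta[:, 1] * x)

def log_post(theta):
    """Hypothetical log-posterior: independent Gaussians around (1.0, 0.5)."""
    return -0.5 * np.sum(((theta - np.array([1.0, 0.5])) / 0.05) ** 2)

def metropolis(log_p, theta0, n, step, rng):
    """Random-walk Metropolis sampler returning an (n, d) chain."""
    theta, lp = theta0.copy(), log_p(theta0)
    chain = np.empty((n, theta0.size))
    for i in range(n):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_p(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

def poly_features(Z, degree):
    """All monomials of the columns of Z up to `degree` (stand-in for a NN)."""
    cols = [np.ones(len(Z))]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(Z.shape[1]), deg):
            cols.append(np.prod(Z[:, list(idx)], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
draws = metropolis(log_post, np.array([1.0, 0.5]), 6000, 0.05, rng)[1000:]  # drop burn-in

# Training inputs: x values paired with parameter vectors taken from the MCMC draws
x_train = rng.uniform(0.0, 2.0, 4000)
theta_train = draws[rng.integers(0, len(draws), 4000)]
Z = np.column_stack([x_train, theta_train])
mu, sd = Z.mean(axis=0), Z.std(axis=0)  # input standardization
coef, *_ = np.linalg.lstsq(poly_features((Z - mu) / sd, 6),
                           forward(x_train, theta_train), rcond=None)

def surrogate(x, thetas):
    """Deterministic surrogate: one prediction per (x, theta) input row."""
    Zq = np.column_stack([np.full(len(thetas), x), thetas])
    return poly_features((Zq - mu) / sd, 6) @ coef

# UQ: push every MCMC draw through the cheap surrogate instead of the model
pred = surrogate(1.5, draws)
ref = forward(1.5, draws)
print(f"surrogate mean={pred.mean():.4f} sd={pred.std():.4f}")
print(f"model     mean={ref.mean():.4f} sd={ref.std():.4f}")
```

Because the surrogate's coefficients are fixed after training, the spread of `pred` comes entirely from the spread of the MCMC draws. That is the separation of parametric uncertainty from model weights that the claim describes.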

Fewer expensive evaluations translate directly to lower compute hours and therefore lower cloud/on-premise costs for computational materials or chemistry R&D.

Implication discussed in the paper's implications section: economic argument linking reduced expensive evaluations to lower compute cost; not an experimental result but an economic extrapolation based on the reported reduction in evaluations.

medium positive Bayesian Optimization with Gaussian Processes to Accelerate ... compute hours / monetary cost per scientific result

Correct application of the described elements (GP with derivatives, inverse-distance kernels, active acquisition, OT sampling, MAP regularization, trust-region control, RFF scaling) reduces the number of expensive underlying-theory (energy/force) evaluations by roughly an order of magnitude while preserving underlying-theory accuracy.

Empirical claim reported in the paper: benchmarks and experiments on representative potential energy surface problems (specific datasets and numerical results are said to be presented in the paper and accompanying code); summary states an approximately one order-of-magnitude reduction in expensive evaluations with preserved accuracy.

medium positive Bayesian Optimization with Gaussian Processes to Accelerate ... number of expensive energy/force evaluations required to reach a given accuracy ...

Random Fourier features are used to decouple hyperparameter training from prediction, yielding favorable computational scaling for high-dimensional systems.

Paper describes use of random Fourier features to approximate kernels so hyperparameter fitting can be done largely independently of prediction-time complexity; complexity/scaling claims supported by methodological argument and empirical timings in the paper/code.

medium positive Bayesian Optimization with Gaussian Processes to Accelerate ... computational scaling (training vs prediction time) in higher-dimensional config...
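Random Fourier features (Rahimi and Recht's construction) replace a shift-invariant kernel with an explicit, finite feature map whose inner products approximate the kernel. The sketch below assumes a squared-exponential (RBF) kernel, an arbitrary lengthscale, and random 6-D inputs. It shows only the kernel approximation, not how the paper ties it to hyperparameter training.

```python
import numpy as np

def rff_features(X, n_features, lengthscale, rng):
    """Random Fourier features for the RBF kernel exp(-|x - y|^2 / (2 l^2))."""
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale  # samples from the kernel's spectrum
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)           # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rbf(X, Y, lengthscale):
    """Exact RBF kernel matrix, for comparison."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))  # 50 points in a hypothetical 6-D configuration space
Phi = rff_features(X, 4000, lengthscale=2.0, rng=rng)
K_approx = Phi @ Phi.T            # inner products of explicit features
K_exact = rbf(X, X, lengthscale=2.0)
print(np.abs(K_approx - K_exact).max())
```

With the explicit map, a GP reduces to Bayesian linear regression over the D feature weights. The predictive mean then costs O(dD) per query point (computing the features and one dot product), independent of the number of training points, which is the usual source of the favorable scaling.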

MAP regularization via a variance barrier plus oscillation detection prevents surrogate-induced pathologies and non-convergent search behavior.

Paper describes MAP priors (variance barrier) and oscillation-detection diagnostics as regularization and robustness measures; authors report these measures prevent instabilities in surrogate-driven searches in their experiments.

medium positive Bayesian Optimization with Gaussian Processes to Accelerate ... incidence of surrogate-induced instabilities or non-convergence in optimization ...

Using Optimal Transport (Earth Mover’s Distance) for farthest-point sampling diversifies the training points in configuration space.

Paper introduces EMD-based farthest-point sampling as an extension and reports its use in experiments; implementation described in methods and code.

medium positive Bayesian Optimization with Gaussian Processes to Accelerate ... diversity of training points sampled in configuration space (sampling distributi...
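A farthest-point sampler only needs a distance between configurations, and plugging in an Earth Mover's Distance gives an OT-based variant. The sketch assumes configurations with the same number of atoms, uniform weights, and no species labels or rotational alignment. Under those assumptions, EMD reduces to the cheapest one-to-one atom matching. The sketch brute-forces that matching over permutations, so it only works for a handful of atoms; a real implementation would use an assignment or OT solver. Function names and the toy configurations are hypothetical.

```python
import itertools
import math
import random

def emd_equal_weights(A, B):
    """EMD between two equal-size, uniformly weighted point sets: the cheapest
    one-to-one matching of points, averaged over points. Brute force over all
    permutations, so only suitable for a handful of atoms."""
    n = len(A)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(A[i], B[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best / n

def farthest_point_sampling(configs, k, dist=emd_equal_weights):
    """Greedy FPS: repeatedly add the configuration farthest (in `dist`)
    from everything selected so far."""
    chosen = [0]  # arbitrary starting configuration
    min_d = [dist(c, configs[0]) for c in configs]
    while len(chosen) < k:
        nxt = max(range(len(configs)), key=lambda i: min_d[i])
        chosen.append(nxt)
        min_d = [min(m, dist(c, configs[nxt])) for m, c in zip(min_d, configs)]
    return chosen

# Hypothetical 4-atom configurations: one reference geometry plus noise
random.seed(0)
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
configs = [[tuple(c + random.gauss(0.0, s) for c in atom) for atom in base]
           for s in (0.01, 0.02, 0.5, 0.01, 1.0)]
print(farthest_point_sampling(configs, 3))
```

Because EMD matches atoms optimally, relabeling the atoms of a configuration does not change its distance to the others. The sampler therefore spreads selections over distinct geometries rather than over atom orderings.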

Inverse-distance kernels better capture atomic interactions in configuration space than generic kernels for these surrogate models.

Paper argues and uses inverse-distance kernel design to reflect physical interatomic distance dependence; benchmark comparisons reported in the paper (details in main text and codebase).

medium positive Bayesian Optimization with Gaussian Processes to Accelerate ... surrogate quality / predictive accuracy on atomic configurations (kernel perform...
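One common way to build an inverse-distance kernel is to describe each configuration by the vector of inverse pairwise distances 1/r_ij and apply a squared-exponential kernel to those vectors. The sketch below assumes that form with hypothetical hyperparameters and a water-like toy geometry; the paper's exact kernel is not given in the summary. Because only pairwise distances enter, the kernel is unchanged by rigid rotations and translations, which the example checks.

```python
import numpy as np

def inverse_distances(R):
    """Feature vector of 1/r_ij over all atom pairs i < j (R: (n_atoms, 3))."""
    i, j = np.triu_indices(len(R), k=1)
    return 1.0 / np.linalg.norm(R[i] - R[j], axis=1)

def inverse_distance_kernel(RA, RB, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on inverse-distance features (hypothetical
    hyperparameters); close-range pairs dominate because 1/r grows as r -> 0."""
    diff = inverse_distances(RA) - inverse_distances(RB)
    return variance * np.exp(-0.5 * np.dot(diff, diff) / lengthscale ** 2)

R = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])  # water-like
angle = 0.7
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
R_moved = R @ Rz.T + np.array([1.0, -2.0, 0.5])  # rigid rotation + translation
R_stretched = R.copy()
R_stretched[1, 0] = 0.5                          # genuinely different geometry

print(inverse_distance_kernel(R, R_moved))
print(inverse_distance_kernel(R, R_stretched))
```

Since 1/r_ij grows quickly as atoms approach each other, short-range geometry dominates the kernel distance. That is the physical motivation the claim points to.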

Gaussian process (GP) surrogates that incorporate derivative observations (e.g., forces) improve the fidelity of the surrogate model and provide better local estimates of gradients and Hessians.

Paper describes GP regression with value and derivative observations used to constrain the surrogate; experiments/benchmarks reported in the paper and code demonstrate use of derivative observations in surrogate training (exact datasets and sample sizes referenced in paper/code).

medium positive Bayesian Optimization with Gaussian Processes to Accelerate ... surrogate fidelity as assessed by local gradient/Hessian accuracy and downstream...
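For a squared-exponential kernel k, derivative observations enter a GP through the kernel's derivatives: cov(f(a), f'(b)) = dk/db and cov(f'(a), f'(b)) = d^2k/(da db). The 1-D sketch below conditions on values and derivatives of a known function. Here sin stands in for an energy, and its derivative plays the role of the negative force. The kernel, hyperparameters, and function names are assumptions, not the paper's setup.

```python
import numpy as np

def k_blocks(a, b, ell=1.0, s2=1.0):
    """RBF kernel blocks for joint (value, derivative) observations in 1-D.

    k(a, b)           = s2 * exp(-(a - b)^2 / (2 ell^2))
    cov(f(a), f'(b))  = dk/db      = k * (a - b) / ell^2
    cov(f'(a), f'(b)) = d2k/da db  = k * (1/ell^2 - (a - b)^2 / ell^4)
    """
    d = a[:, None] - b[None, :]
    k = s2 * np.exp(-0.5 * d ** 2 / ell ** 2)
    k_fd = k * d / ell ** 2
    k_dd = k * (1.0 / ell ** 2 - d ** 2 / ell ** 4)
    return k, k_fd, k_dd

def gp_predict(x_star, X, y, dy=None, ell=1.0, s2=1.0, jitter=1e-8):
    """Posterior mean and variance of f(x_star); also uses derivatives dy if given."""
    k_xx, k_fd, k_dd = k_blocks(X, X, ell, s2)
    k_sx, k_sd, _ = k_blocks(x_star, X, ell, s2)
    if dy is None:
        K, cross, targets = k_xx, k_sx, y
    else:
        # Joint covariance of [f(X); f'(X)] and cross-covariance with f(x_star)
        K = np.block([[k_xx, k_fd], [k_fd.T, k_dd]])
        cross = np.hstack([k_sx, k_sd])
        targets = np.concatenate([y, dy])
    K = K + jitter * np.eye(len(K))
    mean = cross @ np.linalg.solve(K, targets)
    var = s2 - np.einsum("ij,ji->i", cross, np.linalg.solve(K, cross.T))
    return mean, var

X = np.linspace(0.0, 3.0, 4)
xs = np.array([0.5, 1.5, 2.5])
m_val, v_val = gp_predict(xs, X, np.sin(X))                  # values only
m_der, v_der = gp_predict(xs, X, np.sin(X), dy=np.cos(X))    # values + derivatives
print(np.abs(m_val - np.sin(xs)).max(), np.abs(m_der - np.sin(xs)).max())
print(v_val, v_der)
```

Conditioning on extra (derivative) observations can only shrink the posterior variance. In this toy case, the mean also tracks sin(x) closely between training points. For energies, the derivative targets would be -F, since forces are negative energy gradients.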

Greater ROI may come from investing in better feedback models (how to use feedback) than solely collecting richer feedback sources.

Empirical finding that feedback model choice often produced larger retrieval-quality improvements than changing the feedback source across the evaluated tasks and methods.

medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Return on investment (performance improvement per resource invested in model vs....

The study's results clarify which elements of the PRF design space are most important to prioritize in practice (i.e., prioritize feedback-model improvements over source collection in many low-resource settings).

Comparative performance gains observed in controlled experiments showing larger effect sizes from varying feedback model than from varying source, combined with cost analyses.

medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Relative impact on retrieval performance and cost-effectiveness

Across 13 low-resource BEIR tasks and five LLM PRF methods, the choice of feedback model (how feedback is applied) critically affects retrieval effectiveness.

Empirical results reported over 13 BEIR tasks using five LLM-based PRF methods, with systematic variation of feedback model.

medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Retrieval effectiveness (standard BEIR metrics)

Purely LLM-generated feedback yields the best cost-effectiveness overall (best performance per unit LLM invocation cost) for low-resource retrieval tasks.

Cost-effectiveness analysis in experiments across 13 BEIR tasks and five PRF methods that accounted for LLM invocation cost versus retrieval gains.

medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Cost-effectiveness (retrieval gains per LLM invocation cost)

Feedback model choice can have a larger impact on retrieval quality than feedback source.

Controlled experiments comparing five LLM-based PRF methods across 13 low-resource BEIR tasks, measuring retrieval effectiveness with standard BEIR metrics.

medium positive A Systematic Study of Pseudo-Relevance Feedback with LLMs Retrieval effectiveness (standard BEIR retrieval metrics)
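The study's distinction between feedback source and feedback model can be made concrete. The source supplies feedback texts (top retrieved documents or LLM-generated passages), and the feedback model decides how those texts change the query. The sketch below implements one classic feedback model, an RM3-style interpolation of query and feedback term distributions. It is an illustrative assumption, not one of the five LLM PRF methods evaluated, and the parameter values and example texts are hypothetical.

```python
from collections import Counter

def term_distribution(texts):
    """Maximum-likelihood term distribution over a list of texts."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def expand_query(query, feedback_texts, lam=0.5, n_terms=5, stopwords=frozenset()):
    """RM3-style feedback model: interpolate the query's term distribution
    with the renormalized top-n terms of the feedback texts. The texts may
    come from any source; this function only decides how they are applied."""
    q = term_distribution([query])
    fb = {w: p for w, p in term_distribution(feedback_texts).items()
          if w not in stopwords}
    top = dict(sorted(fb.items(), key=lambda kv: kv[1], reverse=True)[:n_terms])
    norm = sum(top.values())
    weights = {w: (1 - lam) * p for w, p in q.items()}
    for w, p in top.items():
        weights[w] = weights.get(w, 0.0) + lam * p / norm
    return weights

stop = frozenset({"the", "of", "and", "a", "in", "is"})
llm_passages = ["Statins lower LDL cholesterol and reduce cardiovascular risk",
                "LDL cholesterol is reduced by statins in most patients"]
w = expand_query("statin side effects", llm_passages, lam=0.4, n_terms=3,
                 stopwords=stop)
print(sorted(w.items(), key=lambda kv: -kv[1]))
```

Holding `llm_passages` fixed while swapping `expand_query` for a different feedback model isolates the effect of the model. Swapping the passages for retrieved documents isolates the effect of the source.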

Demand will grow for hybrid specialists (quantum algorithm engineers, HPC systems integrators, middleware developers) and for domain scientists fluent in hybrid workflows, shifting skill premiums toward interdisciplinary expertise.

Labor-market inference from technology adoption and the skills required by proposed QCSC systems; qualitative only, no labor-market survey data provided.

medium positive Reference Architecture of a Quantum-Centric Supercomputer demand for specific skills, wage premiums for interdisciplinary expertise

Public investment and shared facilities can mitigate entry barriers and diffuse benefits to smaller firms and research groups.

Policy analysis and precedent from shared scientific infrastructure models; no case-study data specific to QCSC presented.

medium positive Reference Architecture of a Quantum-Centric Supercomputer access to QCSC resources by small firms/research groups, reduction in entry barr...

Tightly integrating QPUs, GPUs, and CPUs across hardware, middleware, and application layers (QCSC vision) will enable high-throughput, low-latency hybrid workflows.

Architectural design reasoning and analogies to heterogeneous co-design in classical HPC; no empirical throughput/latency measurements provided.

medium positive Reference Architecture of a Quantum-Centric Supercomputer throughput and end-to-end latency of hybrid quantum-classical workflows

A phased roadmap (offload engines → middleware-coupled heterogeneous systems → fully co-designed heterogeneous systems) and a reference architecture can remove current friction (manual orchestration, scheduling, data transfer) and materially accelerate algorithmic discovery and applied quantum utility.

Roadmap and reference architecture proposed from system decomposition and use-case requirements analysis; argument based on observed friction points from literature and early hybrid deployments; no empirical validation provided.

medium positive Reference Architecture of a Quantum-Centric Supercomputer reduction in manual orchestration, scheduling overhead, data-movement latency; i...

Quantum-Centric Supercomputing (QCSC) — integrated systems co-designing QPUs with classical HPC components and middleware — is necessary to scale hybrid quantum-classical algorithms for chemistry, materials, and other applied research.

Conceptual systems-architecture analysis and synthesis of recent quantum-simulation demonstrations and hybrid algorithms; use-case-driven analysis for chemistry and materials; no new empirical performance benchmarks presented.

medium positive Reference Architecture of a Quantum-Centric Supercomputer scalability and practicability of hybrid quantum-classical algorithm execution (...

Adoption of GNN-based, FL-coordinated beam management can provide competitive differentiation by offering more reliable NTN services in challenging geometries (e.g., low-elevation, edge coverage).

Synthesized implication from experimental results showing improved GNN performance at low elevation angles and the marketing/economic discussion in the paper; no market adoption or field-deployment evidence provided.

medium positive Federated Learning-driven Beam Management in LEO 6G Non-Terr... service reliability in challenging geometries (e.g., low-elevation coverage) and...

FL via HAPS reduces data-centralization costs (bandwidth and storage) and improves privacy compared to sending raw channel data to a central server.

Implication drawn from the FL design used: federated aggregation reduces need to backhaul raw channel samples; paper lists bandwidth/storage and privacy advantages as economic/operational implications (no quantified cost measurements provided).

medium positive Federated Learning-driven Beam Management in LEO 6G Non-Terr... backhaul bandwidth and storage requirements; privacy exposure (qualitative)

The GNN solution is lightweight enough for practical on-board or edge deployment in NTN contexts.

Paper asserts the GNN is lightweight and suitable for on-board or HAPS/edge deployment; model described as designed to be compact for constrained compute/link budgets (no exact parameter counts provided in summary).

medium positive Federated Learning-driven Beam Management in LEO 6G Non-Terr... compute/footprint suitability for on-board or edge deployment (model lightweight...

Federated learning across LEO orbital planes, coordinated via HAPS, enables efficient distributed beam selection for Non-Terrestrial Networks (NTNs).

Experimental design in the paper: federated learning paradigm with orbital-plane clients and HAPS acting as aggregation/coordination points; evaluated on beam-prediction tasks using realistic channel/beamforming datasets and distributed training (no central pooling of raw samples).

medium positive Federated Learning-driven Beam Management in LEO 6G Non-Terr... beam prediction accuracy and stability in a distributed (federated) training set...
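A standard way to coordinate such training is FedAvg. Each client trains locally, and an aggregator averages the resulting parameters weighted by client sample counts. The sketch below assumes that rule, with one client per orbital plane and the HAPS as aggregator. A least-squares linear model stands in for the GNN, and the data, learning rate, and round counts are hypothetical. Only parameter vectors cross the client boundary; raw samples stay local.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=20):
    """Full-batch gradient descent on a least-squares loss (hypothetical
    stand-in for training the beam-prediction GNN on one orbital plane)."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: sample-size-weighted mean of client parameters."""
    sizes = np.asarray(client_sizes, dtype=float)
    return np.tensordot(sizes / sizes.sum(), np.stack(client_weights), axes=1)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
clients = []  # one client per orbital plane; raw (X, y) never leaves it
for n in (40, 80, 120):
    X = rng.standard_normal((n, 3))
    clients.append((X, X @ w_true + 0.01 * rng.standard_normal(n)))

w_global = np.zeros(3)
for _ in range(10):  # communication rounds coordinated by the HAPS
    local = [local_update(w_global, X, y) for X, y in clients]
    w_global = fedavg(local, [len(y) for _, y in clients])
print(np.round(w_global, 2))
```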

DPS compares favorably to standard rollout-based prompt-selection baselines across the reported metrics (rollouts required, training speed, final accuracy).

Empirical comparisons against baseline methods reported in the experiments; specific numeric comparisons and statistical details are not present in the provided summary.

medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... relative performance vs baseline on number of rollouts, training speed, and fina...

DPS creates a predictive prior that identifies informative prompts without performing exhaustive rollouts over large candidate batches.

Methodological mechanism plus empirical claim that selection operates via predictive prior and reduces candidate rollouts; supported by experiments vs rollout-filtering baselines.

medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... informativeness of selected prompts (as implied by downstream learning gains and...

The DPS inference procedure requires only historical rollout reward signals and therefore adds only a small amount of extra compute compared to the rollouts it avoids.

Practical considerations described in the paper: inference uses past rollout rewards; authors state the extra compute is small relative to avoided rollouts. (No quantified compute-cost ratio in the summary.)

medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... additional inference compute relative to avoided rollout compute
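The summary does not give DPS's predictive model, so the sketch below illustrates only the general idea, under hypothetical choices. It keeps a per-prompt estimate of success probability built solely from past rollout rewards (a discounted Beta posterior here) and spends rollouts on prompts predicted to be neither always solved nor never solved. The class name, the discount factor, and the p(1 - p) selection score are assumptions, not the paper's method. Updating the estimate costs a few arithmetic operations per observed reward, which is the kind of overhead the claim contrasts with the rollouts it avoids.

```python
import numpy as np

class PromptPrior:
    """Hypothetical per-prompt success model built only from past binary
    rollout rewards. A discount factor lets old evidence fade as the policy
    changes. Illustrative only; not the DPS model itself."""

    def __init__(self, n_prompts, discount=0.9):
        self.alpha = np.ones(n_prompts)  # Beta(1, 1) prior: pseudo-successes
        self.beta = np.ones(n_prompts)   # pseudo-failures
        self.discount = discount

    def update(self, prompt_ids, rewards):
        """Fold in 0/1 rewards from rollouts that were actually run."""
        for i, r in zip(prompt_ids, rewards):
            self.alpha[i] = self.discount * self.alpha[i] + r
            self.beta[i] = self.discount * self.beta[i] + (1 - r)

    def select(self, k):
        """Return the k prompts with the highest predicted p * (1 - p).
        Prompts expected to be all-solved or all-failed give little
        group-relative learning signal, so they rank low."""
        p = self.alpha / (self.alpha + self.beta)
        return np.argsort(-(p * (1 - p)))[:k]

prior = PromptPrior(n_prompts=4)
prior.update([0, 0, 0, 0], [1, 1, 1, 1])  # prompt 0: always solved so far
prior.update([1, 1, 1, 1], [0, 0, 0, 0])  # prompt 1: never solved so far
prior.update([2, 2, 2, 2], [1, 0, 1, 0])  # prompt 2: sometimes solved
print(prior.select(2))  # selects prompts 2 and 3 (prompt 3 is still unexplored)
```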

DPS improves final reasoning performance (final task accuracy) across evaluated domains: mathematical reasoning, planning, and visual-geometry tasks.

Empirical results reported across those benchmark domains showing improved downstream reasoning accuracy relative to baselines. (Summary does not include exact effect sizes or sample counts.)

medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... final reasoning accuracy on benchmarks (mathematics, planning, visual-geometry)

DPS speeds up RL finetuning in terms of required rollout budgets and wall-clock rollout compute.

Reported empirical findings: faster convergence of RL finetuning measured by rollout budgets and wall-clock compute on evaluated tasks. (Exact runtime metrics and sample sizes not provided in the summary.)

medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... training speed (rollout budget to convergence; wall-clock rollout compute)

Compared to standard online prompt-selection methods that rely on large candidate-batch rollouts for filtering, DPS substantially reduces the number of redundant (uninformative) rollouts.

Empirical comparisons against rollout-based filtering baselines across benchmark tasks (mathematics, planning, visual-geometry). Specific numeric savings not provided in the summary.

medium positive Dynamics-Predictive Sampling for Active RL Finetuning of Lar... number of rollouts (redundant rollouts avoided)

Structural fixes — altering environment design or policy class to ensure the induced Markov chain is ergodic (e.g., ensuring mixing/recurrence or preventing absorbing bad states) — can eliminate the ensemble/time-average gap.

Paper discussion and examples suggesting interventions to change chain structure; conceptual/theoretical proposal supported by illustrative examples (no empirical deployment studies).

medium positive Ergodicity in reinforcement learning ergodicity of induced dynamics and resulting alignment of ensemble and time-aver...
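A three-state toy chain illustrates both the gap and the structural fix. From a start state, the agent falls into a "good" state (reward 1) or a "bad" state (reward 0) with equal probability. If both states are absorbing, the ensemble-average reward is 0.5 at every step, yet each single trajectory's time average is exactly 0 or 1. Adding a small switching probability makes the chain ergodic, and every trajectory's time average then approaches the ensemble value of 0.5. The chain and its probabilities are hypothetical, not an example taken from the paper.

```python
import random

def simulate(p_escape, steps, rng):
    """Time-average reward of one trajectory of a 3-state chain.

    State 0 (start) moves to state 1 ('good', reward 1) or state 2
    ('bad', reward 0) with equal probability. With p_escape = 0 both are
    absorbing (non-ergodic); with p_escape > 0 the chain can switch
    between them (ergodic)."""
    state = 0
    total = 0
    for _ in range(steps):
        if state == 0:
            state = rng.choice((1, 2))
        elif rng.random() < p_escape:
            state = 3 - state            # switch good <-> bad
        total += 1 if state == 1 else 0  # reward 1 only in 'good'
    return total / steps

rng = random.Random(0)
for p in (0.0, 0.1):
    avgs = [simulate(p, 20_000, rng) for _ in range(5)]
    # p = 0.0: every average is exactly 0.0 or 1.0; p = 0.1: each is close to 0.5
    print(p, [round(a, 3) for a in avgs])
```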
