Evidence (11677 claims)

Claims by theme:

- Adoption: 7395 claims
- Productivity: 6507 claims
- Governance: 5921 claims
- Human-AI Collaboration: 5192 claims
- Org Design: 3497 claims
- Innovation: 3492 claims
- Labor Markets: 3231 claims
- Skills & Training: 2608 claims
- Inequality: 1842 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 738 | 1617 |
| Governance & Regulation | 671 | 334 | 160 | 99 | 1285 |
| Organizational Efficiency | 626 | 147 | 105 | 70 | 955 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 349 | 109 | 48 | 322 | 838 |
| Output Quality | 391 | 121 | 45 | 40 | 597 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 277 | 145 | 63 | 34 | 526 |
| AI Safety & Ethics | 189 | 244 | 59 | 30 | 526 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 106 | 40 | 6 | 188 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 79 | 8 | 1 | 152 |
| Regulatory Compliance | 69 | 66 | 14 | 3 | 152 |
| Training Effectiveness | 82 | 16 | 13 | 18 | 131 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Correct application of the described elements (GP with derivatives, inverse-distance kernels, active acquisition, OT sampling, MAP regularization, trust-region control, RFF scaling) reduces the number of expensive underlying-theory (energy/force) evaluations by roughly an order of magnitude while preserving underlying-theory accuracy.
Empirical claim reported in the paper: benchmarks and experiments on representative potential energy surface problems (specific datasets and numerical results are said to be presented in the paper and accompanying code); summary states an approximately one order-of-magnitude reduction in expensive evaluations with preserved accuracy.
Random Fourier features are used to decouple hyperparameter training from prediction, yielding favorable computational scaling for high-dimensional systems.
Paper describes use of random Fourier features to approximate kernels so hyperparameter fitting can be done largely independently of prediction-time complexity; complexity/scaling claims supported by methodological argument and empirical timings in the paper/code.
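As a concrete illustration of the decoupling idea, here is a minimal NumPy sketch of random Fourier features approximating a squared-exponential kernel; the paper's actual kernel (inverse-distance based), hyperparameter-fitting procedure, and feature counts are not reproduced, and all names below are illustrative only.

```python
import numpy as np

def rff_features(X, n_features=500, lengthscale=1.0, rng=None):
    """Map inputs X (n, d) to random Fourier features approximating
    an RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Frequencies sampled from the kernel's spectral density (Gaussian).
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Hyperparameters can then be fit on the explicit feature map (a linear model),
# decoupling hyperparameter training cost from prediction-time kernel evaluations.
X = np.random.default_rng(0).normal(size=(200, 6))
Z = rff_features(X, n_features=1000, lengthscale=1.5, rng=1)
K_approx = Z @ Z.T                      # approximate kernel matrix
K_exact = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / (2 * 1.5 ** 2))
print(np.abs(K_approx - K_exact).max())  # small approximation error
```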
MAP regularization via a variance barrier plus oscillation detection prevents surrogate-induced pathologies and non-convergent search behavior.
Paper describes MAP priors (variance barrier) and oscillation-detection diagnostics as regularization and robustness measures; authors report these measures prevent instabilities in surrogate-driven searches in their experiments.
Using Optimal Transport (Earth Mover’s Distance) for farthest-point sampling diversifies the training points in configuration space.
Paper introduces EMD-based farthest-point sampling as an extension and reports its use in experiments; implementation described in methods and code.
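A minimal sketch of greedy farthest-point sampling under an Earth Mover's Distance, assuming each configuration is summarized by a 1-D distribution of interatomic distances (a simplification; the paper's EMD formulation over configurations may be richer). The function and toy data below are illustrative, not taken from the paper or its code.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def emd_farthest_point_sampling(configs, k, seed_index=0):
    """Greedy farthest-point sampling with a 1-D Earth Mover's Distance.

    Each configuration is represented by its sorted interatomic-distance list.
    At each step the candidate farthest (in EMD) from the already-selected set
    is added, diversifying the training set."""
    selected = [seed_index]
    # Minimum EMD of every candidate to the selected set.
    min_d = np.array([wasserstein_distance(configs[seed_index], c) for c in configs])
    while len(selected) < k:
        nxt = int(np.argmax(min_d))
        selected.append(nxt)
        d_new = np.array([wasserstein_distance(configs[nxt], c) for c in configs])
        min_d = np.minimum(min_d, d_new)
    return selected

# Example: 100 synthetic "configurations", each a vector of interatomic distances.
rng = np.random.default_rng(0)
configs = [np.sort(rng.uniform(0.8, 5.0, size=45)) for _ in range(100)]
print(emd_farthest_point_sampling(configs, k=10))
```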
Inverse-distance kernels better capture atomic interactions in configuration space than generic kernels for these surrogate models.
Paper argues and uses inverse-distance kernel design to reflect physical interatomic distance dependence; benchmark comparisons reported in the paper (details in main text and codebase).
Gaussian process (GP) surrogates that incorporate derivative observations (e.g., forces) improve the fidelity of the surrogate model and provide better local estimates of gradients and Hessians.
Paper describes GP regression with value and derivative observations used to constrain the surrogate; experiments/benchmarks reported in the paper and code demonstrate use of derivative observations in surrogate training (exact datasets and sample sizes referenced in paper/code).
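To show how derivative observations enter a GP, here is a minimal 1-D sketch with a squared-exponential kernel, jointly conditioning on function values and first derivatives. The paper uses inverse-distance kernels and multi-dimensional force observations, so this is only a schematic of the general mechanism.

```python
import numpy as np

def kernel_blocks(x1, x2, l=1.0):
    """Squared-exponential kernel and its derivative blocks (1-D case)."""
    r = x1[:, None] - x2[None, :]
    base = np.exp(-r**2 / (2 * l**2))
    k_vv = base                                  # value-value
    k_vd = (r / l**2) * base                     # value-derivative (d/dx2)
    k_dv = (-r / l**2) * base                    # derivative-value (d/dx1)
    k_dd = (1.0 / l**2 - r**2 / l**4) * base     # derivative-derivative
    return k_vv, k_vd, k_dv, k_dd

def gp_with_derivatives(x_train, y, dy, x_test, l=1.0, noise=1e-8):
    """GP posterior mean using both values y and derivatives dy as observations."""
    k_vv, k_vd, k_dv, k_dd = kernel_blocks(x_train, x_train, l)
    K = np.block([[k_vv, k_vd], [k_dv, k_dd]]) + noise * np.eye(2 * len(x_train))
    ks_vv, ks_vd, _, _ = kernel_blocks(x_test, x_train, l)
    Ks = np.hstack([ks_vv, ks_vd])   # cross-covariance: test values vs. train observations
    alpha = np.linalg.solve(K, np.concatenate([y, dy]))
    return Ks @ alpha

# Toy example: learn f(x) = sin(x) from a few values plus exact derivatives (cos).
x_tr = np.array([0.0, 1.5, 3.0])
x_te = np.linspace(0, 3, 7)
mean = gp_with_derivatives(x_tr, np.sin(x_tr), np.cos(x_tr), x_te)
print(np.round(mean - np.sin(x_te), 3))   # residuals are small
```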
Practical modalities exist for efficient classical estimation of gradients for the covered loss classes: using the classical-approximation machinery to compute analytic gradients or unbiased estimators, finite-difference approaches, and surrogate methods; the paper discusses sample complexity and noise considerations.
Methodological discussion in the paper outlining specific gradient estimation approaches compatible with the classical-approximation results, together with complexity/sample-complexity remarks. This is a methods/algorithmic claim supported by analysis rather than empirical benchmarks.
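Of the gradient-estimation routes listed above, the finite-difference variant is the simplest to sketch; the snippet below is a generic central-difference estimator and does not reflect the paper's classical-approximation machinery or its sample-complexity and noise analysis.

```python
import numpy as np

def finite_difference_grad(loss, theta, h=1e-5):
    """Central finite-difference estimate of the gradient of a scalar loss."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (loss(theta + e) - loss(theta - e)) / (2 * h)
    return grad

# Example on a quadratic loss; gradient of sum(theta^2) is 2*theta.
print(finite_difference_grad(lambda t: (t ** 2).sum(), np.array([1.0, -2.0, 0.5])))
# ~ [ 2. -4.  1.]
```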
The paper constructs a single-hyperparameter family of BSBMs that monotonically interpolates from weak expressive power up to full universality, enabling a controlled trade-off between simplicity and expressivity.
Explicit one-parameter family construction and monotonicity argument/proof in the paper showing that increasing the hyperparameter increases expressivity and approaches universality. This is a theoretical construction rather than empirical measurement.
Classical hardness of exact or approximate sampling from the expanded (ancilla + postprocessing) BSBM family is preserved by relating these models to known hard linear-optical sampling tasks.
Complexity-theoretic reductions and arguments in the paper connecting the expanded BSBM constructions to established hard sampling problems in linear optics (e.g., boson sampling variants). The claim is supported by theoretical reductions rather than empirical hardness measurements.
Universality (and therefore potential sampling hardness) can be recovered by expanding the model: adding ancillary modes and applying a constant-function postprocessing generalization restores universality while retaining efficient classical trainability.
Construction and theoretical argument in the paper: introduces ancilla modes and a constant-function postprocessing generalization (analogous to IQP-QCBM techniques), shows how these modifications increase representational power to universality, and demonstrates that the same classical-approximation machinery still allows efficient evaluation/approximation of training losses. The argument includes constructive proofs and reductions.
Training can be done classically even when sampling from the trained BSBM is believed to be classically hard (the 'train classically, deploy quantumly' paradigm applies to BSBMs).
Argument combining two parts in the paper: (1) classical-evaluation results for losses/gradients (see above) and (2) separate hardness-of-sampling arguments showing sampling remains classically hard after training. This is a theoretical claim based on the constructions and reductions presented in the paper.
Greater ROI may come from investing in better feedback models (how to use feedback) than from collecting richer feedback sources alone.
Empirical finding that feedback model choice often produced larger retrieval-quality improvements than changing the feedback source across the evaluated tasks and methods.
The study's results clarify which elements of the PRF design space are most important to prioritize in practice (i.e., prioritize feedback-model improvements over source collection in many low-resource settings).
Comparative performance gains observed in controlled experiments showing larger effect sizes from varying feedback model than from varying source, combined with cost analyses.
Across 13 low-resource BEIR tasks and five LLM PRF methods, the choice of feedback model (how feedback is applied) critically affects retrieval effectiveness.
Empirical results reported over 13 BEIR tasks using five LLM-based PRF methods, with systematic variation of feedback model.
Purely LLM-generated feedback yields the best cost-effectiveness overall (best performance per unit LLM invocation cost) for low-resource retrieval tasks.
Cost-effectiveness analysis in experiments across 13 BEIR tasks and five PRF methods that accounted for LLM invocation cost versus retrieval gains.
Feedback model choice can have a larger impact on retrieval quality than feedback source.
Controlled experiments comparing five LLM-based PRF methods across 13 low-resource BEIR tasks, measuring retrieval effectiveness with standard BEIR metrics.
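To make the source-versus-model distinction concrete, here is a toy sketch in which the feedback source (top-ranked documents vs. LLM-generated text) and the feedback model (how feedback is folded into the query representation) are separate, pluggable choices. `embed` and `llm_generate` are hypothetical stand-ins, and the Rocchio-style interpolation is only one possible feedback model; none of this is the paper's exact pipeline.

```python
import numpy as np

# Hypothetical stand-ins: `embed` and `llm_generate` are placeholders,
# not an actual API from the paper or any specific library.
def embed(text):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=64)

def llm_generate(query):
    return [f"generated passage about {query} ({i})" for i in range(3)]

def prf_query(query, top_docs, source="llm", model="rocchio", alpha=1.0, beta=0.75):
    """Pseudo-relevance feedback split into two design choices:
    - feedback *source*: where the feedback text comes from (top docs vs. LLM output)
    - feedback *model*: how that feedback is folded back into the query vector"""
    q = embed(query)
    feedback = llm_generate(query) if source == "llm" else top_docs
    f = np.mean([embed(t) for t in feedback], axis=0)
    if model == "rocchio":                             # weighted interpolation
        return alpha * q + beta * f
    return np.where(np.abs(f) > np.abs(q), f, q)       # a cruder "replace" model

new_q = prf_query("treatment options for rare disease X",
                  top_docs=["doc 1 text", "doc 2 text"], source="llm")
print(new_q.shape)
```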
Demand will grow for hybrid specialists (quantum algorithm engineers, HPC systems integrators, middleware developers) and for domain scientists fluent in hybrid workflows, shifting skill premiums toward interdisciplinary expertise.
Labor-market inference from technology adoption and the skills required by proposed QCSC systems; qualitative only, no labor-market survey data provided.
Public investment and shared facilities can mitigate entry barriers and diffuse benefits to smaller firms and research groups.
Policy analysis and precedent from shared scientific infrastructure models; no case-study data specific to QCSC presented.
Tightly integrating QPUs, GPUs, and CPUs across hardware, middleware, and application layers (QCSC vision) will enable high-throughput, low-latency hybrid workflows.
Architectural design reasoning and analogies to heterogeneous co-design in classical HPC; no empirical throughput/latency measurements provided.
A phased roadmap (offload engines → middleware-coupled heterogeneous systems → fully co-designed heterogeneous systems) and a reference architecture can remove current friction (manual orchestration, scheduling, data transfer) and materially accelerate algorithmic discovery and applied quantum utility.
Roadmap and reference architecture proposed from system decomposition and use-case requirements analysis; argument based on observed friction points from literature and early hybrid deployments; no empirical validation provided.
Quantum-Centric Supercomputing (QCSC) — integrated systems co-designing QPUs with classical HPC components and middleware — is necessary to scale hybrid quantum-classical algorithms for chemistry, materials, and other applied research.
Conceptual systems-architecture analysis and synthesis of recent quantum-simulation demonstrations and hybrid algorithms; use-case-driven analysis for chemistry and materials; no new empirical performance benchmarks presented.
Adoption of GNN-based, FL-coordinated beam management can provide competitive differentiation by offering more reliable NTN services in challenging geometries (e.g., low-elevation, edge coverage).
Synthesized implication from experimental results showing improved GNN performance at low elevation angles and the marketing/economic discussion in the paper; no market adoption or field-deployment evidence provided.
FL via HAPS reduces data-centralization costs (bandwidth and storage) and improves privacy compared to sending raw channel data to a central server.
Implication drawn from the FL design used: federated aggregation reduces need to backhaul raw channel samples; paper lists bandwidth/storage and privacy advantages as economic/operational implications (no quantified cost measurements provided).
The GNN solution is lightweight enough for practical on-board or edge deployment in NTN contexts.
Paper asserts the GNN is lightweight and suitable for on-board or HAPS/edge deployment; the model is described as designed to be compact for constrained compute and link budgets (no exact parameter counts are provided in the summary).
Federated learning across LEO orbital planes, coordinated via HAPS, enables efficient distributed beam selection for Non-Terrestrial Networks (NTNs).
Experimental design in the paper: federated learning paradigm with orbital-plane clients and HAPS acting as aggregation/coordination points; evaluated on beam-prediction tasks using realistic channel/beamforming datasets and distributed training (no central pooling of raw samples).
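A minimal FedAvg-style aggregation sketch, assuming one client per orbital plane and size-weighted parameter averaging at the aggregator (e.g., a HAPS node); the paper's GNN architecture, client partitioning, and aggregation schedule are not reproduced here.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: an aggregator (e.g., a HAPS node) combines
    model parameters from clients (e.g., one client per LEO orbital plane),
    weighted by local dataset size, without pooling raw channel samples."""
    total = float(sum(client_sizes))
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Toy example: 3 orbital planes, each holding a 2-layer model as NumPy arrays.
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(8, 4)), rng.normal(size=4)] for _ in range(3)]
global_model = federated_average(clients, client_sizes=[120, 300, 80])
print([p.shape for p in global_model])
```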
DPS compares favorably to standard rollout-based prompt-selection baselines across the reported metrics (rollouts required, training speed, final accuracy).
Empirical comparisons against baseline methods reported in the experiments; specific numeric comparisons and statistical details are not present in the provided summary.
DPS creates a predictive prior that identifies informative prompts without performing exhaustive rollouts over large candidate batches.
Methodological mechanism plus empirical claim that selection operates via a predictive prior and reduces candidate rollouts; supported by experiments against rollout-filtering baselines.
The DPS inference procedure requires only historical rollout reward signals and therefore adds only a small amount of extra compute compared to the rollouts it avoids.
Practical considerations described in the paper: inference uses past rollout rewards; authors state the extra compute is small relative to avoided rollouts. (No quantified compute-cost ratio in the summary.)
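As an illustration of selection driven only by historical rollout rewards, the sketch below keeps a Beta posterior over each prompt's solve rate and prefers prompts whose predicted solve rate is near 0.5, where additional rollouts are least redundant. This is an illustrative heuristic under assumed binary rewards, not the paper's DPS predictive prior.

```python
import numpy as np

def select_prompts(history, candidate_ids, k, prior=(1.0, 1.0)):
    """Pick k prompts using only past rollout rewards.

    For each prompt we keep a Beta posterior over its solve rate and score
    candidates by predicted informativeness (solve rate near 0.5).  This
    stands in for a learned predictive prior; it is a heuristic illustration,
    not the paper's exact model."""
    a0, b0 = prior
    scores = []
    for pid in candidate_ids:
        wins, losses = history.get(pid, (0, 0))
        p_hat = (a0 + wins) / (a0 + b0 + wins + losses)   # posterior mean solve rate
        scores.append(p_hat * (1.0 - p_hat))              # largest near p = 0.5
    order = np.argsort(scores)[::-1]
    return [candidate_ids[i] for i in order[:k]]

# history maps prompt id -> (successful rollouts, failed rollouts)
history = {0: (9, 1), 1: (5, 5), 2: (0, 10), 3: (2, 3)}
print(select_prompts(history, candidate_ids=[0, 1, 2, 3, 4], k=2))
```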
DPS improves final reasoning performance (final task accuracy) across evaluated domains: mathematical reasoning, planning, and visual-geometry tasks.
Empirical results reported across those benchmark domains showing improved downstream reasoning accuracy relative to baselines. (Summary does not include exact effect sizes or sample counts.)
DPS speeds up RL finetuning in terms of required rollout budgets and wall-clock rollout compute.
Reported empirical findings: faster convergence of RL finetuning measured by rollout budgets and wall-clock compute on evaluated tasks. (Exact runtime metrics and sample sizes not provided in the summary.)
Compared to standard online prompt-selection methods that rely on large candidate-batch rollouts for filtering, DPS substantially reduces the number of redundant (uninformative) rollouts.
Empirical comparisons against rollout-based filtering baselines across benchmark tasks (mathematics, planning, visual-geometry). Specific numeric savings not provided in the summary.
Structural fixes — altering environment design or policy class to ensure the induced Markov chain is ergodic (e.g., ensuring mixing/recurrence or preventing absorbing bad states) — can eliminate the ensemble/time-average gap.
Paper discussion and examples suggesting interventions to change chain structure; conceptual/theoretical proposal supported by illustrative examples (no empirical deployment studies).
Robust/adversarial and model-uncertainty methods can hedge against trajectories that lead to poor long-run behavior and thus mitigate risks from non-ergodic dynamics.
Survey of robust control and adversarial RL approaches in the paper; conceptual argument linking robustness to protection against adverse sample paths; no new empirical tests.
Ergodic control and sample-path optimality formulations recast control objectives in terms of time averages or almost-sure sample-path criteria rather than ensemble expectations and are therefore appropriate for single-trajectory performance targets.
Survey and formal discussion in the paper connecting ergodic control literature to single-trajectory objectives; theoretical references summarized.
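To make the ensemble/time-average distinction concrete, here is a small simulation of multiplicative dynamics in which the ensemble expectation grows while almost every individual trajectory decays; the parameters are illustrative and unrelated to any specific environment in the paper.

```python
import numpy as np

# Multiplicative dynamics: multiply by 1.5 or 0.6 with equal probability.
# The ensemble expectation grows by a factor of 1.05 per step, yet the
# time-average (log) growth rate 0.5*ln(1.5) + 0.5*ln(0.6) ~ -0.053 is negative,
# so almost every individual trajectory decays: the ensemble/time-average gap.
rng = np.random.default_rng(0)
steps, n_paths = 200, 10_000
factors = rng.choice([1.5, 0.6], size=(n_paths, steps))
final = np.prod(factors, axis=1)

print("analytic ensemble mean:", 1.05 ** steps)                         # ~1.7e4
print("median final value:    ", np.median(final))                      # << 1
print("fraction that shrank:  ", np.mean(final < 1.0))                  # ~0.95
print("time-avg growth rate:  ", 0.5 * np.log(1.5) + 0.5 * np.log(0.6)) # < 0
```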
Almost-sure and probabilistic constraint methods (chance constraints, safe RL) can enforce that long-run performance exceeds thresholds with high probability, addressing single-trajectory guarantees.
Surveyed methodologies and references in the paper describing chance-constrained and safe RL formulations; conceptual synthesis rather than empirical demonstration.
Distributional reinforcement learning (optimizing the full return distribution) enables optimizing objectives such as median, lower quantiles, or CVaR which better reflect single-run guarantees.
Literature survey in the paper citing distributional RL approaches and linking them conceptually to single-trajectory guarantees; no new experiments provided.
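A minimal sketch of a lower-tail CVaR estimate from sampled returns, illustrating how a distributional objective can separate two policies with equal mean return; the synthetic return distributions below are illustrative only.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional value-at-risk of the lower tail: the mean of the worst
    alpha-fraction of returns.  Optimizing this (or a lower quantile/median)
    targets what a single run is likely to experience, not the ensemble mean."""
    returns = np.sort(np.asarray(returns, dtype=float))
    tail = returns[: max(1, int(np.ceil(alpha * len(returns))))]
    return tail.mean()

# Two synthetic policies with roughly equal expected return but different tails.
rng = np.random.default_rng(0)
safe = rng.normal(10.0, 1.0, size=5000)
risky = np.where(rng.random(5000) < 0.9, 12.0, -8.0) + rng.normal(0, 1.0, 5000)
print("means:", safe.mean().round(2), risky.mean().round(2))    # both ~10
print("CVaR(0.1):", cvar(safe).round(2), cvar(risky).round(2))  # safe >> risky
```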
Risk-sensitive and utility-based objectives (e.g., maximize expected utility such as log-utility or minimize downside risk) can produce policies that prefer more reliable time-average outcomes compared to raw expected-reward objectives.
Surveyed literature in the paper summarizing risk-sensitive and utility-based RL approaches; conceptual argument rather than new empirical validation.
Numerical simulations confirm the analytic extreme-value scaling for earliest discoveries and demonstrate that introducing non-reciprocal biases leads to stable monopolies whereas symmetric interactions do not.
Numerical simulations (stochastic realizations) reported in the paper used to validate analytic predictions and illustrate dynamical outcomes; however, the summary does not specify simulation sample sizes, parameter sweeps, or robustness checks.
Empirically, RAD improves out-of-distribution (OOD) robustness (OOD harmlessness) compared to baselines.
Out-of-distribution harmlessness evaluations reported in the paper showing RAD performs better than baselines on OOD safety tests (exact experimental details not provided in the summary).
Empirically, RAD improves harmlessness relative to baseline RLHF methods.
Empirical evaluations reported in the paper comparing RAD to baseline RLHF methods on harmlessness metrics (specific datasets, sample sizes, and exact metrics not provided in the summary).
Entropic regularization plus Sinkhorn iterations yields a differentiable, computationally tractable objective suitable for end-to-end optimization with policy gradient methods.
Algorithmic design and implementation details in the paper showing use of entropic-regularized OT and Sinkhorn; claimed compatibility with policy-gradient/end-to-end training (no concrete runtime benchmarks or sample-complexity numbers in the summary).
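A minimal NumPy sketch of entropic-regularized OT solved with Sinkhorn iterations; the RAD objective, its cost construction, and the integration with policy gradients described in the paper are not reproduced here.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropic-regularized OT via Sinkhorn iterations.

    a, b : source/target marginals (each sums to 1);  C : cost matrix.
    Returns the transport plan P and the regularized OT cost <P, C>.
    Every operation is differentiable, which is what makes this form usable
    inside end-to-end / policy-gradient training loops."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return P, float((P * C).sum())

# Toy example: match two small discrete distributions on a line.
x, y = np.linspace(0, 1, 5), np.linspace(0, 1, 6)
C = (x[:, None] - y[None, :]) ** 2
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
P, cost = sinkhorn(a, b, C, eps=0.05)
print(P.sum(axis=1), cost)   # row sums match the source marginal a
```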
AI-enabled forecasting can raise operational productivity by reducing forecasting error, stockouts, and excess inventory, but realized returns depend on organizational complements (processes, governance).
Authors' synthesis of case evidence where AI forecasting reduced errors and inventory problems, combined with the theoretical claim that organizational complements condition realized gains.
Critical enablers for successful ISP adoption include executive sponsorship, cross-functional processes, data quality/governance, shared KPIs, and continuous learning cycles.
Recurring themes identified across the five case studies and synthesized in the authors' cross-case analysis as necessary organizational complements.
AI-enabled forecasting combined with ERP integration leads to better synchronization across procurement, production, inventory, and distribution; improved decision visibility; and reduced forecasting errors where implemented.
Reported outcomes from cases in which firms implemented AI forecasting and ERP integration; interviewees described improved synchronization and lower forecasting errors (qualitative reports rather than quantified effect sizes).
Policy recommendations: economists and policymakers should perform cost–benefit analyses of explainability mandates, incentivize research into human-centered explanation methods, subsidize standards and certification infrastructure, and consider staged regulation balancing innovation with accountability in high-risk domains.
Prescriptive recommendations drawn by the paper's authors from the review of technical, social-science, and policy literatures; based on synthesis rather than empirical testing of policy impacts.
Clearer explanations and audit trails make it easier to assign responsibility and price risk (insurance markets, contract terms), potentially reducing uncertainty in public procurement and private contracts.
Economic and legal literature included in the review providing conceptual arguments and illustrative cases; no new empirical risk-pricing estimates provided in the paper.
Better explainability (when usable) raises willingness to adopt AI in regulated, risk-averse sectors by reducing information asymmetries and perceived liability, potentially expanding the market for explainable systems.
Economic and conceptual arguments synthesized from the reviewed literature; the review aggregates studies and arguments but does not present new quantitative adoption estimates.
Implementation requires organizational practices—governance, training, monitoring, and incentives—to translate explainability into safer, more legitimate AI use.
Synthesis of organizational, policy, and case-study literature in the review that identifies organizational measures correlated with effective deployment of explainable systems; descriptive evidence rather than causal experiments.
Regulatory frameworks, auditability, documentation (e.g., model cards, datasheets), and clear lines of responsibility amplify the effectiveness of explainability for accountability and compliance.
Synthesis of policy and governance literature included in the review that discusses how institutional mechanisms interact with technical explainability to produce accountability; descriptive evidence from case studies and governance proposals in the literature.
Labor demand will increasingly favor skills that support effective Human–AI teaming (interpretation, interrogation of AI, systems orchestration, shared-model building) rather than routine task execution.
Implication drawn from the framework and literature on complementarity and skill-biased technological change; presented as an expectation rather than quantified by labor market data in the paper.