The Commonplace
Home Dashboard Papers Evidence Syntheses Digests 🎲

Evidence (13870 claims)

Adoption
8467 claims
Productivity
7558 claims
Governance
6805 claims
Human-AI Collaboration
6363 claims
Org Design
4132 claims
Innovation
4065 claims
Labor Markets
3526 claims
Skills & Training
2945 claims
Inequality
2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 196 98 892 1984
Governance & Regulation 817 394 188 121 1544
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 627 233 123 96 1088
Research Productivity 411 123 56 332 933
Output Quality 467 178 59 47 751
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 167 122 24 496
Task Allocation 207 64 71 32 379
Skill Acquisition 165 59 60 17 301
Innovation Output 203 27 43 18 292
Employment Level 105 52 107 13 279
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 150 48 26 3 227
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 63 20 12 184
Error Rate 69 92 10 2 173
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 93 21 13 19 148
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Creative Output 31 17 7 3 59
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
Findings are based on a student sample rating decontextualized messages, so external validity to industry communication or real project logs is uncertain and requires replication.
Study sample consisted of 81 students in team-based software projects labeling decontextualized statements; authors explicitly note this limitation as a caveat.
high null result Exploring Indicators of Developers' Sentiment Perceptions in... generalizability/external validity of the study findings to non-student, context...
Many apparent correlations between predictors and sentiment labels do not remain significant after global multiple-testing correction.
Correlation analyses across many predictors with explicit application of multiple-testing correction procedures; many initial signals failed to survive correction.
high null result Exploring Indicators of Developers' Sentiment Perceptions in... statistical significance of correlations between predictors (e.g., mood, team me...
The paper does not provide quantitative estimates of time saved per report, cost reductions, or effects on employment/wages; such economic impacts remain to be quantified.
Caveats noted in the paper: absence of quantitative estimates for time/cost/employment effects and a call for field trials and economic modeling. This is explicitly stated in the summary.
high null result Bridging the Skill Gap in Clinical CBCT Interpretation with ... Absence of quantitative economic impact estimates (time saved, cost reduction, e...
The paper used a clinically grounded, multi-level evaluation framework that separately assessed raw AI drafts (automatic metrics + clinician review) and radiologist-AI collaborative final reports (how radiologists edit and downstream clinical effects), including comparisons across radiologist experience levels.
Methodology section summarized in the paper: multi-level assessment covering AI drafts and radiologist-edited collaborative reports; combination of automatic metrics and radiologist-/clinician-centered evaluations; experience-level stratified analyses (novice/intermediate/senior).
high null result Bridging the Skill Gap in Clinical CBCT Interpretation with ... Evaluation framework components (draft assessment, collaborative report assessme...
CBCTRepD is a report-generation system trained on this curated paired dataset to produce bilingual CBCT radiology draft reports intended for radiologist-in-the-loop (co-authoring) workflows.
System description in the paper: CBCTRepD built using the curated dataset; authors state purpose is to generate clinically usable drafts for radiologist editing. (Model architecture and training hyperparameters are not specified in the provided text.)
high null result Bridging the Skill Gap in Clinical CBCT Interpretation with ... System capability: generation of bilingual CBCT draft reports for human editing
The authors curated a paired CBCT–report dataset of approximately 7,408 CBCT studies covering 55 oral and maxillofacial disease entities that is bilingual and includes diverse acquisition settings.
Data curation described in the paper: stated dataset size (~7,408 studies), coverage of 55 disease entities, bilingual reports, and inclusion of a range of acquisition settings to increase heterogeneity and clinical realism. (Exact languages, provenance of studies, and dataset split details are not specified in the provided text.)
high null result Bridging the Skill Gap in Clinical CBCT Interpretation with ... Dataset composition (number of studies, disease-entity coverage, bilingual statu...
The workshop identifies specific research directions for AI economics: cost–benefit and ROI analyses of shared infrastructure; market design for procurement of co-designed systems; models of innovation incentives under different IP/data-governance regimes; labor market impact assessments; and empirical studies of how validation ecosystems affect adoption rates and pricing.
Explicitly listed research directions in the workshop summary and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... articulated research agenda items and priority areas for future empirical study
The workshop's findings are based on qualitative synthesis of expert judgment and stakeholder inputs rather than primary empirical data or controlled experiments.
Explicitly stated in the Data & Methods section of the workshop summary; methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building.
high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... nature and strength of empirical support for the recommendations (qualitative vs...
The workshop convened researchers, clinicians, and industry leaders to address co-design across four thematic areas: teleoperations/telehealth/surgical operations; wearable and implantable medicine; home ICU/hospital systems/elderly care; and medical sensing/imaging/reconstruction.
Workshop agenda and participant list from the two-day NSF workshop (Sept 26–27, 2024); methods included thematic breakout sessions focused on these four areas. Documentation at https://sites.google.com/view/nsfworkshop.
high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... topics and thematic coverage of the workshop
Evaluation was performed on five different material setups.
Experimental evaluation described in the summary: performance reported as averaged across five material setups. The summary does not list per-setup names or trial counts.
high null result Learning Adaptive Force Control for Contact-Rich Sample Scra... number of material setups used in evaluation (n = 5)
The simulation models samples as collections of spheres with per-sphere procedurally generated dislodgement-force thresholds derived from Perlin noise to introduce spatial heterogeneity and diversity.
Simulation/modeling description in the paper: discrete-sphere representation of sample; each sphere assigned a dislodgement threshold; spatial variation produced via Perlin noise. This is a concrete modeling choice reported in the methods.
high null result Learning Adaptive Force Control for Contact-Rich Sample Scra... representation of material heterogeneity in simulation (model design detail)
The paper uses a mixed-methods approach combining a systematic literature review with an empirical practitioner survey to assess perceptions, adoption, and impact of AI-driven tools.
Methodological statement in the paper; survey design covers tool usage, perceived benefits, challenges, and expectations.
high null result Artificial Intelligence as a Catalyst for Innovation in Soft... methodological coverage (presence of literature review and survey)
Empirical work (experiments and measurements) is needed to quantify how much value interpretive traces add to downstream outputs, how RATs affect platform incentives, and what governance frameworks fairly allocate resulting rents.
Concluding recommendation in the paper stating the research gaps; not an empirical claim but a stated need.
high null result Chasing RATs: Tracing Reading for and as Creative Activity research agenda items (value quantification, platform incentive effects, governa...
The current presentation of RATs is speculative and illustrative; empirical validation, scalability, and ethical safeguards remain to be developed.
Limitations section of the paper explicitly states the speculative nature and lack of empirical evaluation.
high null result Chasing RATs: Tracing Reading for and as Creative Activity status of empirical validation/scalability/ethical development
Implementation of RATs requires instrumentation at the browser/platform level or via plugins and must address privacy/consent, storage/ownership, sharing controls, and interoperable trace formats.
Design and implementation considerations enumerated in the paper; this is a requirements statement rather than an empirical claim.
high null result Chasing RATs: Tracing Reading for and as Creative Activity implementation requirements and privacy/governance needs
Analytical approaches compatible with RATs include sequence/trajectory mining, network analysis of associations/co-read graphs, embedding/clustering of trajectories, qualitative inspection of reflections, and experimental (A/B or RCT) evaluation of downstream effects.
Methods section of the paper listing suggested analytical techniques; these are proposed methods rather than applied analyses.
high null result Chasing RATs: Tracing Reading for and as Creative Activity analytical approaches applicable to RAT data
The approach shifts some computational burden to obtaining MCMC samples of the parameter posterior, requiring access to (or ability to compute) MCMC samples before surrogate training.
Method description: training data are MCMC-drawn parameter vectors; the paper notes this practical requirement and trade-off (MCMC cost vs. avoiding repeated expensive forward-model evaluations).
high null result MCMC Informed Neural Emulators for Uncertainty Quantificatio... need for and cost of MCMC sampling (computational requirement)
More theoretical work is needed to establish guarantees (consistency, asymptotic behavior, and frequentist coverage) for these networks when applied in economic settings.
Stated research need/caveat in the paper; no new theoretical proofs are provided in the summary to establish these properties.
high null result ForwardFlow: Simulation only statistical inference using dee... theoretical guarantees (absence of established consistency/asymptotic/coverage r...
The Boson Sampling Born Machine (BSBM) is a generative model whose model distribution is the output probability distribution of a linear-optical (bosonic modes) circuit.
Definition and constructive specification in the paper: model architecture described as linear-optical circuits with outputs given by bosonic-mode measurement probabilities (the paper's formal definition/construction). The claim is definitional/theoretical (no empirical sample size).
high null result Universality of Classically Trainable, Quantum-Deployed Boso... model distribution = linear-optical circuit output probabilities
Because this is a conceptual/systems-architecture paper, it does not present new empirical performance benchmarks.
Explicit statement in the paper's Data & Methods section that no new empirical benchmarks are presented.
high null result Reference Architecture of a Quantum-Centric Supercomputer presence or absence of new empirical performance benchmark data
The evaluated models consist of an MLP baseline and a GNN tailored to exploit relational/spatial structure among beams/antennas.
Model descriptions provided in the methods section: two supervised-learning architectures (MLP and GNN) used for beam prediction experiments.
high null result Federated Learning-driven Beam Management in LEO 6G Non-Terr... model architecture comparison (GNN vs MLP)
Using Federated Learning (FL) with orbital planes as distributed learners and HAPS for aggregation avoids centralization of raw channel data.
Method description: federated-learning architecture with clients mapped to orbital planes and HAPS performing coordination/aggregation; explicitly states no central pooling of raw channel samples.
high null result Federated Learning-driven Beam Management in LEO 6G Non-Terr... presence/absence of central pooling of raw channel data
DPS was empirically evaluated across diverse reasoning domains (mathematical reasoning, planning, and visual-geometry) to test generality.
Paper reports experiments on those three categories of tasks; they are listed as the evaluated tasks in the methods/experiments section.
high null result Dynamics-Predictive Sampling for Active RL Finetuning of Lar... task domains evaluated (mathematics, planning, visual-geometry)
DPS uses the inferred per-prompt state distributions as a predictive prior to select prompts estimated to be most informative, avoiding exhaustive candidate rollouts for filtering.
Method and selection mechanism described: predictive prior ranking/filtering replaces rollout-heavy candidate evaluation. (Procedure described in paper; empirical comparisons reported.)
high null result Dynamics-Predictive Sampling for Active RL Finetuning of Lar... selection of prompts (number of candidate rollouts avoided)
Dynamics-Predictive Sampling (DPS) models each prompt’s "extent of solving" under the current policy as a latent state in a dynamical system (a hidden Markov model) and performs online Bayesian inference on historical rollout reward signals to estimate that state.
Methodological description in the paper: DPS uses an HMM representation of per-prompt solving progress and applies online Bayesian updates using past rollout rewards. (No numerical sample size needed for this modeling claim.)
high null result Dynamics-Predictive Sampling for Active RL Finetuning of Lar... inferred latent state distribution / predicted expected learning progress per pr...
The paper does not present large-scale empirical validation; its evidence is primarily theoretical exposition, a constructed illustrative example, and a literature survey.
Explicit description of methods and data in the paper (analysis type: theoretical exposition + illustrative example; no experimental sample reported).
high null result Ergodicity in reinforcement learning presence/absence of empirical experiments or sample-based validation
Local stochastic fluctuations can undo early discovery leads, preventing transient superiority from becoming permanent unless additional asymmetries intervene.
Dynamical analysis of monopolization stage in the model and simulation trajectories showing reversal or loss of early leads in symmetric interaction regimes; theoretical demonstration that fluctuations can destabilize early footholds.
high null result Macroscopic Dominance from Microscopic Extremes: Symmetry Br... persistence of local leads over time (probability of lead reversal due to stocha...
Transient superiority (finding resources faster) by itself does not stabilize a system-wide monopoly; early leads are fragile and can be undone by local stochastic fluctuations.
Analysis of monopolization dynamics and absorbing-state stability within the stochastic spatial model, plus numerical simulations showing symmetric interaction scenarios do not produce robust absorbing monopolies. This is model-based (no empirical validation).
high null result Macroscopic Dominance from Microscopic Extremes: Symmetry Br... long-term persistence/probability of absorbing (system-wide monopoly) state give...
The authors recommend specific measurement metrics and empirical research priorities (e.g., MAPE, stockout frequency, inventory turns, lead times, fill rates, total supply chain cost, service-level volatility, resilience measures; causal studies like diff-in-diff or randomized interventions).
Explicit recommendations in the paper's measurement and research agenda sections.
high null result Optimizing integrated supply planning in logistics: Bridging... listed supply-chain performance and resilience metrics
The study's small sample size and qualitative design limit external generalizability and prevent causal effect size estimation; potential selection and reporting biases exist due to purposive sampling and interview-based data.
Authors explicitly state these limitations in the paper's limitations section.
high null result Optimizing integrated supply planning in logistics: Bridging... external generalizability and causal inference capability
The study is a qualitative multi-case study of five medium-to-large organizations, using semi-structured interviews across procurement, production planning, inventory management, and distribution, analyzed via cross-case comparison.
Methods section description provided by the authors (sample size n = 5, sectors, interview-based primary data, cross-case analysis).
high null result Optimizing integrated supply planning in logistics: Bridging... process-level, qualitative insights into ISP implementation
There is limited empirical causal evidence linking specific explanation types to long-term outcomes (safety, fairness, economic performance) in real-world deployments.
Meta-level finding of the review: authors report gaps in the literature—few causal or longitudinal studies of explanation interventions in deployed, high-stakes settings.
high null result Explainable AI in High-Stakes Domains: Improving Trust, Tran... evidence availability for causal effects on safety, fairness, economic performan...
The literature groups explainability impacts along three linked dimensions — user trust, ethical governance, and organizational accountability.
Analytical result of the review's thematic coding and synthesis across interdisciplinary literature (categorization derived from the reviewed corpus).
high null result Explainable AI in High-Stakes Domains: Improving Trust, Tran... categorization structure of explainability impacts (three-dimension taxonomy)
The paper is primarily theoretical and prescriptive: it synthesizes literature and proposes a framework and design guidelines rather than reporting large-scale empirical datasets or causal identification of economic outcomes.
Meta-claim about the paper's methods explicitly stated in the Data & Methods summary; based on the paper's methodological description.
high null result Toward a science of human–AI teaming for decision-making: A ... presence/absence of empirical datasets or causal identification studies in the p...
Key measurable outcomes to assess Human–AI teams include accuracy/efficiency, robustness to novel cases, decision consistency, trust/misuse rates, training costs, and inequity indicators.
Prescriptive list of metrics offered by the authors as part of the research agenda and evaluation guidance; not empirically derived from a dataset in the paper.
high null result Toward a science of human–AI teaming for decision-making: A ... accuracy, efficiency, robustness, consistency, trust/misuse rates, training cost...
Empirical evaluation strategies for Human–AI teams should include randomized interventions, field trials, lab experiments, phased rollouts (difference-in-differences), and structural models that allow interaction terms between human skill and AI quality.
Methodological recommendation in the paper; suggested study designs rather than implemented analyses.
high null result Toward a science of human–AI teaming for decision-making: A ... appropriate empirical identification of team-level complementarities and causal ...
Research priorities include empirical measurement of task‑level automation rates, firm and industry productivity effects, wage impacts across occupations, and diffusion patterns.
Paper's stated research agenda and identification of measurement gaps; based on methodological critique of current evidence base.
high null result How AI Will Transform the Daily Life of a Techie within 5 Ye... future empirical research outputs on automation rates, productivity, wage impact...
Measuring these productivity gains will be challenging because quality improvements, faster iteration, and creative outputs are harder to price/observe than lines of code.
Methodological argument about measurement difficulty; based on conceptual considerations, not empirical validation.
high null result How AI Will Transform the Daily Life of a Techie within 5 Ye... observability and measurability of productivity gains (availability of suitable ...
Measuring AI's economic impact requires new metrics that account for decision-value uplift, reduced tail-risk exposures, and dynamic gains from continuous learning; causal identification will require experiments or staggered rollouts.
Methodological recommendation backed by conceptual discussion of measurement challenges; no implementation of such measurement approaches is reported in the paper.
high null result Next-Generation Financial Analytics Frameworks for AI-Enable... proposed measurement constructs (decision-value uplift, tail-risk reduction, lea...
Performance and evaluation should be measured using forecast accuracy, decision lift/value added, latency, and false positive/negative rates.
Paper-prescribed evaluation metrics; presented as recommended practice rather than derived from empirical testing within the paper.
high null result Next-Generation Financial Analytics Frameworks for AI-Enable... forecast accuracy, decision lift (value added), system latency, false positive/n...
Core AI techniques for these frameworks include supervised/unsupervised ML, NLP for unstructured text, anomaly detection for control/transaction monitoring, and reinforcement/prescriptive models for recommendations.
Methodological claim listing standard ML/NLP/anomaly-detection techniques and prescriptive approaches; statement of methods rather than an empirical comparison of alternatives.
high null result Next-Generation Financial Analytics Frameworks for AI-Enable... method adoption/type metrics (e.g., frequency of supervised vs. unsupervised met...
Next‑gen frameworks use large-scale structured (transactions, ledgers, KPIs) and unstructured sources (reports, news, contracts, call transcripts) to power models.
Descriptive claim listing data types the paper recommends; presented as design input requirements rather than empirically validated data-integration projects.
high null result Next-Generation Financial Analytics Frameworks for AI-Enable... data coverage and diversity (e.g., proportion of structured vs. unstructured inp...
There is a need for quantitative studies and microdata on firm-level RM practices, AI adoption, and performance outcomes to measure effect sizes and causal pathways.
Stated research gaps and limitations in the review (lack of primary empirical quantification; heterogeneity across contexts).
high null result The Role of Risk Management as an Organizational Management ... availability of quantitative evidence on RM effects (effect sizes, causal estima...
The review's conclusions are limited by reliance on published literature (potential bias toward successful implementations), lack of primary empirical quantification (no effect sizes), and heterogeneity across organizational contexts limiting direct generalizability.
Explicit limitations stated in the paper summarizing scope and method (qualitative literature review, secondary evidence only).
high null result The Role of Risk Management as an Organizational Management ... generalizability and empirical precision of review findings
Heterogeneity in system designs and deployment contexts complicates cross-site comparisons.
Limitations section and observed variation in platform architectures, degrees of automation, and governance across sites reported via descriptive data and interviews.
high null result The Role of Artificial Intelligence in Healthcare Complaint ... comparability across deployment sites (heterogeneity in systems and contexts)
Non-random selection of institutions limits causal inference and external generalizability of the study's findings.
Study limitations explicitly state non-random site selection and heterogeneous deployments; methodological note that causal claims are constrained.
high null result The Role of Artificial Intelligence in Healthcare Complaint ... generalizability and causal inference validity
The study uses a quantitative, cross-sectional survey-based research design of managers and educational administrators and employs descriptive statistics, correlation, and regression analyses.
Methods described in the summary explicitly state research design and analytical techniques; this is a methodological claim rather than an empirical substantive finding. (Sample size not provided in summary.)
high null result Algorithmic Trust and Managerial Effectiveness: The Role of ... research design / analytic approach (methodological description)
Estimation/calibration, stability assessment, and global sensitivity methods used: parameters calibrated/estimated on 2016–2023 data; equilibrium located; Jacobian eigenvalues computed for local stability; variance-based global sensitivity analysis performed over parameter space.
Methods section: description of parameter estimation/calibration, equilibrium computation, Jacobian-based stability analysis, and variance-based global sensitivity analysis.
high null result Governance of Technological Transition: A Predator-Prey Anal... methodological procedures applied (estimation, stability analysis, GSA)
The main empirical conclusions are based on a short annual panel (2016–2023) and a stylized aggregate interaction model; results should be interpreted with caution due to potential omitted variables, aggregation bias, and limited sample size.
Explicit limitations listed in the paper: short time series (eight annual observations), national aggregate data, simplified model structure, no firm/sector heterogeneity, possible endogeneity/measurement issues.
high null result Governance of Technological Transition: A Predator-Prey Anal... validity/robustness of empirical conclusions (limitations)
The empirical analysis uses annual, national-level aggregate Chinese series for 2016–2023 as proxies for AI capital, physical capital stock, and labor compensation (wage bill).
Data description in Data & Methods: annual Chinese aggregate series 2016–2023. Implied sample length: 2016–2023 inclusive (8 annual observations); national-level aggregates, no firm-level heterogeneity modeled.
high null result Governance of Technological Transition: A Predator-Prey Anal... AI capital proxy; physical capital stock; labor compensation (wage bill)