The Commonplace
Home Dashboard Papers Evidence Digests 🎲

Evidence (2340 claims)

Adoption
5267 claims
Productivity
4560 claims
Governance
4137 claims
Human-AI Collaboration
3103 claims
Labor Markets
2506 claims
Innovation
2354 claims
Org Design
2340 claims
Skills & Training
1945 claims
Inequality
1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Clear
Org Design Remove filter
The paper calls for subsequent quantitative validation (using task-based, matched employer-employee, and provider-level panel data) to estimate causal impacts on productivity, health outcomes, wages, and employment composition across the three interaction levels.
Stated research agenda and measurement recommendations in the paper's discussion section.
high null result Toward human+ medical professionals: navigating AI integrati... need for causal estimates of productivity, health outcomes, wages, employment co...
The study is qualitative and small-sample (four case) and therefore interpretive and illustrative rather than statistically generalizable.
Explicit methodological statement in the paper: design = qualitative multiple case study, sample = four AI healthcare applications.
high null result Toward human+ medical professionals: navigating AI integrati... generalizability/external validity
The study identifies a three-level taxonomy of human–AI interaction in healthcare: AI-assisted, AI-augmented, and AI-automated.
Conceptual taxonomy derived from multiple qualitative case studies (n=4) using cross-case comparison and Bolton et al. (2018)'s three-dimensional service-innovation framework.
high null result Toward human+ medical professionals: navigating AI integrati... classification of AI–human interaction (taxonomic mapping)
The paper's empirical scope is primarily conceptual/theoretical and literature‑based rather than an empirical case study or large‑scale data experiment; it emphasizes the need for future empirical validation.
Explicit methodological description within the paper stating reliance on literature review and conceptual development; absence of empirical sample or case study.
high null result A Review of Manufacturing Operations Research Integration in... presence/absence of empirical validation within the study
Typical evaluation metrics reported are accuracy, precision, recall, F1-score, AUC, detection rate, false positive rate, latency, and computational cost.
Survey of evaluation practices in reviewed papers listing the metrics authors commonly report.
high null result International Journal on Cybernetics & Informatics evaluation metrics used
Emerging approaches in the literature include federated learning, online/streaming learning, and transfer learning for cross-device generalization.
Trend analysis across recent papers indicating adoption of federated and continual learning paradigms and transfer-learning techniques.
high null result International Journal on Cybernetics & Informatics research trend uptake (use of federated/online/transfer approaches)
Unsupervised and semi-supervised methods (clustering, one-class classifiers, autoencoder-based anomaly detectors) are commonly employed to handle unlabeled/anomalous IoT traffic.
Synthesis of studies using anomaly-detection paradigms and unsupervised techniques reported in the reviewed papers.
high null result International Journal on Cybernetics & Informatics methods used (unsupervised/semi-supervised approaches)
Deep learning approaches used include CNNs, RNNs/LSTMs for sequence/traffic analysis, and autoencoders for anomaly detection.
Surveyed literature and taxonomy noting multiple studies that apply convolutional and recurrent architectures and autoencoders to network/traffic data.
high null result International Journal on Cybernetics & Informatics methods used (deep learning architectures applied)
Common ML approaches reported for IoT IDS include supervised models (random forest, SVM, gradient boosting, neural networks).
Taxonomy and literature synthesis showing frequent use of classical supervised classifiers in surveyed papers and experiments.
high null result International Journal on Cybernetics & Informatics methods used (algorithm type frequency)
Empirical research suggestion: recommended outcome variables for future empirical work include productivity (TFP), profitability, exports, employment composition, and process innovation rates; explanatory variables include AI adoption intensity, strategic alignment indices, leadership commitment surveys, sensing activities, and institutional support measures.
Explicit research agenda and measurement suggestions provided in the paper based on the framework and gaps identified in the 72‑article review.
high null result Beyond resource constraints: how Ibero-American SMEs leverag... List of suggested empirical outcomes (TFP, profitability, exports, employment co...
Scope & limits: the paper is a literature synthesis (no new primary empirical data), has a geographical emphasis on Ibero‑America, and covers literature up to 2024 (may omit post‑2024 developments).
Explicit limitations and scope noted in the paper (no primary data; regional emphasis; time window).
high null result Beyond resource constraints: how Ibero-American SMEs leverag... N/A (scope/limitations)
Methodological approach: the paper uses a structured narrative literature review following Torraco (2016) and Juntunen & Lehenkari (2021), analyzing a corpus of 72 articles from 2015–2024 via thematic synthesis and systematic coding.
Explicit methodological statement in the paper specifying approach, corpus size (72 articles), time window (2015–2024), and analytic techniques (thematic synthesis and coding).
high null result Beyond resource constraints: how Ibero-American SMEs leverag... N/A (methodological claim)
The framework yields eight empirically testable propositions linking capability development to firm outcomes (the paper explicitly lists eight propositions including P1–P3 and five additional linked propositions).
Explicit claim in the reviewed paper: framework includes eight testable propositions; propositions are theoretical and untested empirically within the paper.
high null result Beyond resource constraints: how Ibero-American SMEs leverag... Various firm outcomes proposed for testing (productivity, adoption probability, ...
The review followed PRISMA guidelines and included 30 scholarly articles retrieved from Scopus, published between 2020 and 2025, selected using pre-specified inclusion criteria.
Methods section of the paper reporting the SLR protocol, database, time window, and number of included studies.
high null result Pricing Strategy in Digital Marketing: A Systematic Review o... Scope of literature reviewed (database, timeframe, sample size)
These quantitative performance figures come from case‑level, high‑performer pilots and should not be treated as typical industry benchmarks.
Authors' caveat based on the composition of evidence in the review (skew towards pilots and selected advanced implementations; limited longitudinal/multi‑project empirical studies).
high null result Digital Twins Across the Asset Lifecycle: Technical, Organis... representativeness/generalizability of reported performance figures
Inter‑rater reliability for the study selection/encoding was Cohen’s κ = 0.83 (substantial agreement).
Reported inter‑rater reliability statistic from the review's quality control step (Cohen's kappa = 0.83).
high null result Digital Twins Across the Asset Lifecycle: Technical, Organis... inter‑rater reliability (Cohen's kappa)
The review screened 463 Scopus records (2018–2026) and selected 160 peer‑reviewed studies using a PRISMA‑guided process.
Systematic literature review described in paper: Scopus search (2018–2026), PRISMA screening and eligibility filtering; initial n=463, final n=160.
high null result Digital Twins Across the Asset Lifecycle: Technical, Organis... number of records retrieved and final sample size
The abstract does not report the study sample size, sectoral scope, or country/context—limiting assessment of external validity and generalizability.
Observation of reporting in the paper's abstract (absence of sample size, sectoral/country context information in the abstract as provided).
high null result Reimagining Stakeholder Engagement Through Generative AI: A ... Completeness of methodological reporting (sample/context disclosure)
The study used a two-stage mixed-methods design: a qualitative exploratory phase to surface determinants of trust and inertia, followed by a quantitative phase to validate the conceptual framework.
Methods description in the paper: explicit two-stage mixed-methods approach (qualitative then quantitative) used to identify and test determinants of initial trust and inertia toward GAICS.
high null result Reimagining Stakeholder Engagement Through Generative AI: A ... Study design / methodological approach
The authors did not perform primary empirical validation or simulation of TVR‑Sec across real VR deployments.
Methods and limitations section explicitly state no original empirical experiments or simulations were conducted; analysis is conceptual and qualitative.
high null result Securing Virtual Reality: Threat Models, Vulnerabilities, an... whether empirical validation/simulation was performed (none)
The paper's scope comprised a comparative literature review and conceptual integration of 31 peer‑reviewed studies published between 2023 and 2025.
Authors' methods description specifying sample size and publication window: 31 peer‑reviewed studies (2023–2025).
high null result Securing Virtual Reality: Threat Models, Vulnerabilities, an... number and date range of studies included in the review (31 studies, 2023–2025)
Economy & Finance threads contained no self-referential content, suggesting agents can engage in market discussion without representing themselves as agents.
Topic-model-derived topical category labeling and tagging for self-referential themes showing zero instances of self-reference in posts categorized as Economy & Finance in the dataset; counts derived from the 361,605 posts.
high null result What Do AI Agents Talk About? Emergent Communication Structu... presence/absence of self-referential tags in Economy & Finance posts
The authors released their code and data for reproducibility at https://github.com/blocksecteam/ReEVMBench/.
Statement in the paper indicating public release of code and dataset at the provided GitHub URL.
high null result Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... code_and_data_availability (repository_link)
The workshop identifies specific research directions for AI economics: cost–benefit and ROI analyses of shared infrastructure; market design for procurement of co-designed systems; models of innovation incentives under different IP/data-governance regimes; labor market impact assessments; and empirical studies of how validation ecosystems affect adoption rates and pricing.
Explicitly listed research directions in the workshop summary and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... articulated research agenda items and priority areas for future empirical study
The workshop's findings are based on qualitative synthesis of expert judgment and stakeholder inputs rather than primary empirical data or controlled experiments.
Explicitly stated in the Data & Methods section of the workshop summary; methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building.
high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... nature and strength of empirical support for the recommendations (qualitative vs...
The workshop convened researchers, clinicians, and industry leaders to address co-design across four thematic areas: teleoperations/telehealth/surgical operations; wearable and implantable medicine; home ICU/hospital systems/elderly care; and medical sensing/imaging/reconstruction.
Workshop agenda and participant list from the two-day NSF workshop (Sept 26–27, 2024); methods included thematic breakout sessions focused on these four areas. Documentation at https://sites.google.com/view/nsfworkshop.
high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... topics and thematic coverage of the workshop
The paper uses a mixed-methods approach combining a systematic literature review with an empirical practitioner survey to assess perceptions, adoption, and impact of AI-driven tools.
Methodological statement in the paper; survey design covers tool usage, perceived benefits, challenges, and expectations.
high null result Artificial Intelligence as a Catalyst for Innovation in Soft... methodological coverage (presence of literature review and survey)
Because this is a conceptual/systems-architecture paper, it does not present new empirical performance benchmarks.
Explicit statement in the paper's Data & Methods section that no new empirical benchmarks are presented.
high null result Reference Architecture of a Quantum-Centric Supercomputer presence or absence of new empirical performance benchmark data
DPS was empirically evaluated across diverse reasoning domains (mathematical reasoning, planning, and visual-geometry) to test generality.
Paper reports experiments on those three categories of tasks; they are listed as the evaluated tasks in the methods/experiments section.
high null result Dynamics-Predictive Sampling for Active RL Finetuning of Lar... task domains evaluated (mathematics, planning, visual-geometry)
DPS uses the inferred per-prompt state distributions as a predictive prior to select prompts estimated to be most informative, avoiding exhaustive candidate rollouts for filtering.
Method and selection mechanism described: predictive prior ranking/filtering replaces rollout-heavy candidate evaluation. (Procedure described in paper; empirical comparisons reported.)
high null result Dynamics-Predictive Sampling for Active RL Finetuning of Lar... selection of prompts (number of candidate rollouts avoided)
Dynamics-Predictive Sampling (DPS) models each prompt’s "extent of solving" under the current policy as a latent state in a dynamical system (a hidden Markov model) and performs online Bayesian inference on historical rollout reward signals to estimate that state.
Methodological description in the paper: DPS uses an HMM representation of per-prompt solving progress and applies online Bayesian updates using past rollout rewards. (No numerical sample size needed for this modeling claim.)
high null result Dynamics-Predictive Sampling for Active RL Finetuning of Lar... inferred latent state distribution / predicted expected learning progress per pr...
The authors recommend specific measurement metrics and empirical research priorities (e.g., MAPE, stockout frequency, inventory turns, lead times, fill rates, total supply chain cost, service-level volatility, resilience measures; causal studies like diff-in-diff or randomized interventions).
Explicit recommendations in the paper's measurement and research agenda sections.
high null result Optimizing integrated supply planning in logistics: Bridging... listed supply-chain performance and resilience metrics
The study's small sample size and qualitative design limit external generalizability and prevent causal effect size estimation; potential selection and reporting biases exist due to purposive sampling and interview-based data.
Authors explicitly state these limitations in the paper's limitations section.
high null result Optimizing integrated supply planning in logistics: Bridging... external generalizability and causal inference capability
The study is a qualitative multi-case study of five medium-to-large organizations, using semi-structured interviews across procurement, production planning, inventory management, and distribution, analyzed via cross-case comparison.
Methods section description provided by the authors (sample size n = 5, sectors, interview-based primary data, cross-case analysis).
high null result Optimizing integrated supply planning in logistics: Bridging... process-level, qualitative insights into ISP implementation
There is limited empirical causal evidence linking specific explanation types to long-term outcomes (safety, fairness, economic performance) in real-world deployments.
Meta-level finding of the review: authors report gaps in the literature—few causal or longitudinal studies of explanation interventions in deployed, high-stakes settings.
high null result Explainable AI in High-Stakes Domains: Improving Trust, Tran... evidence availability for causal effects on safety, fairness, economic performan...
The literature groups explainability impacts along three linked dimensions — user trust, ethical governance, and organizational accountability.
Analytical result of the review's thematic coding and synthesis across interdisciplinary literature (categorization derived from the reviewed corpus).
high null result Explainable AI in High-Stakes Domains: Improving Trust, Tran... categorization structure of explainability impacts (three-dimension taxonomy)
The paper is primarily theoretical and prescriptive: it synthesizes literature and proposes a framework and design guidelines rather than reporting large-scale empirical datasets or causal identification of economic outcomes.
Meta-claim about the paper's methods explicitly stated in the Data & Methods summary; based on the paper's methodological description.
high null result Toward a science of human–AI teaming for decision-making: A ... presence/absence of empirical datasets or causal identification studies in the p...
Key measurable outcomes to assess Human–AI teams include accuracy/efficiency, robustness to novel cases, decision consistency, trust/misuse rates, training costs, and inequity indicators.
Prescriptive list of metrics offered by the authors as part of the research agenda and evaluation guidance; not empirically derived from a dataset in the paper.
high null result Toward a science of human–AI teaming for decision-making: A ... accuracy, efficiency, robustness, consistency, trust/misuse rates, training cost...
Empirical evaluation strategies for Human–AI teams should include randomized interventions, field trials, lab experiments, phased rollouts (difference-in-differences), and structural models that allow interaction terms between human skill and AI quality.
Methodological recommendation in the paper; suggested study designs rather than implemented analyses.
high null result Toward a science of human–AI teaming for decision-making: A ... appropriate empirical identification of team-level complementarities and causal ...
Research priorities include empirical measurement of task‑level automation rates, firm and industry productivity effects, wage impacts across occupations, and diffusion patterns.
Paper's stated research agenda and identification of measurement gaps; based on methodological critique of current evidence base.
high null result How AI Will Transform the Daily Life of a Techie within 5 Ye... future empirical research outputs on automation rates, productivity, wage impact...
Measuring these productivity gains will be challenging because quality improvements, faster iteration, and creative outputs are harder to price/observe than lines of code.
Methodological argument about measurement difficulty; based on conceptual considerations, not empirical validation.
high null result How AI Will Transform the Daily Life of a Techie within 5 Ye... observability and measurability of productivity gains (availability of suitable ...
Measuring AI's economic impact requires new metrics that account for decision-value uplift, reduced tail-risk exposures, and dynamic gains from continuous learning; causal identification will require experiments or staggered rollouts.
Methodological recommendation backed by conceptual discussion of measurement challenges; no implementation of such measurement approaches is reported in the paper.
high null result Next-Generation Financial Analytics Frameworks for AI-Enable... proposed measurement constructs (decision-value uplift, tail-risk reduction, lea...
Performance and evaluation should be measured using forecast accuracy, decision lift/value added, latency, and false positive/negative rates.
Paper-prescribed evaluation metrics; presented as recommended practice rather than derived from empirical testing within the paper.
high null result Next-Generation Financial Analytics Frameworks for AI-Enable... forecast accuracy, decision lift (value added), system latency, false positive/n...
Core AI techniques for these frameworks include supervised/unsupervised ML, NLP for unstructured text, anomaly detection for control/transaction monitoring, and reinforcement/prescriptive models for recommendations.
Methodological claim listing standard ML/NLP/anomaly-detection techniques and prescriptive approaches; statement of methods rather than an empirical comparison of alternatives.
high null result Next-Generation Financial Analytics Frameworks for AI-Enable... method adoption/type metrics (e.g., frequency of supervised vs. unsupervised met...
Next‑gen frameworks use large-scale structured (transactions, ledgers, KPIs) and unstructured sources (reports, news, contracts, call transcripts) to power models.
Descriptive claim listing data types the paper recommends; presented as design input requirements rather than empirically validated data-integration projects.
high null result Next-Generation Financial Analytics Frameworks for AI-Enable... data coverage and diversity (e.g., proportion of structured vs. unstructured inp...
There is a need for quantitative studies and microdata on firm-level RM practices, AI adoption, and performance outcomes to measure effect sizes and causal pathways.
Stated research gaps and limitations in the review (lack of primary empirical quantification; heterogeneity across contexts).
high null result The Role of Risk Management as an Organizational Management ... availability of quantitative evidence on RM effects (effect sizes, causal estima...
The review's conclusions are limited by reliance on published literature (potential bias toward successful implementations), lack of primary empirical quantification (no effect sizes), and heterogeneity across organizational contexts limiting direct generalizability.
Explicit limitations stated in the paper summarizing scope and method (qualitative literature review, secondary evidence only).
high null result The Role of Risk Management as an Organizational Management ... generalizability and empirical precision of review findings
The study uses a quantitative, cross-sectional survey-based research design of managers and educational administrators and employs descriptive statistics, correlation, and regression analyses.
Methods described in the summary explicitly state research design and analytical techniques; this is a methodological claim rather than an empirical substantive finding. (Sample size not provided in summary.)
high null result Algorithmic Trust and Managerial Effectiveness: The Role of ... research design / analytic approach (methodological description)
There is a need for standardized metrics and measurement protocols for public-sector productivity and non-market outcomes (service quality, processing time, cost per transaction, transparency, trust).
Methodological critique within the review pointing to heterogeneity of outcome measures across studies and calling for standardized metrics; based on synthesis of reviewed literature.
high null result Digital Transformation and AI Adoption in Government: Evalua... existence/adoption of standardized measurement protocols and consistency of repo...
Much of the literature on public-sector digital/AI interventions is descriptive or case-based; causal, quantitative evidence on net productivity effects is limited and context-dependent.
Methodological assessment within the review noting heterogeneous study designs, reliance on secondary sources, and a lack of randomized or quasi-experimental studies; the review explicitly states this limitation.
high null result Digital Transformation and AI Adoption in Government: Evalua... availability of causal quantitative estimates of productivity impacts