The Commonplace

Evidence (2954 claims)

Adoption: 5126 claims
Productivity: 4409 claims
Governance: 4049 claims
Human-AI Collaboration: 2954 claims
Labor Markets: 2432 claims
Org Design: 2273 claims
Innovation: 2215 claims
Skills & Training: 1902 claims
Inequality: 1286 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 369 105 58 432 972
Governance & Regulation 365 171 113 54 713
Research Productivity 229 95 33 294 655
Organizational Efficiency 354 82 58 34 531
Technology Adoption Rate 277 115 63 27 486
Firm Productivity 273 33 68 10 389
AI Safety & Ethics 112 177 43 24 358
Output Quality 228 61 23 25 337
Market Structure 105 118 81 14 323
Decision Quality 154 68 33 17 275
Employment Level 68 32 74 8 184
Fiscal & Macroeconomic 74 52 32 21 183
Skill Acquisition 85 31 38 9 163
Firm Revenue 96 30 22 148
Innovation Output 100 11 20 11 143
Consumer Welfare 66 29 35 7 137
Regulatory Compliance 51 61 13 3 128
Inequality Measures 24 66 31 4 125
Task Allocation 64 6 28 6 104
Error Rate 42 47 6 95
Training Effectiveness 55 12 10 16 93
Worker Satisfaction 42 32 11 6 91
Task Completion Time 71 5 3 1 80
Wages & Compensation 38 13 19 4 74
Team Performance 41 8 15 7 72
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 17 15 9 5 46
Job Displacement 5 28 12 45
Social Protection 18 8 6 1 33
Developer Productivity 25 1 2 1 29
Worker Turnover 10 12 3 25
Creative Output 15 5 3 1 24
Skill Obsolescence 3 18 2 23
Labor Share of Income 7 4 9 20
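
As a rough way to read the matrix, the share of positive findings per outcome can be computed directly from the counts above. The sketch below is illustrative only: the `Row` type and both helper functions are hypothetical names, the three rows are copied from the table, and the Total column is used as the denominator exactly as printed, since it does not always equal the sum of the four direction columns.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One row of the evidence matrix (hypothetical container type)."""
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int


# Three rows copied verbatim from the matrix above.
ROWS = [
    Row("Firm Productivity", 273, 33, 68, 10, 389),
    Row("AI Safety & Ethics", 112, 177, 43, 24, 358),
    Row("Task Completion Time", 71, 5, 3, 1, 80),
]


def positive_share(row: Row) -> float:
    """Fraction of a row's claims that report a positive finding."""
    return row.positive / row.total


def net_direction(row: Row) -> float:
    """(positive - negative) / total; negative values mean negative findings dominate."""
    return (row.positive - row.negative) / row.total


shares = {row.outcome: positive_share(row) for row in ROWS}

for row in ROWS:
    print(f"{row.outcome}: positive share {positive_share(row):.3f}, "
          f"net direction {net_direction(row):+.3f}")
```

Using the printed total keeps the shares consistent with the dashboard, at the cost of the four direction shares not always summing to one.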
Filter: Human-AI Collaboration
Generative AI clinical decision support (GenAI CDS) can improve diagnostic and treatment suggestions through synthesis of patient data and medical knowledge, reducing missed diagnoses and standardizing care where evidence is clear.
Early evaluations reported in the paper used controlled tasks, simulated patient vignettes, and retrospective validation comparing model outputs to historical chart-verified diagnoses or guideline-concordant actions. No large-scale RCTs are cited, and sample sizes for the cited studies are not specified in the paper.
medium positive GenAI and clinical decision making in general practice diagnostic accuracy; guideline concordance; missed-diagnoses rate; treatment qua...
Researchers should develop benchmark datasets and validated simulation testbeds (industry‑anonymized) to enable reproducible economic analysis.
Explicit research recommendation in the paper's implications and research agenda section.
medium positive A Review of Manufacturing Operations Research Integration in... availability of benchmark datasets/testbeds and reproducibility of simulation st...
Simulations that incorporate government policy constraints can inform industrial policy, subsidies, and regulation aimed at supply‑chain resilience, and can quantify environmental externalities relevant to circular economy measures.
Policy‑relevance arguments and recommendations in the paper; conceptual claim without empirical policy evaluation.
medium positive A Review of Manufacturing Operations Research Integration in... policy insights, measured environmental externalities, policy‑relevant indicator...
Digital twins and real‑time analytics can make simulations dynamic, enabling economic evaluation of shock scenarios and policy interventions.
Conceptual argument and forward‑looking recommendations in the paper; no empirical test of digital twin implementations provided.
medium positive A Review of Manufacturing Operations Research Integration in... dynamic simulation capability and ability to evaluate shocks/policy intervention...
AI/ML methods (including reinforcement learning, optimization, and causal methods) can be used to calibrate and validate simulation models against firm‑level and operational data.
Recommendations and discussion in the paper's implications section; conceptual suggestion rather than demonstrated implementation.
medium positive A Review of Manufacturing Operations Research Integration in... accuracy and validity of model calibration and validation using AI/ML
Integration should start from the outsourcing decision: outsourcing choices are treated as a primary lever for supply‑chain integration and closed‑loop operations.
Argument and framing in the paper's conceptual framework and roadmap; based on literature synthesis rather than empirical estimation.
medium positive A Review of Manufacturing Operations Research Integration in... impact of outsourcing decisions on supply‑chain integration and closed‑loop oper...
Policy levers such as privacy-preserving markets for personalization data (data trusts, opt-in marketplaces) and regulation of algorithmic constraints (fairness mandates, right-to-explanation) are viable approaches to manage risks from RS-enabled robots.
Policy recommendations drawing on regulatory and market-design literature; conceptual proposals not empirically evaluated in this work.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... policy adoption, privacy outcomes, fairness compliance, data-sharing incentives
RS-enabled personalization creates opportunities for platformization of social-robot services, producing data network effects, lock-in, and cross-selling possibilities for firms.
Market-structure analysis and economic theory applied to RS-enabled services; no empirical market data provided.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... platform market power indicators (market concentration), network-effect measures...
Ethical constraints can and should be treated as first-class inputs to the ranking/selection process (e.g., safety filters, fairness constraints) to ensure value alignment in robots.
Conceptual design recommendation grounded in constrained optimization literature; no empirical demonstrations provided.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... constraint satisfaction rates (safety/fairness), reduction in ethically problema...
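
One way to read "first-class inputs" in the claim above is as a hard filter applied before ranking, rather than a penalty added afterwards. The following is a minimal sketch, assuming each candidate action carries a predicted relevance and a risk estimate. The `Action` fields, the `max_risk` threshold, and the example actions are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    """A candidate robot behavior (hypothetical structure)."""
    name: str
    relevance: float  # assumed predicted user-preference score
    risk: float       # assumed risk estimate in [0, 1]


def select_actions(candidates: List[Action], max_risk: float, k: int) -> List[Action]:
    """Drop actions that violate the safety constraint, then rank the rest.

    The constraint is applied first, so a highly relevant but unsafe
    action can never be selected, whatever its relevance score.
    """
    allowed = [a for a in candidates if a.risk <= max_risk]
    ranked = sorted(allowed, key=lambda a: a.relevance, reverse=True)
    return ranked[:k]


# Illustrative candidates only; the values are made up for the example.
candidates = [
    Action("suggest_walk", relevance=0.70, risk=0.05),
    Action("share_joke", relevance=0.55, risk=0.10),
    Action("push_purchase", relevance=0.90, risk=0.60),
]

chosen = select_actions(candidates, max_risk=0.2, k=2)
print([a.name for a in chosen])
```

Filtering before ranking means the most relevant candidate here, `push_purchase`, is excluded outright because its assumed risk exceeds the threshold.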
RS modules (user model, ranking engine, evaluator) can be modular and plug-and-play in existing robot architectures, augmenting LLMs and RL modules.
Design proposal mapping RS components to robot pipeline stages; no integration experiments reported.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... integration feasibility, modularity (development time, interface compatibility),...
Interpretability, fairness, and privacy-preserving methods (e.g., explainable recommendations, differential privacy, fairness-aware algorithms) are applicable and important for social-robot personalization.
Survey of algorithmic approaches in RS and privacy/fairness literature; conceptual recommendation without empirical application in robots.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... interpretability scores, privacy guarantees (e.g., DP epsilon), fairness metrics
Optimizing for diversity, novelty, and serendipity in recommendations can help avoid echo chambers and repetitive interactions with social robots.
Argument based on RS objectives and prior RS findings about diversity/serendipity; no robot-specific empirical evidence provided.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... diversity/novelty metrics, reduction in repetitive interaction measures, user sa...
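
A common way to trade relevance against diversity in recommender systems is greedy re-ranking in the style of maximal marginal relevance (MMR). The paper does not name a specific method, so the following is only a sketch under assumptions: the `mmr` function, the topic-based similarity, the weight `lam`, and the example items are all hypothetical.

```python
from typing import Callable, Dict, List


def mmr(relevance: Dict[str, float],
        similarity: Callable[[str, str], float],
        k: int,
        lam: float = 0.7) -> List[str]:
    """Greedy MMR-style re-ranking.

    At each step, pick the item that maximizes
    lam * relevance - (1 - lam) * (max similarity to items already chosen).
    """
    selected: List[str] = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def score(item: str) -> float:
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1.0 - lam) * max_sim
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Illustrative items: made-up relevance scores and topics.
relevance = {"joke_a": 0.90, "joke_b": 0.85, "music": 0.60, "news": 0.50}
topic = {"joke_a": "humor", "joke_b": "humor", "music": "music", "news": "news"}


def same_topic(a: str, b: str) -> float:
    """Crude similarity: 1.0 if two items share a topic, else 0.0."""
    return 1.0 if topic[a] == topic[b] else 0.0


ranking = mmr(relevance, same_topic, k=3)
print(ranking)
```

With these made-up scores, a pure relevance ranking would return both jokes first, while the re-ranked list spreads the three slots across three topics.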
Multi-objective and constrained optimization techniques from RS can be used to balance engagement, well-being, fairness, privacy, and safety in social-robot behavior selection.
Conceptual proposal referencing multi-objective/constrained recommendation literature; no empirical tests within robots included.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... multi-objective trade-offs (metrics for engagement vs well-being, fairness const...
Latent-factor models, embeddings, and hierarchical user models from RS can be used to capture long- and short-term preferences in social robots' user models.
Methodological proposal drawing on RS modeling techniques; no experimental validation in robotic systems provided.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... fidelity of user preference representation (e.g., embedding quality, predictive ...
Integrating recommender-system techniques across the robot pipeline (user modeling, ranking, contextualization, evaluation) can capture long-term, short-term, and fine-grained user preferences and enable proactive, ethically constrained action selection.
Conceptual framework and design proposal synthesizing recommender-systems (RS) and human–robot interaction (HRI) literature; no novel empirical experiments or sample size reported.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... personalization quality (long-term consistency, short-term responsiveness), abil...
ANN analysis ranks information barriers as the most important predictor of organizational inertia.
ANN feature-importance analysis reported in the paper that ranks predictors for inertia, identifying information barriers as the top predictor; methodological specifics (sample size, ANN parameters) are not provided in the abstract.
Artificial neural network (ANN) analysis ranks functional values as the most important predictor of initial trust.
ANN feature-importance analysis reported in the paper that ranks predictors for initial trust, with functional values highest; method described as ANN-based relative importance ranking (details such as network architecture, training sample size, or validation metrics not reported in the abstract).
Human interaction, information, and norm barriers increase organizational inertia (resistance to change) toward GAICS.
Qualitative phase surfaced these barriers; quantitative validation showed statistically significant positive relationships between (a) need for human interaction barriers, (b) information barriers (lack of knowledge/clarity), and (c) norm barriers (cultural/social norms) and organizational inertia.
medium positive Reimagining Stakeholder Engagement Through Generative AI: A ... Organizational inertia / resistance to change regarding GAICS
Functional and instrumental values increase initial trust in GAICS.
Mixed-methods evidence: qualitative exploratory phase identified functional and instrumental value as drivers; quantitative phase (inferential analysis) found positive, statistically significant effects of functional value (system usefulness/quality) and instrumental value (task-related benefits) on initial trust.
Based on findings and student-reported concerns, the authors recommend integrating explicit AI-literacy instruction to support critical and reflective use of Generative AI tools in education.
Authors' recommendation in discussion sections, motivated by observed heterogeneous effects, student concerns about accuracy and overreliance, and qualitative calls for guidance; recommendation not experimentally tested in this study.
medium positive Expanding the lens: multi-institutional evidence on student ... recommendation for AI-literacy instruction (policy/educational intervention)
Students reported that ChatGPT provided faster access to information, helped clarify concepts, and aided organization (e.g., outlining and summarizing).
Qualitative topic-based coding of open-ended survey responses from participating students (sample = 254 across six courses); thematic analysis identified benefits including speed, clarification, and organizational support.
medium positive Expanding the lens: multi-institutional evidence on student ... student-reported perceived usefulness/benefits
There is a weak but statistically significant positive relationship between iterative engagement with ChatGPT (measured by number of edits to the tool's outputs) and better academic performance.
Correlational analysis between usage behavior (number of edits) and student scores, reported as weak but significant; based on the same experimental sample (N = 254) and usage logs/survey data.
medium positive Expanding the lens: multi-institutional evidence on student ... student task/course scores (correlated with number of edits)
The improvement from allowing ChatGPT use was statistically significant in specific courses (examples named: computer systems administration, informatics, childhood disorders).
Course-level analyses using GLM and non-parametric comparisons showing statistically significant treatment effects in some courses; sample drawn from the full N = 254 distributed across six courses (per-course Ns not specified in summary).
medium positive Expanding the lens: multi-institutional evidence on student ... course/task scores within specified courses
Allowing students to use ChatGPT on knowledge-based academic tasks led to generally higher scores compared with control groups restricted to non-GenAI resources.
Randomized/experimental assignment of students to treatment (allowed ChatGPT) vs control (no GenAI) across six courses at two institutions; overall sample N = 254; comparisons made using descriptive statistics, general linear model (GLM) controlling for covariates, and non-parametric tests.
medium positive Expanding the lens: multi-institutional evidence on student ... student task/course scores (short-term performance on knowledge-based tasks)
Policy and platform design choices (e.g., provenance metadata, detection/disclosure of AI-generated content, monetization rule alignment) can reinforce or mitigate harms from GenAI-driven creator economies.
Policy recommendations and implications drawn from the qualitative findings across the 377-video sample and normative reasoning; not empirically tested.
medium positive Monetizing Generative AI: YouTubers' Collective Knowledge on... potential mitigation or amplification of harms via platform and policy intervent...
For economic and policy analysis, researchers should estimate distributions of effects, account for dynamic adaptation/nonstationarity, pre-register plans, track model versions, and combine RCTs with longitudinal/observational/structural methods.
Implications and recommendations section synthesized from practitioner interviews (n=16) and authors' applied methodological reasoning.
medium positive RCTs & Human Uplift Studies: Methodological Challenges and P... recommended research practices for economically meaningful inference about AI up...
High-stakes deployment, governance, and safety decisions should not rely on single uplift RCTs; they require synthesis across studies, ongoing monitoring, scenario analysis, and explicit uncertainty characterization.
Authors' recommendations drawn from thematic analysis of interview data (n=16) and the mapped validity consequences; policy implications section articulates this guidance.
medium positive RCTs & Human Uplift Studies: Methodological Challenges and P... reliability of decision-making based on uplift evidence
Scaffold choice creates an economic opportunity for third-party tooling and open-source scaffolding because scaffold effects materially affect performance and reproducibility.
Observed performance differences across scaffolds (up to ~5 percentage points) and sensitivity of results to scaffold selection reported in the study.
medium positive Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... market_opportunity_for_scaffold_tools (qualitative_based_on_performance_impact)
NFD increases complementarities between domain experts and AI, raising demand for hybrid roles (expert + knowledge engineer) and skills in elicitation, verification, and artifact design.
Conceptual argument in implications section, supported by practical demands observed in the case study (coordination between analysts and knowledge engineering activities).
medium positive Nurture-First Agent Development: Building Domain-Expert AI A... demand for hybrid roles; number of hybrid role hires or time spent on elicitatio...
The case study produced modular knowledge artifacts (rules, templates, tests) that supported reuse and auditability.
Empirical artifact production in the case study: creation of templates, checklists, heuristics, and test suites; reuse counts and audit traces were tracked qualitatively and with reuse metrics (exact numbers not specified).
medium positive Nurture-First Agent Development: Building Domain-Expert AI A... number and reuse rate of modular artifacts; presence of audit trails
In the same case study, iterative crystallization increased the consistency/reliability of agent outputs.
Case study measurements of agent reliability and qualitative practitioner feedback/acceptance across development spirals; precise quantitative details and sample size are not reported.
medium positive Nurture-First Agent Development: Building Domain-Expert AI A... consistency/reliability of outputs (agent output variance, agreement with practi...
In a detailed case study building a U.S. equity financial research agent, iterative crystallization reduced per-task human effort.
Case study with iterative co-development with financial analysts; interaction transcripts logged and operational metrics (time per analysis) reported across development spirals. The paper does not report sample size or statistical tests.
medium positive Nurture-First Agent Development: Building Domain-Expert AI A... analyst time per analysis (human effort per task)
Annotator affective traits shift labeling propensity (toward positivity); classifiers trained on pooled annotator labels may inherit systematic biases from annotator heterogeneity.
Observed associations between trait mood/reactivity and increased positive labeling in GEE models; extrapolated implication for classifier training when using pooled labels from heterogeneous annotators.
medium positive Exploring Indicators of Developers' Sentiment Perceptions in... systematic shift in aggregate labels (and therefore potential classifier outputs...
Trait-level mood and emotional reactivity weakly predict a higher tendency to label statements as positive (and fewer as neutral).
Statement-level repeated-measures generalized estimating equations (GEE) using the 81 participants' repeated labels of 30 statements per round; trait mood and reactivity variables were significant predictors in GEE models for positive vs neutral labeling, but with small effect sizes.
medium positive Exploring Indicators of Developers' Sentiment Perceptions in... probability of labeling a statement as positive (vs neutral)
CBCTRepD improves report structure, reduces omissions, and promotes more systematic attention to co-existing lesions across anatomical regions in CBCT reports.
Clinical evaluation findings reported in the paper indicate improvements in structure, reduced omissions, and increased attention to multi-region co-existing lesions when using the system. (Operational definitions of 'structure', how omissions were identified, and measurement methods are not detailed in the provided text.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Report structure, omission rate, and documentation of multi-region co-existing l...
Senior radiologists using CBCTRepD produce collaborative reports with reduced omission-related errors, including fewer clinically important missed lesions.
Clinician-centered assessment described in the evaluation; paper reports reductions in omission-related errors and clinically important missed lesions for seniors when using the system. (The provided summary does not list the number of senior reviewers, counts of omissions before/after, or statistical testing.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Omission-related errors and clinically important missed lesions in final reports...
In the same co-authoring workflow, intermediate radiologists improve their report quality toward senior-level performance when assisted by CBCTRepD.
Paper reports comparative analyses across experience levels and states intermediates approached senior quality with AI assistance. (Exact metrics, reviewer counts, and quantitative effect sizes are not specified in the provided text.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Final report quality for intermediate radiologists in a co-authoring workflow
When used in a radiologist–AI co-authoring workflow, CBCTRepD consistently improves report quality for novice radiologists, bringing their reports toward intermediate-level quality.
Collaborative evaluation reported in the paper comparing radiologist-edited AI drafts across experience tiers; authors state novices improved toward intermediate-level reporting when using the system. (Details such as number of novice readers, magnitude of improvement, and statistical significance are not provided in the summary.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Final report quality for novice radiologists in a co-authoring workflow
Under a multi-level clinical evaluation (automatic metrics plus radiologist/clinician review), raw AI-generated draft reports from CBCTRepD achieve writing quality and standardization comparable to intermediate radiologists.
Evaluation described as multi-level and clinically grounded, combining automatic text/clinical metrics and radiologist/clinician review; the paper reports a comparison between AI drafts and radiologists stratified by experience (novice, intermediate, senior). (Specific sample sizes of reviewers, statistical tests, and numerical effect sizes are not provided in the supplied summary.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Writing quality and standardization of draft reports (AI drafts vs intermediate ...
Lowering fixed costs via shared resources can enable more entrants and niche innovators (e.g., specialized clinical apps).
Workshop economic implications and participant assertions in breakout sessions and plenary at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... number of market entrants, emergence of niche products, diversity of suppliers
Public investment in shared data and compute as nonrival public goods will reduce duplication, lower entry barriers, and increase total R&D productivity.
Workshop implications for AI economics articulated by participants and authors as a policy recommendation; rationale stated in the summary document (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... duplication of effort, entry barriers (number of entrants), and aggregate R&D pr...
De-risk pathways from lab to clinic via reproducible benchmarks, continuous monitoring, and cross-sector collaborations (academia, industry, clinicians, regulators).
Workshop translation-focused recommendations and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... time-to-market, reproducibility metrics, and rate of successful clinical transla...
Enable safe, accountable, and resilient platforms (including virtual–physical healthcare ecosystems) to reduce translational risk.
Workshop recommendations addressing safety, resilience, and virtual–physical ecosystems from cross-disciplinary discussion at NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... measures of translational risk (failure rates in translation, incidents, safety ...
Promote scalable validation ecosystems grounded in objective, continuous measures and physics-informed models.
Workshop validation and safety theme recommendations from panels and consensus-building exercises (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... presence and scalability of validation ecosystems; reliability/robustness metric...
Develop clinic workflow–aware systems and human–AI collaboration frameworks to fit real clinical practice and decision chains.
Stated systems and workflows recommendation from expert panels and clinician participants at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... compatibility of AI-enabled systems with clinical workflows; measures of clinici...
Build shared compute infrastructures tailored to medical workloads and validation needs.
Workshop recommendation from infrastructure-themed sessions and consensus outcomes (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... existence and utilization of shared compute infrastructure for medical R&D (comp...
Sustain investment in shared, standardized data infrastructures (datasets, ontologies, benchmarks) to support medical algorithm–hardware co-design.
Workshop infrastructure call presented during breakout sessions and final recommendations at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... availability and use of standardized medical datasets/ontologies/benchmarks
Principal recommendation: shift from isolated algorithm or hardware efforts to integrated algorithm–hardware–workflow co-design for medical contexts.
Stated workshop recommendation derived from panels and cross-disciplinary consensus at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... alignment and integration of R&D efforts (degree of co-design adoption in projec...
Sustained public investment and new validation, governance, and translation ecosystems are needed to de-risk commercialization and accelerate safe, accountable clinical adoption.
Workshop principal recommendation based on qualitative synthesis of expert judgment from participants and breakout outcomes (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... commercialization risk level and speed/rate of clinical adoption
Enabling next-generation medical technologies requires a fundamental reorientation toward algorithm–hardware co-design that is clinic-aware, validated continuously, and backed by shared data and compute infrastructures.
Consensus recommendation from a two-day NSF workshop (Sept 26–27, 2024) in Pittsburgh convening interdisciplinary participants (academic researchers in algorithms and hardware, clinicians, industry leaders). Methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building. Documentation at https://sites.google.com/view/nsfworkshop.
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... successful development and clinical adoption of next-generation medical technolo...