Evidence (2954 claims)
- Adoption: 5126 claims
- Productivity: 4409 claims
- Governance: 4049 claims
- Human-AI Collaboration: 2954 claims
- Labor Markets: 2432 claims
- Org Design: 2273 claims
- Innovation: 2215 claims
- Skills & Training: 1902 claims
- Inequality: 1286 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 369 | 105 | 58 | 432 | 972 |
| Governance & Regulation | 365 | 171 | 113 | 54 | 713 |
| Research Productivity | 229 | 95 | 33 | 294 | 655 |
| Organizational Efficiency | 354 | 82 | 58 | 34 | 531 |
| Technology Adoption Rate | 277 | 115 | 63 | 27 | 486 |
| Firm Productivity | 273 | 33 | 68 | 10 | 389 |
| AI Safety & Ethics | 112 | 177 | 43 | 24 | 358 |
| Output Quality | 228 | 61 | 23 | 25 | 337 |
| Market Structure | 105 | 118 | 81 | 14 | 323 |
| Decision Quality | 154 | 68 | 33 | 17 | 275 |
| Employment Level | 68 | 32 | 74 | 8 | 184 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 85 | 31 | 38 | 9 | 163 |
| Firm Revenue | 96 | 30 | 22 | — | 148 |
| Innovation Output | 100 | 11 | 20 | 11 | 143 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 51 | 61 | 13 | 3 | 128 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 64 | 6 | 28 | 6 | 104 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 10 | 16 | 93 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 71 | 5 | 3 | 1 | 80 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Team Performance | 41 | 8 | 15 | 7 | 72 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 17 | 15 | 9 | 5 | 46 |
| Job Displacement | 5 | 28 | 12 | — | 45 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Developer Productivity | 25 | 1 | 2 | 1 | 29 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 7 | 4 | 9 | — | 20 |
Claims filtered to: Human-AI Collaboration
Generative AI clinical decision support (GenAI CDS) can improve diagnostic and treatment suggestions through synthesis of patient data and medical knowledge, reducing missed diagnoses and standardizing care where evidence is clear.
Early evaluations reported in the paper: controlled tasks, simulated patient vignettes, retrospective validation comparing model outputs to historical chart-verified diagnoses or guideline-concordant actions; no large-scale RCTs cited and sample sizes for cited studies are not specified in the paper.
Researchers should develop benchmark datasets and validated simulation testbeds (industry‑anonymized) to enable reproducible economic analysis.
Explicit research recommendation in the paper's implications and research agenda section.
Simulations that incorporate government policy constraints can inform industrial policy, subsidies, regulation aimed at supply‑chain resilience, and quantify environmental externalities relevant to circular economy measures.
Policy‑relevance arguments and recommendations in the paper; conceptual claim without empirical policy evaluation.
Digital twins and real‑time analytics can make simulations dynamic, enabling economic evaluation of shock scenarios and policy interventions.
Conceptual argument and forward‑looking recommendations in the paper; no empirical test of digital twin implementations provided.
AI/ML methods (including reinforcement learning, optimization, and causal methods) can be used to calibrate and validate simulation models against firm‑level and operational data.
Recommendations and discussion in the paper's implications section; conceptual suggestion rather than demonstrated implementation.
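To make this calibration recommendation concrete, here is a minimal method-of-simulated-moments sketch: a toy simulator's parameters are tuned so its summary moments match observed firm-level moments. The simulator, parameter names, and target values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Target moments from (hypothetical) firm-level data: mean utilization, mean cycle time.
observed_moments = np.array([0.62, 1.35])

def simulate_supply_chain(params, n_periods=500, seed=0):
    """Toy stand-in for a supply-chain simulator; returns summary moments."""
    rng = np.random.default_rng(seed)
    capacity, lead_time = params
    lead_time = max(lead_time, 0.1)  # keep the gamma shape parameter valid during the search
    utilization = np.clip(rng.normal(0.6 * capacity, 0.05, n_periods), 0, 1)
    cycle_time = rng.gamma(shape=lead_time, scale=1.0, size=n_periods)
    return np.array([utilization.mean(), cycle_time.mean()])

def moment_distance(params):
    """Squared distance between simulated and observed moments."""
    return float(np.sum((simulate_supply_chain(params) - observed_moments) ** 2))

result = minimize(moment_distance, x0=[1.0, 1.0], method="Nelder-Mead")
print("calibrated parameters:", result.x)
```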
Integration should start from the outsourcing decision: outsourcing choices are treated as a primary lever for supply‑chain integration and closed‑loop operations.
Argument and framing in the paper's conceptual framework and roadmap; based on literature synthesis rather than empirical estimation.
Policy levers such as privacy-preserving markets for personalization data (data trusts, opt-in marketplaces) and regulation of algorithmic constraints (fairness mandates, right-to-explanation) are viable approaches to manage risks from RS-enabled robots.
Policy recommendations drawing on regulatory and market-design literature; conceptual proposals not empirically evaluated in this work.
RS-enabled personalization creates opportunities for platformization of social-robot services, producing data network effects, lock-in, and cross-selling possibilities for firms.
Market-structure analysis and economic theory applied to RS-enabled services; no empirical market data provided.
Ethical constraints can and should be treated as first-class inputs to the ranking/selection process (e.g., safety filters, fairness constraints) to ensure value alignment in robots.
Conceptual design recommendation grounded in constrained optimization literature; no empirical demonstrations provided.
RS modules (user model, ranking engine, evaluator) can be modular and plug-and-play in existing robot architectures, augmenting LLMs and RL modules.
Design proposal mapping RS components to robot pipeline stages; no integration experiments reported.
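A minimal sketch of what that plug-and-play decomposition could look like, assuming the three components the paper names (user model, ranking engine, evaluator); the Python protocols and stub implementations below are our illustration, not the paper's code.

```python
from typing import Protocol, Sequence

class UserModel(Protocol):
    def preference(self, user_id: str, action: str) -> float: ...

class RankingEngine(Protocol):
    def rank(self, user_id: str, candidates: Sequence[str]) -> list[str]: ...

class Evaluator(Protocol):
    def log_outcome(self, user_id: str, action: str, reward: float) -> None: ...

class ScoreRanker:
    """Ranking engine that orders candidate robot behaviors by user-model preference scores."""
    def __init__(self, user_model: UserModel):
        self.user_model = user_model

    def rank(self, user_id: str, candidates: Sequence[str]) -> list[str]:
        return sorted(candidates, key=lambda a: self.user_model.preference(user_id, a), reverse=True)

class StaticUserModel:
    """Stub user model with fixed per-action scores (would be learned in practice)."""
    def __init__(self, scores: dict[str, float]):
        self.scores = scores

    def preference(self, user_id: str, action: str) -> float:
        return self.scores.get(action, 0.0)

ranker = ScoreRanker(StaticUserModel({"tell_joke": 0.8, "suggest_walk": 0.5}))
print(ranker.rank("u1", ["suggest_walk", "tell_joke", "play_music"]))
```

Because the components only touch each other through these narrow interfaces, a learned user model or an RL-based ranker could be swapped in without changing the rest of the robot pipeline.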
Interpretability, fairness, and privacy-preserving methods (e.g., explainable recommendations, differential privacy, fairness-aware algorithms) are applicable and important for social-robot personalization.
Survey of algorithmic approaches in RS and privacy/fairness literature; conceptual recommendation without empirical application in robots.
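Of the privacy techniques named, differential privacy is the most mechanical to illustrate. Below is a minimal Laplace-mechanism sketch for releasing a bounded engagement statistic; it is a generic textbook construction, not an application from the paper.

```python
import numpy as np

def dp_mean(values, epsilon, lower=0.0, upper=1.0, rng=np.random.default_rng(0)):
    """Differentially private mean of bounded data via the Laplace mechanism."""
    values = np.clip(values, lower, upper)       # enforce the assumed bounds
    sensitivity = (upper - lower) / len(values)  # L1 sensitivity of the mean
    return values.mean() + rng.laplace(scale=sensitivity / epsilon)

engagement = np.array([0.7, 0.4, 0.9, 0.6])      # per-session engagement scores (illustrative)
print(dp_mean(engagement, epsilon=1.0))
```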
Optimizing for diversity, novelty, and serendipity in recommendations can help avoid echo chambers and repetitive interactions with social robots.
Argument based on RS objectives and prior RS findings about diversity/serendipity; no robot-specific empirical evidence provided.
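One standard way to operationalize that objective is maximal marginal relevance (MMR) re-ranking, which greedily trades relevance against similarity to items already selected. The sketch below is a generic illustration with made-up scores, not a method evaluated in the paper.

```python
def mmr_rerank(candidates, relevance, similarity, lam=0.7, k=3):
    """Greedy MMR: pick items that are relevant but dissimilar to already-chosen ones."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def mmr_score(item):
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * max_sim
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

relevance = {"joke_a": 0.9, "joke_b": 0.85, "news_a": 0.6, "music_a": 0.55}
same_type = lambda a, b: 1.0 if a.split("_")[0] == b.split("_")[0] else 0.0
print(mmr_rerank(relevance, relevance, same_type))  # ['joke_a', 'news_a', 'music_a']: mixes content types
```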
Multi-objective and constrained optimization techniques from RS can be used to balance engagement, well-being, fairness, privacy, and safety in social-robot behavior selection.
Conceptual proposal referencing multi-objective/constrained recommendation literature; no empirical tests within robots included.
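A minimal sketch of how such constrained multi-objective selection could look: hard safety constraints filter the candidate set first, then a weighted sum scalarizes the remaining objectives. The weights, scores, and action names are illustrative assumptions.

```python
WEIGHTS = {"engagement": 0.4, "wellbeing": 0.3, "fairness": 0.2, "privacy": 0.1}

candidates = [
    {"action": "suggest_walk", "engagement": 0.6, "wellbeing": 0.9, "fairness": 0.8, "privacy": 0.9, "safe": True},
    {"action": "share_user_data", "engagement": 0.9, "wellbeing": 0.4, "fairness": 0.7, "privacy": 0.1, "safe": False},
    {"action": "tell_joke", "engagement": 0.8, "wellbeing": 0.6, "fairness": 0.8, "privacy": 0.9, "safe": True},
]

def scalarized_score(c):
    """Weighted-sum scalarization of the soft objectives."""
    return sum(w * c[obj] for obj, w in WEIGHTS.items())

feasible = [c for c in candidates if c["safe"]]  # hard constraint: unsafe actions are dropped outright
best = max(feasible, key=scalarized_score)
print(best["action"], round(scalarized_score(best), 2))  # suggest_walk 0.76
```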
Latent-factor models, embeddings, and hierarchical user models from RS can be used to capture long- and short-term preferences in social robots' user models.
Methodological proposal drawing on RS modeling techniques; no experimental validation in robotic systems provided.
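As an illustration of the hierarchical-user-model idea (not the paper's implementation): a long-term latent embedding blended with a recency-weighted short-term session embedding, both scored against item embeddings. The dimensions, decay rate, and item names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
item_embeddings = {name: rng.normal(size=8) for name in ["joke", "news", "music", "walk"]}
long_term = rng.normal(size=8)  # would be learned offline from the full interaction history

def short_term(session_items, decay=0.6):
    """Exponentially decayed average of recent item embeddings (most recent weighted highest)."""
    weights = np.array([decay ** i for i in range(len(session_items))][::-1])
    vecs = np.stack([item_embeddings[i] for i in session_items])
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def score(item, session_items, alpha=0.5):
    """Blend long- and short-term preference vectors, then dot with the item embedding."""
    user_vec = alpha * long_term + (1 - alpha) * short_term(session_items)
    return float(user_vec @ item_embeddings[item])

session = ["news", "joke"]  # most recent interaction last
print({i: round(score(i, session), 2) for i in item_embeddings})
```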
Integrating recommender-system techniques across the robot pipeline (user modeling, ranking, contextualization, evaluation) can capture long-term, short-term, and fine-grained user preferences and enable proactive, ethically constrained action selection.
Conceptual framework and design proposal synthesizing recommender-systems (RS) and human–robot interaction (HRI) literature; no novel empirical experiments or sample size reported.
Artificial neural network (ANN) analysis ranks information barriers as the most important predictor of organizational inertia.
ANN feature-importance analysis reported in the paper that ranks predictors for inertia, identifying information barriers as the top predictor; methodological specifics (sample size, ANN parameters) are not provided in the abstract.
ANN analysis ranks functional values as the most important predictor of initial trust.
ANN feature-importance analysis reported in the paper that ranks predictors for initial trust, with functional values highest; method described as ANN-based relative importance ranking (details such as network architecture, training sample size, or validation metrics not reported in the abstract).
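The abstract does not specify how the relative-importance ranking was computed; one generic way to obtain such a ranking from a trained ANN is permutation importance, sketched below with synthetic stand-in data (variable names are assumptions).

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic predictors, e.g. functional / instrumental / social values
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.2, size=200)

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)

for name, value in sorted(zip(["functional", "instrumental", "social"], imp.importances_mean),
                          key=lambda t: -t[1]):
    print(f"{name}: {value:.3f}")  # ranking should recover functional > instrumental > social
```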
Human interaction, information, and norm barriers increase organizational inertia (resistance to change) toward GAICS.
Qualitative phase surfaced these barriers; quantitative validation showed statistically significant positive relationships between (a) need for human interaction barriers, (b) information barriers (lack of knowledge/clarity), and (c) norm barriers (cultural/social norms) and organizational inertia.
Functional and instrumental values increase initial trust in GAICS.
Mixed-methods evidence: qualitative exploratory phase identified functional and instrumental value as drivers; quantitative phase (inferential analysis) found positive, statistically significant effects of functional value (system usefulness/quality) and instrumental value (task-related benefits) on initial trust.
Based on findings and student-reported concerns, the authors recommend integrating explicit AI-literacy instruction to support critical and reflective use of Generative AI tools in education.
Authors' recommendation in discussion sections, motivated by observed heterogeneous effects, student concerns about accuracy and overreliance, and qualitative calls for guidance; recommendation not experimentally tested in this study.
Students reported that ChatGPT provided faster access to information, helped clarify concepts, and aided organization (e.g., outlining and summarizing).
Qualitative topic-based coding of open-ended survey responses from participating students (sample = 254 across six courses); thematic analysis identified benefits including speed, clarification, and organizational support.
There is a weak but statistically significant positive relationship between iterative engagement with ChatGPT (measured by number of edits to the tool's outputs) and better academic performance.
Correlational analysis between usage behavior (number of edits) and student scores reported as weak but significant; based on same experimental sample (N = 254) and usage logs/survey data.
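For readers who want the shape of this analysis, here is a sketch of the reported edits-vs-score correlation using synthetic data (the study's actual data are not reproduced here):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
edits = rng.poisson(lam=4, size=254)                       # number of edits to ChatGPT outputs
scores = 70 + 0.8 * edits + rng.normal(scale=8, size=254)  # weak positive relationship by construction

rho, p = spearmanr(edits, scores)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")            # expect weak but statistically significant
```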
The improvement from allowing ChatGPT use was statistically significant in specific courses (examples named: computer systems administration, informatics, childhood disorders).
Course-level analyses using GLM and non-parametric comparisons showing statistically significant treatment effects in some courses; sample drawn from the full N = 254 distributed across six courses (per-course Ns not specified in summary).
Allowing students to use ChatGPT on knowledge-based academic tasks led to generally higher scores compared with control groups restricted to non-GenAI resources.
Randomized/experimental assignment of students to treatment (allowed ChatGPT) vs control (no GenAI) across six courses at two institutions; overall sample N = 254; comparisons made using descriptive statistics, general linear model (GLM) controlling for covariates, and non-parametric tests.
Policy and platform design choices (e.g., provenance metadata, detection/disclosure of AI-generated content, monetization rule alignment) can reinforce or mitigate harms from GenAI-driven creator economies.
Policy recommendations and implications drawn from the qualitative findings across the 377-video sample and normative reasoning; not empirically tested.
For economic and policy analysis, researchers should estimate distributions of effects, account for dynamic adaptation/nonstationarity, pre-register plans, track model versions, and combine RCTs with longitudinal/observational/structural methods.
Implications and recommendations section synthesized from practitioner interviews (n=16) and authors' applied methodological reasoning.
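One concrete reading of "estimate distributions of effects": bootstrap the uplift rather than reporting a single point estimate. The sketch below uses synthetic outcomes under assumed treated/control groups, not data from the interviews.

```python
import numpy as np

rng = np.random.default_rng(3)
treated = rng.normal(loc=1.2, scale=1.0, size=120)  # e.g., task throughput with the AI tool (synthetic)
control = rng.normal(loc=1.0, scale=1.0, size=120)

# Resample each arm with replacement to get a distribution over the mean uplift.
boot = [rng.choice(treated, 120).mean() - rng.choice(control, 120).mean() for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"mean uplift ~ {np.mean(boot):.2f}, 95% bootstrap CI [{lo:.2f}, {hi:.2f}]")
```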
High-stakes deployment, governance, and safety decisions should not rely on single uplift RCTs; they require synthesis across studies, ongoing monitoring, scenario analysis, and explicit uncertainty characterization.
Authors' recommendations drawn from thematic analysis of interview data (n=16) and the mapped validity consequences; policy implications section articulates this guidance.
Scaffold choice creates an economic opportunity for third-party tooling and open-source scaffolding, because the choice of scaffold materially affects performance and reproducibility.
Observed performance differences across scaffolds (up to ~5 percentage points) and sensitivity of results to scaffold selection reported in the study.
NFD increases complementarities between domain experts and AI, raising demand for hybrid roles (expert + knowledge engineer) and skills in elicitation, verification, and artifact design.
Conceptual argument in implications section, supported by practical demands observed in the case study (coordination between analysts and knowledge engineering activities).
The case study produced modular knowledge artifacts (rules, templates, tests) that supported reuse and auditability.
Empirical artifact production in the case study: creation of templates, checklists, heuristics, and test suites; reuse counts and audit traces were tracked qualitatively and with reuse metrics (exact numbers not specified).
In the same case study, iterative crystallization increased the consistency/reliability of agent outputs.
Case study measurements of agent reliability and qualitative practitioner feedback/acceptance across development spirals; precise quantitative details and sample size are not reported.
In a detailed case study building a U.S. equity financial research agent, iterative crystallization reduced per-task human effort.
Case study with iterative co-development with financial analysts; interaction transcripts logged and operational metrics (time per analysis) reported across development spirals. The paper does not report sample size or statistical tests.
Annotator affective traits shift labeling propensity (toward positivity); classifiers trained on pooled annotator labels may inherit systematic biases from annotator heterogeneity.
Observed associations between trait mood/reactivity and increased positive labeling in GEE models; extrapolated implication for classifier training when using pooled labels from heterogeneous annotators.
Trait-level mood and emotional reactivity weakly predict a higher tendency to label statements as positive (and fewer as neutral).
Statement-level repeated-measures generalized estimating equations (GEE) using the 81 participants' repeated labels of 30 statements per round; trait mood and reactivity variables were significant predictors in GEE models for positive vs neutral labeling, but with small effect sizes.
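A sketch of the kind of GEE specification described (statement-level labels clustered by participant), using statsmodels and synthetic data; the variable names and effect sizes are assumptions chosen to mirror the reported design (81 participants x 30 statements).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_participants, n_statements = 81, 30
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_statements),
    "trait_mood": np.repeat(rng.normal(size=n_participants), n_statements),
    "reactivity": np.repeat(rng.normal(size=n_participants), n_statements),
})
logit = 0.2 * df["trait_mood"] + 0.15 * df["reactivity"]  # small effects, as reported
df["positive"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Binomial GEE with exchangeable within-participant correlation.
model = sm.GEE.from_formula(
    "positive ~ trait_mood + reactivity",
    groups="participant",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
print(model.fit().summary())
```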
CBCTRepD improves report structure, reduces omissions, and promotes more systematic attention to co-existing lesions across anatomical regions in cone-beam computed tomography (CBCT) reports.
Clinical evaluation findings reported in the paper indicate improvements in structure, reduced omissions, and increased attention to multi-region co-existing lesions when using the system. (Operational definitions of 'structure', how omissions were identified, and measurement methods are not detailed in the provided text.)
Senior radiologists using CBCTRepD produce collaborative reports with reduced omission-related errors, including fewer clinically important missed lesions.
Clinician-centered assessment described in the evaluation; paper reports reductions in omission-related errors and clinically important missed lesions for seniors when using the system. (The provided summary does not list the number of senior reviewers, counts of omissions before/after, or statistical testing.)
In the same co-authoring workflow, intermediate radiologists improve their report quality toward senior-level performance when assisted by CBCTRepD.
Paper reports comparative analyses across experience levels and states intermediates approached senior quality with AI assistance. (Exact metrics, reviewer counts, and quantitative effect sizes are not specified in the provided text.)
When used in a radiologist–AI co-authoring workflow, CBCTRepD consistently improves report quality for novice radiologists, bringing their reports toward intermediate-level quality.
Collaborative evaluation reported in the paper comparing radiologist-edited AI drafts across experience tiers; authors state novices improved toward intermediate-level reporting when using the system. (Details such as number of novice readers, magnitude of improvement, and statistical significance are not provided in the summary.)
Under a multi-level clinical evaluation (automatic metrics plus radiologist/clinician review), raw AI-generated draft reports from CBCTRepD achieve writing quality and standardization comparable to intermediate radiologists.
Evaluation described as multi-level and clinically grounded, combining automatic text/clinical metrics and radiologist/clinician review; the paper reports a comparison between AI drafts and radiologists stratified by experience (novice, intermediate, senior). (Specific sample sizes of reviewers, statistical tests, and numerical effect sizes are not provided in the supplied summary.)
Lowering fixed costs via shared resources can enable more entrants and niche innovators (e.g., specialized clinical apps).
Workshop economic implications and participant assertions in breakout sessions and plenary at the NSF workshop (Sept 26–27, 2024).
Public investment in shared data and compute as nonrival public goods will reduce duplication, lower entry barriers, and increase total R&D productivity.
Workshop implications for AI economics articulated by participants and authors as a policy recommendation; rationale stated in the summary document (NSF workshop, Sept 26–27, 2024).
De-risk pathways from lab to clinic via reproducible benchmarks, continuous monitoring, and cross-sector collaborations (academia, industry, clinicians, regulators).
Workshop translation-focused recommendations and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
Enable safe, accountable, and resilient platforms (including virtual–physical healthcare ecosystems) to reduce translational risk.
Workshop recommendations addressing safety, resilience, and virtual–physical ecosystems from cross-disciplinary discussion at NSF workshop (Sept 26–27, 2024).
Promote scalable validation ecosystems grounded in objective, continuous measures and physics-informed models.
Workshop validation and safety theme recommendations from panels and consensus-building exercises (NSF workshop, Sept 26–27, 2024).
Develop clinic workflow–aware systems and human–AI collaboration frameworks to fit real clinical practice and decision chains.
Stated systems and workflows recommendation from expert panels and clinician participants at the NSF workshop (Sept 26–27, 2024).
Build shared compute infrastructures tailored to medical workloads and validation needs.
Workshop recommendation from infrastructure-themed sessions and consensus outcomes (NSF workshop, Sept 26–27, 2024).
Sustain investment in shared, standardized data infrastructures (datasets, ontologies, benchmarks) to support medical algorithm–hardware co-design.
Workshop infrastructure call presented during breakout sessions and final recommendations at the NSF workshop (Sept 26–27, 2024).
Principal recommendation: shift from isolated algorithm or hardware efforts to integrated algorithm–hardware–workflow co-design for medical contexts.
Stated workshop recommendation derived from panels and cross-disciplinary consensus at the NSF workshop (Sept 26–27, 2024).
Sustained public investment and new validation, governance, and translation ecosystems are needed to de-risk commercialization and accelerate safe, accountable clinical adoption.
Workshop principal recommendation based on qualitative synthesis of expert judgment from participants and breakout outcomes (NSF workshop, Sept 26–27, 2024).
Enabling next-generation medical technologies requires a fundamental reorientation toward algorithm–hardware co-design that is clinic-aware, validated continuously, and backed by shared data and compute infrastructures.
Consensus recommendation from a two-day NSF workshop (Sept 26–27, 2024) in Pittsburgh convening interdisciplinary participants (academic researchers in algorithms and hardware, clinicians, industry leaders). Methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building. Documentation at https://sites.google.com/view/nsfworkshop.