The Commonplace

Evidence (5157 claims)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
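The matrix above can be tallied mechanically from claim records. A minimal sketch, assuming each claim is stored as an (outcome, direction) pair; the field names and sample data here are illustrative, not drawn from the underlying database:

```python
from collections import defaultdict

def build_evidence_matrix(claims):
    """Tally claim counts by outcome category and direction of finding.

    `claims` is an iterable of (outcome, direction) pairs; directions are
    expected to be 'positive', 'negative', 'mixed', or 'null'.
    """
    matrix = defaultdict(
        lambda: {"positive": 0, "negative": 0, "mixed": 0, "null": 0, "total": 0}
    )
    for outcome, direction in claims:
        row = matrix[outcome]
        row[direction] += 1
        row["total"] += 1
    return dict(matrix)

# Hypothetical sample records for illustration only.
claims = [
    ("Error Rate", "positive"),
    ("Error Rate", "negative"),
    ("Error Rate", "negative"),
    ("Developer Productivity", "positive"),
]
matrix = build_evidence_matrix(claims)
print(matrix["Error Rate"])
# → {'positive': 1, 'negative': 2, 'mixed': 0, 'null': 0, 'total': 3}
```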
Active filter: Human-AI Collaboration
GenAI models enable personalization (tailored care pathways and risk predictions) by integrating multimodal data (notes, imaging, labs).
Technical capability demonstrated in model development literature and small-scale studies using multimodal inputs; the paper notes limited real-world longitudinal evidence of clinical outcome improvements from such personalization.
medium positive GenAI and clinical decision making in general practice individualized risk predictions; guideline-concordant personalized care; predict...
GenAI CDS can extend access to expertise in low-resource settings by supporting non-specialists or overburdened clinicians.
The paper cites the potential based on the capability of decision-support systems and early pilot evaluations; empirical real-world evidence and large-scale trials in low-resource settings are limited or not cited.
medium positive GenAI and clinical decision making in general practice access to specialist-level recommendations; capacity (patients served); referral...
GenAI CDS can save clinician time (faster charting, literature summarization, guideline retrieval), potentially increasing capacity and access.
Reported process findings from early studies and human-AI interaction evaluations (qualitative and quantitative) and retrospective workflow analyses; specific sample sizes and effect magnitudes are not provided in the paper.
medium positive GenAI and clinical decision making in general practice clinician time per patient; documentation time; time-to-task completion
Generative AI clinical decision support (GenAI CDS) can improve diagnostic and treatment suggestions through synthesis of patient data and medical knowledge, reducing missed diagnoses and standardizing care where evidence is clear.
Early evaluations reported in the paper: controlled tasks, simulated patient vignettes, retrospective validation comparing model outputs to historical chart-verified diagnoses or guideline-concordant actions; no large-scale RCTs cited and sample sizes for cited studies are not specified in the paper.
medium positive GenAI and clinical decision making in general practice diagnostic accuracy; guideline concordance; missed-diagnoses rate; treatment qua...
Researchers should develop benchmark datasets and validated simulation testbeds (industry‑anonymized) to enable reproducible economic analysis.
Explicit research recommendation in the paper's implications and research agenda section.
medium positive A Review of Manufacturing Operations Research Integration in... availability of benchmark datasets/testbeds and reproducibility of simulation st...
Simulations that incorporate government policy constraints can inform industrial policy, subsidies, regulation aimed at supply‑chain resilience, and quantify environmental externalities relevant to circular economy measures.
Policy‑relevance arguments and recommendations in the paper; conceptual claim without empirical policy evaluation.
medium positive A Review of Manufacturing Operations Research Integration in... policy insights, measured environmental externalities, policy‑relevant indicator...
Digital twins and real‑time analytics can make simulations dynamic, enabling economic evaluation of shock scenarios and policy interventions.
Conceptual argument and forward‑looking recommendations in the paper; no empirical test of digital twin implementations provided.
medium positive A Review of Manufacturing Operations Research Integration in... dynamic simulation capability and ability to evaluate shocks/policy intervention...
AI/ML methods (including reinforcement learning, optimization, and causal methods) can be used to calibrate and validate simulation models against firm‑level and operational data.
Recommendations and discussion in the paper's implications section; conceptual suggestion rather than demonstrated implementation.
medium positive A Review of Manufacturing Operations Research Integration in... accuracy and validity of model calibration and validation using AI/ML
Integration should start from the outsourcing decision: outsourcing choices are treated as a primary lever for supply‑chain integration and closed‑loop operations.
Argument and framing in the paper's conceptual framework and roadmap; based on literature synthesis rather than empirical estimation.
medium positive A Review of Manufacturing Operations Research Integration in... impact of outsourcing decisions on supply‑chain integration and closed‑loop oper...
Policy levers such as privacy-preserving markets for personalization data (data trusts, opt-in marketplaces) and regulation of algorithmic constraints (fairness mandates, right-to-explanation) are viable approaches to manage risks from RS-enabled robots.
Policy recommendations drawing on regulatory and market-design literature; conceptual proposals not empirically evaluated in this work.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... policy adoption, privacy outcomes, fairness compliance, data-sharing incentives
RS-enabled personalization creates opportunities for platformization of social-robot services, producing data network effects, lock-in, and cross-selling possibilities for firms.
Market-structure analysis and economic theory applied to RS-enabled services; no empirical market data provided.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... platform market power indicators (market concentration), network-effect measures...
Ethical constraints can and should be treated as first-class inputs to the ranking/selection process (e.g., safety filters, fairness constraints) to ensure value alignment in robots.
Conceptual design recommendation grounded in constrained optimization literature; no empirical demonstrations provided.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... constraint satisfaction rates (safety/fairness), reduction in ethically problema...
RS modules (user model, ranking engine, evaluator) can be modular and plug-and-play in existing robot architectures, augmenting LLMs and RL modules.
Design proposal mapping RS components to robot pipeline stages; no integration experiments reported.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... integration feasibility, modularity (development time, interface compatibility),...
Interpretability, fairness, and privacy-preserving methods (e.g., explainable recommendations, differential privacy, fairness-aware algorithms) are applicable and important for social-robot personalization.
Survey of algorithmic approaches in RS and privacy/fairness literature; conceptual recommendation without empirical application in robots.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... interpretability scores, privacy guarantees (e.g., DP epsilon), fairness metrics
Optimizing for diversity, novelty, and serendipity in recommendations can help avoid echo chambers and repetitive interactions with social robots.
Argument based on RS objectives and prior RS findings about diversity/serendipity; no robot-specific empirical evidence provided.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... diversity/novelty metrics, reduction in repetitive interaction measures, user sa...
Multi-objective and constrained optimization techniques from RS can be used to balance engagement, well-being, fairness, privacy, and safety in social-robot behavior selection.
Conceptual proposal referencing multi-objective/constrained recommendation literature; no empirical tests within robots included.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... multi-objective trade-offs (metrics for engagement vs well-being, fairness const...
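The balancing act this claim describes can be made concrete. A minimal sketch of constrained multi-objective action selection: hard constraints filter candidates, then a weighted sum ranks the rest. The objective names, weights, and thresholds below are invented for illustration and do not come from the paper:

```python
def select_action(candidates, weights, constraints):
    """Pick the highest-scoring candidate action that satisfies every hard
    constraint; soft objectives are combined as a weighted sum."""
    feasible = [
        c for c in candidates
        if all(c.get(k, 0.0) >= v for k, v in constraints.items())
    ]
    if not feasible:
        return None  # defer to a safe fallback behaviour
    return max(
        feasible,
        key=lambda c: sum(w * c.get(k, 0.0) for k, w in weights.items()),
    )

# Hypothetical robot behaviours scored on engagement, well-being, and safety.
candidates = [
    {"name": "joke", "engagement": 0.9, "wellbeing": 0.4, "safety": 1.0},
    {"name": "check_in", "engagement": 0.6, "wellbeing": 0.8, "safety": 1.0},
    {"name": "prank", "engagement": 0.95, "wellbeing": 0.2, "safety": 0.3},
]
best = select_action(
    candidates,
    weights={"engagement": 0.5, "wellbeing": 0.5},
    constraints={"safety": 0.9, "wellbeing": 0.3},
)
print(best["name"])  # → check_in
```

Note the design choice: the unsafe but most engaging candidate ("prank") is removed before scoring, so engagement can never buy its way past a safety constraint.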
Latent-factor models, embeddings, and hierarchical user models from RS can be used to capture long- and short-term preferences in social robots' user models.
Methodological proposal drawing on RS modeling techniques; no experimental validation in robotic systems provided.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... fidelity of user preference representation (e.g., embedding quality, predictive ...
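One common way to operationalize the long-/short-term split in such user models is a convex blend of two preference embeddings. A sketch under that assumption; the 2-d latent space and the weighting scheme are illustrative, not from the paper:

```python
def blend_preferences(long_term, short_term, alpha=0.75):
    """Convex combination of a stable long-term preference embedding and a
    session-level short-term embedding; alpha weights the long-term part."""
    return [alpha * l + (1 - alpha) * s for l, s in zip(long_term, short_term)]

def affinity(item_embedding, user_vector):
    """Dot-product score between an item embedding and the user vector."""
    return sum(i * u for i, u in zip(item_embedding, user_vector))

# Toy 2-d latent space: dimension 0 = "games", dimension 1 = "music".
user = blend_preferences(long_term=[1.0, 0.0], short_term=[0.0, 1.0])
print(affinity([1.0, 0.0], user))  # → 0.75 (long-term interest dominates)
print(affinity([0.0, 1.0], user))  # → 0.25 (recent session interest)
```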
Integrating recommender-system techniques across the robot pipeline (user modeling, ranking, contextualization, evaluation) can capture long-term, short-term, and fine-grained user preferences and enable proactive, ethically constrained action selection.
Conceptual framework and design proposal synthesizing recommender-systems (RS) and human–robot interaction (HRI) literature; no novel empirical experiments or sample size reported.
medium positive Reimagining Social Robots as Recommender Systems: Foundation... personalization quality (long-term consistency, short-term responsiveness), abil...
Artificial neural network (ANN) analysis ranks information barriers as the most important predictor of organizational inertia.
ANN feature-importance analysis reported in the paper that ranks predictors for inertia, identifying information barriers as the top predictor; methodological specifics (sample size, ANN parameters) are not provided in the abstract.
ANN analysis ranks functional values as the most important predictor of initial trust.
ANN feature-importance analysis reported in the paper that ranks predictors for initial trust, with functional values highest; method described as ANN-based relative importance ranking (details such as network architecture, training sample size, or validation metrics not reported in the abstract).
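The abstracts do not say how the ANN relative-importance ranking was computed. A common model-agnostic stand-in is permutation importance: shuffle one feature at a time and measure the accuracy drop. A sketch on a toy predictor; the data and model here are invented for illustration:

```python
import random

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Rank features by how much accuracy drops when each feature column
    is shuffled, breaking its relationship with the target."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = {}
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - accuracy(shuffled))
        importances[j] = sum(drops) / n_repeats
    return importances

# Toy data: feature 0 determines the label, feature 1 is noise.
X = [[0, 9], [1, 4], [0, 7], [1, 1], [0, 3], [1, 8], [0, 2], [1, 6]]
y = [row[0] for row in X]
imp = permutation_importance(lambda r: r[0], X, y)
# Shuffling the informative feature hurts accuracy; the noise feature does not.
```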
Human interaction, information, and norm barriers increase organizational inertia (resistance to change) toward GAICS.
Qualitative phase surfaced these barriers; quantitative validation showed statistically significant positive relationships between (a) need for human interaction barriers, (b) information barriers (lack of knowledge/clarity), and (c) norm barriers (cultural/social norms) and organizational inertia.
medium positive Reimagining Stakeholder Engagement Through Generative AI: A ... Organizational inertia / resistance to change regarding GAICS
Functional and instrumental values increase initial trust in GAICS.
Mixed-methods evidence: qualitative exploratory phase identified functional and instrumental value as drivers; quantitative phase (inferential analysis) found positive, statistically significant effects of functional value (system usefulness/quality) and instrumental value (task-related benefits) on initial trust.
Based on findings and student-reported concerns, the authors recommend integrating explicit AI-literacy instruction to support critical and reflective use of Generative AI tools in education.
Authors' recommendation in discussion sections, motivated by observed heterogeneous effects, student concerns about accuracy and overreliance, and qualitative calls for guidance; recommendation not experimentally tested in this study.
medium positive Expanding the lens: multi-institutional evidence on student ... recommendation for AI-literacy instruction (policy/educational intervention)
Students reported that ChatGPT provided faster access to information, helped clarify concepts, and aided organization (e.g., outlining and summarizing).
Qualitative topic-based coding of open-ended survey responses from participating students (sample = 254 across six courses); thematic analysis identified benefits including speed, clarification, and organizational support.
medium positive Expanding the lens: multi-institutional evidence on student ... student-reported perceived usefulness/benefits
There is a weak but statistically significant positive relationship between iterative engagement with ChatGPT (measured by number of edits to the tool's outputs) and better academic performance.
Correlational analysis between usage behavior (number of edits) and student scores reported as weak but significant; based on same experimental sample (N = 254) and usage logs/survey data.
medium positive Expanding the lens: multi-institutional evidence on student ... student task/course scores (correlated with number of edits)
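The analysis behind this claim is a simple correlation between an edit count and a score. For reference, Pearson's r from first principles; the (edits, score) pairs below are fabricated for illustration, not the study's data:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical (edits, score) pairs showing a positive association.
edits = [0, 1, 1, 2, 3, 5, 8, 10]
scores = [62, 70, 58, 75, 66, 80, 72, 78]
r = pearson_r(edits, scores)
```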
The improvement from allowing ChatGPT use was statistically significant in specific courses (examples named: computer systems administration, informatics, childhood disorders).
Course-level analyses using GLM and non-parametric comparisons showing statistically significant treatment effects in some courses; sample drawn from the full N = 254 distributed across six courses (per-course Ns not specified in summary).
medium positive Expanding the lens: multi-institutional evidence on student ... course/task scores within specified courses
Allowing students to use ChatGPT on knowledge-based academic tasks led to generally higher scores compared with control groups restricted to non-GenAI resources.
Randomized/experimental assignment of students to treatment (allowed ChatGPT) vs control (no GenAI) across six courses at two institutions; overall sample N = 254; comparisons made using descriptive statistics, general linear model (GLM) controlling for covariates, and non-parametric tests.
medium positive Expanding the lens: multi-institutional evidence on student ... student task/course scores (short-term performance on knowledge-based tasks)
Policy and platform design choices (e.g., provenance metadata, detection/disclosure of AI-generated content, monetization rule alignment) can reinforce or mitigate harms from GenAI-driven creator economies.
Policy recommendations and implications drawn from the qualitative findings across the 377-video sample and normative reasoning; not empirically tested.
medium positive Monetizing Generative AI: YouTubers' Collective Knowledge on... potential mitigation or amplification of harms via platform and policy intervent...
For economic and policy analysis, researchers should estimate distributions of effects, account for dynamic adaptation/nonstationarity, pre-register plans, track model versions, and combine RCTs with longitudinal/observational/structural methods.
Implications and recommendations section synthesized from practitioner interviews (n=16) and authors' applied methodological reasoning.
medium positive RCTs & Human Uplift Studies: Methodological Challenges and P... recommended research practices for economically meaningful inference about AI up...
High-stakes deployment, governance, and safety decisions should not rely on single uplift RCTs; they require synthesis across studies, ongoing monitoring, scenario analysis, and explicit uncertainty characterization.
Authors' recommendations drawn from thematic analysis of interview data (n=16) and the mapped validity consequences; policy implications section articulates this guidance.
medium positive RCTs & Human Uplift Studies: Methodological Challenges and P... reliability of decision-making based on uplift evidence
Scaffold choice creates an economic opportunity for third-party tooling and open-source scaffolding because scaffold effects materially affect performance and reproducibility.
Observed performance differences across scaffolds (up to ~5 percentage points) and sensitivity of results to scaffold selection reported in the study.
medium positive Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... market_opportunity_for_scaffold_tools (qualitative_based_on_performance_impact)
NFD increases complementarities between domain experts and AI, raising demand for hybrid roles (expert + knowledge engineer) and skills in elicitation, verification, and artifact design.
Conceptual argument in implications section, supported by practical demands observed in the case study (coordination between analysts and knowledge engineering activities).
medium positive Nurture-First Agent Development: Building Domain-Expert AI A... demand for hybrid roles; number of hybrid role hires or time spent on elicitatio...
The case study produced modular knowledge artifacts (rules, templates, tests) that supported reuse and auditability.
Empirical artifact production in the case study: creation of templates, checklists, heuristics, and test suites; reuse counts and audit traces were tracked qualitatively and with reuse metrics (exact numbers not specified).
medium positive Nurture-First Agent Development: Building Domain-Expert AI A... number and reuse rate of modular artifacts; presence of audit trails
In the same case study, iterative crystallization increased the consistency/reliability of agent outputs.
Case study measurements of agent reliability and qualitative practitioner feedback/acceptance across development spirals; precise quantitative details and sample size are not reported.
medium positive Nurture-First Agent Development: Building Domain-Expert AI A... consistency/reliability of outputs (agent output variance, agreement with practi...
In a detailed case study building a U.S. equity financial research agent, iterative crystallization reduced per-task human effort.
Case study with iterative co-development with financial analysts; interaction transcripts logged and operational metrics (time per analysis) reported across development spirals. The paper does not report sample size or statistical tests.
medium positive Nurture-First Agent Development: Building Domain-Expert AI A... analyst time per analysis (human effort per task)
Annotator affective traits shift labeling propensity (toward positivity); classifiers trained on pooled annotator labels may inherit systematic biases from annotator heterogeneity.
Observed associations between trait mood/reactivity and increased positive labeling in GEE models; extrapolated implication for classifier training when using pooled labels from heterogeneous annotators.
medium positive Exploring Indicators of Developers' Sentiment Perceptions in... systematic shift in aggregate labels (and therefore potential classifier outputs...
Trait-level mood and emotional reactivity weakly predict a higher tendency to label statements as positive (and fewer as neutral).
Statement-level repeated-measures generalized estimating equations (GEE) using the 81 participants' repeated labels of 30 statements per round; trait mood and reactivity variables were significant predictors in GEE models for positive vs neutral labeling, but with small effect sizes.
medium positive Exploring Indicators of Developers' Sentiment Perceptions in... probability of labeling a statement as positive (vs neutral)
CBCTRepD improves report structure, reduces omissions, and promotes more systematic attention to co-existing lesions across anatomical regions in CBCT reports.
Clinical evaluation findings reported in the paper indicate improvements in structure, reduced omissions, and increased attention to multi-region co-existing lesions when using the system. (Operational definitions of 'structure', how omissions were identified, and measurement methods are not detailed in the provided text.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Report structure, omission rate, and documentation of multi-region co-existing l...
Senior radiologists using CBCTRepD produce collaborative reports with reduced omission-related errors, including fewer clinically important missed lesions.
Clinician-centered assessment described in the evaluation; paper reports reductions in omission-related errors and clinically important missed lesions for seniors when using the system. (The provided summary does not list the number of senior reviewers, counts of omissions before/after, or statistical testing.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Omission-related errors and clinically important missed lesions in final reports...
In the same co-authoring workflow, intermediate radiologists improve their report quality toward senior-level performance when assisted by CBCTRepD.
Paper reports comparative analyses across experience levels and states intermediates approached senior quality with AI assistance. (Exact metrics, reviewer counts, and quantitative effect sizes are not specified in the provided text.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Final report quality for intermediate radiologists in a co-authoring workflow
When used in a radiologist–AI co-authoring workflow, CBCTRepD consistently improves report quality for novice radiologists, bringing their reports toward intermediate-level quality.
Collaborative evaluation reported in the paper comparing radiologist-edited AI drafts across experience tiers; authors state novices improved toward intermediate-level reporting when using the system. (Details such as number of novice readers, magnitude of improvement, and statistical significance are not provided in the summary.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Final report quality for novice radiologists in a co-authoring workflow
Under a multi-level clinical evaluation (automatic metrics plus radiologist/clinician review), raw AI-generated draft reports from CBCTRepD achieve writing quality and standardization comparable to intermediate radiologists.
Evaluation described as multi-level and clinically grounded, combining automatic text/clinical metrics and radiologist/clinician review; the paper reports a comparison between AI drafts and radiologists stratified by experience (novice, intermediate, senior). (Specific sample sizes of reviewers, statistical tests, and numerical effect sizes are not provided in the supplied summary.)
medium positive Bridging the Skill Gap in Clinical CBCT Interpretation with ... Writing quality and standardization of draft reports (AI drafts vs intermediate ...
Lowering fixed costs via shared resources can enable more entrants and niche innovators (e.g., specialized clinical apps).
Workshop economic implications and participant assertions in breakout sessions and plenary at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... number of market entrants, emergence of niche products, diversity of suppliers
Public investment in shared data and compute as nonrival public goods will reduce duplication, lower entry barriers, and increase total R&D productivity.
Workshop implications for AI economics articulated by participants and authors as a policy recommendation; rationale stated in the summary document (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... duplication of effort, entry barriers (number of entrants), and aggregate R&D pr...
De-risk pathways from lab to clinic via reproducible benchmarks, continuous monitoring, and cross-sector collaborations (academia, industry, clinicians, regulators).
Workshop translation-focused recommendations and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... time-to-market, reproducibility metrics, and rate of successful clinical transla...
Enable safe, accountable, and resilient platforms (including virtual–physical healthcare ecosystems) to reduce translational risk.
Workshop recommendations addressing safety, resilience, and virtual–physical ecosystems from cross-disciplinary discussion at NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... measures of translational risk (failure rates in translation, incidents, safety ...
Promote scalable validation ecosystems grounded in objective, continuous measures and physics-informed models.
Workshop validation and safety theme recommendations from panels and consensus-building exercises (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... presence and scalability of validation ecosystems; reliability/robustness metric...
Develop clinic workflow–aware systems and human–AI collaboration frameworks to fit real clinical practice and decision chains.
Stated systems and workflows recommendation from expert panels and clinician participants at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... compatibility of AI-enabled systems with clinical workflows; measures of clinici...
Build shared compute infrastructures tailored to medical workloads and validation needs.
Workshop recommendation from infrastructure-themed sessions and consensus outcomes (NSF workshop, Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... existence and utilization of shared compute infrastructure for medical R&D (comp...
Sustain investment in shared, standardized data infrastructures (datasets, ontologies, benchmarks) to support medical algorithm–hardware co-design.
Workshop infrastructure call presented during breakout sessions and final recommendations at the NSF workshop (Sept 26–27, 2024).
medium positive Report for NSF Workshop on Algorithm-Hardware Co-design for ... availability and use of standardized medical datasets/ontologies/benchmarks