Evidence (2954 claims)
- Adoption: 5126 claims
- Productivity: 4409 claims
- Governance: 4049 claims
- Human-AI Collaboration: 2954 claims
- Labor Markets: 2432 claims
- Org Design: 2273 claims
- Innovation: 2215 claims
- Skills & Training: 1902 claims
- Inequality: 1286 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 369 | 105 | 58 | 432 | 972 |
| Governance & Regulation | 365 | 171 | 113 | 54 | 713 |
| Research Productivity | 229 | 95 | 33 | 294 | 655 |
| Organizational Efficiency | 354 | 82 | 58 | 34 | 531 |
| Technology Adoption Rate | 277 | 115 | 63 | 27 | 486 |
| Firm Productivity | 273 | 33 | 68 | 10 | 389 |
| AI Safety & Ethics | 112 | 177 | 43 | 24 | 358 |
| Output Quality | 228 | 61 | 23 | 25 | 337 |
| Market Structure | 105 | 118 | 81 | 14 | 323 |
| Decision Quality | 154 | 68 | 33 | 17 | 275 |
| Employment Level | 68 | 32 | 74 | 8 | 184 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 85 | 31 | 38 | 9 | 163 |
| Firm Revenue | 96 | 30 | 22 | — | 148 |
| Innovation Output | 100 | 11 | 20 | 11 | 143 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 51 | 61 | 13 | 3 | 128 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 64 | 6 | 28 | 6 | 104 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 10 | 16 | 93 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 71 | 5 | 3 | 1 | 80 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Team Performance | 41 | 8 | 15 | 7 | 72 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 17 | 15 | 9 | 5 | 46 |
| Job Displacement | 5 | 28 | 12 | — | 45 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Developer Productivity | 25 | 1 | 2 | 1 | 29 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 7 | 4 | 9 | — | 20 |
Claims filtered to: Human-AI Collaboration
Generative AI clinical decision support (GenAI CDS) can improve diagnostic and treatment suggestions through synthesis of patient data and medical knowledge, reducing missed diagnoses and standardizing care where evidence is clear.
Early evaluations reported in the paper: controlled tasks, simulated patient vignettes, retrospective validation comparing model outputs to historical chart-verified diagnoses or guideline-concordant actions; no large-scale RCTs cited and sample sizes for cited studies are not specified in the paper.
Researchers should develop benchmark datasets and validated simulation testbeds (industry‑anonymized) to enable reproducible economic analysis.
Explicit research recommendation in the paper's implications and research agenda section.
Simulations that incorporate government policy constraints can inform industrial policy, subsidies, regulation aimed at supply‑chain resilience, and quantify environmental externalities relevant to circular economy measures.
Policy‑relevance arguments and recommendations in the paper; conceptual claim without empirical policy evaluation.
Digital twins and real‑time analytics can make simulations dynamic, enabling economic evaluation of shock scenarios and policy interventions.
Conceptual argument and forward‑looking recommendations in the paper; no empirical test of digital twin implementations provided.
AI/ML methods (including reinforcement learning, optimization, and causal methods) can be used to calibrate and validate simulation models against firm‑level and operational data.
Recommendations and discussion in the paper's implications section; conceptual suggestion rather than demonstrated implementation.
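To make this calibration recommendation concrete, here is a minimal method-of-simulated-moments sketch: a toy simulator's parameters are tuned so its summary moments match observed firm-level moments. The simulator, parameter names, and target values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Target moments from (hypothetical) firm-level data: mean utilization, mean cycle time.
observed_moments = np.array([0.62, 1.35])

def simulate_supply_chain(params, n_periods=500, seed=0):
    """Toy stand-in for a supply-chain simulator; returns summary moments."""
    rng = np.random.default_rng(seed)
    capacity, lead_time = params
    lead_time = max(lead_time, 0.1)  # keep the gamma shape parameter valid during the search
    utilization = np.clip(rng.normal(0.6 * capacity, 0.05, n_periods), 0, 1)
    cycle_time = rng.gamma(shape=lead_time, scale=1.0, size=n_periods)
    return np.array([utilization.mean(), cycle_time.mean()])

def moment_distance(params):
    """Squared distance between simulated and observed moments."""
    return float(np.sum((simulate_supply_chain(params) - observed_moments) ** 2))

result = minimize(moment_distance, x0=[1.0, 1.0], method="Nelder-Mead")
print("calibrated parameters:", result.x)
```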
Integration should start from the outsourcing decision: outsourcing choices are treated as a primary lever for supply‑chain integration and closed‑loop operations.
Argument and framing in the paper's conceptual framework and roadmap; based on literature synthesis rather than empirical estimation.
Policy levers such as privacy-preserving markets for personalization data (data trusts, opt-in marketplaces) and regulation of algorithmic constraints (fairness mandates, right-to-explanation) are viable approaches to manage risks from RS-enabled robots.
Policy recommendations drawing on regulatory and market-design literature; conceptual proposals not empirically evaluated in this work.
RS-enabled personalization creates opportunities for platformization of social-robot services, producing data network effects, lock-in, and cross-selling possibilities for firms.
Market-structure analysis and economic theory applied to RS-enabled services; no empirical market data provided.
Ethical constraints can and should be treated as first-class inputs to the ranking/selection process (e.g., safety filters, fairness constraints) to ensure value alignment in robots.
Conceptual design recommendation grounded in constrained optimization literature; no empirical demonstrations provided.
RS modules (user model, ranking engine, evaluator) can be modular and plug-and-play in existing robot architectures, augmenting LLMs and RL modules.
Design proposal mapping RS components to robot pipeline stages; no integration experiments reported.
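A minimal sketch of what that plug-and-play decomposition could look like, assuming the three components the paper names (user model, ranking engine, evaluator); the Python protocols and stub implementations below are our illustration, not the paper's code.

```python
from typing import Protocol, Sequence

class UserModel(Protocol):
    def preference(self, user_id: str, action: str) -> float: ...

class RankingEngine(Protocol):
    def rank(self, user_id: str, candidates: Sequence[str]) -> list[str]: ...

class Evaluator(Protocol):
    def log_outcome(self, user_id: str, action: str, reward: float) -> None: ...

class ScoreRanker:
    """Ranking engine that orders candidate robot behaviors by user-model preference scores."""
    def __init__(self, user_model: UserModel):
        self.user_model = user_model

    def rank(self, user_id: str, candidates: Sequence[str]) -> list[str]:
        return sorted(candidates, key=lambda a: self.user_model.preference(user_id, a), reverse=True)

class StaticUserModel:
    """Stub user model with fixed per-action scores (would be learned in practice)."""
    def __init__(self, scores: dict[str, float]):
        self.scores = scores

    def preference(self, user_id: str, action: str) -> float:
        return self.scores.get(action, 0.0)

ranker = ScoreRanker(StaticUserModel({"tell_joke": 0.8, "suggest_walk": 0.5}))
print(ranker.rank("u1", ["suggest_walk", "tell_joke", "play_music"]))
```

Because the components only touch each other through these narrow interfaces, a learned user model or an RL-based ranker could be swapped in without changing the rest of the robot pipeline.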
Interpretability, fairness, and privacy-preserving methods (e.g., explainable recommendations, differential privacy, fairness-aware algorithms) are applicable and important for social-robot personalization.
Survey of algorithmic approaches in RS and privacy/fairness literature; conceptual recommendation without empirical application in robots.
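Of the privacy techniques named, differential privacy is the most mechanical to illustrate. Below is a minimal Laplace-mechanism sketch for releasing a bounded engagement statistic; it is a generic textbook construction, not an application from the paper.

```python
import numpy as np

def dp_mean(values, epsilon, lower=0.0, upper=1.0, rng=np.random.default_rng(0)):
    """Differentially private mean of bounded data via the Laplace mechanism."""
    values = np.clip(values, lower, upper)       # enforce the assumed bounds
    sensitivity = (upper - lower) / len(values)  # L1 sensitivity of the mean
    return values.mean() + rng.laplace(scale=sensitivity / epsilon)

engagement = np.array([0.7, 0.4, 0.9, 0.6])      # per-session engagement scores (illustrative)
print(dp_mean(engagement, epsilon=1.0))
```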
Optimizing for diversity, novelty, and serendipity in recommendations can help avoid echo chambers and repetitive interactions with social robots.
Argument based on RS objectives and prior RS findings about diversity/serendipity; no robot-specific empirical evidence provided.
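One standard way to operationalize that objective is maximal marginal relevance (MMR) re-ranking, which greedily trades relevance against similarity to items already selected. The sketch below is a generic illustration with made-up scores, not a method evaluated in the paper.

```python
def mmr_rerank(candidates, relevance, similarity, lam=0.7, k=3):
    """Greedy MMR: pick items that are relevant but dissimilar to already-chosen ones."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def mmr_score(item):
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * max_sim
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

relevance = {"joke_a": 0.9, "joke_b": 0.85, "news_a": 0.6, "music_a": 0.55}
same_type = lambda a, b: 1.0 if a.split("_")[0] == b.split("_")[0] else 0.0
print(mmr_rerank(relevance, relevance, same_type))  # ['joke_a', 'news_a', 'music_a']: mixes content types
```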
Multi-objective and constrained optimization techniques from RS can be used to balance engagement, well-being, fairness, privacy, and safety in social-robot behavior selection.
Conceptual proposal referencing multi-objective/constrained recommendation literature; no empirical tests within robots included.
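A minimal sketch of how such constrained multi-objective selection could look: hard safety constraints filter the candidate set first, then a weighted sum scalarizes the remaining objectives. The weights, scores, and action names are illustrative assumptions.

```python
WEIGHTS = {"engagement": 0.4, "wellbeing": 0.3, "fairness": 0.2, "privacy": 0.1}

candidates = [
    {"action": "suggest_walk", "engagement": 0.6, "wellbeing": 0.9, "fairness": 0.8, "privacy": 0.9, "safe": True},
    {"action": "share_user_data", "engagement": 0.9, "wellbeing": 0.4, "fairness": 0.7, "privacy": 0.1, "safe": False},
    {"action": "tell_joke", "engagement": 0.8, "wellbeing": 0.6, "fairness": 0.8, "privacy": 0.9, "safe": True},
]

def scalarized_score(c):
    """Weighted-sum scalarization of the soft objectives."""
    return sum(w * c[obj] for obj, w in WEIGHTS.items())

feasible = [c for c in candidates if c["safe"]]  # hard constraint: unsafe actions are dropped outright
best = max(feasible, key=scalarized_score)
print(best["action"], round(scalarized_score(best), 2))  # suggest_walk 0.76
```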
Latent-factor models, embeddings, and hierarchical user models from RS can be used to capture long- and short-term preferences in social robots' user models.
Methodological proposal drawing on RS modeling techniques; no experimental validation in robotic systems provided.
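As an illustration of the hierarchical-user-model idea (not the paper's implementation): a long-term latent embedding blended with a recency-weighted short-term session embedding, both scored against item embeddings. The dimensions, decay rate, and item names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
item_embeddings = {name: rng.normal(size=8) for name in ["joke", "news", "music", "walk"]}
long_term = rng.normal(size=8)  # would be learned offline from the full interaction history

def short_term(session_items, decay=0.6):
    """Exponentially decayed average of recent item embeddings (most recent weighted highest)."""
    weights = np.array([decay ** i for i in range(len(session_items))][::-1])
    vecs = np.stack([item_embeddings[i] for i in session_items])
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def score(item, session_items, alpha=0.5):
    """Blend long- and short-term preference vectors, then dot with the item embedding."""
    user_vec = alpha * long_term + (1 - alpha) * short_term(session_items)
    return float(user_vec @ item_embeddings[item])

session = ["news", "joke"]  # most recent interaction last
print({i: round(score(i, session), 2) for i in item_embeddings})
```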
Integrating recommender-system techniques across the robot pipeline (user modeling, ranking, contextualization, evaluation) can capture long-term, short-term, and fine-grained user preferences and enable proactive, ethically constrained action selection.
Conceptual framework and design proposal synthesizing recommender-systems (RS) and human–robot interaction (HRI) literature; no novel empirical experiments or sample size reported.
Artificial neural network (ANN) analysis ranks information barriers as the most important predictor of organizational inertia.
ANN feature-importance analysis reported in the paper that ranks predictors for inertia, identifying information barriers as the top predictor; methodological specifics (sample size, ANN parameters) are not provided in the abstract.
ANN analysis ranks functional values as the most important predictor of initial trust.
ANN feature-importance analysis reported in the paper that ranks predictors for initial trust, with functional values highest; method described as ANN-based relative importance ranking (details such as network architecture, training sample size, or validation metrics not reported in the abstract).
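The abstract does not specify how the relative-importance ranking was computed; one generic way to obtain such a ranking from a trained ANN is permutation importance, sketched below with synthetic stand-in data (variable names are assumptions).

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic predictors, e.g. functional / instrumental / social values
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.2, size=200)

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)

for name, value in sorted(zip(["functional", "instrumental", "social"], imp.importances_mean),
                          key=lambda t: -t[1]):
    print(f"{name}: {value:.3f}")  # ranking should recover functional > instrumental > social
```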
Human interaction, information, and norm barriers increase organizational inertia (resistance to change) toward GAICS.
Qualitative phase surfaced these barriers; quantitative validation showed statistically significant positive relationships between (a) need for human interaction barriers, (b) information barriers (lack of knowledge/clarity), and (c) norm barriers (cultural/social norms) and organizational inertia.
Functional and instrumental values increase initial trust in GAICS.
Mixed-methods evidence: qualitative exploratory phase identified functional and instrumental value as drivers; quantitative phase (inferential analysis) found positive, statistically significant effects of functional value (system usefulness/quality) and instrumental value (task-related benefits) on initial trust.
Based on findings and student-reported concerns, the authors recommend integrating explicit AI-literacy instruction to support critical and reflective use of Generative AI tools in education.
Authors' recommendation in discussion sections, motivated by observed heterogeneous effects, student concerns about accuracy and overreliance, and qualitative calls for guidance; recommendation not experimentally tested in this study.
Students reported that ChatGPT provided faster access to information, helped clarify concepts, and aided organization (e.g., outlining and summarizing).
Qualitative topic-based coding of open-ended survey responses from participating students (sample = 254 across six courses); thematic analysis identified benefits including speed, clarification, and organizational support.
There is a weak but statistically significant positive relationship between iterative engagement with ChatGPT (measured by number of edits to the tool's outputs) and better academic performance.
Correlational analysis between usage behavior (number of edits) and student scores reported as weak but significant; based on same experimental sample (N = 254) and usage logs/survey data.
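For readers who want the shape of this analysis, here is a sketch of the reported edits-vs-score correlation using synthetic data (the study's actual data are not reproduced here):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
edits = rng.poisson(lam=4, size=254)                       # number of edits to ChatGPT outputs
scores = 70 + 0.8 * edits + rng.normal(scale=8, size=254)  # weak positive relationship by construction

rho, p = spearmanr(edits, scores)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")            # expect weak but statistically significant
```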
The improvement from allowing ChatGPT use was statistically significant in specific courses (examples named: computer systems administration, informatics, childhood disorders).
Course-level analyses using GLM and non-parametric comparisons showing statistically significant treatment effects in some courses; sample drawn from the full N = 254 distributed across six courses (per-course Ns not specified in summary).
Allowing students to use ChatGPT on knowledge-based academic tasks led to generally higher scores compared with control groups restricted to non-GenAI resources.
Randomized/experimental assignment of students to treatment (allowed ChatGPT) vs control (no GenAI) across six courses at two institutions; overall sample N = 254; comparisons made using descriptive statistics, general linear model (GLM) controlling for covariates, and non-parametric tests.
Policy and platform design choices (e.g., provenance metadata, detection/disclosure of AI-generated content, monetization rule alignment) can reinforce or mitigate harms from GenAI-driven creator economies.
Policy recommendations and implications drawn from the qualitative findings across the 377-video sample and normative reasoning; not empirically tested.
For economic and policy analysis, researchers should estimate distributions of effects, account for dynamic adaptation/nonstationarity, pre-register plans, track model versions, and combine RCTs with longitudinal/observational/structural methods.
Implications and recommendations section synthesized from practitioner interviews (n=16) and authors' applied methodological reasoning.
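One concrete reading of "estimate distributions of effects": bootstrap the uplift rather than reporting a single point estimate. The sketch below uses synthetic outcomes under assumed treated/control groups, not data from the interviews.

```python
import numpy as np

rng = np.random.default_rng(3)
treated = rng.normal(loc=1.2, scale=1.0, size=120)  # e.g., task throughput with the AI tool (synthetic)
control = rng.normal(loc=1.0, scale=1.0, size=120)

# Resample each arm with replacement to get a distribution over the mean uplift.
boot = [rng.choice(treated, 120).mean() - rng.choice(control, 120).mean() for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"mean uplift ~ {np.mean(boot):.2f}, 95% bootstrap CI [{lo:.2f}, {hi:.2f}]")
```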
High-stakes deployment, governance, and safety decisions should not rely on single uplift RCTs; they require synthesis across studies, ongoing monitoring, scenario analysis, and explicit uncertainty characterization.
Authors' recommendations drawn from thematic analysis of interview data (n=16) and the mapped validity consequences; policy implications section articulates this guidance.
Scaffold choice creates an economic opportunity for third-party tooling and open-source scaffolding, because the choice of scaffold materially affects performance and reproducibility.
Observed performance differences across scaffolds (up to ~5 percentage points) and sensitivity of results to scaffold selection reported in the study.
NFD increases complementarities between domain experts and AI, raising demand for hybrid roles (expert + knowledge engineer) and skills in elicitation, verification, and artifact design.
Conceptual argument in implications section, supported by practical demands observed in the case study (coordination between analysts and knowledge engineering activities).
The case study produced modular knowledge artifacts (rules, templates, tests) that supported reuse and auditability.
Empirical artifact production in the case study: creation of templates, checklists, heuristics, and test suites; reuse counts and audit traces were tracked qualitatively and with reuse metrics (exact numbers not specified).
In the same case study, iterative crystallization increased the consistency/reliability of agent outputs.
Case study measurements of agent reliability and qualitative practitioner feedback/acceptance across development spirals; precise quantitative details and sample size are not reported.
In a detailed case study building a U.S. equity financial research agent, iterative crystallization reduced per-task human effort.
Case study with iterative co-development with financial analysts; interaction transcripts logged and operational metrics (time per analysis) reported across development spirals. The paper does not report sample size or statistical tests.
Annotator affective traits shift labeling propensity (toward positivity); classifiers trained on pooled annotator labels may inherit systematic biases from annotator heterogeneity.
Observed associations between trait mood/reactivity and increased positive labeling in GEE models; extrapolated implication for classifier training when using pooled labels from heterogeneous annotators.
Trait-level mood and emotional reactivity weakly predict a higher tendency to label statements as positive (and fewer as neutral).
Statement-level repeated-measures generalized estimating equations (GEE) using the 81 participants' repeated labels of 30 statements per round; trait mood and reactivity variables were significant predictors in GEE models for positive vs neutral labeling, but with small effect sizes.
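A sketch of the kind of GEE specification described (statement-level labels clustered by participant), using statsmodels and synthetic data; the variable names and effect sizes are assumptions chosen to mirror the reported design (81 participants x 30 statements).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_participants, n_statements = 81, 30
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_statements),
    "trait_mood": np.repeat(rng.normal(size=n_participants), n_statements),
    "reactivity": np.repeat(rng.normal(size=n_participants), n_statements),
})
logit = 0.2 * df["trait_mood"] + 0.15 * df["reactivity"]  # small effects, as reported
df["positive"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Binomial GEE with exchangeable within-participant correlation.
model = sm.GEE.from_formula(
    "positive ~ trait_mood + reactivity",
    groups="participant",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
print(model.fit().summary())
```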
CBCTRepD improves report structure, reduces omissions, and promotes more systematic attention to co-existing lesions across anatomical regions in cone-beam computed tomography (CBCT) reports.
Clinical evaluation findings reported in the paper indicate improvements in structure, reduced omissions, and increased attention to multi-region co-existing lesions when using the system. (Operational definitions of 'structure', how omissions were identified, and measurement methods are not detailed in the provided text.)
Senior radiologists using CBCTRepD produce collaborative reports with reduced omission-related errors, including fewer clinically important missed lesions.
Clinician-centered assessment described in the evaluation; paper reports reductions in omission-related errors and clinically important missed lesions for seniors when using the system. (The provided summary does not list the number of senior reviewers, counts of omissions before/after, or statistical testing.)
In the same co-authoring workflow, intermediate radiologists improve their report quality toward senior-level performance when assisted by CBCTRepD.
Paper reports comparative analyses across experience levels and states intermediates approached senior quality with AI assistance. (Exact metrics, reviewer counts, and quantitative effect sizes are not specified in the provided text.)
When used in a radiologist–AI co-authoring workflow, CBCTRepD consistently improves report quality for novice radiologists, bringing their reports toward intermediate-level quality.
Collaborative evaluation reported in the paper comparing radiologist-edited AI drafts across experience tiers; authors state novices improved toward intermediate-level reporting when using the system. (Details such as number of novice readers, magnitude of improvement, and statistical significance are not provided in the summary.)
Under a multi-level clinical evaluation (automatic metrics plus radiologist/clinician review), raw AI-generated draft reports from CBCTRepD achieve writing quality and standardization comparable to intermediate radiologists.
Evaluation described as multi-level and clinically grounded, combining automatic text/clinical metrics and radiologist/clinician review; the paper reports a comparison between AI drafts and radiologists stratified by experience (novice, intermediate, senior). (Specific sample sizes of reviewers, statistical tests, and numerical effect sizes are not provided in the supplied summary.)
Lowering fixed costs via shared resources can enable more entrants and niche innovators (e.g., specialized clinical apps).
Workshop economic implications and participant assertions in breakout sessions and plenary at the NSF workshop (Sept 26–27, 2024).
Public investment in shared data and compute as nonrival public goods will reduce duplication, lower entry barriers, and increase total R&D productivity.
Workshop implications for AI economics articulated by participants and authors as a policy recommendation; rationale stated in the summary document (NSF workshop, Sept 26–27, 2024).
De-risk pathways from lab to clinic via reproducible benchmarks, continuous monitoring, and cross-sector collaborations (academia, industry, clinicians, regulators).
Workshop translation-focused recommendations and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
Enable safe, accountable, and resilient platforms (including virtual–physical healthcare ecosystems) to reduce translational risk.
Workshop recommendations addressing safety, resilience, and virtual–physical ecosystems from cross-disciplinary discussion at NSF workshop (Sept 26–27, 2024).
Promote scalable validation ecosystems grounded in objective, continuous measures and physics-informed models.
Workshop validation and safety theme recommendations from panels and consensus-building exercises (NSF workshop, Sept 26–27, 2024).
Develop clinic workflow–aware systems and human–AI collaboration frameworks to fit real clinical practice and decision chains.
Stated systems and workflows recommendation from expert panels and clinician participants at the NSF workshop (Sept 26–27, 2024).
Build shared compute infrastructures tailored to medical workloads and validation needs.
Workshop recommendation from infrastructure-themed sessions and consensus outcomes (NSF workshop, Sept 26–27, 2024).
Sustain investment in shared, standardized data infrastructures (datasets, ontologies, benchmarks) to support medical algorithm–hardware co-design.
Workshop infrastructure call presented during breakout sessions and final recommendations at the NSF workshop (Sept 26–27, 2024).
Principal recommendation: shift from isolated algorithm or hardware efforts to integrated algorithm–hardware–workflow co-design for medical contexts.
Stated workshop recommendation derived from panels and cross-disciplinary consensus at the NSF workshop (Sept 26–27, 2024).
Sustained public investment and new validation, governance, and translation ecosystems are needed to de-risk commercialization and accelerate safe, accountable clinical adoption.
Workshop principal recommendation based on qualitative synthesis of expert judgment from participants and breakout outcomes (NSF workshop, Sept 26–27, 2024).
Enabling next-generation medical technologies requires a fundamental reorientation toward algorithm–hardware co-design that is clinic-aware, validated continuously, and backed by shared data and compute infrastructures.
Consensus recommendation from a two-day NSF workshop (Sept 26–27, 2024) in Pittsburgh convening interdisciplinary participants (academic researchers in algorithms and hardware, clinicians, industry leaders). Methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building. Documentation at https://sites.google.com/view/nsfworkshop.