Evidence (3103 claims; filtered to Human-AI Collaboration)
- Adoption: 5267 claims
- Productivity: 4560 claims
- Governance: 4137 claims
- Human-AI Collaboration: 3103 claims
- Labor Markets: 2506 claims
- Innovation: 2354 claims
- Org Design: 2340 claims
- Skills & Training: 1945 claims
- Inequality: 1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Human-AI Collaboration
Nurture-First Development (NFD) reframes agent creation from a one-time engineering task into a continuous, conversational growth process.
Conceptual formalization in the paper (architectural and operational descriptions). No large-scale empirical test reported; supported by theoretical argumentation and illustrative examples.
Findings are based on a student sample rating decontextualized messages, so external validity to industry communication or real project logs is uncertain and requires replication.
Study sample consisted of 81 students in team-based software projects labeling decontextualized statements; authors explicitly note this limitation as a caveat.
Many apparent correlations between predictors and sentiment labels do not remain significant after global multiple-testing correction.
Correlation analyses across many predictors with explicit application of multiple-testing correction procedures; many initial signals failed to survive correction.
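The global correction step described above can be sketched as follows. The p-values and the choice of the Benjamini-Hochberg (FDR) procedure are illustrative assumptions; the study's actual correction method and values are not given here.

```python
# Sketch of the multiple-testing issue: many raw p-values look "significant"
# at alpha = 0.05, but fewer survive a global correction. Stdlib only;
# p-values below are hypothetical.

def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected under BH false-discovery control."""
    m = len(p_values)
    # Sort p-values ascending, remembering original indices.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= (k/m) * alpha; reject hypotheses 1..k.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])

raw = [0.001, 0.012, 0.031, 0.044, 0.048, 0.21, 0.60]
naive = [i for i, p in enumerate(raw) if p <= 0.05]  # uncorrected "hits"
corrected = benjamini_hochberg(raw)                  # survivors after correction
print(len(naive), len(corrected))  # → 5 2
```

This mirrors the pattern reported above: five uncorrected signals, only two surviving the global procedure.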
The paper does not provide quantitative estimates of time saved per report, cost reductions, or effects on employment/wages; such economic impacts remain to be quantified.
Caveats noted in the paper: absence of quantitative estimates for time/cost/employment effects and a call for field trials and economic modeling. This is explicitly stated in the summary.
The paper used a clinically grounded, multi-level evaluation framework that separately assessed raw AI drafts (automatic metrics plus clinician review) and radiologist-AI collaborative final reports (how radiologists edit drafts, and the downstream clinical effects), with comparisons across radiologist experience levels.
Methodology section summarized in the paper: multi-level assessment covering AI drafts and radiologist-edited collaborative reports; combination of automatic metrics and radiologist-/clinician-centered evaluations; experience-level stratified analyses (novice/intermediate/senior).
CBCTRepD is a report-generation system trained on this curated paired dataset to produce bilingual CBCT radiology draft reports intended for radiologist-in-the-loop (co-authoring) workflows.
System description in the paper: CBCTRepD built using the curated dataset; authors state purpose is to generate clinically usable drafts for radiologist editing. (Model architecture and training hyperparameters are not specified in the provided text.)
The authors curated a bilingual, paired CBCT–report dataset of approximately 7,408 CBCT studies covering 55 oral and maxillofacial disease entities, drawn from diverse acquisition settings.
Data curation described in the paper: stated dataset size (~7,408 studies), coverage of 55 disease entities, bilingual reports, and inclusion of a range of acquisition settings to increase heterogeneity and clinical realism. (Exact languages, provenance of studies, and dataset split details are not specified in the provided text.)
The workshop identifies specific research directions for AI economics: cost–benefit and ROI analyses of shared infrastructure; market design for procurement of co-designed systems; models of innovation incentives under different IP/data-governance regimes; labor market impact assessments; and empirical studies of how validation ecosystems affect adoption rates and pricing.
Explicitly listed research directions in the workshop summary and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
The workshop's findings are based on qualitative synthesis of expert judgment and stakeholder inputs rather than primary empirical data or controlled experiments.
Explicitly stated in the Data & Methods section of the workshop summary; methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building.
The workshop convened researchers, clinicians, and industry leaders to address co-design across four thematic areas: teleoperations/telehealth/surgical operations; wearable and implantable medicine; home ICU/hospital systems/elderly care; and medical sensing/imaging/reconstruction.
Workshop agenda and participant list from the two-day NSF workshop (Sept 26–27, 2024); methods included thematic breakout sessions focused on these four areas. Documentation at https://sites.google.com/view/nsfworkshop.
Evaluation was performed on five different material setups.
Experimental evaluation described in the summary: performance reported as averaged across five material setups. The summary does not list per-setup names or trial counts.
The simulation models samples as collections of spheres with per-sphere procedurally generated dislodgement-force thresholds derived from Perlin noise to introduce spatial heterogeneity and diversity.
Simulation/modeling description in the paper: discrete-sphere representation of sample; each sphere assigned a dislodgement threshold; spatial variation produced via Perlin noise. This is a concrete modeling choice reported in the methods.
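A minimal sketch of this modeling choice, assuming a 2-D grid of spheres and substituting a simple hash-based value noise for Perlin noise (both produce a spatially smooth field); all names and parameter values (`build_sample`, `base_force`, the grid size) are hypothetical.

```python
# Sample modeled as a grid of spheres, each with a dislodgement-force
# threshold drawn from a smooth spatial noise field. Hash-based value noise
# stands in for Perlin noise here; all parameters are illustrative.
import math
import hashlib

def _lattice(ix, iy, seed=0):
    """Deterministic pseudo-random value in [0, 1) at an integer lattice point."""
    h = hashlib.md5(f"{seed}:{ix}:{iy}".encode()).digest()
    return int.from_bytes(h[:4], "big") / 2**32

def value_noise(x, y, seed=0):
    """Bilinearly interpolated lattice noise -> spatially smooth field in [0, 1)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    sx, sy = fx * fx * (3 - 2 * fx), fy * fy * (3 - 2 * fy)  # smoothstep
    top = (1 - sx) * _lattice(ix, iy, seed) + sx * _lattice(ix + 1, iy, seed)
    bot = (1 - sx) * _lattice(ix, iy + 1, seed) + sx * _lattice(ix + 1, iy + 1, seed)
    return (1 - sy) * top + sy * bot

def build_sample(nx=8, ny=8, base_force=1.0, spread=0.5, scale=0.3):
    """Spheres on a grid; per-sphere threshold = base + spread * noise in [-1, 1)."""
    return {
        (i, j): base_force + spread * (2 * value_noise(i * scale, j * scale) - 1)
        for i in range(nx) for j in range(ny)
    }

sample = build_sample()
# Applying a unit force dislodges every sphere whose threshold is below it.
dislodged = {pos for pos, thr in sample.items() if thr < 1.0}
```

Because neighboring spheres share lattice points, thresholds vary smoothly across the sample, giving the spatial heterogeneity the paper describes.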
The paper uses a mixed-methods approach combining a systematic literature review with an empirical practitioner survey to assess perceptions, adoption, and impact of AI-driven tools.
Methodological statement in the paper; survey design covers tool usage, perceived benefits, challenges, and expectations.
Empirical work (experiments and measurements) is needed to quantify how much value interpretive traces add to downstream outputs, how RATs affect platform incentives, and what governance frameworks fairly allocate resulting rents.
Concluding recommendation in the paper stating the research gaps; not an empirical claim but a stated need.
The current presentation of RATs is speculative and illustrative; empirical validation, scalability, and ethical safeguards remain to be developed.
Limitations section of the paper explicitly states the speculative nature and lack of empirical evaluation.
Implementation of RATs requires instrumentation at the browser/platform level or via plugins and must address privacy/consent, storage/ownership, sharing controls, and interoperable trace formats.
Design and implementation considerations enumerated in the paper; this is a requirements statement rather than an empirical claim.
Analytical approaches compatible with RATs include sequence/trajectory mining, network analysis of associations/co-read graphs, embedding/clustering of trajectories, qualitative inspection of reflections, and experimental (A/B or RCT) evaluation of downstream effects.
Methods section of the paper listing suggested analytical techniques; these are proposed methods rather than applied analyses.
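One of the listed techniques, building a co-read association graph from reading trajectories, can be sketched as below; the session data and document IDs are hypothetical, and edge weight is simply the number of sessions in which two documents co-occur.

```python
# Co-read graph from per-session reading trajectories: nodes are documents,
# edge weights count sessions in which two documents were read together.
from collections import Counter
from itertools import combinations

# Hypothetical RAT-style sessions: ordered lists of document IDs.
sessions = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_c"],
    ["doc_a", "doc_c", "doc_d"],
]

co_read = Counter()
for trace in sessions:
    # De-duplicate within a session, then count each unordered pair once.
    for pair in combinations(sorted(set(trace)), 2):
        co_read[pair] += 1

# Strongest association: documents most often read in the same session.
top_pair, weight = co_read.most_common(1)[0]
```

The same `co_read` structure feeds directly into network analysis (e.g. clustering or centrality on the weighted graph), one of the other techniques the paper lists.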
The authors recommend specific measurement metrics and empirical research priorities (e.g., MAPE, stockout frequency, inventory turns, lead times, fill rates, total supply chain cost, service-level volatility, resilience measures; causal studies like diff-in-diff or randomized interventions).
Explicit recommendations in the paper's measurement and research agenda sections.
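Two of the recommended metrics can be computed as sketched below; the demand, forecast, and shipment figures are invented for illustration.

```python
# MAPE for forecast accuracy and fill rate for service level, on
# hypothetical per-period demand data. Stdlib only.

def mape(actual, forecast):
    """Mean absolute percentage error over periods with nonzero demand."""
    terms = [abs(a - f) / a for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(terms) / len(terms)

def fill_rate(demand, shipped):
    """Fraction of demanded units shipped from stock."""
    return sum(min(d, s) for d, s in zip(demand, shipped)) / sum(demand)

actual   = [100, 120,  80, 150]
forecast = [ 90, 130, 100, 140]
demand   = [100, 120,  80, 150]
shipped  = [100, 110,  80, 150]
print(round(mape(actual, forecast), 1), round(fill_rate(demand, shipped), 3))
# → 12.5 0.978
```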
The study's small sample size and qualitative design limit external generalizability and prevent causal effect size estimation; potential selection and reporting biases exist due to purposive sampling and interview-based data.
Authors explicitly state these limitations in the paper's limitations section.
The study is a qualitative multi-case study of five medium-to-large organizations, using semi-structured interviews across procurement, production planning, inventory management, and distribution, analyzed via cross-case comparison.
Methods section description provided by the authors (sample size n = 5, sectors, interview-based primary data, cross-case analysis).
There is limited empirical causal evidence linking specific explanation types to long-term outcomes (safety, fairness, economic performance) in real-world deployments.
Meta-level finding of the review: authors report gaps in the literature—few causal or longitudinal studies of explanation interventions in deployed, high-stakes settings.
The literature groups explainability impacts along three linked dimensions — user trust, ethical governance, and organizational accountability.
Analytical result of the review's thematic coding and synthesis across interdisciplinary literature (categorization derived from the reviewed corpus).
The paper is primarily theoretical and prescriptive: it synthesizes literature and proposes a framework and design guidelines rather than reporting large-scale empirical datasets or causal identification of economic outcomes.
Meta-claim about the paper's methods explicitly stated in the Data & Methods summary; based on the paper's methodological description.
Key measurable outcomes to assess Human–AI teams include accuracy/efficiency, robustness to novel cases, decision consistency, trust/misuse rates, training costs, and inequity indicators.
Prescriptive list of metrics offered by the authors as part of the research agenda and evaluation guidance; not empirically derived from a dataset in the paper.
Empirical evaluation strategies for Human–AI teams should include randomized interventions, field trials, lab experiments, phased rollouts (difference-in-differences), and structural models that allow interaction terms between human skill and AI quality.
Methodological recommendation in the paper; suggested study designs rather than implemented analyses.
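The phased-rollout (difference-in-differences) design mentioned above can be sketched on hypothetical team-level productivity data: the estimate is (treated post minus treated pre) minus (control post minus control pre).

```python
# Minimal difference-in-differences sketch on invented data: "treated"
# teams receive the AI assistant in period 1, "control" teams never do.
from statistics import mean

# (group, period, outcome) observations — all values hypothetical.
obs = [
    ("treated", 0, 10.0), ("treated", 0, 11.0),
    ("treated", 1, 14.0), ("treated", 1, 15.0),
    ("control", 0, 10.0), ("control", 0, 10.5),
    ("control", 1, 11.0), ("control", 1, 11.5),
]

def cell(group, period):
    """Mean outcome for one group-period cell."""
    return mean(y for g, t, y in obs if g == group and t == period)

did = (cell("treated", 1) - cell("treated", 0)) - \
      (cell("control", 1) - cell("control", 0))
print(did)  # → 3.0
```

The control group's pre/post change nets out common time trends, which is exactly why the paper recommends this design over a simple before/after comparison.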
Research priorities include empirical measurement of task‑level automation rates, firm and industry productivity effects, wage impacts across occupations, and diffusion patterns.
Paper's stated research agenda and identification of measurement gaps; based on methodological critique of current evidence base.
Measuring these productivity gains will be challenging because quality improvements, faster iteration, and creative outputs are harder to price/observe than lines of code.
Methodological argument about measurement difficulty; based on conceptual considerations, not empirical validation.
Heterogeneity in system designs and deployment contexts complicates cross-site comparisons.
Limitations section and observed variation in platform architectures, degrees of automation, and governance across sites reported via descriptive data and interviews.
Non-random selection of institutions limits causal inference and external generalizability of the study's findings.
Study limitations explicitly state non-random site selection and heterogeneous deployments; methodological note that causal claims are constrained.
The study uses a quantitative, cross-sectional survey of managers and educational administrators and employs descriptive statistics, correlation, and regression analyses.
Methods described in the summary explicitly state research design and analytical techniques; this is a methodological claim rather than an empirical substantive finding. (Sample size not provided in summary.)
Research and monitoring priorities for economists include task-level analyses of substitutability/complementarity, modeling adoption as a function of regulatory costs and reimbursement incentives, and evaluating long-run welfare and distributional effects.
Explicit research recommendations stated in the narrative review, based on gaps identified in the literature and evolving empirical questions.
Policymakers and payers should consider liability reform, reimbursement models that reward safe human–AI collaboration, funding for independent clinical validation, and measures to prevent market concentration.
Policy recommendations and implications derived from the narrative review's synthesis of regulatory, economic, and implementation challenges.
The systematic review followed the PRISMA protocol and analyzed a corpus of 103 items (peer‑reviewed articles and institutional reports) published 2010–2024.
Explicit methodological statement in the paper describing PRISMA use and corpus size/timeframe.
Research gaps remain: quantifying welfare gains from specific AI applications in extraction (productivity, safety, emissions), evaluating cost-effectiveness of policy bundles, and estimating dynamic returns to data ecosystems and human capital.
Identification of gaps from literature and data coverage in the comparative analysis; calls for future empirical and modelling work.
Significant empirical gaps remain on long-term impacts (wage trajectories, employment composition, firm-level returns), verification/remediation cost quantification, and public-good risks of insecure code proliferation.
Cross-study synthesis explicitly identifying missing longitudinal and firm-level empirical research in the reviewed literature.
More granular firm- and household-level panel data are needed to empirically validate the dissertation's theoretical predictions about nonlinear effects and causal channels.
Author recommendation based on limitations noted in Essay 3 (no primary empirical estimation) and the conditional/simulation-based nature of other essays; this is a methodological claim about future research needs rather than an empirical result.
Further causal, experimental research (randomized deployments) is needed to precisely quantify net productivity and labor reallocation effects of AI agents.
Paper's stated research priorities and explicit acknowledgement of limitations from observational design; no randomized trials reported in the study.
Measuring quality-adjusted productivity is challenging: errors and downstream effects may reduce the net benefits of agent automation and are under-measured in the study.
Authors' noted limitations and concerns about quality-adjusted productivity measurement (error rates, downstream externalities) based on observational deployment experience; no formal measurement of downstream costs reported.
Small-scale, domain-specific deployments of Alfred AI limit external validity to other industries or larger firms.
Deployment context described as small-scale e-commerce; authors note generalizability limitations stemming from domain- and scale-specific nature of the experiments.
Because the study is observational and non-randomized, causal claims about the effect of AI agents on productivity and labor are limited.
Study design explicitly described as applied experimentation and observational deployments (non-randomized); potential confounding and selection biases acknowledged by the authors.
Researchers and firms should measure generation throughput, verification throughput, defect accumulation rates, mean time to detection/fix, costs per incident, and the marginal value of additional verification capacity to evaluate the framework's claims.
Prescriptive measurement priorities listed in the paper as recommendations for empirical validation.
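Two of these measurements, defect accumulation and mean time to detection, can be sketched from a hypothetical per-defect log of introduce/detect timestamps.

```python
# Defect accumulation curve and mean time to detection (MTTD) from a
# hypothetical defect log; timestamps are in days, None = still latent.
from statistics import mean

# (introduced_day, detected_day or None) — all values invented.
defects = [(0, 2), (1, 5), (3, 4), (6, None), (8, 12), (9, None)]

detected = [(i, d) for i, d in defects if d is not None]
mttd = mean(d - i for i, d in detected)          # mean time to detection

def open_defects(day):
    """Defects introduced by `day` and not yet detected at that day."""
    return sum(1 for i, d in defects if i <= day and (d is None or d > day))

backlog = [open_defects(t) for t in range(0, 13, 4)]  # accumulation curve
```

A rising `backlog` with flat verification throughput is the kind of signal the framework's claims turn on; cost per incident and marginal verification value would require joining this log with effort and cost data.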
The abstract reports no empirical tests, simulations, or field experiments; empirical validation of the framework is left for future work.
Direct observation of the paper's abstract and methods description indicating lack of empirical validation.
The paper's contribution is primarily conceptual/architectural rather than empirical.
Explicit statement in the paper and absence of reported empirical tests, simulations, or field experiments in the abstract and methods section.
Priority research areas include evaluating long‑run distributional impacts of AI diffusion in agriculture, interactions between digital technologies and labor markets, inclusive financing models for adoption, and macroeconomic effects on food prices and trade.
Stated research agenda and gap analysis in the paper’s conclusions, derived from the review of existing literature and identified gaps.
The current evidence base has gaps: more rigorous impact evaluations, long‑term soil and emissions accounting, and studies on distributional outcomes are needed.
Meta‑assessment within the paper noting limitations of existing literature (many short‑term pilots, limited long‑run soil/emissions data, few studies on who captures value); the claim is based on the review's appraisal of methods used in cited studies.
There are limited standardized measures of 'AI capital,' scarce data on firm-level AI investment and implementation quality, and few long-run causal estimates of AI’s effects on managerial productivity and labor outcomes.
Gap analysis based on literature review and methodological discussion within the book; observation about the state of available empirical evidence.
The paper is primarily conceptual/architectural and does not present large empirical studies quantifying the phenomenon across firms or repositories.
Explicit methodological statement in the paper describing its use of thought experiments, mechanism reasoning, and illustrative examples rather than empirical datasets.
Suggested empirical pathways include lab experiments measuring initiation probability/time-to-start with versus without conversational priming, and field A/B tests in productivity apps measuring task starts and completion conditional on start.
Methodological recommendations in the paper (proposed future empirical work); no data provided.
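The proposed field A/B test of task starts could be analyzed as sketched below; the user counts and the choice of a two-proportion z-test (normal approximation) are illustrative assumptions, not part of the paper.

```python
# Compare task-initiation rates with vs. without conversational priming
# using a two-proportion z-test. Stdlib only; counts are hypothetical.
from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                    # pooled initiation rate
    z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    # Two-sided p-value: 2 * P(Z > |z|), with Phi(z) = 0.5*(1 + erf(z/sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical: 120/400 users start the task with priming, 90/400 without.
z, p = two_proportion_z(120, 400, 90, 400)
```

Completion conditional on start, the paper's second outcome, would be tested the same way on the subset of users who initiated.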
The paper lacks quantitative validation; effects and magnitudes of the proposed initiation channel are unmeasured.
Methodological statement in the paper noting it is conceptual/theoretical and that it does not report systematic empirical analysis or randomized evaluation.
The paper introduces the 'AI Conversation-Based Action Initiation Barrier Reduction Model' as a theoretical framework explaining how conversational AI reduces initiation frictions.
Descriptive/theoretical presentation in the paper (model specification and conceptual framing). No empirical validation provided.