Evidence (7953 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	402	112	67	480	1076
Governance & Regulation	402	192	122	62	790
Research Productivity	249	98	34	311	697
Organizational Efficiency	395	95	70	40	603
Technology Adoption Rate	321	126	73	39	564
Firm Productivity	306	39	70	12	432
Output Quality	256	66	25	28	375
AI Safety & Ethics	116	177	44	24	363
Market Structure	107	128	85	14	339
Decision Quality	177	76	38	20	315
Fiscal & Macroeconomic	89	58	33	22	209
Employment Level	77	34	80	9	202
Skill Acquisition	92	33	40	9	174
Innovation Output	120	12	23	12	168
Firm Revenue	98	34	22	—	154
Consumer Welfare	73	31	37	7	148
Task Allocation	84	16	33	7	140
Inequality Measures	25	77	32	5	139
Regulatory Compliance	54	63	13	3	133
Error Rate	44	51	6	—	101
Task Completion Time	88	5	4	3	100
Training Effectiveness	58	12	12	16	99
Worker Satisfaction	47	32	11	7	97
Wages & Compensation	53	15	20	5	93
Team Performance	47	12	15	7	82
Automation Exposure	24	22	9	6	62
Job Displacement	6	38	13	—	57
Hiring & Recruitment	41	4	6	3	54
Developer Productivity	34	4	3	1	42
Social Protection	22	10	6	2	40
Creative Output	16	7	5	1	29
Labor Share of Income	12	5	9	—	26
Skill Obsolescence	3	20	2	—	25
Worker Turnover	10	12	—	3	25

This study is descriptive and comparative rather than quantitative; it relies on available policy documents and secondary literature rather than original field interviews or measured outcomes.

Explicit methodological statement in the paper listing qualitative document analysis, comparative literature review, and policy commentary; limitation acknowledged by authors.

high null result Regulating AI in National Security: A Comparative S... methodological approach and evidentiary scope (document/literature based, non‑qu...

A research agenda for AI economics should include: formalizing consent as a transaction/contracting problem; empirical RCTs and natural experiments measuring effects of consent designs; mechanism design for privacy-preserving data sharing; and policy evaluation of consent regulations.

Explicitly listed research directions in the workshop outputs and position papers; these are proposed next steps rather than empirical findings.

high null result Moving Beyond Clicks: Rethinking Consent and User Control in... proposed research topics and methodological approaches

Follow-up empirical methods should include qualitative interviews, focus groups, usability studies, field experiments (A/B tests), and policy/legal-technical assessments.

Recommended research methods enumerated in the workshop outputs and position papers; these are proposed future methods rather than findings from conducted studies.

high null result Moving Beyond Clicks: Rethinking Consent and User Control in... recommended empirical methods for future research

The Futures Design Toolkit (scenario planning, persona generation, speculative design) was used as a primary method in the workshop.

Methodological description in the workshop summary listing the Futures Design Toolkit and associated activities; procedural claim rather than empirical.

high null result Moving Beyond Clicks: Rethinking Consent and User Control in... use of specified design methods

The study has potential selection and ecological-validity constraints because it was conducted at two institutions across six courses, limiting generalizability.

Authors note limitations regarding sample scope (two institutions, six courses) and the ecological validity of the experimental tasks/settings.

high null result Expanding the lens: multi-institutional evidence on student ... external validity/generalizability (limitation)

The study employed a multi-method approach combining experimental quantitative analysis (descriptives, GLM, non-parametric robustness checks) with qualitative topic-based coding of open-ended survey responses.

Methods description: randomized/experimental assignment; quantitative analyses using GLM and non-parametric tests; qualitative topic-based coding of student responses; sample N = 254 across six courses at two institutions.

high null result Expanding the lens: multi-institutional evidence on student ... study methodology (mixed-methods design)

The study did not directly measure accessibility or impacts on students with disabilities, though qualitative results suggest possible intersections with inclusive and multimodal learning design.

Limitation stated by authors: no direct measurement of accessibility outcomes; qualitative responses hinted at potential relevance to inclusive design but no empirical measurement of disability-related impacts.

high null result Expanding the lens: multi-institutional evidence on student ... accessibility/disability-related educational outcomes (not measured)

The study focused on short-term, knowledge-based tasks and did not measure long-term learning or retention.

Authors explicitly note as a limitation that the experimental tasks were short-term and knowledge-based and that long-term retention was not measured.

high null result Expanding the lens: multi-institutional evidence on student ... long-term learning/retention (not measured)

Empirical generalization across all climate-AI systems is constrained by heterogeneous data availability and proprietary models, limiting the ability to produce universal quantitative claims.

Stated methodological limitation in the paper, noting heterogeneous data and the proprietary nature of some models restrict broad generalization.

high null result The Rise of AI in Weather and Climate Information and its Im... Extent of empirical generalizability across climate-AI systems

The paper does not provide granular quantitative estimates of the economic cost of infrastructural asymmetries in climate-AI.

Explicit limitation stated by the authors in the Methods/Limitations section.

high null result The Rise of AI in Weather and Climate Information and its Im... Absence of quantified economic cost estimates in the paper

There is a need for empirical research quantifying earnings dispersion, labor substitution effects, and the welfare impacts of GenAI-driven content economies over time.

Explicit research recommendation made in the paper based on gaps identified during analysis of the 377 videos (study is qualitative and does not measure these outcomes).

high null result Monetizing Generative AI: YouTubers' Collective Knowledge on... absence of quantitative measures in current study / identified need for future m...

The analysis identifies ten shared use cases that creators present as pathways to income using GenAI.

Coding of the 377-video corpus resulted in a catalog of ten use cases (as reported in the paper).

high null result Monetizing Generative AI: YouTubers' Collective Knowledge on... count and identification of distinct use-case categories (ten)

Risk and ambiguity manipulations: the risk condition communicated a single explicit leak probability of 30%, while the ambiguity condition communicated the leak probability as a range (10–50%).

Paper's methods section describing the manipulations used in the randomized experiment (N = 610); these specific probability framings were the core independent-variable manipulations.

high null result The Data-Dollars Tradeoff: Privacy Harms vs. Economic Risk i... Manipulation parameters (leak-probability information presented to participants)

Experimental design: study used a 2 × 3 between-subjects design with N = 610, crossing information environment (Risk vs Ambiguity) with privacy-treatment conditions (including privacy-threatening vs neutral and different data-type labels).

Methodological description reported in the paper: participants (N = 610) randomized across 6 experimental arms derived from the 2 (Risk vs Ambiguity) × 3 (privacy treatments) factorial design; tasks included choosing between a standard product basket and an AI-personalized basket.

high null result The Data-Dollars Tradeoff: Privacy Harms vs. Economic Risk i... Experimental design / assignment (not an outcome variable)
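The 2 × 3 between-subjects assignment described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the balanced (near-equal cell size) randomization, the seed, and the three privacy-treatment labels are all assumptions, since the summary does not specify them.

```python
import random
from collections import Counter
from itertools import product

# Factor levels. The three privacy-treatment labels are hypothetical
# placeholders; the summary does not name the conditions exactly.
INFO_ENVIRONMENTS = ["risk", "ambiguity"]
PRIVACY_TREATMENTS = ["treatment_a", "treatment_b", "treatment_c"]


def assign_arms(n_participants, seed=0):
    """Randomly assign participants to the 6 arms with near-equal cell sizes."""
    arms = list(product(INFO_ENVIRONMENTS, PRIVACY_TREATMENTS))  # 2 x 3 = 6 cells
    rng = random.Random(seed)
    # Repeat the arm list enough times to cover everyone, trim to size,
    # then shuffle so the order of assignment is random.
    repeats = -(-n_participants // len(arms))  # ceiling division
    pool = (arms * repeats)[:n_participants]
    rng.shuffle(pool)
    return pool


assignments = assign_arms(610)   # one (environment, treatment) pair per participant
cell_sizes = Counter(assignments)
```

With N = 610 and six arms, cell sizes cannot be exactly equal, so this sketch lets them differ by at most one participant.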

When leak probabilities are known (risk condition: explicit 30% leak probability), adoption of personalization is about 50% and is not significantly affected by privacy-threatening versus neutral information.

Same randomized experiment (N = 610) with a risk manipulation that explicitly stated a single 30% leak probability. Measured adoption rates showed roughly 50% uptake and no statistically significant difference between privacy-threatening and neutral conditions under risk.

high null result The Data-Dollars Tradeoff: Privacy Harms vs. Economic Risk i... Adoption choice: percent choosing AI-personalized basket (≈50%)

Many apparent inter-domain differences vanish once measurement uncertainty is accounted for.

Bootstrap confidence intervals and repeated-sample comparisons showing that differences in citation share or prevalence observed in single-run snapshots are often not statistically significant when uncertainty from repeated sampling is included.

high null result Quantifying Uncertainty in AI Visibility: A Statistical Fram... statistical significance of inter-domain differences in citation share / prevale...
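A percentile bootstrap is one common way to obtain the kind of confidence interval mentioned above. The sketch below is an assumption about the general technique, not the paper's exact framework, and the per-run citation shares are hypothetical illustrative numbers rather than data from the study.

```python
import random


def bootstrap_diff_ci(runs_a, runs_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in mean share (A minus B)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each domain's repeated runs with replacement.
        sample_a = [rng.choice(runs_a) for _ in runs_a]
        sample_b = [rng.choice(runs_b) for _ in runs_b]
        diffs.append(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical citation shares from five repeated runs per domain.
domain_a = [0.12, 0.18, 0.09, 0.15, 0.21]
domain_b = [0.10, 0.16, 0.14, 0.08, 0.19]

lower, upper = bootstrap_diff_ci(domain_a, domain_b)
# If the interval contains zero, the apparent gap is not distinguishable
# from run-to-run noise at the chosen level.
difference_is_clear = not (lower <= 0.0 <= upper)
```

In this toy example a single snapshot could show domain A ahead of domain B, yet the interval over repeated runs straddles zero.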

Falsifiability condition for intermediation-collapse: If intermediary margins remain stable despite measurable declines in information frictions, the intermediation-collapse mechanism is falsified.

Stated empirical test in the paper that compares measured intermediary markups/margins to proxies for information frictions and AI-driven automation across affected sectors.

high null result Abundant Intelligence and Deficient Demand: A Macro-Financia... intermediary margins versus measures of information frictions/automation

Falsifiability condition for Ghost GDP: If monetary velocity does not decline (or instead rises) as the labor share falls, the Ghost GDP channel is unsupported by the data.

Explicit falsification condition provided in the paper based on the model link labor share -> velocity -> consumption; suggested empirical test using monetary-velocity proxies and labor-share series from FRED.

high null result Abundant Intelligence and Deficient Demand: A Macro-Financia... empirical relationship between labor share and monetary velocity

Empirically, top-quintile households account for roughly 47–65% of U.S. consumption.

Calibration and reported quantitative scenarios in the paper using U.S. consumption concentration data (constructed from U.S. consumption/income micro- and macro-data sources referenced in the methods section).

high null result Abundant Intelligence and Deficient Demand: A Macro-Financia... share of U.S. consumption attributable to the top income quintile

Economy & Finance threads contained no self-referential content, suggesting agents can engage in market discussion without representing themselves as agents.

Topic-model-derived category labels, combined with tagging for self-referential themes, show zero instances of self-reference in posts categorized as Economy & Finance; counts are derived from the dataset of 361,605 posts.

high null result What Do AI Agents Talk About? Emergent Communication Structu... presence/absence of self-referential tags in Economy & Finance posts

Because the sample is small and purposive and the design is qualitative, insights are rich but not statistically representative or quantified across the broader research landscape.

Authors' stated study limitations in the paper acknowledging small purposive sample (n=16) and qualitative design.

high null result RCTs & Human Uplift Studies: Methodological Challenges and P... representativeness and generalizability of study findings

The study's data come from semi-structured interviews with 16 expert practitioners across biosecurity, cybersecurity, education, and labor.

Study methods reported in the paper: qualitative data source explicitly stated as 16 semi-structured interviews across listed domains.

high null result RCTs & Human Uplift Studies: Methodological Challenges and P... sample size and domain coverage of interviews

The authors released their code and data for reproducibility at https://github.com/blocksecteam/ReEVMBench/.

Statement in the paper indicating public release of code and dataset at the provided GitHub URL.

high null result Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... code_and_data_availability (repository_link)

Crystallization Efficiency (CE) is defined as Useful_Crystallized_Knowledge / (Human_Effort × Time).

Operational formalism and metric definitions presented in the paper (explicit formula provided). This is a proposed metric, not an empirically validated measure.

high null result Nurture-First Agent Development: Building Domain-Expert AI A... Crystallization Efficiency as defined
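The CE formula above translates directly into a small helper. The function and parameter names, and the units in the example (knowledge items, person-hours, weeks), are assumptions for illustration; the summary gives only the ratio itself.

```python
def crystallization_efficiency(useful_knowledge, human_effort, elapsed_time):
    """CE = Useful_Crystallized_Knowledge / (Human_Effort x Time)."""
    if human_effort <= 0 or elapsed_time <= 0:
        raise ValueError("human_effort and elapsed_time must be positive")
    return useful_knowledge / (human_effort * elapsed_time)


# Hypothetical units: 12 validated knowledge items, 3 person-hours, 2 weeks.
ce = crystallization_efficiency(useful_knowledge=12, human_effort=3, elapsed_time=2)
```

Because the denominator is a product, doubling either effort or elapsed time halves CE for the same amount of crystallized knowledge.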

The paper proposes operational patterns (Dual-Workspace Pattern separating live interaction workspace and persistent knowledge workspace) and a Spiral Development Model (iterative interaction → crystallization → validation → redeployment).

Operational framework section describing patterns and workflows; illustrated in the case study implementation.

high null result Nurture-First Agent Development: Building Domain-Expert AI A... existence and application of dual-workspace and spiral development workflows

The Knowledge Crystallization Cycle formalizes operations (extract, synthesize, validate, integrate) and proposes efficiency and quality metrics including Crystallization Efficiency (CE), Fidelity, Reuse Rate, and Freshness/Volatility Score.

Operational formalism section of the paper presenting metric definitions and proposed calculations (e.g., CE = Useful_Crystallized_Knowledge / (Human_Effort × Time)). These are proposed metrics, not validated at scale.

high null result Nurture-First Agent Development: Building Domain-Expert AI A... Crystallization Efficiency and related proposed metrics

The paper introduces a Three-Layer Cognitive Architecture that organizes agent knowledge by volatility and degree of personalization (stable/core knowledge; institutionalized heuristics/patterns; volatile/session-level tacit details).

Architectural specification presented in the paper (conceptual design document). No experimental validation beyond the illustrative case study.

high null result Nurture-First Agent Development: Building Domain-Expert AI A... categorization of knowledge artifacts into three volatility/personalization laye...

Nurture-First Development (NFD) reframes agent creation from a one-time engineering task into a continuous, conversational growth process.

Conceptual formalization in the paper (architectural and operational descriptions). No large-scale empirical test reported; supported by theoretical argumentation and illustrative examples.

high null result Nurture-First Agent Development: Building Domain-Expert AI A... characterization of development process (one-time vs. continuous conversational ...

Findings are based on a student sample rating decontextualized messages, so external validity to industry communication or real project logs is uncertain and requires replication.

Study sample consisted of 81 students in team-based software projects labeling decontextualized statements; authors explicitly note this limitation as a caveat.

high null result Exploring Indicators of Developers' Sentiment Perceptions in... generalizability/external validity of the study findings to non-student, context...

Many apparent correlations between predictors and sentiment labels do not remain significant after global multiple-testing correction.

Correlation analyses across many predictors with explicit application of multiple-testing correction procedures; many initial signals failed to survive correction.

high null result Exploring Indicators of Developers' Sentiment Perceptions in... statistical significance of correlations between predictors (e.g., mood, team me...
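The effect of a global multiple-testing correction can be illustrated with a short sketch. The summary does not name the procedure the authors used, so the Benjamini-Hochberg step-up rule below is an assumed stand-in, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for eight predictor/label correlations.
p_values = [0.001, 0.04, 0.03, 0.2, 0.045, 0.6, 0.01, 0.049]

naive_hits = sum(p < 0.05 for p in p_values)        # uncorrected threshold
corrected_hits = sum(benjamini_hochberg(p_values))  # after correction
```

In this toy case six correlations pass the uncorrected 0.05 threshold but only two survive the correction, which mirrors the pattern the claim describes.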

The paper does not provide quantitative estimates of time saved per report, cost reductions, or effects on employment/wages; such economic impacts remain to be quantified.

Caveats noted in the paper: absence of quantitative estimates for time/cost/employment effects and a call for field trials and economic modeling. This is explicitly stated in the summary.

high null result Bridging the Skill Gap in Clinical CBCT Interpretation with ... Absence of quantitative economic impact estimates (time saved, cost reduction, e...

The paper used a clinically grounded, multi-level evaluation framework that separately assessed raw AI drafts (automatic metrics + clinician review) and radiologist-AI collaborative final reports (how radiologists edit and downstream clinical effects), including comparisons across radiologist experience levels.

Methodology section summarized in the paper: multi-level assessment covering AI drafts and radiologist-edited collaborative reports; combination of automatic metrics and radiologist-/clinician-centered evaluations; experience-level stratified analyses (novice/intermediate/senior).

high null result Bridging the Skill Gap in Clinical CBCT Interpretation with ... Evaluation framework components (draft assessment, collaborative report assessme...

CBCTRepD is a report-generation system trained on this curated paired dataset to produce bilingual CBCT radiology draft reports intended for radiologist-in-the-loop (co-authoring) workflows.

System description in the paper: CBCTRepD built using the curated dataset; authors state purpose is to generate clinically usable drafts for radiologist editing. (Model architecture and training hyperparameters are not specified in the provided text.)

high null result Bridging the Skill Gap in Clinical CBCT Interpretation with ... System capability: generation of bilingual CBCT draft reports for human editing

The authors curated a paired CBCT–report dataset of approximately 7,408 CBCT studies covering 55 oral and maxillofacial disease entities that is bilingual and includes diverse acquisition settings.

Data curation described in the paper: stated dataset size (~7,408 studies), coverage of 55 disease entities, bilingual reports, and inclusion of a range of acquisition settings to increase heterogeneity and clinical realism. (Exact languages, provenance of studies, and dataset split details are not specified in the provided text.)

high null result Bridging the Skill Gap in Clinical CBCT Interpretation with ... Dataset composition (number of studies, disease-entity coverage, bilingual statu...

The workshop identifies specific research directions for AI economics: cost–benefit and ROI analyses of shared infrastructure; market design for procurement of co-designed systems; models of innovation incentives under different IP/data-governance regimes; labor market impact assessments; and empirical studies of how validation ecosystems affect adoption rates and pricing.

Explicitly listed research directions in the workshop summary and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).

high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... articulated research agenda items and priority areas for future empirical study

The workshop's findings are based on qualitative synthesis of expert judgment and stakeholder inputs rather than primary empirical data or controlled experiments.

Explicitly stated in the Data & Methods section of the workshop summary; methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building.

high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... nature and strength of empirical support for the recommendations (qualitative vs...

The workshop convened researchers, clinicians, and industry leaders to address co-design across four thematic areas: teleoperations/telehealth/surgical operations; wearable and implantable medicine; home ICU/hospital systems/elderly care; and medical sensing/imaging/reconstruction.

Workshop agenda and participant list from the two-day NSF workshop (Sept 26–27, 2024); methods included thematic breakout sessions focused on these four areas. Documentation at https://sites.google.com/view/nsfworkshop.

high null result Report for NSF Workshop on Algorithm-Hardware Co-design for ... topics and thematic coverage of the workshop

Evaluation was performed on five different material setups.

Experimental evaluation described in the summary: performance reported as averaged across five material setups. The summary does not list per-setup names or trial counts.

high null result Learning Adaptive Force Control for Contact-Rich Sample Scra... number of material setups used in evaluation (n = 5)

The simulation models samples as collections of spheres with per-sphere procedurally generated dislodgement-force thresholds derived from Perlin noise to introduce spatial heterogeneity and diversity.

Simulation/modeling description in the paper: discrete-sphere representation of sample; each sphere assigned a dislodgement threshold; spatial variation produced via Perlin noise. This is a concrete modeling choice reported in the methods.

high null result Learning Adaptive Force Control for Contact-Rich Sample Scra... representation of material heterogeneity in simulation (model design detail)

The paper uses a mixed-methods approach combining a systematic literature review with an empirical practitioner survey to assess perceptions, adoption, and impact of AI-driven tools.

Methodological statement in the paper; survey design covers tool usage, perceived benefits, challenges, and expectations.

high null result Artificial Intelligence as a Catalyst for Innovation in Soft... methodological coverage (presence of literature review and survey)

Empirical work (experiments and measurements) is needed to quantify how much value interpretive traces add to downstream outputs, how RATs affect platform incentives, and what governance frameworks fairly allocate resulting rents.

Concluding recommendation in the paper stating the research gaps; not an empirical claim but a stated need.

high null result Chasing RATs: Tracing Reading for and as Creative Activity research agenda items (value quantification, platform incentive effects, governa...

The current presentation of RATs is speculative and illustrative; empirical validation, scalability, and ethical safeguards remain to be developed.

Limitations section of the paper explicitly states the speculative nature and lack of empirical evaluation.

high null result Chasing RATs: Tracing Reading for and as Creative Activity status of empirical validation/scalability/ethical development

Implementation of RATs requires instrumentation at the browser/platform level or via plugins and must address privacy/consent, storage/ownership, sharing controls, and interoperable trace formats.

Design and implementation considerations enumerated in the paper; this is a requirements statement rather than an empirical claim.

high null result Chasing RATs: Tracing Reading for and as Creative Activity implementation requirements and privacy/governance needs

Analytical approaches compatible with RATs include sequence/trajectory mining, network analysis of associations/co-read graphs, embedding/clustering of trajectories, qualitative inspection of reflections, and experimental (A/B or RCT) evaluation of downstream effects.

Methods section of the paper listing suggested analytical techniques; these are proposed methods rather than applied analyses.

high null result Chasing RATs: Tracing Reading for and as Creative Activity analytical approaches applicable to RAT data

The approach shifts some computational burden to obtaining MCMC samples of the parameter posterior, requiring access to (or ability to compute) MCMC samples before surrogate training.

Method description: training data are MCMC-drawn parameter vectors; the paper notes this practical requirement and trade-off (MCMC cost vs. avoiding repeated expensive forward-model evaluations).

high null result MCMC Informed Neural Emulators for Uncertainty Quantificatio... need for and cost of MCMC sampling (computational requirement)

More theoretical work is needed to establish guarantees (consistency, asymptotic behavior, and frequentist coverage) for these networks when applied in economic settings.

Stated research need/caveat in the paper; no new theoretical proofs are provided in the summary to establish these properties.

high null result ForwardFlow: Simulation only statistical inference using dee... theoretical guarantees (absence of established consistency/asymptotic/coverage r...

The Boson Sampling Born Machine (BSBM) is a generative model whose model distribution is the output probability distribution of a linear-optical (bosonic modes) circuit.

Definition and constructive specification in the paper: model architecture described as linear-optical circuits with outputs given by bosonic-mode measurement probabilities (the paper's formal definition/construction). The claim is definitional/theoretical (no empirical sample size).

high null result Universality of Classically Trainable, Quantum-Deployed Boso... model distribution = linear-optical circuit output probabilities

Because this is a conceptual/systems-architecture paper, it does not present new empirical performance benchmarks.

Explicit statement in the paper's Data & Methods section that no new empirical benchmarks are presented.

high null result Reference Architecture of a Quantum-Centric Supercomputer presence or absence of new empirical performance benchmark data

The evaluated models consist of an MLP baseline and a GNN tailored to exploit relational/spatial structure among beams/antennas.

Model descriptions provided in the methods section: two supervised-learning architectures (MLP and GNN) used for beam prediction experiments.

high null result Federated Learning-driven Beam Management in LEO 6G Non-Terr... model architecture comparison (GNN vs MLP)

Using Federated Learning (FL) with orbital planes as distributed learners and HAPS for aggregation avoids centralization of raw channel data.

Method description: federated-learning architecture with clients mapped to orbital planes and HAPS performing coordination/aggregation; explicitly states no central pooling of raw channel samples.

high null result Federated Learning-driven Beam Management in LEO 6G Non-Terr... presence/absence of central pooling of raw channel data
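The aggregation step described above can be sketched with sample-weighted averaging of client model parameters (the FedAvg rule). Whether the paper uses exactly this rule is an assumption; the function name, the flat parameter lists, and the example numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample counts.

    Only model parameters reach the aggregator (the HAPS in the described
    setup); raw channel samples stay with each orbital-plane client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[j] * size for weights, size in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Two hypothetical orbital-plane clients with 2-parameter local models.
plane_models = [[1.0, 2.0], [3.0, 4.0]]
plane_sample_counts = [1, 3]

global_model = federated_average(plane_models, plane_sample_counts)
```

Weighting by sample count means a client that trained on more local data pulls the global model further toward its own parameters.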
