Evidence (4049 claims)

Claim counts by category:

- Adoption: 5126 claims
- Productivity: 4409 claims
- Governance: 4049 claims
- Human-AI Collaboration: 2954 claims
- Labor Markets: 2432 claims
- Org Design: 2273 claims
- Innovation: 2215 claims
- Skills & Training: 1902 claims
- Inequality: 1286 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 369 | 105 | 58 | 432 | 972 |
| Governance & Regulation | 365 | 171 | 113 | 54 | 713 |
| Research Productivity | 229 | 95 | 33 | 294 | 655 |
| Organizational Efficiency | 354 | 82 | 58 | 34 | 531 |
| Technology Adoption Rate | 277 | 115 | 63 | 27 | 486 |
| Firm Productivity | 273 | 33 | 68 | 10 | 389 |
| AI Safety & Ethics | 112 | 177 | 43 | 24 | 358 |
| Output Quality | 228 | 61 | 23 | 25 | 337 |
| Market Structure | 105 | 118 | 81 | 14 | 323 |
| Decision Quality | 154 | 68 | 33 | 17 | 275 |
| Employment Level | 68 | 32 | 74 | 8 | 184 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 85 | 31 | 38 | 9 | 163 |
| Firm Revenue | 96 | 30 | 22 | — | 148 |
| Innovation Output | 100 | 11 | 20 | 11 | 143 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 51 | 61 | 13 | 3 | 128 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 64 | 6 | 28 | 6 | 104 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 10 | 16 | 93 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 71 | 5 | 3 | 1 | 80 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Team Performance | 41 | 8 | 15 | 7 | 72 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 17 | 15 | 9 | 5 | 46 |
| Job Displacement | 5 | 28 | 12 | — | 45 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Developer Productivity | 25 | 1 | 2 | 1 | 29 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 7 | 4 | 9 | — | 20 |
Governance
Structure predictors depend on training data and exhibit biases; experimental validation remains necessary.
Paper notes dependence on training data biases and the need for experimental validation; references data sources (PDB, UniRef, metagenomic catalogs) but does not quantify bias magnitudes.
Current limitations include inaccurate prediction of multi‑chain complexes, flexible or rare conformational states, and limited prediction of dynamic ensembles.
Paper explicitly enumerates these limitations in the 'Ongoing limitations' section; no quantitative failure rates are given.
Traditional computational methods struggle without homologous templates or with complex folding/dynamics.
Paper discusses limitations of traditional computational methods, emphasizing dependence on homologous templates and difficulty with complex folding/dynamics; specific method comparisons or sample sizes are not provided.
Opacity, bias, and errors in AI systems demand auditing, standards, and governance (algorithmic accountability) to ensure trustworthy assessment.
Synthesis of literature on algorithmic bias and accountability plus policy analysis recommending audits and standards; supported by country cases that discuss governance concerns.
Student data used by AI vendors raises risks around consent, reuse, commercial exploitation, and other data-privacy concerns.
Policy analysis and literature on data governance, privacy law debates; examples from national policy documents in the comparative cases. No original data on breaches or misuse presented.
Empirical evaluation of integrated defenses, quantitative cost/benefit analyses, and standardized threat models for VR remain unaddressed research gaps in the surveyed literature window (2023–2025).
Authors' stated limitations from their comparative literature review of 31 studies noting an absence of primary empirical validation and quantitative economic analyses in the reviewed corpus.
Immersive VR systems collect continuous multimodal signals (motion tracking, gaze, voice, biometrics) that enable novel inference, spoofing, and manipulation attacks beyond traditional IT threats.
Synthesis of threat descriptions across the 31 reviewed peer‑reviewed studies (2023–2025) documenting sensor modalities and attack vectors; qualitative comparative evaluation of attack surfaces.
The Omnibus overlaps substantively with the DSA and other digital policies, creating potential jurisdictional and interpretive ambiguities about which rules apply to platforms and AI-enabled services.
Comparative mapping and legal/regulatory review identifying overlapping provisions; qualitative analysis of proposed texts (no quantitative sample).
Pakistan prioritizes economic and digital governance objectives, with comparatively weak governance of military AI.
Review of Pakistan’s economic and digital governance plans, export‑control materials, and secondary literature on Pakistan’s civil–military relations.
Large-scale machine learning enables invisible inferences about users from seemingly innocuous data.
Conceptual claim presented in the workshop and supported by referenced technical literature on inference capabilities of ML models (discussion in position papers); workshop itself did not present a new empirical experiment.
Inequities in climate-AI systems appear across three development phases—Inputs, Process, and Outputs—creating multiple failure points where Global North advantages propagate into final products.
Conceptual framework developed from cross-disciplinary synthesis, literature review, and illustrative examples (Inputs → Process → Outputs mapping).
Foundation-model development and high-performance computing (HPC) capacity are overwhelmingly located in the Global North.
Descriptive mapping of global HPC infrastructure and foundation-model authorship (spatial infrastructure mapping plus authorship analysis). No single quantitative sample size is reported; the evidence rests on documented locations of compute centers and model-development institutions.
Ambiguity about the probability of data leaks (a 10–50% range) reduces user adoption of AI personalization relative to a neutral privacy presentation.
Between-subjects online experiment, 2 (information environment: Risk vs Ambiguity) × 3 (privacy-treatment conditions), N = 610 participants randomized across arms. Leak-probability ambiguity was presented as a 10–50% range; adoption (choice of a personalized vs standard basket) was measured, and privacy-threatening conditions under ambiguity produced a statistically significant reduction in adoption relative to the neutral presentation.
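A minimal sketch of the headline comparison implied by such a design: a two-proportion z-test on adoption rates between a neutral arm and an ambiguity arm. The counts and the even split below are invented placeholders, not the study's data.

```python
# Hypothetical two-proportion z-test for adoption between two arms.
# Counts are invented placeholders, not the study's data.
from statsmodels.stats.proportion import proportions_ztest

adopted = [132, 98]   # adopters in neutral vs ambiguity arm (placeholder)
n_arm = [205, 205]    # participants per arm (placeholder split of N = 610)

stat, pval = proportions_ztest(count=adopted, nobs=n_arm)
print(f"z = {stat:.2f}, p = {pval:.4f}")
```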
Rank stability analysis across the whole citation distribution shows instability not only at the tail but across frequently cited domains; rankings shift substantially across samples.
Distribution-wide rank-stability methods applied to repeated-sample citation data from the three platforms and three topics, comparing domain ranks across samples and quantifying rank-change frequency and magnitude.
Bootstrap-based confidence intervals show wide uncertainty: many domain-level differences that look meaningful in single-run snapshots fall within measurement noise.
Bootstrap resampling applied to repeated-sample data (collected across nine days and high-frequency sampling) to compute confidence intervals for citation shares and prevalence; many pairwise or between-domain differences were not statistically separable once CIs were considered.
Single-run point estimates of citation share or prevalence are misleading; visibility metrics should be treated as estimators with uncertainty and reported with confidence intervals.
Comparison of single-run snapshots to distributions obtained from repeated sampling (daily and 10-minute interval regimes) and bootstrap resampling showing wide sample-to-sample variation and wide CI widths for domain-level shares and prevalence metrics.
Generative search platforms are non-deterministic: the same query at different times can yield different answers and different cited domains.
Repeated-query experiments performed on three platforms (Perplexity Search, OpenAI SearchGPT, Google Gemini) across three consumer-product topics, using multi-day sampling (one collection per day over nine days) and high-frequency sampling (repeated queries at 10-minute intervals); observed variation in responses and cited domains across runs.
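A hedged sketch of the repeated-sampling and bootstrap logic these entries describe (our construction, not the authors' code): treat each repeated run of a query as one observation of which domains get cited, then bootstrap over runs to obtain a percentile confidence interval for a domain's citation share. The run data and domain names below are invented.

```python
# Bootstrap CI for a domain's citation share across repeated runs of the
# same query. `runs` is invented toy data: each entry lists the domains
# cited in one run.
import random

runs = [
    ["example-reviews.com", "wiki.org"],
    ["wiki.org"],
    ["example-reviews.com", "blog.net", "wiki.org"],
    # ... one list per repeated query run
]

def share(sample, domain):
    # Fraction of runs in which `domain` appears among the citations.
    return sum(domain in r for r in sample) / len(sample)

domain = "wiki.org"
boot = sorted(
    share(random.choices(runs, k=len(runs)), domain)  # resample runs
    for _ in range(10_000)
)
lo, hi = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot))]
print(f"{domain}: share = {share(runs, domain):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only a handful of runs, the interval comes out very wide, which is exactly the point these entries make about single-run snapshots.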
Despite LoRA being parameter-efficient, fine-tuning and iterative human-in-the-loop workflows still require compute resources and researcher time; governance/versioning of tuned models is necessary.
Caveat stated in the paper about remaining computational and governance costs; no quantitative resource usage reported in the summary.
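For concreteness, a minimal LoRA sketch using the Hugging Face peft library; the base model and hyperparameters are illustrative assumptions, not the paper's configuration. Even with a tiny trainable fraction, the fine-tuning runs, evaluations, and adapter versioning the caveat points to remain real costs.

```python
# Minimal LoRA adapter sketch; base model and hyperparameters are
# illustrative, not the paper's setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
config = LoraConfig(
    r=8,                         # low-rank dimension
    lora_alpha=16,               # scaling factor
    target_modules=["c_attn"],   # attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction of weights train
# Governance point: each adapter still needs versioned records of its
# tuning corpus, hyperparameters, and evaluation results.
```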
Embedding fine-tuning (DAFT) risks amplifying domain-specific biases present in the tuning corpus, so domain experts and robust evaluation protocols are necessary.
Paper caveat noting bias-amplification risk from fine-tuning embeddings; aligns with known risks in the literature but no empirical bias audit results provided in the summary.
Mean emotional alignment between poster and responder is 32.7%, indicating systematic affective mismatch rather than congruence.
Pairwise comparison of emotion labels across post–response pairs in the dataset; computation of mean percentage where poster and immediate responder share the same emotion (32.7%).
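A toy sketch of the alignment computation as described: the share of post-response pairs whose emotion labels match. The labels below are invented.

```python
# Mean emotional alignment: fraction of post-response pairs with matching
# emotion labels. Pairs are invented toy data.
pairs = [
    ("sadness", "joy"),
    ("anger", "anger"),
    ("joy", "neutral"),
    ("fear", "fear"),
]
alignment = sum(p == r for p, r in pairs) / len(pairs)
print(f"mean emotional alignment = {alignment:.1%}")  # the paper reports 32.7%
```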
Conversational coherence declines rapidly with thread depth, indicating shallow, weakly connected multi-turn exchanges.
Lexical-semantic coherence metrics (e.g., embedding-based similarity) computed across comment threads of varying depth in the Moltbook dataset; observed rapid decrease in coherence scores as thread depth increases.
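A hedged sketch of one way to compute such a coherence-by-depth curve, assuming a sentence-embedding encoder; the model choice and data layout are our assumptions, not the paper's setup.

```python
# Parent-child embedding similarity averaged per thread depth.
# Encoder choice and data layout are assumptions, not the paper's setup.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

def coherence_by_depth(threads):
    # threads: list of comment chains, each a list of strings from the
    # root post (depth 0) down to the deepest reply.
    sims = {}  # depth -> list of parent-child cosine similarities
    for chain in threads:
        embs = model.encode(chain, normalize_embeddings=True)
        for d in range(1, len(chain)):
            sims.setdefault(d, []).append(float(np.dot(embs[d - 1], embs[d])))
    return {d: float(np.mean(v)) for d, v in sorted(sims.items())}

# A curve that declines in depth d reproduces the qualitative finding.
```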
When pipeline dependency structures contain cross-cutting (non-hierarchical) ties, prices oscillate, allocation quality drops, and the system becomes difficult to manage.
Empirical simulation results from the ablation study: configurations with non-hierarchical, cross-cutting graph structures produced larger price volatility, frequent oscillations in price updates, and lower allocation value/throughput compared to hierarchical graphs (measured across many runs and random seeds within the 1,620-run experimental set).
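A toy tâtonnement sketch (our construction, not the paper's 1,620-run simulation) of why cross-cutting ties destabilize price updates: with cross-price coupling, the excess-demand Jacobian acquires a large eigenvalue, so a step size that converges in the decoupled case produces growing oscillation.

```python
# Two markets with cross-price coupling c; prices update proportionally
# to excess demand. c = 0 (decoupled) converges; strong coupling makes
# the same step size oscillate and diverge.
import numpy as np

def simulate(c, eta=1.2, steps=30):
    J = np.array([[-1.0, c], [c, -1.0]])  # excess-demand Jacobian
    a = np.array([1.0 - c, 1.0 - c])      # intercepts so equilibrium is (1, 1)
    p = np.array([2.0, 0.5])              # start away from equilibrium
    for _ in range(steps):
        p = p + eta * (a + J @ p)         # tatonnement price update
    return p

print("decoupled  :", simulate(c=0.0))    # settles near (1, 1)
print("cross-tied :", simulate(c=0.9))    # oscillates with growing amplitude
```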
If the value realized in deployment is the time-average reward experienced by a single agent, optimizing the usual expected-value (ensemble-average) objective can lead to poor real-world outcomes.
Reasoning plus the paper's illustrative example demonstrating policies with high expected reward but poor or highly variable realized time-average outcomes; theoretical exposition, no empirical dataset.
Optimizing the expected cumulative reward (ensemble average across trajectories) can be misleading when reward-generating dynamics are non-ergodic because the ensemble expectation does not generally equal the time-average experienced by a single deployed agent.
Theoretical argumentation and a constructive illustrative example in the paper showing divergence between ensemble expectation and single-trajectory time-average; no empirical sample; analysis-based evidence.
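The standard multiplicative-growth illustration of this gap (our example, not necessarily the paper's construction): a payoff whose ensemble expectation grows every step while almost every individual trajectory decays.

```python
# Each step multiplies wealth by 1.5 or 0.6 with equal probability: the
# ensemble expectation grows 5% per step, but the per-step time-average
# growth rate 0.5*ln(1.5) + 0.5*ln(0.6) ~ -0.053 is negative, so the
# typical single trajectory shrinks.
import numpy as np

rng = np.random.default_rng(0)
steps, n_traj = 20, 100_000
factors = rng.choice([1.5, 0.6], size=(n_traj, steps))
wealth = factors.prod(axis=1)

print("ensemble mean :", wealth.mean())        # ~1.05**20 ~ 2.65
print("median path   :", np.median(wealth))    # ~exp(-0.053*20) ~ 0.35
print("losing paths  :", (wealth < 1).mean())  # ~70% end below their start
```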
A small linear spatial disadvantage requires an exponentially larger population to obtain the same probability of early discovery (scaling relation).
Analytic scaling result derived from extreme-value analysis of first-passage times in the model, with confirmation by numerical simulations (stochastic realizations; number of runs not specified). The result is internal to the theoretical model.
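One hedged reconstruction of how such a scaling relation typically falls out of extreme-value analysis (notation ours, not the paper's): if the earliest of N i.i.d. first-passage times concentrates around a logarithmically shrinking value, a linear delay must be offset by an exponential factor in population size.

```latex
% Hedged reconstruction; notation is ours, not the paper's.
% Suppose the earliest of $N$ i.i.d. first-passage times concentrates
% (Gumbel-type extreme-value behavior) around
\[
  t_{\min}(N) \;\approx\; a - b \ln N .
\]
% A linear spatial disadvantage adds a delay $\delta$, so matching the
% advantaged population's early-discovery time requires
\[
  a + \delta - b \ln N' \;=\; a - b \ln N
  \quad\Longrightarrow\quad
  N' \;=\; N\, e^{\delta / b},
\]
% i.e. an exponentially larger population for a linear disadvantage.
```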
Standard RLHF expected-cost constraints ignore distributional shape and can fail under heavy tails or rare catastrophic events.
Analytic/motivating argument presented in the paper contrasting expectation-based constraints with distributional behavior; illustrative examples and discussion of heavy-tailed and rare-event failure modes (no sample-size or dataset details provided in the summary).
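A toy numerical sketch of the failure mode (our construction, not the paper's): a cost distribution whose mean comfortably satisfies an expected-cost budget while a rare catastrophic tail dominates the worst episodes.

```python
# An expected-cost constraint satisfied on average while rare events
# are catastrophic: the mean is blind to the tail.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Cost is 0 except for a 0.1%-probability catastrophe costing 1000.
costs = np.where(rng.random(n) < 1e-3, 1000.0, 0.0)

print("mean cost          :", costs.mean())  # ~1.0, passes E[cost] <= 2
top = np.sort(costs)[-n // 1000:]            # worst 0.1% of episodes
print("mean of worst 0.1% :", top.mean())    # ~1000, invisible to the mean
```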
Improving explainability can trade off with predictive performance, privacy, and robustness; these trade-offs must be managed rather than ignored.
Review aggregates technical literature and conceptual analyses documenting trade-offs reported by researchers (e.g., simpler interpretable models sometimes having lower predictive accuracy; disclosure risks to privacy; robustness concerns). No single causal estimate provided.
The evidence base presented is limited to a single SME pilot, so generalizability across sectors, firm sizes, and data regimes is untested and requires further research.
Explicit limitation noted in the paper and the fact that the pilot illustrated is a single case study (sample size = 1 SME pilot).
Common barriers to effective RM implementation include siloed functions/weak coordination, limited resources or expertise, poor data quality/lack of metrics, and cultural resistance driven by short-term incentives.
Frequent identification of these barriers across the reviewed literature and practitioner sources from the last ten years, synthesized via thematic analysis.
Upfront costs for AI adoption are substantial: development, clinical validation, regulatory compliance, EHR integration, and ongoing monitoring.
Implementation and regulatory literature synthesized in the review documenting typical cost categories and reported expenditures for clinical AI projects.
Large language models (LLMs) suffer from hallucinations (fabricated facts), overconfidence, and unpredictable failure modes in open-ended tasks.
Technical papers and benchmarks on LLM factuality, calibration, and failure modes summarized in the review; empirical evaluations showing instances of fabricated outputs and calibration issues.
Contemporary AI systems have no capacity for physical examination, sensorimotor procedures, or direct patient-contact diagnostics.
Technical limitations of CNNs and LLMs described in literature (lack of embodiment, no sensorimotor capabilities) and absence of credible empirical demonstrations of safe autonomous physical clinical procedures in reviewed studies.
Current models exhibit poor out-of-distribution (OOD) generalization: performance degrades when inputs differ from training distributions.
Technical literature and robustness/domain-shift research reviewed in the paper documenting declines in model accuracy under domain shift and dataset changes.
High upfront costs and lack of tailored financing instruments are significant financial constraints on SME AI adoption.
Case studies, finance sector reports, and SME surveys cited in the review showing cost barriers and financing gaps; evidence descriptive rather than causal.
Infrastructure deficits (unreliable power, inadequate broadband, limited local compute) materially constrain AI uptake by SMEs.
Policy reports and empirical studies in the literature documenting infrastructural limitations in LMIC contexts (including Botswana) that impede digital and AI deployment.
Skills shortages (AI literacy, data science, digital management) are a primary constraint on SME AI adoption in developing economies.
Consistent findings across surveys, interviews, and case studies in the reviewed literature highlighting skill gaps as a common barrier; authors note multiple empirical sources pointing to this constraint.
Heterogeneity in study designs and contexts within the literature limits direct comparability and generalizability of findings.
Limitation noted in the paper based on the authors' assessment of diversity across the 103 reviewed studies (varying methods, contexts, metrics).
Institutional inertia, fragmented governance structures, limited technical capacity, and weak data stewardship impede scale‑up of AI systems in the public sector.
Thematic synthesis of barriers reported across empirical studies and institutional reports within the systematic review (103 items).
Low‑ and middle‑income contexts face persistent gaps—infrastructure, data ecosystems, and talent retention—that slow AI adoption in public governance.
Consistent findings across multiple studies in the 103‑item corpus reporting infrastructure deficits, weak data ecosystems, and brain drain/retention issues in LMIC settings.
The January 2026 DoD AI Strategy memorandum establishes a Barrier Removal Board that provides expanded authority to waive established governance controls.
Primary source analysis: close reading of the Department of Defense January 2026 AI Strategy memorandum and related policy text (policy language describing the Barrier Removal Board and its waiver authorities). No sample size required; based on document text.
Risks include bias and discrimination, opacity in decision-making, privacy and cybersecurity threats, liability gaps, and uneven distribution of benefits that can exacerbate inequality.
Compilation from academic and policy literature, regulatory gap analyses, and examples of problematic AI use cases identified in the report's sectoral review.
AI creates significant ethical, legal and distributional risks.
Review of policy documents, academic and policy literature, and documented examples of AI deployment across multiple sectors highlighting harms (bias, privacy breaches, liability gaps, unequal benefits).
Except for the EU, jurisdictions surveyed generally lack AI-specific energy-disclosure requirements.
Comparative analysis across eleven jurisdictions identifying presence/absence of AI-specific energy disclosure rules; EU singled out as having such requirements.
Regulatory regimes in the surveyed jurisdictions focus on training emissions more than on inference-phase energy consumption.
Regulatory mapping and lifecycle-phase analysis showing which phases (training vs inference) are covered by existing rules in the eleven jurisdictions.
Current environmental governance across the eleven jurisdictions mapped in the paper is predominantly facility-level (data-center focused) rather than model-level.
Regulatory mapping: comparative legal/policy analysis across eleven jurisdictions identifying locus of existing rules (facility vs model).
Reliance on imperfect data and model assumptions can produce biased or misleading forecasts; careful validation, transparency about assumptions, and governance are necessary.
Risks & governance discussion in the paper raising this limitation and recommending practices (qualitative argumentation).
Practical adoption challenges in African settings are substantial: limited digital infrastructure, sparse local computing capacity, weak regulatory frameworks for synthetic data use, and clinician skepticism about model validity.
Implementation and governance analyses, policy reports, and qualitative studies summarized in the review document infrastructural and regulatory barriers as well as clinician attitudes; evidence is interdisciplinary and largely descriptive, with varied geographic coverage and few large-scale empirical deployment studies.
Fidelity gaps in synthetic data (missing rare events, distributional shifts, artefacts) create risks of misclassification and biased outcomes when models are deployed in real-world African clinical settings.
Synthesis of machine-learning evaluations and clinical validation studies identified in the literature review that document instances of missing rare events, distributional mismatch, and data artefacts in synthetic datasets; these studies link such fidelity gaps to degraded performance and biased predictions in downstream models. The review highlights case examples but does not provide pooled quantitative estimates.
Significant financial and implementation barriers (infrastructure, staff, validation) risk worsening access inequities between well-resourced and low-resource providers.
Economic analyses, stakeholder surveys, and deployment trend reports synthesized in the paper showing higher upfront costs and validation burdens for adopters; no randomized trials.
Regulatory fragmentation and lack of harmonized standards increase compliance complexity for healthcare AI deployments.
Policy analyses, regulatory reviews, and industry reports synthesized in the paper describing divergent national/regional regulatory approaches and their operational consequences.