Evidence (4114 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
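
As a reading aid, the counts above can be turned into per-direction shares. The sketch below copies three rows from the matrix and divides each count by the row total; the function name `direction_shares` and the `unlisted` key are illustrative assumptions, not part of the source. For some rows the four direction columns sum to less than Total. The matrix does not say what the remainder covers, so it is reported separately rather than assigned to a direction.

```python
# Three rows copied from the matrix above: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Error Rate": (69, 92, 10, 2, 173),
}

def direction_shares(row):
    """Return each direction's share of the row total, plus the unlisted remainder."""
    positive, negative, mixed, null, total = row
    listed = positive + negative + mixed + null
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
        # Assumption: whatever Total includes beyond the four columns is kept apart.
        "unlisted": (total - listed) / total,
    }

for outcome, row in ROWS.items():
    shares = direction_shares(row)
    print(outcome, {name: round(value, 3) for name, value in shares.items()})
```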

Filter: Innovation

Data security, privacy risks, unequal gains, and regulatory shortfalls can undermine the benefits of AI/robotics adoption.

Policy and risk analyses from secondary literature, case studies, and institutional reports synthesized in the paper; examples cited but no original incident-level dataset or incidence rates provided.

high negative AI and Robotics Redefine Output and Growth: The New Producti... data/privacy risk incidence, inequality measures, regulatory adequacy (qualitati...

Transition frictions and skills mismatches are important barriers to workers moving into newly created AI‑related roles.

Qualitative review of workforce and skills literature, case studies, and sector reports; evidence comes from secondary sources with varied methodologies; the paper does not report pooled quantitative estimates.

high negative AI and Robotics Redefine Output and Growth: The New Producti... transition costs, skills mismatch incidence, retraining needs (labor market fric...

Limited access to capital, data, digital infrastructure, and skills, together with insecure land tenure, reduces adoption rates for advanced innovations among smallholders.

Multiple empirical studies and program evaluations synthesized in the review documenting adoption barriers; policy review identifying structural constraints across regions.

high negative MODERN APPROACHES TO SUSTAINABLE AGRICULTURAL TRANSFORMATION adoption rates of AI/IoT/precision tools, uptake of new practices

Key failure modes for AI in drug R&D include overfitting, poor generalizability, dataset bias, insufficient external validation, and misalignment with evolving regulatory expectations.

Synthesis of literature and case reports in the narrative review describing observed failures and risks across projects (qualitative evidence).

high negative Artificial Intelligence in Drug Discovery and Development: R... failure incidence of AI projects (model performance collapse, regulatory rejecti...

Absent rigorous controls (validation, applicability-domain reporting, attention to dataset bias), AI models risk overfitting, producing inequitable outcomes and regulatory friction that can undermine economic benefits.

Theoretical arguments plus case reports and literature cited in the review documenting instances and mechanisms of overfitting, dataset bias, and regulatory challenges; narrative summary rather than systematic quantification.

high negative Artificial Intelligence in Drug Discovery and Development: R... model generalizability (out-of-sample performance), subgroup performance dispari...

Generative AI is susceptible to social and representational biases and to factual errors or hallucinations; it lacks tacit, contextual domain expertise.

Documented examples in the literature of biased outputs and hallucinations; controlled evaluations and audits of model outputs; qualitative reports highlighting lack of tacit knowledge in domain-specific tasks.

high negative ChatGPT as an Innovative Tool for Idea Generation and Proble... incidence of biased content; factual error/hallucination rate; performance on do...

The quality of AI-generated outputs is highly variable; models frequently produce mediocre but plausible-sounding content that requires human filtering.

Multiple user studies and qualitative reports documenting variability in output quality and the need for human curation; outcome measures include error rates, user-rated quality, and time spent vetting.

high negative ChatGPT as an Innovative Tool for Idea Generation and Proble... output quality distributions; user-perceived quality; time/effort for human filt...

Algorithmic bias, unequal digital financial literacy, caregiving time constraints, and limited access to personalized solutions can sustain or reproduce gender investment gaps if not addressed.

Synthesis of literature on barriers to financial inclusion and AI fairness concerns, plus platform report observations (review of empirical and conceptual studies; not a single empirical test).

high negative Women's Investment Behaviour and Technology: Exploring the I... gender investment gap, differential product offerings, access metrics

On average, women exhibit greater risk aversion than men in some settings.

Summary of empirical survey and experimental studies on gender differences in risk attitudes discussed in the review (multiple cross‑sectional and lab/field experiments referenced).

high negative Women's Investment Behaviour and Technology: Exploring the I... measured risk aversion / willingness to take financial risk

Using cloud services and security-as-a-service (SECaaS) raises data privacy and cross-border compliance issues that complicate legal compliance for firms.

Regulatory analyses and compliance reports; documented examples in case studies and industry guidance on cross-border data flows.

high negative Security-as-a-service: enhancing cloud security through m... compliance incident rates / regulatory risk exposure

The cloud shared responsibility model creates potential ambiguities in liability between providers and customers.

Regulatory guidance, legal analyses, and documented post-incident case studies showing confusion over responsibilities.

high negative Security-as-a-service: enhancing cloud security through m... clarity/ambiguity of security and liability responsibilities

China manages the openness–security trade-off through a centralized, developmentalist, techno‑sovereignty approach that privileges coordinated state direction and control.

Qualitative content analysis of national‑level policy texts: 18 Chinese policy documents coded across four analytical dimensions (coordination objectives, institutional actors, governance mechanisms, stakeholder legitimacy).

high negative Balancing openness and security in scientific data governanc... governance logic / institutional coordination type (centralized, state‑led)

There is substantial uncertainty in economic forecasts due to possible scale-up failures, regulatory constraints, feedstock price volatility, and path‑dependent lock‑in effects.

Synthesis of technical failure modes, regulatory uncertainty, and sensitivity analyses reported in TEA/LCA literature and economic modeling sections of the review.

high negative Harnessing Microbial Factories: Biotechnology at the Edge of... forecast variance in cost trajectories, probability of commercial success, and s...

Regulatory and biosafety concerns (including environmental release risks and dual‑use issues) increase fixed costs and create entry barriers that shape industry structure and diffusion.

Policy and governance literature reviewed alongside technical case studies; citations of regulatory requirements, biosafety frameworks, and examples of compliance costs affecting project viability.

high negative Harnessing Microbial Factories: Biotechnology at the Edge of... regulatory compliance costs, time-to-market, number of approved facilities/proce...

Engineering and economic challenges—scale‑up hurdles, process robustness, feedstock cost, and downstream purification—limit industrial deployment of many bio-based processes.

Case study TEA/LCA summaries and process reports in the review highlighting scale-up failures or increased costs at larger scales, purification complexity for low‑concentration products, and sensitivity to feedstock prices.

high negative Harnessing Microbial Factories: Biotechnology at the Edge of... capital and operating costs, purification yield and cost, process robustness met...

Technical biological limitations—metabolic burden, pathway crosstalk, byproduct formation, and genetic instability—remain major constraints on strain performance and scalability.

Multiple experimental reports and method papers cited in the review documenting decreased growth/productivity due to engineered pathway burden, unintended interactions between pathways, accumulation of byproducts, and genetic mutations during production runs.

high negative Harnessing Microbial Factories: Biotechnology at the Edge of... strain growth rate, productivity (g/L/h), byproduct concentrations, genetic muta...

The described pipeline is cross-sectional as presented and should be extended to dynamic models (temporal embeddings, change-point detection) for trend or causal analyses.

Method description in summary indicates cross-sectional pipeline; recommendation to extend for temporal/dynamic modeling when analyzing trends or causal effects.

high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... temporal modeling capabilities (ability to analyze trends/change over time)

LLMs and corpora may reflect disciplinary, geographic, or language biases; analyses should adjust or stratify accordingly.

Caveat explicitly stated in summary noting potential biases in LLMs and corpora; recommendation to adjust/stratify analyses.

high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... presence and impact of disciplinary/geographic/language biases in topic maps and...

Cluster reliability should be validated (e.g., bootstrap, perturbations) and automatic labels complemented with expert human validation for critical analyses.

Caveat and recommended validation steps provided in summary; suggests bootstrap/perturbation and manual validation as best practices. No empirical stability metrics provided in summary.

high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... cluster stability/reliability and accuracy of automatically generated labels
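
The bootstrap check recommended above can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: it assumes one-dimensional values as a stand-in for embeddings, a toy two-cluster k-means, and a simple agreement score. The helper names (`two_means_1d`, `label`, `bootstrap_stability`) are hypothetical.

```python
import random

def two_means_1d(values, iters=20):
    """Tiny 1-D k-means with k=2, initialised at the min and max values."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not left or not right:
            break  # degenerate sample: keep the current centroids
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return lo, hi

def label(value, centroids):
    """Assign a value to the nearer of the two (ordered) centroids."""
    lo, hi = centroids
    return 0 if abs(value - lo) <= abs(value - hi) else 1

def bootstrap_stability(values, n_boot=200, seed=0):
    """Mean share of points labelled identically by baseline and bootstrap fits."""
    rng = random.Random(seed)
    base = two_means_1d(values)
    agreements = []
    for _ in range(n_boot):
        sample = [rng.choice(values) for _ in values]  # resample with replacement
        boot = two_means_1d(sample)
        same = sum(label(v, base) == label(v, boot) for v in values)
        agreements.append(same / len(values))
    return sum(agreements) / len(agreements)

# Synthetic, well-separated data; real embeddings would be high-dimensional.
tight = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
print(round(bootstrap_stability(tight), 3))
```

A score near 1 suggests the partition survives resampling; lower scores flag clusters whose automatic labels deserve manual review first.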

Results are sensitive to model and prompt choice; researchers should perform robustness checks across LLMs, soft prompts, and embedding models.

Caveat explicitly stated in the paper summary noting model and prompt sensitivity; recommended validation steps include robustness checks across models and prompts.

high negative Soft-Prompted Semantic Normalization for Unsupervised Analys... sensitivity of clustering/labeling results to LLM, prompt design, and embedding ...

Higher complaint volume is significantly associated with near-term stock price declines.

Fixed-effects panel path models estimated on monthly data for 261 financial firms (2018–2023) report statistically significant negative associations between firm–month complaint volume and subsequent abnormal returns.

high negative More than words: valuation of words for stock price by using... near-term abnormal stock returns
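
To make the fixed-effects idea concrete, here is a minimal sketch of a one-way within estimator on synthetic firm-month rows. It is not the paper's path model, which has more structure. The data, the function name `within_slope`, and the slope of -0.5 are all made up for illustration.

```python
from collections import defaultdict

def within_slope(panel):
    """Slope of y on x after removing each firm's mean (one-way fixed effects)."""
    by_firm = defaultdict(list)
    for firm, x, y in panel:
        by_firm[firm].append((x, y))
    num = den = 0.0
    for obs in by_firm.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den

# Synthetic rows: (firm, complaint volume, abnormal return).
# Each firm has its own baseline return; the common slope is -0.5 by construction.
panel = [
    ("A", 1.0, 2.0 - 0.5 * 1.0), ("A", 2.0, 2.0 - 0.5 * 2.0), ("A", 3.0, 2.0 - 0.5 * 3.0),
    ("B", 10.0, 9.0 - 0.5 * 10.0), ("B", 11.0, 9.0 - 0.5 * 11.0), ("B", 12.0, 9.0 - 0.5 * 12.0),
]
print(within_slope(panel))
```

In this toy panel the firm with more complaints also has the higher baseline return, so a pooled regression that ignores firm effects would show a positive slope. Demeaning within each firm recovers the negative within-firm association.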

Consumer complaints—measured by monthly volume, topic composition, and VADER sentiment of complaint narratives—contain behavioral signals that predict short-term abnormal stock returns in U.S. financial firms.

CFPB complaint records matched to 261 publicly traded U.S. financial firms (monthly observations, 2018–2023); analyses use fixed-effects panel path models to link firm–month complaint features (volume, LDA topic prevalences, aggregated VADER sentiment) to firm-level abnormal returns; complementary machine-learning models evaluate out-of-sample predictive performance.

high negative More than words: valuation of words for stock price by using... short-term firm-level abnormal stock returns

Federated infrastructures introduce adversarial risks (model/data poisoning, inference attacks on updates) that require robust aggregation, anomaly detection, and other defenses.

Threat modeling and taxonomy of adversarial/privacy threats with mapped mitigations (robust aggregation, anomaly detection, DP). Evidence is conceptual and based on standard threat frameworks; no empirical attack/defense experiments reported at scale.

high negative Privacy-Aware AI Advertising Systems: A Federated Learning F... vulnerability to poisoning/inference (attack success rate), effectiveness of def...
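
One simple form of robust aggregation is a coordinate-wise median over client updates. The sketch below contrasts it with a plain mean when one update is poisoned. The update vectors and the single malicious client are hypothetical, and the paper does not state that it uses this particular rule.

```python
from statistics import mean, median

def aggregate(updates, reducer):
    """Combine client update vectors coordinate by coordinate."""
    return [reducer(coords) for coords in zip(*updates)]

# Hypothetical two-parameter updates from four honest clients.
honest = [[0.9, -1.1], [1.0, -1.0], [1.1, -0.9], [1.0, -1.0]]
# One hypothetical malicious client submitting an extreme update.
poisoned = [[100.0, 100.0]]
updates = honest + poisoned

mean_update = aggregate(updates, mean)      # dragged toward the outlier
median_update = aggregate(updates, median)  # stays with the honest majority

print("mean:  ", mean_update)
print("median:", median_update)
```

The median tolerates a minority of arbitrary updates per coordinate, but it gives up some statistical efficiency when all clients are honest. That trade-off is one reason anomaly detection is often layered on top.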

Delayed and sparse feedback (clicks/conversions) in advertising complicates credit assignment and timely model updates, degrading learning unless specific methods for delayed/sparse signals are used.

Analytical discussion of learning dynamics with delayed/sparse labels; conceptual solutions suggested (credit assignment methods). No large-scale empirical evaluation presented.

high negative Privacy-Aware AI Advertising Systems: A Federated Learning F... learning efficacy under delayed/sparse feedback (convergence, time-to-adapt), at...

Non-IID and heterogeneous data distributions across devices and publishers impair convergence and degrade personalization unless addressed with algorithmic adaptations.

Analytical modeling of convergence under non-IID conditions; threat/robustness discussion; prototype/simulation illustrations. This claim is supported by established literature and the paper's analytic treatment.

high negative Privacy-Aware AI Advertising Systems: A Federated Learning F... convergence behavior (rate, stability), personalization performance (accuracy on...

VIS inherits the limitations of input–output assumptions (fixed coefficients, no price feedbacks); AI-driven structural change may violate those assumptions, so dynamic extensions or calibration are needed.

Paper explicitly cautions about input–output model limitations and the need for dynamic extensions/calibration under structural/technological change.

high negative Measuring labor productivity dynamics in U.S. industrial and... validity/applicability of VIS estimates under structural/AI-driven change

ALE is organized around a task taxonomy with 55 subfields grouped into 13 industry clusters covering 1K+ tasks.

Author-provided counts describing the benchmark taxonomy and task pool.

high neutral Agents' Last Exam taxonomy breadth (subfields, clusters, number of tasks)

ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy).

Design specification described in the paper referencing O*NET / SOC 2018.

high neutral Agents' Last Exam scope of industries covered by the benchmark

Agentic AI is best characterized as a continuum of autonomy and delegated authority, distinct from purely informational outputs and including systems capable of independently generating insured events through external actions.

Conceptual taxonomy and definitional argument presented in the paper distinguishing informational models from agentic systems with delegated authority; theoretical reasoning and classification.

high neutral Insurance of Agentic AI characterization of agentic AI along autonomy/delegation continuum

The results define three operating regimes.

Summary claim in results/conclusions indicating categorization of outcomes into three regimes.

high neutral Cross-domain benchmarks reveal when coordinated AI agents im... classification into operating regimes

We show that ρ ≥ 1 is the no-excess-crowding parity condition and connect Δ to an adoption game with exposure-dependent redundancy costs.

Theoretical result derived in the paper linking the human-relative diversity ratio ρ to a parity condition and relating the excess-crowding coefficient Δ to an adoption-game model with exposure-dependent redundancy costs.

high neutral Ex Ante Evaluation of AI-Induced Idea Diversity Collapse parity condition for no-excess-crowding (ρ ≥ 1) and economic/game-theoretic rela...

We position DAO-governed decentralized physical infrastructure networks (DePIN) within a vertically integrated stack that links energy and sensing to connectivity, storage/compute, models, and robots.

Architectural/framework description in the paper that maps DePIN elements into a vertically integrated stack; conceptual/mapping method without empirical measurement.

high neutral DAO-enabled decentralized physical AI: A new paradigm for hu... conceptual integration of DePIN components into a vertical infrastructure stack

Weight-based memory generalizes by applying abstract rules to inputs never seen before.

Conceptual claim grounded in the paper's theoretical distinction between weight-based learning and retrieval; references Complementary Learning Systems theory; no empirical sample in abstract.

high neutral Contextual Agentic Memory is a Memo, Not True Memory type of generalization performed by weight-based memory

Retrieval generalizes by similarity to stored cases.

Conceptual claim stated in paper (distinction between retrieval-based and weight-based generalization); supported by theoretical characterization, not empirical data in abstract.

high neutral Contextual Agentic Memory is a Memo, Not True Memory type of generalization performed by retrieval systems
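
The contrast between the two kinds of generalization can be illustrated with a toy example. Assume stored cases that follow y = 2x on a limited range. A retrieval-style answer returns the output of the nearest stored case, while a weight-style answer applies a fitted rule. The functions `retrieve` and `fit_rule` are illustrative stand-ins, not the paper's systems.

```python
# Stored cases follow the rule y = 2x, but only for x in 0..5.
cases = [(float(x), 2.0 * x) for x in range(6)]

def retrieve(query):
    """Retrieval-style answer: output of the most similar stored case."""
    nearest_x, nearest_y = min(cases, key=lambda case: abs(case[0] - query))
    return nearest_y

def fit_rule(data):
    """Weight-style answer: least-squares slope through the origin."""
    slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda query: slope * query

rule = fit_rule(cases)
print(retrieve(3.0), rule(3.0))    # inside the stored range: both agree
print(retrieve(10.0), rule(10.0))  # outside it: retrieval stays at the nearest case
```

Inside the stored range the two agree. For an input far outside it, retrieval can only echo the closest case, whereas the fitted rule extrapolates.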

Generally speaking, these systems place an agent in a feedback loop in which it can write code, compile that code to an assembly of CAD model(s), visualize the model, and then iteratively refine its code based on visual and other feedback.

Descriptive claim about the general architecture of Agent-Aided Design systems as asserted by the authors (methodological description), not an empirical test; no quantitative evaluation provided here.

high neutral Agent-Aided Design for Dynamic CAD Models system architecture / iterative design loop (agent writes code, compiles, visual...

Predictive outputs are translated into allocation rules, with emphasis on mean–variance optimization, shrinkage-based risk estimation, risk parity, hierarchical allocation, and reinforcement-learning-based dynamic rebalancing.

Surveyed literature on portfolio construction and allocation techniques described in the review (methodological overview; no single empirical dataset or sample size).

high neutral Artificial Intelligence in Financial Decision-Making methods for converting predictions into portfolio allocation rules
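
As one concrete case of turning risk estimates into allocation rules, the sketch below computes inverse-volatility weights, a common simplified stand-in for risk parity. It ignores correlations; the weights equalise risk contributions only when all pairwise correlations are equal. The volatilities are hypothetical, and the review does not prescribe this particular formula.

```python
def inverse_volatility_weights(vols):
    """Weights proportional to 1/volatility, normalised to sum to 1."""
    inverse = [1.0 / v for v in vols]
    total = sum(inverse)
    return [w / total for w in inverse]

vols = [0.10, 0.20, 0.40]  # hypothetical annualised volatilities
weights = inverse_volatility_weights(vols)

# Stand-alone risk per asset (weight times volatility) is equal by construction.
standalone_risk = [w * v for w, v in zip(weights, vols)]

print([round(w, 4) for w in weights])
print([round(r, 4) for r in standalone_risk])
```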

The economic model for IASCA follows the FDA's PDUFA precedent, with progressive certification fees representing 0.1-1% of model training costs.

Proposal specifies that IASCA's funding would mirror the FDA PDUFA model and states a fee range of 0.1–1% of model training costs; this is an asserted financing mechanism, not empirically validated in the excerpt.

high neutral IASCA: The International AI Safety Certification Authority —... progressive certification fees equal to 0.1-1% of model training costs
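
For a sense of scale, the stated 0.1-1% range can be applied to a training budget. The $50M figure below is purely hypothetical. The excerpt gives no schedule for how fees progress within the range, so the sketch computes only the bounds.

```python
def certification_fee_range(training_cost):
    """Lower and upper fee bounds implied by 0.1% and 1% of training cost."""
    return training_cost / 1000, training_cost / 100

# Hypothetical training cost in dollars; not a figure from the proposal.
low, high = certification_fee_range(50_000_000)
print(f"fee between ${low:,.0f} and ${high:,.0f}")
```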

IASCA is modelled after existing international and national regulatory bodies such as the IAEA, FAA, and FDA.

Proposal explicitly states IASCA is modelled after the IAEA, FAA, and FDA; this is an analogy/organizational design claim rather than an empirical finding.

high neutral IASCA: The International AI Safety Certification Authority —... institutional design modeled on IAEA/FAA/FDA

The framework is calibrated with O*NET task data, a survey of 3,778 domain experts, and GPT-4o-derived task decompositions, and implemented in computer vision.

Calibration and empirical implementation using O*NET, a domain expert survey (n=3,778), and GPT-4o task decompositions; applied to computer vision tasks.

high neutral Economics of Human and AI Collaboration: When is Partial Aut... validity of calibration / empirical grounding of the framework

We introduce an entropy-based measure of task complexity that maps model accuracy into a labor substitution ratio, quantifying human labor displacement at each accuracy level.

New metric proposed in the paper (entropy-based task complexity) and mapping procedure from accuracy to substitution ratio; implemented in the framework.

high neutral Economics of Human and AI Collaboration: When is Partial Aut... labor substitution ratio (human labor displaced per unit accuracy)

Costinot and Werning (2023) develop a sufficient-statistic approach and find optimal technology taxes of 1–3.7% on robots.

Citation reported in the paper summarizing Costinot and Werning (2023)'s quantitative sufficient-statistic estimate.

high neutral NBER WORKING PAPER SERIES optimal robot tax rate

Guerreiro et al. (2022) characterize the optimal Mirrleesian tax system with automation and find that robot taxes should be transitional: high when incumbent workers cannot retrain, converging to zero as new cohorts adjust skill investments.

Citation reported in the paper summarizing Guerreiro et al. (2022)'s theoretical result on transitional robot taxes.

high neutral NBER WORKING PAPER SERIES optimal robot tax path over time

If labor becomes economically redundant, the policy focus shifts from steering innovation to redesigning public finance and redistribution (e.g., new tax instruments, redistribution mechanisms).

Theoretical scenario analysis in the paper with references to related works (Korinek and Juelfs 2024; Korinek and Lockwood 2026).

high neutral NBER WORKING PAPER SERIES policy priority shift (steering -> public finance/redistribution)

The paper treats data as a new type of production factor and endogenizes it within the production function.

Theoretical/methodological: the paper constructs a macro-level theoretical model that explicitly includes data as an endogenous input in the production function (no empirical/sample data).

high neutral Study on the impact of big data sharing on individuals’ welf... inclusion of data as a production factor (model specification)

The paper's formalism shows that prompt/system messages shape distributions over possible execution paths (indirect control) but do not evaluate actual partial paths at runtime.

Formal mapping in the paper that treats prompts as shaping prior over paths; conceptual argument and illustrative examples.

high neutral Runtime Governance for AI Agents: Policies on Paths degree of control over execution path (distributional shaping vs. path-specific ...
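
The distinction can be sketched as a guard that evaluates each partial path before the next step runs, which a prompt alone cannot do. Everything here is hypothetical: the action names, the example policy, and the `run_with_guard` helper are illustrative and not taken from the paper's formalism.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Policy = Callable[[Sequence[str]], bool]

def no_send_after_read_secret(path: Sequence[str]) -> bool:
    """Hypothetical policy: block 'send_email' once 'read_secret' has occurred."""
    return not ("read_secret" in path[:-1] and path[-1] == "send_email")

def run_with_guard(
    proposed_steps: Sequence[str], policy: Policy
) -> Tuple[List[str], Optional[str]]:
    """Walk the steps in order, checking the policy on each partial path."""
    path: List[str] = []
    for step in proposed_steps:
        candidate = path + [step]
        if not policy(candidate):
            return path, step  # executed prefix and the blocked step
        path = candidate
    return path, None

print(run_with_guard(["search", "read_secret", "send_email"], no_send_after_read_secret))
print(run_with_guard(["send_email", "read_secret"], no_send_after_read_secret))
```

The same action is allowed or blocked depending on what preceded it, which is the kind of path-specific decision a runtime check can make and a fixed prompt cannot.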

Retrieval augmentation and scientist persona prompting yield only marginal gains.

Ablation/augmentation experiments comparing baseline LLM outputs to versions augmented with retrieval or scientist-persona prompting, showing only small improvements in judged quality.

high null result Contemporary AI lacks the imagination to diverge or negate i... change in judged quality due to retrieval augmentation or persona prompting

6,749 scientists returned 25,139 sets of ratings on novelty, empirical feasibility, probability of being true, and favorability of adoption.

Reported study participation and rating counts: 6,749 respondents providing 25,139 rating sets on specified dimensions.

high null result Contemporary AI lacks the imagination to diverge or negate i... number of respondents and rating sets

We invited authors of 121,640 recent preprints across biology, medicine, chemistry, and the social sciences to judge follow-up ideas that large language models (LLMs) generated from the context and puzzles of their own papers.

Study recruitment described in paper: invitations sent to authors of 121,640 recent preprints across multiple fields (biology, medicine, chemistry, social sciences).

high null result Contemporary AI lacks the imagination to diverge or negate i... number of invited authors (study recruitment)

The model frames near-complete AGI substitution not merely as an efficiency transition but as a boundary case for value production under a strict political-economy theory of value.

Interpretive conclusion drawn from the theoretical model and its limiting-case implications (conceptual/theoretical claim; no empirical sample).

high null result AGI and the Limits of Value Production characterization of economic transition

Under the paper's core value-theoretic assumption, AGI transfers value but does not itself create new value.

Explicit model assumption / value-theoretic premise stated in the paper (theoretical assumption, no empirical backing).

high null result AGI and the Limits of Value Production value_creation
