Evidence (4114 claims)
Adoption (8570 claims)
Productivity (7631 claims)
Governance (6869 claims)
Human-AI Collaboration (6491 claims)
Org Design (4175 claims)
Innovation (4114 claims)
Labor Markets (3566 claims)
Skills & Training (2966 claims)
Inequality (2066 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Innovation
Data security, privacy risks, unequal gains, and regulatory shortfalls can undermine the benefits of AI/robotics adoption.
Policy and risk analyses from secondary literature, case studies, and institutional reports synthesized in the paper; examples cited but no original incident-level dataset or incidence rates provided.
Transition frictions and skills mismatches are important barriers to workers moving into newly created AI‑related roles.
Qualitative review of workforce and skills literature, case studies, and sector reports; evidence comes from secondary sources with varied methodologies; the paper does not report pooled quantitative estimates.
Limited access to capital, data, digital infrastructure, and skills, together with insecure land tenure, reduces adoption rates for advanced innovations among smallholders.
Multiple empirical studies and program evaluations synthesized in the review documenting adoption barriers; policy review identifying structural constraints across regions.
Key failure modes for AI in drug R&D include overfitting, poor generalizability, dataset bias, insufficient external validation, and misalignment with evolving regulatory expectations.
Synthesis of literature and case reports in the narrative review describing observed failures and risks across projects (qualitative evidence).
Absent rigorous controls (validation, applicability-domain reporting, attention to dataset bias), AI models risk overfitting, inequitable outcomes, and regulatory friction, all of which can undermine economic benefits.
Theoretical arguments plus case reports and literature cited in the review documenting instances and mechanisms of overfitting, dataset bias, and regulatory challenges; narrative summary rather than systematic quantification.
Generative AI is susceptible to social and representational biases and to factual errors or hallucinations; it lacks tacit, contextual domain expertise.
Documented examples in the literature of biased outputs and hallucinations; controlled evaluations and audits of model outputs; qualitative reports highlighting lack of tacit knowledge in domain-specific tasks.
The quality of AI-generated outputs is highly variable; models frequently produce mediocre but plausible-sounding content that requires human filtering.
Multiple user studies and qualitative reports documenting variability in output quality and the need for human curation; outcome measures include error rates, user-rated quality, and time spent vetting.
Algorithmic bias, unequal digital financial literacy, caregiving time constraints, and limited access to personalized solutions can sustain or reproduce gender investment gaps if not addressed.
Synthesis of literature on barriers to financial inclusion and AI fairness concerns, plus platform report observations (review of empirical and conceptual studies; not a single empirical test).
In some settings, women show statistically greater risk aversion than men.
Summary of empirical survey and experimental studies on gender differences in risk attitudes discussed in the review (multiple cross‑sectional and lab/field experiments referenced).
Using cloud services and security-as-a-service (SECaaS) raises data-privacy and cross-border data-transfer issues that complicate legal compliance for firms.
Regulatory analyses and compliance reports; documented examples in case studies and industry guidance on cross-border data flows.
The cloud shared responsibility model creates potential ambiguities in liability between providers and customers.
Regulatory guidance, legal analyses, and documented post-incident case studies showing confusion over responsibilities.
China manages the openness–security trade-off through a centralized, developmentalist, techno‑sovereignty approach that privileges coordinated state direction and control.
Qualitative content analysis of national‑level policy texts: 18 Chinese policy documents coded across four analytical dimensions (coordination objectives, institutional actors, governance mechanisms, stakeholder legitimacy).
There is substantial uncertainty in economic forecasts due to possible scale-up failures, regulatory constraints, feedstock price volatility, and path‑dependent lock‑in effects.
Synthesis of technical failure modes, regulatory uncertainty, and sensitivity analyses reported in the techno-economic analysis and life-cycle assessment (TEA/LCA) literature and in the economic modeling sections of the review.
Regulatory and biosafety concerns (including environmental release risks and dual‑use issues) increase fixed costs and create entry barriers that shape industry structure and diffusion.
Policy and governance literature reviewed alongside technical case studies; citations of regulatory requirements, biosafety frameworks, and examples of compliance costs affecting project viability.
Engineering and economic challenges—scale‑up hurdles, process robustness, feedstock cost, and downstream purification—limit industrial deployment of many bio-based processes.
Case study TEA/LCA summaries and process reports in the review highlighting scale-up failures or increased costs at larger scales, purification complexity for low‑concentration products, and sensitivity to feedstock prices.
Technical biological limitations—metabolic burden, pathway crosstalk, byproduct formation, and genetic instability—remain major constraints on strain performance and scalability.
Multiple experimental reports and method papers cited in the review documenting decreased growth/productivity due to engineered pathway burden, unintended interactions between pathways, accumulation of byproducts, and genetic mutations during production runs.
The described pipeline is cross-sectional as presented and should be extended to dynamic models (temporal embeddings, change-point detection) for trend or causal analyses.
Method description in summary indicates cross-sectional pipeline; recommendation to extend for temporal/dynamic modeling when analyzing trends or causal effects.
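As a minimal sketch of the dynamic extension the recommendation points to, the code below locates a single change point in a time series (for example, the monthly share of documents in one cluster) by choosing the split that minimizes total within-segment squared error. It assumes one mean shift and uses hypothetical data; it is not the pipeline's method, and real trend analyses would use multi-change-point detection or temporal embeddings.

```python
from typing import Sequence


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_change_point(series: Sequence[float], min_size: int = 2) -> int:
    """Return the index where the second segment starts, chosen to minimize
    the combined squared error of the two segments (single mean-shift model)."""
    if len(series) < 2 * min_size:
        raise ValueError("series too short for the requested minimum segment size")
    best_idx, best_cost = min_size, float("inf")
    for idx in range(min_size, len(series) - min_size + 1):
        cost = sse(series[:idx]) + sse(series[idx:])
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx


# Hypothetical monthly share of documents assigned to one topic cluster.
shares = [0.10, 0.12, 0.11, 0.09, 0.10, 0.31, 0.29, 0.30, 0.32, 0.28]
print(single_change_point(shares))  # → 5
```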
LLMs and corpora may reflect disciplinary, geographic, or language biases; analyses should adjust or stratify accordingly.
Caveat explicitly stated in summary noting potential biases in LLMs and corpora; recommendation to adjust/stratify analyses.
Cluster reliability should be validated (e.g., bootstrap, perturbations) and automatic labels complemented with expert human validation for critical analyses.
Caveat and recommended validation steps provided in summary; suggests bootstrap/perturbation and manual validation as best practices. No empirical stability metrics provided in summary.
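The bootstrap check recommended above can be sketched as follows: cluster the full data once, refit on bootstrap resamples, apply each refit back to all points, and average the agreement with the reference labeling. Plain k-means stands in for whatever clustering the pipeline actually uses (an assumption), and agreement is measured with the unadjusted Rand index for brevity; the adjusted Rand index is usually preferred in practice.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, rng: np.random.Generator, iters: int = 50) -> np.ndarray:
    """Plain Lloyd's k-means; returns the fitted cluster centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(axis=0)
    return centers


def assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Label each row of X with the index of its nearest center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def rand_index(a: np.ndarray, b: np.ndarray) -> float:
    """Share of point pairs on which two labelings agree (same vs. different cluster)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float((same_a[upper] == same_b[upper]).mean())


def bootstrap_stability(X: np.ndarray, k: int, n_boot: int = 20, seed: int = 0) -> float:
    """Mean Rand index between the full-data clustering and clusterings
    refit on bootstrap resamples (then applied back to all points)."""
    rng = np.random.default_rng(seed)
    reference = assign(X, kmeans(X, k, rng))
    scores = []
    for _ in range(n_boot):
        idx = rng.choice(len(X), size=len(X), replace=True)
        boot_labels = assign(X, kmeans(X[idx], k, rng))
        scores.append(rand_index(reference, boot_labels))
    return float(np.mean(scores))


# Hypothetical 2-D embeddings forming two well-separated groups.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 0.5, (50, 2)), data_rng.normal(10, 0.5, (50, 2))])
print(bootstrap_stability(X, k=2))  # well-separated groups should score near 1.0
```

Scores well below 1 flag clusters whose membership depends on which documents happen to be sampled, which is where expert human validation is most needed.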
Results are sensitive to model and prompt choice; researchers should perform robustness checks across LLMs, soft prompts, and embedding models.
Caveat explicitly stated in the paper summary noting model and prompt sensitivity; recommended validation steps include robustness checks across models and prompts.
Higher complaint volume is significantly associated with near-term stock price declines.
Fixed-effects panel path models estimated on monthly data for 261 financial firms (2018–2023) report statistically significant negative associations between firm–month complaint volume and subsequent abnormal returns.
Consumer complaints—measured by monthly volume, topic composition, and VADER sentiment of complaint narratives—contain behavioral signals that predict short-term abnormal stock returns in U.S. financial firms.
CFPB complaint records matched to 261 publicly traded U.S. financial firms (monthly observations, 2018–2023); analyses use fixed-effects panel path models to link firm–month complaint features (volume, LDA topic prevalences, aggregated VADER sentiment) to firm-level abnormal returns; complementary machine-learning models evaluate out-of-sample predictive performance.
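The firm fixed-effects idea behind these estimates can be sketched with the within transformation: demean complaint volume and next-month returns inside each firm, then take the OLS slope of one on the other. This is a simplified single-regressor illustration on made-up numbers; it does not reproduce the paper's path models, LDA topic features, VADER sentiment aggregation, or its abnormal-return construction. The lag (complaints in month t, returns in month t+1) is assumed to be applied before calling the estimator.

```python
import numpy as np


def within_slope(firm_ids, x, y) -> float:
    """OLS slope of y on x after removing firm-specific means (firm fixed effects)."""
    firm_ids = np.asarray(firm_ids)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dm = np.empty_like(x)
    y_dm = np.empty_like(y)
    for firm in np.unique(firm_ids):
        mask = firm_ids == firm
        x_dm[mask] = x[mask] - x[mask].mean()
        y_dm[mask] = y[mask] - y[mask].mean()
    return float(x_dm @ y_dm / (x_dm @ x_dm))


# Hypothetical firm-month panel: complaint volume in month t paired with the
# firm's abnormal return in month t+1.
firms = ["A", "A", "A", "B", "B", "B"]
complaints_t = [10, 20, 30, 5, 15, 25]
firm_level = {"A": 0.02, "B": -0.01}  # firm-specific return levels (the fixed effects)
abnormal_ret_t1 = [firm_level[f] - 0.001 * c for f, c in zip(firms, complaints_t)]

print(round(within_slope(firms, complaints_t, abnormal_ret_t1), 6))  # → -0.001
```

Because each firm's mean is removed, shifting one firm's return level leaves the slope unchanged; only within-firm variation in complaints identifies the association.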
Federated infrastructures introduce adversarial risks (model/data poisoning, inference attacks on updates) that require robust aggregation, anomaly detection, and other defenses.
Threat modeling and taxonomy of adversarial and privacy threats with mapped mitigations (robust aggregation, anomaly detection, and differential privacy). Evidence is conceptual and based on standard threat frameworks; no empirical attack or defense experiments are reported at scale.
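To make "robust aggregation" concrete, the sketch below compares plain averaging of client updates with two standard robust alternatives, the coordinate-wise median and the coordinate-wise trimmed mean, when one client submits a poisoned update. These are generic defenses from the robust-aggregation literature, chosen here for illustration; the paper's specific choice of defense is not stated, and the update values are hypothetical.

```python
import numpy as np


def mean_aggregate(updates: np.ndarray) -> np.ndarray:
    """Plain federated averaging of client updates (rows = clients)."""
    return updates.mean(axis=0)


def coordinate_median(updates: np.ndarray) -> np.ndarray:
    """Coordinate-wise median: each parameter takes the median across clients."""
    return np.median(updates, axis=0)


def trimmed_mean(updates: np.ndarray, trim: int) -> np.ndarray:
    """Coordinate-wise trimmed mean: drop the `trim` largest and smallest values per parameter."""
    if 2 * trim >= updates.shape[0]:
        raise ValueError("trim too large for the number of clients")
    ordered = np.sort(updates, axis=0)
    return ordered[trim: updates.shape[0] - trim].mean(axis=0)


# Four honest clients send similar updates; one poisoned client sends an extreme one.
updates = np.array([
    [1.0, -0.5],
    [1.1, -0.4],
    [0.9, -0.6],
    [1.0, -0.5],
    [50.0, 40.0],  # poisoned
])
print(mean_aggregate(updates))     # pulled far toward the poisoned update
print(coordinate_median(updates))  # median per coordinate: 1.0 and -0.5
print(trimmed_mean(updates, 1))    # close to the honest clients' values
```

The median tolerates a minority of arbitrary updates at the cost of some statistical efficiency; the trimmed mean needs the number of attackers to be at most `trim`.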
Delayed and sparse feedback (clicks/conversions) in advertising complicates credit assignment and timely model updates, degrading learning unless specific methods for delayed/sparse signals are used.
Analytical discussion of learning dynamics with delayed/sparse labels; conceptual solutions suggested (credit assignment methods). No large-scale empirical evaluation presented.
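One common baseline for delayed conversions, assumed here for illustration (the paper discusses credit-assignment methods only conceptually), is a fixed attribution window: a click becomes a positive as soon as a conversion arrives within the window, becomes a negative only after the window has fully elapsed, and is otherwise held back rather than trained on as a premature negative. The `Impression` type and `split_by_maturity` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Impression:
    click_time: float                 # when the ad was clicked
    conversion_time: Optional[float]  # None if no conversion has been observed yet


def split_by_maturity(
    impressions: List[Impression], now: float, window: float
) -> Tuple[List[Tuple[Impression, int]], List[Impression]]:
    """Split clicks into labeled training examples and still-pending ones
    under a fixed attribution window."""
    labeled: List[Tuple[Impression, int]] = []
    pending: List[Impression] = []
    for imp in impressions:
        converted = (
            imp.conversion_time is not None
            and imp.conversion_time - imp.click_time <= window
        )
        if converted:
            labeled.append((imp, 1))  # positive as soon as it is observed
        elif now - imp.click_time >= window:
            labeled.append((imp, 0))  # window elapsed without a conversion
        else:
            pending.append(imp)       # label still unknown: do not train yet
    return labeled, pending


events = [
    Impression(click_time=1.0, conversion_time=2.0),   # converted quickly
    Impression(click_time=2.0, conversion_time=None),  # window elapsed, no conversion
    Impression(click_time=9.0, conversion_time=None),  # too recent to judge
]
labeled, pending = split_by_maturity(events, now=10.0, window=3.0)
print([label for _, label in labeled], len(pending))  # → [1, 0] 1
```

The window length embodies the freshness/accuracy trade-off the claim describes: a short window updates the model sooner but mislabels late converters as negatives.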
Non-IID and heterogeneous data distributions across devices and publishers impair convergence and degrade personalization unless addressed with algorithmic adaptations.
Analytical modeling of convergence under non-IID conditions; threat/robustness discussion; prototype/simulation illustrations. This claim is supported by established literature and the paper's analytic treatment.
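The convergence problem under non-IID data can be illustrated with a toy FedAvg run on one-dimensional quadratic client losses a_i (w − c_i)², a simplifying assumption rather than the paper's analysis. With one local step per round, FedAvg is ordinary gradient descent on the average loss and reaches the global optimum; with several local steps and heterogeneous curvature, client drift moves the fixed point away from it.

```python
from typing import Sequence


def fedavg_fixed_point(a: Sequence[float], c: Sequence[float], lr: float,
                       local_steps: int, rounds: int = 2000) -> float:
    """Run FedAvg on 1-D quadratic client losses a_i * (w - c_i)**2 and return the final w.

    Each round, every client starts from the global w, takes `local_steps`
    gradient-descent steps on its own loss, and the server averages the
    results with equal client weights."""
    w = 0.0
    for _ in range(rounds):
        local_models = []
        for a_i, c_i in zip(a, c):
            w_i = w
            for _ in range(local_steps):
                w_i -= lr * 2 * a_i * (w_i - c_i)  # gradient of a_i * (w - c_i)^2
            local_models.append(w_i)
        w = sum(local_models) / len(local_models)
    return w


a = [1.0, 10.0]  # heterogeneous curvature across clients
c = [0.0, 1.0]   # heterogeneous client optima
global_opt = sum(ai * ci for ai, ci in zip(a, c)) / sum(a)  # minimizer of the summed loss

print(round(global_opt, 3))                                         # → 0.909
print(round(fedavg_fixed_point(a, c, lr=0.04, local_steps=1), 3))   # → 0.909
print(round(fedavg_fixed_point(a, c, lr=0.04, local_steps=10), 3))  # drifts to about 0.64
```

If the clients share the same curvature, the drift disappears even with many local steps, which is why algorithmic adaptations target heterogeneity specifically.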
VIS inherits the limitations of input–output assumptions (fixed coefficients, no price feedbacks); AI-driven structural change may violate those assumptions, so dynamic extensions or calibration are needed.
Paper explicitly cautions about input–output model limitations and the need for dynamic extensions/calibration under structural/technological change.
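The fixed-coefficient limitation is easiest to see in the standard Leontief quantity model, x = (I − A)⁻¹ d. In the sketch below, which uses a hypothetical two-sector coefficient matrix rather than the paper's data, output responds linearly to final demand and the coefficients never change on their own, so price feedbacks and AI-driven shifts in input requirements enter only if A is recalibrated by hand.

```python
import numpy as np

# Hypothetical technical-coefficient matrix: A[i, j] is the input from sector i
# required per unit of output of sector j. Illustrative numbers only.
A = np.array([[0.2, 0.3],
              [0.1, 0.4]])
d = np.array([100.0, 50.0])  # final demand by sector


def gross_output(A: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Solve (I - A) x = d for gross output x (Leontief quantity model)."""
    return np.linalg.solve(np.eye(A.shape[0]) - A, d)


x = gross_output(A, d)
print(np.round(x, 2))  # → [166.67 111.11]

# Fixed coefficients imply strict linearity: doubling demand doubles every output.
assert np.allclose(gross_output(A, 2 * d), 2 * x)

# A structural change (e.g., automation halving sector 1's use of sector 2's
# input) has to be imposed by editing A; the model does not generate it.
A_shifted = A.copy()
A_shifted[1, 0] = 0.05
print(np.round(gross_output(A_shifted, d), 2))
```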
ALE is organized around a task taxonomy of 55 subfields grouped into 13 industry clusters, covering more than 1,000 tasks.
Author-provided counts describing the benchmark taxonomy and task pool.
ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy).
Design specification described in the paper referencing O*NET / SOC 2018.
Agentic AI is best characterized as a continuum of autonomy and delegated authority, distinct from purely informational outputs and including systems capable of independently generating insured events through external actions.
Conceptual taxonomy and definitional argument presented in the paper distinguishing informational models from agentic systems with delegated authority; theoretical reasoning and classification.
The results define three operating regimes.
Summary claim in results/conclusions indicating categorization of outcomes into three regimes.
We show that ρ ≥ 1 is the no-excess-crowding parity condition and connect Δ to an adoption game with exposure-dependent redundancy costs.
Theoretical result derived in the paper linking the human-relative diversity ratio ρ to a parity condition and relating the excess-crowding coefficient Δ to an adoption-game model with exposure-dependent redundancy costs.
We position DAO-governed decentralized physical infrastructure networks (DePIN) within a vertically integrated stack that links energy and sensing to connectivity, storage/compute, models, and robots.
Architectural/framework description in the paper that maps DePIN elements into a vertically integrated stack; conceptual/mapping method without empirical measurement.
Weight-based memory generalizes by applying abstract rules to inputs never seen before.
Conceptual claim grounded in the paper's theoretical distinction between weight-based learning and retrieval; references Complementary Learning Systems theory; no empirical sample in abstract.
Retrieval generalizes by similarity to stored cases.
Conceptual claim stated in paper (distinction between retrieval-based and weight-based generalization); supported by theoretical characterization, not empirical data in abstract.
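The contrast between the two kinds of generalization can be illustrated with a toy example: stored cases follow the rule y = 2x for x in 0..4, and the query x = 10 lies outside that range. A nearest-neighbor lookup stands in for retrieval and a least-squares line stands in for weight-based learning; both stand-ins are assumptions for illustration, not the paper's model, and real weight-based systems do not always recover the underlying rule.

```python
from typing import List, Tuple

# Stored cases follow the rule y = 2x, but only for x in 0..4.
cases: List[Tuple[float, float]] = [(float(x), 2.0 * x) for x in range(5)]


def retrieve(query: float, memory: List[Tuple[float, float]]) -> float:
    """Retrieval-style prediction: return the output of the most similar stored case."""
    _, nearest_y = min(memory, key=lambda case: abs(case[0] - query))
    return nearest_y


def fit_rule(memory: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Weight-style learning: compress the cases into a least-squares slope and intercept."""
    n = len(memory)
    mean_x = sum(x for x, _ in memory) / n
    mean_y = sum(y for _, y in memory) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in memory)
    var = sum((x - mean_x) ** 2 for x, _ in memory)
    slope = cov / var
    return slope, mean_y - slope * mean_x


slope, intercept = fit_rule(cases)
query = 10.0  # far outside the stored range
print(retrieve(query, cases))     # → 8.0 (copies the closest stored case, x = 4)
print(slope * query + intercept)  # → 20.0 (applies the learned rule)
```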
Agent-Aided Design systems generally place an agent in a feedback loop in which it can write code, compile that code into an assembly of one or more CAD models, visualize the model, and then iteratively refine its code based on visual and other feedback.
Descriptive claim about the general architecture of Agent-Aided Design systems as asserted by the authors (methodological description), not an empirical test; no quantitative evaluation provided here.
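The write, compile, inspect, refine cycle described above can be sketched as a control loop. Every function here (`write_code`, `compile_to_model`, `critique`) is a hypothetical stand-in: a real system would call an LLM to write a CAD script, a CAD kernel to build geometry, and a renderer plus vision model to produce feedback. The toy "model" is a single box width so that the loop runs end to end.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    width_mm: float  # toy stand-in for a CAD model: one box dimension


def write_code(width_mm: float) -> str:
    """Hypothetical stand-in for the agent emitting CAD script source."""
    return f"box(width={width_mm})"


def compile_to_model(code: str) -> Model:
    """Hypothetical stand-in for compiling the script into a CAD model."""
    width = float(code.split("width=")[1].rstrip(")"))
    return Model(width_mm=width)


def critique(model: Model, target_mm: float, tol_mm: float) -> Optional[float]:
    """Hypothetical stand-in for rendering plus visual feedback:
    returns the signed error, or None if the design is acceptable."""
    error = target_mm - model.width_mm
    return None if abs(error) <= tol_mm else error


def design_loop(target_mm: float, tol_mm: float = 0.5, max_iters: int = 20) -> Model:
    """Iterate until feedback accepts the design or the budget runs out."""
    width = 10.0  # initial guess
    for _ in range(max_iters):
        model = compile_to_model(write_code(width))
        feedback = critique(model, target_mm, tol_mm)
        if feedback is None:
            return model
        width += 0.5 * feedback  # refine the code's parameter using the feedback
    raise RuntimeError("no acceptable design within the iteration budget")


print(design_loop(target_mm=42.0))  # → Model(width_mm=41.5)
```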
Predictive outputs are translated into allocation rules, with emphasis on mean–variance optimization, shrinkage-based risk estimation, risk parity, hierarchical allocation, and reinforcement-learning-based dynamic rebalancing.
Surveyed literature on portfolio construction and allocation techniques described in the review (methodological overview; no single empirical dataset or sample size).
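Three of the allocation rules listed above can be sketched in a few lines: covariance shrinkage toward a scaled identity, the minimum-variance portfolio (the special case of mean–variance optimization that ignores expected returns), and inverse-volatility weighting as a naive form of risk parity. The fixed shrinkage intensity and the simulated returns are assumptions for illustration; Ledoit-Wolf-type estimators choose the intensity from the data, and hierarchical or reinforcement-learning-based methods are not shown.

```python
import numpy as np


def shrink_covariance(returns: np.ndarray, delta: float) -> np.ndarray:
    """Blend the sample covariance with a scaled-identity target using a fixed intensity delta."""
    sample = np.cov(returns, rowvar=False)
    target = np.eye(sample.shape[0]) * np.trace(sample) / sample.shape[0]
    return (1.0 - delta) * sample + delta * target


def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Fully invested minimum-variance portfolio: w proportional to inv(cov) @ 1."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()


def inverse_volatility_weights(cov: np.ndarray) -> np.ndarray:
    """Naive risk parity: weights proportional to 1 / volatility
    (ignores correlations, unlike full equal-risk-contribution schemes)."""
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    return inv_vol / inv_vol.sum()


rng = np.random.default_rng(0)
# Hypothetical daily returns for three assets with different volatilities.
returns = rng.normal(0.0, [0.01, 0.02, 0.03], size=(250, 3))
cov = shrink_covariance(returns, delta=0.2)
print(np.round(min_variance_weights(cov), 3))
print(np.round(inverse_volatility_weights(cov), 3))
```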
The economic model for IASCA follows the FDA's Prescription Drug User Fee Act (PDUFA) precedent, with progressive certification fees representing 0.1–1% of model training costs.
Proposal specifies that IASCA's funding would mirror the FDA PDUFA model and states a fee range of 0.1–1% of model training costs; this is an asserted financing mechanism, not empirically validated in the excerpt.
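As a worked illustration of the stated fee range only (the progressive schedule itself is not specified here, and the $50 million training cost is hypothetical), a fee F set as a fraction f of training cost C_train gives:

```latex
\begin{aligned}
F &= f \, C_{\text{train}}, \qquad 0.001 \le f \le 0.01 \\
C_{\text{train}} = \$50\,\text{million} &\;\Rightarrow\; \$50{,}000 \le F \le \$500{,}000
\end{aligned}
```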
IASCA is modelled after existing international and national regulatory bodies such as the IAEA, FAA, and FDA.
Proposal explicitly states IASCA is modelled after the IAEA, FAA, and FDA; this is an analogy/organizational design claim rather than an empirical finding.
The framework is calibrated with O*NET task data, a survey of 3,778 domain experts, and GPT-4o-derived task decompositions, and is applied to computer vision tasks.
Calibration and empirical implementation using O*NET, a domain expert survey (n=3,778), and GPT-4o task decompositions; applied to computer vision tasks.
We introduce an entropy-based measure of task complexity that maps model accuracy into a labor substitution ratio, quantifying human labor displacement at each accuracy level.
New metric proposed in the paper (entropy-based task complexity) and mapping procedure from accuracy to substitution ratio; implemented in the framework.
Costinot and Werning (2023) develop a sufficient-statistic approach and find optimal technology taxes of 1–3.7% on robots.
Citation reported in the paper summarizing Costinot and Werning (2023)'s quantitative sufficient-statistic estimate.
Guerreiro et al. (2022) characterize the optimal Mirrleesian tax system with automation and find that robot taxes should be transitional: high when incumbent workers cannot retrain, converging to zero as new cohorts adjust their skill investments.
Citation reported in the paper summarizing Guerreiro et al. (2022)'s theoretical result on transitional robot taxes.
If labor becomes economically redundant, the policy focus shifts from steering innovation to redesigning public finance and redistribution (e.g., new tax instruments, redistribution mechanisms).
Theoretical scenario analysis in the paper with references to related works (Korinek and Juelfs 2024; Korinek and Lockwood 2026).
The paper treats data as a new type of production factor and endogenizes it within the production function.
Theoretical/methodological: the paper constructs a macro-level theoretical model that explicitly includes data as an endogenous input in the production function (no empirical/sample data).
The paper's formalism shows that prompt/system messages shape distributions over possible execution paths (indirect control) but do not evaluate actual partial paths at runtime.
Formal mapping in the paper that treats prompts as shaping prior over paths; conceptual argument and illustrative examples.
Retrieval augmentation and scientist persona prompting yield only marginal gains.
Ablation/augmentation experiments comparing baseline LLM outputs to versions augmented with retrieval or scientist-persona prompting, showing only small improvements in judged quality.
6,749 scientists returned 25,139 sets of ratings on novelty, empirical feasibility, probability of being true, and favorability of adoption.
Reported study participation and rating counts: 6,749 respondents providing 25,139 rating sets on specified dimensions.
We invited authors of 121,640 recent preprints across biology, medicine, chemistry, and the social sciences to judge follow-up ideas that large language models (LLMs) generated from the context and puzzles of their own papers.
Study recruitment described in paper: invitations sent to authors of 121,640 recent preprints across multiple fields (biology, medicine, chemistry, social sciences).
The model frames near-complete AGI substitution not merely as an efficiency transition but as a boundary case for value production under a strict political-economy theory of value.
Interpretive conclusion drawn from the theoretical model and its limiting-case implications (conceptual/theoretical claim; no empirical sample).
Under the paper's core value-theoretic assumption, AGI transfers value but does not itself create new value.
Explicit model assumption / value-theoretic premise stated in the paper (theoretical assumption, no empirical backing).