Evidence (4114 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
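The direction columns in the matrix do not always sum to the Total column. A minimal sketch of that check, using three rows copied from the table above, is shown here. The helper name `summarize` and the reading of the gap as claims without a coded direction are assumptions for illustration; the page does not explain the difference.

```python
# Counts copied from three rows of the Evidence Matrix above.
rows = {
    "Other": {"positive": 758, "negative": 199, "mixed": 100, "null": 900, "total": 2007},
    "Firm Productivity": {"positive": 435, "negative": 57, "mixed": 88, "null": 20, "total": 606},
    "Job Displacement": {"positive": 12, "negative": 80, "mixed": 20, "null": 1, "total": 113},
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def summarize(counts):
    """Return the directional sum, the gap to the listed total, and the positive share.

    'unassigned' is a hypothetical label for total minus the four direction
    counts; the source does not say what those remaining claims are.
    """
    directional = sum(counts[d] for d in DIRECTIONS)
    return {
        "directional": directional,
        "unassigned": counts["total"] - directional,
        "positive_share": counts["positive"] / directional,
    }


for outcome, counts in rows.items():
    s = summarize(counts)
    print(f"{outcome}: {s['positive_share']:.1%} positive, "
          f"{s['unassigned']} claims outside the four direction columns")
```

Computing shares against the directional sum rather than the listed total keeps the four percentages for a row adding up to 100%, whatever the remaining claims turn out to be.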
Innovation
AI is driving states to reconceive interdependence not as a source of peace but as a battlefield of power.
Normative and interpretive conclusion drawn from the paper's analysis of AI's geopolitical implications; no empirical data or sample reported in the abstract.
AI is redefining foreign policy in a multipolar world by making the line between economic cooperation and strategic vulnerability indistinct.
Theoretical claim and synthesis in the paper's thesis; no empirical evidence or sample size provided in the abstract.
AI is turning economic relationships between countries, previously a source of mutual benefit, into instruments of coercion.
The paper presents a theoretical analysis drawing on international political economy and foreign policy theory; no empirical measurements reported in the abstract.
AI enhances the weaponization of economic interdependence by enabling states to monitor, predict, manipulate, and disrupt transnational networks with unprecedented accuracy.
The paper advances a theoretical argument and synthesis of international political economy and foreign policy literatures; no empirical sample or quantitative data reported in the abstract.
Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations.
Paper asserts that existing/standard benchmarks do not adequately isolate parsing and computation-orchestration abilities, motivating the new benchmark.
Current session-based context handling (sessions ending, context windows filling, memory APIs returning flat facts) produces intelligence that is powerful per session but amnesiac across time.
Descriptive diagnostic argument in the paper; no empirical measurement reported in this text.
To protect its advantage, the US restricts mobility and knowledge flows and challenges regulatory efforts.
Descriptive claim about US strategy (policy observation stated in the paper's framing; not quantified in the excerpt).
The AI race amplifies security risks and international tensions.
Introductory/interpretive claim motivating the study (no specific empirical quantification provided in the excerpt).
The US and China form two poles around which global AI research increasingly revolves (i.e., global AI research is polarizing around these two countries).
Longitudinal network analysis of international collaboration and citation patterns derived from publication data compared to random realizations.
The US and China have long diverged in both cross-country collaboration and citation links, forming two poles around which global AI research increasingly revolves.
Large-scale data of scientific publications spanning three decades; analysis comparing cross-country collaboration and citation links to their random realizations (null models).
Under logit demand and symmetric rivals, the QoS gap is strictly decreasing in API price and rival entry elasticity.
Comparative statics derived from the analytical model (logit demand, symmetric rivals).
Traditional machine learning approaches, including the baseline methodology proposed in previous studies, typically optimize global predictive accuracy and therefore fail to capture business-critical outcomes, especially the identification of high-risk clients.
Conceptual critique and literature/contextual claim in the paper; contrasted with the study's business-aware methods (no direct external benchmarking numbers provided in the abstract).
Classifying customers without a prior history at a given company is particularly challenging due to the absence of historical behavior, extreme class imbalance, heavy-tailed loss distributions, and strict operational constraints.
Argumentation / problem statement in the paper (no empirical test reported); descriptive characterization of the insurance cold-start classification problem.
The pharmaceutical R&D process is persistently challenged by high financial costs, protracted timelines, and remarkably low success rates.
Background statement in the review synthesizing prior literature and field knowledge; no original empirical data or sample sizes reported in the provided text.
In the geographical network, both technological diversity and technological proximity inhibit main path formation, implying macro-regional evolution requires specialized focus and complementary knowledge.
ERGM results for the geographical diffusion layer showing negative (inhibitory) associations for diversity and proximity variables; interpreted in regional evolution context.
AI adoption is reinforcing existing structural disparities within the BRICS bloc, creating a two‑tier productivity hierarchy (China & India vs. Brazil, Russia & South Africa).
Observed divergence in TFP trajectories and differing links between AI indicators and TC/EC across the five BRICS economies; comparative analysis shows stronger frontier-shifting effects in China and India and weaker or negative effects in the other three economies.
Brazil, Russia, and South Africa experience stagnation or decline in both efficiency and technological advancement over 2005–2023.
Malmquist TFP decomposition (EC and TC) for each BRICS economy showing flat or negative trends in EC and TC for Brazil, Russia, and South Africa during 2005–2023.
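For readers unfamiliar with the EC and TC terms, the sketch below shows the standard output-oriented Malmquist decomposition, in which the TFP index is the product of efficiency change (EC) and technical change (TC). The excerpt does not state which specification the paper uses, so this form is an assumption, and the distance-function values in the example are hypothetical, not figures from the study.

```python
import math


def malmquist_decomposition(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Decompose the Malmquist TFP index into EC and TC.

    Arguments are output distance function values (all hypothetical inputs):
      d_t_t   = D^t(x^t, y^t)         period-t data, period-t frontier
      d_t_t1  = D^t(x^{t+1}, y^{t+1}) period-(t+1) data, period-t frontier
      d_t1_t  = D^{t+1}(x^t, y^t)     period-t data, period-(t+1) frontier
      d_t1_t1 = D^{t+1}(x^{t+1}, y^{t+1})
    """
    # Efficiency change: movement relative to each period's own frontier.
    ec = d_t1_t1 / d_t_t
    # Technical change: geometric mean of the frontier shift measured at
    # the period-(t+1) data point and at the period-t data point.
    tc = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    return {"EC": ec, "TC": tc, "TFP": ec * tc}


# Hypothetical values chosen only to illustrate the arithmetic.
example = malmquist_decomposition(d_t_t=0.8, d_t_t1=1.0, d_t1_t=0.64, d_t1_t1=0.8)
print(example)
```

Under this convention, values above 1 indicate improvement and values below 1 indicate decline, so the "flat or negative trends" reported for Brazil, Russia, and South Africa would correspond to EC and TC at or below 1.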
Despite rapid progress, a key problem remains: none of these systems can build complex 3D assemblies with moving parts. For example, no existing system can build a piston, a pendulum, or even a pair of scissors.
Negative capability claim based on the authors' survey of prior work (asserted limitation); no systematic benchmark or exhaustive evaluation numbers provided in the excerpt.
While achieving financial autonomy, firms are also exposed to new constraints as they shift their reliance to third-party software, technological infrastructures, and opaque algorithms (Gaviyau & Godi, 2025; Suhrab et al., 2026).
Stated with citations to Gaviyau & Godi (2025) and Suhrab et al. (2026); presented as an observed/paraphrased risk or unintended consequence in the paper. No empirical sample details in the excerpt.
SMEs face various financial constraints and mostly rely heavily on traditional financial institutions for their survival (Kadzima et al., 2025).
Statement supported by citation to Kadzima et al. (2025); presented as a literature-supported empirical generalization in the paper's background/introduction. No sample size or empirical details given in the excerpt.
Aligning the generative policy with nuanced user preference signals is a challenge for generative recommendation.
Paper lists this as one of three scaling challenges motivating the proposed methods (problem statement about preference alignment).
Encoding long user behavior sequences with multi-token item representations based on semantic IDs is prohibitively costly (a scaling challenge).
Paper lists this as one of three scaling challenges for deploying GR at industrial scale (problem statement about computational/cost burden).
Within a single request, identical model inputs may produce inconsistent outputs due to the pagination request mechanism (a challenge for GR/NTP recommendation at industrial scale).
Paper lists this as one of three scaling challenges for generative retrieval in large-scale industrial systems (problem statement).
Early iterations suffered severe execution decay.
Reported observation from the longitudinal study describing early-phase performance problems (qualitative; no quantitative metric in the excerpt).
Execution-based environments suffer from adversarial 'Test Evasion' by unconstrained agents.
Stated assertion in the paper's motivation/abstract; presented as a limitation of execution-based evaluation (no empirical sample size or experiment details provided in the excerpt).
Current paradigms, such as Reinforcement Learning from Human Feedback (RLHF) and AI Feedback (RLAIF), frequently induce model sycophancy.
Stated assertion in the paper's motivation/abstract; presented as a limitation of existing alignment paradigms (no empirical sample size or experiment details provided in the excerpt).
Environmental demands place an upper bound on the degree of heterogeneity required in a distributed production system.
Theoretical claim derived from the Distributed Production System framework and discussed in the paper; supported by conceptual argument and model constraints rather than empirical data; no sample size reported.
Lower survival rates among BDA adopters are driven by greater uncertainty in sales.
Paper states greater uncertainty in sales is an interrelated factor explaining lower survival for BDA adopters, based on empirical analysis of German start-ups.
Lower survival rates among BDA adopters are driven by higher operating costs.
Paper reports that higher operating costs are an interrelated factor explaining lower survival among BDA adopters, based on the same empirical sample of German start-ups.
Start-ups using BDA face lower survival rates.
Empirical comparison of BDA adopters versus non-adopters in a large sample of German start-ups (survival analysis implied by reported outcome).
Digital–intelligent integration generates positive spatial spillovers, reducing carbon intensity in neighboring provinces.
Spatial Durbin model results reported on the 30-province panel indicating significant negative effects on neighboring provinces' carbon intensity (spatial spillover effects).
Industrial structure upgrading and green technology innovation were identified as mediating pathways through which digital–intelligent integration reduces carbon intensity.
Mediation analysis reported in the paper showing these two mechanisms mediate the effect (mediation models applied to the provincial panel).
The negative association between digital–intelligent integration and carbon intensity is robust to endogeneity concerns and alternative model specifications.
Robustness checks and endogeneity treatments (as reported): alternative specifications and methods addressing endogeneity (details not provided in the summary).
Digital–intelligent integration is significantly associated with lower carbon intensity.
Fixed-effects regression estimates reported on the 2014–2023 provincial panel (30 provinces); significance described in the paper; models control for covariates.
Major methodological risks include overfitting, regime instability, interpretability deficits, and institutional dependence.
Critical evaluation within the review identifying key methodological risks across the surveyed streams (conceptual assessment; no empirical estimate provided).
The literature remains fragmented across at least three partially connected domains: financial time-series forecasting, portfolio construction, and firm-level sustainability analysis.
Author's characterization of the existing literature in the review (synthesis of published work; no single empirical sample; survey-based statement).
Traditional frameworks for competition law, which emphasize short-term price impacts and inflexible market definitions, are inadequate to address exclusionary effects in AI-driven markets.
Conceptual/legal analysis combined with the paper's empirical findings (panel-data evidence of non-price exclusionary dynamics) arguing the mismatch between observed AI-driven exclusion and conventional competition law focus.
Path dependency produced by dynamic learning processes disproportionately disadvantages late entrants.
Empirical and theoretical analysis in the paper: dynamic learning / cumulative learning modeled in the conceptual framework and empirically tested using panel data on AI-intensive markets showing persistent advantages for early entrants.
Data concentration worsens the effects of algorithmic advantage, namely reduced entry and greater market concentration.
Moderator/interaction analysis reported in the paper showing that market-level data concentration amplifies the association between algorithmic advantage and both reduced entry and greater concentration in the panel-data analysis.
Elevated levels of algorithmic advantage are consistently linked to diminished entry rates.
Empirical analysis using panel data: regressions on an unbalanced panel of markets with high AI intensity, controlling for firm size, capital intensity, R&D expenditure, and industry growth (as described in the paper).
The expansion of AI in digital health has simultaneously introduced complex governance, privacy, and financial sustainability challenges.
Argument and synthesis across regulatory policy, ethics, and healthcare economics literatures presented in the review (literature review / conceptual synthesis).
Absent a validation sample, researchers cannot assess possible errors in LLM outputs, and consequently seemingly innocuous choices (which model, which prompt) can produce dramatically different parameter estimates.
Warning/claim in the abstract that without validation samples researchers lack a way to assess LLM output errors and that modeling/prompting choices can materially affect parameter estimates; no empirical example or quantified effect reported in the excerpt.
For prediction problems—forecasting outcomes from text—valid conclusions require 'no training leakage' between the LLM's training data and the researcher's sample, which can be enforced through careful model choice and research design.
Stated methodological requirement in the abstract arguing that prediction validity depends on preventing overlap/leakage between model training data and the evaluation sample; no empirical test or sample size given in the excerpt.
Multi-agent ecosystems also generate novel market failures, including miscoordination, conflict, and collusion among autonomous agents.
Conceptual analysis identifying plausible failure modes; no empirical incidents or statistical evidence reported.
Existing copyright frameworks are ill-equipped to govern AI agent-mediated interactions that occur at scale, speed, and with limited human oversight.
Normative/legal analysis and conceptual reasoning in the paper; no empirical tests or datasets provided.
Health disparities research is severely underrepresented at just 5.7% of AI-funded work.
Semantic/topic classification identifying projects addressing health disparities among AI-labelled projects, yielding a reported share of 5.7%.
A critical research-to-deployment gap exists: 79% of AI projects remain in research/development stages while only 14.7% engage in clinical deployment or implementation.
Stage classification of AI-labelled projects in the dataset, reporting 79% classified as research/development and 14.7% as clinical deployment/implementation.
The technological rivalry between the United States and China has led to exclusionary rulemaking on a global scale.
Claim presented in the chapter as a consequence of geopolitical rivalry; characterized as an interpretive conclusion from comparative legal/policy analysis rather than supported here by quantified evidence.
The effective altruism community's near-exclusive focus on existential risk from AI has created a dangerous blind spot around the political economy of who controls AI and who benefits from it.
Critical evaluation of the effective altruism movement's priorities as presented in the paper; argued via literature/agenda analysis rather than empirical survey data in the abstract.
AI infrastructure owners may come to command more wealth and capability than most governments, undermining the future viability of the nation-state.
Predictive economic and political analysis / modeling in the paper; claim presented as a projection without empirically quantified comparisons or sample size in the abstract.