Evidence (11677 claims)

- Adoption: 7395 claims
- Productivity: 6507 claims
- Governance: 5921 claims
- Human-AI Collaboration: 5192 claims
- Org Design: 3497 claims
- Innovation: 3492 claims
- Labor Markets: 3231 claims
- Skills & Training: 2608 claims
- Inequality: 1842 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 738 | 1617 |
| Governance & Regulation | 671 | 334 | 160 | 99 | 1285 |
| Organizational Efficiency | 626 | 147 | 105 | 70 | 955 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 349 | 109 | 48 | 322 | 838 |
| Output Quality | 391 | 121 | 45 | 40 | 597 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 277 | 145 | 63 | 34 | 526 |
| AI Safety & Ethics | 189 | 244 | 59 | 30 | 526 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 106 | 40 | 6 | 188 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 79 | 8 | 1 | 152 |
| Regulatory Compliance | 69 | 66 | 14 | 3 | 152 |
| Training Effectiveness | 82 | 16 | 13 | 18 | 131 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Training data scarcity is an emerging challenge for organizations that aim to train proprietary LLMs.
Paper highlights training data scarcity as a challenge in its analysis and discussion sections (qualitative observation).
A gender gap persists, concentrated in the most exposed occupations.
Stratified/descriptive and regression analyses of the 2024 EWCS showing gender differences in self-reported generative AI adoption, with the gap largest among occupations with highest exposure; sample >36,600 workers across 35 countries.
An alternative specification that makes different assumptions about when AI became pervasive yields less robust results, though it also suggests that AI is labor-saving.
Reported sensitivity analysis / alternative empirical specification in the paper; authors state the alternative yields less robust results but still indicates labor-saving effects.
Our baseline model finds evidence that AI is input-saving.
Outcome reported from the baseline empirical specification indicating reductions in inputs associated with AI (authors' baseline model results).
AI is driving states to reconsider interdependence not as the source of peace, but as a battlefield of power.
Normative and interpretive conclusion drawn from the paper's analysis of AI's geopolitical implications; no empirical data or sample reported in the abstract.
AI is redefining foreign policy in a multipolar world by making the line between economic cooperation and strategic vulnerability indistinct.
Theoretical claim and synthesis in the paper's thesis; no empirical evidence or sample size provided in the abstract.
AI is reshaping economic relationships between countries that were previously sources of mutually beneficial relations into instruments of coercion.
The paper presents a theoretical analysis drawing on international political economy and foreign policy theory; no empirical measurements reported in the abstract.
AI enhances the weaponization of economic interdependence by enabling states to monitor, predict, manipulate, and disrupt transnational networks with unprecedented accuracy.
The paper advances a theoretical argument and synthesis of international political economy and foreign policy literatures; no empirical sample or quantitative data reported in the abstract.
The infrastructure for cross-user agent collaboration is entirely absent, let alone the governance mechanisms needed to secure it.
Authoritative claim in paper framing the research gap; presented as observational/argumentative (no empirical audit reported).
Current AI agent frameworks have made remarkable progress in automating individual tasks, yet all existing systems serve a single user.
Statement in paper's introduction/positioning; conceptual survey-style claim (no empirical study or systematic benchmark reported).
Standard benchmarks often fail to isolate an agent's core ability to parse queries and orchestrate computations.
Paper asserts that existing/standard benchmarks do not adequately isolate parsing and computation-orchestration abilities, motivating the new benchmark.
As multimodal AI achieves human-parity understanding of speech and gesture, [the keyboard's] necessity dissolves.
Theoretical claim supported by multidisciplinary review (history, neuroscience, technology, organizational studies); no quantified empirical test reported.
General-purpose LLMs pose misinformation risks for development and policy experts, lacking epistemic humility for verifiable outputs.
Conceptual/argumentative claim stated in the paper's motivation; no empirical test reported in the abstract.
Current session-based context handling (sessions ending, context windows filling, memory APIs returning flat facts) produces intelligence that is powerful per session but amnesiac across time.
Descriptive diagnostic argument in the paper; no empirical measurement reported in this text.
There was a nonsignificant absolute reduction in retest performance in the AI condition, alongside a larger retest decrement in that condition (i.e., retention decreased more after using Copilot).
Comparison of retest (one-week) performance across conditions reported in results; authors report a nonsignificant reduction and larger decrement for the AI/Copilot condition (n=22).
The US restricts mobility and knowledge flows and challenges regulatory efforts to protect its advantage.
Descriptive claim about US strategy (policy observation stated in the paper's framing; not quantified in the excerpt).
The AI race amplifies security risks and international tensions.
Introductory/interpretive claim motivating the study (no specific empirical quantification provided in the excerpt).
The US and China form two poles around which global AI research increasingly revolves (i.e., global AI research is polarizing around these two countries).
Longitudinal network analysis of international collaboration and citation patterns derived from publication data compared to random realizations.
The US and China have long diverged in both cross-country collaboration and citation links, forming two poles around which global AI research increasingly revolves.
Large-scale data of scientific publications spanning three decades; analysis comparing cross-country collaboration and citation links to their random realizations (null models).
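The null-model comparison underlying this finding can be illustrated with a degree-preserving edge-rewiring sketch. This is a generic construction with made-up country edges, not the paper's data or code:

```python
import random
from collections import Counter

random.seed(1)

def degree_sequence(edges):
    """Count how many links each node participates in."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

def rewire(edges, n_swaps=1000):
    """Random realization via double-edge swaps, preserving each
    node's degree (a standard null model for network comparison)."""
    edges = list(edges)
    for _ in range(n_swaps):
        i, j = random.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:               # would create a self-loop
            continue
        if (a, d) in edges or (c, b) in edges:  # would duplicate an edge
            continue
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Hypothetical cross-country collaboration edges.
observed = [("US", "CN"), ("US", "UK"), ("CN", "DE"),
            ("UK", "FR"), ("DE", "FR"), ("US", "DE")]
randomized = rewire(observed)
assert degree_sequence(randomized) == degree_sequence(observed)
print("degrees preserved:", dict(degree_sequence(randomized)))
```

Observed link counts (e.g., the number of US-CN collaborations or citations) would then be compared against many such randomized realizations to judge whether the observed polarization exceeds chance.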
Under logit demand and symmetric rivals, the QoS gap is strictly decreasing in API price and rival entry elasticity.
Comparative statics derived from the analytical model (logit demand, symmetric rivals).
Current operational approaches typically involve scattered testing tools, resulting in partial coverage and errors that surface only after deployment.
Authors' characterization of industry practice and limitations (assertion in paper; no empirical sample size reported in abstract).
Network change validation remains a critical yet predominantly manual, time-consuming, and error-prone process in modern network operations.
Statement in paper framing the problem; based on authors' characterization of current operational practice (no empirical sample size reported in abstract).
Thick subjectivist theories of meaning in life and meaningful work—those theories that emphasize that meaning-conferring activities are historically formed—enable us to appreciate how some losses cannot be made up, even if there are in principle ample alternative sources of meaning to be found elsewhere.
Theoretical claim about the explanatory power of 'thick subjectivist' normative theories; argued via conceptual philosophical analysis in the paper (no empirical testing reported).
Even if there are rich non-work sources of meaning, this does not entail that there is not a significant and multi-faceted loss of meaning, one that cannot be compensated for or offset elsewhere.
Normative/philosophical argument presented in the paper (conceptual reasoning rather than empirical measurement; no sample size).
The argument that non-work goods can replace work-derived meaning fails to consider the embeddedness and thickness of meaning in human lives.
Philosophical/theoretical critique based on conceptual analysis (author's argument invoking the notions of embeddedness and thickness of meaning; no empirical study reported).
The paper identifies governance challenges such as accountability gaps, digital sovereignty risks, ethical pluralism, and strategic weaponization arising from embedding AI in diplomatic practice.
Conceptual and normative analysis section of the paper outlining risks and governance challenges; illustrated by examples and argumentation.
Traditional machine learning approaches, including the baseline methodology proposed in previous studies, typically optimize global predictive accuracy and therefore fail to capture business-critical outcomes, especially the identification of high-risk clients.
Conceptual critique and literature/contextual claim in the paper; contrasted with the study's business-aware methods (no direct external benchmarking numbers provided in the abstract).
Classifying customers without a prior history at a given company is particularly challenging due to the absence of historical behavior, extreme class imbalance, heavy-tailed loss distributions, and strict operational constraints.
Argumentation / problem statement in the paper (no empirical test reported); descriptive characterization of the insurance cold-start classification problem.
Thin training coverage fosters anxiety about substitution and slows diffusion of AI tools.
Reported associations from surveys of mid-level managers and technical staff, interviews, and document analysis across cases; thematic coding identified links between limited training, worker anxiety, and slower diffusion. (Sample size not reported.)
Upstream textile SMEs frequently exhibit constrained supply chain resilience owing to persistent information latency and structural dependence on downstream orders.
Background/contextual claim stated in paper (motivation for study); no specific quantitative test reported in abstract.
Inequalities exist in how algorithmic bias emerges and in the transparency of these systems.
Paper states that inequalities and lack of transparency were observed/identified (citing Memarian, 2023; Bello, 2023; Gambacorta et al., 2024) and discusses these as findings.
Algorithmic bias in automated credit scoring systems may block marginalized groups from accessing financial services.
Explicit statement in the introduction citing prior literature (Agboola, 2025; Nwafor et al., 2024; Oguntibeju, 2024) and motivating the study.
The pharmaceutical R&D process is persistently challenged by high financial costs, protracted timelines, and remarkably low success rates.
Background statement in the review synthesizing prior literature and field knowledge; no original empirical data or sample sizes reported in the provided text.
Platforms can exploit workers' uncertainty about the cost of labor to effectively suppress wages.
Interpretation / implication drawn from the theoretical model and the result that a platform can achieve coverage while paying only O(log(M)/M) fraction of total labor cost under assumptions about workers' cost estimates.
There exists a simple pricing strategy for the platform that covers all M tasks with wait time O(M) while paying only an O(log(M)/M) fraction of the total cost of labor.
Theoretical result from the paper's posted-price procurement model under stated assumptions on workers' estimated costs; formal analysis/proof showing existence of such a pricing strategy for general M (no empirical sample).
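The flavor of an O(log(M)/M) payment fraction can be seen in a toy construction that is not the paper's model: assume a uniform per-task labor cost c, and suppose the platform manages to pay only c/i for the i-th task. Total payment is then c·H_M ≈ c·ln(M), an O(log(M)/M) fraction of the full cost c·M:

```python
import math

def harmonic(M):
    """H_M = sum_{i=1}^{M} 1/i, which grows like ln(M)."""
    return sum(1.0 / i for i in range(1, M + 1))

M = 10_000   # number of tasks (hypothetical)
c = 1.0      # uniform per-task labor cost (toy assumption)

platform_payment = c * harmonic(M)   # toy schedule: pay c/i for task i
full_labor_cost = c * M
fraction = platform_payment / full_labor_cost

print(f"H_M = {harmonic(M):.3f}, payment fraction = {fraction:.5f}, "
      f"log(M)/M = {math.log(M) / M:.5f}")
```

The ratio shrinks toward zero as M grows, matching the O(log(M)/M) scaling; the paper's formal result derives this from workers' estimated costs rather than from an assumed c/i schedule.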
Because the technical threshold for this transition is already crossed at modest engineering effort, the window for protective frameworks covering disclosure, consent, compensation and deployment restriction is the present, while deployment remains optional rather than infrastructural.
Authors' normative claim based on their implementation (distillation and deployment) and interpretation that modest engineering sufficed; used to argue policy urgency for disclosure/consent/compensation frameworks.
We term this the Relic condition: when publication systems make stable reasoning architectures legible, extractable and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement.
Conceptual framing introduced by the authors as an interpretation of the observed results and their implications; not an empirical measurement but a named condition/argument.
Agency in software engineering is primarily constrained by organizational policies rather than individual preferences.
Authors' synthesis of qualitative results across the ACTA/Delphi and task/review phases indicating organizational policy factors were cited as primary constraints.
The authors identify five 'decoys' that seemingly critique—but in actuality co-constitute—AI's emergent power relations and material political economy.
Analytical contribution of the paper: identification and conceptual description of five decoys based on literature synthesis; this is a descriptive/theoretical taxonomy rather than an empirical enumeration with sample size.
Decoys contribute to the network-making power that is at the heart of the Project's extraction and exploitation.
Theoretical synthesis and interpretive argument grounded in literature across relevant fields; the paper posits a mechanism (decoys → strengthened networks → increased extraction/exploitation) but provides no empirical quantification.
Decoys often create the illusion of accountability while masking the emerging political economies that the Project of AI has set into motion.
Conceptual critique supported by literature from communication, STS, and economic sociology; argument that particular practices/instruments function rhetorically to appear accountable while obscuring material political economy. No empirical sample or quantified measures reported.
As AI funders and developers expand their access to resources and configure sociotechnical conditions, they benefit from decoys that animate scholars, critics, policymakers, journalists, and the public into co-constructing industry-empowering AI futures.
Theoretical analysis and literature review; paper identifies and interprets how discursive and institutional phenomena (termed 'decoys') function to produce consent and co-construction of industry-aligned futures. No empirical sample size provided.
Those who fund and develop AI systems operate through and seek to sustain networks of power and wealth.
Conceptual argument and literature synthesis drawing on communication studies, science & technology studies (STS), and economic sociology; no empirical sample reported.
In the geographical network, both technological diversity and technological proximity inhibit main path formation, implying that macro-regional evolution requires specialized focus and complementary knowledge.
ERGM results for the geographical diffusion layer showing negative (inhibitory) associations for diversity and proximity variables; interpreted in regional evolution context.
Existing evaluations of large language models remain limited to judgmental tasks in simple formats, such as binary or multiple-choice questions, and do not capture forecasting over continuous quantities.
Literature/benchmark critique asserted in the paper (argument that current benchmarks focus on simple judgmental formats and miss continuous numerical forecasting capabilities).
Calibration degrades sharply at extreme magnitudes, revealing systematic overconfidence across all evaluated models.
Empirical observations from QuantSightBench evaluation showing model calibration performance as a function of magnitude (paper statement noting sharp degradation and overconfidence at extremes).
The top performers Gemini 3.1 Pro (79.1%), Grok 4 (76.4%), and GPT-5.4 (75.3%) all fall at least 10 percentage points short of the 90% coverage target.
Reported empirical coverage percentages from evaluation on QuantSightBench for the listed models (paper provides these percentage values).
None of the 11 evaluated frontier and open-weight models achieves the 90% coverage target.
Empirical evaluation on the newly introduced QuantSightBench benchmark across 11 frontier and open-weight models; models were assessed on empirical coverage of prediction intervals versus a 90% target (paper statement).
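Empirical interval coverage, the metric behind the 90% target, can be sketched generically; the distributions and interval widths below are illustrative assumptions, not the benchmark's data:

```python
import random

random.seed(0)

def empirical_coverage(intervals, truths):
    """Fraction of true values that fall inside their prediction intervals."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, truths))
    return hits / len(truths)

# Simulated targets: standard normal draws.
truths = [random.gauss(0, 1) for _ in range(10_000)]

# A calibrated 90% interval for N(0, 1) is roughly (-1.645, 1.645);
# an overconfident model emits intervals that are too narrow.
calibrated = [(-1.645, 1.645)] * len(truths)
overconfident = [(-1.0, 1.0)] * len(truths)

print(f"calibrated coverage:    {empirical_coverage(calibrated, truths):.3f}")
print(f"overconfident coverage: {empirical_coverage(overconfident, truths):.3f}")
```

The overconfident model's coverage lands well below 90%, the same signature the benchmark reports for the evaluated models.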
The study identified significant implementation challenges including algorithmic bias, digital divide concerns, data privacy risks, and low technology readiness among HR teams in Tier 2 cities.
Synthesis of qualitative case study findings from 4 organizations plus survey responses (N=150) reporting barriers and risks encountered during adoption.
Current attack policies do not saturate LinuxArena (human-crafted attacks evade monitors at substantially higher rates than model-generated attacks, indicating headroom for attackers).
Empirical observation comparing human-crafted attacks (LaStraj) and elicited model-generated attacks; authors interpret higher human evasion rates as evidence that current automated attack policies have not saturated the challenge posed by LinuxArena.