Evidence (7198 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 8921 |
| Productivity | 8002 |
| Governance (currently filtered) | 7198 |
| Human-AI Collaboration | 6864 |
| Org Design | 4398 |
| Innovation | 4286 |
| Labor Markets | 3629 |
| Skills & Training | 3001 |
| Inequality | 2141 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 790 | 208 | 103 | 950 | 2117 |
| Governance & Regulation | 869 | 411 | 195 | 126 | 1630 |
| Organizational Efficiency | 817 | 202 | 126 | 87 | 1243 |
| Technology Adoption Rate | 675 | 258 | 128 | 106 | 1178 |
| Research Productivity | 462 | 138 | 64 | 347 | 1023 |
| Output Quality | 501 | 193 | 61 | 52 | 807 |
| Decision Quality | 346 | 180 | 84 | 51 | 668 |
| AI Safety & Ethics | 235 | 285 | 70 | 34 | 630 |
| Firm Productivity | 452 | 58 | 91 | 20 | 627 |
| Market Structure | 184 | 171 | 123 | 24 | 507 |
| Task Allocation | 221 | 65 | 76 | 34 | 401 |
| Skill Acquisition | 176 | 62 | 62 | 17 | 317 |
| Innovation Output | 207 | 28 | 48 | 18 | 303 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Employment Level | 105 | 56 | 108 | 13 | 284 |
| Consumer Welfare | 121 | 67 | 45 | 11 | 244 |
| Firm Revenue | 160 | 50 | 28 | 4 | 242 |
| Task Completion Time | 182 | 33 | 10 | 13 | 239 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 94 | 73 | 23 | 12 | 202 |
| Error Rate | 76 | 98 | 11 | 4 | 189 |
| Regulatory Compliance | 81 | 73 | 17 | 7 | 178 |
| Automation Exposure | 61 | 59 | 26 | 14 | 163 |
| Training Effectiveness | 97 | 21 | 14 | 19 | 153 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 21 | 1 | 117 |
| Hiring & Recruitment | 52 | 8 | 8 | 3 | 71 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 49 | 6 | 1 | 61 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 15 | 14 | — | 3 | 32 |
| Industry | — | — | — | 1 | 1 |
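The counts above are easier to compare across outcomes as within-row shares. The following is a minimal sketch, not part of the site: the helper name `direction_shares` is hypothetical, and the three rows are copied from the table. Shares are taken over the four listed directions rather than the Total column, because the two do not always agree (for "Other", the four directions sum to 2051 against a listed total of 2117).

```python
# Direction counts copied from the table: (positive, negative, mixed, null).
ROWS = {
    "Job Displacement": (12, 83, 21, 1),
    "Firm Productivity": (452, 58, 91, 20),
    "Wages & Compensation": (78, 37, 25, 6),
}
DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the listed counts (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        # No classified claims in this row: report zero shares instead of dividing by zero.
        return {d: 0.0 for d in DIRECTIONS}
    return {d: c / total for d, c in zip(DIRECTIONS, counts)}


if __name__ == "__main__":
    for outcome, counts in ROWS.items():
        shares = direction_shares(counts)
        summary = ", ".join(f"{d} {s:.0%}" for d, s in shares.items())
        print(f"{outcome}: {summary}")
```

Read this way, a row such as Job Displacement is dominated by negative findings even though its absolute counts are small.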
Governance
The paradigm rests on three governance primitives: (1) a layered identity architecture that separates a Manager Agent from multiple context-specific Identity Agents; the Manager Agent holds global knowledge but is architecturally isolated from external communication.
Architectural/design claim describing the proposed layered identity primitive (presentation of design; no empirical validation in excerpt).
We propose a human-symbiotic agent paradigm in which each user owns a permanently bound agent system that collaborates on the owner's behalf, forming a network whose nodes are humans rather than agents.
Design proposal / conceptual architecture presented in the paper (no large-scale deployment or empirical evaluation described in excerpt).
The next frontier for AI agents lies not in stronger individual capability, but in the digitization of human collaborative relationships.
Normative/strategic claim advanced by the authors as the central thesis (conceptual argument, no empirical test reported).
Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate.
Theoretical/argumentative claim presented as background motivation (conceptual reasoning, citation not provided in excerpt).
Given historical inequities in housing placement, it is crucial to audit LLM use in this context.
Authors' policy/recommendation motivated by historical inequities in housing placement and their empirical audit findings; presented as an argument in the report rather than a quantified experimental result.
Leveraging LLMs to augment tabular classification with casenote summaries can safely incorporate additional text information with low implementation burden.
Authors' reported experiments and practical assessment on augmenting tabular classifiers with LLM-derived casenote summaries from a nonprofit outreach dataset; described as having low implementation burden and being safe to use. (No sample size given in abstract.)
A fine-tuned model augmented with casenote summaries can improve accuracy while reducing algorithmic fairness disparities on the housing placement multi-class classification task.
Empirical audit of LLM-based tabular classification on a real housing placement prediction task augmented with street outreach casenotes from a nonprofit partner; authors report multi-class classification experiments comparing fine-tuned models with and without casenote summaries and auditing error disparities across groups. (Sample size not stated in the abstract.)
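An audit of error disparities across groups can be sketched as a per-group error rate and the gap between the best- and worst-served groups. This is an illustration only: the function names, labels, and group codes below are hypothetical, and the excerpt does not say which fairness metric the authors used.

```python
from collections import defaultdict


def error_rate_by_group(y_true, y_pred, groups):
    """Share of misclassified cases within each group (hypothetical helper)."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        errors[group] += int(truth != pred)
    return {g: errors[g] / counts[g] for g in counts}


def max_disparity(rates):
    """Gap between the highest and lowest group error rates."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Made-up labels for a multi-class placement task; not data from the paper.
    y_true = ["shelter", "housing", "housing", "shelter"]
    y_pred = ["shelter", "shelter", "housing", "shelter"]
    groups = ["A", "A", "B", "B"]
    rates = error_rate_by_group(y_true, y_pred, groups)
    print(rates, max_disparity(rates))
```

Running the same audit on a model with and without casenote summaries would show whether accuracy and the disparity gap move together or trade off.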
There is a positive relationship between disagreement among agents and trading volume in the simulated markets.
Observed correlation in the simulated open-call auction between measured disagreement (e.g., dispersion in beliefs) and trading volume; described as replicating classic experimental findings.
These individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices.
Aggregation of simulated agent behavior in the open-call auction producing market-level time series; comparison of market dynamics to classic experimental benchmark (Smith et al., 1988) and reported finding that excess demand predicts future prices.
AI agents form recency-weighted extrapolative beliefs (i.e., overweight recent price history when forecasting future prices).
Analysis of agents' forecasts and trading behavior in the simulated open-call auction populated by autonomous LLM agents; identification of extrapolative forecasting patterns reported as a main finding.
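One common way to express recency-weighted extrapolation is to project the last price forward by a weighted average of past price changes, with geometrically decaying weights. The sketch below assumes that form; the function name and the `decay` and `strength` parameters are hypothetical, and the excerpt does not state the specification the paper estimates.

```python
def extrapolative_forecast(prices, decay=0.5, strength=1.0):
    """Forecast the next price from recency-weighted past changes (assumed form).

    The most recent change gets weight 1, the one before it `decay`,
    then `decay**2`, and so on. `strength` scales how far the weighted
    trend is projected forward.
    """
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    if not changes:
        # A single observation carries no trend to extrapolate.
        return prices[-1]
    recent_first = changes[::-1]
    weights = [decay ** k for k in range(len(recent_first))]
    weighted_trend = sum(w * c for w, c in zip(weights, recent_first)) / sum(weights)
    return prices[-1] + strength * weighted_trend


if __name__ == "__main__":
    history = [10.0, 10.0, 12.0]
    print(extrapolative_forecast(history, decay=0.5))  # overweights the recent jump
    print(extrapolative_forecast(history, decay=1.0))  # equal weights on all changes
```

With `decay` below 1 the forecast leans on the latest move; with `decay=1.0` it collapses to a simple average of past changes, which gives a baseline for comparison.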
AI agents exhibit a pronounced disposition effect.
Simulated open-call auction populated by autonomous LLM agents in experimental asset-market simulations; behavioral trading data showing agents' selling/holding patterns (paper describes this as a main documented finding).
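The disposition effect is often measured by comparing the proportion of gains realized (PGR) with the proportion of losses realized (PLR), following Odean (1998). The sketch below assumes that measure; the excerpt does not say which measure the paper uses, and the counts are made up.

```python
def disposition_spread(realized_gains, paper_gains, realized_losses, paper_losses):
    """Return PGR - PLR; a positive value indicates a disposition effect.

    PGR = realized gains / (realized gains + paper gains)
    PLR = realized losses / (realized losses + paper losses)
    Arguments are counts of positions; each denominator must be positive.
    """
    pgr = realized_gains / (realized_gains + paper_gains)
    plr = realized_losses / (realized_losses + paper_losses)
    return pgr - plr


if __name__ == "__main__":
    # Illustrative counts only: winners are sold three times as readily as losers.
    spread = disposition_spread(
        realized_gains=30, paper_gains=70, realized_losses=10, paper_losses=90
    )
    print(f"PGR - PLR = {spread:.2f}")
```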
We contribute design guidelines for specialized AI and articulate a vision for 'ecosystem-aware' Humble AI.
Paper's stated contributions (design guidelines and conceptual vision) described in the abstract.
Qualitatively, participants used AVA as a specialized 'evidence engine'; reasoned abstention clarified scope boundaries, and trust was calibrated through institutional provenance and page-anchored citations.
Qualitative findings from surveys and 20 interviews reported in the paper (participant quotations and thematic analysis implied in abstract).
Difference-in-Differences estimates associate sustained engagement with 2.4-3.9 hours saved weekly.
Quantitative claim reported in the paper based on Difference-in-Differences analysis of usage/engagement data from the evaluation (implicit sample drawn from the >2,200 participants).
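In its simplest two-group, two-period form, a Difference-in-Differences estimate is the change in the engaged group minus the change in the comparison group. The sketch below shows only that arithmetic. The function name and all numbers are hypothetical, and the paper's actual specification (controls, fixed effects, definition of sustained engagement) is not described in the excerpt.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two Difference-in-Differences on group means (hypothetical helper)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


if __name__ == "__main__":
    # Made-up weekly hours spent on a task; not data from the paper.
    effect = did_estimate(
        treated_pre=[10.0, 12.0],
        treated_post=[7.0, 9.0],
        control_pre=[10.0, 12.0],
        control_post=[10.0, 11.0],
    )
    # A negative effect on hours spent corresponds to hours saved.
    print(f"DiD estimate: {effect:+.1f} hours per week")
```

Subtracting the comparison group's change removes trends common to both groups, which is why the estimate rests on the assumption that the two groups would have moved in parallel without the tool.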
AVA operationalizes epistemic humility through two mechanisms: citation verifiability (tracing claims to sources) and reasoned abstention (declining unsupported queries with justification and redirection).
Design claim describing implemented mechanisms in the platform; described in the paper as operational features.
AVA's multi-agent pipeline enables users to query and receive evidence-based syntheses.
System design and capability claim in the paper (description of multi-agent pipeline producing evidence-based syntheses).
AVA is a GenAI platform built on a curated library of over 4,000 World Bank Reports with multilingual capabilities.
System description provided in the paper; statement of dataset size and functionality (library count and multilingual support).
The governance architecture (privacy implemented as physics rather than policy, founder-controlled class shares on non-negotiable architectural commitments) is inseparable from the product itself.
Normative and architectural argument in the paper tying governance design choices to product architecture (no empirical validation in this text).
Physics limits now constraining the model layer make the continuity layer newly consequential.
Analytical argument in the paper linking physical constraints on model scaling to increased importance of continuity (no empirical measurement included here).
The paper proposes a four-layer development arc for continuity: from external SDK to hardware node to long-horizon human infrastructure.
Design/roadmap proposal described in the manuscript (no empirical testing provided here).
The engineering architecture for continuity is mapped to the theological pattern of kenosis and the symbolic pattern of Alpha and Omega, and the paper argues this mapping is structural rather than merely metaphorical.
Interpretive/mapping argument presented in the paper (theoretical/analogical reasoning).
The paper describes a storage primitive called Decomposed Trace Convergence Memory whose write-time decomposition and read-time reconstruction produce the continuity property.
Design proposal in the manuscript outlining a storage primitive and its read/write behavior (no empirical validation reported here).
Continuity is defined in the paper as a system property with seven required characteristics, distinct from memory and from retrieval.
Explicit definitional claim made in the manuscript (enumeration of seven characteristics described).
A companion paper (arXiv:2604.10981) positions the ATANT framework against existing memory, long-context, and agentic-memory benchmarks.
Citation to a companion paper that reportedly compares frameworks/benchmarks.
The formal evaluation framework for the property described here is the ATANT benchmark (arXiv:2604.06710), published separately with evaluation results on a 250-story corpus.
Citation to separate benchmark paper and reported evaluation on a 250-story corpus.
Engineering work to build the continuity layer has begun in public.
Statement in the paper asserting publicly visible engineering activity (no specific projects or quantitative audit included in this text).
The continuity layer is the most consequential piece of infrastructure the field has not yet built.
Normative claim/argument in the position paper (no empirical test presented in this text).
The most important architectural problem in AI is not the size of the model but the absence of a layer that carries forward what the model has come to understand (a "continuity layer").
Position paper argument and conceptual reasoning in the manuscript (no empirical study reported).
China leads initiatives of global governance (in AI).
Stated strategic observation in the paper's introduction (no empirical measures provided in the excerpt).
The United Kingdom and Germany have integrated exclusively with the US.
Analysis of cross-country collaboration and citation ties in the publication-based network, compared against random network models, showing that the UK and Germany integrate exclusively with the US.
Illustrative welfare calculations suggest net gains in the tens of billions annually from the proposed policies/interventions.
The paper reports illustrative welfare calculations (not structural estimates) that yield an aggregate figure described as 'net gains in the tens of billions annually'.
The policy section proposes 'Neutral Inference', a four-pillar conduct framework consisting of QoS parity, routing transparency, FRAND-style non-discrimination, and tier transparency with release-pathway discipline.
Normative policy proposal laid out in the paper's policy section.
Under logit demand and symmetric rivals, the QoS gap is strictly increasing in inference-quality importance (alpha) and downstream margins.
Comparative statics derived from the analytical model (logit demand, symmetric rivals).
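To see why the importance of inference quality matters under logit demand, it helps to look at the demand side alone. The sketch below uses standard multinomial logit shares with an outside option. The utility form `alpha * quality - price` and all numbers are assumptions for illustration; it does not reproduce the paper's equilibrium characterization of the QoS gap.

```python
import math


def logit_shares(qualities, prices, alpha):
    """Multinomial logit market shares with an outside option of utility 0.

    Utility of firm i is assumed to be alpha * quality_i - price_i.
    """
    utils = [alpha * q - p for q, p in zip(qualities, prices)]
    shift = max(utils + [0.0])  # subtract the maximum for numerical stability
    exp_utils = [math.exp(u - shift) for u in utils]
    denom = sum(exp_utils) + math.exp(0.0 - shift)
    return [e / denom for e in exp_utils]


if __name__ == "__main__":
    prices = [1.0, 1.0]
    for alpha in (0.5, 2.0):
        parity = logit_shares([1.0, 1.0], prices, alpha)
        degraded = logit_shares([1.0, 0.5], prices, alpha)  # rival receives lower QoS
        lost = parity[1] - degraded[1]
        print(f"alpha={alpha}: rival share lost to degradation = {lost:.3f}")
```

In this toy setting the same quality degradation costs the rival more share when `alpha` is larger, which is the demand-side intuition behind a QoS gap that grows with the importance of inference quality.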
The main theoretical result provides an explicit local equilibrium characterization of the QoS gap under logit demand and symmetric rivals.
Analytical derivation in the formal game-theoretic model assuming logit demand and symmetric rivals; presented as the paper's main theoretical result.
An extension motivated by Anthropic's April 2026 release introduces a third mechanism, tier-based access discrimination, parameterized by a tier gap (tau) and partner-exclusivity (kappa).
Model extension in the paper explicitly adds parameters (tau, kappa) to represent tier-based access discrimination; motivated by a contemporaneous product release.
The model isolates two foreclosure mechanisms operating without predatory pricing: quality-of-service (QoS) discrimination against downstream rivals (via latency, throughput, context limits, or feature access) and routing bias in assistant-layer interfaces.
Formal game-theoretic model developed in the paper; mechanisms are derived and described in model set-up and analysis.
As generative AI commercializes, competitive advantage is shifting from model training toward inference, distribution, and routing.
Framing/introductory assertion in the paper (conceptual argument, literature synthesis), not an empirical test.
The model shows cooperative behaviour supported by reward-punishment schemes that discourage deviations.
Analysis of the learned strategies/behaviour of the simulated deep reinforcement learning agents showing emergence of cooperation enforced via reward-punishment mechanisms (as reported in the paper).
A modern deep reinforcement learning model deployed to price goods in a repeated oligopolistic competition game with continuous prices converges to a collusive outcome in an amount of time that matches empirical observations (under reasonable assumptions on the length of a time step).
Simulation/experiment using a modern deep reinforcement learning model in a repeated oligopoly pricing game with continuous prices; claim that convergence time matches empirical observations. (No sample size, number of runs, or numerical convergence time provided in the excerpt.)
Previous research shows that [pricing] algorithms can exhibit collusive behaviour.
Citation/summary of prior literature (as stated in paper); no specific studies or sample sizes given in the excerpt.
The study uses a combination of cognitive systems theory, diplomatic negotiation models, and empirical Human-in-the-Loop experiments as its methodological basis.
Methods description in the paper listing theoretical foundations and empirical HITL experiments as components of the study design.
The paper outlines recommendations for international norm development, capacity building, and the creation of interoperable, transparent AI systems for diplomacy.
Policy recommendation section of the paper proposing international norms, capacity-building measures, and interoperable transparent system design.
Experimental HITL data indicate a 17% reduction in cognitive bias for hybrid human-AI teams.
Human-in-the-Loop (HITL) experiments reported in the paper; comparison of cognitive bias measures between hybrid teams and baseline (sample size not provided in summary).
Experimental HITL data indicate that hybrid human-AI teams achieved 23% faster consensus-building.
Human-in-the-Loop (HITL) experiments reported in the paper; experimental comparison between hybrid human-AI teams and baseline (details on sample size not reported in summary).
The framework is validated through real-world and simulated case studies, including UN ceasefire mediation, EU sentiment-monitoring for conflict diplomacy, and African Union peacekeeping planning.
Validation reported via a set of real-world and simulated case studies described in the paper (case study methodology; specific cases named).
Each layer augments a core dimension of diplomatic reasoning, enabling interpretable AI contributions, foresight analysis, culturally sensitive framing, and legally compliant outputs.
Conceptual mapping of each proposed layer to functional capabilities described in the paper; claimed alignment with interpretability, foresight, cultural framing, and legal compliance.
The study proposes a five-layer Human-AI collaboration architecture tailored to multilateral diplomacy consisting of: (1) Context Modeling, (2) Scenario Generation, (3) Cognitive Interfacing, (4) Decision Support, and (5) Ethical-Normative Governance.
Architectural proposal in the paper based on synthesis of literature and design choices; claimed as the output of the conceptual framework.
This paper develops the concept of Artificial Diplomacy as a structured interface between human strategic cognition and machine-supported reasoning.
Theoretical development drawing on cognitive systems theory and diplomatic negotiation models; described design and conceptual argumentation in the paper.
Policymakers can reinforce these conditions by shifting from technology-neutral principles to auditable process standards that couple AI investment with reskilling and data-quality obligations.
Policy recommendation based on the study's findings and synthesis; presented as a normative implication rather than empirically tested within the study. (Sample size not reported.)
Leaders should fund training coverage and design (not just headline hours), equip non-specialists to interpret model outputs, pair performance artefacts with participatory routines, and treat explainability as a usability requirement to achieve durable, auditable value in safety-critical energy contexts.
Prescriptive recommendation based on a 'field-tested playbook' synthesised from the multi-case qualitative study (interviews, surveys, documents). The claim is drawn from authors' interpretation of cross-case patterns rather than causal inference. (Sample size not reported.)