Evidence (7198 claims)

Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.

The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics. Click one to filter the claims below.

Human-AI Collaboration

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome	Positive	Negative	Mixed	Null	Total
Other	790	208	103	950	2117
Governance & Regulation	869	411	195	126	1630
Organizational Efficiency	817	202	126	87	1243
Technology Adoption Rate	675	258	128	106	1178
Research Productivity	462	138	64	347	1023
Output Quality	501	193	61	52	807
Decision Quality	346	180	84	51	668
AI Safety & Ethics	235	285	70	34	630
Firm Productivity	452	58	91	20	627
Market Structure	184	171	123	24	507
Task Allocation	221	65	76	34	401
Skill Acquisition	176	62	62	17	317
Innovation Output	207	28	48	18	303
Fiscal & Macroeconomic	135	72	44	26	284
Employment Level	105	56	108	13	284
Consumer Welfare	121	67	45	11	244
Firm Revenue	160	50	28	4	242
Task Completion Time	182	33	10	13	239
Inequality Measures	45	126	50	6	227
Worker Satisfaction	94	73	23	12	202
Error Rate	76	98	11	4	189
Regulatory Compliance	81	73	17	7	178
Automation Exposure	61	59	26	14	163
Training Effectiveness	97	21	14	19	153
Wages & Compensation	78	37	25	6	146
Developer Productivity	105	18	14	6	144
Team Performance	87	17	28	10	143
Job Displacement	12	83	21	1	117
Hiring & Recruitment	52	8	8	3	71
Social Protection	39	17	8	2	66
Creative Output	32	20	8	3	64
Skill Obsolescence	5	49	6	1	61
Labor Share of Income	17	19	17	—	53
Worker Turnover	15	14	—	3	32
Industry	—	—	—	1	1
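
The counts above can be turned into direction shares for a quick comparison across outcomes. The following is a minimal sketch, not part of the site: the three rows are copied from the table, while the names `rows` and `direction_shares` are hypothetical. Shares are taken against the Total column, because in some rows (for example Governance & Regulation) the four direction counts add up to less than the listed total.

```python
# Rows copied from the table above: (positive, negative, mixed, null, total).
# The dictionary and function names are illustrative assumptions.
rows = {
    "Wages & Compensation": (78, 37, 25, 6, 146),
    "Job Displacement": (12, 83, 21, 1, 117),
    "Error Rate": (76, 98, 11, 4, 189),
}


def direction_shares(counts):
    """Return each direction's share of the row's listed total."""
    positive, negative, mixed, null, total = counts
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
    }


for outcome, counts in rows.items():
    shares = direction_shares(counts)
    # Pick the direction with the largest share for this outcome.
    dominant = max(shares, key=shares.get)
    print(f"{outcome}: {dominant} ({shares[dominant]:.0%})")
```

For these three rows the dominant direction is positive for Wages & Compensation and negative for Job Displacement and Error Rate.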

Governance

The paradigm rests on three governance primitives: (1) a layered identity architecture that separates a Manager Agent from multiple context-specific Identity Agents; the Manager Agent holds global knowledge but is architecturally isolated from external communication.

Architectural/design claim describing the proposed layered identity primitive (presentation of design; no empirical validation in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... identity architecture and information flow constraints

We propose a human-symbiotic agent paradigm in which each user owns a permanently bound agent system that collaborates on the owner's behalf, forming a network whose nodes are humans rather than agents.

Design proposal / conceptual architecture presented in the paper (no large-scale deployment or empirical evaluation described in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... structure of agent networks (human-centric vs agent-centric) and delegation mode...

The next frontier for AI agents lies not in stronger individual capability, but in the digitization of human collaborative relationships.

Normative/strategic claim advanced by the authors as the central thesis (conceptual argument, no empirical test reported).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... focus of AI-agent development (individual capability vs collaboration digitizati...

Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate.

Theoretical/argumentative claim presented as background motivation (conceptual reasoning, citation not provided in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... human productivity as mediated by social/organizational relationships

Given historical inequities in housing placement, it is crucial to audit LLM use in this context.

Authors' policy/recommendation motivated by historical inequities in housing placement and their empirical audit findings; presented as an argument in the report rather than a quantified experimental result.

high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... need for auditing LLMs (policy recommendation)

Leveraging LLMs to augment tabular classification with casenote summaries can safely incorporate additional text information with low implementation burden.

Authors' reported experiments and practical assessment on augmenting tabular classifiers with LLM-derived casenote summaries from a nonprofit outreach dataset; described as having low implementation burden and being safe to use. (No sample size given in abstract.)

high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... feasibility/safety of augmenting tabular models with LLM casenote summaries; imp...

A fine-tuned model augmented with casenote summaries can improve accuracy while reducing algorithmic fairness disparities on the housing placement multi-class classification task.

Empirical audit of LLM-based tabular classification on a real housing placement prediction task augmented with street outreach casenotes from a nonprofit partner; authors report multi-class classification experiments comparing fine-tuned models with and without casenote summaries and auditing error disparities across groups. (Sample size not stated in the abstract.)

high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... multi-class classification accuracy; classification error disparities across dem...

There is a positive relationship between disagreement among agents and trading volume in the simulated markets.

Observed correlation in the simulated open-call auction between measured disagreement (e.g., dispersion in beliefs) and trading volume; described as replicating classic experimental findings.

high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles relationship between disagreement (belief dispersion) and trading volume

These individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices.

Aggregation of simulated agent behavior in the open-call auction producing market-level time series; comparison of market dynamics to classic experimental benchmark (Smith et al., 1988) and reported finding that excess demand predicts future prices.

high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles predictive power of excess demand for future prices

AI agents form recency-weighted extrapolative beliefs (i.e., overweight recent price history when forecasting future prices).

Analysis of agents' forecasts and trading behavior in the simulated open-call auction populated by autonomous LLM agents; identification of extrapolative forecasting patterns reported as a main finding.

high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles recency-weighted extrapolative beliefs in price forecasts

AI agents exhibit a pronounced disposition effect.

Simulated open-call auction populated by autonomous LLM agents in experimental asset-market simulations; behavioral trading data showing agents' selling/holding patterns (paper describes this as a main documented finding).

high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles disposition effect (tendency to sell winners and hold losers)

We contribute design guidelines for specialized AI and articulate a vision for 'ecosystem-aware' Humble AI.

Paper's stated contributions (design guidelines and conceptual vision) described in the abstract.

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... design guidance / conceptual framework

Qualitatively, participants used AVA as a specialized 'evidence engine'; reasoned abstention clarified scope boundaries, and trust was calibrated through institutional provenance and page-anchored citations.

Qualitative findings from surveys and 20 interviews reported in the paper (participant quotations and thematic analysis implied in abstract).

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... user behavior and trust calibration (use as evidence engine; role of abstention ...

Difference-in-Differences estimates associate sustained engagement with 2.4-3.9 hours saved weekly.

Quantitative claim reported in the paper based on Difference-in-Differences analysis of usage/engagement data from the evaluation (implicit sample drawn from the >2,200 participants).

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... time saved per week
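
For readers unfamiliar with the method named in the claim above, a basic two-group, two-period Difference-in-Differences estimate can be sketched as follows. This is only the textbook form of the estimator, not the paper's specification, which the excerpt does not describe. The function name `did_estimate` and all input numbers are hypothetical and are not taken from the paper.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook 2x2 Difference-in-Differences on group means.

    Subtracts the control group's change from the treated group's change,
    so that a trend shared by both groups cancels out.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical mean weekly hours spent on a task (illustrative only).
change = did_estimate(
    treated_pre=10.0,
    treated_post=6.5,
    control_pre=10.0,
    control_post=9.5,
)
hours_saved = -change  # a negative change in hours spent is time saved
print(hours_saved)  # prints 3.0
```

The estimate is only as credible as the assumption that both groups would have followed the same trend without the intervention.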

AVA operationalizes epistemic humility through two mechanisms: citation verifiability (tracing claims to sources) and reasoned abstention (declining unsupported queries with justification and redirection).

Design claim describing implemented mechanisms in the platform; described in the paper as operational features.

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... epistemic humility operationalization (citation verifiability and reasoned abste...

AVA's multi-agent pipeline enables users to query and receive evidence-based syntheses.

System design and capability claim in the paper (description of multi-agent pipeline producing evidence-based syntheses).

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... output: evidence-based syntheses

AVA is a GenAI platform built on a curated library of over 4,000 World Bank Reports with multilingual capabilities.

System description provided in the paper; statement of dataset size and functionality (library count and multilingual support).

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... system corpus size / multilingual capability

The governance architecture (privacy implemented as physics rather than policy, founder-controlled class shares on non-negotiable architectural commitments) is inseparable from the product itself.

Normative and architectural argument in the paper tying governance design choices to product architecture (no empirical validation in this text).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... relationship between governance architecture and AI product architecture

Physics limits now constraining the model layer make the continuity layer newly consequential.

Analytical argument in the paper linking physical constraints on model scaling to increased importance of continuity (no empirical measurement included here).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... relative consequentiality of continuity given physics limits on model scaling

The paper proposes a four-layer development arc for continuity: from external SDK to hardware node to long-horizon human infrastructure.

Design/roadmap proposal described in the manuscript (no empirical testing provided here).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... proposed development pathway for continuity infrastructure

The engineering architecture for continuity is mapped to the theological pattern of kenosis and the symbolic pattern of Alpha and Omega, and the paper argues this mapping is structural rather than merely metaphorical.

Interpretive/mapping argument presented in the paper (theoretical/analogical reasoning).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... conceptual mapping between engineering architecture and symbolic/theological pat...

The paper describes a storage primitive called Decomposed Trace Convergence Memory whose write-time decomposition and read-time reconstruction produce the continuity property.

Design proposal in the manuscript outlining a storage primitive and its read/write behavior (no empirical validation reported here).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... ability of a storage primitive to produce continuity

Continuity is defined in the paper as a system property with seven required characteristics, distinct from memory and from retrieval.

Explicit definitional claim made in the manuscript (enumeration of seven characteristics described).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... conceptual definition/characterization of continuity

A companion paper (arXiv:2604.10981) positions the ATANT framework against existing memory, long-context, and agentic-memory benchmarks.

Citation to a companion paper that reportedly compares frameworks/benchmarks.

high positive The Continuity Layer: Why Intelligence Needs an Architecture... comparative positioning of evaluation frameworks

The formal evaluation framework for the property described here is the ATANT benchmark (arXiv:2604.06710), published separately with evaluation results on a 250-story corpus.

Citation to separate benchmark paper and reported evaluation on a 250-story corpus.

high positive The Continuity Layer: Why Intelligence Needs an Architecture... benchmarking/evaluation of continuity property

Engineering work to build the continuity layer has begun in public.

Statement in the paper asserting publicly visible engineering activity (no specific projects or quantitative audit included in this text).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... public engineering activity toward continuity layer

The continuity layer is the most consequential piece of infrastructure the field has not yet built.

Normative claim/argument in the position paper (no empirical test presented in this text).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... relative infrastructural importance in AI systems

The most important architectural problem in AI is not the size of the model but the absence of a layer that carries forward what the model has come to understand (a "continuity layer").

Position paper argument and conceptual reasoning in the manuscript (no empirical study reported).

high positive The Continuity Layer: Why Intelligence Needs an Architecture... existence/importance of a continuity layer in AI architecture

China leads initiatives of global governance (in AI).

Stated strategic observation in the paper's introduction (no empirical measures provided in the excerpt).

high positive Polarization and Integration in Global AI Research leadership in global AI governance initiatives

The United Kingdom and Germany have integrated exclusively with the US.

Analysis of cross-country collaboration and citation ties in the publication-based network, compared against random models, showing that the UK and Germany integrate exclusively with the US.

high positive Polarization and Integration in Global AI Research international research integration (collaboration/citation) of UK and Germany wi...

Illustrative welfare calculations suggest net gains in the tens of billions annually from the proposed policies/interventions.

Paper reports illustrative welfare calculations (not structural estimates) that yield an aggregate welfare figure described as 'net gains in the tens of billions annually'.

high positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... aggregate welfare gains (annual)

The policy section proposes 'Neutral Inference', a four-pillar conduct framework consisting of QoS parity, routing transparency, FRAND-style non-discrimination, and tier transparency with release-pathway discipline.

Normative policy proposal laid out in the paper's policy section.

high positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... regulatory/conduct framework (Neutral Inference) components

Under logit demand and symmetric rivals, the QoS gap is strictly increasing in inference-quality importance (alpha) and downstream margins.

Comparative statics derived from the analytical model (logit demand, symmetric rivals).

high positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... QoS gap

The main theoretical result provides an explicit local equilibrium characterization of the QoS gap under logit demand and symmetric rivals.

Analytical derivation in the formal game-theoretic model assuming logit demand and symmetric rivals; presented as the paper's main theoretical result.

high positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... QoS gap (equilibrium characterization)

An extension motivated by Anthropic's April 2026 release introduces a third mechanism, tier-based access discrimination, parameterized by a tier gap (tau) and partner-exclusivity (kappa).

Model extension in the paper explicitly adds parameters (tau, kappa) to represent tier-based access discrimination; motivated by a contemporaneous product release.

high positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... tier-based access discrimination (parameterized by tau and kappa)

The model isolates two foreclosure mechanisms operating without predatory pricing: quality-of-service (QoS) discrimination against downstream rivals (via latency, throughput, context limits, or feature access) and routing bias in assistant-layer interfaces.

Formal game-theoretic model developed in the paper; mechanisms are derived and described in model set-up and analysis.

high positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... presence of foreclosure mechanisms (QoS discrimination, routing bias)

As generative AI commercializes, competitive advantage is shifting from model training toward inference, distribution, and routing.

Framing/introductory assertion in the paper (conceptual argument, literature synthesis), not an empirical test.

high positive The Inference Bottleneck: A Formal Model of Vertical Foreclo... shift in source of competitive advantage (training -> inference/distribution/rou...

The model shows cooperative behaviour supported by reward-punishment schemes that discourage deviations.

Analysis of the learned strategies/behaviour of the simulated deep reinforcement learning agents showing emergence of cooperation enforced via reward-punishment mechanisms (as reported in the paper).

high positive Convergence to collusion in algorithmic pricing presence of cooperative behaviour and mechanisms (reward-punishment) that deter ...

A modern deep reinforcement learning model deployed to price goods in a repeated oligopolistic competition game with continuous prices converges to a collusive outcome in an amount of time that matches empirical observations (under reasonable assumptions on the length of a time step).

Simulation/experiment using a modern deep reinforcement learning model in a repeated oligopoly pricing game with continuous prices; claim that convergence time matches empirical observations. (No sample size, number of runs, or numerical convergence time provided in the excerpt.)

high positive Convergence to collusion in algorithmic pricing time to converge to a collusive pricing outcome

Previous research shows that [pricing] algorithms can exhibit collusive behaviour.

Citation/summary of prior literature (as stated in paper); no specific studies or sample sizes given in the excerpt.

high positive Convergence to collusion in algorithmic pricing occurrence of collusive behaviour by pricing algorithms

The study uses a combination of cognitive systems theory, diplomatic negotiation models, and empirical Human-in-the-Loop experiments as its methodological basis.

Methods description in the paper listing theoretical foundations and empirical HITL experiments as components of the study design.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... methodological approach (integration of theory and HITL experiments)

The paper outlines recommendations for international norm development, capacity building, and the creation of interoperable, transparent AI systems for diplomacy.

Policy recommendation section of the paper proposing international norms, capacity-building measures, and interoperable transparent system design.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... policy recommendations proposed (norm development, capacity building, interopera...

Experimental HITL data indicate a 17% reduction in cognitive bias for hybrid human-AI teams.

Human-in-the-Loop (HITL) experiments reported in the paper; comparison of cognitive bias measures between hybrid teams and baseline (sample size not provided in summary).

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... cognitive bias (reduction)

Experimental HITL data indicate that hybrid human-AI teams achieved 23% faster consensus-building.

Human-in-the-Loop (HITL) experiments reported in the paper; experimental comparison between hybrid human-AI teams and baseline (details on sample size not reported in summary).

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... time to consensus (consensus-building speed)

The framework is validated through real-world and simulated case studies, including UN ceasefire mediation, EU sentiment-monitoring for conflict diplomacy, and African Union peacekeeping planning.

Validation reported via a set of real-world and simulated case studies described in the paper (case study methodology; specific cases named).

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... case-study-based validation of framework applicability

Each layer augments a core dimension of diplomatic reasoning, enabling interpretable AI contributions, foresight analysis, culturally sensitive framing, and legally compliant outputs.

Conceptual mapping of each proposed layer to functional capabilities described in the paper; claimed alignment with interpretability, foresight, cultural framing, and legal compliance.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... interpretability, foresight analysis, culturally sensitive framing, legal compli...

The study proposes a five-layer Human-AI collaboration architecture tailored to multilateral diplomacy consisting of: (1) Context Modeling, (2) Scenario Generation, (3) Cognitive Interfacing, (4) Decision Support, and (5) Ethical-Normative Governance.

Architectural proposal in the paper based on synthesis of literature and design choices; claimed as the output of the conceptual framework.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... definition of five-layer architecture (components enumerated)

This paper develops the concept of Artificial Diplomacy as a structured interface between human strategic cognition and machine-supported reasoning.

Theoretical development drawing on cognitive systems theory and diplomatic negotiation models; presented in the paper as design description and conceptual argumentation.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... conceptualization of 'Artificial Diplomacy' (design of an interface)

Policymakers can reinforce these conditions by shifting from technology-neutral principles to auditable process standards that couple AI investment with reskilling and data-quality obligations.

Policy recommendation based on the study's findings and synthesis; presented as a normative implication rather than empirically tested within the study. (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... policy effectiveness in reinforcing safe, equitable AI adoption

Leaders should fund training coverage and design (not just headline hours), equip non-specialists to interpret model outputs, pair performance artefacts with participatory routines, and treat explainability as a usability requirement to achieve durable, auditable value in safety-critical energy contexts.

Prescriptive recommendation based on a 'field-tested playbook' synthesised from the multi-case qualitative study (interviews, surveys, documents). The claim is drawn from authors' interpretation of cross-case patterns rather than causal inference. (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... durable, auditable value / legitimacy and sustained use
