Evidence (13827 claims)
| Topic | Claims |
|---|---|
| Adoption | 8454 |
| Productivity | 7544 |
| Governance | 6789 |
| Human-AI Collaboration | 6327 |
| Org Design | 4126 |
| Innovation | 4058 |
| Labor Markets | 3520 |
| Skills & Training | 2924 |
| Inequality | 2057 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
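Direction shares can be recomputed from any row of the matrix. The following is a minimal sketch in Python using three rows copied from the table; the function names are illustrative and not part of the source. In some rows the four direction columns sum to less than Total (Firm Productivity lists 598 against a Total of 604). The source does not say where the remainder falls, so the sketch simply reports it as "unlisted".

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Inequality Measures": (44, 122, 49, 6, 221),
}


def direction_shares(positive, negative, mixed, null, total):
    """Return each direction's share of the row total, rounded to 3 places."""
    counts = {"positive": positive, "negative": negative,
              "mixed": mixed, "null": null}
    return {name: round(count / total, 3) for name, count in counts.items()}


def unlisted(positive, negative, mixed, null, total):
    """Claims counted in Total but in none of the four direction columns."""
    return total - (positive + negative + mixed + null)


for outcome, row in ROWS.items():
    print(outcome, direction_shares(*row), "unlisted:", unlisted(*row))
```

Dividing by Total rather than by the sum of the four columns keeps the shares comparable with the table as published; dividing by the column sum instead would make each row's shares add up to one.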
Knowledge workers become adversarial auditors rather than keystroke-producers.
Projected role-shift based on the verification-bottleneck thesis and interdisciplinary supporting arguments; no empirical longitudinal workforce study reported.
The central contribution identifies the verification bottleneck: as AI collapses production friction, the primary constraint shifts from generation to evaluation.
Theoretical argument supported by literature synthesis across multiple fields; no direct experimental quantification provided.
We contribute design guidelines for specialized AI and articulate a vision for 'ecosystem-aware' Humble AI.
Paper's stated contributions (design guidelines and conceptual vision) described in the abstract.
Qualitatively, participants used AVA as a specialized 'evidence engine'; reasoned abstention clarified scope boundaries, and trust was calibrated through institutional provenance and page-anchored citations.
Qualitative findings from surveys and 20 interviews reported in the paper (participant quotations and thematic analysis implied in abstract).
Difference-in-Differences estimates associate sustained engagement with 2.4-3.9 hours saved weekly.
Quantitative claim reported in the paper based on Difference-in-Differences analysis of usage/engagement data from the evaluation (implicit sample drawn from the >2,200 participants).
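For readers unfamiliar with the method, a two-group, two-period Difference-in-Differences estimate can be sketched as below. This is not the paper's specification, which is not described in this entry; the group labels and all numbers are hypothetical and chosen only to make the arithmetic visible.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: treated change minus control change."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly hours spent on a task (made-up values, not study data).
sustained_pre = [10.0, 12.0, 11.0]   # sustained-engagement users, before
sustained_post = [9.0, 11.0, 10.0]   # sustained-engagement users, after
other_pre = [10.0, 11.0, 12.0]       # comparison users, before
other_post = [10.5, 11.5, 12.5]      # comparison users, after

effect = did_estimate(sustained_pre, sustained_post, other_pre, other_post)
print(f"DiD estimate: {effect:+.2f} hours per week")
```

A negative estimate here means fewer hours spent, that is, time saved relative to the comparison group. The estimate is only interpretable as an effect if the two groups would have followed parallel trends absent the engagement.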
AVA operationalizes epistemic humility through two mechanisms: citation verifiability (tracing claims to sources) and reasoned abstention (declining unsupported queries with justification and redirection).
Design claim describing implemented mechanisms in the platform; described in the paper as operational features.
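The two mechanisms can be illustrated with a toy sketch: answer only when supporting passages exist and attach a page-anchored citation to each, otherwise decline with a reason and a redirection. Everything here is an assumption for illustration, including the `Passage` fields, the keyword match standing in for retrieval, and the response format; none of it is taken from AVA's implementation.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    report_id: str  # hypothetical identifier for a source report
    page: int       # page anchor used in the citation
    text: str


def answer_or_abstain(query_terms, passages, min_support=1):
    """Return page-anchored citations, or abstain with a reason and redirect."""
    # Keyword containment is a stand-in for a real retrieval step.
    supporting = [
        p for p in passages
        if all(term.lower() in p.text.lower() for term in query_terms)
    ]
    if len(supporting) < min_support:
        return {
            "status": "abstain",
            "reason": "No passage in the library supports this query.",
            "redirect": "Try a narrower question or consult another source.",
        }
    return {
        "status": "answer",
        "citations": [f"{p.report_id}, p. {p.page}" for p in supporting],
    }


library = [Passage("report-A", 14, "Poverty rates fell over the period.")]
print(answer_or_abstain(["poverty"], library))
print(answer_or_abstain(["quantum"], library))
```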
AVA's multi-agent pipeline enables users to query and receive evidence-based syntheses.
System design and capability claim in the paper (description of multi-agent pipeline producing evidence-based syntheses).
AVA is a GenAI platform built on a curated library of over 4,000 World Bank Reports with multilingual capabilities.
System description provided in the paper; statement of dataset size and functionality (library count and multilingual support).
The governance architecture (privacy implemented as physics rather than policy, founder-controlled class shares on non-negotiable architectural commitments) is inseparable from the product itself.
Normative and architectural argument in the paper tying governance design choices to product architecture (no empirical validation in this text).
Physics limits now constraining the model layer make the continuity layer newly consequential.
Analytical argument in the paper linking physical constraints on model scaling to increased importance of continuity (no empirical measurement included here).
The paper proposes a four-layer development arc for continuity: from external SDK to hardware node to long-horizon human infrastructure.
Design/roadmap proposal described in the manuscript (no empirical testing provided here).
The engineering architecture for continuity is mapped to the theological pattern of kenosis and the symbolic pattern of Alpha and Omega, and the paper argues this mapping is structural rather than merely metaphorical.
Interpretive/mapping argument presented in the paper (theoretical/analogical reasoning).
The paper describes a storage primitive called Decomposed Trace Convergence Memory whose write-time decomposition and read-time reconstruction produce the continuity property.
Design proposal in the manuscript outlining a storage primitive and its read/write behavior (no empirical validation reported here).
Continuity is defined in the paper as a system property with seven required characteristics, distinct from memory and from retrieval.
Explicit definitional claim made in the manuscript (enumeration of seven characteristics described).
A companion paper (arXiv:2604.10981) positions the ATANT framework against existing memory, long-context, and agentic-memory benchmarks.
Citation to a companion paper that reportedly compares frameworks/benchmarks.
The formal evaluation framework for the property described here is the ATANT benchmark (arXiv:2604.06710), published separately with evaluation results on a 250-story corpus.
Citation to separate benchmark paper and reported evaluation on a 250-story corpus.
Engineering work to build the continuity layer has begun in public.
Statement in the paper asserting publicly visible engineering activity (no specific projects or quantitative audit included in this text).
The continuity layer is the most consequential piece of infrastructure the field has not yet built.
Normative claim/argument in the position paper (no empirical test presented in this text).
The most important architectural problem in AI is not the size of the model but the absence of a layer that carries forward what the model has come to understand (a "continuity layer").
Position paper argument and conceptual reasoning in the manuscript (no empirical study reported).
Code-generating Artificial Intelligence has gained popularity within both professional and educational programming settings over the past several years.
Background statement in the paper's introduction (observational claim about recent trends in AI adoption).
Working with a human teammate had a significantly more positive and arousing emotional effect than working with Copilot.
Subjective emotion measures (valence/arousal) collected in the study; reported significant differences favoring human teammate on positivity and arousal (n=22).
Several dimensions of participants' workload were significantly reduced when using GitHub Copilot.
Subjective workload measures collected during the experiment; multiple workload dimensions reported as significantly lower in the Copilot condition (n=22).
Participants performed significantly better with GitHub Copilot than with their human teammate.
Experimental comparison of task performance between Copilot-assisted individual condition and human pair condition; statistical significance reported in results (sample size n=22).
China leads global governance initiatives in AI.
Stated strategic observation in the paper's introduction (no empirical measures provided in the excerpt).
The United Kingdom and Germany have integrated exclusively with the US.
Analysis of cross-country collaboration and citation ties in a publication-based network, compared against random models, showing that the UK and Germany are integrated exclusively with the US.
Illustrative welfare calculations suggest net gains in the tens of billions annually from the proposed policies/interventions.
The paper reports illustrative welfare calculations (not structural estimates) that yield an aggregate figure described as 'net gains in the tens of billions annually'.
The policy section proposes 'Neutral Inference', a four-pillar conduct framework consisting of QoS parity, routing transparency, FRAND-style non-discrimination, and tier transparency with release-pathway discipline.
Normative policy proposal laid out in the paper's policy section.
Under logit demand and symmetric rivals, the QoS gap is strictly increasing in inference-quality importance (alpha) and downstream margins.
Comparative statics derived from the analytical model (logit demand, symmetric rivals).
The main theoretical result provides an explicit local equilibrium characterization of the QoS gap under logit demand and symmetric rivals.
Analytical derivation in the formal game-theoretic model assuming logit demand and symmetric rivals; presented as the paper's main theoretical result.
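The demand-side intuition behind the two logit results above can be sketched numerically. This is not the paper's model or its equilibrium characterization: the utility form u_i = alpha * q_i - p_i, the absence of an outside option, and all parameter values are assumptions made for illustration. The sketch shows only that, with two otherwise symmetric firms, a given quality degradation imposed on the rival diverts more share when alpha is larger.

```python
import math


def logit_shares(qualities, prices, alpha):
    """Multinomial-logit shares with assumed utility u_i = alpha*q_i - p_i."""
    utilities = [alpha * q - p for q, p in zip(qualities, prices)]
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def share_loss_from_degradation(alpha, gap, base_quality=1.0, price=1.0):
    """Share the rival loses when its quality falls by `gap`."""
    symmetric = [base_quality, base_quality]
    degraded = [base_quality, base_quality - gap]
    before = logit_shares(symmetric, [price, price], alpha)[1]
    after = logit_shares(degraded, [price, price], alpha)[1]
    return before - after


# Hypothetical values of alpha; the gap of 0.5 is also arbitrary.
for alpha in (0.5, 1.0, 2.0):
    loss = share_loss_from_degradation(alpha, gap=0.5)
    print(f"alpha={alpha}: rival share loss = {loss:.4f}")
```

The share diverted per unit of degradation rising with alpha is consistent with the direction of the stated comparative static, but the equilibrium QoS gap itself depends on costs and margins that this sketch does not model.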
An extension motivated by Anthropic's April 2026 release introduces a third mechanism, tier-based access discrimination, parameterized by a tier gap (tau) and partner-exclusivity (kappa).
Model extension in the paper explicitly adds parameters (tau, kappa) to represent tier-based access discrimination; motivated by a contemporaneous product release.
The model isolates two foreclosure mechanisms operating without predatory pricing: quality-of-service (QoS) discrimination against downstream rivals (via latency, throughput, context limits, or feature access) and routing bias in assistant-layer interfaces.
Formal game-theoretic model developed in the paper; mechanisms are derived and described in model set-up and analysis.
As generative AI commercializes, competitive advantage is shifting from model training toward inference, distribution, and routing.
Framing/introductory assertion in the paper (conceptual argument, literature synthesis), not an empirical test.
Evaluation demonstrates speed improvements of 6-7 minutes over traditional methods.
Reported empirical timing result in paper abstract: 6-7 minutes (presumably time to validate a change) compared to traditional methods (no further detail or sample size in abstract).
Evaluation demonstrates diagnostic coverage of 92-96%.
Reported empirical range in paper abstract (92-96% diagnostic coverage over evaluated cases; specific n not provided in abstract).
Evaluation demonstrates promising results in error detection (100%).
Reported empirical result in paper abstract: 100% error detection over evaluated scenarios (no sample size given in abstract).
By orchestrating agent collaboration atop this digital twin, Aether enables automated, rapid network change validation while reducing manual effort, minimizing errors, and improving operational agility and cost-effectiveness.
High-level claim supported by system design and subsequent empirical evaluation reported in paper (evaluation details referenced in abstract).
Aether agents use a unified Network Digital Twin integrating modeling, simulation, and emulation to maintain a consistent, up-to-date network view for verification and testing.
Design claim describing the digital twin's capabilities (modeling, simulation, emulation) as part of the system; presented in paper text.
Aether features an agentic architecture with five specialized Network Operations AI agents that collaboratively handle the change validation lifecycle from intent analysis to network verification and testing.
System architecture claim in paper describing five specialized agents (design specification; no empirical sample size).
Aether integrates Generative Agentic AI with a multi-functional Network Digital Twin to automate and streamline network change validation workflows.
Paper describes Aether system design and architecture combining agentic AI and a digital twin (design-level claim; architectural description).
To mitigate the curse of dimensionality in HRL, the paper introduces a capacity-aware state–action encoding mechanism that compresses the control interface into structured summary signals.
Methodological contribution described in the paper: proposed encoding mechanism intended to reduce state-action dimensionality and simplify the control interface.
The model shows cooperative behaviour supported by reward-punishment schemes that discourage deviations.
Analysis of the learned strategies/behaviour of the simulated deep reinforcement learning agents showing emergence of cooperation enforced via reward-punishment mechanisms (as reported in the paper).
A modern deep reinforcement learning model deployed to price goods in a repeated oligopolistic competition game with continuous prices converges to a collusive outcome in an amount of time that matches empirical observations (under reasonable assumptions on the length of a time step).
Simulation/experiment using a modern deep reinforcement learning model in a repeated oligopoly pricing game with continuous prices; claim that convergence time matches empirical observations. (No sample size, number of runs, or numerical convergence time provided in the excerpt.)
Previous research shows that [pricing] algorithms can exhibit collusive behaviour.
Citation/summary of prior literature (as stated in paper); no specific studies or sample sizes given in the excerpt.
A common response to these worries stresses that the goods derived from work can be found elsewhere, often in better activities, suggesting that the proliferation of AI-powered automation does not threaten the meaningfulness of people’s lives.
Description of a commonly offered counterargument in the literature and popular debate (conceptual/literature-summary; no empirical data or sample reported).
Intelligent textile technologies can effectively enhance operational efficiency in the textile industry's supply chain.
Overall result statement summarizing pilot study outcomes (inventory turnover, order fulfillment, cost control) as evidence; no numeric aggregate efficiency measure or sample size provided in the excerpt.
Intelligent textile technologies can effectively enhance supply chain transparency.
Conclusion based on the pilot study and the inclusion of blockchain-based data sharing in the model; no empirical transparency metrics or sample size reported in the provided text.
Intelligent textile technologies can effectively enhance supply chain collaboration.
Conclusion drawn from the pilot study reported in the paper; no quantitative measures of collaboration or supporting statistics provided in the supplied text.
A pilot study demonstrates significant improvements in customer satisfaction.
Reported pilot study in the paper; no details on how customer satisfaction was measured, sample size, or effect size are provided in the supplied text.
A pilot study demonstrates significant improvements in cost control.
Reported pilot study in the paper; the summary does not provide numerical cost reductions or sample size.
A pilot study demonstrates significant improvements in order fulfillment efficiency.
Reported pilot study in the paper; no sample size, quantitative metrics, or statistical tests reported in the provided text.