Evidence (4189 claims)
Adoption (8625 claims)
Productivity (7686 claims)
Governance (6917 claims)
Human-AI Collaboration (6574 claims)
Org Design (4189 claims)
Innovation (4131 claims)
Labor Markets (3588 claims)
Skills & Training (2985 claims)
Inequality (2066 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
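The matrix can be read row by row as shares of the Total column. The sketch below copies three rows verbatim from the table; the `summarize` helper and its field names are illustrative assumptions, not part of the source. It also reports how many claims in a row are not covered by the four listed directions, since in some rows the Total exceeds the sum of the Positive, Negative, Mixed, and Null columns.

```python
# Rows copied from the matrix: (positive, negative, mixed, null, total).
rows = {
    "Firm Productivity": (439, 57, 88, 20, 610),
    "Job Displacement": (12, 81, 21, 1, 115),
    "Skill Obsolescence": (5, 47, 6, 1, 59),
}


def summarize(counts):
    """Return direction shares relative to the Total column (hypothetical helper)."""
    positive, negative, mixed, null, total = counts
    listed = positive + negative + mixed + null
    return {
        "positive_share": positive / total,
        "negative_share": negative / total,
        # Claims counted in Total but not in any of the four listed directions.
        "unlisted": total - listed,
    }


for outcome, counts in rows.items():
    print(outcome, summarize(counts))
```

Shares are computed against the Total column rather than the row sum so that they stay comparable with the totals shown in the table.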
Org Design
The three-axis framework provides a systematic way of diagnosing why misalignment arises in real-world systems and clarifies that alignment cannot be treated as a single technical property of models; it is instead an outcome shaped by how objectives are specified, how information is distributed, and whose interests count in practice.
Conceptual argument and analytic claim about the explanatory utility of the proposed framework (theoretical demonstration; no empirical tests reported).
Misalignment can be reconceptualised as arising along three interacting axes: objectives, information, and principals (drawing on the principal–agent framework).
Theoretical framing using the principal–agent framework; conceptual decomposition proposed in the paper (no empirical validation reported).
The alignment problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but whether it is aligned enough, for whom, and at what cost.
Normative and conceptual argument presented by the author proposing a governance-focused reconceptualization (theoretical analysis; no empirical data).
Statelessness is the load-bearing property explaining enterprises' preference for weaker but replayable retrieval pipelines, and DPM demonstrates this property is attainable without the decisioning penalty retrieval pays.
Synthesis/conclusion based on theoretical argument and empirical results presented (architectural analysis + experiments showing DPM performance and auditability).
The audit surface follows the same one-versus-N pattern: DPM logs two LLM calls per decision while summarization logs 83-97 on LongHorizon-Bench.
Empirical measurement on LongHorizon-Bench reported in the paper: logged LLM calls per decision are 2 for DPM vs 83-97 for summarization.
DPM is additionally 7-15x faster at binding budgets, making one LLM call at decision time instead of N.
Empirical runtime/efficiency measurement reported in the paper (range 7-15x speedup) comparing number of LLM calls and latency under tight memory budgets.
At a 20x compression ratio, DPM improves reasoning coherence by +0.53 (Cohen's h=1.13, p=0.0034) compared to summarization-based memory (paired permutation, n=10).
Paired permutation test over 10 cases at a 20x compression ratio; reported effect +0.53 with Cohen's h=1.13 and p=0.0034.
At a 20x compression ratio, DPM improves factual precision by +0.52 (Cohen's h=1.17, p=0.0014) compared to summarization-based memory (paired permutation, n=10).
Paired permutation test over 10 cases at a 20x compression ratio; reported effect +0.52 with Cohen's h=1.17 and p=0.0014.
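For readers unfamiliar with the two statistics quoted above, the following is a minimal sketch of how they are commonly computed. It assumes the scores are proportions (Cohen's h is defined on proportions; the source does not state how the scores are scaled) and uses an exact sign-flip test as one standard form of paired permutation test. The function names are illustrative, and the paper's exact procedure may differ.

```python
import math
from itertools import product


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def paired_permutation_p(diffs):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis each paired difference is equally likely to be
    positive or negative, so every sign assignment is enumerated.
    """
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # tolerance for floating-point sums
            extreme += 1
    return extreme / total
```

With n=10 paired cases the exact test enumerates 2^10 = 1024 sign assignments, which is small enough to compute without Monte Carlo sampling.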
On ten regulated decisioning cases at three memory budgets, DPM matches summarization-based memory at generous budgets and substantially outperforms it when the budget binds.
Empirical evaluation on 10 decisioning cases across three memory budgets; comparison between DPM and summarization-based memory as reported in the paper (n=10).
We propose Deterministic Projection Memory (DPM): an append-only event log plus one task-conditioned projection at decision time.
Method/architectural proposal described in the paper.
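The shape of this proposal can be illustrated with a small sketch: an append-only log that is never rewritten, plus a projection computed only when a decision is needed. Everything below is an assumption made for illustration. In particular, the keyword filter in `project` is a hypothetical stand-in for the paper's task-conditioned projection, and `Event`, `EventLog`, and `budget_chars` are invented names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    seq: int   # position in the log, assigned at append time
    text: str  # raw event content, never edited afterwards


class EventLog:
    """Append-only log: events can be added but not changed or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, text: str) -> Event:
        event = Event(seq=len(self._events), text=text)
        self._events.append(event)
        return event

    def snapshot(self) -> Tuple[Event, ...]:
        # An immutable copy, so callers cannot alter history.
        return tuple(self._events)


def project(events: Sequence[Event], task_keywords: Sequence[str],
            budget_chars: int) -> Tuple[Event, ...]:
    """Stand-in projection: keep task-relevant events, in log order, within a budget."""
    selected = []
    used = 0
    for event in events:
        relevant = any(k.lower() in event.text.lower() for k in task_keywords)
        if not relevant:
            continue
        if used + len(event.text) > budget_chars:
            break  # budget binds: stop at the first relevant event that does not fit
        selected.append(event)
        used += len(event.text)
    return tuple(selected)
```

Because the log is immutable and the stand-in projection is a pure function of the log, the task, and the budget, replaying the same inputs yields the same view, which is the replayability property the claims above emphasize.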
Under these conditions (alignment of forces and AI-driven ideation cost reductions), PIM offers a framework for organising governed discovery in real time and provides the methodological foundation for later applied work.
The paper presents PIM as a proposed framework and positioning statement for future applied research and implementations (theoretical proposal; no applied trials reported).
Organised attacks on complex problems can generate an epistemic mode transition: a shift from predominantly Knightian uncertainty toward probabilistically characterisable innovation dynamics as relevant structures become more visible, decomposed, coordinated, and testable.
The paper states and formalises this methodological claim within PIM as a central proposition (theoretical argumentation; no empirical validation reported).
When problem-relevant causal, informational, and coordinative forces become sufficiently aligned, the epistemic character of search changes and open-ended uncertainty can be progressively transformed into structured probabilistic search.
The claim is presented as the central theoretical argument and formalised within the PIM conceptual framework (theoretical/model-based argumentation; no empirical sample).
The proposed framework is intended to serve as a practical reference for engineering teams and decision-makers navigating enterprise LLM adoption.
Author statement of intent in the paper (qualitative claim about intended audience and utility).
The buy-versus-build decision should be viewed as a phased continuum: initial API adoption can give way to hybrid architectures as organizational maturity and requirements evolve.
Conceptual argument in the paper, illustrated by the Bills Converter experience (single-case narrative recommending phased/hybrid progression).
In the end-to-end development of the Bills Converter, the authors chose a closed-source, API-based approach over self-hosted or custom-built alternatives.
Case study: the Bills Converter system (single end-to-end project described in the paper).
This paper presents a multi-dimensional decision framework that synthesizes technical, financial, and strategic considerations into a coherent evaluation methodology for enterprise LLM adoption.
The paper is explicitly framed as presenting a decision framework; supported by conceptual synthesis and exposition within the manuscript (no reported quantitative validation).
ClawNet enables multiple users to collaborate securely through their respective agents.
Capability claim about the instantiated system (authors assert that ClawNet enables secure multi-user collaboration; excerpt contains no empirical security evaluation or user study).
We instantiate this paradigm in ClawNet, an identity-governed agent collaboration framework that enforces identity binding and authorization verification through a central orchestrator.
Implementation claim: authors state they built ClawNet as an instantiation of their paradigm (paper describes framework/architecture; no experimental evaluation included in excerpt).
Action-level accountability logs every operation against its owner's identity and authorization, ensuring full auditability.
Design claim describing an accountability primitive (paper asserts logging and auditability as a property; no audit or verification evidence shown in excerpt).
Scoped authorization enforces per-identity access control and escalates boundary violations to the owner.
Design/specification claim describing the scoped authorization governance primitive in the proposed paradigm (no empirical or security evaluation provided in excerpt).
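The two primitives described above, scoped authorization and action-level accountability, can be sketched together. This is a minimal illustration under stated assumptions, not ClawNet's implementation: the `Orchestrator` class, its fields, and the `perform` method are hypothetical names, and escalation is modeled simply as appending to a list the owner would review.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class AuditRecord:
    owner: str      # human owner the identity is bound to
    identity: str   # identity agent that attempted the action
    action: str     # operation that was attempted
    allowed: bool   # result of the authorization check


@dataclass
class Orchestrator:
    owners: Dict[str, str]       # identity -> owner (identity binding)
    scopes: Dict[str, Set[str]]  # identity -> actions it may perform
    audit_log: List[AuditRecord] = field(default_factory=list)
    escalations: List[AuditRecord] = field(default_factory=list)

    def perform(self, identity: str, action: str) -> bool:
        owner = self.owners[identity]  # unbound identities raise KeyError
        allowed = action in self.scopes.get(identity, set())
        record = AuditRecord(owner, identity, action, allowed)
        self.audit_log.append(record)  # every attempt is logged, allowed or not
        if not allowed:
            self.escalations.append(record)  # boundary violation goes to the owner
        return allowed
```

Logging denied attempts as well as permitted ones is a design choice in this sketch: it keeps the audit trail complete even when an action is blocked.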
The paradigm rests on three governance primitives: (1) a layered identity architecture that separates a Manager Agent from multiple context-specific Identity Agents; the Manager Agent holds global knowledge but is architecturally isolated from external communication.
Architectural/design claim describing the proposed layered identity primitive (presentation of design; no empirical validation in excerpt).
We propose a human-symbiotic agent paradigm in which each user owns a permanently bound agent system that collaborates on the owner's behalf, forming a network whose nodes are humans rather than agents.
Design proposal / conceptual architecture presented in the paper (no large-scale deployment or empirical evaluation described in excerpt).
The next frontier for AI agents lies not in stronger individual capability, but in the digitization of human collaborative relationships.
Normative/strategic claim advanced by the authors as the central thesis (conceptual argument, no empirical test reported).
Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate.
Theoretical/argumentative claim presented as background motivation (conceptual reasoning, citation not provided in excerpt).
We propose seven interface primitives operationalizing verification-centered HCI.
Design contribution: specification of seven interface primitives within the paper (conceptual/design proposal); no user-study or empirical validation reported.
We map synthetic literacy (oral input generating literate output) as the defining feature of this transition.
Conceptual mapping and theoretical framing within the paper; supported by examples from technology trends but no empirical evaluation reported.
Knowledge workers become adversarial auditors rather than keystroke-producers.
Projected role-shift based on the verification-bottleneck thesis and interdisciplinary supporting arguments; no empirical longitudinal workforce study reported.
The central contribution identifies the verification bottleneck: as AI collapses production friction, the primary constraint shifts from generation to evaluation.
Theoretical argument supported by literature synthesis across multiple fields; no direct experimental quantification provided.
The governance architecture (privacy implemented as physics rather than policy, founder-controlled class shares on non-negotiable architectural commitments) is inseparable from the product itself.
Normative and architectural argument in the paper tying governance design choices to product architecture (no empirical validation in this text).
Physics limits now constraining the model layer make the continuity layer newly consequential.
Analytical argument in the paper linking physical constraints on model scaling to increased importance of continuity (no empirical measurement included here).
The paper proposes a four-layer development arc for continuity: from external SDK to hardware node to long-horizon human infrastructure.
Design/roadmap proposal described in the manuscript (no empirical testing provided here).
The engineering architecture for continuity is mapped to the theological pattern of kenosis and the symbolic pattern of Alpha and Omega, and the paper argues this mapping is structural rather than merely metaphorical.
Interpretive/mapping argument presented in the paper (theoretical/analogical reasoning).
The paper describes a storage primitive called Decomposed Trace Convergence Memory whose write-time decomposition and read-time reconstruction produce the continuity property.
Design proposal in the manuscript outlining a storage primitive and its read/write behavior (no empirical validation reported here).
Continuity is defined in the paper as a system property with seven required characteristics, distinct from memory and from retrieval.
Explicit definitional claim made in the manuscript (enumeration of seven characteristics described).
A companion paper (arXiv:2604.10981) positions the ATANT framework against existing memory, long-context, and agentic-memory benchmarks.
Citation to a companion paper that reportedly compares frameworks/benchmarks.
The formal evaluation framework for the property described here is the ATANT benchmark (arXiv:2604.06710), published separately with evaluation results on a 250-story corpus.
Citation to separate benchmark paper and reported evaluation on a 250-story corpus.
Engineering work to build the continuity layer has begun in public.
Statement in the paper asserting publicly visible engineering activity (no specific projects or quantitative audit included in this text).
The continuity layer is the most consequential piece of infrastructure the field has not yet built.
Normative claim/argument in the position paper (no empirical test presented in this text).
The most important architectural problem in AI is not the size of the model but the absence of a layer that carries forward what the model has come to understand (a "continuity layer").
Position paper argument and conceptual reasoning in the manuscript (no empirical study reported).
The study uses a combination of cognitive systems theory, diplomatic negotiation models, and empirical Human-in-the-Loop experiments as its methodological basis.
Methods description in the paper listing theoretical foundations and empirical HITL experiments as components of the study design.
The paper outlines recommendations for international norm development, capacity building, and the creation of interoperable, transparent AI systems for diplomacy.
Policy recommendation section of the paper proposing international norms, capacity-building measures, and interoperable transparent system design.
Experimental HITL data indicate a 17% reduction in cognitive bias for hybrid human-AI teams.
Human-in-the-Loop (HITL) experiments reported in the paper; comparison of cognitive bias measures between hybrid teams and baseline (sample size not provided in summary).
Experimental HITL data indicate that hybrid human-AI teams achieved 23% faster consensus-building.
Human-in-the-Loop (HITL) experiments reported in the paper; experimental comparison between hybrid human-AI teams and baseline (details on sample size not reported in summary).
The framework is validated through real-world and simulated case studies, including UN ceasefire mediation, EU sentiment-monitoring for conflict diplomacy, and African Union peacekeeping planning.
Validation reported via a set of real-world and simulated case studies described in the paper (case study methodology; specific cases named).
Each layer augments a core dimension of diplomatic reasoning, enabling interpretable AI contributions, foresight analysis, culturally sensitive framing, and legally compliant outputs.
Conceptual mapping of each proposed layer to functional capabilities described in the paper; claimed alignment with interpretability, foresight, cultural framing, and legal compliance.
The study proposes a five-layer Human-AI collaboration architecture tailored to multilateral diplomacy consisting of: (1) Context Modeling, (2) Scenario Generation, (3) Cognitive Interfacing, (4) Decision Support, and (5) Ethical-Normative Governance.
Architectural proposal in the paper based on synthesis of literature and design choices; claimed as the output of the conceptual framework.
This paper develops the concept of Artificial Diplomacy as a structured interface between human strategic cognition and machine-supported reasoning.
Theoretical development drawing on cognitive systems theory and diplomatic negotiation models; described design and conceptual argumentation in the paper.
Policymakers can reinforce these conditions by shifting from technology-neutral principles to auditable process standards that couple AI investment with reskilling and data-quality obligations.
Policy recommendation based on the study's findings and synthesis; presented as a normative implication rather than empirically tested within the study. (Sample size not reported.)
Leaders should fund training coverage and design (not just headline hours), equip non-specialists to interpret model outputs, pair performance artefacts with participatory routines, and treat explainability as a usability requirement to achieve durable, auditable value in safety-critical energy contexts.
Prescriptive recommendation based on a 'field-tested playbook' synthesised from the multi-case qualitative study (interviews, surveys, documents). The claim is drawn from authors' interpretation of cross-case patterns rather than causal inference. (Sample size not reported.)