Evidence (4114 claims)
| Topic | Claims |
|---|---|
| Adoption | 8570 |
| Productivity | 7631 |
| Governance | 6869 |
| Human-AI Collaboration | 6491 |
| Org Design | 4175 |
| Innovation | 4114 |
| Labor Markets | 3566 |
| Skills & Training | 2966 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Innovation
AutoScientists is a decentralized team of AI agents that interpret a shared experimental state, self-organize into teams around promising hypotheses, critique proposals before using experimental compute, and share successes and failures to reduce redundant exploration.
System design and implementation described in the paper (architecture and agent protocols); qualitative description of agent behaviors and coordination mechanisms; demonstrated in experiments.
GENESIS is built on three composable primitives (agents, skills, hooks) and a knowledge layer (SYNAPSE) that doubles as the source of ground truth and the recipient of every artifact the framework produces, making capabilities compound across runs.
Architectural description in the paper; the claim that the knowledge base acts as ground truth and enables capability compounding is design-level. No quantitative evaluation is given in the abstract.
GENESIS is an agentic AI framework that converts intents (e.g., a specification clause, a telemetry anomaly, or a research hypothesis) into solutions validated with over-the-air experiments, fed back into a persistent knowledge base.
System design / implementation claim presented in the paper (description of proposed framework). The abstract does not report empirical evaluation metrics or sample size.
Large Language Models (LLMs) have compressed comparable R&D work in general software engineering from days to minutes.
Paper's stated comparison/claim (likely based on prior reports or authors' experience); no experimental details or sample size provided in the abstract.
Operational reasoning paradigms such as ReasonOps may become foundational infrastructure for next-generation trustworthy AI ecosystems.
Author's forward-looking argument / conjecture about the potential future impact and adoption of operational reasoning paradigms; presented as an argument rather than demonstrated empirically in the excerpt.
The paper presents the ReasonOps architecture, demonstrates its workflow using an autonomous braking system analysis example, and discusses its potential role in future safety-critical autonomous AI systems.
Author statement about the paper's content and demonstration (explicitly claims an architecture and an example walkthrough); evidence is the paper's own descriptive content.
The proposed paradigm integrates semantic interpretation, autoformalization, symbolic reasoning, theorem proving, runtime assurance, probabilistic reliability estimation, and adaptive correction into a unified reasoning lifecycle.
Author claim about the architecture and components of ReasonOps; presented as a proposed integrated lifecycle in the paper (no empirical evaluation reported in excerpt).
ReasonOps treats reasoning as a continuously monitored, verifiable, reliability-aware operational process rather than an isolated inference task.
Author description of the ReasonOps paradigm and its operational stance (conceptual framework described in paper).
This paper introduces ReasonOps, a unified operational paradigm for trustworthy verified reasoning systems.
Declarative claim about the paper's contribution (introduction of a named paradigm); supported by the paper itself (architectural description and example claimed).
Recent advances in theorem proving, autoformalization, symbolic reasoning, and tool-augmented language models demonstrate substantial progress toward machine-assisted formal reasoning.
Author statement citing multiple research directions (theorem proving, autoformalization, symbolic reasoning, tool-augmented LMs); no specific empirical results or quantitative studies provided in excerpt.
Large Language Models (LLMs) have transformed artificial intelligence from primarily generative systems into increasingly capable reasoning agents.
Author assertion in paper's introduction; conceptual argument referencing recent developments in LLMs (no empirical study or sample size reported in text excerpt).
There exists a data supply chain that runs from individual translators through language service providers (LSPs) and platforms to model developers.
Mapping and descriptive analysis of industry supply chains and intermediary roles provided in the paper; conceptual and empirical examples of flows of translation data from translators to model developers. No numerical sample reported.
Article 30-4 of the Japanese Copyright Act legitimates a mode of use the paper terms 'appropriation without consumption'—i.e., mining works for statistical features rather than reading or experiencing them.
Textual/legal analysis of Article 30-4 of the Japanese Copyright Act and its interpretation; comparative legal reading presented in the paper. No numerical sample reported.
The development of statistical machine translation (SMT), neural machine translation (NMT), the Transformer architecture, and multilingual large language models (LLMs) cannot be disentangled from the accumulation of translation data (TM/parallel corpora).
Historical and technical literature review linking MT/NLP methodological advances to the availability and use of parallel corpora and TM; comparative analysis of model development histories described in the paper. No numerical sample reported.
Translation memories (TM) and parallel corpora preserve a one-to-one correspondence between source and target text and therefore constitute extraordinarily valuable supervised training data for machine translation.
Conceptual argument and literature review of machine translation practice (discussion of TM/parallel corpora as supervised training data); examples and descriptive evidence from MT research and industry practice presented in the paper. No numerical sample reported.
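A translation memory stores each source segment alongside its aligned translation, which is why it maps so directly onto supervised training data. As a minimal sketch, assuming a TMX 1.4-style file in which each `<tu>` holds one `<tuv>` per language tagged with `xml:lang`, Python's standard `xml.etree.ElementTree` is enough to turn that one-to-one alignment into (source, target) pairs. The sample content and the `tm_to_pairs` helper are hypothetical, not taken from the paper.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the standard xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TMX 1.4-style sample (English -> French).
TMX_SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" o-tmf="none" creationtool="example" creationtoolversion="0"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the window.</seg></tuv>
      <tuv xml:lang="fr"><seg>Fermez la fenêtre.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tm_to_pairs(tmx_text, src="en", tgt="fr"):
    """Hypothetical helper: one (source, target) pair per translation unit."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            lang = tuv.get(XML_LANG, "")
            if seg is not None:
                # Normalise region tags such as "en-US" to "en" for matching.
                segs[lang.split("-")[0].lower()] = "".join(seg.itertext())
        # Keep only units that carry both sides of the alignment.
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


for source, target in tm_to_pairs(TMX_SAMPLE):
    print(source, "->", target)
```

Units missing either language are skipped, so every emitted pair is a complete supervised example.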
To balance promotion of innovation with preservation of human creativity, it is essential to revise existing laws and introduce novel approaches such as defining a specific intellectual property right for AI-generated works or designating ownership among associated human agents.
Normative recommendation derived from the paper's comparative legal analysis and discussion of enforcement challenges (no empirical sample size).
Artificial intelligence systems are capable of autonomously generating artistic, literary, musical works, and even inventions without direct human intervention.
Stated as part of the paper's premise and supported by the paper's literature/theoretical review of advances in AI creative and inventive capabilities (no empirical sample size reported).
The proposed policy framework contributes to establishing a foundation for Vietnam to proactively embrace the Agent Economy safely and effectively.
Claim in abstract about the intended contribution/impact of the proposed framework; no empirical evaluation or measured outcomes presented.
The Agent Economy promises substantial gains in productivity and innovation.
Asserted in paper abstract as an anticipated outcome; no empirical measurement, sample size, or quantified effect provided.
Adoption of Claude Code increases the cumulative number of languages a developer has used over their lifetime by 0.51.
Panel analysis of 5,838 developers over 28 months using the Callaway & Sant'Anna estimator; treatment = first Claude-co-authored commit.
Adoption of Claude Code increases the number of newly used languages by 0.31.
Same dataset and staggered-rollout estimator (Callaway & Sant'Anna), treatment = first Claude-co-authored commit; not-yet-treated controls.
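The exact outcome definitions behind these estimates are not spelled out here. One plausible reading, assumed for illustration only, treats a developer's history as one set of languages per month: "newly used" languages are those not seen in any earlier month, and the cumulative lifetime count is the size of the running union. The `language_trajectory` helper and the sample history are hypothetical.

```python
def language_trajectory(monthly_langs):
    """Hypothetical per-month language metrics from one set of languages per month."""
    seen = set()
    rows = []
    for month, langs in enumerate(monthly_langs):
        new = langs - seen   # languages never used in any earlier month
        seen |= langs        # running lifetime set
        rows.append({
            "month": month,
            "distinct": len(langs),   # distinct languages used this month
            "new": len(new),          # newly used languages this month
            "cumulative": len(seen),  # cumulative lifetime languages so far
        })
    return rows


# Hypothetical four-month history for a single developer.
history = [{"Python"}, {"Python", "SQL"}, {"Python"}, {"Rust", "Python", "SQL"}]
for row in language_trajectory(history):
    print(row)
```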
Adoption of Claude Code increases the Shannon entropy of a developer's language mix by 0.14.
Estimated with the doubly robust Callaway & Sant'Anna approach on the 5,838-developer panel over 28 months, using first Claude-co-authored commit as treatment.
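Shannon entropy summarizes how evenly a developer's activity is spread across languages: H = -sum_i p_i log(p_i), where p_i is the share of activity in language i. It is 0 for a single language and grows as languages are added and their shares even out. The sketch assumes shares come from per-commit language labels and uses base-2 logarithms; the paper may use a different unit of activity or log base, which would change the scale of the reported +0.14.

```python
import math
from collections import Counter


def language_entropy(commit_langs, base=2.0):
    """Shannon entropy of language shares (assumed: one language label per commit)."""
    counts = Counter(commit_langs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        p = c / total
        h -= p * math.log(p, base)
    return h


print(language_entropy(["Python"] * 4))                      # 0.0: a single language
print(language_entropy(["Python", "Python", "Rust", "Go"]))  # 1.5 bits
```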
Adoption of Claude Code increases the number of distinct programming languages a developer uses by 0.83.
Same panel and staggered-rollout estimation as above (Callaway & Sant'Anna), treatment = first Claude-co-authored commit.
Adoption of Claude Code increases the number of repositories a developer contributes to by 1.5 per month.
Same panel (5,838 developers, 28 months) and estimator (Callaway & Sant'Anna). Treatment = first Claude-co-authored commit; not-yet-treated controls.
Adoption of Claude Code is associated with 41 additional commits per developer per month.
Analysis of a panel of 5,838 GitHub developers observed monthly over 28 months, exploiting the staggered rollout of Claude Code (May 2025–Jan 2026). Treatment is defined by a developer's first Claude-co-authored commit; not-yet-treated developers serve as controls. Estimates come from the doubly robust Callaway and Sant'Anna (2021) staggered difference-in-differences estimator.
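To make the estimator concrete: Callaway and Sant'Anna (2021) first estimate a group-time effect ATT(g, t) for each adoption cohort g and period t, then aggregate them. Without covariates, their outcome-regression, inverse-probability-weighting, and doubly robust variants all reduce to the comparison sketched below: the average change in outcomes for cohort g from period g - 1 to t, minus the same change for units not yet treated by t. The toy panel and the `att_gt` name are hypothetical; the paper's covariates and aggregation scheme are not reproduced.

```python
from statistics import mean


def att_gt(y, first_treat, g, t):
    """Unconditional ATT(g, t) with not-yet-treated controls (post-periods only).

    y[i][s]: outcome of unit i in period s.
    first_treat[i]: first treated period, or None if never treated.
    """
    if t < g:
        raise ValueError("this sketch only covers post-treatment periods t >= g")
    base = g - 1  # last pre-treatment period for cohort g
    treated = [i for i, gi in first_treat.items() if gi == g]
    # Not yet treated at t: never treated, or first treated after t.
    controls = [i for i, gi in first_treat.items() if gi is None or gi > t]
    if not treated or not controls:
        raise ValueError("need both treated and control units")
    d_treated = mean(y[i][t] - y[i][base] for i in treated)
    d_control = mean(y[i][t] - y[i][base] for i in controls)
    return d_treated - d_control


# Hypothetical panel: outcomes (e.g. monthly commits) over periods 0, 1, 2.
Y = {"A": [10, 15, 16], "B": [10, 11, 17], "C": [10, 12, 13], "D": [12, 12, 15]}
FIRST = {"A": 1, "B": 2, "C": None, "D": None}

for g, t in [(1, 1), (1, 2), (2, 2)]:
    print(f"ATT(g={g}, t={t}) =", att_gt(Y, FIRST, g, t))
```

Averaging the post-treatment ATT(g, t) values, weighted for example by cohort size, gives an overall effect of the kind reported above.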
Case studies demonstrate exact power-water consistency between virtual attributions and physical generation-side withdrawals.
Simulation results on two test systems (IEEE 30-bus and IEEE 118-bus) reported in the paper, which claims exact consistency.
Case studies on the IEEE 30-bus and 118-bus test systems demonstrate reliable convergence of the method.
Simulation experiments reported in the paper using two standard test systems (IEEE 30-bus and IEEE 118-bus). Sample size: 2 test systems.
Combined with fixed-point coordination, the framework enforces consistency between virtual water attribution and physical generation-side withdrawals.
Methodological claim about algorithmic properties (fixed-point coordination used to align attributions with physical withdrawals); supported by theoretical description and later case-study demonstrations.
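The summary does not describe the coordination algorithm itself. The toy below only illustrates what a fixed-point consistency condition can look like, using invented numbers: a data center's demand responds to the water intensity attributed to it, merit-order dispatch determines the physical withdrawals, and the attributed intensity is updated until it equals the physical intensity of the dispatched generation. At that point the water attributed to all load matches what generators actually withdraw. Every parameter and function name here is hypothetical and not the paper's method.

```python
# Toy fixed-point coordination (all numbers hypothetical, not from the paper).
# Generators: (marginal cost $/MWh, capacity MW, water withdrawal m3/MWh).
GENS = [(20.0, 120.0, 2.0), (45.0, 200.0, 0.5)]
BASE_LOAD = 100.0      # MW of non-data-center demand
DC_LOAD_0 = 60.0       # MW of data-center demand with no water charge
DC_SENSITIVITY = 5.0   # MW of data-center demand shed per m3/MWh attributed


def dispatch(load):
    """Merit-order dispatch: fill the cheapest generators first."""
    out, remaining = [], load
    for _cost, cap, water in sorted(GENS):
        p = min(cap, remaining)
        out.append((p, water))
        remaining -= p
    if remaining > 1e-9:
        raise ValueError("load exceeds total capacity")
    return out


def physical_withdrawal(load):
    """Water physically withdrawn by generators serving `load` (m3/h)."""
    return sum(p * w for p, w in dispatch(load))


def total_load(omega):
    """System load when data centers face attributed intensity omega."""
    return BASE_LOAD + max(0.0, DC_LOAD_0 - DC_SENSITIVITY * omega)


def coordinate(tol=1e-10, max_iter=100):
    """Iterate omega <- physical intensity of dispatch until it stops changing."""
    omega = 0.0  # attributed virtual-water intensity, m3/MWh
    for _ in range(max_iter):
        load = total_load(omega)
        new_omega = physical_withdrawal(load) / load
        if abs(new_omega - omega) < tol:
            return new_omega
        omega = new_omega
    raise RuntimeError("fixed-point iteration did not converge")


omega = coordinate()
load = total_load(omega)
print(f"attributed intensity: {omega:.4f} m3/MWh")
print(f"attributed - physical: {omega * load - physical_withdrawal(load):.2e} m3/h")
```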
The framework represents dispatch optimization as a differentiable optimization layer embedded within a deep learning architecture, enabling efficient end-to-end learning of coordination policies while preserving operational feasibility.
Methodological description claiming an implementation approach (differentiable optimization layer within deep learning); evidence likely from algorithmic implementation and simulation experiments described later in the paper.
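A differentiable optimization layer solves an optimization problem in its forward pass and returns gradients of the solution with respect to its inputs in its backward pass, so upstream learned components can be trained end to end. As a minimal sketch, assume a quadratic economic-dispatch problem with only a power-balance constraint (no capacity or network limits): its KKT conditions give both the optimum and the sensitivity dp_i/dD in closed form. General-purpose layers typically differentiate the optimality conditions implicitly instead. The cost and water coefficients are hypothetical, and this is not the paper's formulation.

```python
def dispatch_layer(D, a, b):
    """Forward pass: argmin_p sum(a_i/2 * p_i**2 + b_i * p_i) s.t. sum(p) = D.

    KKT: a_i * p_i + b_i = lam for all i, so p_i = (lam - b_i) / a_i.
    Assumes a_i > 0 and no capacity limits (hypothetical simplification).
    """
    inv = [1.0 / ai for ai in a]
    lam = (D + sum(bi * ii for bi, ii in zip(b, inv))) / sum(inv)
    p = [(lam - bi) / ai for ai, bi in zip(a, b)]
    return p, lam


def dispatch_layer_backward(grad_p, a):
    """Backward pass: dLoss/dD = sum_i grad_p[i] * dp_i/dD.

    Differentiating the KKT solution gives dp_i/dD = (1/a_i) / sum_j (1/a_j).
    """
    inv = [1.0 / ai for ai in a]
    total = sum(inv)
    return sum(g * ii / total for g, ii in zip(grad_p, inv))


A = [2.0, 4.0]    # quadratic cost coefficients (hypothetical)
B = [10.0, 8.0]   # linear cost coefficients (hypothetical)
W = [2.0, 0.5]    # water intensity per generator, m3/MWh (hypothetical)


def water_loss(D):
    """Downstream loss: total water withdrawn at the optimal dispatch."""
    p, _ = dispatch_layer(D, A, B)
    return sum(w * pi for w, pi in zip(W, p))


analytic = dispatch_layer_backward(W, A)  # dLoss/dp = W for this linear loss
h = 1e-4
numeric = (water_loss(30.0 + h) - water_loss(30.0 - h)) / (2 * h)
print(analytic, numeric)
```

The finite-difference check confirms that the analytic backward pass matches the true sensitivity of the optimum.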
This paper develops an operational electricity-computation-water (ECW) nexus framework that internalizes virtual water impacts directly into power system dispatch.
Primary methodological contribution described in the paper (development and formulation of an ECW framework; implementation details implied but not quantified in the excerpt).
The expansion of data centers (DCs) drives a sustained increase in electricity demand and associated water withdrawals at generation sites.
Background assertion in paper introduction; general empirical observation motivating the work (no specific dataset or sample size reported in the excerpt).
The aim is to keep autonomous agency composable while keeping accountability non-negotiable, so that coordination itself can become shared infrastructure for a human-AI society that is open, pluralistic, and governable.
Stated design/ethical objective in the paper; normative claim about intended social and governance outcomes rather than an empirically validated result.
FP is designed to wrap and bridge existing protocols rather than replace them, enabling incremental adoption while reducing integration and governance overhead.
Design rationale/claim in the paper about interoperability and incremental adoption strategy; no empirical deployment, integration case studies, or measured overhead reductions presented.
FP treats policy, provenance, and audit as first-class concerns.
Design/architectural claim in the paper stating that policy, provenance, and audit are prioritized within FP; no empirical compliance or audit trials presented.
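One common way to make provenance and audit first-class is an append-only log in which each entry commits to the hash of the previous one, so any later edit to history is detectable. The sketch below illustrates that idea with Python's standard `hashlib` and `json`; the `AuditLog` class and its fields are hypothetical and do not represent FP's actual data model.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical append-only, hash-chained audit log (not FP's API)."""

    FIELDS = ("actor", "action", "subject", "prev")

    def __init__(self):
        self.entries = []

    def append(self, actor, action, subject):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "subject": subject, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash and check each entry links to its predecessor."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in self.FIELDS}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append("agent:buyer", "purchase", "tool:search-api")
log.append("human:reviewer", "approve", "order:17")
print(log.verify())                  # True
log.entries[0]["action"] = "refund"  # tamper with history
print(log.verify())                  # False
```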
FP provides economic primitives for metering, receipts, and settlement.
Design claim in the paper listing economic primitives as part of FP; no deployment or economic experiments reported.
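The three primitives can be read as a pipeline: metering aggregates raw usage, receipts record what each payer owes each payee, and settlement nets receipts into one balance per party. The sketch below is a hypothetical illustration of that reading, not FP's specification; entity names, the per-unit price, and the helper functions are invented, and amounts are kept in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt: payer owes payee for metered units."""
    payer: str
    payee: str
    units: int
    unit_price_cents: int

    @property
    def amount_cents(self):
        return self.units * self.unit_price_cents


def meter(usage_events, unit_price_cents):
    """Aggregate raw (payer, payee, units) events into one receipt per pair."""
    totals = defaultdict(int)
    for payer, payee, units in usage_events:
        totals[(payer, payee)] += units
    return [Receipt(p, q, u, unit_price_cents) for (p, q), u in sorted(totals.items())]


def settle(receipts):
    """Net receipts into one balance per party (positive = is owed money)."""
    balance = defaultdict(int)
    for r in receipts:
        balance[r.payer] -= r.amount_cents
        balance[r.payee] += r.amount_cents
    return dict(balance)


events = [("agent:A", "tool:X", 3), ("agent:B", "tool:X", 2),
          ("agent:A", "tool:X", 1), ("agent:B", "agent:A", 2)]
receipts = meter(events, unit_price_cents=5)
print(settle(receipts))
```

Because every receipt debits one party and credits another by the same amount, balances always sum to zero.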
FP supports native multi-party organization and event-based collaboration.
Feature/architecture claim in the paper describing native support for multi-party organization and event-driven collaboration; no empirical evaluation or user studies provided.
FP unifies heterogeneous entities, including agents, tools, resources, humans, institutions, and organizations.
Design specification/feature claim in the paper describing FP's data and entity model; no empirical interoperability study reported.
This paper introduces the Foundation Protocol (FP), a graph-first coordination layer for an emerging human-AI society.
Statement of the paper's contribution; an architectural/design proposal rather than an evaluated system.
Agents need to form reliable relationships, organize multi-agent work, exchange value, support an AI economy, and stay safe and accountable under real-world oversight.
Normative/requirements statement in the paper describing necessary capabilities for scaled multi-agent systems; no empirical validation or experimental data provided.
Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and increasingly interact with one another.
Statement in the paper's introductory/abstract text presenting an observed trend; conceptual/qualitative claim without empirical data or measured sample.
The paper proposes five evaluation dimensions for AutoResearch systems: novelty, validity, impact, reliability, and provenance.
Paper explicitly proposes these five dimensions as an evaluation rubric; conceptual proposal.
The field can be organized around five workflow conditions: literature and research grounding; hypothesis formation and planning; experimentation and tool use; feedback, validation, and review; and reporting and knowledge communication.
Authors propose this five-condition organizational framework as part of their survey and synthesis; conceptual contribution.
Vibe Research denotes the human-steered region of prompt-based assistance and human-verified execution within AutoResearch.
Paper-introduced terminology and conceptual delineation of a sub-region of the AutoResearch spectrum; definitional statement.
AutoResearch is defined as the developmental spectrum of AI-powered scientific workflow automation.
Paper provides an explicit definitional framing (terminology introduced by authors); conceptual contribution rather than empirical finding.
This shift marks a transition from task-level AI for science to workflow-level research automation.
Conceptual argument backed by literature survey and examples of systems that coordinate multiple research tasks; no single quantitative study reported.
Scientific research is being reshaped by AI systems that move beyond isolated assistance toward longer-horizon workflows spanning literature grounding, hypothesis generation, experimentation, validation, reporting, and revision.
Survey / conceptual synthesis of recent AI research systems and literature; paper presents this as an observed trend rather than reporting original empirical measurements.
XWind shows consistent gains across workload types, load levels, and GPU generations.
Reported experimental results spanning multiple workload types, different load levels, and various GPU generations (details in main paper); abstract states consistency of gains.
XWind reduces P99 end-to-end latency by up to 98% relative to baselines such as power-capping and GPU idling.
Experimental results on the 64-GPU A100 testbed with emulated wind sites and Azure traces; comparison against baseline strategies including power-capping and GPU idling.
XWind reduces P99 end-to-end latency by up to 52% relative to the strongest contender (another design proposed by the authors).
Experimental results on the 64-GPU A100 testbed with emulated wind sites and Azure traces; comparison against a 'strongest contender' baseline (described as another idea from the authors).
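For reading these figures: P99 is the latency below which 99% of requests complete, and a reduction of r% leaves (1 - r/100) of the baseline, so 98% corresponds to a 50x lower tail latency and 52% to roughly 2.1x. The sketch assumes the nearest-rank percentile definition (the paper's exact method is not stated here) and uses invented latency samples.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile (an assumed definition)."""
    if not latencies_ms:
        raise ValueError("no samples")
    xs = sorted(latencies_ms)
    rank = (99 * len(xs) + 99) // 100  # ceil(0.99 * n) in integer arithmetic, 1-based
    return xs[rank - 1]


def reduction_pct(baseline, new):
    """Percentage reduction of `new` relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Invented samples: 100 requests each.
baseline = [100] * 98 + [2000, 5000]
improved = [80] * 99 + [100]
b, n = p99(baseline), p99(improved)
print(b, n, reduction_pct(b, n))  # 2000 80 96.0

# A reduction of r% means the new tail is (1 - r/100) of the old one.
for r in (98, 52):
    print(r, round(1 / (1 - r / 100), 2))
```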
XWind is a lightweight, reactive, and workload-agnostic AI inference router that uses only real-time signals (inference latency, KV-cache utilization, and queue depth) to dynamically configure sites and distribute requests under variable wind power.
System implementation described in paper; design specification lists the three real-time signals used.
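XWind's actual routing policy is not described in this summary. As a hypothetical sketch of how the three named signals could be combined, each site gets a pressure score that adds its latency relative to an assumed latency target, its KV-cache utilization, and its normalized queue depth, and the next request goes to the least-pressured site. The weights, normalization constants, field names, and site values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SiteSignals:
    """Real-time signals for one inference site (field names hypothetical)."""
    name: str
    latency_ms: float      # recent inference latency
    kv_cache_util: float   # fraction of KV-cache in use, 0.0 to 1.0
    queue_depth: int       # requests waiting at the site


def pressure(site, latency_target_ms=1000.0, max_queue=64):
    """Sum of three roughly [0, 1] terms; the equal weights are an assumption."""
    return (site.latency_ms / latency_target_ms
            + site.kv_cache_util
            + min(site.queue_depth, max_queue) / max_queue)


def route(sites):
    """Send the next request to the least-pressured site."""
    return min(sites, key=pressure)


SITES = [
    SiteSignals("A", latency_ms=400.0, kv_cache_util=0.90, queue_depth=20),
    SiteSignals("B", latency_ms=600.0, kv_cache_util=0.30, queue_depth=4),
    SiteSignals("C", latency_ms=300.0, kv_cache_util=0.95, queue_depth=60),
]
print(route(SITES).name)  # B
```

Because this sketch reads only observed signals, it needs no model of wind output: reduced power at a site shows up indirectly as rising latency, cache pressure, and queues.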