Evidence (4114 claims)
Adoption
8570 claims
Productivity
7631 claims
Governance
6869 claims
Human-AI Collaboration
6491 claims
Org Design
4175 claims
Innovation
4114 claims
Labor Markets
3566 claims
Skills & Training
2966 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
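The matrix above can be read as direction shares per outcome. The sketch below is a minimal illustration using three rows copied from the table; the helper names are hypothetical. In several rows the Total column exceeds the sum of the four direction columns (for example Firm Productivity: 600 coded vs. 606 total). The sketch assumes, without confirmation from the source, that the difference is claims with no coded direction, and it computes shares over the coded claims only.

```python
# (positive, negative, mixed, null, reported_total), copied from the matrix.
rows = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Error Rate": (69, 92, 10, 2, 173),
}


def direction_shares(pos, neg, mixed, null):
    """Share of each direction among the claims that have a coded direction."""
    coded = pos + neg + mixed + null
    return {
        "positive": pos / coded,
        "negative": neg / coded,
        "mixed": mixed / coded,
        "null": null / coded,
    }


def uncoded(pos, neg, mixed, null, total):
    """Reported total minus coded claims (assumed to be claims without a direction)."""
    return total - (pos + neg + mixed + null)


for name, (pos, neg, mixed, null, total) in rows.items():
    shares = direction_shares(pos, neg, mixed, null)
    print(f"{name}: positive share {shares['positive']:.2f}, "
          f"uncoded claims {uncoded(pos, neg, mixed, null, total)}")
```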
Innovation
We sped up the widely used LASSO algorithm by over 2x.
Benchmarking experiment reported in the paper comparing LASSO runtime/performance with and without SimpleTES (paper states >2x speedup).
SimpleTES consistently outperforms both frontier-model baselines and sophisticated optimization pipelines.
Comparative experimental evaluation vs. frontier-model baselines and optimization pipelines across the reported problems (paper claim).
Across 21 scientific problems spanning six domains, SimpleTES discovers state-of-the-art solutions using gpt-oss models.
Empirical experiments reported across 21 problems in six domains using gpt-oss models (paper states 21 problems).
We introduce Simple Test-time Evaluation-driven Scaling (SimpleTES), a general framework that strategically combines parallel exploration, feedback-driven refinement, and local selection.
Methodological contribution described in the paper (framework design and algorithmic description).
There is a positive relationship between disagreement among agents and trading volume in the simulated markets.
Observed correlation in the simulated open-call auction between measured disagreement (e.g., dispersion in beliefs) and trading volume; described as replicating classic experimental findings.
These individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices.
Aggregation of simulated agent behavior in the open-call auction producing market-level time series; comparison of market dynamics to classic experimental benchmark (Smith et al., 1988) and reported finding that excess demand predicts future prices.
AI agents form recency-weighted extrapolative beliefs (i.e., overweight recent price history when forecasting future prices).
Analysis of agents' forecasts and trading behavior in the simulated open-call auction populated by autonomous LLM agents; identification of extrapolative forecasting patterns reported as a main finding.
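One way to picture a recency-weighted extrapolative belief is a forecast that adds a weighted average of past price changes to the last price, with geometrically larger weights on recent changes. The sketch below is illustrative only: the function name, the geometric weighting, and the `decay` parameter are assumptions, not the paper's specification.

```python
def extrapolative_forecast(prices, decay=0.5):
    """Forecast the next price as the last price plus a recency-weighted
    average of past price changes (hypothetical illustration)."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    changes = [b - a for a, b in zip(prices, prices[1:])]
    # The most recent change gets weight 1, the one before it `decay`,
    # then decay**2, and so on back in time.
    weights = [decay ** k for k in range(len(changes) - 1, -1, -1)]
    expected_change = sum(w * c for w, c in zip(weights, changes)) / sum(weights)
    return prices[-1] + expected_change


history = [100.0, 100.0, 110.0]
print(extrapolative_forecast(history, decay=0.5))  # overweights the recent jump
print(extrapolative_forecast(history, decay=1.0))  # equal weights on all changes
```

With `decay=1.0` every past change counts equally; lowering `decay` pushes the forecast further in the direction of the latest move.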
AI agents exhibit a pronounced disposition effect.
Simulated open-call auction populated by autonomous LLM agents in experimental asset-market simulations; behavioral trading data showing agents' selling/holding patterns (paper describes this as a main documented finding).
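A common way to quantify a disposition effect is to compare the proportion of gains realized (PGR) with the proportion of losses realized (PLR); PGR above PLR indicates that winners are sold more readily than losers. The sketch below assumes this standard measure. The excerpt does not say which measure the paper uses, and the `Holding` record and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    purchase_price: float
    current_price: float
    sold: bool  # True if the agent sold this holding in the period


def disposition_measures(holdings):
    """Return (PGR, PLR): the shares of gains and of losses that were realized."""
    counts = {"realized_gain": 0, "paper_gain": 0,
              "realized_loss": 0, "paper_loss": 0}
    for h in holdings:
        if h.current_price == h.purchase_price:
            continue  # neither a gain nor a loss
        kind = "gain" if h.current_price > h.purchase_price else "loss"
        state = "realized" if h.sold else "paper"
        counts[f"{state}_{kind}"] += 1
    gains = counts["realized_gain"] + counts["paper_gain"]
    losses = counts["realized_loss"] + counts["paper_loss"]
    if gains == 0 or losses == 0:
        raise ValueError("need at least one gain and one loss to compare")
    return counts["realized_gain"] / gains, counts["realized_loss"] / losses


# Made-up observations: two of three gains sold, one of three losses sold.
sample = [
    Holding(10, 12, True), Holding(10, 13, True), Holding(10, 11, False),
    Holding(10, 8, False), Holding(10, 7, False), Holding(10, 9, True),
]
pgr, plr = disposition_measures(sample)
print(f"PGR={pgr:.2f}, PLR={plr:.2f}, disposition effect: {pgr > plr}")
```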
The governance architecture (privacy implemented as physics rather than policy, founder-controlled class shares on non-negotiable architectural commitments) is inseparable from the product itself.
Normative and architectural argument in the paper tying governance design choices to product architecture (no empirical validation in this text).
Physics limits now constraining the model layer make the continuity layer newly consequential.
Analytical argument in the paper linking physical constraints on model scaling to increased importance of continuity (no empirical measurement included here).
The paper proposes a four-layer development arc for continuity: from external SDK to hardware node to long-horizon human infrastructure.
Design/roadmap proposal described in the manuscript (no empirical testing provided here).
The engineering architecture for continuity is mapped to the theological pattern of kenosis and the symbolic pattern of Alpha and Omega, and the paper argues this mapping is structural rather than merely metaphorical.
Interpretive/mapping argument presented in the paper (theoretical/analogical reasoning).
The paper describes a storage primitive called Decomposed Trace Convergence Memory whose write-time decomposition and read-time reconstruction produce the continuity property.
Design proposal in the manuscript outlining a storage primitive and its read/write behavior (no empirical validation reported here).
Continuity is defined in the paper as a system property with seven required characteristics, distinct from memory and from retrieval.
Explicit definitional claim made in the manuscript (enumeration of seven characteristics described).
A companion paper (arXiv:2604.10981) positions the ATANT framework against existing memory, long-context, and agentic-memory benchmarks.
Citation to a companion paper that reportedly compares frameworks/benchmarks.
The formal evaluation framework for the property described here is the ATANT benchmark (arXiv:2604.06710), published separately with evaluation results on a 250-story corpus.
Citation to separate benchmark paper and reported evaluation on a 250-story corpus.
Engineering work to build the continuity layer has begun in public.
Statement in the paper asserting publicly visible engineering activity (no specific projects or quantitative audit included in this text).
The continuity layer is the most consequential piece of infrastructure the field has not yet built.
Normative claim/argument in the position paper (no empirical test presented in this text).
The most important architectural problem in AI is not the size of the model but the absence of a layer that carries forward what the model has come to understand (a "continuity layer").
Position paper argument and conceptual reasoning in the manuscript (no empirical study reported).
China leads global AI governance initiatives.
Stated strategic observation in the paper's introduction (no empirical measures provided in the excerpt).
The United Kingdom and Germany have integrated exclusively with the US.
Analysis of cross-country collaboration and citation ties in publication-based networks, compared against random network models, showing that the UK and Germany integrate exclusively with the US.
Illustrative welfare calculations suggest net gains in the tens of billions annually from the proposed policies/interventions.
Paper reports illustrative welfare calculations (not structural estimates) that yield an aggregate welfare figure described as 'net gains in the tens of billions annually'.
The policy section proposes 'Neutral Inference', a four-pillar conduct framework consisting of QoS parity, routing transparency, FRAND-style non-discrimination, and tier transparency with release-pathway discipline.
Normative policy proposal laid out in the paper's policy section.
Under logit demand and symmetric rivals, the QoS gap is strictly increasing in inference-quality importance (alpha) and downstream margins.
Comparative statics derived from the analytical model (logit demand, symmetric rivals).
The main theoretical result provides an explicit local equilibrium characterization of the QoS gap under logit demand and symmetric rivals.
Analytical derivation in the formal game-theoretic model assuming logit demand and symmetric rivals; presented as the paper's main theoretical result.
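To make the logit-demand setting concrete, the sketch below computes multinomial-logit market shares when an integrated provider serves symmetric downstream rivals at a lower quality of service. It does not reproduce the paper's equilibrium characterization. The utility form `alpha * qos - price`, the outside option, and all parameter values are assumptions chosen for illustration.

```python
import math


def logit_shares(utilities):
    """Multinomial-logit shares with an outside option of utility 0."""
    m = max(max(utilities), 0.0)  # subtract the max for numerical stability
    exp_u = [math.exp(u - m) for u in utilities]
    denom = math.exp(-m) + sum(exp_u)
    return [e / denom for e in exp_u]


def integrated_share(alpha, qos_gap, n_rivals=3, base_qos=1.0, price=1.0):
    """Share of the integrated firm when rivals receive base_qos - qos_gap.

    alpha is the assumed weight of inference quality in consumer utility.
    """
    own = alpha * base_qos - price
    rival = alpha * (base_qos - qos_gap) - price
    return logit_shares([own] + [rival] * n_rivals)[0]


for alpha in (1.0, 2.0):
    gain = integrated_share(alpha, 0.5) - integrated_share(alpha, 0.0)
    print(f"alpha={alpha}: share gain from a 0.5 QoS gap = {gain:.3f}")
```

In this example the share gained from a fixed QoS gap is larger at the higher `alpha`, which gives some intuition for why quality importance matters for the incentive to discriminate; the paper's result concerns the equilibrium gap itself.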
An extension motivated by Anthropic's April 2026 release introduces a third mechanism, tier-based access discrimination, parameterized by a tier gap (tau) and partner-exclusivity (kappa).
Model extension in the paper explicitly adds parameters (tau, kappa) to represent tier-based access discrimination; motivated by a contemporaneous product release.
The model isolates two foreclosure mechanisms operating without predatory pricing: quality-of-service (QoS) discrimination against downstream rivals (via latency, throughput, context limits, or feature access) and routing bias in assistant-layer interfaces.
Formal game-theoretic model developed in the paper; mechanisms are derived and described in model set-up and analysis.
As generative AI commercializes, competitive advantage is shifting from model training toward inference, distribution, and routing.
Framing/introductory assertion in the paper (conceptual argument, literature synthesis), not an empirical test.
To mitigate the curse of dimensionality in HRL, the paper introduces a capacity-aware state–action encoding mechanism that compresses the control interface into structured summary signals.
Methodological contribution described in the paper: proposed encoding mechanism intended to reduce state-action dimensionality and simplify the control interface.
The proposed real-time adaptive safety filter improves energy and cost efficiency, achieving up to 50% savings compared to a rule-based controller.
Empirical comparison reported in the paper between the safety-filter-enabled controller and a rule-based controller; exact experimental setup and sample size not provided in the excerpt.
We propose a real-time adaptive safety-filter to ensure that the system operates within predefined constraints during demand-side flexibility provision; the proposed real-time adaptive safety filter guarantees full compliance with flexibility requests from system operators.
Algorithmic proposal described in the paper; claim of guarantee likely supported by theoretical argument and/or tests in the paper (no sample size provided in excerpt).
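A safety filter of this general kind can be sketched as a projection of the learned controller's proposed action onto a feasible range. The example below is a generic illustration, not the paper's filter: it assumes a first-order thermal model, a comfort band on indoor temperature, and a flexibility request expressed as a power cap. All names, parameters, and the conflict rule are hypothetical.

```python
def safe_power_bounds(temp_in, temp_out, t_min, t_max, p_max,
                      r=2.0, c=5.0, dt=1.0):
    """Heating-power range that keeps the next-step temperature in [t_min, t_max].

    Assumed model: T_next = T + dt * ((T_out - T) / (r * c) + p / c),
    with r in K/kW, c in kWh/K, dt in hours, and p in kW.
    """
    drift = dt * (temp_out - temp_in) / (r * c)
    p_low = (t_min - temp_in - drift) * c / dt
    p_high = (t_max - temp_in - drift) * c / dt
    return max(0.0, p_low), min(p_max, p_high)


def safety_filter(proposed_power, bounds, request_cap=None):
    """Clip the proposed power into the safe range and any operator cap."""
    low, high = bounds
    if request_cap is not None:
        high = min(high, request_cap)
    if low > high:
        # Constraints conflict. Giving the upper limit priority is an
        # assumption of this sketch, not a rule taken from the paper.
        return high
    return min(max(proposed_power, low), high)


bounds = safe_power_bounds(temp_in=20.0, temp_out=0.0,
                           t_min=19.0, t_max=22.0, p_max=10.0)
print(bounds)
print(safety_filter(12.0, bounds))                  # above the upper bound
print(safety_filter(8.0, bounds, request_cap=6.0))  # operator cap applies
```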
A deep deterministic policy gradient algorithm is used as the core deep reinforcement learning method, enabling the controller to learn an optimal heating strategy through interaction with the building thermal model while maintaining occupant comfort, minimizing energy cost, and providing flexibility.
Methodological description in paper specifying DDPG as the core algorithm and its intended objectives; evidence likely includes simulation or experimental training on a building thermal model (sample size/details not given in excerpt).
This paper presents a safe deep reinforcement learning-based control framework to optimize building space heating while enabling demand-side flexibility provision for power system operators.
Methodological claim describing the proposed framework (DDPG + safety filter); supported by the paper's presented algorithmic design and experiments (details not provided in excerpt).
Enabling demand-side flexibility, particularly in heating, ventilation and air conditioning systems, is essential for grid stability and energy efficiency given the growing share of intermittent renewable energy sources.
Conceptual claim made in paper as motivation; no experimental sample size provided in excerpt.
The Transformer shows stronger robustness and generalization under data perturbations and achieves competitive results.
Empirical robustness experiments using the nine synthetic datasets and perturbation tests; authors' reported comparative performance and generalization behavior.
The balanced bagging ensemble offers a better balance of performance and efficiency compared to the Transformer and the baseline.
Empirical comparisons in experiments on the proprietary dataset and synthetic perturbations; authors' summary of comparative trade-offs between methods.
Both proposed approaches consistently outperform the baseline methodology (p < 0.001) in terms of profit.
Empirical results comparing proposed methods to baseline on proprietary dataset and synthetic datasets; statistical significance reported (ANOVA, Friedman, and pair-wise comparisons) with p < 0.001.
The empirical analysis is conducted on a proprietary large-scale auto insurance dataset comprising 51,618 customers and is complemented by validation on nine synthetic datasets to assess robustness.
Dataset description reported in the paper (explicit sample size and number of synthetic datasets).
Both proposed approaches incorporate the asymmetric financial cost structure of insurance and operate under operational selection limits.
Methodological claim in the paper; both models explicitly integrate cost structure and selection/omission constraints.
This study evaluates a lightweight Transformer-based architecture capable of learning richer feature representations for the cold-start insurance classification problem.
Method description in the paper: proposed Transformer architecture (model design described by authors).
This study evaluates a balanced bagging ensemble specifically designed to handle class imbalance and maximize expected profit under explicit customer-omission constraints.
Method description in the paper: proposed model architecture and optimization objective (ensemble with profit maximization and omission constraints).
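The decision step implied by profit maximization under an omission limit can be sketched as follows: given each customer's predicted claim probability, omit at most a fixed number of customers, starting with those whose expected profit is most negative. This covers only the selection step, not the ensemble itself. The flat premium, the single claim cost, and the function name are illustrative assumptions, not the paper's cost structure.

```python
def select_omissions(claim_probs, premium, claim_cost, max_omit):
    """Return indices of customers to omit, limited to max_omit.

    Expected profit of keeping a customer is assumed to be
    premium - claim_probability * claim_cost.
    """
    expected_profit = [premium - p * claim_cost for p in claim_probs]
    # Only customers expected to lose money are candidates for omission.
    candidates = [i for i, ep in enumerate(expected_profit) if ep < 0]
    # Worst expected profit first, then apply the operational limit.
    candidates.sort(key=lambda i: expected_profit[i])
    return sorted(candidates[:max_omit])


probs = [0.05, 0.40, 0.90, 0.60]  # hypothetical model outputs
print(select_omissions(probs, premium=500, claim_cost=1000, max_omit=1))
print(select_omissions(probs, premium=500, claim_cost=1000, max_omit=5))
```

Setting `claim_cost` well above `premium` reflects the asymmetry the claim describes: retaining one risky customer can cost more than the premium earned from several safe ones.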
Future work improving geometric fidelity, data efficiency, and integrated XAI workflows will lead to more accurate and faster 3D molecular prediction and generation and ensure transparent, reliable guidance in drug design.
Forward-looking recommendations and projections in the review; presented as hoped-for research directions rather than empirically demonstrated outcomes.
The authors propose an integrated Q-BioFusion framework that synergizes quantum computing, autonomous experimentation, and generative models to address systemic R&D constraints.
Proposed conceptual framework within the paper; no experimental implementation, benchmarking, or sample sizes reported in the provided text.
Explainable AI (XAI) methods support transparent validation and trustworthy guidance during computer simulation in drug design.
Argument in the review advocating XAI for transparency and validation; no empirical validation or metrics provided in the provided text.
Data scarcity in biological assays can be mitigated via Few-Shot Learning and meta-learning approaches.
Review recommendation and discussion of methodological approaches to data-scarcity problems; no empirical evidence, datasets, or success rates provided in the provided text.
De novo molecular design is being applied using biological foundation models and flow-matching generative architectures.
Review describes practical applications and method classes in de novo design; no experimental results or sample sizes are reported in the provided text.
The performance of AI models in chemoinformatics is intrinsically linked to the quality of molecular representation.
Conceptual and literature-based argument presented in the review emphasizing representational choice as a key determinant of model performance; no benchmarking details given in the provided text.
AI can predict pharmacodynamic (PD) and toxicological effects significantly earlier in the drug discovery process.
Review claim asserting earlier prediction capability via AI models; no empirical metrics, study sizes, or quantified timing improvements given in the provided text.
AI technology, by simulating complex biological systems, has accelerated the innovation of the entire drug discovery pipeline.
Claim made in the review, supported by synthesized examples and cited AI applications across the pipeline (no original empirical evaluation or quantified acceleration provided in the provided text).
Managers should view AI as a strategic tool to enhance SCR (not only as cost-saving), and focus on optimizing resource allocation, increasing R&D investment, and enhancing organizational agility to amplify AI's resilience effects.
Authors' practical recommendations derived from empirical findings and mechanism analysis.