Evidence (4114 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Innovation
The results demonstrate a 'less is more' pattern: a simpler combination (memory + reflection) yields better performance than adding architectural complexity.
Authors' interpretation of the ablation study results showing that adding multiple extra mechanisms degraded performance compared to the memory+reflection configuration.
A nine-variant ablation reveals that memory and reflection together produce a 58% cumulative improvement over the stateless baseline.
Ablation study with nine variants on the sequential portfolio benchmark; authors report a 58% cumulative improvement when combining memory and reflection versus the stateless baseline.
AEL outperforms five published self-improving methods and all non-LLM baselines while maintaining the lowest variance among all LLM-based approaches on the benchmark.
Comparative empirical evaluation on the same sequential portfolio benchmark, comparing AEL to five published self-improving methods and multiple non-LLM and LLM baselines (reported relative ranking and variance).
On a sequential portfolio benchmark (10 sector-diverse tickers, 208 episodes, 5 random seeds), AEL achieves a Sharpe ratio of 2.13 ± 0.47.
Empirical experiment on the sequential portfolio benchmark with 10 tickers, 208 episodes, evaluated across 5 random seeds (reported Sharpe ratio and standard deviation).
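A figure such as 2.13 ± 0.47 is typically a mean and standard deviation of per-seed Sharpe ratios. The sketch below shows one way that summary could be computed. It assumes a zero risk-free rate, no annualization, and the sample standard deviation, none of which is stated in the summary above; the function names and the synthetic returns are illustrative only and do not reproduce the paper's data.

```python
import random
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std dev.

    Assumption: risk_free defaults to 0 and nothing is annualized.
    """
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return statistics.mean(excess) / spread


def summarize_across_seeds(returns_by_seed):
    """Return (mean, sample std dev) of the per-seed Sharpe ratios."""
    ratios = [sharpe_ratio(returns) for returns in returns_by_seed]
    return statistics.mean(ratios), statistics.stdev(ratios)


if __name__ == "__main__":
    # Synthetic stand-in data: 5 seeds x 208 episodes of made-up returns.
    runs = []
    for seed in range(5):
        rng = random.Random(seed)
        runs.append([rng.gauss(0.001, 0.01) for _ in range(208)])
    mean_sharpe, std_sharpe = summarize_across_seeds(runs)
    print(f"Sharpe ratio: {mean_sharpe:.2f} ± {std_sharpe:.2f}")
```

Reporting the spread across seeds, rather than a single run, is what makes the variance comparison between methods meaningful.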
We introduce Agent Evolving Learning (AEL), a two-timescale framework in which a Thompson Sampling bandit at the fast timescale learns which memory retrieval policy to apply each episode, while LLM-driven reflection at the slow timescale diagnoses failure patterns and injects causal insights into the agent's decision prompt.
Methodological description and proposed algorithmic design in the paper (a design/algorithmic claim with no additional experimental sample size).
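The fast-timescale component described above can be illustrated with a generic Beta-Bernoulli Thompson Sampling bandit. This is a minimal sketch, not the authors' implementation: the class name, the policy names, and the binary per-episode reward are all assumptions, and the slow-timescale LLM reflection step is not modeled.

```python
import random


class ThompsonPolicySelector:
    """Beta-Bernoulli Thompson Sampling over a fixed set of retrieval policies.

    Hypothetical helper: each episode yields a binary reward (1 = success).
    """

    def __init__(self, policies, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior: one pseudo-success and one pseudo-failure per policy.
        self.successes = {p: 1 for p in policies}
        self.failures = {p: 1 for p in policies}

    def select(self):
        """Sample a success rate per policy and pick the highest draw."""
        draws = {
            p: self.rng.betavariate(self.successes[p], self.failures[p])
            for p in self.successes
        }
        return max(draws, key=draws.get)

    def update(self, policy, reward):
        """Record the outcome of one episode for the chosen policy."""
        if reward:
            self.successes[policy] += 1
        else:
            self.failures[policy] += 1


if __name__ == "__main__":
    # Made-up success rates for three hypothetical retrieval policies.
    true_rates = {"recent": 0.3, "similar": 0.6, "none": 0.2}
    selector = ThompsonPolicySelector(true_rates, seed=0)
    env = random.Random(1)
    for _ in range(208):
        policy = selector.select()
        selector.update(policy, env.random() < true_rates[policy])
    print(selector.successes, selector.failures)
```

Sampling from the posterior, rather than always taking the policy with the best average so far, keeps some exploration alive while the evidence is still thin.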
This survey provides scholars and practitioners with a structured understanding of how agentic AI is reshaping financial markets and identifies critical research directions to ensure these systems enhance both operational efficiency and market resilience.
Statement of contribution in the paper; based on the paper's literature review, taxonomy, and identified research agenda.
Agentic AI offers substantial potential for enhanced market efficiency, liquidity provision, and risk management.
Survey synthesis of foundational research, market applications, and technical architectures suggesting potential benefits; no original empirical evaluation reported.
The emergence of agentic AI represents a fundamental transformation in financial markets, characterized by autonomous systems capable of reasoning, planning, and adaptive decision-making with minimal human intervention.
Conceptual claim stated in the survey's introduction and synthesis of recent advances; based on literature review and theoretical framing rather than new empirical data.
Countries around the world are rushing to encourage greater investment and growth in their domestic AI industries.
Statement/observation presented in the paper's introduction; based on the paper's descriptive overview of global policy activity (literature review / policy survey implied). No sample size reported.
Dynamic combinations of AI and organizational structure can help managers overcome traditional trade-offs between scale and scope, opening pathways for scalable, cross-market expansion.
Managerial implication drawn from the paper's longitudinal case study of ByteDance; qualitative inference from observed organizational practices and AI deployment patterns.
AI transforms the scale–scope nexus from being a trade-off into a source of strategic advantage.
Synthesis and theoretical claim derived from longitudinal case study of ByteDance showing simultaneous scaling and diversification enabled by AI and organizational design.
AI reverses the conventional logic of the resource-based view: rather than valuable resources enabling diversification, diversification amplifies the value of resources.
Theoretical argument supported by the ByteDance case study; paper presents this as a theorized inversion based on observed patterns in the single-case study.
The value of cross-domain AI learning is contingent on access to structurally related data that allow learning to transfer from one domain to another.
Claim derived from the ByteDance longitudinal case study showing conditions for successful cross-domain AI transfer (qualitative evidence emphasizing data structure/relatedness).
AI evolves and improves through self-learning and cross-fertilization across domains, becoming increasingly valuable as learning accumulates.
Theoretical claim supported by longitudinal observations from the ByteDance case study (qualitative evidence from repeated AI deployments over time).
ByteDance leveraged AI and adaptive organizational design to scale rapidly and diversify across industries and markets without incurring rising costs or coordination complexity.
Longitudinal single-case (qualitative) study of ByteDance described in the paper; method reported as a longitudinal case study of one firm.
Effective governance of AI as a dual-use technology will likely require a multilateral institutional architecture functionally analogous (though not identical) to the role performed by the IAEA in the nuclear domain, with explicit safeguards against co-option of hardware controls for domestic repression.
Normative institutional design argument and analogy to the IAEA presented in the paper (policy proposal; comparative institutional analysis).
Hardware-layer governance, including chip-level attestation mechanisms such as FlexHEG, trusted execution environments, confidential computing, and complementary software-layer safeguards, offers a defense-in-depth alternative to the current binary framing of openness vs restriction.
Proposed governance architecture and technical discussion in the paper citing concrete mechanisms (technical-proposal and conceptual analysis; no experimental or deployment data reported in the summary).
The global concentration of compute infrastructure makes open-weight models one of the most viable pathways to sovereign AI capacity in the Global South.
Analysis of global compute infrastructure concentration and pathway mapping in the paper (conceptual/structural analysis; no numerical sample provided in the summary).
Long-term prospects of agentic AI include catalyzing accelerated innovation in physical design via autonomous algorithm discovery, continuous tool improvement, and closed-loop learning from large design corpora.
Forward-looking conclusion in the paper; framed as the authors' projection based on survey synthesis rather than as an empirically demonstrated outcome in the abstract.
Interfaces between agentic systems and traditional EDA frameworks are a key area of focus and enable tighter integration of agent capabilities into existing design workflows.
Survey highlights interfaces between agents and EDA frameworks as a focus area; claim is descriptive of research direction rather than reporting empirical outcomes.
Autonomous agents can explore the space of design heuristics for placement, routing, and partitioning.
Presented as an emphasized capability/area of research in the survey; the abstract asserts this possibility but does not report empirical benchmarks or sample sizes.
Tool-integrated agents can be used for algorithm evolution, debugging, and workflow automation in physical design R&D.
Paper emphasizes this as a primary area of application in the survey; rationale and examples are discussed but no quantitative trial sizes are given in the abstract.
Agentic AI systems can comprehend user specifications, modify code, run EDA tools, analyze results, perform multi-step reasoning, and iteratively refine design heuristics—unlike earlier ML uses that focused narrowly on prediction or optimization subroutines.
Descriptive claim in the paper contrasting agentic AI capabilities with earlier ML approaches; presented as an overview of functional capabilities rather than empirical measurement.
Recent advances in large language models (LLMs) and tool-using autonomous agents present new opportunities for accelerating research and development in physical design.
Stated as a central thesis in the paper's abstract/survey; based on the authors' synthesis of recent advances and emerging applications (no empirical sample or quantified evaluation reported in the abstract).
Local governments should develop coordinated AI policy mixes, align differentiated policy pathways with regional conditions, and prioritize technology R&D support, talent cultivation and collaboration, and application demonstration and promotion to sustain long-term regional competitiveness.
Authors' policy recommendations derived from the fsQCA findings and interpretation of which conditions are recurrent/core across configurations.
Technology R&D support, talent cultivation and collaboration, and application demonstration and promotion are the most recurrent core policy conditions across the identified configurations.
Frequency/core-condition analysis within the fsQCA configurations reported by the authors showing these three policy instruments repeatedly appear as core conditions.
The study identifies three driving pathways to sustained competitiveness: (supply and demand)-environmental resonance; demand-driven (supply-environmental) assurance; and supply–demand complementarity, which together cover five specific configurations.
Reported fsQCA solution paths (three aggregated driving pathways and five specific configurations) derived from the analysis of provincial AI policy instruments.
Sustained competitiveness is achieved through multiple equivalent configurations of policy instruments (i.e., policy instrument combinations rather than single instruments).
fsQCA results reported in the paper showing multiple configurations (solution paths) that are associated with high regional competitiveness.
Under these conditions (alignment of forces and AI-driven ideation cost reductions), PIM offers a framework for organising governed discovery in real time and provides the methodological foundation for later applied work.
The paper presents PIM as a proposed framework and positioning statement for future applied research and implementations (theoretical proposal; no applied trials reported).
Organised attacks on complex problems can generate an epistemic mode transition: a shift from predominantly Knightian uncertainty toward probabilistically characterisable innovation dynamics as relevant structures become more visible, decomposed, coordinated, and testable.
The paper states and formalises this methodological claim within PIM as a central proposition (theoretical argumentation; no empirical validation reported).
When problem-relevant causal, informational, and coordinative forces become sufficiently aligned, the epistemic character of search changes and open-ended uncertainty can be progressively transformed into structured probabilistic search.
The claim is presented as the central theoretical argument and formalised within the PIM conceptual framework (theoretical/model-based argumentation; no empirical sample).
Sustainable development outcomes in MENA economies are driven not only by technology adoption but by the interaction between digital infrastructure, AI, and institutional readiness.
Regression models including interaction terms between digital transformation, AI measures, and indicators of institutional readiness within the System GMM analysis.
There is significant regional heterogeneity: Gulf Cooperation Council (GCC) countries exhibit stronger effects of digital transformation and AI on sustainable development than non-GCC MENA economies.
Subgroup/interaction analyses by region (GCC vs non-GCC) within the System GMM framework reported differential coefficients.
Artificial intelligence (AI) has a positive but weaker impact on sustainable development relative to digital transformation, reflecting its complementary and maturity-dependent role within the digital ecosystem.
Same System GMM regressions on panel of MENA economies (2010–2023) that include measures of AI and digital transformation; reported positive but smaller coefficient for AI.
Digital transformation is the primary driver of sustainable development in MENA economies, exerting a stronger and more consistent effect than AI.
Dynamic panel data analysis of MENA economies (2010–2023) using System GMM; reported comparative effect sizes of digital transformation vs. AI in regression results.
In the ICT industry, Tobin's Q significantly increased following AI adoption (heterogeneous positive effect).
Subgroup/heterogeneity analysis within the main sample (KOSDAQ firms 2018–2025), estimating the post-adoption effect of AI on Tobin's Q in firms classified as ICT.
The Barcelona Declaration offers a promising forum for boundary governance.
Policy recommendation pointing to an existing initiative (Barcelona Declaration) as a suitable forum; stated without empirical evaluation in the excerpt.
Governance should calibrate the annulus, not abolish it: thin enough to serve research efficiently, wide enough to sustain innovation.
Normative policy recommendation from the authors; based on their conceptual framework rather than on empirical policy evaluation in the excerpt.
Artificial intelligence reshapes the annulus by lowering barriers to basic structuring.
Conceptual claim in the paper; asserted as an effect of AI on metadata production without empirical estimates in the excerpt.
States can adjust their foreign policies to this fact by focusing on resilience, technological sovereignty, strategic decoupling, and coordination through alliances.
Policy-prescriptive recommendations based on the paper's theoretical framework and analysis; no empirical testing or sample size reported in the abstract.
Time Series Augmented Generation (TSAG) enables LLM agents to delegate quantitative tasks to verifiable external tools.
The paper's description of the TSAG framework, which states that agents delegate quantitative computations to verifiable external tools.
We publicly release the evaluation framework and empirical insights to foster standardized research on reliable financial AI.
Paper states that the framework, benchmark, and empirical results are released publicly by the authors.
The results demonstrate that capable agents can achieve near-perfect tool-use accuracy with minimal hallucination, validating the tool-augmented paradigm.
Empirical results from the authors' experiments on the 100-question benchmark across multiple agents; paper states agents achieve 'near-perfect' tool-use accuracy and 'minimal' hallucination.
We apply this methodology in a large-scale empirical study using our framework, Time Series Augmented Generation (TSAG), where an LLM agent delegates quantitative tasks to verifiable, external tools.
Paper reports applying the TSAG framework in an empirical study in which agents call external tools to perform quantitative computations; described as 'large-scale' and implemented by the authors.
We introduce a novel evaluation methodology and benchmark designed to rigorously measure an LLM agent's reasoning for financial time-series analysis.
Paper describes a new methodology and benchmark (Time Series Augmented Generation, TSAG) developed by the authors for evaluating LLM reasoning on financial time-series tasks.
Effective evaluation-driven loop scaling is a central axis for advancing LLM-driven scientific discovery, and SimpleTES provides a simple yet practical framework for realizing these gains.
High-level claim supported by the aggregate experimental results and discussion in the paper.
When post-trained on successful trajectories, models not only improve efficiency on seen problems but also generalize to unseen problems, discovering solutions that base models fail to uncover.
Experiments in which models were post-trained on successful SimpleTES trajectories and evaluated on both seen and unseen problems (paper claim of improved efficiency and generalization).
SimpleTES produces trajectory-level histories that naturally supervise feedback-driven learning.
Methodological claim and supporting experiments where SimpleTES generates solution trajectories that are then used as supervision for learning.
We discovered new Erdős minimum overlap constructions that surpass the best-known results.
Reported novel combinatorial constructions (Erdős minimum overlap) in the experiments that improve on prior best-known results.
We designed quantum circuit routing policies that reduce gate overhead by 24.5%.
Experimental results reported for quantum circuit routing tasks showing a 24.5% reduction in gate overhead when using SimpleTES-designed policies.