Evidence (8066 claims)
Adoption (5586 claims)
Productivity (4857 claims)
Governance (4381 claims)
Human-AI Collaboration (3417 claims)
Labor Markets (2685 claims)
Innovation (2581 claims)
Org Design (2499 claims)
Skills & Training (2031 claims)
Inequality (1382 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 417 | 113 | 67 | 480 | 1091 |
| Governance & Regulation | 419 | 202 | 124 | 64 | 823 |
| Research Productivity | 261 | 100 | 34 | 303 | 703 |
| Organizational Efficiency | 406 | 96 | 71 | 40 | 616 |
| Technology Adoption Rate | 323 | 128 | 74 | 38 | 568 |
| Firm Productivity | 307 | 38 | 70 | 12 | 432 |
| Output Quality | 260 | 71 | 27 | 29 | 387 |
| AI Safety & Ethics | 118 | 179 | 45 | 24 | 368 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 75 | 37 | 19 | 312 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 74 | 34 | 78 | 9 | 197 |
| Skill Acquisition | 98 | 36 | 40 | 9 | 183 |
| Innovation Output | 121 | 12 | 24 | 13 | 171 |
| Firm Revenue | 98 | 35 | 24 | — | 157 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 87 | 16 | 34 | 7 | 144 |
| Inequality Measures | 25 | 76 | 32 | 5 | 138 |
| Regulatory Compliance | 54 | 61 | 13 | 3 | 131 |
| Task Completion Time | 89 | 7 | 4 | 3 | 103 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 33 | 11 | 7 | 98 |
| Wages & Compensation | 54 | 15 | 20 | 5 | 94 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 27 | 26 | 10 | 6 | 72 |
| Job Displacement | 6 | 39 | 13 | — | 58 |
| Hiring & Recruitment | 40 | 4 | 6 | 3 | 53 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 11 | 6 | 2 | 41 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 6 | 9 | — | 27 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
AI reduces excess inventory levels in manufacturing firms.
Thematic findings from interviews, site visits, and documents involving industry experts and practitioners, who reported decreased excess inventory following AI-driven forecasting and inventory optimization.
AI reduces stockouts in manufacturing supply chains.
Practitioner accounts and organizational documents, gathered through purposive qualitative sampling and analyzed thematically, indicating fewer stockouts associated with AI-driven forecasting and inventory controls.
AI adoption reduces operational inefficiencies in manufacturing processes.
Thematic analysis of qualitative data (semi-structured interviews, site observations, organizational documents) from purposively sampled industry practitioners reporting reductions in inefficiencies after AI implementation.
AI supports proactive decision-making among supply chain and production stakeholders.
Qualitative reports from interviews and document review with supply chain managers, production planners, and industry experts; thematic analysis identified proactive decision-making as a theme associated with AI use.
AI enables adaptive inventory management in manufacturing operations.
Findings from thematic analysis of semi-structured interviews with supply chain managers, production planners, and industry experts, plus observational site visits and organizational documents (purposive sampling).
AI technologies enhance forecasting accuracy in smart manufacturing.
Qualitative evidence from purposive sample of supply chain managers, production planners, and industry experts gathered via semi-structured interviews, observational site visits, and organizational documents; analyzed using thematic analysis.
Endogenous structural break analysis identifies 2007 as the break year for AI introduction in India.
Empirical analysis reported in the paper using an endogenous structural break test applied to relevant time-series data (paper states 2007 was identified as the break year).
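The paper does not say which endogenous break test it used. One common building block is the least-squares break-date estimator used in Bai–Perron-type procedures: try every admissible break point and keep the one that minimizes the total sum of squared residuals of a two-regime (mean-shift) model. The sketch below assumes a single break in the mean; the function names and the trimming parameter are hypothetical, and the example series is synthetic, not the paper's data.

```python
from typing import Sequence


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def estimate_break(years: Sequence[int], y: Sequence[float], trim: int = 2) -> int:
    """Return the first year of the second regime under a single mean shift.

    Every candidate split leaves at least `trim` observations in each regime;
    the chosen split minimizes the combined SSR of the two segments.
    """
    if len(years) != len(y):
        raise ValueError("years and y must have equal length")
    best_k, best_sse = None, float("inf")
    for k in range(trim, len(y) - trim + 1):
        total = sse(y[:k]) + sse(y[k:])
        if total < best_sse:
            best_k, best_sse = k, total
    if best_k is None:
        raise ValueError("series too short for the chosen trim")
    return years[best_k]


# Synthetic series whose mean jumps in 2010.
print(estimate_break(list(range(2000, 2015)), [1.0] * 10 + [3.0] * 5))  # 2010
```

A full test would also compare the SSR reduction against a critical value (e.g. a sup-F statistic) before declaring that a break exists; this sketch only locates the most likely break date.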
A shift in preference towards non-traded AI services exacerbates income inequality among previously homogeneous workers in the non-traded sector (model finding).
Results from the paper's Finite Change General Equilibrium (theoretical) model which introduces AI as a shock in the non-traded sector and analyzes effects via price adjustments.
Artificial intelligence (AI)-induced services are a reality in India and other developing countries.
Statement in paper citing existence/emergence of AI-powered services (examples given: Windows Live, AI ride-hailing apps such as Ola and Uber); descriptive assertion rather than quantified empirical analysis in the paper.
Our dataset is available at https://guide-bench.github.io.
Paper's statement providing a URL for dataset access.
Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software (e.g., PowerPoint, Photoshop).
Motivating claim in the paper's introduction/abstract, based on prior work and the authors' argument about potential application domains.
Providing user context significantly improved performance, raising Help Prediction by up to 50.2 percentage points (pp).
Experimental comparison reported in the paper showing differences in Help Prediction performance with and without provided user context; reported improvement magnitude of up to 50.2 percentage points.
GUIDE defines three tasks: (i) Behavior State Detection, (ii) Intent Prediction, and (iii) Help Prediction. Together they test a model's ability to recognize behavior state, reason about goals, and decide when and how to help.
Paper's benchmark/task definitions describing three evaluation tasks and their goals.
GUIDE consists of 67.5 hours of screen recordings from 120 novice-user demonstrations with think-aloud narrations, across 10 software applications.
Paper's dataset description: screen recordings, number of demonstrations, duration, participant expertise (novice), and inclusion of think-aloud narrations across 10 software applications.
Geographical, cultural, and institutional proximities facilitate collaboration in the AI industry.
Inclusion of dyadic proximity covariates in the stochastic actor-oriented model (SAOM) of longitudinal patent collaboration (2013–2024), with reported positive effects of geographic, cultural, and institutional proximity on tie formation.
Organizations with higher innovativeness attract more collaborative partners.
SAOM results linking organizational innovativeness (measured via patenting/innovation indicators) to a higher degree (number of collaborative partners) in longitudinal patent data (2013–2024).
Universities and research institutions play a more central role in driving network evolution than firms.
SAOM analysis of patent-collaboration network trajectories (2013–2024) showing higher centrality/greater influence of universities and research institutions relative to firms in the modeled network evolution.
Endogenous structural effects, specifically transitivity and preferential attachment, actively shape tie formation in China’s AI industry collaboration network.
Empirical SAOM results on longitudinal patent collaboration data (2013–2024) testing endogenous network effects (transitivity, preferential attachment) on tie formation.
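SAOM estimation itself is normally done with dedicated software such as the R package RSiena, and it is not reproduced here. As a rough descriptive pre-check of the two endogenous effects named above, one can compare two network waves: the share of new ties that close an existing two-path (a transitivity signal) and whether new ties land on actors that were already well connected (a preferential-attachment signal). The function names and the toy actors below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Tie = Tuple[str, str]


def adjacency(ties: Iterable[Tie]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; self-ties are ignored."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in ties:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def new_tie_diagnostics(t0: Iterable[Tie], t1: Iterable[Tie]) -> Tuple[float, float, float]:
    """Compare ties that are new at wave t1 with the wave-t0 network.

    Returns (share of new ties whose endpoints shared a partner at t0,
             mean t0 degree of new-tie endpoints,
             mean t0 degree over all t0 actors).
    """
    t0 = list(t0)
    adj0 = adjacency(t0)
    old = {frozenset(e) for e in t0}
    new = {frozenset(e) for e in t1 if e[0] != e[1]} - old
    if not new:
        return 0.0, 0.0, 0.0
    pairs = [tuple(e) for e in new]
    # Transitivity signal: the new tie closes a triangle present as a two-path at t0.
    closing = sum(1 for a, b in pairs if adj0.get(a, set()) & adj0.get(b, set()))
    # Preferential-attachment signal: prior degree of actors receiving new ties.
    endpoint_deg = [len(adj0.get(n, set())) for pair in pairs for n in pair]
    all_deg = [len(s) for s in adj0.values()]
    mean_all = sum(all_deg) / len(all_deg) if all_deg else 0.0
    return closing / len(new), sum(endpoint_deg) / len(endpoint_deg), mean_all
```

For example, if a university U1 collaborates with firms F1 and F2 in one wave and F1–F2 form a tie in the next, that new tie counts as closing. Unlike a SAOM, these ratios do not control for other effects or covariates simultaneously.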
Collaboration networks play a crucial role in fostering innovation within the artificial intelligence (AI) industry.
Statement supported by analysis of longitudinal patent collaboration data (2013–2024) using a stochastic actor-oriented model (SAOM) integrating structural effects, organizational attributes, and dyadic proximities.
Overall, the results support the view that stable, deployable sentiment indicators require careful reconstruction, not only better classifiers.
Synthesis/conclusion drawn from the paper's empirical evaluations and proposed methods.
This three-week lead-lag is a structural regularity more informative than any single correlation coefficient.
Interpretation/claim based on empirical comparisons within the paper stating that the persistent lead-lag pattern provides more structural information than single correlation metrics.
The key empirical finding is a three-week lead-lag pattern between reconstructed sentiment and price that persists across all tested pipeline configurations and aggregation regimes.
Empirical result reported in the paper: observed lead-lag relationship (three-week lead) between reconstructed sentiment and stock price across multiple pipeline/aggregation settings; no numerical sample size or statistical estimates provided in the abstract.
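The abstract does not state how the lead was measured. A minimal sketch, assuming both series sit on the same weekly grid and that "lead" means the lag L maximizing the Pearson correlation between sentiment at week t and price at week t + L, looks like this; `best_lead` is a hypothetical helper, and the data are synthetic.

```python
import math
import random
from typing import Dict, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def best_lead(sentiment: Sequence[float], price: Sequence[float],
              max_lag: int) -> Tuple[int, Dict[int, float]]:
    """Return the lag L (grid steps) maximizing corr(sentiment[t], price[t + L])."""
    if len(sentiment) != len(price):
        raise ValueError("series must be aligned on the same grid")
    scores: Dict[int, float] = {}
    for lag in range(max_lag + 1):
        scores[lag] = pearson(sentiment[: len(sentiment) - lag], price[lag:])
    return max(scores, key=scores.get), scores


# Synthetic check: price repeats sentiment three steps later, so the lead is 3.
rng = random.Random(7)
x = [rng.random() for _ in range(60)]
sentiment, price = x[3:], x[:-3]
lag, _ = best_lead(sentiment, price, max_lag=6)
print(lag)  # 3
```

In practice one would likely correlate sentiment with price changes rather than levels, since trending price levels can inflate correlations at every lag.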
As a secondary external check, we evaluate the consistency of reconstructed signals against stock-price data for a multi-firm dataset of AI-related news titles (November 2024 to February 2026).
Empirical evaluation reported in the paper using reconstructed signals compared to stock-price time series over the specified date range; described as a 'multi-firm' dataset (exact number of firms not stated in the abstract).
Because ground-truth longitudinal sentiment labels are typically unavailable, we introduce a label-free evaluation framework based on signal stability diagnostics, information preservation lag proxies, and counterfactual tests for causality compliance and redundancy robustness.
Methodological contribution described in the paper (evaluation framework proposal).
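The paper's exact counterfactual tests are not given here. One natural form of a causality-compliance check is: randomize every input after some cutoff time and confirm that the reconstructed values up to the cutoff do not move. The sketch below applies that check to a causal exponential moving average and, for contrast, to a centered moving average that peeks one step ahead; all names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Pipeline = Callable[[List[float]], List[float]]


def causal_ema(x: List[float], alpha: float = 0.3) -> List[float]:
    """Causal: the output at t depends only on x[0..t]."""
    out: List[float] = []
    state: Optional[float] = None
    for v in x:
        state = v if state is None else alpha * v + (1 - alpha) * state
        out.append(state)
    return out


def centered_mean(x: List[float], half: int = 1) -> List[float]:
    """Non-causal: the output at t also averages x[t+1..t+half]."""
    out: List[float] = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def is_causally_compliant(f: Pipeline, x: Sequence[float], cutoff: int,
                          trials: int = 20, seed: int = 0, tol: float = 1e-12) -> bool:
    """Counterfactual test: randomize inputs after `cutoff`; outputs up to `cutoff` must not change."""
    rng = random.Random(seed)
    base = f(list(x))
    for _ in range(trials):
        perturbed = list(x[: cutoff + 1]) + [rng.uniform(-10, 10) for _ in x[cutoff + 1:]]
        out = f(perturbed)
        if any(abs(a - b) > tol for a, b in zip(base[: cutoff + 1], out[: cutoff + 1])):
            return False
    return True
```

Passing this test for one cutoff is necessary but not sufficient; running it over many cutoffs and inputs gives more confidence that no future information leaks into past values.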
We present a modular three-stage pipeline that (i) aggregates article-level scores onto a regular temporal grid with uncertainty-aware and redundancy-aware weights, (ii) fills coverage gaps through strictly causal projection rules, and (iii) applies causal smoothing to reduce residual noise.
Description of proposed algorithm/pipeline in the paper (design/implementation claim).
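The three stages can be sketched as follows, under several assumptions that are not from the paper: the classifier emits positive/negative probabilities (neutral is the remainder), the article score is `p_pos - p_neg`, the uncertainty-aware weight is one minus the normalized entropy, the redundancy-aware weight divides by the number of near-duplicates sharing a hypothetical `dedup_key` in the same bin, gaps are filled by decaying the last value toward neutral, and smoothing is an exponential moving average. Every stage reads only current and past bins.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Article:
    t: float        # timestamp in grid units, e.g. weeks since start (assumption)
    p_pos: float    # classifier probability of positive sentiment
    p_neg: float    # classifier probability of negative sentiment
    dedup_key: str  # hypothetical key grouping near-duplicate headlines


def certainty(a: Article) -> float:
    """1 minus the normalized entropy of the (pos, neg, neutral) distribution, clamped at 0."""
    p_neu = max(0.0, 1.0 - a.p_pos - a.p_neg)
    h = -sum(p * math.log(p) for p in (a.p_pos, a.p_neg, p_neu) if p > 0)
    return max(0.0, 1.0 - h / math.log(3))


def aggregate(articles: List[Article], n_bins: int) -> List[Optional[float]]:
    """Stage (i): weighted mean of (p_pos - p_neg) per bin; None where no usable weight exists."""
    bins: Dict[int, List[Article]] = defaultdict(list)
    for a in articles:
        b = math.floor(a.t)
        if 0 <= b < n_bins:
            bins[b].append(a)
    grid: List[Optional[float]] = []
    for b in range(n_bins):
        items = bins.get(b, [])
        dup_counts: Dict[str, int] = defaultdict(int)
        for a in items:
            dup_counts[a.dedup_key] += 1
        num = den = 0.0
        for a in items:
            # Uncertainty-aware (certainty) times redundancy-aware (1 / duplicates) weight.
            w = certainty(a) / dup_counts[a.dedup_key]
            num += w * (a.p_pos - a.p_neg)
            den += w
        grid.append(num / den if den > 0 else None)
    return grid


def fill_causal(grid: List[Optional[float]], decay: float = 0.9) -> List[float]:
    """Stage (ii): fill empty bins from the past only, decaying the last value toward 0."""
    out: List[float] = []
    last = 0.0
    for v in grid:
        last = last * decay if v is None else v
        out.append(last)
    return out


def smooth_causal(x: List[float], alpha: float = 0.5) -> List[float]:
    """Stage (iii): exponential moving average over current and past values."""
    out: List[float] = []
    s: Optional[float] = None
    for v in x:
        s = v if s is None else alpha * v + (1 - alpha) * s
        out.append(s)
    return out


def reconstruct(articles: List[Article], n_bins: int) -> List[float]:
    return smooth_causal(fill_causal(aggregate(articles, n_bins)))
```

With these weights, two copies of the same headline count as much as one distinct article of equal certainty, so syndicated duplicates cannot dominate a bin.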
Rather than treating this as a classification challenge, we propose to frame it as a causal signal reconstruction problem: given probabilistic sentiment outputs from a fixed classifier, recover a stable latent sentiment series that is robust to the structural pathologies of news data such as sparsity, redundancy, and classifier uncertainty.
Methodological proposal presented in the paper (conceptual framing and problem statement).
Automatic speech recognition (ASR) has shown increasing potential to assist in the transcription of endangered language data.
Background claim in the paper, referring to advances in ASR and prior work suggesting utility for endangered-language transcription; stated as motivation rather than a novel empirical finding in this paper.
We train an ASR model that achieves a character error rate as low as 15%.
Reported quantitative evaluation of the trained ASR model on the constructed Ikema dataset (character error rate = 15%). Exact evaluation protocol, test set size, and train/test split not provided in the abstract.
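Character error rate is conventionally the character-level edit distance (substitutions + deletions + insertions) divided by the number of reference characters. The sketch below pools edits and reference lengths over a corpus; it assumes raw strings are compared as-is, whereas the paper may normalize orthography, whitespace, or punctuation first. Function names are hypothetical.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        cur = [i]
        for j, hc in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rc != hc)))   # substitution or match
        prev = cur
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits over total reference characters."""
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


print(cer(["abcd"], ["abxd"]))  # 0.25
```

Pooling over the corpus weights long utterances more than averaging per-utterance CERs would, which is why the two conventions can report different numbers for the same model.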
We construct a {\totaldatasethours}-hour speech corpus from field recordings.
Stated in paper as an outcome of the authors' data-collection and corpus-construction effort from field recordings; no numeric value resolved in the provided text (placeholder present).
With calibrated oversight that aligns accountability to real-world risks, AI can secure the profession’s future.
Normative/prognostic claim in the Article (argument that appropriate governance will preserve or strengthen the legal profession).
With calibrated oversight that aligns accountability to real-world risks, AI can improve service quality in legal services.
Normative/prognostic claim in the Article (argument that governance plus AI yields quality improvements). No empirical effect sizes reported in the excerpt.
While the risks of AI are real, they must not eclipse the opportunity: with calibrated oversight that aligns accountability to real-world risks, AI can expand access to legal services.
Normative claim and projected benefit argued by the authors (theoretical/argumentative; no empirical evidence in excerpt).
Using agentic financial transactions as an example, we demonstrate how governments and regulators can use this monitoring method to extend oversight beyond model outputs to the tool layer to monitor risks of agent deployment.
Paper includes a case demonstration (agentic financial transactions) showing application of MCP monitoring to identify and assess risky tool deployments and to inform regulatory oversight.
The share of 'action' tools rose from 27% to 65% of total usage over the 16-month period sampled.
Time-series usage/download data from MCP servers across the 16-month sample (paper reports increase in share of action tools from 27% to 65%).
Software development accounts for 90% of MCP server downloads.
Download metrics from monitored MCP servers stratified by tool domain indicating 90% of downloads are for software development tools (paper statement).
Software development accounts for 67% of all agent tools.
Categorisation of the 177,436 monitored agent tools by task domain (O*NET mapping) yielding 67% in software development.
We evaluated 177,436 agent tools created from 11/2024 to 02/2026 by monitoring public Model Context Protocol (MCP) server repositories.
Empirical monitoring of public MCP server repositories; dataset of 177,436 agent tools collected over the period 11/2024–02/2026 (as stated in paper).
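Two small calculations connect the figures above. First, November 2024 through February 2026 spans 16 calendar months inclusive, matching the 16-month period cited for the action-tool share. Second, a share such as "action tools as a percentage of total usage" is, per month, action usage divided by all usage. The sketch below assumes usage records of the form (month, tool kind, count); the record layout, the `"action"` label, and the toy counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def months_inclusive(y0: int, m0: int, y1: int, m1: int) -> int:
    """Number of calendar months from (y0, m0) through (y1, m1), inclusive."""
    return (y1 - y0) * 12 + (m1 - m0) + 1


def action_share_by_month(records: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """records: ('YYYY-MM', tool kind, usage count). Returns the action share per month."""
    total: Dict[str, int] = defaultdict(int)
    action: Dict[str, int] = defaultdict(int)
    for month, kind, count in records:
        total[month] += count
        if kind == "action":
            action[month] += count
    # 'YYYY-MM' strings sort chronologically.
    return {m: action[m] / total[m] for m in sorted(total) if total[m] > 0}


print(months_inclusive(2024, 11, 2026, 2))  # 16
toy = [("2024-11", "action", 30), ("2024-11", "read", 90),
       ("2026-02", "action", 90), ("2026-02", "read", 30)]
print(action_share_by_month(toy))  # {'2024-11': 0.25, '2026-02': 0.75}
```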
The framework provides a roadmap for coordinated response across educational institutions, government agencies, and industry to ensure workforce resilience and domestic leadership in the emerging agentic finance era.
Authors' proposed integrated roadmap (prescriptive recommendation; no empirical testing or outcome measurement reported in the provided text).
We develop a comprehensive government policy framework including: 1) Federal AI literacy mandates for post-secondary business education; 2) Department of Labor workforce retraining programs with income support for displaced financial professionals; 3) SEC and Treasury regulatory innovations creating market incentives for workforce development; 4) State-level workforce partnerships implementing regional transition support; and 5) Enhanced social safety nets for workers navigating career transitions during the estimated 5–15-year transformation period.
Author-presented policy framework and recommendations (policy design proposals and an asserted 5–15 year transformation timeframe; no empirical evaluation reported).
We propose a multi-layered integration strategy for higher education encompassing: 1) Foundational AI literacy modules for all business students; 2) A specialized "Agentic Financial Planning" course with hands-on labs; 3) AI-augmented redesign of core courses (Investments, Portfolio Management, Ethics); 4) Interdisciplinary project-based learning with Computer Science; and 5) A governance and policy module addressing regulatory compliance (NIST AI RMF, SEC regulations).
Proposed curricular framework presented by the authors (recommendation/proposal, not empirically tested within the paper).
The ultimate competitive edge lies in an organization's ability to treat AI not as a standalone tool, but as a core component of sustainable, long-term corporate strategy.
Concluding normative claim in the paper; presented as an interpretation/synthesis rather than supported by cited empirical evidence in the abstract.
Successful global expansion is no longer predicated solely on physical presence but on the deployment of scalable, localized AI models that navigate diverse regulatory, linguistic, and cultural landscapes.
Argumentative claim in the paper describing a strategic determinant for global expansion; no empirical sample or quantified outcomes presented in the abstract.
AI hyper-personalizes customer engagement.
Declarative claim in the paper about AI's effect on customer engagement personalization; no experimental or observational data reported in the abstract.
AI acts as an internal engine for operational agility by compressing R&D cycles.
Claim made in the paper asserting R&D cycle compression due to AI; no empirical data, sample size or quantitative measures provided in the abstract.
The strategic focus has transitioned from mere process automation to autonomous orchestration, where multi-agent systems independently manage complex, cross-border operations and real-time decision-making.
Analytic statement from the paper describing an observed/argued shift in strategic focus; no empirical methodology or sample reported.
Organizations leverage agentic workflows and domain-specific intelligence to catalyse strategic innovation and facilitate global expansion in the digital era.
Conceptual claim in the paper describing how organizations use specific AI capabilities; no empirical design or sample described in the abstract.
Through its rapid evolution, Artificial Intelligence (AI) has shifted from a disruptive trend to the fundamental operating layer of the modern enterprise.
Statement/assertion in the paper (conceptual/positioning claim); no empirical method, sample size, or statistical analysis reported in the abstract.
The analysis provides a transparent measurement framework and baseline statistics for tracking the emerging shift from AI discussion to action-oriented, agentic deployments in finance.
Methodological contribution claim: presentation of an auditable dictionary-and-context approach plus reported baseline statistics (percentages by year).
Autonomy evidence focuses on regions with higher control density, consistent with governance maturity serving as a prerequisite for action-taking deployments.
Comparative text-as-data analysis showing agentic/autonomy references concentrated in disclosure windows with higher measured controls density; interpretive claim linking this pattern to governance maturity as a prerequisite.
Agentic disclosures are absent in 2021–2023, appear in 2024 (0.4% of firm-years), and increase in 2025 (1.6% of firm-years), indicating a late but accelerating diffusion phase.
Empirical counts/percentages reported from the assembled panel; per-year denominators are 500 firm–year observations (500 firms per year).
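With 500 firm-year observations per year, 0.4% and 1.6% correspond to 2 and 8 firm-years, respectively. The sketch below illustrates one way a dictionary-and-context measure could work: flag a firm-year if any agentic term appears, and measure control-term density in a character window around each agentic mention. The term lists, window size, and function names are hypothetical; the paper's actual dictionaries and context rules are not given here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical dictionaries, for illustration only.
AGENTIC_TERMS = ["agentic ai", "ai agent", "autonomous agent"]
CONTROL_TERMS = ["oversight", "human review", "audit", "guardrail"]


def term_pattern(terms: List[str]) -> "re.Pattern[str]":
    """Whole-phrase, case-insensitive match with an optional plural 's'."""
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")s?\b",
                      re.IGNORECASE)


AGENTIC_RE = term_pattern(AGENTIC_TERMS)
CONTROL_RE = term_pattern(CONTROL_TERMS)


def agentic_share_by_year(filings: Iterable[Tuple[str, int, str]]) -> Dict[int, float]:
    """filings: (firm, year, text). Share of firm-years with at least one agentic mention."""
    firms: Dict[int, Set[str]] = defaultdict(set)
    flagged: Dict[int, Set[str]] = defaultdict(set)
    for firm, year, text in filings:
        firms[year].add(firm)
        if AGENTIC_RE.search(text):
            flagged[year].add(firm)
    return {y: len(flagged[y]) / len(firms[y]) for y in sorted(firms)}


def control_density_near(text: str, window: int = 200) -> List[float]:
    """Control-term hits per 100 words in a character window around each agentic mention."""
    densities: List[float] = []
    for m in AGENTIC_RE.finditer(text):
        snippet = text[max(0, m.start() - window): m.end() + window]
        words = len(snippet.split())
        hits = len(CONTROL_RE.findall(snippet))
        densities.append(100.0 * hits / words if words else 0.0)
    return densities


print(agentic_share_by_year([("A", 2024, "We pilot AI agents under human review."),
                             ("B", 2024, "We use machine learning for fraud detection.")]))
# {2024: 0.5}
```

Keeping the term lists explicit and the window fixed is what makes such a measure auditable: any flagged firm-year can be traced back to the matched phrase and its surrounding text.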