Evidence (14156 claims)
Adoption
8625 claims
Productivity
7686 claims
Governance
6917 claims
Human-AI Collaboration
6574 claims
Org Design
4189 claims
Innovation
4131 claims
Labor Markets
3588 claims
Skills & Training
2985 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
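The counts above can be turned into directional shares, and doing so shows that the four direction columns do not always add up to Total (Firm Productivity lists 604 claims against a Total of 610). A minimal sketch, using four rows copied from the table; the function name and the "unlisted" label are illustrative assumptions, and the source does not say what the unlisted remainder contains:

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (439, 57, 88, 20, 610),
    "Task Completion Time": (173, 31, 8, 12, 225),
    "Inequality Measures": (44, 123, 50, 6, 223),
    "Job Displacement": (12, 81, 21, 1, 115),
}


def summarize(counts):
    """Return directional shares and the gap between Total and the listed columns."""
    positive, negative, mixed, null, total = counts
    listed = positive + negative + mixed + null
    return {
        "positive_share": positive / total,
        "negative_share": negative / total,
        # "unlisted" is a hypothetical label for claims counted in Total
        # but not in any of the four direction columns.
        "unlisted": total - listed,
    }


SUMMARY = {name: summarize(counts) for name, counts in ROWS.items()}

for name, s in SUMMARY.items():
    print(
        f"{name}: positive {s['positive_share']:.1%}, "
        f"negative {s['negative_share']:.1%}, unlisted {s['unlisted']}"
    )
```

Shares are computed against Total rather than against the sum of the listed columns, so a row with unlisted claims will have shares that sum to less than 100%.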
For organizations of n humans with AI agents, the optimal team size decreases with agent capability.
Derived implication from the stylized model's analysis of multi-human organizations interacting with AI agents.
There is no smooth sublinear regime for human effort; it transitions sharply from O(E) to O(1) with no intermediate scaling class.
Mathematical derivation from a stylized model of human-AI collaboration that assumes tasks decompose into atomic decisions, a fraction ν are novel, and specification/verification/error correction scale with task size.
So far, maintenance and migration work has been done largely manually by human experts.
Background assertion in the paper's introduction/abstract; no empirical backing provided in abstract.
The regime divide deepens under AI capital concentration, admits a permanent displacement attractor in shallow markets, and generates equity market participation hysteresis in which the equity risk premium (ERP) remains elevated after employment has normalised.
Model-based assertions: analysis shows capital concentration magnifies regime separation, yields a permanent displacement attractor in shallow-market parameterizations, and produces hysteresis in participation leading to persistently elevated ERP after employment recovery.
The alignment risk channel is specific to agentic AI: correlated misalignment in AI objectives generates aggregate output shocks with fat left tails; formalised via Hansen-Sargent multiplier preferences, the resulting alignment risk premium (ARP) enters the equilibrium ERP decomposition as a priced factor additively separable from the participation wedge.
Theoretical formalisation in the paper: uses Hansen-Sargent multiplier preferences to capture model uncertainty/robustness and defines an ARP that is additively separable in the ERP decomposition.
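For orientation, the standard form of Hansen-Sargent multiplier preferences and one way to write an additively separable decomposition are sketched below. The symbols are assumptions rather than the paper's notation: U(c) is utility, m a likelihood-ratio distortion of the reference model, θ > 0 the penalty parameter, ERP_0 a baseline premium, and Δ_part the participation wedge.

```latex
% Multiplier preferences: the agent evaluates utility under the worst-case
% distortion m, penalised by relative entropy scaled by \theta.
V(c) \;=\; \min_{m \ge 0,\; \mathbb{E}[m] = 1}
      \Big\{ \mathbb{E}\big[m\,U(c)\big] + \theta\,\mathbb{E}\big[m \log m\big] \Big\}
     \;=\; -\theta \log \mathbb{E}\Big[\exp\!\big(-U(c)/\theta\big)\Big]

% Additive separability as described in the claim (the baseline term is an assumption):
\mathrm{ERP} \;=\; \mathrm{ERP}_0 \;+\; \Delta_{\mathrm{part}} \;+\; \mathrm{ARP}
```

A smaller θ means a stronger concern for misspecification, which puts more weight on left-tail outcomes.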
The participation compression channel operates through household wealth: displacement pushes marginal households below the equity market entry cost κ, concentrating aggregate consumption risk on a shrinking investor pool and—by the Basak-Cuoco mechanism—raising the required risk premium even as fundamentals improve.
Model mechanism described in the paper: heterogeneous-agent model with an explicit market entry cost κ and reference to the Basak-Cuoco mechanism leading to a higher required risk premium when investor base shrinks.
The literature singles out endemic data quality issues, algorithmic bias, governance frameworks, and regulatory compliance as concerns that require trusted AI and sustainable digital finance ecosystems.
Synthesis from the reviewed literature noting recurring concerns and limitations reported across studies; the paper lists these as major challenges identified in the field.
AI can worsen financial and market performance if it crowds out normal R&D.
Paper's empirical analysis and interpretation linking AI dependence to poorer financial/market performance through displacement of standard R&D activities; presented as a study finding.
High AI dependency disclosed in financial reports does not improve firms' financial health and may even endanger it.
Empirical results drawn from the study's analysis of listed new energy vehicle and automobile manufacturers (2013–2023); statement appears in the paper's findings/conclusions.
AI dependency reduces financial safety for listed new energy vehicle and automobile manufacturers.
Empirical analysis of a sample of listed new energy vehicle and automobile manufacturers covering 2013–2023; the paper reports data analysis showing AI dependency reduces financial safety.
More informative search can degrade both learning and consumer surplus unless the market learns as much as consumers (for example, by "reading the transcripts" of agentic conversations).
Analytical comparative statics in the paper's theoretical model showing how increasing the informativeness of consumer-side signals affects learning dynamics and welfare; relies on model assumptions about what information the market collects versus consumers.
Performance degradation persists even when context is provided via structured semantic layers including AST-extracted function context and import graph resolution.
Experiments comparing unstructured versus structured context provision; structured semantic layers (AST context, import graph resolution) were evaluated and models still degraded with more context.
Models' performance degrades monotonically from diff-only (config_A) to diff+file content (config_B) to full context (config_C) across all 8 models.
Systematic ablation across three frozen context configurations (config_A, config_B, config_C) reported; all 8 evaluated models show monotonic performance decline as more context is provided.
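The monotonicity claim can be checked mechanically once per-model scores are available for each configuration. A minimal sketch follows; the scores are hypothetical placeholders rather than results from the paper, and treating "monotonic decline" as a strict decrease is an assumption:

```python
# Configuration names as given in the claim, ordered from least to most context.
CONFIG_ORDER = ["config_A", "config_B", "config_C"]


def degrades_monotonically(scores, order=CONFIG_ORDER):
    """True if the score strictly decreases at every step along `order`."""
    values = [scores[config] for config in order]
    return all(earlier > later for earlier, later in zip(values, values[1:]))


# Hypothetical placeholder scores, NOT taken from the paper.
example_scores = {
    "model_1": {"config_A": 0.30, "config_B": 0.25, "config_C": 0.20},
    "model_2": {"config_A": 0.20, "config_B": 0.22, "config_C": 0.15},
}

RESULT = {
    model: degrades_monotonically(scores)
    for model, scores in example_scores.items()
}
print(RESULT)
```

With these placeholder values, `model_1` satisfies the check and `model_2` does not, because its score rises from config_A to config_B.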
Eight frontier models detect only 15–31% of human-flagged issues on the diff-only configuration (config_A).
Empirical evaluation across 8 models on SWE-PRBench (350 PRs) under the diff-only configuration; reported detection rates of 15–31% relative to human-flagged issues.
There is a growing gap between rapid experimentation with AI tools and limited organizational capability to institutionalize them in everyday workflows.
Argument supported by targeted literature synthesis and review of recent scholarly and institutional sources; no primary empirical sample reported in this paper.
Data reveals that less than 0.7% of the Indian population uses AI-induced ride services.
Empirical statistic reported in the paper (presented as data) quantifying the share of the population using AI-induced ride services.
The lack of a significant worsening in transportation-sector inequality can be attributed to sluggish demand switching from non-AI to AI-based services in India.
Argument in the paper linking empirical finding (no significant increase in inequality) to low observed adoption rates of AI-based ride services; supported by reported adoption statistic.
Evaluations across eight state-of-the-art multimodal models reveal that models achieved only 55.0% accuracy on help prediction.
Experimental evaluation reported in the paper comparing eight multimodal models on the Help Prediction task with reported accuracy metric.
Evaluations across eight state-of-the-art multimodal models reveal that models achieved only 44.6% accuracy on behavior state detection.
Experimental evaluation reported in the paper comparing eight multimodal models on the Behavior State Detection task with reported accuracy metric.
Technological proximity has a noteworthy negative effect on collaboration, underscoring the importance of complementary knowledge in AI innovation.
SAOM estimates from longitudinal patent collaboration data (2013–2024) showing a statistically significant negative coefficient for technological proximity (implying that organizations closer in technology space are less likely to form ties).
Sentiment signals derived from sparse news are commonly used in financial analysis and technology monitoring, yet transforming raw article-level observations into reliable temporal series remains a largely unsolved engineering problem.
Framing statement in the paper's introduction/abstract describing the problem motivation; conceptual argument rather than empirical test.
Ikema is a severely endangered Ryukyuan language spoken in Okinawa, Japan, with approximately 1,300 remaining speakers, most of whom are over 60 years old.
Demographic/descriptive claim reported in the paper's background (likely citing prior surveys or census estimates); the abstract states the ~1,300 speakers figure and age distribution.
The financial planning and investment management profession is undergoing a radical transformation driven by Generative AI (GenAI) and Agentic AI, creating urgent workforce displacement challenges that require coordinated government policy intervention alongside educational reform.
Author assertion in the paper's introduction/abstract; framing argument based on the paper's synthesized analysis (no empirical sample, no reported statistical test).
Within the set of agentic-mention filings, autonomy evidence remains rare.
Empirical statement derived from analysis of the identified agentic-mention filings (small number of such filings reported across 2024–2025).
LLM design agents can fixate on existing paradigms and fail to explore alternatives when solving design challenges, potentially leading to suboptimal solutions (a pathology analogous to human designers).
Literature/background claim and authors' characterization of observed agent behavior; motivated the proposed metacognitive interventions. No numerical sample size reported.
Current closed models are generally ill-suited for scientific purposes (with some notable exceptions).
Argumentative and evaluative reasoning in the paper comparing features of closed models to scientific needs; no empirical sample size reported in abstract.
Restrictions on information about model construction and deployment threaten reliable inference in research that involves those models.
Conceptual argument and analysis presented in the paper (no empirical sample or randomized evaluation reported in abstract). The paper analyzes how specific types of information restrictions (about model construction and deployment) create threats to inference.
The energy inefficiency of LLM overthinking directly undermines UN Sustainable Development Goals 13 (Climate Action) and 10 (Reduced Inequalities) by hindering equitable AI access in resource-constrained regions.
Normative/analytic claim in the paper linking energy inefficiency to negative impacts on specific UN SDGs (argumentative, not empirically quantified in the abstract).
Current paradigms indiscriminately apply computation-intensive strategies like Chain-of-Thought (CoT) to billions of daily queries, causing LLM overthinking that amplifies carbon emissions and operational barriers.
Claim/assertion in the paper framing the problem (conceptual/observational argument; no specific empirical backing provided in the abstract).
There is a potential for exclusion due to limited digital footprints, which can limit who benefits from AI-driven finance.
Abstract explicitly identifies potential exclusion of people with limited digital footprints as a challenge, based on qualitative interviews and case-study evidence.
Data privacy concerns are a notable challenge in deploying AI-driven financial solutions.
Abstract lists data privacy concerns among identified challenges drawn from interviews and analysis across the three case studies.
Infrastructure limitations pose a barrier to adoption and effective use of AI-enabled financial services.
Abstract identifies infrastructure limitations as a challenge, based on qualitative interviews and case-study evidence.
Digital literacy gaps are a challenge limiting the effectiveness and inclusion of AI-driven financial solutions.
Abstract lists digital literacy gaps among identified challenges, based on qualitative insights from the 1,500 interviews and case-study observations.
Triangulation with market data and sentiment analysis confirms that public enthusiasm often outpaces actual technological readiness.
Paper states market data and sentiment analysis were used to triangulate findings and reports this systematic gap; no numeric effect sizes or sample counts provided.
Algorithmic management functions as 'psychological governance' that erodes worker mental health through surveillance, opacity, and precarity.
Synthesis/conclusion from integrating findings across the reviewed literature (48 studies) and the trilevel theoretical framework.
Fear of deactivation (automated sanctions) creates chronic precarity; 78% of platform workers in the studies that measured it report chronic fear.
Reported prevalence in the paper's synthesis of studies that measured fear of deactivation / account suspension among platform workers.
Task fragmentation (platform algorithms splitting work into small tasks) leads to a reduced sense of accomplishment among drivers.
Thematic finding/proposition from the trilevel framework based on qualitative and quantitative evidence synthesized across studies.
Rating pressure is associated with emotional exhaustion, with 41–67% reporting high burnout.
Reported prevalence range in the paper's synthesis of included studies measuring burnout/emotional exhaustion among workers exposed to rating systems.
Income volatility from dynamic pricing is associated with depressive symptoms (reported prevalence range 23–41%).
Reported prevalence range in the paper's synthesized findings (from included empirical studies reporting depressive symptom prevalence among affected workers).
Algorithmic opacity is linked to procedural anxiety.
Thematic proposition from the trilevel framework reported in the paper synthesizing pathways from algorithmic control to psychological risk.
Real estate pro forma development remains one of the most time-intensive functions in property investment, typically requiring twenty to forty hours per multifamily project through manual research, Excel-based modeling, and iterative scenario analysis.
Statement in paper asserting typical industry practice; not tied to the paper's controlled test. No empirical sample size or survey data reported alongside this assertion.
Policymakers in the EU and beyond will need to change course, and soon, if they are to effectively govern the next generation of AI technology.
Authors' prescriptive conclusion based on their analysis of shortcomings in the EU AI Act and institutional frameworks (policy recommendation; no empirical sample size in excerpt).
The Act's allocation of monitoring and enforcement responsibilities, reliance on industry self-regulation, and level of government resourcing illustrate how a regulatory framework designed for conventional AI systems can be ill-suited to AI agents.
Authors' institutional analysis of the EU AI Act's monitoring/enforcement allocation, reliance on self-regulation, and resourcing (qualitative legal/institutional analysis; no quantitative sample size in excerpt).
The EU AI Act faces significant obstacles in confronting governance challenges arising from AI agents, such as unequal access to the economic opportunities afforded by AI agents.
Authors' argument that the Act may not prevent or address unequal access to benefits of AI agents (policy/legal analysis; no empirical sample size in excerpt).
The EU AI Act faces significant obstacles in confronting governance challenges arising from AI agents, such as the risk of misuse of agents by malicious actors.
Authors' analysis highlighting misuse risks and the Act's limitations in addressing them (policy/legal analysis; no empirical sample size in excerpt).
The EU AI Act faces significant obstacles in confronting governance challenges arising from AI agents, such as performance failures in autonomous task execution.
Authors' analytical argument that the Act's design and provisions do not adequately address autonomous performance failures (policy/legal analysis; no empirical sample size provided in excerpt).
The EU AI Act was promulgated prior to the development and widespread use of AI agents.
Factual/timing claim by the authors referencing the Act's adoption date relative to development and proliferation of AI agents (historical/policy analysis; dates verifiable externally).
AI agents present particularly pressing questions for the European Union's AI Act.
Authors' normative/analytical claim based on the perceived fit between AI agents' characteristics and the EU AI Act's design (policy/legal analysis; no empirical sample size in excerpt).
AI can prompt enterprises to adopt different income distribution modes by raising the marginal output of capital and substituting for low-skilled labor (technology bias).
Theoretical mechanism articulated in the paper based on capital-labor substitution principle and factor reward theory; implied empirical testing using firm-level data.
Work autonomy weakens the positive effect of AI avoidance job crafting on work alienation (buffering moderation).
Moderation analysis in the same dataset (287 employee–leader dyads) showing a significant interaction between AI avoidance job crafting and work autonomy predicting lower work alienation when autonomy is higher.