Evidence (13827 claims)
Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
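Raw counts in the matrix are hard to compare across outcomes of very different size, so shares can be more readable. The following is a minimal Python sketch using three rows copied from the table; the helper name `direction_shares` is hypothetical, and shares are taken over the sum of the four listed directions rather than the Total column, since the two do not always coincide.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20),
    "Inequality Measures": (44, 122, 49, 6),
    "Job Displacement": (12, 80, 20, 1),
}

def direction_shares(counts):
    """Return each direction's share of the four listed counts (hypothetical helper)."""
    total = sum(counts)
    labels = ("positive", "negative", "mixed", "null")
    return {label: n / total for label, n in zip(labels, counts)}

for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    print(f"{outcome}: {shares['positive']:.0%} positive, {shares['negative']:.0%} negative")
```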
In this paper we present a prototype of AADvark, an agentic system designed for this task.
Statement of contribution: presentation of a prototype system (methodological contribution described in the paper); evidence would be the prototype and its implementation details (not provided here).
In order for Agent-Aided Design to make a real impact in industrial manufacturing, we need a system that is capable of generating such 3D assemblies.
Normative/argumentative claim by the authors that industrial impact requires capability to generate 3D assemblies with moving parts; no empirical test provided.
In the past year, researchers have started to create agentic systems that can design real-world CAD-style objects in a training-free setting, a new variety of system that we call Agent-Aided Design.
Literature/field observation asserted by the paper (statement of recent research trend); no sample size or empirical count provided in the excerpt.
By framing disclosure as epistemic infrastructure, this work outlines a conceptual roadmap for future empirical and design research on Human–AI collaboration.
High-level, forward-looking claim about the paper's contribution to research agenda (conceptual argument). No empirical validation in the abstract.
We contribute a research instrument that operationalizes these configurations in a collaborative chat setting and articulate testable design conjectures.
Paper contribution: a research instrument and set of conjectures described by the authors (design/methodological artifact). The abstract does not report empirical deployment or sample size.
We introduce an AI Disclosure Design Space that conceptualizes disclosure as an epistemic coordination mechanism.
Paper contribution: conceptual artifact (design space) introduced by the authors; this is a descriptive/foundational claim about the paper's contents.
What matters in practice is the design of disclosure: how systems reveal, signal, or conceal AI assistance within collaboration.
Central theoretical argument of the paper (conceptual/design claim); no empirical validation reported in the abstract.
Digital financial literacy and managerial competence are critical for translating AI outputs into strategic decisions, resulting in a robust governance and regulatory framework for sustainable development (Schrank & Kijkasiwat, 2025, p. 202; Tandilino et al., 2025).
Prescriptive/recommendation claim supported by citations (Schrank & Kijkasiwat, 2025; Tandilino et al., 2025); appears as a policy/managerial implication in the paper rather than an empirically tested result. No sample size or quantitative evidence in the excerpt.
Advanced AI replaces intuition-based decisions with precise and robust data, resulting in a significant increase in the firm's bargaining power during credit negotiations and enabling access to long-term capital (Hamdouni, 2025; Sanga & Aziakpono, 2023).
Assertion supported by citations (Hamdouni, 2025; Sanga & Aziakpono, 2023); framed as a causal pathway (AI -> better data-driven decisions -> increased bargaining power -> improved access to long-term credit). The excerpt does not describe sample size, empirical design, or quantitative estimates.
AI is transforming small business funding by optimizing firms' internal resources and moving them from immediate, short-term loans to long-term capital (Pérez-Campdesuñer et al., 2026; Wu & Liao, 2025).
Claim asserted with citations to Pérez-Campdesuñer et al. (2026) and Wu & Liao (2025); presented as a thematic/finding of the paper (likely based on literature review and RDT framing). No sample size or direct empirical method reported in the excerpt.
Our results suggest that grounding reward design in empirical analysis of information impact and user answerability improves clarification efficiency.
Conclusion drawn from the paper's empirical work: identification of task relevance and user answerability properties, operationalization via RL rewards, and the CLARITI evaluation showing fewer questions for matched resolution rate; abstract does not report experimental details or metrics beyond the 41% reduction.
CLARITI is an 8B-parameter clarification module.
Model specification reported in the abstract; factual description of the trained model's scale (no further empirical detail provided in the abstract).
We operationalize these properties as multi-stage reinforcement learning rewards to train CLARITI, an 8B-parameter clarification module.
Methodological claim: the paper reports implementation of multi-stage RL rewards and training of a clarification model named CLARITI with 8 billion parameters (claim reported in abstract; no training dataset size reported).
Using Shapley attribution and distributional comparisons, we identify two key properties of effective clarification: task relevance (which information predicts success) and user answerability (what users can realistically provide).
Analytical methods reported in the paper: Shapley attribution and distributional comparisons applied to datasets of software engineering tasks and simulated user responses (abstract mentions these methods but gives no numeric sample size).
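For readers unfamiliar with Shapley attribution, the idea can be illustrated on a toy case. The sketch below computes exact Shapley values by averaging marginal contributions over all orderings; the information types and success rates are made-up numbers for illustration, not data from the paper, and the paper's own attribution pipeline is not described in the excerpt.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Toy, made-up task success rates given which pieces of information are available.
SUCCESS = {
    frozenset(): 0.2,
    frozenset({"expected_behavior"}): 0.5,
    frozenset({"repro_steps"}): 0.4,
    frozenset({"expected_behavior", "repro_steps"}): 0.8,
}

phi = shapley_values(["expected_behavior", "repro_steps"], SUCCESS.__getitem__)
print(phi)
```

The attributions sum to the gain from having all information over having none, which is what makes them usable as a ranking of which information predicts success.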
Humans often specify tasks incompletely, so assistants must know when and how to ask clarifying questions.
Background claim stated in the paper's introduction/abstract; likely supported by literature on underspecified task specifications and/or the authors' motivating examples (no specific sample size or experiment reported in the abstract).
The approach provides a practical path toward more transparent, controllable, and accountable AI use without requiring new model architectures.
Authors' asserted benefit of the proposed interaction-layer framework; no empirical demonstration that transparency, control, or accountability are achieved or that no architectural changes are required in practice.
The framework enables auditable reasoning traces and supports alignment with emerging governance standards, including the EU AI Act and ISO/IEC 42001.
Stated compliance/alignment claim linking the proposed interaction-layer approach to existing regulatory standards; no compliance testing or audit examples reported.
This reframes the question from whether the model can think to whether the human-AI system can reason.
Conceptual reframing stated in the paper; no empirical evidence required as it is a change of perspective.
We introduce 'The Architect's Pen' as a practical method where the human uses the model as an external medium for structured reflection by embedding phases of articulation, critique, and revision into human-AI interaction.
Method description / practical proposal included in the paper; no experimental evaluation, user study, or quantitative validation reported.
This perspective emphasizes collaborative intelligence, combining human judgment and contextual understanding with machine speed, memory, and associative capacity.
Theoretical claim about complementary strengths of humans and models within the proposed framework; presented without empirical tests.
Building on recent work on 'System-2' learning, reflective reasoning can be relocated to the interaction layer and framed as a cognitive protocol that can be structured, measured, and governed using existing systems.
Conceptual extension of prior literature ('System-2' learning) into an interaction-layer protocol; no empirical protocol testing or measurement evidence provided.
Reasoning should be treated as a relational process distributed between human and model rather than an internal capability of either.
Methodological proposal / theoretical framing presented by the authors; no empirical validation reported.
Large language models have advanced rapidly, from pattern recognition to emerging forms of reasoning.
Stated as an observational claim in the paper's introduction; no empirical evaluation or dataset provided.
This approach aligns with emerging compliance expectations, including the EU AI Act and ISO/IEC 42001, by making reasoning processes traceable under real conditions of use.
Claim of regulatory alignment made by the authors; presented as interpretive/legal/standards-relevant argument rather than supported by empirical analysis or legal review data in this excerpt.
Stabilising interaction makes uncertainty and drift visible before enforcement is applied, enabling more precise capability governance.
Normative/operational claim in the paper about the anticipated effect of the proposed interventions; no empirical test or measurement reported in this excerpt.
Together, these layers form a missing operational substrate for governance by increasing signal-to-noise at the point of use.
Argumentative claim from the paper proposing that the combined interventions improve the information available at the decision point; no empirical validation or sample size provided here.
This paper is the first in a five-paper research series on stabilising human-AI reasoning that proposes a two-layer approach: Parts II–IV introduce human-side mechanisms (uncertainty cues, conflict surfacing, auditable reasoning traces) and Part V develops a model-side Epistemic Control Loop (ECL) that detects instability and modulates generation.
Descriptive claim about the structure and scope of the paper series as stated by the authors; internal to the publication (no external dataset).
Large language models are increasingly integrated into decision-making in areas such as healthcare, law, finance, engineering, and government.
Statement in the paper describing an observed adoption trend; no empirical dataset, sample size, or quantitative analysis reported in the text.
For settings with multiple interventions, a tractable approximation that prioritizes interventions based on the magnitude of the policy-value discrepancy is effective.
Proposed algorithm/approximation in the paper (methodological contribution); evaluated empirically in simulations and experiments described in the paper.
In the single-intervention regime, the optimal strategy is to recommend the action that maximizes the human value function.
Theoretical result derived in the paper within a Markov decision process model for single-intervention settings.
Policy-value inconsistencies naturally identify opportunities for intervention.
Analytical/formal argument within a Markov decision process framework showing that when human policy-value consistency fails, discrepancies indicate intervention opportunities.
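The three claims above can be read together as a ranking rule. The sketch below assumes one plausible definition of discrepancy, namely the gap between the best action under the human's own value estimates and the action the human's policy actually takes; the excerpt does not give the paper's formal definition, and all names and numbers here are hypothetical.

```python
def discrepancy(q_row, chosen_action):
    """Gap between the best action under the human's values and the action taken."""
    return max(q_row.values()) - q_row[chosen_action]

def prioritize_interventions(q_human, policy_human, budget):
    """Rank states by discrepancy; recommend the value-maximizing action in the top ones."""
    gaps = {s: discrepancy(q_human[s], policy_human[s]) for s in q_human}
    ranked = sorted((s for s in gaps if gaps[s] > 0), key=gaps.get, reverse=True)
    return [(s, max(q_human[s], key=q_human[s].get), gaps[s]) for s in ranked[:budget]]

# Made-up human value estimates Q_h(s, a) and a human policy that always waits.
q_human = {
    "s0": {"wait": 1.0, "act": 3.0},
    "s1": {"wait": 2.0, "act": 2.5},
    "s2": {"wait": 4.0, "act": 1.0},
}
policy_human = {"s0": "wait", "s1": "wait", "s2": "wait"}

plan = prioritize_interventions(q_human, policy_human, budget=2)
print(plan)
```

With `budget=1` this reduces to the single-intervention case: recommend the action that maximizes the human value function in the state with the largest gap.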
GenRec addresses the three listed challenges within a single decoder-only architecture.
The paper claims that the proposed GenRec framework (a single decoder-only architecture) addresses the three enumerated industrial challenges (method and design claim).
GRPO-SR (Group Relative Policy Optimization with NLL regularization and Hybrid Rewards) aligns generative policy outputs with user satisfaction, provides training stability, and mitigates reward hacking via a dense reward model combined with a relevance gate.
Proposed reinforcement learning method described in the paper (methodological claim about algorithmic design and intended benefits).
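The group-relative part of this method can be sketched briefly. Group Relative Policy Optimization standardizes rewards within a group of outputs sampled for the same prompt; the gate below is a hypothetical reading of "relevance gate" (zeroing the dense reward for low-relevance outputs), and the NLL regularization term is omitted. The threshold and sample values are made up.

```python
from statistics import mean, pstdev

def gated_reward(dense_score, relevance, threshold=0.5):
    """Hypothetical gate: drop the dense reward when relevance is below a threshold."""
    return dense_score if relevance >= threshold else 0.0

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of outputs sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Made-up (dense reward model score, relevance score) pairs for four sampled outputs.
samples = [(0.9, 0.8), (0.95, 0.1), (0.4, 0.7), (0.6, 0.9)]
rewards = [gated_reward(dense, rel) for dense, rel in samples]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

In this toy group the second output has the highest dense score but fails the gate, so it ends up with the lowest advantage, which is the kind of reward hacking the gate is meant to discourage under this reading.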
An asymmetric linear Token Merger compresses multi-token Semantic IDs in the prompt while preserving full-resolution decoding, reducing input length by ~2X with negligible accuracy loss.
Method description plus reported compression result (~2X reduction) and qualitative statement about accuracy loss in the paper.
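One way to picture a linear merger of this kind is concatenating adjacent token embeddings and projecting them back to a single vector. The sketch below is an assumption about the general technique, not the paper's implementation: the group size of 2, the dimensions, and the random weights are all hypothetical, and the decoding side (which the claim says stays at full resolution) is not modeled.

```python
import random

def merge_tokens(embeddings, weight, group=2):
    """Concatenate each run of `group` embeddings and project to one vector."""
    assert len(embeddings) % group == 0, "sequence length must be divisible by group"
    merged = []
    for i in range(0, len(embeddings), group):
        concat = [x for emb in embeddings[i:i + group] for x in emb]
        merged.append([sum(w * x for w, x in zip(row, concat)) for row in weight])
    return merged

random.seed(0)
dim, group = 4, 2
# Six made-up input token embeddings of width `dim`.
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
# Hypothetical projection matrix of shape (dim, dim * group).
weight = [[random.gauss(0, 0.1) for _ in range(dim * group)] for _ in range(dim)]

compressed = merge_tokens(tokens, weight, group)
print(len(tokens), "->", len(compressed))
```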
Page-wise NTP (next-token prediction) task supervises over an entire interaction page rather than each interacted item individually, providing denser gradient signal and resolving the one-to-many ambiguity of point-wise training.
Proposed training objective described in the paper (methodological claim about training supervision and its intended effects).
In month-long online A/B tests serving production traffic, GenRec achieves 8.7% improvement in transaction count over the existing pipeline.
Reported result from month-long online A/B tests on production traffic (A/B test metric).
In month-long online A/B tests serving production traffic, GenRec achieves 9.5% improvement in click count over the existing pipeline.
Reported result from month-long online A/B tests on production traffic (A/B test metric).
GenRec is deployed on the JD App.
Paper states GenRec was deployed on the JD App (deployment statement).
These cooperation mechanisms become more effective under evolutionary pressures to maximize individual payoffs.
Authors report results from experiments or simulations applying evolutionary-pressure dynamics (selection for payoff-maximizing agents) and observing increased effectiveness of mechanisms; no numeric results or sample sizes in excerpt.
Contracting and mediation are most effective in achieving cooperative outcomes between capable LLM models.
Empirical results from the authors' experiments across four social dilemmas comparing mechanism performance; specifics (which models, quantitative cooperation rates) are not included in the excerpt.
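To illustrate why a contracting mechanism can change outcomes in a social dilemma, the sketch below uses a textbook prisoner's dilemma with made-up payoffs and a hypothetical contract in which a lone defector pays a penalty to the cooperator. The excerpt does not describe the paper's actual mechanisms or games in detail, so this is only an illustration of the general idea.

```python
# Textbook prisoner's dilemma payoffs (row player, column player); values are made up.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def with_contract(payoffs, penalty):
    """Apply a hypothetical contract: a lone defector pays `penalty` to the cooperator."""
    adjusted = {}
    for (a, b), (pa, pb) in payoffs.items():
        if a == "D" and b == "C":
            pa, pb = pa - penalty, pb + penalty
        elif a == "C" and b == "D":
            pa, pb = pa + penalty, pb - penalty
        adjusted[(a, b)] = (pa, pb)
    return adjusted

def best_response(payoffs, opponent_action):
    """Row player's payoff-maximizing action against a fixed opponent action."""
    return max("CD", key=lambda a: payoffs[(a, opponent_action)][0])

contracted = with_contract(PAYOFFS, penalty=3)
for label, game in (("no contract", PAYOFFS), ("contract", contracted)):
    print(label, {opp: best_response(game, opp) for opp in "CD"})
```

Without the contract, defecting is the best response to either action; with a penalty of 3, cooperating becomes the best response to either action, so a payoff-maximizing agent has a reason to sign and comply.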
Continuous learning and diversity of ideas are essential if AI is to play a meaningful role in original scientific discovery.
Normative/conditional claim supported by conceptual reasoning in the article; no empirical evidence or measured sample provided.
AI is likely to fundamentally reshape scientific publication.
Author's argument and discussion of implications for publishing and evaluation; no reported empirical study.
There is a gradual path from AI as a research tool to AI as a scientific collaborator.
Narrative/theoretical progression outlined in the article; conceptual roadmap rather than empirical demonstration.
AI for Science is especially important because it may transform not only the efficiency of research, but also the structure of scientific collaboration, discovery, publishing, and evaluation.
Argumentative/theoretical analysis in the article; forward-looking claim without reported empirical data or experimental sample.
The most important significance of the AI revolution, especially the rise of large language models, lies not simply in automation, but in a fundamental change in how complex information and human know-how are carried, replicated, and shared.
Conceptual argument presented in the article (theoretical/essayistic reasoning); no empirical sample or quantitative study reported.
The paper proposes a conceptual framework of the underlying mechanisms of the LLM fallacy and a typology of its manifestations across computational, linguistic, analytical, and creative domains.
Author(s) contribution described in the paper (framework and typology); no empirical testing reported in the abstract.
The rapid integration of large language models (LLMs) into everyday workflows has transformed how individuals perform cognitive tasks such as writing, programming, analysis, and multilingual communication.
Author(s) assertion based on literature review and conceptual overview; no empirical sample or experiment reported in the abstract.
This work contributes to the growing body of research on digital sovereignty and the political economy of AI in frontier markets.
Author's concluding claim about the study's contribution to literature.
Many advanced nations are already integrating AI into their core systems.
General descriptive statement in the paper's background/comparative context; no quantitative enumeration or country-sample provided in the excerpt.
To fund this transition, the paper introduces a blended finance structure designed to attract multilateral banks and private venture capital.
Policy/finance architecture proposed in the paper (design description); no funding rounds, commitments, or empirical investor responses reported in the excerpt.