Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
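
The Total column does not always equal the sum of the four direction columns shown. The minimal Python sketch below makes that check explicit for three rows copied from the matrix. The helper names `unlisted` and `positive_share` are illustrative. Reading a non-zero gap as claims whose direction falls outside the four listed columns is an assumption, not something the page states.

```python
# Rows copied from the matrix above: (positive, negative, mixed, null, total).
ROWS = {
    "Other": (749, 195, 97, 889, 1979),
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def unlisted(row):
    """Return Total minus the sum of the four listed direction counts."""
    *directions, total = row
    return total - sum(directions)


def positive_share(row):
    """Return the fraction of a row's Total that is tagged Positive."""
    return row[0] / row[4]


for name, row in ROWS.items():
    print(f"{name}: unlisted={unlisted(row)}, positive share={positive_share(row):.1%}")
```

Dividing by Total rather than by the four-column sum keeps the share comparable across rows, whatever the gap turns out to represent.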

In this paper we present a prototype of AADvark, an agentic system designed for this task.

Statement of contribution: presentation of a prototype system (methodological contribution described in the paper); evidence would be the prototype and its implementation details (not provided here).

high positive Agent-Aided Design for Dynamic CAD Models existence of a prototype agentic system (AADvark) for assembling movable 3D part...

In order for Agent-Aided Design to make a real impact in industrial manufacturing, we need a system that is capable of generating such 3D assemblies.

Normative/argumentative claim by the authors that industrial impact requires capability to generate 3D assemblies with moving parts; no empirical test provided.

high positive Agent-Aided Design for Dynamic CAD Models industrial applicability / impact contingent on assembly-generation capability

In the past year, researchers have started to create agentic systems that can design real-world CAD-style objects in a training-free setting, a new variety of system that we call Agent-Aided Design.

Literature/field observation asserted by the paper (statement of recent research trend); no sample size or empirical count provided in the excerpt.

high positive Agent-Aided Design for Dynamic CAD Models emergence of agentic CAD systems (training-free)

By framing disclosure as epistemic infrastructure, this work outlines a conceptual roadmap for future empirical and design research on Human–AI collaboration.

High-level, forward-looking claim about the paper's contribution to research agenda (conceptual argument). No empirical validation in the abstract.

high positive Who Gets Credit? Operationalizing AI Disclosure as Epistemic... influence on future empirical and design research agendas

We contribute a research instrument that operationalizes these configurations in a collaborative chat setting and articulate testable design conjectures.

Paper contribution: a research instrument and set of conjectures described by the authors (design/methodological artifact). The abstract does not report empirical deployment or sample size.

high positive Who Gets Credit? Operationalizing AI Disclosure as Epistemic... operationalization of disclosure configurations in a collaborative chat research...

We introduce an AI Disclosure Design Space that conceptualizes disclosure as an epistemic coordination mechanism.

Paper contribution: conceptual artifact (design space) introduced by the authors; this is a descriptive/foundational claim about the paper's contents.

high positive Who Gets Credit? Operationalizing AI Disclosure as Epistemic... conceptualization of disclosure as an epistemic coordination mechanism

What matters in practice is the design of disclosure: how systems reveal, signal, or conceal AI assistance within collaboration.

Central theoretical argument of the paper (conceptual/design claim); no empirical validation reported in the abstract.

high positive Who Gets Credit? Operationalizing AI Disclosure as Epistemic... effects of AI disclosure design on collaboration

Digital financial literacy and proper managerial competence are critical for a proper transition of AI outputs into strategic decisions, resulting in a robust governance and regulatory framework for sustainable development (Schrank & Kijkasiwat, 2025, p. 202; Tandilino et al., 2025).

Prescriptive/recommendation claim supported by citations (Schrank & Kijkasiwat, 2025; Tandilino et al., 2025); appears as a policy/managerial implication in the paper rather than an empirically tested result. No sample size or quantitative evidence in the excerpt.

high positive Re-Evaluation of Resource Dependence in AI Enabled SME Finan... effective translation of AI outputs into strategic decisions; improved governanc...

Advanced AI replaces intuition-based decisions with precise and robust data, resulting in a significant increase in the firm's bargaining power during credit negotiations and enabling their access to long term capital (Hamdouni, 2025; Sanga & Aziakpono, 2023).

Assertion supported by citations (Hamdouni, 2025; Sanga & Aziakpono, 2023); framed as a causal pathway (AI -> better data-driven decisions -> increased bargaining power -> improved access to long-term credit). The excerpt does not describe sample size, empirical design, or quantitative estimates.

high positive Re-Evaluation of Resource Dependence in AI Enabled SME Finan... firm bargaining power in credit negotiations / access to long-term credit

AI is transforming small business funding by optimizing their internal resources and transitioning the firms from these immediate and short-term loans to long-term capital (Pérez-Campdesuñer et al., 2026; Wu & Liao, 2025).

Claim asserted with citations to Pérez-Campdesuñer et al. (2026) and Wu & Liao (2025); presented as a thematic/finding of the paper (likely based on literature review and RDT framing). No sample size or direct empirical method reported in the excerpt.

high positive Re-Evaluation of Resource Dependence in AI Enabled SME Finan... shift in funding structure (from short-term to long-term capital) / access to lo...

Our results suggest that grounding reward design in empirical analysis of information impact and user answerability improves clarification efficiency.

Conclusion drawn from the paper's empirical work: identification of the task relevance and user answerability properties, their operationalization as RL rewards, and the CLARITI evaluation showing fewer questions at a matched resolution rate. The abstract reports no experimental details or metrics beyond the 41% reduction.

high positive Asking What Matters: Reward-Driven Clarification for Softwar... clarification efficiency (fewer questions for similar resolution performance)

CLARITI is an 8B-parameter clarification module.

Model specification reported in the abstract; factual description of the trained model's scale (no further empirical detail provided in the abstract).

high positive Asking What Matters: Reward-Driven Clarification for Softwar... model parameter count

We operationalize these properties as multi-stage reinforcement learning rewards to train CLARITI, an 8B-parameter clarification module.

Methodological claim: the paper reports implementation of multi-stage RL rewards and training of a clarification model named CLARITI with 8 billion parameters (claim reported in abstract; no training dataset size reported).

high positive Asking What Matters: Reward-Driven Clarification for Softwar... ability to train a clarification module using the proposed reward design

Using Shapley attribution and distributional comparisons, we identify two key properties of effective clarification: task relevance (which information predicts success) and user answerability (what users can realistically provide).

Analytical methods reported in the paper: Shapley attribution and distributional comparisons applied to datasets of software engineering tasks and simulated user responses (abstract mentions these methods but gives no numeric sample size).

high positive Asking What Matters: Reward-Driven Clarification for Softwar... importance of information features for predicting task success and simulated-use...

Humans often specify tasks incompletely, so assistants must know when and how to ask clarifying questions.

Background claim stated in the paper's introduction/abstract; likely supported by literature on underspecified task specifications and/or the authors' motivating examples (no specific sample size or experiment reported in the abstract).

high positive Asking What Matters: Reward-Driven Clarification for Softwar... frequency/occurrence of incomplete task specifications (need for clarification)

The approach provides a practical path toward more transparent, controllable, and accountable AI use without requiring new model architectures.

Authors' asserted benefit of the proposed interaction-layer framework; no empirical demonstration that transparency, control, or accountability are achieved or that no architectural changes are required in practice.

high positive Governing Reflective Human-AI Collaboration: A Framework for... transparency_controllability_accountability_of_AI_use

The framework enables auditable reasoning traces and supports alignment with emerging governance standards, including the EU AI Act and ISO/IEC 42001.

Stated compliance/alignment claim linking the proposed interaction-layer approach to existing regulatory standards; no compliance testing or audit examples reported.

high positive Governing Reflective Human-AI Collaboration: A Framework for... auditable_reasoning_traces_and_regulatory_alignment (EU AI Act, ISO/IEC 42001)

This reframes the question from whether the model can think to whether the human-AI system can reason.

Conceptual reframing stated in the paper; no empirical evidence required as it is a change of perspective.

high positive Governing Reflective Human-AI Collaboration: A Framework for... system_level_reasoning_evaluation (human-AI system reasoning instead of model-on...

We introduce 'The Architect's Pen' as a practical method where the human uses the model as an external medium for structured reflection by embedding phases of articulation, critique, and revision into human-AI interaction.

Method description / practical proposal included in the paper; no experimental evaluation, user study, or quantitative validation reported.

high positive Governing Reflective Human-AI Collaboration: A Framework for... structured_reflection_via_interaction_protocol (articulation/critique/revision l...

This perspective emphasizes collaborative intelligence, combining human judgment and contextual understanding with machine speed, memory, and associative capacity.

Theoretical claim about complementary strengths of humans and models within the proposed framework; presented without empirical tests.

high positive Governing Reflective Human-AI Collaboration: A Framework for... collaborative_intelligence (integration of human judgment and machine capabiliti...

Building on recent work on 'System-2' learning, reflective reasoning can be relocated to the interaction layer and framed as a cognitive protocol that can be structured, measured, and governed using existing systems.

Conceptual extension of prior literature ('System-2' learning) into an interaction-layer protocol; no empirical protocol testing or measurement evidence provided.

high positive Governing Reflective Human-AI Collaboration: A Framework for... measurability_and_governability_of_reasoning (via interaction protocols)

Reasoning should be treated as a relational process distributed between human and model rather than an internal capability of either.

Methodological proposal / theoretical framing presented by the authors; no empirical validation reported.

high positive Governing Reflective Human-AI Collaboration: A Framework for... system_level_reasoning_capability (human-AI distributed reasoning)

Large language models have advanced rapidly, from pattern recognition to emerging forms of reasoning.

Stated as an observational claim in the paper's introduction; no empirical evaluation or dataset provided.

high positive Governing Reflective Human-AI Collaboration: A Framework for... model_capability (advancement from pattern recognition to emerging reasoning)

This approach aligns with emerging compliance expectations, including the EU AI Act and ISO/IEC 42001, by making reasoning processes traceable under real conditions of use.

Claim of regulatory alignment made by the authors; presented as interpretive/legal/standards-relevant argument rather than supported by empirical analysis or legal review data in this excerpt.

high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... alignment with regulatory/compliance requirements (traceability of reasoning)

Stabilising interaction makes uncertainty and drift visible before enforcement is applied, enabling more precise capability governance.

Normative/operational claim in the paper about the anticipated effect of the proposed interventions; no empirical test or measurement reported in this excerpt.

high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... visibility of uncertainty/drift and precision of capability governance

Together, these layers form a missing operational substrate for governance by increasing signal-to-noise at the point of use.

Argumentative claim from the paper proposing that the combined interventions improve the information available at the decision point; no empirical validation or sample size provided here.

high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... signal-to-noise ratio of reasoning outputs at point of use (informational qualit...

This paper is the first in a five-paper research series on stabilising human-AI reasoning that proposes a two-layer approach: Parts II–IV introduce human-side mechanisms (uncertainty cues, conflict surfacing, auditable reasoning traces) and Part V develops a model-side Epistemic Control Loop (ECL) that detects instability and modulates generation.

Descriptive claim about the structure and scope of the paper series as stated by the authors; internal to the publication (no external dataset).

high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... proposal of methodological architecture for stabilising human-AI reasoning

Large language models are increasingly integrated into decision-making in areas such as healthcare, law, finance, engineering, and government.

Statement in the paper describing an observed adoption trend; no empirical dataset, sample size, or quantitative analysis is reported in the text.

high positive The Missing Knowledge Layer in AI: A Framework for Stable Hu... integration/adoption of LLMs into decision-making

For settings with multiple interventions, a tractable approximation that prioritizes interventions based on the magnitude of the policy-value discrepancy is effective.

Proposed algorithm/approximation in the paper (methodological contribution); evaluated empirically in simulations and experiments described in the paper.

high positive Improving Human Performance with Value-Aware Interventions: ... effectiveness of intervention prioritization under intervention budget constrain...

In the single-intervention regime, the optimal strategy is to recommend the action that maximizes the human value function.

Theoretical result derived in the paper within a Markov decision process model for single-intervention settings.

high positive Improving Human Performance with Value-Aware Interventions: ... optimality of single-intervention recommendation (maximizing human value functio...

Policy-value inconsistencies naturally identify opportunities for intervention.

Analytical/formal argument within a Markov decision process framework showing that when human policy-value consistency fails, discrepancies indicate intervention opportunities.

high positive Improving Human Performance with Value-Aware Interventions: ... identification of states/actions where intervention is beneficial (policy-value ...
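
The three claims above from the value-aware interventions paper can be read together as one small procedure. The sketch below is one possible reading, not the authors' algorithm. It assumes a tabular human value function `q[state][action]`, a deterministic human policy `policy[state]`, and a policy-value discrepancy defined as the gap between the best available value and the value of the action the human would take. All names and the exact form of the discrepancy are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: state -> action -> estimated human value.
QTable = Dict[str, Dict[str, float]]


def best_action(q: QTable, state: str) -> str:
    """Action that maximizes the human value function in this state."""
    return max(q[state], key=q[state].get)


def discrepancy(q: QTable, policy: Dict[str, str], state: str) -> float:
    """Assumed policy-value discrepancy: best value minus the chosen action's value."""
    return q[state][best_action(q, state)] - q[state][policy[state]]


def plan_interventions(
    q: QTable, policy: Dict[str, str], budget: int
) -> List[Tuple[str, str, float]]:
    """Pick up to `budget` states with the largest positive discrepancy.

    Returns (state, recommended_action, discrepancy) tuples, largest gap first.
    """
    gaps = [(s, best_action(q, s), discrepancy(q, policy, s)) for s in q]
    gaps = [g for g in gaps if g[2] > 0]  # consistent states need no intervention
    gaps.sort(key=lambda g: g[2], reverse=True)
    return gaps[:budget]


# Toy data, invented for illustration only.
q_human = {
    "s0": {"a": 1.0, "b": 3.0},
    "s1": {"a": 2.0, "b": 2.5},
    "s2": {"a": 5.0, "b": 1.0},
}
human_policy = {"s0": "a", "s1": "a", "s2": "a"}

print(plan_interventions(q_human, human_policy, budget=1))
```

With `budget=1` the sketch reduces to the single-intervention case: it recommends the value-maximizing action in the state where the human's choice departs most from their own value estimates.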

GenRec addresses the three listed challenges within a single decoder-only architecture.

The paper claims that the proposed GenRec framework (a single decoder-only architecture) addresses the three enumerated industrial challenges (method and design claim).

high positive GenRec: A Preference-Oriented Generative Framework for Large... ability to address listed challenges

GRPO-SR (Group Relative Policy Optimization with NLL regularization and Hybrid Rewards) aligns generative policy outputs with user satisfaction, provides training stability, and mitigates reward hacking via a dense reward model combined with a relevance gate.

Proposed reinforcement learning method described in the paper (methodological claim about algorithmic design and intended benefits).

high positive GenRec: A Preference-Oriented Generative Framework for Large... alignment with user satisfaction / training stability / mitigation of reward hac...
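
To make the "group relative" part of the GRPO-SR claim concrete, the sketch below shows the standard GRPO step of normalizing each sampled output's reward against its own group. The relevance gate is an assumption: it is modeled here as zeroing the dense reward for irrelevant outputs, and the paper may define it differently. The NLL regularization term is omitted, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def gated_reward(dense_score: float, is_relevant: bool) -> float:
    """Assumed gate form: an irrelevant output receives no reward."""
    return dense_score if is_relevant else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampled group (zero mean, unit scale).

    Uses the population standard deviation; eps avoids division by zero
    when every output in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of three sampled outputs, invented for illustration only.
group = [
    gated_reward(0.9, True),
    gated_reward(0.8, False),  # high dense score, but fails the gate
    gated_reward(0.3, True),
]
print(group_relative_advantages(group))
```

Under this gate, an output that scores well on the dense reward model but is judged irrelevant ends up with the lowest advantage in its group, which is one way such a gate could discourage reward hacking.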

An asymmetric linear Token Merger compresses multi-token Semantic IDs in the prompt while preserving full-resolution decoding, reducing input length by ~2X with negligible accuracy loss.

Method description plus reported compression result (~2X reduction) and qualitative statement about accuracy loss in the paper.

high positive GenRec: A Preference-Oriented Generative Framework for Large... input length (prompt length) and model accuracy

Page-wise NTP (next-token prediction) task supervises over an entire interaction page rather than each interacted item individually, providing denser gradient signal and resolving the one-to-many ambiguity of point-wise training.

Proposed training objective described in the paper (methodological claim about training supervision and its intended effects).

high positive GenRec: A Preference-Oriented Generative Framework for Large... training signal density / ambiguity resolution

In month-long online A/B tests serving production traffic, GenRec achieves 8.7% improvement in transaction count over the existing pipeline.

Reported result from month-long online A/B tests on production traffic (A/B test metric).

high positive GenRec: A Preference-Oriented Generative Framework for Large... transaction count

In month-long online A/B tests serving production traffic, GenRec achieves 9.5% improvement in click count over the existing pipeline.

Reported result from month-long online A/B tests on production traffic (A/B test metric).

high positive GenRec: A Preference-Oriented Generative Framework for Large... click count

GenRec is deployed on the JD App.

Paper states GenRec was deployed on the JD App (deployment statement).

high positive GenRec: A Preference-Oriented Generative Framework for Large... deployment on JD App

These cooperation mechanisms become more effective under evolutionary pressures to maximize individual payoffs.

Authors report results from experiments or simulations applying evolutionary-pressure dynamics (selection for payoff-maximizing agents) and observing increased effectiveness of mechanisms; no numeric results or sample sizes in excerpt.

high positive CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... mechanism effectiveness (cooperation outcomes) under evolutionary pressure

Contracting and mediation are most effective in achieving cooperative outcomes between capable LLM models.

Empirical results from the authors' experiments across four social dilemmas comparing mechanism performance; specifics (which models, quantitative cooperation rates) are not included in the excerpt.

high positive CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and... effectiveness of mechanisms at producing cooperative outcomes

Continuous learning and diversity of ideas are essential if AI is to play a meaningful role in original scientific discovery.

Normative/conditional claim supported by conceptual reasoning in the article; no empirical evidence or measured sample provided.

high positive The Agentification of Scientific Research: A Physicist's Per... AI's effectiveness in contributing to original scientific discovery

AI is likely to fundamentally reshape scientific publication.

Author's argument and discussion of implications for publishing and evaluation; no reported empirical study.

high positive The Agentification of Scientific Research: A Physicist's Per... structure and practice of scientific publication

There is a gradual path from AI as a research tool to AI as a scientific collaborator.

Narrative/theoretical progression outlined in the article; conceptual roadmap rather than empirical demonstration.

high positive The Agentification of Scientific Research: A Physicist's Per... role of AI in research from tool to collaborator

AI for Science is especially important because it may transform not only the efficiency of research, but also the structure of scientific collaboration, discovery, publishing, and evaluation.

Argumentative/theoretical analysis in the article; forward-looking claim without reported empirical data or experimental sample.

high positive The Agentification of Scientific Research: A Physicist's Per... efficiency of research and the structure of scientific collaboration, discovery,...

The most important significance of the AI revolution, especially the rise of large language models, lies not simply in automation, but in a fundamental change in how complex information and human know-how are carried, replicated, and shared.

Conceptual argument presented in the article (theoretical/essayistic reasoning); no empirical sample or quantitative study reported.

high positive The Agentification of Scientific Research: A Physicist's Per... how complex information and human know-how are carried, replicated, and shared

The paper proposes a conceptual framework of the underlying mechanisms of the LLM fallacy and a typology of its manifestations across computational, linguistic, analytical, and creative domains.

Authors' contribution as described in the paper (framework and typology); no empirical testing reported in the abstract.

high positive The LLM Fallacy: Misattribution in AI-Assisted Cognitive Wor... formal framework and typology coverage across domains

The rapid integration of large language models (LLMs) into everyday workflows has transformed how individuals perform cognitive tasks such as writing, programming, analysis, and multilingual communication.

Authors' assertion based on literature review and conceptual overview; no empirical sample or experiment reported in the abstract.

high positive The LLM Fallacy: Misattribution in AI-Assisted Cognitive Wor... how individuals perform cognitive tasks (writing, programming, analysis, multili...

This work contributes to the growing body of research on digital sovereignty and the political economy of AI in frontier markets.

Author's concluding claim about the study's contribution to literature.

high positive A Framework for Sovereign AI Governance and Economic Growth ... contribution to academic/policy literature on digital sovereignty and political ...

Many advanced nations are already integrating AI into their core systems.

General descriptive statement in the paper's background/comparative context; no quantitative enumeration or country-sample provided in the excerpt.

high positive A Framework for Sovereign AI Governance and Economic Growth ... degree of AI integration into national/core systems

To fund this transition, the paper introduces a blended finance structure designed to attract multilateral banks and private venture capital.

Policy/finance architecture proposed in the paper (design description); no funding rounds, commitments, or empirical investor responses reported in the excerpt.

high positive A Framework for Sovereign AI Governance and Economic Growth ... availability/design of blended finance structure to attract lenders/investors
