Evidence (4189 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	761	200	101	904	2020
Governance & Regulation	829	400	191	122	1566
Organizational Efficiency	784	193	125	84	1197
Technology Adoption Rate	637	236	124	97	1103
Research Productivity	431	131	58	340	972
Output Quality	481	183	59	47	770
Decision Quality	332	177	82	49	647
Firm Productivity	439	57	88	20	610
AI Safety & Ethics	218	279	66	33	602
Market Structure	181	170	123	24	503
Task Allocation	214	64	72	33	388
Skill Acquisition	174	62	62	17	315
Innovation Output	204	27	45	18	295
Employment Level	105	54	108	13	282
Fiscal & Macroeconomic	132	69	43	26	277
Consumer Welfare	117	63	42	11	233
Firm Revenue	154	48	26	3	231
Task Completion Time	173	31	8	12	225
Inequality Measures	44	123	50	6	223
Worker Satisfaction	89	65	22	12	188
Error Rate	71	92	10	2	175
Regulatory Compliance	77	69	14	5	165
Automation Exposure	58	56	26	13	156
Training Effectiveness	96	21	14	19	152
Wages & Compensation	77	37	25	6	145
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	81	21	1	115
Hiring & Recruitment	52	7	8	3	70
Creative Output	32	20	8	3	64
Skill Obsolescence	5	47	6	1	59
Social Protection	28	16	8	2	54
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
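
The shares implied by the matrix can be computed directly from its rows. The sketch below is only an illustration: it copies five rows from the table, and the names `ROWS` and `summarize` are hypothetical, not part of the site. It also reports the gap between each row's Total and the sum of its four direction columns. That gap is nonzero for some rows, and the table does not say how the remaining claims are classified.

```python
# Rows copied from the Evidence Matrix above:
# (positive, negative, mixed, null, total). Only a subset is shown.
ROWS = {
    "Other": (761, 200, 101, 904, 2020),
    "Governance & Regulation": (829, 400, 191, 122, 1566),
    "Firm Productivity": (439, 57, 88, 20, 610),
    "Error Rate": (71, 92, 10, 2, 175),
    "Job Displacement": (12, 81, 21, 1, 115),
}


def summarize(row):
    """Return direction shares and the count not covered by the four columns."""
    positive, negative, mixed, null, total = row
    directional = positive + negative + mixed + null
    return {
        "positive_share": round(positive / total, 3),
        "negative_share": round(negative / total, 3),
        # Claims in the Total that no direction column accounts for.
        "unaccounted": total - directional,
    }


summary = {name: summarize(row) for name, row in ROWS.items()}
for name, stats in summary.items():
    print(name, stats)
```

Read this way, Firm Productivity claims are mostly positive while Job Displacement claims are mostly negative, and the Total column should not be assumed to equal the sum of the four direction columns.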

Org Design

Frontier directions include differentiable token budgets and dynamic markets to lay the theoretical foundation for scalable next-generation agent systems.

Paper's conclusion/recommendations based on surveyed literature and identified gaps; presented as proposed future research directions rather than empirically validated findings.

high positive Token Economics for LLM Agents: A Dual-View Study from Compu... proposal of differentiable token budgets and dynamic markets as key research fro...

Security: Internalizing adversarial threats as endogenous economic constraints.

Authors argue for modeling adversarial threats as endogenous constraints within the token-economics framework; conceptual/theoretical claim from the survey.

high positive Token Economics for LLM Agents: A Dual-View Study from Compu... treatment of adversarial threats as endogenous constraints in token economics mo...

Macro-level (Agent Ecosystems): Addressing congestion externalities and pricing via mechanism design.

Paper posits mechanism-design approaches to tackle congestion externalities and pricing in agent ecosystems; conceptual proposal based on economic theory and literature synthesis.

high positive Token Economics for LLM Agents: A Dual-View Study from Compu... mitigation of congestion externalities and improved pricing in agent ecosystems

Meso-level (Multi-Agent Systems): Minimizing collaboration friction using transaction cost and principal-agent theories.

Authors propose applying transaction-cost and principal-agent frameworks to multi-agent token interactions; presented as a theoretical taxonomy/synthesis without reported empirical sample.

high positive Token Economics for LLM Agents: A Dual-View Study from Compu... reduction of collaboration friction in multi-agent systems through economic-theo...

Micro-level (Single Agent): Optimizing budget-constrained factor substitution via neoclassical firm theory.

The paper asserts a micro-level taxonomy using neoclassical firm theory to model single-agent token-budget optimization; presented as conceptual/theoretical mapping rather than empirical test.

high positive Token Economics for LLM Agents: A Dual-View Study from Compu... ability to optimize budget-constrained factor substitution at single-agent level

We conceptualize tokens as production factors, exchange mediums, and units of account.

Paper provides a conceptual taxonomy framing tokens in three economic roles; based on theoretical argumentation and literature synthesis.

high positive Token Economics for LLM Agents: A Dual-View Study from Compu... conceptual framing of tokens into three economic roles

This paper presents the first comprehensive survey of Token Economics.

Author claim of novelty in the paper (self-declared 'first comprehensive survey'); based on the authors' scope and coverage comparison to prior literature as described in the manuscript.

high positive Token Economics for LLM Agents: A Dual-View Study from Compu... comprehensiveness and novelty of the survey

Tokens have emerged as the core economic primitives of Agentic AI.

Author assertion in the paper's introduction/abstract; supported by conceptual synthesis of agentic AI literature (survey/mapping rather than original empirical data).

high positive Token Economics for LLM Agents: A Dual-View Study from Compu... recognition of tokens as core economic primitives in agentic AI

An AI Workflow Store of hardened and reusable workflows would allow agents to invoke workflows with far greater reliability and security than improvised tool chains.

Vision/proposal in the paper advocating an AI Workflow Store as a solution; presented conceptually without experimental or deployment evidence.

high positive Engineering Robustness into Personal Agents with the AI Work... reliability and security of agent-invoked workflows

Integrating rigorous software engineering processes into the agentic loop will produce production-grade, hardened, and deterministically-constrained agent workflows that substantially outperform brittle on-the-fly synthesis.

Prescriptive claim / proposed hypothesis in the paper advocating integration of SE practices into agent workflows; offered as a reasoned proposal without empirical results.

high positive Engineering Robustness into Personal Agents with the AI Work... workflow reliability/security and overall performance compared to on-the-fly syn...

The study draws policy implications for EU Cohesion programming and Sustainable Development Goals 4, 8, 9, 10, and 17.

Paper explicitly states policy implications and links to specific SDGs in its conclusions.

high positive Artificial Intelligence, Social Capital, and Sustainable Emp... policy_relevance_to_SDGs_and_cohesion_programming

External technology partnerships, targeted education, and economic incentives operate as enablers [of AI adoption], all mediated by social and human capital availability.

Thematic analysis of interview data identifying these factors as enabling AI adoption, with mediation by social/human capital.

high positive Artificial Intelligence, Social Capital, and Sustainable Emp... enablers_of_AI_adoption

Team-based ventures are increasingly dominant in the top tiers of platform rankings.

Ranking-tier analysis in the Product Hunt dataset showing an increasing share of team-founded launches among top-tier (highest-ranked) products over the study period.

high positive Generative AI Fuels Solo Entrepreneurship, but Teams Still L... share of team-founded ventures among top-tier (highest-ranked) launches

The increase in entrepreneurial entry was driven disproportionately by solo entrepreneurs.

Same Product Hunt dataset (>160,000 launches) with analysis of launch ownership structure showing a larger post-release increase in launches by solo founders relative to teams.

high positive Generative AI Fuels Solo Entrepreneurship, but Teams Still L... share or count of launches by solo entrepreneurs

Entrepreneurial entry increased sharply following the public release of ChatGPT-3.5.

Analysis of over 160,000 product launches on Product Hunt comparing entry rates before and after the public release of ChatGPT-3.5 (event-study / pre-post comparison across the platform).

high positive Generative AI Fuels Solo Entrepreneurship, but Teams Still L... entrepreneurial entry (count of product launches)

The framework and results are developed/applied to two instances: AI agent oversight (motivating setting) and marketplace operation (a parallel mechanism-design domain).

Paper includes two instantiated examples/applications illustrating the formal framework: one in AI agent oversight and one in marketplace operation (illustrative case studies within the theoretical paper).

high positive The Endogeneity of Miscalibration: Impossibility and Escape ... applicability of theoretical results to AI oversight and marketplace operation d...

A constructive escape exists: a step-function approval threshold achieves first-best screening for every strictly proper scoring rule, because the agent's binary inflate-or-not choice creates a type-space threshold regardless of the generator's curvature.

Constructive existence proof in the paper showing a step-function approval rule that attains first-best screening; analytical argument based on agent's binary inflate/not strategy.

high positive The Endogeneity of Miscalibration: Impossibility and Escape ... achievement of first-best screening / principal welfare under step-function appr...

The principal's optimal oversight necessarily uses a non-affine approval function to screen types.

Analytical result derived from the paper's formal principal-agent model and optimization of the principal's objective (theoretical proof).

high positive The Endogeneity of Miscalibration: Impossibility and Escape ... shape of the approval function used in optimal oversight (affine vs. non-affine)

The paper articulates a research agenda for how MASS should be modeled, evaluated and governed.

Stated in the abstract (position paper concludes with an articulated research agenda); evidence is the discussion and proposed agenda sections in the paper.

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... proposed research directions for modeling, evaluation and governance of MASS

The importance of each structural prior is demonstrated through formal propositions.

Methodological claim in the abstract that the paper provides formal propositions demonstrating the role/importance of the four priors; evidence contained in proofs/propositions within the paper.

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... formal/theoretical demonstration of the role of each structural prior

MASS is represented as a class of dynamical systems of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability.

Descriptive claim from the abstract about the formal structure of MASS; supported by the framework and definitions presented in the paper (formal/modeling content).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... formal representation of multi-agent dynamics via four structural priors

The paper formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact and influence to generate system-level outcomes.

Direct methodological claim in abstract indicating the authors present a formal framework (MASS) in the paper; evidence consists of the formalization provided in the paper (propositions, definitions).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... formal modeling of agent interactions and system-level outcomes

Agentic AI systems must be modeled with social theory as a structural prior.

Normative / prescriptive claim from the paper's abstract (position paper arguing for this modeling choice; supported by the authors' theoretical arguments and formal framework in the paper).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... modeling approach for agentic AI systems (use of social-theory structural priors...

Emergent dynamics of individuals in a social group have been long studied by social scientists in human contexts.

Historical/contextual claim in the abstract; supported by reference to social-science literature (no sample size; general scholarly consensus).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... existence of a body of social-science research on emergent group dynamics

In multi-agent social settings, system behavior emerges not from individual agents alone, but from the multi-agent interactions over time.

Conceptual claim in the paper's abstract, supported by the paper's argumentation and references to social-science literature on emergent dynamics (formal development likely in main text).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... emergent system-level behavior resulting from agent interactions

Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans (e.g., social media platforms, multi-agent LLM pipelines, autonomous robotics fleets).

Statement from the paper's abstract and motivating examples; implied supporting citation/literature review in the paper (no empirical sample size reported in abstract).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... deployment prevalence of agentic AI inside social environments (multi-agent sett...

The C³ Framework provides implementable design patterns and testable propositions intended to help accounting leaders capture productivity gains from human + AI work while preserving accountability, consistency, and alignment with governance expectations in high-stakes reporting contexts.

Conclusions section stating intended practical utility; presented as intended outcomes of applying the proposed framework, not as empirically demonstrated results in this paper.

high positive Collaborative Intelligence in Accounting: A Human + AI Compl... organizational_efficiency

The paper proposes a role taxonomy that clarifies review responsibility, escalation thresholds, and evidence retention for human–AI collaboration in accounting.

Results section proposing a role taxonomy as part of the C³ Framework; presented as a design artifact derived from synthesis of research and guidance.

high positive Collaborative Intelligence in Accounting: A Human + AI Compl... task_allocation

The framework specifies five mandatory control points for high-judgment use cases: source grounding and traceability, independent verification and tie-out, contradiction testing, escalation and approval, and audit-trail logging.

Results section listing five control points as mandatory design elements for high-judgment accounting use cases; conceptual recommendation from synthesis.

high positive Collaborative Intelligence in Accounting: A Human + AI Compl... governance_and_regulation

The paper develops the C³ Framework—Complementarity, Controls, and Competencies—which maps accounting tasks by task structure and judgment/materiality to recommend collaboration modes.

Results section: conceptual framework developed by the authors based on synthesized literature and guidance; no reported empirical validation in the abstract.

high positive Collaborative Intelligence in Accounting: A Human + AI Compl... task_allocation

AI accelerates drafting, summarization, and pattern detection in accounting while professionals remain accountable for judgment, materiality, and defensibility in financial reporting and analysis.

Statement in paper summarizing literature and practitioner guidance (2023–2025); conceptual synthesis rather than new empirical data.

high positive Collaborative Intelligence in Accounting: A Human + AI Compl... task_completion_time

AI tools can serve as valuable aids in task splitting, provided there is human oversight to filter out irrelevant tasks.

Paper's conclusion synthesizing experimental results and participant feedback, recommending human-in-the-loop oversight when using AI for task-splitting.

high positive Splitting User Stories Into Tasks with AI -- A Foe or an All... effectiveness of AI-assisted task-splitting under human oversight

Participants favored a hybrid approach, combining AI tools with conventional methods to maintain high accuracy in planning.

Participant preferences and qualitative feedback reported from the controlled experiment indicating preference for combining AI assistance with human methods; sample size not provided.

high positive Splitting User Stories Into Tasks with AI -- A Foe or an All... participant preference for planning approach / planning accuracy

AI-assisted approaches can help ensure no important tasks are overlooked during task-splitting.

Reported finding from the experiment indicating AI assistance reduced omissions in task lists (paper statement based on experiment and participant observations); sample size not stated.

high positive Splitting User Stories Into Tasks with AI -- A Foe or an All... task omission rate / completeness of task lists

AI-assisted approaches can generate more granular task lists than traditional methods.

Experimental comparison reported in the paper showing AI-generated task lists were more granular (based on task lists produced during the controlled experiment); sample size not provided in summary.

high positive Splitting User Stories Into Tasks with AI -- A Foe or an All... task list granularity

Adopting a critical software studies perspective enables the authors to offer final recommendations for socio-technical development programmes that could plausibly move toward AGI-adjacent capability while meeting requirements for transparency, moderation, wellbeing and sustainable business models.

Stated conclusion/intent in the paper's introduction that the chosen perspective allows the production of concrete recommendations; presented as a programmatic claim rather than empirically demonstrated in the excerpt.

high positive Pathways to AGI ability_to_propose_recommendations_for_socio-technical_programmes

The evaluation covers multiple collaborative tasks and a variety of base LLM models.

Paper states experiments were run across multiple collaborative tasks and a variety of base models (breadth of evaluation).

high positive Improving the Efficiency of Language Agent Teams with Adapti... evaluation breadth (number/types of tasks and models)

The LATTE protocol maintains consistency under partial observability and communication constraints while enabling dynamic allocation and adaptation.

Design claim supported by the protocol description and reported empirical results demonstrating consistent coordination under constrained conditions.

high positive Improving the Efficiency of Language Agent Teams with Adapti... consistency of coordination under partial observability/communication constraint...

LATTE empowers agents to dynamically allocate work, adapt coordination, and discover new tasks.

Claim supported by the framework design and demonstrations in the paper (agents use the coordination graph to reassign and discover tasks during execution).

high positive Improving the Efficiency of Language Agent Teams with Adapti... dynamic task allocation / discovery

LATTE matches or exceeds the accuracy of standard designs including MetaGPT, decentralized teams, top-down Leader-Worker hierarchies, and static decompositions.

Reported accuracy comparisons from empirical experiments across several collaborative tasks and base models.

high positive Improving the Efficiency of Language Agent Teams with Adapti... accuracy (output quality)

LATTE reduces coordination failures such as file conflicts and redundant outputs.

Empirical evaluation comparing incidence of coordination failures between LATTE and baseline team coordination approaches.

high positive Improving the Efficiency of Language Agent Teams with Adapti... coordination failures (file conflicts, redundant outputs)

LATTE reduces communication (and communication overhead) compared to standard designs.

Empirical comparisons reported across multiple collaborative tasks and base models, measuring communication and coordination metrics.

high positive Improving the Efficiency of Language Agent Teams with Adapti... communication / communication overhead

LATTE reduces wall-clock time compared to standard designs.

Empirical evaluation across multiple collaborative tasks and various base models with time measurements reported in comparisons to baselines.

high positive Improving the Efficiency of Language Agent Teams with Adapti... wall-clock time (task completion time)

LATTE reduces token usage compared to standard designs (including MetaGPT, decentralized teams, top-down Leader-Worker hierarchies, and static decompositions).

Empirical evaluation across multiple collaborative tasks and a variety of base models, comparing LATTE to listed baseline designs.

high positive Improving the Efficiency of Language Agent Teams with Adapti... token usage

In LATTE, a team of agents collaboratively constructs and maintains a shared, evolving coordination graph that encodes sub-task dependencies, individual agent assignments, and the current state of sub-task progress.

Paper describes the protocol and its components (design/specification); supported by implementation details in the paper.

high positive Improving the Efficiency of Language Agent Teams with Adapti... task_allocation and coordination state (coordination graph)
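
A shared coordination graph of the kind described in this claim can be sketched as a small data structure. This is not the LATTE implementation: the class names, the three status values, and the `claim`/`complete` methods are assumptions made for illustration. It shows only the three things the claim says the graph encodes (dependencies, assignment, progress state).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SubTask:
    name: str
    depends_on: Set[str] = field(default_factory=set)
    assignee: Optional[str] = None  # agent currently responsible, if any
    status: str = "pending"         # assumed states: pending, in_progress, done


class CoordinationGraph:
    """Hypothetical shared graph; sub-tasks can be added at any time."""

    def __init__(self) -> None:
        self.tasks: Dict[str, SubTask] = {}

    def add_task(self, name: str, depends_on=()) -> None:
        missing = [d for d in depends_on if d not in self.tasks]
        if missing:
            raise KeyError(f"unknown dependencies: {missing}")
        self.tasks[name] = SubTask(name, set(depends_on))

    def ready(self) -> List[str]:
        """Unassigned pending sub-tasks whose dependencies are all done."""
        return sorted(
            t.name
            for t in self.tasks.values()
            if t.status == "pending"
            and t.assignee is None
            and all(self.tasks[d].status == "done" for d in t.depends_on)
        )

    def claim(self, name: str, agent: str) -> None:
        if name not in self.ready():
            raise ValueError(f"{name} is not ready to be claimed")
        task = self.tasks[name]
        task.assignee, task.status = agent, "in_progress"

    def complete(self, name: str, agent: str) -> None:
        task = self.tasks[name]
        if task.assignee != agent or task.status != "in_progress":
            raise ValueError(f"{agent} does not hold {name}")
        task.status = "done"


graph = CoordinationGraph()
graph.add_task("design_schema")
graph.add_task("write_api", depends_on=["design_schema"])
graph.add_task("write_tests", depends_on=["design_schema"])
graph.claim("design_schema", "agent_a")
graph.complete("design_schema", "agent_a")
ready_after_design = graph.ready()
print(ready_after_design)
```

Because a sub-task can only be claimed while it is unassigned, two agents cannot both hold it. That is one simple way such a graph could help avoid the redundant outputs mentioned in the neighbouring claims, though the paper's actual mechanism may differ.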

We introduce Language Agent Teams for Task Evolution (LATTE), a framework for coordinating LLM teams inspired by distributed systems.

Paper describes the LATTE framework as a proposed coordination protocol (design/conceptual contribution).

high positive Improving the Efficiency of Language Agent Teams with Adapti... framework introduction / coordination protocol

We release a reproducible simulator with a small, extensible Python interface to support empirical study.

Software artifact claim in the paper: reproducible simulator described and (implicitly) provided with a minimal Python API for extensibility and reproducibility.

high positive A Benchmark for Strategic Auditee Gaming Under Continuous Co... availability of a reproducible simulation tool and Python interface

We provide an initial library of five auditee strategies (Delay, Drift, Cherry-pick, Attrition, OffAuditDrift) and five auditor policies, calibrated to summary statistics from published audits of the DSA Transparency Database.

Empirical calibration and simulation: paper reports calibration of strategy/policy parameters to summary statistics from published DSA Transparency Database audits and includes a library of five auditee strategies and five auditor policies.

high positive A Benchmark for Strategic Auditee Gaming Under Continuous Co... availability of calibrated strategy/policy library and calibration to DSA summar...

We formalize continuous auditing as a T-round Stackelberg game between an auditor that commits to a temporal policy and an adaptive auditee.

Theoretical/modeling contribution in the paper: formal game-theoretic model (T-round Stackelberg game) described and used as analytic framework.

high positive A Benchmark for Strategic Auditee Gaming Under Continuous Co... game-theoretic representation of auditor-auditee interaction (model formalizatio...
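
The leader-follower structure named in this claim can be illustrated with a toy example. This is not the paper's model or its simulator API. It assumes that the auditor commits to a per-round audit probability, that the auditee sees the whole schedule, and that the auditee deviates in a round only when its expected payoff from deviating is positive. All parameter values and function names are hypothetical.

```python
from typing import List


def best_response(schedule: List[float], gain: float, penalty: float) -> List[bool]:
    """Auditee deviates in round t iff gain - p_t * penalty > 0."""
    return [gain - p * penalty > 0 for p in schedule]


def auditor_loss(schedule: List[float], gain: float, penalty: float, harm: float) -> float:
    """Total harm from the rounds in which the auditee chooses to deviate."""
    return harm * sum(best_response(schedule, gain, penalty))


T = 4                                  # number of rounds
GAIN, PENALTY, HARM = 1.0, 5.0, 1.0    # illustrative values, not calibrated
BUDGET = 1.0                           # total audit probability to spread over T rounds

uniform = [BUDGET / T] * T             # same audit probability every round
front_loaded = [0.5, 0.5, 0.0, 0.0]    # all auditing effort spent early

uniform_loss = auditor_loss(uniform, GAIN, PENALTY, HARM)
front_loaded_loss = auditor_loss(front_loaded, GAIN, PENALTY, HARM)
print("uniform:", uniform_loss, "front-loaded:", front_loaded_loss)
```

Under these assumed numbers the front-loaded schedule leaves the last two rounds unaudited and the auditee deviates there, while the uniform schedule deters deviation in every round. This is the kind of temporal gaming a committed policy has to anticipate.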

DePAI offers a path to scalable, resilient self-organization that integrates physical infrastructure, AI, and community ownership under transparent rules, on-chain incentives, and permissionless participation, aiming to preserve human autonomy.

Normative/conceptual claim and argument based on the proposed architecture and incentive design; presented without empirical evaluation.

high positive DAO-enabled decentralized physical AI: A new paradigm for hu... scalability and resilience of self-organization, integration of infrastructure/A...
