Evidence (4189 claims)
Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
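As a reading aid, the direction shares for any row can be computed directly from its counts. The sketch below uses three rows copied from the table; the function name `direction_shares` and the `unlisted` field are illustrative assumptions, not part of the source. In some rows the four direction counts sum to less than the Total column (Firm Productivity lists 604 of 610), so shares are taken over the listed counts and the remainder is reported separately.

```python
from typing import Dict, Tuple

# Rows copied from the table above: (positive, negative, mixed, null, total).
ROWS: Dict[str, Tuple[int, int, int, int, int]] = {
    "Firm Productivity": (439, 57, 88, 20, 610),
    "Job Displacement": (12, 81, 21, 1, 115),
    "Inequality Measures": (44, 123, 50, 6, 223),
}


def direction_shares(row: Tuple[int, int, int, int, int]) -> Dict[str, float]:
    """Return each direction's share of the listed claims, plus the unlisted remainder."""
    positive, negative, mixed, null, total = row
    listed = positive + negative + mixed + null
    return {
        "positive": positive / listed,
        "negative": negative / listed,
        "mixed": mixed / listed,
        "null": null / listed,
        # Claims counted in Total but not in any of the four direction columns.
        "unlisted": float(total - listed),
    }


for outcome, row in ROWS.items():
    shares = direction_shares(row)
    print(f"{outcome}: {shares['positive']:.1%} positive, {shares['negative']:.1%} negative")
```

Computing shares over the listed counts keeps the four percentages summing to one even when the Total column includes claims with no listed direction.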
Org Design
Frontier directions include differentiable token budgets and dynamic markets to lay the theoretical foundation for scalable next-generation agent systems.
Paper's conclusion/recommendations based on surveyed literature and identified gaps; presented as proposed future research directions rather than empirically validated findings.
Security: Internalizing adversarial threats as endogenous economic constraints.
Authors argue for modeling adversarial threats as endogenous constraints within the token-economics framework; conceptual/theoretical claim from the survey.
Macro-level (Agent Ecosystems): Addressing congestion externalities and pricing via mechanism design.
Paper posits mechanism-design approaches to tackle congestion externalities and pricing in agent ecosystems; conceptual proposal based on economic theory and literature synthesis.
Meso-level (Multi-Agent Systems): Minimizing collaboration friction using transaction cost and principal-agent theories.
Authors propose applying transaction-cost and principal-agent frameworks to multi-agent token interactions; presented as a theoretical taxonomy/synthesis without reported empirical sample.
Micro-level (Single Agent): Optimizing budget-constrained factor substitution via neoclassical firm theory.
The paper asserts a micro-level taxonomy using neoclassical firm theory to model single-agent token-budget optimization; presented as conceptual/theoretical mapping rather than empirical test.
We conceptualize tokens as production factors, exchange mediums, and units of account.
Paper provides a conceptual taxonomy framing tokens in three economic roles; based on theoretical argumentation and literature synthesis.
This paper presents the first comprehensive survey of Token Economics.
Author claim of novelty in the paper (self-declared 'first comprehensive survey'); based on the authors' scope and coverage comparison to prior literature as described in the manuscript.
Tokens have emerged as the core economic primitives of Agentic AI.
Author assertion in the paper's introduction/abstract; supported by conceptual synthesis of agentic AI literature (survey/mapping rather than original empirical data).
An AI Workflow Store of hardened and reusable workflows would allow agents to invoke workflows with far greater reliability and security than improvised tool chains.
Vision/proposal in the paper advocating an AI Workflow Store as a solution; presented conceptually without experimental or deployment evidence.
Integrating rigorous software engineering processes into the agentic loop will produce production-grade, hardened, and deterministically-constrained agent workflows that substantially outperform brittle on-the-fly synthesis.
Prescriptive claim / proposed hypothesis in the paper advocating integration of SE practices into agent workflows; offered as a reasoned proposal without empirical results.
The study draws policy implications for EU Cohesion programming and Sustainable Development Goals 4, 8, 9, 10, and 17.
Paper explicitly states policy implications and links to specific SDGs in its conclusions.
External technology partnerships, targeted education, and economic incentives operate as enablers [of AI adoption], all mediated by social and human capital availability.
Thematic analysis of interview data identifying these factors as enabling AI adoption, with mediation by social/human capital.
Team-based ventures are increasingly dominant in the top tiers of platform rankings.
Ranking-tier analysis in the Product Hunt dataset showing an increasing share of team-founded launches among top-tier (highest-ranked) products over the study period.
The increase in entrepreneurial entry was driven disproportionately by solo entrepreneurs.
Same Product Hunt dataset (>160,000 launches) with analysis of launch ownership structure showing a larger post-release increase in launches by solo founders relative to teams.
Entrepreneurial entry increased sharply following the public release of ChatGPT-3.5.
Analysis of over 160,000 product launches on Product Hunt comparing entry rates before and after the public release of ChatGPT-3.5 (event-study / pre-post comparison across the platform).
The framework and results are developed/applied to two instances: AI agent oversight (motivating setting) and marketplace operation (a parallel mechanism-design domain).
Paper includes two instantiated examples/applications illustrating the formal framework: one in AI agent oversight and one in marketplace operation (illustrative case studies within the theoretical paper).
A constructive escape exists: a step-function approval threshold achieves first-best screening for every strictly proper scoring rule, because the agent's binary inflate-or-not choice creates a type-space threshold regardless of the generator's curvature.
Constructive existence proof in the paper showing a step-function approval rule that attains first-best screening; analytical argument based on agent's binary inflate/not strategy.
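The two ingredients named in this claim, a strictly proper scoring rule and a step-function (hence non-affine) approval rule, can be illustrated in a few lines. This is a minimal sketch and does not reproduce the paper's model: the quadratic score is one standard strictly proper rule chosen for illustration, and the threshold value `0.6`, the grid search, and all function names are assumptions.

```python
def quadratic_score(report: float, outcome: int) -> float:
    """Quadratic (Brier-style) score; higher is better. Strictly proper."""
    return 1.0 - (outcome - report) ** 2


def expected_score(report: float, belief: float) -> float:
    """Expected score of a report when the true success probability is `belief`."""
    return belief * quadratic_score(report, 1) + (1.0 - belief) * quadratic_score(report, 0)


def best_report(belief: float, grid_size: int = 101) -> float:
    """Grid search for the report that maximizes the expected score."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda r: expected_score(r, belief))


def step_approval(report: float, threshold: float = 0.6) -> bool:
    """Step-function approval: approve iff the report clears the threshold (assumed value)."""
    return report >= threshold


# Under a strictly proper rule, the truthful report is the unique maximizer.
print(best_report(0.3))
# The approval rule is a step, not an affine function of the report.
print(step_approval(0.7), step_approval(0.5))
```

The grid search only checks properness numerically for one belief; the paper's result concerns the interaction between such a score and the agent's inflate-or-not choice, which this sketch does not model.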
The principal's optimal oversight necessarily uses a non-affine approval function to screen types.
Analytical result derived from the paper's formal principal-agent model and optimization of the principal's objective (theoretical proof).
The paper articulates a research agenda for how MASS should be modeled, evaluated and governed.
Stated in the abstract (position paper concludes with an articulated research agenda); evidence is the discussion and proposed agenda sections in the paper.
The importance of each structural prior is demonstrated through formal propositions.
Methodological claim in the abstract that the paper provides formal propositions demonstrating the role/importance of the four priors; evidence contained in proofs/propositions within the paper.
MASS is represented as a class of dynamical systems of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability.
Descriptive claim from the abstract about the formal structure of MASS; supported by the framework and definitions presented in the paper (formal/modeling content).
The paper formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact and influence to generate system-level outcomes.
Direct methodological claim in abstract indicating the authors present a formal framework (MASS) in the paper; evidence consists of the formalization provided in the paper (propositions, definitions).
Agentic AI systems must be modeled with social theory as a structural prior.
Normative / prescriptive claim from the paper's abstract (position paper arguing for this modeling choice; supported by the authors' theoretical arguments and formal framework in the paper).
Emergent dynamics of individuals in a social group have been long studied by social scientists in human contexts.
Historical/contextual claim in the abstract; supported by reference to social-science literature (no sample size; general scholarly consensus).
In multi-agent social settings, system behavior emerges not from individual agents alone, but from the multi-agent interactions over time.
Conceptual claim in the paper's abstract, supported by the paper's argumentation and references to social-science literature on emergent dynamics (formal development likely in main text).
Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans (e.g., social media platforms, multi-agent LLM pipelines, autonomous robotics fleets).
Statement from the paper's abstract and motivating examples; implied supporting citation/literature review in the paper (no empirical sample size reported in abstract).
The C³ Framework provides implementable design patterns and testable propositions intended to help accounting leaders capture productivity gains from human + AI work while preserving accountability, consistency, and alignment with governance expectations in high-stakes reporting contexts.
Conclusions section stating intended practical utility; presented as intended outcomes of applying the proposed framework, not as empirically demonstrated results in this paper.
The paper proposes a role taxonomy that clarifies review responsibility, escalation thresholds, and evidence retention for human–AI collaboration in accounting.
Results section proposing a role taxonomy as part of the C³ Framework; presented as a design artifact derived from synthesis of research and guidance.
The framework specifies five mandatory control points for high-judgment use cases: source grounding and traceability, independent verification and tie-out, contradiction testing, escalation and approval, and audit-trail logging.
Results section listing five control points as mandatory design elements for high-judgment accounting use cases; conceptual recommendation from synthesis.
The paper develops the C³ Framework—Complementarity, Controls, and Competencies—which maps accounting tasks by task structure and judgment/materiality to recommend collaboration modes.
Results section: conceptual framework developed by the authors based on synthesized literature and guidance; no reported empirical validation in the abstract.
AI accelerates drafting, summarization, and pattern detection in accounting while professionals remain accountable for judgment, materiality, and defensibility in financial reporting and analysis.
Statement in paper summarizing literature and practitioner guidance (2023–2025); conceptual synthesis rather than new empirical data.
AI tools can serve as valuable aids in task splitting, provided there is human oversight to filter out irrelevant tasks.
Paper's conclusion synthesizing experimental results and participant feedback, recommending human-in-the-loop oversight when using AI for task-splitting.
Participants favored a hybrid approach, combining AI tools with conventional methods to maintain high accuracy in planning.
Participant preferences and qualitative feedback reported from the controlled experiment indicating preference for combining AI assistance with human methods; sample size not provided.
AI-assisted approaches can help ensure no important tasks are overlooked during task-splitting.
Reported finding from the experiment indicating AI assistance reduced omissions in task lists (paper statement based on experiment and participant observations); sample size not stated.
AI-assisted approaches can generate more granular task lists than traditional methods.
Experimental comparison reported in the paper showing AI-generated task lists were more granular (based on task lists produced during the controlled experiment); sample size not provided in summary.
Adopting a critical software studies perspective enables the authors to offer final recommendations for socio-technical development programmes that could plausibly move toward AGI-adjacent capability while meeting requirements for transparency, moderation, wellbeing and sustainable business models.
Stated conclusion/intent in the paper's introduction that the chosen perspective allows the production of concrete recommendations; presented as a programmatic claim rather than empirically demonstrated in the excerpt.
The evaluation covers multiple collaborative tasks and a variety of base LLM models.
Paper states experiments were run across multiple collaborative tasks and a variety of base models (breadth of evaluation).
The LATTE protocol maintains consistency under partial observability and communication constraints while enabling dynamic allocation and adaptation.
Design claim supported by the protocol description and reported empirical results demonstrating consistent coordination under constrained conditions.
LATTE empowers agents to dynamically allocate work, adapt coordination, and discover new tasks.
Claim supported by the framework design and demonstrations in the paper (agents use the coordination graph to reassign and discover tasks during execution).
LATTE matches or exceeds the accuracy of standard designs including MetaGPT, decentralized teams, top-down Leader-Worker hierarchies, and static decompositions.
Reported accuracy comparisons from empirical experiments across several collaborative tasks and base models.
LATTE reduces coordination failures such as file conflicts and redundant outputs.
Empirical evaluation comparing incidence of coordination failures between LATTE and baseline team coordination approaches.
LATTE reduces communication (and communication overhead) compared to standard designs.
Empirical comparisons reported across multiple collaborative tasks and base models, measuring communication and coordination metrics.
LATTE reduces wall-clock time compared to standard designs.
Empirical evaluation across multiple collaborative tasks and various base models with time measurements reported in comparisons to baselines.
LATTE reduces token usage compared to standard designs (including MetaGPT, decentralized teams, top-down Leader-Worker hierarchies, and static decompositions).
Empirical evaluation across multiple collaborative tasks and a variety of base models, comparing LATTE to listed baseline designs.
In LATTE, a team of agents collaboratively constructs and maintains a shared, evolving coordination graph that encodes sub-task dependencies, individual agent assignments, and the current state of sub-task progress.
Paper describes the protocol and its components (design/specification); supported by implementation details in the paper.
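A coordination graph of the kind described here (dependencies, assignments, progress state) might be represented as follows. This is a hedged sketch, not LATTE's implementation: the class names `SubTask` and `CoordinationGraph`, the three status values, and the `claim`/`complete` methods are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List, Optional, Set


class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"


@dataclass
class SubTask:
    name: str
    depends_on: Set[str] = field(default_factory=set)
    assignee: Optional[str] = None
    status: Status = Status.PENDING


class CoordinationGraph:
    """Shared graph of sub-tasks, their dependencies, owners, and progress."""

    def __init__(self) -> None:
        self.tasks: Dict[str, SubTask] = {}

    def add_task(self, name: str, depends_on: Iterable[str] = ()) -> None:
        deps = set(depends_on)
        missing = [d for d in deps if d not in self.tasks]
        if missing:
            raise KeyError(f"unknown dependencies: {missing}")
        self.tasks[name] = SubTask(name, deps)

    def ready_tasks(self) -> List[str]:
        """Unassigned pending tasks whose dependencies are all done."""
        return sorted(
            name
            for name, task in self.tasks.items()
            if task.status is Status.PENDING
            and task.assignee is None
            and all(self.tasks[d].status is Status.DONE for d in task.depends_on)
        )

    def claim(self, name: str, agent: str) -> bool:
        """Assign a ready task to an agent; refuse if it is not ready or already taken."""
        if name not in self.ready_tasks():
            return False
        self.tasks[name].assignee = agent
        self.tasks[name].status = Status.IN_PROGRESS
        return True

    def complete(self, name: str) -> None:
        self.tasks[name].status = Status.DONE


graph = CoordinationGraph()
graph.add_task("design")
graph.add_task("implement", depends_on=["design"])
graph.claim("design", "agent_a")
graph.complete("design")
print(graph.ready_tasks())
```

Refusing a second `claim` on the same sub-task is one simple way a shared graph could avoid redundant outputs; new sub-tasks discovered during execution would be added with `add_task`.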
We introduce Language Agent Teams for Task Evolution (LATTE), a framework for coordinating LLM teams inspired by distributed systems.
Paper describes the LATTE framework as a proposed coordination protocol (design/conceptual contribution).
We release a reproducible simulator with a small, extensible Python interface to support empirical study.
Software artifact claim in the paper: reproducible simulator described and (implicitly) provided with a minimal Python API for extensibility and reproducibility.
We provide an initial library of five auditee strategies (Delay, Drift, Cherry-pick, Attrition, OffAuditDrift) and five auditor policies, calibrated to summary statistics from published audits of the DSA Transparency Database.
Empirical calibration and simulation: paper reports calibration of strategy/policy parameters to summary statistics from published DSA Transparency Database audits and includes a library of five auditee strategies and five auditor policies.
We formalize continuous auditing as a T-round Stackelberg game between an auditor that commits to a temporal policy and an adaptive auditee.
Theoretical/modeling contribution in the paper: formal game-theoretic model (T-round Stackelberg game) described and used as analytic framework.
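The leader-follower structure of such a game can be sketched in a heavily simplified form. In this illustration the auditor (leader) commits to a per-round audit probability, and the auditee (follower) observes that policy and deviates in a round only when the gain exceeds the expected penalty. The payoff values, the per-round best-response rule, and all function names are assumptions for illustration; the paper's auditee strategies and auditor policies are not reproduced here.

```python
from typing import List, Sequence


def auditee_best_response(
    policy: Sequence[float], gain: float = 1.0, penalty: float = 4.0
) -> List[bool]:
    """Follower: deviate in round t iff the gain exceeds the expected penalty q_t * penalty."""
    return [gain > q * penalty for q in policy]


def violations(policy: Sequence[float], gain: float = 1.0, penalty: float = 4.0) -> int:
    """Number of rounds in which the best-responding auditee deviates."""
    return sum(auditee_best_response(policy, gain, penalty))


def best_committed_policy(
    candidates: Sequence[Sequence[float]], gain: float = 1.0, penalty: float = 4.0
) -> Sequence[float]:
    """Leader: commit to the candidate policy that minimizes anticipated violations."""
    return min(candidates, key=lambda p: violations(p, gain, penalty))


# Two T = 4 policies with the same total audit budget (probabilities sum to 1).
uniform = [0.25, 0.25, 0.25, 0.25]
front_loaded = [1.0, 0.0, 0.0, 0.0]

print(violations(uniform), violations(front_loaded))
print(best_committed_policy([front_loaded, uniform]))
```

With these assumed payoffs, spreading the audit budget evenly deters deviation in every round, while concentrating it in the first round leaves the later rounds unaudited. Ties are broken in favor of compliance.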
DePAI offers a path to scalable, resilient self-organization that integrates physical infrastructure, AI, and community ownership under transparent rules, on-chain incentives, and permissionless participation, aiming to preserve human autonomy.
Normative/conceptual claim and argument based on the proposed architecture and incentive design; presented without empirical evaluation.