Evidence (7198 claims)
Search and filter individual claims extracted from the papers. If you are looking for a specific finding (for example, "what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
Adoption: 8921 claims
Productivity: 8002 claims
Governance: 7198 claims (filter active)
Human-AI Collaboration: 6864 claims
Org Design: 4398 claims
Innovation: 4286 claims
Labor Markets: 3629 claims
Skills & Training: 3001 claims
Inequality: 2141 claims
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 790 | 208 | 103 | 950 | 2117 |
| Governance & Regulation | 869 | 411 | 195 | 126 | 1630 |
| Organizational Efficiency | 817 | 202 | 126 | 87 | 1243 |
| Technology Adoption Rate | 675 | 258 | 128 | 106 | 1178 |
| Research Productivity | 462 | 138 | 64 | 347 | 1023 |
| Output Quality | 501 | 193 | 61 | 52 | 807 |
| Decision Quality | 346 | 180 | 84 | 51 | 668 |
| AI Safety & Ethics | 235 | 285 | 70 | 34 | 630 |
| Firm Productivity | 452 | 58 | 91 | 20 | 627 |
| Market Structure | 184 | 171 | 123 | 24 | 507 |
| Task Allocation | 221 | 65 | 76 | 34 | 401 |
| Skill Acquisition | 176 | 62 | 62 | 17 | 317 |
| Innovation Output | 207 | 28 | 48 | 18 | 303 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Employment Level | 105 | 56 | 108 | 13 | 284 |
| Consumer Welfare | 121 | 67 | 45 | 11 | 244 |
| Firm Revenue | 160 | 50 | 28 | 4 | 242 |
| Task Completion Time | 182 | 33 | 10 | 13 | 239 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 94 | 73 | 23 | 12 | 202 |
| Error Rate | 76 | 98 | 11 | 4 | 189 |
| Regulatory Compliance | 81 | 73 | 17 | 7 | 178 |
| Automation Exposure | 61 | 59 | 26 | 14 | 163 |
| Training Effectiveness | 97 | 21 | 14 | 19 | 153 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 21 | 1 | 117 |
| Hiring & Recruitment | 52 | 8 | 8 | 3 | 71 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 49 | 6 | 1 | 61 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 15 | 14 | — | 3 | 32 |
| Industry | — | — | — | 1 | 1 |
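To compare directions across outcomes of very different sizes, it can help to turn the counts into shares. The sketch below is a minimal illustration using three rows copied from the table; the function names `direction_shares` and `uncounted` are hypothetical and not part of the site. Note that for some rows the listed Total is larger than the sum of the four direction columns (for example, Firm Productivity lists 627 but the columns sum to 621); the page does not say why, so the sketch computes shares over the four listed counts and reports the gap separately.

```python
# Direction counts copied from three rows of the table above:
# (positive, negative, mixed, null, listed_total)
ROWS = {
    "Wages & Compensation": (78, 37, 25, 6, 146),
    "Job Displacement": (12, 83, 21, 1, 117),
    "Firm Productivity": (452, 58, 91, 20, 627),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(name):
    """Share of each direction among the four listed direction counts."""
    *counts, _listed_total = ROWS[name]
    counted = sum(counts)
    return {label: count / counted for label, count in zip(DIRECTIONS, counts)}


def uncounted(name):
    """Gap between the listed Total and the sum of the four direction columns."""
    *counts, listed_total = ROWS[name]
    return listed_total - sum(counts)


if __name__ == "__main__":
    for outcome in ROWS:
        shares = direction_shares(outcome)
        summary = ", ".join(f"{k} {v:.0%}" for k, v in shares.items())
        print(f"{outcome}: {summary} (uncounted: {uncounted(outcome)})")
```

Read this way, Job Displacement is dominated by negative findings (83 of 117), while Wages & Compensation leans positive (78 of 146).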
Governance
In multi-agent social settings, system behavior emerges not from individual agents alone, but from multi-agent interactions over time.
Conceptual claim in the paper's abstract, supported by the paper's argumentation and references to social-science literature on emergent dynamics (formal development likely in main text).
Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans (e.g., social media platforms, multi-agent LLM pipelines, autonomous robotics fleets).
Statement from the paper's abstract and motivating examples; implied supporting citation/literature review in the paper (no empirical sample size reported in abstract).
The C³ Framework provides implementable design patterns and testable propositions intended to help accounting leaders capture productivity gains from human + AI work while preserving accountability, consistency, and alignment with governance expectations in high-stakes reporting contexts.
Conclusions section stating intended practical utility; presented as intended outcomes of applying the proposed framework, not as empirically demonstrated results in this paper.
The paper proposes a role taxonomy that clarifies review responsibility, escalation thresholds, and evidence retention for human–AI collaboration in accounting.
Results section proposing a role taxonomy as part of the C³ Framework; presented as a design artifact derived from synthesis of research and guidance.
The framework specifies five mandatory control points for high-judgment use cases: source grounding and traceability, independent verification and tie-out, contradiction testing, escalation and approval, and audit-trail logging.
Results section listing five control points as mandatory design elements for high-judgment accounting use cases; conceptual recommendation from synthesis.
The paper develops the C³ Framework (Complementarity, Controls, and Competencies), which maps accounting tasks by task structure and judgment/materiality to recommend collaboration modes.
Results section: conceptual framework developed by the authors based on synthesized literature and guidance; no reported empirical validation in the abstract.
AI accelerates drafting, summarization, and pattern detection in accounting while professionals remain accountable for judgment, materiality, and defensibility in financial reporting and analysis.
Statement in paper summarizing literature and practitioner guidance (2023–2025); conceptual synthesis rather than new empirical data.
The Analysis Contract framework generalizes across domains of vibe inference through domain-specific instantiation.
Theoretical claim and conceptual generalization proposed in the paper; no cross-domain empirical tests or case studies reported.
The Analysis Contract, a proposed pre-commitment framework, can adapt the logic of pre-analysis plans and the Causal Roadmap to the AI-assisted setting by imposing three conditions before a causal claim is made: a method-data contract, a data audit, and a pre-commitment statement defining what would count as a disconfirming result.
Proposed methodological/framework contribution in the paper; described and motivated conceptually, without empirical validation or implementation evidence.
The positive effect of government-guided funds (GGFs) on digital–intelligent transformation is particularly strong for firms with robust dynamic capabilities.
Heterogeneity analysis reported in the paper comparing effects across firms with differing levels of dynamic capabilities using the DID sample of Chinese A-share listed firms (2012–2024).
The positive effect of GGFs on digital–intelligent transformation is particularly strong for firms operating in high-tech industries.
Heterogeneity analysis reported in the paper comparing effects across industries (high-tech vs. others) using the DID sample of Chinese A-share listed firms (2012–2024).
The positive effect of GGFs on digital–intelligent transformation is particularly strong in firms with high-quality internal controls.
Heterogeneity analysis reported in the paper comparing effects across firms with different internal control quality using the DID sample of Chinese A-share listed firms (2012–2024).
GGFs promote firms’ digital–intelligent transformation by encouraging knowledge spillovers.
Mechanism analysis reported in the paper that identifies knowledge spillovers as a channel from GGFs to firm-level digital–intelligent transformation, using the DID framework on Chinese A-share listed firms (2012–2024).
GGFs promote firms’ digital–intelligent transformation by transmitting policy guidance.
Mechanism analysis reported in the paper indicating a pathway from GGFs to firm transformation via policy guidance channels, based on the DID sample of Chinese A-share listed firms (2012–2024).
GGFs promote firms’ digital–intelligent transformation by easing firms’ financing constraints.
Mechanism analysis reported in the paper (mediation / pathway analysis tied to the DID framework) using the same sample of Chinese A-share listed firms (2012–2024).
Government-guided funds (GGFs) significantly promote firms’ digital–intelligent transformation.
Difference-in-differences (DID) analysis applied to Chinese A-share listed firms over 2012–2024, as reported in the paper's main empirical results.
Broader equity markets, proxied by the S&P 500, remain the dominant source of spillovers throughout the sample period.
Directional spillover results from the TVP-VAR indicating the S&P 500 has the largest and persistent net outward spillover contributions over the full sample.
AI-related equities initially act as net transmitters of shocks.
Directional spillover measures from the TVP-VAR showing AI equity group had positive net directional connectedness early in the sample.
The results imply an urgency of early intervention in AI-driven economies to avoid extreme inequality and loss of redistribution options.
Synthesis and policy discussion in the paper based on the finite-time singularity, super-exponential divergence of wealth ratios, and the policy-irreversibility result.
Under mild conditions, the system exhibits a finite-time singularity where AI capability, AI capital, and financial capital diverge.
Analytical dynamical-systems analysis and proofs in the paper demonstrating finite-time blow-up (singularity) of A (AI capability), K_a (AI capital), and K_f (financial capital) for parameter ranges satisfying the stated mild conditions.
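For readers unfamiliar with the term, the following is a hedged illustration of a finite-time singularity using the standard single-variable textbook case of superlinear growth. It is not the paper's three-variable system; the symbols $x$, $c$, $p$, $x_0$, and $t^{*}$ are chosen here purely for illustration.

```latex
\[
\frac{dx}{dt} = c\,x^{p}, \qquad c > 0,\quad p > 1,\quad x(0) = x_0 > 0
\]
\[
\frac{d}{dt}\,x^{1-p} = (1-p)\,c
\;\Longrightarrow\;
x(t) = x_0\left[\,1 - (p-1)\,c\,x_0^{\,p-1}\,t\,\right]^{-1/(p-1)}
\]
\[
x(t) \to \infty \quad\text{as}\quad t \to t^{*} = \frac{1}{(p-1)\,c\,x_0^{\,p-1}}
\]
```

With $p = 2$, $c = 1$, and $x_0 = 1$, the solution is $x(t) = 1/(1-t)$, which diverges at $t^{*} = 1$. Exponential growth ($p = 1$), by contrast, stays finite at every finite time.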
U.S. lawmakers and agencies have advanced standards, testing, and procurement oversight related to AI as the AGI race tightens.
Reported in the paper as a synthesis of recent policy and agency activity (standards, testing programs, procurement oversight); descriptive summary rather than a quantified empirical analysis (no sample size reported).
So far in 2026, agentic coding automation has advanced, with tools that enable end-to-end planning, coding, and debugging.
Asserted in the paper as an observed trend through 2026, based on examples of tooling and product announcements; presented descriptively without a stated empirical sample or controlled evaluation.
Milestones in 2025 also include early regulatory actions.
Reported in the paper's synthesis of 2025 events; based on review of policy developments and announcements rather than a quantitative evaluation (no sample size reported).
Milestones in 2025 highlight the broad adoption of multimodal and agentic AI.
Stated in the paper as part of a narrative synthesis of 2025 milestones; presented as an observational summary drawing on literature, industry reports and documented deployments rather than a systematic empirical study (no sample size or statistical analysis reported).
Adopting a critical software studies perspective enables the authors to offer final recommendations for socio-technical development programmes that could plausibly move toward AGI-adjacent capability while meeting requirements for transparency, moderation, wellbeing and sustainable business models.
Stated conclusion/intent in the paper's introduction that the chosen perspective allows the production of concrete recommendations; presented as a programmatic claim rather than empirically demonstrated in the excerpt.
Long-term disparities can vanish under simple investment policies that achieve a low Price of Fairness.
Theoretical analysis of a sequential selection model showing dynamics under 'investment' policies lead to convergence of group distributions and low PoF.
The semantic-envelope metric exhibits no such violation in the tested instances.
Empirical/experimental results reported in the paper showing that the semantic-envelope metric did not produce the violations observed for the fragile metric in the tested instances (using the three verification approaches).
The paper checks the claims at three levels: exhaustive enumeration on a finite-state grid of mixed strategies; an SMT encoding in Z3 cross-replayed in cvc5; and a bounded single-player MDP encoded in PRISM-games.
Experimental/verification methodology reported in the paper describing three complementary computational checks (finite-state exhaustive enumeration, SMT encodings in Z3 and cvc5, and PRISM-games MDP encoding).
A class-stratified certificate $H^{*}(x) \le \frac{1}{\hat{\alpha}}\, M_{\mathrm{Env}(m)}(x) + \bar{\eta}$ holds for every platform strategy, with $\bar{\eta}$ absorbing annotation and protocol error.
Formal theorem in the paper deriving a class-stratified certificate (inequality) based on the semantic-envelope metric and accounting terms for annotation and protocol error.
The semantic-envelope lift, which assigns each variant the maximum score in its class, is the unique pointwise minimum among conservative classwise-constant repairs.
Formal theoretical characterization/proof in the paper showing uniqueness and pointwise minimality of the envelope repair within the classwise-constant conservative repairs.
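The lift described in this claim can be sketched in a few lines. This is a minimal illustration rather than the paper's code: the function name `envelope_lift`, the dictionary inputs, and the example variants are hypothetical, and "conservative" is read here as "never lowers any variant's score", which is an assumption about the paper's definition.

```python
from collections import defaultdict


def envelope_lift(scores, class_of):
    """Assign each variant the maximum score found in its class.

    scores:   dict mapping variant -> original metric score
    class_of: dict mapping variant -> class label
    """
    class_max = defaultdict(lambda: float("-inf"))
    for variant, score in scores.items():
        label = class_of[variant]
        class_max[label] = max(class_max[label], score)
    # Every variant in a class receives that class's maximum score.
    return {variant: class_max[class_of[variant]] for variant in scores}


# Hypothetical example: two variants in class "A", one in class "B".
scores = {"a1": 0.2, "a2": 0.9, "b1": 0.4}
class_of = {"a1": "A", "a2": "A", "b1": "B"}
lifted = envelope_lift(scores, class_of)
```

In this example both class "A" variants receive 0.9 and the single class "B" variant keeps 0.4. The result is constant within each class and never below the original score, and any other classwise-constant assignment with that property would have to be at least as large on every variant.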
We release a reproducible simulator with a small, extensible Python interface to support empirical study.
Software artifact claim in the paper: reproducible simulator described and (implicitly) provided with a minimal Python API for extensibility and reproducibility.
We provide an initial library of five auditee strategies (Delay, Drift, Cherry-pick, Attrition, OffAuditDrift) and five auditor policies, calibrated to summary statistics from published audits of the DSA Transparency Database.
Empirical calibration and simulation: paper reports calibration of strategy/policy parameters to summary statistics from published DSA Transparency Database audits and includes a library of five auditee strategies and five auditor policies.
We formalize continuous auditing as a T-round Stackelberg game between an auditor that commits to a temporal policy and an adaptive auditee.
Theoretical/modeling contribution in the paper: formal game-theoretic model (T-round Stackelberg game) described and used as analytic framework.
The main finding (that the reform increases grain yield) is robust to multiple checks, including parallel trend tests, placebo tests, propensity score matching DID (PSM-DID), and exclusion of special samples.
Battery of robustness tests reported in the paper: parallel trend tests, placebo tests, PSM-DID estimation, and analyses excluding special samples.
The grain-yield-enhancing effect is stronger in areas with stronger environmental regulation intensity.
Heterogeneity analysis in the paper comparing regions by environmental regulation intensity.
The grain-yield-enhancing effect is stronger in regions with higher levels of digital economy development.
Heterogeneity analysis dividing sample by regional digital economy development level.
The grain-yield-enhancing effect of the water resource tax reform is more pronounced in non-major grain-producing areas.
Heterogeneity analysis reported in the study comparing effects across major vs. non-major grain-producing regions.
The reform enhances regional green innovation, which contributes to higher grain yield by strengthening water-use efficiency and agricultural productivity.
Mechanism analysis presented in the study showing increases in measures of regional green innovation after the tax reform.
The water resource tax reform significantly increases grain yield.
Quasi-natural experiment using the pilot 'fee-to-tax' reform; panel dataset of Chinese prefecture-level cities, 2013–2019; multi-period difference-in-differences (DID) estimation supplemented by double machine learning and multiple robustness tests.
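As a rough guide to the logic behind a DID estimate, here is a minimal sketch of the simplest two-group, two-period case. The study itself reports a multi-period DID with further checks, so this is not its estimator; the function name `did_estimate` and all yield figures are hypothetical and invented for illustration only.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period DID: change in treated minus change in control."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical grain yields (tonnes per hectare), NOT data from the study.
treated_pre = [5.0, 5.2, 4.8]    # pilot cities before the reform
treated_post = [5.6, 5.8, 5.4]   # pilot cities after the reform
control_pre = [5.1, 5.3, 4.9]    # non-pilot cities, same early period
control_post = [5.4, 5.6, 5.2]   # non-pilot cities, same later period

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(round(effect, 3))
```

With these made-up numbers the treated group gains about 0.6 and the control group about 0.3, so the estimate is about 0.3. The estimate only has a causal reading if the two groups would have moved in parallel without the reform, which is what the parallel trend tests listed among the robustness checks are meant to probe.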
The final (Trace-Prior RL) policy matches Hotel B's RevPAR, occupancy, ADR, and price distribution within seed-level uncertainty, while still optimizing Hotel A's own reward.
Experimental results in the two-hotel simulator comparing Trace-Prior RL policy metrics to Hotel B benchmarks; reported alignment within seed-level uncertainty across multiple trace metrics while maintaining Hotel A reward optimization.
Trace-Prior RL: learn a distributional market prior from lagged market traces, then train a stochastic pricing policy with a RevPAR reward and a KL penalty to the learned prior.
Algorithmic method introduced and implemented in the simulator experiments; description of learning a distributional prior from lagged traces and training with KL penalty and RevPAR reward.
We introduce a trace-level diagnostic protocol using RevPAR, occupancy, ADR, full price-bucket distributions, L1/JS distances, and seed-level confidence intervals.
Methodological contribution proposed in the paper: a diagnostic protocol composed of listed trace-level metrics applied to simulator experiments.
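The L1 and JS distances named in this protocol compare two price-bucket distributions. The sketch below shows one common way to compute them for discrete distributions, using only the standard library. It assumes the Jensen-Shannon divergence with base-2 logarithms; whether the paper uses the divergence or its square root, and which log base, is not stated here. The function names and the example distributions are hypothetical.

```python
import math


def l1_distance(p, q):
    """Sum of absolute differences between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of buckets")
    return sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute zero."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of buckets")
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical shares of bookings in three price buckets for two hotels.
hotel_a = [0.5, 0.5, 0.0]
hotel_b = [0.0, 0.5, 0.5]
```

For these two example distributions the L1 distance is 1.0 and the JS divergence is 0.5 bits. L1 ranges from 0 to 2, while base-2 JS divergence ranges from 0 to 1, so the two are not directly comparable in scale.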
Prompt refinements and deterministic routing guards guided by ASR diagnostics yield substantial TSR improvements, with gains up to +93.8 percentage points for previously struggling models.
Reported intervention experiments where authors used ASR diagnostics to refine prompts and add deterministic routing guards, observing TSR improvements up to +93.8 percentage points.
GPT-5.2 achieves perfect ASR.
Model-level evaluation reported in the paper indicating GPT-5.2 attained perfect ASR under the HMASP tests.
We introduce the Agentic Success Rate (ASR), a trajectory-fidelity metric that compares observed and expected agent execution sequences at the transition level, decomposing performance into Transition Recall and Transition Precision.
Methodological contribution described in the paper (definition of a new metric and its components).
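To make "transition-level" comparison concrete, here is a minimal sketch of recall and precision over agent-to-agent transitions. It assumes a transition is a pair of consecutive steps and that repeated transitions are counted once (sets); the paper's exact definitions, and how it combines the two components into ASR, are not given here. The function names and agent names are hypothetical.

```python
def transitions(sequence):
    """Set of consecutive (from_step, to_step) pairs in an execution sequence."""
    return set(zip(sequence, sequence[1:]))


def transition_recall_precision(expected, observed):
    """Return (recall, precision) of observed transitions against expected ones.

    Recall:    fraction of expected transitions that were observed.
    Precision: fraction of observed transitions that were expected.
    An empty denominator is reported as 0.0 (a convention chosen here).
    """
    expected_t = transitions(expected)
    observed_t = transitions(observed)
    matched = expected_t & observed_t
    recall = len(matched) / len(expected_t) if expected_t else 0.0
    precision = len(matched) / len(observed_t) if observed_t else 0.0
    return recall, precision


# Hypothetical payment workflow: the run stops before the final step.
expected = ["router", "cart", "payment", "confirm"]
observed = ["router", "cart", "payment"]
recall, precision = transition_recall_precision(expected, observed)
```

In this example every observed transition was expected (precision 1.0), but one of the three expected transitions never happened (recall 2/3). Separating the two components shows whether a run went wrong by skipping steps or by taking extra, unexpected ones.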
LLM-based multi-agent systems are increasingly deployed for payment workflows.
Statement in the paper's introduction/abstract framing; no empirical deployment data or sample size provided.
Under conditions of strong productivity growth, high-skill complementarity, low obsolescence, and broad ownership, automation raises output, capital, and consumption.
Comparative-static results from the heterogeneous-agent general-equilibrium model calibrated/analyzed under parameter configurations (strong productivity growth, high-skill complementarity, low obsolescence, broad ownership).
Automation raises productivity.
Analytical results from a theoretical framework: a static benchmark and a stationary heterogeneous-agent general equilibrium model in which firms choose automation from a profit function and final-good production is Cobb–Douglas.
DePAI offers a path to scalable, resilient self-organization that integrates physical infrastructure, AI, and community ownership under transparent rules, on-chain incentives, and permissionless participation, aiming to preserve human autonomy.
Normative/conceptual claim and argument based on the proposed architecture and incentive design; presented without empirical evaluation.
These elements specify workflows that couple machine execution with human oversight, enabling enhanced self-organization of techno-socio-economic systems, which we call DePAI.
Theoretical workflow specification and argumentation in the paper; no reported experimental or observational validation.