Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance (filter active) | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
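The counts in the table can be turned into direction shares for a quick comparison across outcomes. The following is a minimal sketch, not part of the site: the function names are hypothetical, and the three rows are copied from the table above. For some rows (for example Governance & Regulation) the stated Total is larger than the sum of the four direction columns, so the sketch divides by the stated Total and reports the difference separately.

```python
# Rows copied from the outcome table: (positive, negative, mixed, null, total).
ROWS = {
    "Wages & Compensation": (78, 37, 25, 6, 146),
    "Job Displacement": (12, 83, 23, 1, 119),
    "Governance & Regulation": (886, 414, 197, 126, 1654),
}


def direction_shares(positive, negative, mixed, null, total):
    """Return each direction's share of the stated Total, in percent."""
    counts = {
        "positive": positive,
        "negative": negative,
        "mixed": mixed,
        "null": null,
    }
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def unlisted(positive, negative, mixed, null, total):
    """Claims counted in Total but in none of the four direction columns."""
    return total - (positive + negative + mixed + null)


if __name__ == "__main__":
    for outcome, row in ROWS.items():
        print(outcome, direction_shares(*row), "unlisted:", unlisted(*row))
```

Dividing by the stated Total rather than by the sum of the four columns keeps the shares comparable with the Total column as published; the `unlisted` figure shows how many claims in a row fall outside the four directions.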
Governance
It empowers owners of data and code.
Explicit claim in the abstract asserting a power shift toward those who own data and code; presented as a conceptual conclusion from the authors' reflection and examples.
Global professional service firms are actively developing TaxTech to capture this market.
Direct statement in the abstract indicating market activity by global professional service firms; presented as an observed trend rather than supported by reported empirical data in the abstract.
Technological leaps in the algorithmic processing of information are providing financial actors with new opportunities for transnational financial and legal management that optimize asset allocation.
Stated as a conceptual observation in the paper's abstract; no empirical sample, presented as a general claim about technological change and its opportunities for financial actors.
I have developed LLMbench, a research instrument for the comparative close reading of LLM outputs that visualises token probability distributions, entropy curves, and cross-model divergence.
Description of a tool/method developed by the author (LLMbench); claim about the tool's features as stated in the abstract; no implementation details or evaluation sample sizes provided in the abstract.
Public examples referenced include the reported PocketOS and Replit agentic database-deletion incidents and Moffatt v. Air Canada as an adjudicated output/reliance case.
The paper cites specific public incidents and a legal case as examples supporting its discussion.
The paper makes three contributions: it defines the AI-specific reconstruction problem, operationalizes that problem through CER, and specifies claim-grade evidence for AI reconstruction.
Author-stated contributions in the paper; descriptive of the paper's goals and deliverables.
The paper introduces CER, a use-case-level diagnostic for AI residual risk transfer: C (control boundary) asks whether the system had an enforceable operating envelope; E (evidence reconstruction) asks whether the system state and causal chain can be reconstructed from retained artifacts; R (insurance response) asks whether the reconstructed loss is insured (coverage available and placed, and proof needed to support claim recovery).
Framework introduction and operationalization described in the paper; presented as the paper's primary methodological contribution.
The paper addresses losses in which the insured's AI system is in the causal chain, including externally triggered failures such as prompt injection, retrieval-augmented generation (RAG) poisoning, malicious tool output, credential misuse, and data poisoning.
Scope statement in the paper listing specific failure modes; descriptive rather than empirical.
The relevant question for such losses is not only what loss occurred, but what the system was allowed to do, what it actually did, and whether that reconstructed loss can support insurance claim recovery.
Conceptual framing provided in the paper; presented as the diagnostic/analytic focus rather than backed by empirical data in the excerpt.
AI losses that arise through an insured organization's generative or agentic AI system require state reconstruction, not merely event reconstruction, because the relevant state changes as the system reasons, retrieves, calls tools, and acts.
Argument presented in the paper as a conceptual/theoretical claim about the nature of AI-system-caused losses; no empirical sample or quantitative study reported in the excerpt.
The future of agentic-AI insurance lies not in a single monoline product but in a layered ecosystem of complementary coverages supported by improved governance, transparency, telemetry, and regulatory clarity.
Analytic conclusion/recommendation based on the paper's risk taxonomy, actuarial framework, and parallels to cyber insurance; forward-looking synthesis rather than empirical causal evidence.
A coordinated insurance architecture integrating cyber, technology errors and omissions, product liability, performance-warranty, and affirmative AI-liability coverages with explicit allocation mechanisms and dedicated AI aggregates is proposed.
Design proposal in the paper detailing a layered insurance architecture combining multiple coverages and allocation mechanisms; conceptual design not empirically tested.
The paper proposes an actuarial framework based on exposure assessment, scenario analysis, dependency mapping, and accumulation-risk management, drawing parallels to the evolution of cyber insurance.
Proposed actuarial approach described in the paper, invoking methods like scenario analysis and dependency mapping and analogizing to cyber insurance development; methodological proposal without empirical validation.
The paper develops a framework for understanding underwriting, pricing, reinsurance, and product-design implications for agentic-AI insurance.
Methodological contribution stated in the paper: proposed actuarial/underwriting framework (exposure assessment, scenario analysis, dependency mapping, accumulation-risk management); conceptual development rather than empirical validation.
The Talent pillar exerts a significant positive effect on tourism’s GDP share with a one-year lag.
Lagged specification (one-year lag) in fixed-effects panel models on 33 countries (2017–2023); reported coefficient β = 0.183, p = 0.025.
The Policy and Governance pillar is a significant positive driver of tourism’s GDP share.
Pillar decomposition with fixed-effects estimation on panel data (33 countries, 2017–2023); reported coefficient β = 0.353, p = 0.037; result robust to alternative SE and two-way fixed effects.
The AI-related R&D pillar is a significant positive driver of tourism’s GDP share.
Pillar decomposition using fixed-effects models on the same 33-country panel (2017–2023); reported coefficient β = 1.811, p = 0.005; effect robust to alternative standard errors and two-way fixed effects.
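The three pillar results above are reported from fixed-effects panel models. As a rough sketch only (the excerpts do not give the exact specification, so the variable names, the absence of control variables, and the lag index below are assumptions), such a model can be written as:

```latex
\text{TourismShare}_{i,t} = \alpha_i + \beta \,\text{Pillar}_{i,t-\ell} + \varepsilon_{i,t},
\qquad \ell \in \{0, 1\}
```

Here $i$ indexes the 33 countries, $t$ the years 2017–2023, $\alpha_i$ is a country fixed effect, and $\ell = 1$ corresponds to the one-year lag reported for the Talent pillar. A two-way fixed-effects variant, as mentioned in the robustness checks, would add a year effect $\lambda_t$ to the right-hand side.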
Journalists and editors exercise bounded and situational agency through local adaptation, self-training, and development of ethical guardrails that institutionalise responsible AI use.
Based on in-depth interviews with newsroom staff (journalists, editors, technical personnel) at Al-Masry Al-Youm; qualitative accounts of local practices such as self-training and the creation of internal ethical rules. Sample size not reported in the excerpt.
The synthesized mixed-objective program retains most of the profit-oriented baseline's funds.
Reported comparison in simulation between the synthesized program and a profit-oriented baseline showing the synthesized program keeps most of the baseline funds while reducing gaming behaviors.
The synthesized mixed-objective program halves rejection.
Results from the LLM-guided evolutionary search experiment reported in the paper: the synthesized program reduces rejection by half in the simulation.
LLM-guided evolutionary code search synthesizes an inspectable mixed-objective program that eliminates up-coding.
Experiment using LLM-guided evolutionary search over the rule-program space within Medi-Sim; the synthesized program reportedly eliminates up-coding behavior in the simulation.
A single audit lever exposes pressure migration: closing the coding channel more than doubles low-complexity selection.
Targeted simulation experiment in Medi-Sim where an audit intervention closes the coding channel; reported effect is >2x increase in low-complexity patient selection.
An incentive sweep recovers classical health-economics findings as adjacent regimes: up-coding and low-complexity-patient selection under profit pressure.
Simulation experiments (an 'incentive sweep') run in Medi-Sim showing regimes with up-coding and selection of low-complexity patients when profit incentives are increased.
We recast hospital mechanism design as program synthesis for language models: typed, inspectable rule programs are executed and scored by Medi-Sim, a multi-agent simulator with five strategic provider channels (coding, selection, delay, effort, triage).
Implementation and description in the paper: development of Medi-Sim simulator with five provider channels and execution/scoring of typed rule programs.
By bridging established knowledge with emerging governance challenges, this study advances a more comprehensive understanding of platform governance and outlines future research avenues related to technological change, dynamic capabilities, and ecosystem perception.
Authors' stated contribution based on their integrative framework and literature synthesis of 644 publications.
The paper proposes a research agenda that examines how emerging technologies, including algorithmic governance, generative AI, and agentic systems, are reshaping governance practices.
Paper's concluding/prospective section proposing future research directions; conceptual proposal rather than empirical test.
The identified governance mechanisms foster innovation in platform ecosystems.
Claim based on the paper's integrative synthesis of 644 publications indicating governance's role in fostering innovation.
The identified governance mechanisms ensure quality in platform ecosystems.
Argument and synthesis from the systematic literature review of 644 publications as presented in the paper's framework.
The identified governance mechanisms (incentives, control, boundary resources) enable platform owners to coordinate value creation.
Argument based on the integrative framework derived from the systematic literature review (644 publications).
There are three core types of governance mechanisms that enable platform owners to coordinate value creation, ensure quality, and foster innovation: incentives, control, and boundary resources.
Synthesis and classification resulting from the systematic literature review of 644 publications, producing an integrative framework that identifies the three mechanism types.
This study conducts a systematic literature review of 644 publications to synthesize the governance landscape and develop an integrative framework.
Methodological statement from the paper reporting the authors performed a systematic literature review analyzing 644 publications.
Platform owners orchestrate complementor participation through governance mechanisms.
Synthesis and conceptual argument based on the systematic literature review of 644 publications.
Digital platform ecosystems rely on loosely coupled complementors to jointly create value with platform owners.
Synthesis of prior literature via the paper's systematic literature review (644 publications); conceptual framing in the literature on platform ecosystems.
Artificial intelligence (AI) increasingly participates in strategic decision-making, challenging leadership theories that assume human agency at the top of organizations.
Concept-centric literature review integrating management and information systems (IS) research; theoretical synthesis of prior empirical and conceptual studies (no primary empirical sample reported).
Digital platforms have established a new regime (surveillance capitalism) that converts human experience into data and turns it into economic value.
Theoretical and conceptual analysis; the study draws on Zuboff's surveillance capitalism approach. No empirical sample or quantitative evidence reported.
The field can be organized around an integrated decision-system framework consisting of five connected constructs—delegation frontier, reliance wedge, decision-useful XAI, meaningful oversight, and reflexive AI loop—to support cumulative research on investment, trading, credit, asset management, risk, compliance, and financial regulation.
Proposal of a conceptual framework grounded in the paper’s integrative literature review (no empirical validation or sample size reported in the abstract).
The review integrates evidence on methods, data, scenarios, explainability, trust, governance, financial large language models (FinLLMs), and agentic finance.
Descriptive claim about the scope of this paper’s literature synthesis (the review itself; content-based rather than empirical).
The central question is moving from model performance to decision architecture: how authority, oversight, and accountability should be allocated across financial workflows.
Argument based on synthesis of prior literature across relevant fields (conceptual review; no single empirical study or sample size reported).
AI is moving from a predictive tool to a component of human–AI hybrid financial decision systems.
Integrative conceptual literature review synthesizing work across finance, management, human–computer interaction (HCI), and AI (no primary empirical sample reported).
The benchmark is publicly available at: https://github.com/ant-research/meta-agent-challenge.
Statement of public release and URL provided in the paper.
MAC provides a rigorous, open-source benchmark for autonomous AI research and development and offers an empirical proxy for evaluating recursive self-improvement.
Claim about the utility and intended purpose of the released benchmark; supported by the benchmark's design and experiments described in the paper.
The few meta-agents that do match human-engineered baselines are dominated by proprietary frontier models.
Experimental observations reported in the paper indicating that successful meta-agents rely on proprietary frontier models; details (counts, model names) not provided in abstract.
To ensure evaluation integrity, the framework is secured by multi-layer defenses against reward hacking.
Methodological claim in paper about security measures implemented in the benchmark.
In MAC, a code agent (the meta-agent) is given a sandboxed environment, an evaluation API, and a time limit to iteratively program an agent artifact that maximizes performance on a held-out test set across five domains.
Method description of the benchmark setup; specification includes 'held-out test set across five domains'.
We introduce the Meta-Agent Challenge (MAC), an evaluation framework designed to test the capacity of frontier models for autonomous agent development.
Paper contribution: description of a new evaluation framework (methodological introduction).
Computable static rules raise signal boundary mass more sharply than ambiguous static rules (0.403 versus 0.281).
Same ABM/RL simulation described in the paper (see run counts and 2,880,000-row firm-period panel referenced in abstract).
Computable static rules raise conduct boundary mass relative to ambiguous static rules (0.411 versus 0.367).
Agent-based reinforcement-learning (ABM/RL) simulation reported in paper; results summarized across runs including a 2,880,000-row firm-period panel and multiple scenario/sweep designs (150 seed-level scenario runs, 378 common-random-number computability-sweep runs, 288 Latin-hypercube global-design runs).
AI-flagged complaints are disproportionately associated with first-time filers rather than repeat filers.
Linking complaint AI-flag status to filer metadata indicating prior filing history; reported disproportionate association with first-time filers.
AI-flagged complaints are more citation-dense.
Comparison of citation counts/density between AI-flagged complaints and other complaints using complaint text metadata.
Against a threshold calibrated to the pre-GenAI baseline, the net AI-flagged share is 13.9% of post-GenAI non-form complaints.
Application of the stylometric AI-consistent drafting measure with calibration to pre-GenAI baseline; reported net share for post-GenAI non-form complaints.
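One way to read "a threshold calibrated to the pre-GenAI baseline" is sketched below. This is an illustration under stated assumptions, not the paper's procedure: the excerpt does not say how the threshold was chosen or how "net" is defined. The sketch assumes each complaint has a numeric stylometric score, that the threshold is set so a chosen fraction of pre-GenAI complaints exceed it, and that the net share is the post-GenAI flagged share minus the baseline flagged share at that same threshold. All function names and the default rate are hypothetical.

```python
def calibrate_threshold(baseline_scores, baseline_flag_rate=0.05):
    """Pick a cut-off that flags roughly `baseline_flag_rate` of baseline scores.

    Assumption: a higher score means more AI-consistent drafting.
    """
    ordered = sorted(baseline_scores)
    n = len(ordered)
    # Index of the last score that should stay unflagged.
    k = max(0, min(n - 1, round((1 - baseline_flag_rate) * n) - 1))
    return ordered[k]


def flagged_share(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def net_flagged_share(baseline_scores, post_scores, baseline_flag_rate=0.05):
    """Post-period flagged share minus the baseline share at the same cut-off."""
    threshold = calibrate_threshold(baseline_scores, baseline_flag_rate)
    return flagged_share(post_scores, threshold) - flagged_share(
        baseline_scores, threshold
    )


if __name__ == "__main__":
    # Made-up scores for illustration only; they are not data from the paper.
    baseline = list(range(20))
    post = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 17, 18, 18, 19, 19, 20, 21]
    print(net_flagged_share(baseline, post, baseline_flag_rate=0.1))
```

Subtracting the baseline share treats pre-GenAI flags as the false-positive floor, so the net figure counts only the excess over what the measure would flag without generative AI.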