Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding (for example, "what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other, use the Evidence Explorer instead.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
Adoption (9047 claims)
Productivity (8066 claims)
Governance (7278 claims, filter active)
Human-AI Collaboration (6912 claims)
Org Design (4439 claims)
Innovation (4359 claims)
Labor Markets (3652 claims)
Skills & Training (3018 claims)
Inequality (2160 claims)
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
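
As a reading aid, the direction counts in a row can be turned into shares of that row's total. The sketch below does this for three rows copied from the table. In some rows the four listed directions sum to less than the Total (Firm Productivity, for example, lists 625 of 631), and the page does not say why. The `unlisted` key is a hypothetical label for that remainder, not a category the site defines, and `direction_shares` is an illustrative helper name.

```python
# Direction counts copied from three rows of the table above:
# (positive, negative, mixed, null, total)
rows = {
    "Firm Productivity": (455, 58, 92, 20, 631),
    "Job Displacement": (12, 83, 23, 1, 119),
    "Inequality Measures": (45, 126, 50, 6, 227),
}


def direction_shares(positive, negative, mixed, null, total):
    """Return each direction's share of the row total.

    "unlisted" is an assumed label for whatever part of the total is not
    covered by the four listed directions; the source page does not name it.
    """
    listed = positive + negative + mixed + null
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
        "unlisted": (total - listed) / total,
    }


for outcome, counts in rows.items():
    shares = direction_shares(*counts)
    print(outcome, {name: round(value, 3) for name, value in shares.items()})
```

Shares make rows of very different size comparable: roughly 72% of Firm Productivity claims are positive, while about 70% of Job Displacement claims are negative.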
Governance
Conversational agents are increasingly integrated into the most private and intimate aspects of users' lives, from discussions of mental health to financial decisions.
Asserted as descriptive background in the paper (position/argumentative claim); examples provided (mental health, financial decisions); no empirical study or sample size reported in the excerpt.
Audit outputs can be leveraged as red-teaming inputs to stress-test fairness robustness and strengthen AI governance through improved data quality and oversight (proposed intervention).
Proposed methodological/policy recommendation in the paper (proposal, not evaluated empirically in the excerpt).
AI-enabled hiring systems are widely adopted.
Statement in paper (background claim); no empirical sample or citation provided in the excerpt.
The study advances an integrative framework of sustainable AI governance emphasizing regulatory adaptability, institutional coordination, and ethical oversight as mechanisms for aligning AI innovation with long-term financial stability and sustainability objectives, and offers policy-relevant guidance for regulators and financial institutions.
Study conclusion reported in the abstract describing the proposed integrative framework and its policy relevance; based on the study's comparative mixed-methods analysis.
AI-enabled financial innovation is associated with improvements in risk assessment capabilities.
Comparative institutional analysis and integration of secondary quantitative indicators with qualitative documentary evidence across China, the United States, and the United Kingdom (2022–2025) as described in the abstract.
AI-enabled financial innovation is associated with improvements in ESG integration.
Same comparative mixed-methods approach across China, the United States, and the United Kingdom (2022–2025) using secondary quantitative indicators and qualitative documentary evidence, reported in the abstract.
AI-enabled financial innovation is associated with improvements in market efficiency.
Comparative mixed-methods analysis (comparative institutional analysis) across leading financial systems in China, the United States, and the United Kingdom (2022–2025), integrating secondary quantitative indicators with qualitative documentary evidence as reported in the study abstract.
Managers can make both (Agentic Technical Debt and Stochastic Tax) visible through lightweight dashboards and governance controls.
Prescriptive recommendation in the paper; the authors state that they 'outline' approaches for managers to surface these concepts using dashboards and governance controls. No empirical evaluation or case-study evidence is reported in the provided excerpt.
Aggregators and niche specialists employ more open governance and sourcing logics that foster innovation, specialization, and ecosystem diversity.
Presented as a comparative finding from the taxonomy and qualitative examination of non-hyperscaler ML platform providers; support is drawn from conceptual analysis and examples in the paper rather than quantitative measures (no sample size reported in abstract).
Ongoing efforts of the initiative aim to incorporate benchmarks that address concerns about bias by considering alternative perspectives and human-centered use cases.
Statement of planned/ongoing work in the paper regarding future benchmark inclusion to address bias and human-centered use cases; no empirical results provided.
Implemented tests include causal translation, model iteration, causal reasoning, conformance, model behavior explanation, suggested model building steps, and suggested model fixes.
Specific list of implemented test categories provided in the paper; descriptive/reporting evidence from the initiative's work.
Tests for several distinct categories of evaluation have been implemented and applied to AI tools that support qualitative model building, quantitative model building, and model discussion.
Paper reports that a set of tests has been implemented and applied to AI tools across qualitative and quantitative modeling and discussion; no sample sizes or numeric evaluation results provided in the excerpt.
A steering group focuses on prioritizing potential benchmarks, while a technical group focuses on implementing the benchmarks in the form of automated tests.
Organizational description in the paper specifying roles (steering group and technical group); no quantitative evaluation reported.
The open-source sd ai project hosted by the initiative establishes transparency and enables contributions to be shared broadly.
Descriptive statement about the open-source project hosted by the initiative; no empirical measures of transparency or contribution sharing provided.
The initiative uses open digital and organizational infrastructure to collaboratively evaluate AI tools for modeling and simulation.
Descriptive claim in the paper about organizational approach (open infrastructure and collaborative evaluation); no empirical testing or sample size reported.
The BEAMS Initiative aims to guide the development of AI tools for modeling and simulation toward forms that are responsible and ethical by establishing benchmarks for human-centered modeling and simulation practices.
Descriptive statement about the Initiative's stated aims and purpose in the paper; organizational description rather than empirical evidence.
Tools that can automate aspects of modeling practice must complement human expertise, not replace it.
Normative claim made in the paper (argument about human-centered design); no empirical evidence or sample size reported.
AI tools to support real-world decision making must be able to build simulation models that inform their recommendations and render them interpretable.
Normative assertion in the paper (position statement / requirement); no empirical study or sample size reported.
The agentic future is not predetermined; leaders must both skate to where the puck is going and actively steer it toward a good place, ensuring innovation delivers welfare gains felt by businesses and consumers around the world.
Normative recommendation offered by the authors; based on conceptual argument and interpretation of the framework rather than empirical testing in the excerpt.
These complementary investments produce the familiar 'productivity J-curve' of general-purpose technologies.
Stated as an economic analogy/claim drawing on general-purpose technology literature; presented as an asserted mechanism rather than shown with new empirical estimates in the excerpt.
The most consequential disruption resides in the third stage (Reconstruction) where workflows and markets are rebuilt around delegation, machine-to-machine interaction, continuous monitoring, and auditable constraints.
Theoretical claim in the paper backed by conceptual reasoning and illustrative sector examples; no quantitative evidence provided in the excerpt.
Because reputation-based, ex post sanctions cannot be relied upon for dissociative agents, governance should shift to observability-based, ex ante, constitutive, protocol-based behavioral harnesses.
Prescriptive recommendation derived from the theoretical critique of identity-based governance; paper proposes observability- and protocol-focused alternatives but does not present empirical tests or trials.
Reputation mechanisms function both as social signals and as corrective feedback that sustain an equilibrium of trustworthy behavior, presuming a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility.
Conceptual/theoretical argument presented in the paper drawing on reputation theory and social signaling; no empirical sample or quantitative data reported.
Restoring honest billing will require verification that ties reported token counts to evidence the provider does not control, such as trusted execution attestation, cryptographic proofs of inference, or third-party re-execution.
Policy/recommendation proposed by the authors based on their findings (argument that independent verification is necessary).
Even when the user can see the full reasoning string, tokenization ambiguity alone still allows 50.85% over-reporting below the detection threshold.
Experimental result reported in the paper showing over-reporting due solely to tokenizer ambiguity when reasoning string is visible (no sample size in excerpt).
At current frontier reasoning prices, that turns a $100 honest bill into roughly a $1,569 bill on the same query.
Numerical example/price calculation based on the reported inflation (uses current frontier reasoning prices; calculation given by the authors).
In the most permissive setting, hidden reasoning usage can be inflated by 1,469% on average without detection.
Experimental/adversarial evaluation reported in the paper showing average inflation in a permissive audit setting (no sample size for queries provided in excerpt).
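
The two figures above are linked by simple arithmetic: inflating billed usage by 1,469% multiplies it by 1 + 14.69 = 15.69, which turns $100 into about $1,569. The sketch below reproduces that calculation under stated assumptions, namely that the bill scales linearly with the billed token count (per-token billing) and that the whole bill consists of the inflated usage. `inflated_bill` is an illustrative helper name, not something taken from the paper.

```python
def inflated_bill(honest_bill, inflation_pct):
    """Bill after billed usage is over-reported by inflation_pct percent.

    Assumes the bill is proportional to the billed token count and that
    all of it is subject to the inflation (an illustrative simplification).
    """
    return honest_bill * (1 + inflation_pct / 100)


# Most permissive setting reported above: 1,469% average inflation.
print(round(inflated_bill(100, 1469)))      # 1569

# Tokenization ambiguity alone, with the reasoning string visible: 50.85%.
print(round(inflated_bill(100, 50.85), 2))  # 150.85
```

Under the same assumptions, the 50.85% over-reporting attributed to tokenization ambiguity alone would turn a $100 honest bill into roughly $151.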
We study three recent token auditing frameworks and show that a provider with ordinary commercial capabilities can systematically inflate billed token counts.
Empirical/analytical evaluation of three token-auditing frameworks studied by the authors; adversarial provider simulation/experiment (paper states three frameworks were studied).
Per-token billing is now the standard pricing model for commercial large language models (LLMs).
Author assertion about prevailing commercial pricing practices (no empirical sample or citation provided in excerpt).
We discuss implications for Information Systems (IS) design and propose future field evaluations.
Paper includes a discussion section outlining IS design implications and suggestions for future empirical/field work.
The approach preserves statistical rigour, traceability, and nuanced Persevere/Iterate decisions when accelerating experimentation.
Reported outcomes of controlled simulations and description of system design that enforces statistical procedures and logging; stated in manuscript as findings.
Logs render capabilities observable at the feature level, turning 'agentic AI' into a disciplined experimentation infrastructure rather than a generic assistant.
Implementation logs and descriptions from the Node.js instantiation reported in the paper; qualitative claim about observability and traceability at the feature level.
The Multi Agent System reduces time-to-validated-learning by roughly an order of magnitude while preserving statistical rigour, traceability, and nuanced Persevere/Iterate decisions.
Results from the controlled simulations reported in the paper (comparison between agentic multi-agent system and manual B-M-L cycles).
Controlled simulations compare agentic and manual B-M-L cycles on feature ideas.
Reported controlled simulation experiments in the paper comparing agentic (multi-agent) and manual B-M-L cycles; methodological description present in manuscript.
We instantiate them in a Node.js package instrumenting a production-grade SaaS codebase.
Implementation artifact reported in the paper (Node.js package) and description of instrumentation on a production-grade SaaS codebase.
Drawing on the Dynamic Capabilities View, we derive fifteen meta-requirements and thirty-three design principles (consolidated into seven goal-directed groups) for sensing, seizing, reconfiguring, orchestration, and governance.
Design-theory derivation reported in the paper (counts of meta-requirements and design principles are stated in the manuscript).
We propose a multi-agent artefact that operationalises the Build–Measure–Learn (B-M-L) cycle as a closed-loop control system.
Design science study described in the paper; conceptual derivation and artifact instantiation (Node.js package) reported in the manuscript.
The review synthesizes fragmented evidence and links AI use to SME performance improvements, while outlining directions for future research on sustainable AI adoption.
Self-description of the article's contribution based on the authors' focused literature review (2016-2024).
Cloud-based AI solutions, targeted employee training, and explainable AI are identified strategies to overcome AI adoption challenges in SMEs.
Recommendations synthesized from the reviewed literature (2016-2024); presented as enabling strategies rather than results from a single empirical intervention.
AI supports more data-driven financial planning for SMEs.
Identified across the reviewed empirical and conceptual studies in the 2016-2024 literature (synthesis rather than new empirical estimate).
AI enables real-time fraud detection for SMEs.
Synthesis of empirical and conceptual literature reporting AI applications in fraud detection (review-level claim; no aggregated quantitative effect provided).
AI enables more accurate credit risk assessment for SMEs.
Review synthesizing studies on credit scoring and risk assessment within the 2016-2024 corpus (no single pooled sample size or unified effect estimate provided).
AI improves cash flow and financial forecasting for SMEs.
Synthesis of empirical studies and conceptual papers in the 2016-2024 literature reviewed (review article does not report primary sample sizes/effect estimates).
AI offers strong potential to enhance the financial stability and growth of SMEs when supported by suitable organizational capacities and governance.
Focused review of high-quality research (2016-2024) synthesizing empirical and conceptual studies on AI applications in SME finance (no single-sample primary data reported).
Regulatory divergence across the European Union, United States, and China has moved AI governance from a compliance function to a strategy-shaping constraint.
Framing statement in the paper's introduction arguing that cross-jurisdiction regulatory divergence elevates governance to a strategic constraint; presented as contextual motivation rather than a tested causal finding.
Firms with higher governance exposure and AI maturity exhibit more advanced, multi-dimensional adaptation across regulatory environments.
Paper conclusion synthesizing regression and index results linking governance exposure and AI maturity to adaptation intensity and configuration.
Governance exposure significantly predicted all adaptation indices (β = 0.35–0.47, R² = 0.29–0.41, all p ≤ 0.004).
Reported regression results: 'Regression showed governance exposure significantly predicted all adaptation indices (β = 0.35–0.47, R² = 0.29–0.41, all p ≤ 0.004)'.
Compartmentalization scores were highest for tri-jurisdictional organizations (0.82 ± 0.05).
Reported composite-index results: 'compartmentalization scores were highest for tri-jurisdictional organizations (0.82 ± 0.05).'
Ethical signaling intensity was greatest in tech firms (0.82 ± 0.04).
Reported composite-index results by sector: 'Ethical signaling intensity was greatest in tech firms (0.82 ± 0.04).'
Modularity peaked in multinational corporations (0.86 ± 0.04).
Reported composite-index results: 'modularity peaked in multinational corporations (0.86 ± 0.04).'