Evidence (7198 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 8921 |
| Productivity | 8002 |
| Governance (current filter) | 7198 |
| Human-AI Collaboration | 6864 |
| Org Design | 4398 |
| Innovation | 4286 |
| Labor Markets | 3629 |
| Skills & Training | 3001 |
| Inequality | 2141 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 790 | 208 | 103 | 950 | 2117 |
| Governance & Regulation | 869 | 411 | 195 | 126 | 1630 |
| Organizational Efficiency | 817 | 202 | 126 | 87 | 1243 |
| Technology Adoption Rate | 675 | 258 | 128 | 106 | 1178 |
| Research Productivity | 462 | 138 | 64 | 347 | 1023 |
| Output Quality | 501 | 193 | 61 | 52 | 807 |
| Decision Quality | 346 | 180 | 84 | 51 | 668 |
| AI Safety & Ethics | 235 | 285 | 70 | 34 | 630 |
| Firm Productivity | 452 | 58 | 91 | 20 | 627 |
| Market Structure | 184 | 171 | 123 | 24 | 507 |
| Task Allocation | 221 | 65 | 76 | 34 | 401 |
| Skill Acquisition | 176 | 62 | 62 | 17 | 317 |
| Innovation Output | 207 | 28 | 48 | 18 | 303 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Employment Level | 105 | 56 | 108 | 13 | 284 |
| Consumer Welfare | 121 | 67 | 45 | 11 | 244 |
| Firm Revenue | 160 | 50 | 28 | 4 | 242 |
| Task Completion Time | 182 | 33 | 10 | 13 | 239 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 94 | 73 | 23 | 12 | 202 |
| Error Rate | 76 | 98 | 11 | 4 | 189 |
| Regulatory Compliance | 81 | 73 | 17 | 7 | 178 |
| Automation Exposure | 61 | 59 | 26 | 14 | 163 |
| Training Effectiveness | 97 | 21 | 14 | 19 | 153 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 21 | 1 | 117 |
| Hiring & Recruitment | 52 | 8 | 8 | 3 | 71 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 49 | 6 | 1 | 61 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 15 | 14 | — | 3 | 32 |
| Industry | — | — | — | 1 | 1 |
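
The direction counts above can be turned into shares with a few lines of Python. This is a minimal sketch rather than the site's own tooling: `OutcomeRow` and its methods are hypothetical names, and the four rows are copied from the table. It also reports the gap between Total and the sum of the four direction columns, which is nonzero for some rows; the page does not say what accounts for that gap, so the code only measures it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeRow:
    """One row of the outcome table (hypothetical container, not a site API)."""
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int

    def classified(self) -> int:
        # Sum of the four direction columns shown in the table.
        return self.positive + self.negative + self.mixed + self.null

    def residual(self) -> int:
        # Claims counted in Total but not in any of the four direction columns.
        return self.total - self.classified()

    def share(self, count: int) -> float:
        # Fraction of the row's Total; 0.0 for an empty row.
        return count / self.total if self.total else 0.0


# Values copied from the table above.
ROWS = [
    OutcomeRow("Other", 790, 208, 103, 950, 2117),
    OutcomeRow("Governance & Regulation", 869, 411, 195, 126, 1630),
    OutcomeRow("Job Displacement", 12, 83, 21, 1, 117),
    OutcomeRow("Wages & Compensation", 78, 37, 25, 6, 146),
]

for row in ROWS:
    print(
        f"{row.outcome}: "
        f"positive {row.share(row.positive):.1%}, "
        f"negative {row.share(row.negative):.1%}, "
        f"residual {row.residual()}"
    )
```

Shares are taken against Total rather than against the classified sum, so a row with a nonzero residual will have direction shares that add up to less than 100%.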
Governance
Structured upskilling and precise recourse mechanisms are associated with higher confidence, productivity, and clearer sustainability pathways.
Observed association in multi-case qualitative data: interviews, staff/manager surveys, and policy documents; triangulated through thematic coding and cross-case synthesis. (Sample size not reported.)
A tight workflow fit that minimises cognitive overhead at the decision point accelerates legitimate use and strengthens links to emissions monitoring and predictive-maintenance outcomes.
Synthesised from interviews, Likert-scale surveys of technical staff and managers, and internal workflow/policy documents across multiple cases in the energy sector. (Sample size not reported.)
Communicative governance — e.g. model cards, bias tests, validation reports, and explicit appeal rights — earns trust, curbs shadow workarounds, and improves safety culture.
Reported from thematic coding of interviews, surveys of staff and managers, and documentary evidence across multiple cases; triangulation claimed. (Sample size not reported.)
Broad-based capability building beyond specialist teams prevents benefits from concentrating in expert enclaves and reduces brittle scale.
Derived from cross-case thematic synthesis of interviews, Likert surveys of mid-level managers and technical staff, and internal policy/strategy document analysis (multi-case qualitative evidence). (Sample size not reported.)
Three reinforcing levers shape adoption outcomes: (1) broad-based capability building beyond specialist teams, (2) communicative governance that couples transparency with contestability, and (3) a tight workflow fit that minimises cognitive overhead at the decision point.
Qualitative, multi-case design triangulating a semi-structured interview with a senior manager, Likert-scale surveys of mid-level managers and technical staff, and analysis of internal policies and strategy documents; thematic coding with intercoder reliability and cross-case synthesis. (Sample size not reported.)
Managers should view AI as a strategic tool for enhancing supply chain resilience (SCR), not only as a cost-saving measure, and should focus on optimizing resource allocation, increasing R&D investment, and enhancing organizational agility to amplify AI's resilience effects.
Authors' practical recommendations derived from empirical findings and mechanism analysis.
The paper provides empirical evidence that policy tools such as the National AI Innovation and Application Pioneer Zone can help enhance industrial and supply chain security (i.e., SCR).
The analysis uses the National AI Innovation and Application Pioneer Zone policy as its setting; the authors state that their results provide empirical evidence in support of such policies.
AI's impact on SCR is more significant in enterprises with lower levels of pollution.
Heterogeneity analysis reported by the authors that splits sample by pollution level.
AI's impact on SCR is more significant in private enterprises (versus non-private).
Heterogeneity analysis by ownership type reported in the paper.
AI's impact on SCR is more significant in large-scale enterprises.
Heterogeneity analysis across firm-size categories reported by the authors.
Enterprise agility significantly moderates the AI–SCR relationship: AI's positive effect on SCR is more pronounced in firms with higher agility.
Moderation analysis reported in the paper (moderation models applied to firm-level data).
AI boosts SCR by promoting continuous technological innovation.
Mediation analysis in the paper indicates continuous technological innovation (e.g., R&D/innovation indicators) is a channel through which AI enhances resilience.
AI mainly boosts SCR by improving total factor productivity (TFP).
Mechanism (mediation) analysis reported in the paper using firm-level data; authors identify TFP improvement as a key mediating channel.
The positive effect of AI on SCR holds after multiple robustness checks.
Authors state that the main conclusion remains valid after conducting multiple unspecified robustness checks on the empirical sample (multi-period DID).
AI significantly enhances supply chain resilience (SCR) in manufacturing firms.
Empirical analysis of A-share listed manufacturing companies (2011–2023) using a multi-period difference-in-differences (DID) model; authors report the finding and state it remains after robustness checks.
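
For readers unfamiliar with the design, a multi-period (staggered) DID is commonly written as a two-way fixed-effects regression. The form below is the generic textbook specification, not the authors' reported model: the claim summary gives no equation, and every symbol here is an assumption introduced for illustration.

```latex
% Generic staggered DID with firm and year fixed effects (illustrative only).
% SCR_{it}: resilience measure for firm i in year t
% D_{it}:   1 once firm i is treated (e.g. exposed to the AI policy or adoption), else 0
% X_{it}:   firm-level controls; \mu_i, \lambda_t: firm and year fixed effects
\[
  \mathrm{SCR}_{it} = \alpha + \beta\, D_{it} + \gamma^{\top} X_{it}
  + \mu_i + \lambda_t + \varepsilon_{it}
\]
```

Under this reading, the reported positive effect corresponds to an estimate of \(\beta > 0\), and the heterogeneity results above would come from re-estimating the model on subsamples.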
Advancing meaningful fairness or accountability in AI requires: (1) recognizing when and how decoys serve as a distraction, and (2) grappling directly with the material political economy of the Project of AI.
Normative prescription based on the paper's conceptual analysis and literature synthesis; recommended two-part approach rather than empirically validated intervention. No sample size or experimental validation provided.
Policy proposals including universal basic income, portable benefits, retraining programs, and AI taxation are viable mechanisms to manage the socio-economic transition associated with AI, and the paper assesses these proposals.
Paper states it evaluates these policy proposals drawing on empirical studies, reports, and historical analysis; the abstract does not report empirical tests or effectiveness estimates for these policies.
The distributional consequences of AI adoption will be shaped primarily by institutional factors—including labor market regulation, education policy, and corporate governance structures—rather than by the technology itself.
Argument based on a literature review drawing on recent empirical studies, industry reports, and historical analyses of past technological transitions; no new empirical estimate or sample size provided in the abstract.
AI differs from previous automation technologies in its capacity to perform cognitive and creative tasks.
Paper's conceptual claim supported by references to recent empirical studies and industry reports on generative AI and large language models; no specific sample size or quantified effect reported in the abstract.
These results suggest that LinuxArena has meaningful headroom for both attackers and defenders, making it a strong testbed for developing and evaluating future control protocols.
Authors synthesize results from sabotage evaluations, monitor evaluations, and the LaStraj human-attack dataset to conclude there is room for improvement on both attacker and defender sides; this is presented as an implication/recommendation rather than a strictly measured outcome.
LinuxArena contains 184 side tasks representing safety failures such as data exfiltration and backdooring.
Authors report the number of side tasks and describe their nature (safety failures) in the dataset/control setting documentation.
LinuxArena contains 1,671 main tasks representing legitimate software engineering work.
Authors report the number of main tasks when describing the contents of LinuxArena.
LinuxArena contains 20 environments.
Authors report constructing LinuxArena and state the number of environments explicitly in the paper's description of the dataset/control setting.
Drawing on Moral Foundations Theory and a multi-stakeholder perspective, moral (mis)alignment matters for the meaningful integration of AI in sensitive contexts.
Paper's theoretical framing and normative claim (method: conceptual synthesis using Moral Foundations Theory and multi-stakeholder argumentation; no empirical sample or quantitative results reported in the supplied text).
Moral alignment is defined as the perceived congruence between the values embedded in an AI system's decision logic and the moral intuitions of stakeholders.
Explicit definitional statement in the paper (conceptual definition; no empirical measurement reported in the supplied text).
Moral alignment may be a more fundamental dimension of human-AI decision-making than functional or behavioral alignment.
Paper's central argumentative claim (theoretical proposition building on conceptual reasoning and prior theory; no empirical evidence or sample size reported in the supplied text).
In high-stakes AI-supported decisions, considerations are not purely technical but involve moral judgments about fairness, responsibility, and harm.
Stated as a conceptual assertion in the paper's framing/abstract; presented as an observation building on prior literature (no empirical method or sample size reported in the supplied text).
The model's contribution lies in integrating four interdependent governance layers—technical, organizational, workforce, and regulatory—within a single labor-market framework.
Paper's stated conceptual contribution describing the four-layer governance model derived from the evidence map and synthesis.
Based on an evidence map of the included studies, we propose a hybrid governance model combining technical and organizational audits, inclusive upskilling/reskilling, participatory regulation, and responsible HR policies to align AI innovation with decent and inclusive work.
Conceptual proposal grounded in the paper's evidence map and qualitative synthesis of the 19 studies; model components explicitly listed in the text.
The evidence indicates that AI can support inclusion through assistive technologies and improved matching in labor-market settings.
Synthesis claim based on thematic analysis of the 19 included peer-reviewed studies (qualitative evidence across the corpus pointing to assistive technologies and improved matching as inclusion-supporting mechanisms).
Fairness should be evaluated at the system level (the interacting agents) rather than solely at the level of individual models, because fairness can be an emergent, procedural property of decentralized agent interaction.
Conceptual framing supported by the triage experiments showing emergent fairness properties from agent interaction that were not present at the single-agent level.
Aligned agents partially moderate bias through contestation rather than override, acting as corrective patches that restore access for marginalized groups without fully converting a biased counterpart.
Behavioral observations from the triage negotiation trials where aligned agents contested allocations proposed by biased/un-aligned agents and adjusted final allocations in ways that increased access for marginalized groups while not fully changing the adversarial agent's preferences.
Neither agent's allocation is ethically adequate in isolation, yet their joint final allocation can satisfy fairness criteria that neither would have reached alone.
Comparative analysis of individual-agent allocations versus joint allocations after three rounds of negotiation in the hospital triage simulation; claim based on observed differences between solitary and joint outcomes.
Fairness in language models emerges through interaction and exchange among agents, rather than being solely a property of a single, centrally optimized model.
Controlled simulation using a hospital triage framework in which two agents negotiate over three structured debate rounds; one agent is aligned via retrieval-augmented generation (RAG) and the other is unaligned or adversarially prompted. Observed final allocations and negotiation dynamics used to support the claim.
Digital financial literacy and managerial competence are critical for properly translating AI outputs into strategic decisions, resulting in a robust governance and regulatory framework for sustainable development (Schrank & Kijkasiwat, 2025, p. 202; Tandilino et al., 2025).
Prescriptive/recommendation claim supported by citations (Schrank & Kijkasiwat, 2025; Tandilino et al., 2025); appears as a policy/managerial implication in the paper rather than an empirically tested result. No sample size or quantitative evidence in the excerpt.
Advanced AI replaces intuition-based decisions with precise and robust data, significantly increasing a firm's bargaining power during credit negotiations and enabling its access to long-term capital (Hamdouni, 2025; Sanga & Aziakpono, 2023).
Assertion supported by citations (Hamdouni, 2025; Sanga & Aziakpono, 2023); framed as a causal pathway (AI -> better data-driven decisions -> increased bargaining power -> improved access to long-term credit). The excerpt does not describe sample size, empirical design, or quantitative estimates.
AI is transforming small-business funding by optimizing firms' internal resources and moving them from immediate, short-term loans to long-term capital (Pérez-Campdesuñer et al., 2026; Wu & Liao, 2025).
Claim asserted with citations to Pérez-Campdesuñer et al. (2026) and Wu & Liao (2025); presented as a thematic finding of the paper (likely based on literature review and RDT framing). No sample size or direct empirical method reported in the excerpt.
The approach provides a practical path toward more transparent, controllable, and accountable AI use without requiring new model architectures.
Authors' asserted benefit of the proposed interaction-layer framework; no empirical demonstration that transparency, control, or accountability are achieved or that no architectural changes are required in practice.
The framework enables auditable reasoning traces and supports alignment with emerging governance standards, including the EU AI Act and ISO/IEC 42001.
Stated compliance/alignment claim linking the proposed interaction-layer approach to existing regulatory standards; no compliance testing or audit examples reported.
This reframes the question from whether the model can think to whether the human-AI system can reason.
Conceptual reframing stated in the paper; no empirical evidence required as it is a change of perspective.
We introduce 'The Architect's Pen' as a practical method where the human uses the model as an external medium for structured reflection by embedding phases of articulation, critique, and revision into human-AI interaction.
Method description / practical proposal included in the paper; no experimental evaluation, user study, or quantitative validation reported.
This perspective emphasizes collaborative intelligence, combining human judgment and contextual understanding with machine speed, memory, and associative capacity.
Theoretical claim about complementary strengths of humans and models within the proposed framework; presented without empirical tests.
Building on recent work on 'System-2' learning, reflective reasoning can be relocated to the interaction layer and framed as a cognitive protocol that can be structured, measured, and governed using existing systems.
Conceptual extension of prior literature ('System-2' learning) into an interaction-layer protocol; no empirical protocol testing or measurement evidence provided.
Reasoning should be treated as a relational process distributed between human and model rather than an internal capability of either.
Methodological proposal / theoretical framing presented by the authors; no empirical validation reported.
Large language models have advanced rapidly, from pattern recognition to emerging forms of reasoning.
Stated as an observational claim in the paper's introduction; no empirical evaluation or dataset provided.
This approach aligns with emerging compliance expectations, including the EU AI Act and ISO/IEC 42001, by making reasoning processes traceable under real conditions of use.
Claim of regulatory alignment made by the authors; presented as interpretive/legal/standards-relevant argument rather than supported by empirical analysis or legal review data in this excerpt.
Stabilising interaction makes uncertainty and drift visible before enforcement is applied, enabling more precise capability governance.
Normative/operational claim in the paper about the anticipated effect of the proposed interventions; no empirical test or measurement reported in this excerpt.
Together, these layers form a missing operational substrate for governance by increasing signal-to-noise at the point of use.
Argumentative claim from the paper proposing that the combined interventions improve the information available at the decision point; no empirical validation or sample size provided here.
This paper is the first in a five-paper research series on stabilising human-AI reasoning that proposes a two-layer approach: Parts II–IV introduce human-side mechanisms (uncertainty cues, conflict surfacing, auditable reasoning traces) and Part V develops a model-side Epistemic Control Loop (ECL) that detects instability and modulates generation.
Descriptive claim about the structure and scope of the paper series as stated by the authors; internal to the publication (no external dataset).
Large language models are increasingly integrated into decision-making in areas such as healthcare, law, finance, engineering, and government.
Stated in the paper as an observed adoption trend; no empirical dataset, sample size, or quantitative analysis reported in the text.