Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other, use the Evidence Explorer instead.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
Adoption
9047 claims
Productivity
8066 claims
Governance
7278 claims
Filtered →
Human-AI Collaboration
6912 claims
Org Design
4439 claims
Innovation
4359 claims
Labor Markets
3652 claims
Skills & Training
3018 claims
Inequality
2160 claims
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
Governance
The compendium’s findings and recommendations are based on a forensic audit of approximately 4,200 specialized texts covering doctrine, jurisprudence, regulation and technical literature.
Stated methodological claim in the compendium: forensic corpus audit of ~4,200 texts (sample size reported).
The evidence base is qualitative: the study uses conceptual framework synthesis, comparative analysis of multi-sector implementations, and case examples rather than randomized or large-sample empirical evaluation.
Methods and limitations section of the paper explicitly describing the evidence base and methods (qualitative synthesis, pattern extraction, cross-case lessons).
The paper presents a deployment pattern intended to be adapted by sector and regulatory context rather than a one-size-fits-all blueprint.
Explicit statement in the paper and the described pattern design; based on qualitative pattern extraction and prescriptive guidance.
Partial least squares structural equation modeling (PLS-SEM) was used to test hypothesized direct, mediated, and moderated paths.
Methods/analysis section states PLS-SEM was the statistical approach to estimate paths, mediation, and moderation effects.
The study employed a 2 × 2 between-subjects experimental design manipulating (1) identity disclosure (transparent vs. nondisclosed) and (2) conversational tone (empathetic/personalized vs. generic).
Explicit description of experimental factors and design in the methods (2 × 2 between-subjects).
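As a rough illustration of what a 2 × 2 between-subjects design implies for assignment, the sketch below crosses the two factors named in the claim and deals participants into the four resulting cells. The factor levels come from the claim text; the balanced round-robin assignment, the seed, and the participant IDs are assumptions made for illustration, not the study's reported procedure.

```python
import itertools
import random

# Factor levels are taken from the claim text; everything else is illustrative.
DISCLOSURE = ("transparent", "nondisclosed")
TONE = ("empathetic", "generic")
CONDITIONS = list(itertools.product(DISCLOSURE, TONE))  # the four cells


def assign_conditions(participant_ids, seed=0):
    """Assign each participant to exactly one cell, keeping cell sizes balanced."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal the shuffled participants round-robin so cell sizes differ by at most one.
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


# Hypothetical usage: eight participants, two per cell.
assignment = assign_conditions(range(8))
```

Because the design is between-subjects, each participant appears in exactly one cell, so any comparison across cells is a comparison across different people.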
Stimuli (chatbot dialogues) were standardized and pretested using a large-language-model (LLM) workflow to ensure consistent experimental stimuli across conditions.
Methods section describing stimuli creation: LLM-generated dialogues were produced and pretested to standardize messages across the 2 × 2 conditions.
Quasi-experimental designs (difference-in-differences, instrumental variables, event studies) and panel regressions are useful methods for identifying causal effects of AI adoption where plausibly exogenous variation exists.
Methodological summary in the paper listing common empirical strategies used in the literature to estimate causal impacts of technology adoption.
Current research is limited by measurement challenges in capturing AI capabilities and firm-level adoption, and by a lack of longitudinal worker-firm data and causal identification in many settings.
Explicit limitations noted by the paper: gaps in task measures, scarce longitudinal linked datasets, and methodological challenges in causal inference.
This paper's approach is qualitative and based on secondary literature synthesis; it does not collect primary survey, experimental, or administrative data.
Explicit statement in the Data & Methods section of the paper.
Key empirical gaps remain: better measurement of K_T (AI/software capital), more granular matched employer‑employee and wealth data, and improved estimates of task-substitution elasticities are required to precisely quantify incidence and policy impacts.
Authors’ stated research agenda and limitations section, including sensitivity analyses showing outcome variation with parameter choices and measurement uncertainty.
We conduct a pre-specified algorithm audit using a randomized choice-based conjoint: across personas, prompt templates, and twelve open-weight and proprietary models, assistants choose among five hotels whose guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position are independently randomized.
Statement of experimental design in the paper: a pre-specified randomized choice-based conjoint with independent randomization of listed hotel attributes across five hotels, varied personas, prompt templates, and twelve open-weight and proprietary LLMs/models.
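To make "independently randomized" concrete, the following sketch draws one five-hotel choice set in which every attribute is sampled separately for every hotel. The attribute names follow the claim; the specific levels, units, and the function name `draw_choice_set` are hypothetical placeholders and do not reflect the paper's actual design values.

```python
import random

# Attribute names follow the claim; the levels are hypothetical placeholders.
ATTRIBUTE_LEVELS = {
    "guest_rating": [3.5, 4.0, 4.5],
    "review_volume": [50, 500, 5000],
    "review_recency_days": [7, 90, 365],
    "management_response": [True, False],
    "chain_affiliated": [True, False],
    "price_usd": [80, 120, 160],
    "eco_certified": [True, False],
}


def draw_choice_set(rng, n_hotels=5):
    """Build one choice task: each attribute is drawn independently per hotel."""
    hotels = []
    for position in range(1, n_hotels + 1):
        hotel = {name: rng.choice(levels) for name, levels in ATTRIBUTE_LEVELS.items()}
        # Profiles are drawn independently of their slot, so list position
        # is unrelated to every other attribute by construction.
        hotel["list_position"] = position
        hotels.append(hotel)
    return hotels


choice_set = draw_choice_set(random.Random(42))
```

Independent draws are what allow each attribute's effect on the assistant's choice to be estimated separately from the others.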
Models are prompted to assess profiles along dimensions of social acceptance, marital stability, and cultural compatibility.
Experimental procedure: prompts asked models to rate profiles on the three named dimensions.
We evaluate five LLM families (GPT, Gemini, Llama, Qwen, and BharatGPT).
Methods: models enumerated as the LLM families evaluated in the audit.
We vary caste identity across Brahmin, Kshatriya, Vaishya, Shudra, and Dalit, and income across five buckets.
Experimental design described: caste identity explicitly manipulated across five named caste categories; income varied across five buckets.
We conduct a controlled audit of caste bias in LLM-mediated matchmaking evaluations using real-world matrimonial profiles.
Described methodology in the paper: a controlled audit using real-world matrimonial profiles to probe LLMs for caste bias.
The principal contribution of the paper is a practical framework for extending established model risk management concepts to autonomous AI systems and providing a rigorous foundation for their validation, governance, and monitoring.
Authors' stated contribution in the paper summarizing methodological and conceptual advances.
Large language models (LLMs) can be formalized as approximate Bayesian filtering operators within the proposed framework.
Theoretical formalization provided in the paper mapping LLM behavior to approximate Bayesian filtering.
The paper proposes a model validation framework for agentic AI based on Partially Observable Markov Decision Processes (POMDPs) that decomposes autonomous decision making into information, beliefs, forecasts, actions, and utility, allowing each component to be validated independently.
Methodological contribution: formal POMDP-based framework described in the paper.
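The "beliefs" component of a POMDP is conventionally maintained with a Bayes filter. The sketch below is the textbook discrete version (predict with a transition model, then correct with an observation likelihood), shown only to illustrate what validating a belief update in isolation might look like. It is not the paper's framework; the state names, probabilities, and the function name `belief_update` are assumptions.

```python
def belief_update(belief, transition, observation_prob, observation):
    """One discrete Bayes-filter step: predict, then correct and renormalize."""
    states = list(belief)
    # Predict: push the current belief through the transition model.
    predicted = {
        s_next: sum(transition[s][s_next] * belief[s] for s in states)
        for s_next in states
    }
    # Correct: weight each state by the likelihood of the observation.
    unnormalized = {s: observation_prob[s][observation] * predicted[s] for s in states}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state example for a single fixed action.
prior = {"calm": 0.5, "stressed": 0.5}
transition = {
    "calm": {"calm": 0.9, "stressed": 0.1},
    "stressed": {"calm": 0.2, "stressed": 0.8},
}
observation_prob = {
    "calm": {"alert": 0.1, "quiet": 0.9},
    "stressed": {"alert": 0.7, "quiet": 0.3},
}
posterior = belief_update(prior, transition, observation_prob, "alert")
```

A component-level check would compare an agent's reported beliefs against a reference update like this one before its forecasts or actions are examined.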
TMT behavioral integration strengthens both stages of the indirect path (generative AI -> green dynamic capabilities, and green dynamic capabilities -> green innovation), reinforcing the overall mediated mechanism.
Moderated-mediation analyses reported in the paper indicating TMT behavioral integration positively moderates both the first-stage (AI -> green dynamic capabilities) and second-stage (green dynamic capabilities -> green innovation) effects.
Top management team (TMT) behavioral integration positively moderates the direct effect of generative AI on green innovation.
Moderation analysis in the empirical tests showing a positive interaction between generative AI application and TMT behavioral integration on the level of green innovation.
Green dynamic capabilities partially mediate the relationship between generative AI application and corporate green innovation.
Mediation analysis reported in the paper indicating a significant indirect effect of generative AI on green innovation through green dynamic capabilities (described as partial mediation).
The application of generative artificial intelligence is positively associated with corporate green innovation.
Empirical tests reported in the paper (regression/moderated-mediation analyses) on a sample of agricultural enterprises that show a positive association between generative AI use and green innovation.
The paper concludes with specific policy recommendations addressing procurement, workforce development, standards alignment, and interagency coordination to accelerate responsible AI adoption across the federal audit ecosystem.
Statement of the paper's conclusions and policy recommendations (descriptive of paper content). No empirical evaluation reported for the effectiveness of these recommendations.
Critical success factors for AI-augmented audit include executive sponsorship at the agency leadership level, dedicated cross-functional implementation teams with embedded data science competencies, iterative pilot deployments that generate performance evidence prior to enterprise rollout, and robust governance structures that maintain human judgment at consequential decision points.
Paper's recommended critical success factors based on synthesis of implementations and best-practice guidance; presented as prescriptive guidance rather than validated causal evidence.
A structured three-phase implementation approach spanning 24 to 48 months enables federal audit agencies to achieve meaningful AI augmentation of core audit functions while managing implementation risk within acceptable bounds.
Paper's proposed implementation timeline and argument (recommendation based on the paper's synthesis). No empirical test or sample size reported to validate the timeline.
The paper draws on recent advances in intelligent fraud monitoring, machine identity governance, adaptive risk scoring, and digital forensics analytics to ground its recommendations in the most current available evidence on AI audit capability development.
Paper cites and synthesizes recent technical advances and implementations in specific AI audit subdomains (literature/implementation synthesis). No sample sizes or systematic review metrics provided.
The roadmap addresses four core implementation domains: technical infrastructure and data architecture requirements; human capital and organizational change management for audit workforce transformation; governance, ethics, and risk management frameworks; and policy and standards development to enable AI-augmented oversight.
Paper's stated structure and recommendations (categorization of implementation domains). Descriptive; no quantitative evaluation reported.
The paper develops an original conceptual framework designated the AI-Augmented Audit Continuum (AIAC) to guide progressive capability development from foundational analytics to autonomous audit functions.
Paper claims and framework development (conceptual contribution). No empirical validation or sample size reported.
This paper develops a comprehensive policy and implementation roadmap for the deployment of AI-augmented audit capabilities within United States government agencies and multilateral organizations, synthesizing evidence and aligning strategies with GAO, OMB, and INTOSAI frameworks.
Statement of the paper's scope and methods (synthesis of evidence; alignment analysis with GAO, OMB, INTOSAI). This is a description of the paper's contribution rather than an empirical finding.
Artificial intelligence technologies, including machine learning, natural language processing, network analytics, and intelligent process automation, offer substantial potential to augment the analytical capacity of public audit institutions, extend audit coverage to previously inaccessible transaction populations, and accelerate detection timelines from years to days or hours.
Author's synthesis and claims in the paper; references to existing AI audit implementations across federal, state, and international contexts (literature/implementation synthesis). No specific sample size reported.
The results provide a principled, practical framework for mitigating the effects of strategic behavior in algorithmic decision-making systems.
Combination of theoretical analysis, algorithmic development, and an empirical case study as described in the abstract.
A real-world case study on a healthcare payments benchmark shows that the algorithm can guide the design of coarse policy levers in practice.
Empirical case study on a healthcare payments benchmark reported in the paper (dataset and sample size not specified in abstract).
We develop a practical algorithm for jointly choosing the feature set and the level of ridge regularization.
Algorithmic contribution described in the paper; implementation and method development (stated in abstract).
The interaction between feature selection and ridge regularization yields new insights for policy design.
Derived insights from the theoretical characterization and its implications (stated in abstract).
The paper provides a fine-grained characterization of the performance of a feature subset under optimal (ridge) regularization.
Analytical/theoretical characterization developed in the paper as described in the abstract.
From a practical perspective, the study offers a conceptual measurement framework and policy guidance for municipal decision makers seeking to improve productivity while strengthening resilience and reducing systemic risks in increasingly interconnected public governance systems.
Paper presents a conceptual measurement framework and policy recommendations derived from the integrative review and framework; asserted in discussion and implications sections.
Resilience depends on the ability of public organisations to anticipate, absorb, adapt to, and recover from AI-related disruptions while maintaining the continuity and quality of public services.
Theoretical framing (sociotechnical systems and resilience theory) supported by synthesis of reviewed empirical studies; proposed conceptual measurement framework in the paper.
Findings show that productivity gains associated with AI are strongly influenced by organisational readiness, including digital maturity, workforce capabilities, governance quality, and institutional coordination.
Synthesis of results from the systematic review of 68 empirical studies assessing productivity outcomes, methodological quality, effect sizes, and contextual factors.
The study highlights risks and opportunities of AI-related digital sovereignty dynamics and offers practical insights for organizational resilience and policy.
Derived recommendations and discussion based on findings from the empirical case study of early AI adoption in a Nordic public transportation organization (specific methods/sample size not provided).
AI adoption can work as a capability-building process enhancing worker autonomy and organizational resilience.
Interpretation of empirical findings from the case study of early AI adoption in a Nordic public transportation organization, arguing that AI adoption contributed to capability-building for workers and the organization (methods/sample size not specified).
Digital sovereignty is an ongoing negotiation between organizational governance and individual autonomy.
Findings from the empirical case study of early AI adoption in one Nordic public transportation organization; qualitative analysis leading to the assertion of negotiation dynamics between governance and autonomy (method and sample size not provided).
Digital sovereignty goals evolve across individual and organizational levels as AI is introduced into work settings.
Empirical investigation (single-case study) of early AI adoption in a Nordic public transportation organization; qualitative data from that organizational setting (method and sample size not stated in provided text).
Together these proposals constitute ten design principles for an agent-first internet that requires renegotiating the web's foundational social contract across access, economics, and content.
Synthesis/conclusion of paper; normative claim describing scope and ambition of the proposed redesign; no empirical testing reported.
Agent Text Markup Language (ATML), a four-level human supervision tier model, and a cryptographic provenance chain can counter the epistemic recursion threat.
Proposed technical/policy solution in paper combining tiered supervision (ATML) and cryptographic provenance; presented as design proposal without implementation results in provided text.
A commissioned content economy can anchor AI content production in human intentionality.
Normative proposal in paper advocating commissioned content to tie AI outputs to human intent; conceptual argument only, no empirical evidence provided.
A token-based subscription model can meter content in tokens rather than pageviews.
Policy/monetization proposal in paper recommending token-based metering; no pilot data or quantitative evaluation reported.
An intent-based tier framework should ground agent economics in the agent-as-human-proxy principle: an agent's economic obligation mirrors that of the human it represents.
Normative economic framework proposed in paper; conceptual justification provided but no empirical validation or sample.
A dual-layer architecture should serve human-readable and agent-optimized content from the same domain.
Design proposal in paper advocating serving two content layers (human and agent) on same domain; no empirical testing or rollout data presented.
Agents acting for humans should inherit equivalent access rights, governed by rate limiting and agent identification metadata in HTTP requests (analogous to browser headers).
Normative proposal in paper; design recommendation rather than an empirically validated intervention. No implementation trial or sample reported.
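One way to picture rate limiting keyed on agent identification metadata is a token bucket per agent identifier, sketched below. The paper proposes no concrete mechanism in the text quoted here, so the header name `Agent-Id`, the bucket parameters, and the choice to refuse unidentified requests are all assumptions for illustration.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


AGENT_HEADER = "Agent-Id"  # hypothetical header name, not an existing standard
buckets = {}


def admit(headers, now, capacity=2, rate=1.0):
    """Return True if the request is within its agent's rate limit."""
    agent_id = headers.get(AGENT_HEADER)
    if agent_id is None:
        return False  # assumption: requests without identification are refused
    bucket = buckets.setdefault(agent_id, TokenBucket(capacity, rate))
    return bucket.allow(now)
```

Passing the clock in as `now` keeps the sketch deterministic; a real server would read a monotonic clock instead.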
This study provides causal evidence on the green effects of intelligent manufacturing using a quasi-natural experiment and DID approach.
Use of pilot policy as a quasi-natural experiment applied to panel data (2011–2023) with difference-in-differences estimation claimed by the authors to identify causal effects.
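For readers unfamiliar with difference-in-differences, the simplest two-group, two-period version of the estimator is sketched below. The study itself uses panel data over 2011–2023, which would normally call for a regression with fixed effects; this sketch only shows the core subtraction, and all numbers are invented for illustration.

```python
import statistics


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = statistics.mean(treated_post) - statistics.mean(treated_pre)
    control_change = statistics.mean(control_post) - statistics.mean(control_pre)
    return treated_change - control_change


# Invented outcome values: pilot (treated) vs. non-pilot (control) units.
estimate = did_estimate(
    treated_pre=[10, 12],
    treated_post=[15, 17],
    control_pre=[9, 11],
    control_post=[11, 13],
)
```

The causal reading depends on the parallel-trends assumption: without the policy, the treated group's outcome would have changed by the same amount as the control group's.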