Evidence (14922 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
Adoption (9047 claims)
Productivity (8066 claims)
Governance (7278 claims)
Human-AI Collaboration (6912 claims)
Org Design (4439 claims)
Innovation (4359 claims)
Labor Markets (3652 claims)
Skills & Training (3018 claims)
Inequality (2160 claims)
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
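
Because the categories differ so much in size, raw counts are easier to compare once converted to shares. The sketch below does this for three rows copied from the table. The function name is illustrative, and dividing by the sum of the four direction columns rather than by Total is a choice made here: the direction columns do not always add up to Total (the Other row sums to 2065 against a Total of 2131), and the page does not say why.

```python
# Three rows copied from the table: (positive, negative, mixed, null, total).
ROWS = {
    "Other": (795, 210, 105, 955, 2131),
    "Wages & Compensation": (78, 37, 25, 6, 146),
    "Job Displacement": (12, 83, 23, 1, 119),
}


def direction_shares(positive, negative, mixed, null):
    """Return each direction's share of the claims that carry a direction."""
    counted = positive + negative + mixed + null
    if counted == 0:
        raise ValueError("no directional claims")
    return {
        "positive": positive / counted,
        "negative": negative / counted,
        "mixed": mixed / counted,
        "null": null / counted,
    }


if __name__ == "__main__":
    for name, (pos, neg, mix, nul, total) in ROWS.items():
        shares = direction_shares(pos, neg, mix, nul)
        gap = total - (pos + neg + mix + nul)  # claims not covered by the four columns
        print(f"{name}: {shares['positive']:.0%} positive, "
              f"{shares['negative']:.0%} negative, gap to Total: {gap}")
```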
For 2025 year-to-date (through 2025-08-01), the system achieved a Sharpe ratio of 1.40 ± 0.22 across 20 random seeds.
Backtest/performance claim: reported Sharpe ratio with reported uncertainty and a sample size of 20 seeds; time window specified as 2025 YTD through 2025-08-01. No further details on portfolio construction, leverage, transaction costs, or benchmark adjustment provided in the excerpt.
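For readers unfamiliar with how a Sharpe ratio with a cross-seed uncertainty is typically produced, here is a minimal sketch. It assumes daily returns, a 252-day annualization factor, a zero risk-free rate, and that the uncertainty is the sample standard deviation across seeds. The excerpt confirms none of these choices, and the returns below are synthetic.

```python
import math
import random
import statistics

TRADING_DAYS = 252  # assumed annualization factor, not stated in the excerpt


def annualized_sharpe(daily_returns, risk_free_daily=0.0):
    """Annualized Sharpe ratio from a list of daily returns."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / sd * math.sqrt(TRADING_DAYS)


def sharpe_across_seeds(returns_by_seed):
    """Mean and sample standard deviation of the per-seed Sharpe ratios."""
    sharpes = [annualized_sharpe(r) for r in returns_by_seed]
    return statistics.mean(sharpes), statistics.stdev(sharpes)


def synthetic_returns(seed, n_days=150):
    """Hypothetical stand-in for one seed's backtest output."""
    rng = random.Random(seed)
    return [rng.gauss(0.0008, 0.01) for _ in range(n_days)]


if __name__ == "__main__":
    runs = [synthetic_returns(seed) for seed in range(20)]
    mean_sharpe, sd_sharpe = sharpe_across_seeds(runs)
    print(f"Sharpe {mean_sharpe:.2f} +/- {sd_sharpe:.2f} across {len(runs)} seeds")
```

A standard deviation across seeds captures sensitivity to random initialization only; it says nothing about transaction costs, leverage, or benchmark choice, which the evidence note already flags as missing.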
Regulatory sandboxes offer a flexible and innovation-friendly governance model compared to traditional command-and-control mechanisms.
Normative and comparative analysis within a law & economics framework; no empirical performance data reported in the abstract.
Comparative insights from FinTech identify the institutional design features necessary to ensure the effectiveness and resilience of regulatory sandboxes.
Comparative case-based reasoning drawing on FinTech regulatory sandbox experience (abstract does not report number or selection of cases).
AI regulatory sandboxes may correct specific government failures, including regulatory capture, rent-seeking, and knowledge gaps.
Analytical claims supported by comparative reasoning (FinTech examples) and economic analysis of government failure; no empirical testing or sample size reported in the abstract.
AI regulatory sandboxes facilitate iterative regulatory learning while promoting responsible AI innovation.
Theoretical argument using experimentalist governance concepts and law & economics reasoning; comparative insights referenced but no empirical sample detailed in the abstract.
AI regulatory sandboxes can reduce negative externalities associated with AI deployment.
Conceptual and economic analysis in the paper (no empirical quantification or sample size reported in the abstract).
AI regulatory sandboxes can mitigate information asymmetries between regulators and firms.
Analytical application of an economic analysis of law framework; theoretical argumentation rather than reported empirical measurement in the abstract.
A well-established legal framework for data privacy (e.g., PIPL) enhances the benefits of big data for corporate performance.
Inference drawn from the observed stronger positive big-data effect on firm value after PIPL implementation, as reported by the paper's moderation analysis.
Robust sensitivity tests confirm the main findings, indicating that the results are not driven by model specification or sample selection.
Paper reports multiple robustness/sensitivity checks (unspecified in summary) that the authors state produce consistent results supporting the primary conclusions.
The positive impact of big data on firm performance is strengthened following the implementation of China's Personal Information Protection Law (PIPL).
Moderation/interacted-specification analysis in the paper comparing pre- and post-PIPL periods (or interacting big-data measure with a PIPL indicator), showing a larger positive effect on firm value after PIPL implementation.
The positive effect of big data on firm value operates through improving operational efficiency and reducing costs.
Mechanism analysis reported in the paper indicating mediation/channel tests where big data adoption is associated with measures of operational efficiency and cost reductions, which in turn relate to higher firm value.
Big data application significantly improves firm value.
Results from fixed-effects regressions on the 2007–2021 panel showing a statistically significant positive coefficient for the big-data keyword-frequency measure on firm value (paper reports significance and effect direction).
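To illustrate what a fixed-effects regression does, the sketch below implements the one-regressor within estimator: demean the regressor and the outcome by firm, then run ordinary least squares on the demeaned values. The paper's actual specification (controls, year effects, standard errors) is not given in the summary, so this is a simplified stand-in, and the tiny panel is invented.

```python
from collections import defaultdict


def within_slope(firm_ids, x, y):
    """Fixed-effects (within) slope: demean x and y by firm, then OLS."""
    groups = defaultdict(list)
    for firm, xi, yi in zip(firm_ids, x, y):
        groups[firm].append((xi, yi))
    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(xi for xi, _ in obs) / len(obs)
        mean_y = sum(yi for _, yi in obs) / len(obs)
        for xi, yi in obs:
            sxy += (xi - mean_x) * (yi - mean_y)
            sxx += (xi - mean_x) ** 2
    if sxx == 0:
        raise ValueError("x has no within-firm variation")
    return sxy / sxx


# Invented panel: two firms with different baselines, same within-firm slope of 2.
firms = ["A", "A", "A", "B", "B", "B"]
big_data = [1, 2, 3, 4, 5, 6]
firm_value = [12, 14, 16, -12, -10, -8]
print(within_slope(firms, big_data, firm_value))
```

In this toy panel a pooled regression would give a negative slope because firm B has both higher regressor values and a lower baseline; removing firm means recovers the within-firm slope of 2.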
It is optimal to start taxing AI when cognitive workers start to consider switching to manual jobs.
Analytical result derived from the extended dynamic taxation model and its comparative-static/optimal-policy analysis; the timing rule for introducing an AI tax follows from the model's equilibrium conditions and welfare optimization.
The model implies testable governance diagnostics linking latent fragility to observable patterns: recorded dissent (anonymous vs. formal voting gaps), scenario-set diversity, pipeline and method concentration, and anchor lag.
Theoretical mapping from model primitives and observable quantities to proposed diagnostics; the paper enumerates observable patterns that should correlate with model-implied fragility. This is a theoretical implication rather than an empirically validated claim.
The clearest added value of AI over structured self-reflection lies in increasing felt accountability.
Based on RCT comparisons showing no significant AI advantage over the written-reflection questionnaire on overall goal progress, but showing higher perceived social accountability in the AI condition and a significant mediation of the AI effect on progress via perceived accountability (indirect effect = 0.15, 95% CI [0.04, 0.31]).
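An indirect effect with a confidence interval of this kind is usually the product of two regression coefficients with a bootstrap interval. The sketch below shows one common version: a is the slope of the mediator on the treatment, b is the mediator's coefficient in a regression of the outcome on treatment and mediator, and the interval is a percentile bootstrap. Whether the paper used this exact procedure is an assumption; all names and data here are illustrative.

```python
import random


def indirect_effect(x, m, y):
    """a*b, where a is the slope of m on x and b is m's coefficient in y ~ x + m."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (no variation in x or m)
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    k = len(estimates)
    return estimates[int(alpha / 2 * k)], estimates[min(k - 1, int((1 - alpha / 2) * k))]


if __name__ == "__main__":
    rng = random.Random(1)
    x = [i % 2 for i in range(200)]                    # 0 = comparison, 1 = AI condition
    m = [0.5 * xi + rng.gauss(0, 1) for xi in x]       # synthetic accountability score
    y = [0.3 * mi + rng.gauss(0, 1) for mi in m]       # synthetic goal progress
    print(indirect_effect(x, m, y), bootstrap_ci(x, m, y))
```

An interval that excludes zero, as the reported one does, is the usual basis for calling the mediation significant.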
AI-assisted goal setting can improve short-term (two-week) goal progress.
Aggregate interpretation based on the RCT finding that the AI condition outperformed the no-support control on two-week goal progress (d = 0.33, p = .016); two-week follow-up window specified in study.
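The effect size d is a standardized mean difference. A minimal sketch of Cohen's d with a pooled standard deviation follows; whether the paper used this exact variant is an assumption, and the example scores are made up.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    v1 = statistics.variance(treatment)
    v2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("both groups have zero variance")
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


# Made-up goal-progress scores: means 4 and 3, pooled SD 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

By the usual rule of thumb, a d of 0.33 falls between a small (0.2) and a medium (0.5) effect.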
The AI increased perceived social accountability relative to the written-reflection questionnaire.
Reported comparison from the RCT showing higher perceived social accountability in the AI condition versus the written-reflection condition; measured via self-report scales at follow-up (exact scale and statistics reported in paper).
JobMatchAI provides factor-wise explanations through resume-driven search workflows.
Paper states that the system gives factor-wise explanations and ties them to resume-driven workflows; the excerpt references interpretable reranking and demo artifacts but does not include user study or explanation-faithfulness metrics.
JobMatchAI optimizes utility across skill fit, experience, location, salary, and company preferences.
Paper claims the system's objective/utility function includes these factors and that the reranking/optimization accounts for them. No optimization algorithm details, weighting, or empirical utility gains are given in the excerpt.
JobMatchAI is production-ready.
Paper explicitly describes JobMatchAI as "production-ready" and also claims a hosted website and installable package (artifacts consistent with deployment readiness). No formal certification, deployment metrics, or uptime/performance SLAs are provided in the excerpt.
For AI agent tool design, surfacing contextual information outperforms prescribing procedural workflows.
Authors' conclusion drawn from the suite of experiments (GraphRAG vs TDD prompting vs auto-improvement) showing better regression reduction and/or resolution when contextual information is surfaced.
An autonomous auto-improvement loop raised resolution from 12% to 60% on a 10-instance subset with 0% regression.
Reported experiment on a 10-instance subset where an auto-improvement loop was applied (numbers provided in the excerpt).
Smaller models benefit more from contextual information (which tests to verify) than from procedural instructions (how to do TDD).
Inferred from comparative results across models (Qwen3-Coder 30B vs Qwen3.5-35B-A3B) and interventions (contextual test-surfacing vs TDD prompting) reported in the paper.
When deployed as an agent skill, GraphRAG improved resolution from 24% to 32%.
Empirical comparison reported in the evaluation on SWE-bench Verified (same experimental context as above).
TDAD's GraphRAG workflow reduced test-level regressions by 70% (from 6.08% to 1.82%).
Empirical result reported from the SWE-bench Verified evaluation using the GraphRAG workflow (sample details: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances as reported).
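The 70% figure is a relative reduction, not a percentage-point difference. The arithmetic below uses only the two rates quoted in the claim; the function name is illustrative.

```python
def relative_reduction(before, after):
    """Fractional drop from `before` to `after` (0.70 means a 70% reduction)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


regression_drop = relative_reduction(6.08, 1.82)
point_drop = 6.08 - 1.82  # difference in percentage points

print(f"relative reduction: {regression_drop:.1%}")               # 70.1%
print(f"absolute reduction: {point_drop:.2f} percentage points")  # 4.26
```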
Partial validation against observed AIS vessel behavior shows PIER is consistent with the fastest real transits while exhibiting 23.1× lower variance.
Comparison between PIER trajectories and observed fastest transits in AIS data (details in paper); reported relative variance reduction of 23.1×.
PIER eliminates catastrophic fuel waste: great-circle routing produces extreme fuel consumption (>1.5× median) in 4.8% of voyages, while PIER reduces this to 0.5% (a 9-fold reduction).
Analysis on the same 2023 AIS validation dataset across seven Gulf of Mexico routes (840 episodes per method) comparing distribution tails of voyage fuel consumption; reported incidence rates (4.8% vs 0.5%).
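The "extreme fuel consumption" rate is the share of voyages whose fuel use exceeds 1.5 times a median. A minimal sketch of that tail metric follows. The excerpt does not say which median serves as the reference (per method, per route, or pooled), so the sketch takes the reference sample as an explicit argument; the fuel figures are invented.

```python
import statistics


def extreme_rate(fuel, reference=None, factor=1.5):
    """Share of voyages whose fuel use exceeds `factor` times the reference median."""
    if not fuel:
        raise ValueError("no voyages")
    baseline = fuel if reference is None else reference
    threshold = factor * statistics.median(baseline)
    return sum(f > threshold for f in fuel) / len(fuel)


# Invented per-voyage fuel figures (arbitrary units) for two routing methods.
great_circle = [100, 98, 105, 102, 180, 99, 101, 97, 103, 100]
learned_policy = [95, 96, 94, 97, 99, 93, 96, 95, 98, 94]

print(extreme_rate(great_circle))                            # 0.1
print(extreme_rate(learned_policy, reference=great_circle))  # 0.0
```

With the rates quoted in the claim, 4.8 / 0.5 = 9.6, which the claim rounds down to a 9-fold reduction.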
PIER reduces mean CO2 emissions by 10% relative to great-circle routing.
Offline evaluation using physics-calibrated environments grounded in historical AIS data and ocean reanalysis products; validation on one full year (2023) of AIS across seven Gulf of Mexico routes with 840 episodes per method; reported mean reduction of 10% and bootstrap 95% CI for mean savings [2.9%, 15.7%].
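A bootstrap confidence interval for mean savings can be sketched as follows. This is a plain percentile bootstrap over per-episode savings; whether the paper resampled episodes, routes, or something else is not stated in the excerpt, and the data are synthetic.

```python
import random


def percentile_bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int(alpha / 2 * n_boot)]
    upper = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Synthetic per-episode fractional savings versus the baseline route.
    data_rng = random.Random(42)
    savings = [data_rng.gauss(0.10, 0.30) for _ in range(840)]
    low, high = percentile_bootstrap_ci(savings)
    mean_saving = sum(savings) / len(savings)
    print(f"mean saving {mean_saving:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

The reported interval, [2.9%, 15.7%], is wide relative to the 10% point estimate, so the mean saving is clearly positive but its size is uncertain.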
The system is in production at Personize.ai.
Deployment statement in the paper asserting production use at Personize.ai.
The LoCoMo result confirms that governance and schema enforcement impose no retrieval quality penalty.
Interpretation in the paper linking LoCoMo benchmark accuracy (74.8%) to the conclusion that governance/schema enforcement did not degrade retrieval quality.
Governed Memory implements a closed-loop schema lifecycle with AI-assisted authoring and automated per-property refinement.
Design description in the paper describing the closed-loop schema lifecycle and AI-assisted authoring/refinement.
Governed Memory uses reflection-bounded retrieval with entity-scoped isolation.
Design description in the paper specifying reflection-bounded retrieval and entity-scoped isolation.
Governed Memory uses tiered governance routing with progressive context delivery.
Design description in the paper listing tiered governance routing and progressive delivery as mechanisms.
Governed Memory implements a dual memory model combining open-set atomic facts with schema-enforced typed properties.
Design specification within the paper describing the dual memory model (architectural mechanism).
The paper presents Governed Memory, a shared memory and governance layer addressing the memory governance gap.
System architecture and design description in the paper (proposal of a shared memory and governance layer).
The results confirm the positive impact of cognitive technologies on the development of entrepreneurial opportunities and innovative activity.
Conclusion drawn from the positive estimated association (0.33 coefficient) and the observed increases in the indices between 2020 and 2024 reported in the paper.
The Cognitive Tools Index and the Market Opportunity Index were -0.42 and -0.35 in 2020 and 0.94 and 0.92 in 2024, respectively.
Reported observed/computed index values for the years 2020 and 2024 in the study (data source and aggregation method not detailed in the excerpt).
The empirical study for 2020–2024 showed that a one standard unit increase in the Cognitive Tools Index is associated with an average 0.33 increase in the Market Opportunity Index.
Estimated coefficient reported from the panel econometric model over 2020–2024 (the model included lags and used an instrumental approach; sample size and standard errors are not provided in the excerpt).
Pidgin significantly outperformed standard English on measures of knowledge transfer across agriculture, education, and health domains.
Aggregate analysis of questionnaire comprehension items (44-item instrument) across domain-specific modules administered to 45 participants; comparative language-performance results reported in study.
Volunteers who used proverbs and vernacular registers were incorporated into local kinship structures, granted traditional titles, and perceived as legitimate development actors rather than outsiders.
Qualitative evidence from participant observation and discourse samples collected during fieldwork; interview and questionnaire items on perceptions of volunteer legitimacy and social integration.
Agricultural techniques taught in Pidgin were nearly universally adopted by recipients.
Self-reported adoption/behavior-change items in the 44-item questionnaire and corroborating qualitative observation of agricultural practice among participants in the sample (N = 45).
Pidgin-mediated interventions achieved large comprehension gains on health messaging, exceeding 30 percentage points compared with standard English.
Quantitative comparison derived from the 44-item field questionnaire (comprehension items) administered to the 45-participant sample; reported percentage-point difference (>30 pp) in health-message comprehension by language of instruction.
Using Cameroon Pidgin English as the primary medium for Peace Corps development work produced substantially better knowledge transfer, uptake, and social legitimacy than standard English.
Mixed-methods field study of Peace Corps interventions in Cameroon's Northwest: 44-item questionnaire administered to 45 participants across agriculture, education, and health; quantitative measures of comprehension and self-reported adoption; supplemented by qualitative observation and discourse samples.
A hybrid strategic–computational framework, supported by governance mechanisms (human-in-the-loop checkpoints, escalation paths, accountability structures), is motivated to manage tensions and ensure responsible decision-making in AI-rich managerial contexts.
Synthesis-driven prescriptive framework produced by cross-framework analysis; conceptual recommendation rather than implementation evidence.
Roles oriented to information processing, optimisation, and operational precision (monitor, disseminator, resource allocator) are substantially enhanced by computational thinking (automation, optimisation, algorithmic decision-support).
Theoretical mapping of computational capabilities onto Mintzberg’s information-processing roles; conceptual reasoning without empirical validation.
AI adoption will shift fact-checking tasks (more monitoring, less rote verification), creating a need for reskilling and new roles (AI tool operators, analysts); donor and public investments should fund capacity building for local organizations.
Workforce implications inferred from interview reports about changing task mixes and the study's interpretive recommendations.
Investments should prioritize hybrid models where automation provides scale and humans handle contextual, adversarial, and legally sensitive judgments.
Recommendation based on interview findings about AI benefits and limitations and the study's interpretive synthesis.
The study distills context-sensitive best practices for fact-checking in restrictive environments, including safety protocols, local partnerships, and hybrid verification workflows.
Synthesis of findings from document analysis and interviews producing a set of recommended practices documented in the study's outputs.
AI can lower verification costs and scale reach by automating tasks such as classification, clustering, alerting, and translation.
Interview reports from platform staff and interpretive analysis identifying AI-assisted use cases for prioritization, monitoring, and translation.
Community reporting and audience-focused formats are used to improve engagement.
Platform outputs and staff interviews describing deployment of community-reporting mechanisms and tailored audience formats.