Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you're looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance (active filter) | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
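As a rough illustration of how the direction counts can be read, the sketch below computes each direction's share of the classified claims for three rows copied from the table above. The function name `direction_shares` is an assumption made for illustration. Shares are taken over the sum of the four direction columns rather than the Total column, because the two do not always match (for Other, the four columns sum to 2065 against a Total of 2131).

```python
# Rows copied from the outcome table: (positive, negative, mixed, null).
ROWS = {
    "Wages & Compensation": (78, 37, 25, 6),
    "Job Displacement": (12, 83, 23, 1),
    "Inequality Measures": (45, 126, 50, 6),
}
DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the classified claims in one row."""
    classified = sum(counts)
    return {d: c / classified for d, c in zip(DIRECTIONS, counts)}


for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    summary = ", ".join(f"{d} {s:.0%}" for d, s in shares.items())
    print(f"{outcome}: {summary}")
```

Reading shares rather than raw counts makes small categories such as Job Displacement comparable with large ones such as Governance & Regulation.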
Governance
The analysis identifies ten shared use cases that creators present as pathways to income using GenAI.
Coding of the 377-video corpus resulted in a catalog of ten use cases (as reported in the paper).
Risk and ambiguity manipulations: the risk condition communicated a single explicit leak probability of 30%; the ambiguity condition communicated the leak probability as a range (10–50%).
Paper's methods section describing the manipulations used in the randomized experiment (N = 610); these specific probability framings were the core independent-variable manipulations.
Experimental design: study used a 2 × 3 between-subjects design with N = 610, crossing information environment (Risk vs Ambiguity) with privacy-treatment conditions (including privacy-threatening vs neutral and different data-type labels).
Methodological description reported in the paper: participants (N = 610) randomized across 6 experimental arms derived from the 2 (Risk vs Ambiguity) × 3 (privacy treatments) factorial design; tasks included choosing between a standard product basket and an AI-personalized basket.
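The 2 × 3 factorial structure described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the three privacy-treatment labels are placeholders (the summary names a privacy-threatening and a neutral treatment but does not spell out all three levels), and balanced assignment is an assumption, since the summary only says participants were randomized across six arms.

```python
import itertools
import random
from collections import Counter

INFO_ENV = ["risk", "ambiguity"]
# Placeholder labels: the three treatment levels are not listed in the summary.
PRIVACY_TREATMENT = ["treatment_a", "treatment_b", "treatment_c"]

# Crossing the two factors yields the 2 x 3 = 6 experimental arms.
ARMS = list(itertools.product(INFO_ENV, PRIVACY_TREATMENT))


def assign_arms(n_participants, seed=0):
    """Balanced random assignment (assumed): fill arms evenly, then shuffle."""
    rng = random.Random(seed)
    slots = [ARMS[i % len(ARMS)] for i in range(n_participants)]
    rng.shuffle(slots)
    return slots


assignment = assign_arms(610)
for arm, count in sorted(Counter(assignment).items()):
    print(arm, count)
```

With N = 610 and six arms, an even split cannot be exact, so arm sizes differ by at most one participant under this scheme.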
When leak probabilities are known (risk condition: explicit 30% leak probability), adoption of personalization is about 50% and is not significantly affected by privacy-threatening versus neutral information.
Same randomized experiment (N = 610) with a risk manipulation that explicitly stated a single 30% leak probability. Measured adoption rates showed roughly 50% uptake and no statistically significant difference between privacy-threatening and neutral conditions under risk.
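One common way to check whether two adoption rates differ is a pooled two-proportion z-test, sketched below. The summary does not say which test the paper used, so this is only an illustration of the kind of comparison involved, and the counts are hypothetical rather than taken from the paper.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts, NOT from the paper: adopters out of participants
# in a privacy-threatening arm and a neutral arm.
z, p = two_proportion_z_test(52, 100, 49, 100)
print(f"z = {z:.3f}, p = {p:.3f}")
print("significant at 5%:", p < 0.05)
```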
Many apparent inter-domain differences vanish once measurement uncertainty is accounted for.
Bootstrap confidence intervals and repeated-sample comparisons showing that differences in citation share or prevalence observed in single-run snapshots are often not statistically significant when uncertainty from repeated sampling is included.
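The idea of checking a between-domain difference against resampling uncertainty can be sketched with a percentile bootstrap. This is a minimal illustration under stated assumptions: the per-run shares below are synthetic, the domain names are hypothetical, and the paper's exact bootstrap procedure is not described in the summary.

```python
import random


def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement at its original size.
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic per-run citation shares for two hypothetical domains.
domain_a = [0.31, 0.27, 0.35, 0.29, 0.33, 0.26, 0.34, 0.30]
domain_b = [0.28, 0.32, 0.25, 0.31, 0.29, 0.33, 0.27, 0.30]

lo, hi = bootstrap_diff_ci(domain_a, domain_b)
print(f"95% CI for the difference in mean share: [{lo:.3f}, {hi:.3f}]")
print("interval excludes zero:", not (lo <= 0 <= hi))
```

If the interval contains zero, a gap seen in any single run is not distinguishable from sampling noise, which is the pattern the claim describes.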
Falsifiability condition for intermediation-collapse: If intermediary margins remain stable despite measurable declines in information frictions, the intermediation-collapse mechanism is falsified.
Stated empirical test in the paper that compares measured intermediary markups/margins to proxies for information frictions and AI-driven automation across affected sectors.
Falsifiability condition for Ghost GDP: If monetary velocity does not decline (or instead rises) as the labor share falls, the Ghost GDP channel is unsupported by the data.
Explicit falsification condition provided in the paper based on the model link labor share → velocity → consumption; suggested empirical test using monetary-velocity proxies and labor-share series from FRED.
Empirically, top-quintile households account for roughly 47–65% of U.S. consumption.
Calibration and reported quantitative scenarios in the paper using U.S. consumption concentration data (constructed from U.S. consumption/income micro- and macro-data sources referenced in the methods section).
Economy & Finance threads contained no self-referential content, suggesting agents can engage in market discussion without representing themselves as agents.
Topic-model-derived category labels and tagging for self-referential themes show zero instances of self-reference in posts categorized as Economy & Finance; counts are derived from the 361,605 posts in the dataset.
Because the sample is small and purposive and the design is qualitative, insights are rich but not statistically representative or quantified across the broader research landscape.
Authors' stated study limitations in the paper acknowledging small purposive sample (n=16) and qualitative design.
The study's data come from semi-structured interviews with 16 expert practitioners across biosecurity, cybersecurity, education, and labor.
Study methods reported in the paper: qualitative data source explicitly stated as 16 semi-structured interviews across listed domains.
The workshop identifies specific research directions for AI economics: cost–benefit and ROI analyses of shared infrastructure; market design for procurement of co-designed systems; models of innovation incentives under different IP/data-governance regimes; labor market impact assessments; and empirical studies of how validation ecosystems affect adoption rates and pricing.
Explicitly listed research directions in the workshop summary and roadmap produced by consensus at the NSF workshop (Sept 26–27, 2024).
The workshop's findings are based on qualitative synthesis of expert judgment and stakeholder inputs rather than primary empirical data or controlled experiments.
Explicitly stated in the Data & Methods section of the workshop summary; methods: expert panels, thematic breakout sessions, cross-disciplinary discussions, consensus-building.
The workshop convened researchers, clinicians, and industry leaders to address co-design across four thematic areas: teleoperations/telehealth/surgical operations; wearable and implantable medicine; home ICU/hospital systems/elderly care; and medical sensing/imaging/reconstruction.
Workshop agenda and participant list from the two-day NSF workshop (Sept 26–27, 2024); methods included thematic breakout sessions focused on these four areas. Documentation at https://sites.google.com/view/nsfworkshop.
Empirical work (experiments and measurements) is needed to quantify how much value interpretive traces add to downstream outputs, how RATs affect platform incentives, and what governance frameworks fairly allocate resulting rents.
Concluding recommendation in the paper stating the research gaps; not an empirical claim but a stated need.
The current presentation of RATs is speculative and illustrative; empirical validation, scalability, and ethical safeguards remain to be developed.
Limitations section of the paper explicitly states the speculative nature and lack of empirical evaluation.
Implementation of RATs requires instrumentation at the browser/platform level or via plugins and must address privacy/consent, storage/ownership, sharing controls, and interoperable trace formats.
Design and implementation considerations enumerated in the paper; this is a requirements statement rather than an empirical claim.
Analytical approaches compatible with RATs include sequence/trajectory mining, network analysis of associations/co-read graphs, embedding/clustering of trajectories, qualitative inspection of reflections, and experimental (A/B or RCT) evaluation of downstream effects.
Methods section of the paper listing suggested analytical techniques; these are proposed methods rather than applied analyses.
The paper does not present large-scale empirical validation; its evidence is primarily theoretical exposition, a constructed illustrative example, and a literature survey.
Explicit description of methods and data in the paper (analysis type: theoretical exposition + illustrative example; no experimental sample reported).
Local stochastic fluctuations can undo early discovery leads, preventing transient superiority from becoming permanent unless additional asymmetries intervene.
Dynamical analysis of monopolization stage in the model and simulation trajectories showing reversal or loss of early leads in symmetric interaction regimes; theoretical demonstration that fluctuations can destabilize early footholds.
Transient superiority (finding resources faster) by itself does not stabilize a system-wide monopoly; early leads are fragile and can be undone by local stochastic fluctuations.
Analysis of monopolization dynamics and absorbing-state stability within the stochastic spatial model, plus numerical simulations showing symmetric interaction scenarios do not produce robust absorbing monopolies. This is model-based (no empirical validation).
There is limited empirical causal evidence linking specific explanation types to long-term outcomes (safety, fairness, economic performance) in real-world deployments.
Meta-level finding of the review: authors report gaps in the literature—few causal or longitudinal studies of explanation interventions in deployed, high-stakes settings.
The literature groups explainability impacts along three linked dimensions — user trust, ethical governance, and organizational accountability.
Analytical result of the review's thematic coding and synthesis across interdisciplinary literature (categorization derived from the reviewed corpus).
The paper is primarily theoretical and prescriptive: it synthesizes literature and proposes a framework and design guidelines rather than reporting large-scale empirical datasets or causal identification of economic outcomes.
Meta-claim about the paper's methods explicitly stated in the Data & Methods summary; based on the paper's methodological description.
Key measurable outcomes to assess Human–AI teams include accuracy/efficiency, robustness to novel cases, decision consistency, trust/misuse rates, training costs, and inequity indicators.
Prescriptive list of metrics offered by the authors as part of the research agenda and evaluation guidance; not empirically derived from a dataset in the paper.
Empirical evaluation strategies for Human–AI teams should include randomized interventions, field trials, lab experiments, phased rollouts (difference-in-differences), and structural models that allow interaction terms between human skill and AI quality.
Methodological recommendation in the paper; suggested study designs rather than implemented analyses.
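For the phased-rollout design mentioned above, the simplest difference-in-differences estimate compares the change in a treated group with the change in a control group. The sketch below uses hypothetical outcome values and a hypothetical function name; it assumes a plain 2 × 2 comparison of group means and omits the skill-by-AI-quality interaction terms the authors also recommend.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes (e.g. cases resolved per week), not from any paper.
treated_pre = [10, 12, 11]
treated_post = [15, 17, 16]
control_pre = [9, 11, 10]
control_post = [11, 13, 12]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print("DiD estimate:", effect)  # (16 - 11) - (12 - 10) = 3.0
```

The estimate is only interpretable as a treatment effect if the two groups would have followed parallel trends without the rollout.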
Measuring AI's economic impact requires new metrics that account for decision-value uplift, reduced tail-risk exposures, and dynamic gains from continuous learning; causal identification will require experiments or staggered rollouts.
Methodological recommendation backed by conceptual discussion of measurement challenges; no implementation of such measurement approaches is reported in the paper.
Performance and evaluation should be measured using forecast accuracy, decision lift/value added, latency, and false positive/negative rates.
Paper-prescribed evaluation metrics; presented as recommended practice rather than derived from empirical testing within the paper.
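Two of the listed metrics are straightforward to compute, as the sketch below shows for false positive/negative rates and for forecast accuracy measured as mean absolute error. The function names and all data are hypothetical, and mean absolute error is one assumed choice of accuracy measure; the paper does not specify one.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


def mean_absolute_error(actual, forecast):
    """Average absolute gap between forecasts and realized values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical alert labels: 1 = flagged / truly anomalous, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
fpr, fnr = error_rates(y_true, y_pred)
print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")  # FPR = 0.25, FNR = 0.25

# Hypothetical forecast series.
print("MAE =", mean_absolute_error([100, 110, 120], [98, 113, 119]))  # MAE = 2.0
```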
Core AI techniques for these frameworks include supervised/unsupervised ML, NLP for unstructured text, anomaly detection for control/transaction monitoring, and reinforcement/prescriptive models for recommendations.
Methodological claim listing standard ML/NLP/anomaly-detection techniques and prescriptive approaches; statement of methods rather than an empirical comparison of alternatives.
Next-gen frameworks draw on large-scale structured sources (transactions, ledgers, KPIs) and unstructured sources (reports, news, contracts, call transcripts) to power models.
Descriptive claim listing data types the paper recommends; presented as design input requirements rather than empirically validated data-integration projects.
There is a need for quantitative studies and microdata on firm-level RM practices, AI adoption, and performance outcomes to measure effect sizes and causal pathways.
Stated research gaps and limitations in the review (lack of primary empirical quantification; heterogeneity across contexts).
The review's conclusions are limited by reliance on published literature (potential bias toward successful implementations), lack of primary empirical quantification (no effect sizes), and heterogeneity across organizational contexts limiting direct generalizability.
Explicit limitations stated in the paper summarizing scope and method (qualitative literature review, secondary evidence only).
Heterogeneity in system designs and deployment contexts complicates cross-site comparisons.
Limitations section and observed variation in platform architectures, degrees of automation, and governance across sites reported via descriptive data and interviews.
Non-random selection of institutions limits causal inference and external generalizability of the study's findings.
Study limitations explicitly state non-random site selection and heterogeneous deployments; methodological note that causal claims are constrained.
There is a need for standardized metrics and measurement protocols for public-sector productivity and non-market outcomes (service quality, processing time, cost per transaction, transparency, trust).
Methodological critique within the review pointing to heterogeneity of outcome measures across studies and calling for standardized metrics; based on synthesis of reviewed literature.
Much of the literature on public-sector digital/AI interventions is descriptive or case-based; causal, quantitative evidence on net productivity effects is limited and context-dependent.
Methodological assessment within the review noting heterogeneous study designs, reliance on secondary sources, and a lack of randomized or quasi-experimental studies; the review explicitly states this limitation.
Research and monitoring priorities for economists include task-level analyses of substitutability/complementarity, modeling adoption as a function of regulatory costs and reimbursement incentives, and evaluating long-run welfare and distributional effects.
Explicit research recommendations stated in the narrative review, based on gaps identified in the literature and evolving empirical questions.
Policymakers and payers should consider liability reform, reimbursement models that reward safe human–AI collaboration, funding for independent clinical validation, and measures to prevent market concentration.
Policy recommendations and implications derived from the narrative review's synthesis of regulatory, economic, and implementation challenges.
Research priorities include causal studies on AI’s impacts on SME productivity, employment and inequality in LMICs; cost–benefit analyses of financing and policy interventions; evaluation of data governance models; and development of metrics/monitoring systems for inclusive adoption.
Authors' identification of evidence gaps from the structured literature review highlighting areas with insufficient causal or evaluative research.
Empirical causal evidence on long-run welfare, distributional outcomes, and labor effects of AI in LMIC SMEs remains thin.
Gap identified through the structured review: few causal studies (e.g., RCTs, natural experiments) addressing long-run effects in LMIC SME contexts.
Heterogeneity in SME types and sectors limits the generalizability of findings about AI adoption and impacts.
Authors' methodological limitation noted in the review: the evidence base spans diverse firm sizes, sectors, and contexts, constraining broad generalization.
Theoretical framing integrates Resource-Based View (RBV), Dynamic Capabilities (DC), Technology–Organization–Environment (TOE), and Diffusion of Innovation (DOI) to explain how firm resources, learning capacity, organizational and environmental factors shape AI adoption.
Conceptual synthesis performed as part of the literature review; integration based on existing theoretical literature rather than primary empirical testing.
The systematic review followed PRISMA protocol and analyzed a corpus of 103 items (peer‑reviewed articles and institutional reports) published 2010–2024.
Explicit methodological statement in the paper describing PRISMA use and corpus size/timeframe.
Research gaps remain: quantifying welfare gains from specific AI applications in extraction (productivity, safety, emissions), evaluating cost-effectiveness of policy bundles, and estimating dynamic returns to data ecosystems and human capital.
Identification of gaps from literature and data coverage in the comparative analysis; calls for future empirical and modelling work.
The study is limited by being a single-country case; contextual factors (regulatory regime, infrastructure capacity, procurement practices) may limit generalizability. The study also emphasizes institutional and ethical analysis rather than quantitative measurement of economic impacts.
Explicit limitations reported in the paper summarizing scope and emphasis.
Methods used include qualitative interviews with researchers and administrators, observation/documentation of tool use, mapping of data flows and third‑party dependencies, and normative/legal analysis contrasting local practices with GDPR principles.
Methods section of the paper as reported in the provided summary.
The study's empirical basis is a qualitative case study centered on environmental science research in Chile that adopts the GDPR as an organizing normative framework.
Paper description of study scope and normative framing (methods and focus described in Data & Methods).
There is a need for validated administrative and firm-level data on AI adoption, workplace monitoring, and worker outcomes, and for evaluation of policy interventions (mandated impact assessments, transparency requirements, worker representation rules) using randomized or quasi-experimental designs where feasible.
Research and measurement priorities set out in the commentary based on identified gaps; prescriptive recommendation rather than evidence-based finding.
The paper is a policy and legal commentary/synthesis and not an empirical causal study; it does not provide microdata on employment or wage effects but identifies plausible channels and institutional dynamics.
Author-stated methodology and limitations section describing type of study and data sources; explicitly reports lack of primary empirical data.
The federal U.S. approach to AI governance combines export controls for key AI hardware/software with a relatively permissive domestic regulatory stance that relies on executive guidance, voluntary standards, and sector-specific measures rather than comprehensive federal worker protections.
Comparative policy and legal review of federal-level instruments (export control lists, executive orders, agency guidance, proposed/final rules) described in the commentary; no primary empirical data or sample size.