Evidence (4560 claims)
Adoption
5267 claims
Productivity
4560 claims
Governance
4137 claims
Human-AI Collaboration
3103 claims
Labor Markets
2506 claims
Innovation
2354 claims
Org Design
2340 claims
Skills & Training
1945 claims
Inequality
1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Productivity
Remove filter
The chosen ML technique is gradient boosting regression.
Explicit statement in the methods section that gradient-boosting regression was used for modeling.
Features used in modeling include pesticide/fertilizer use, farm size, crop type, harvest date, and climatic variables.
Listed predictor variables in the paper's modeling/methods section.
The paper is entirely theoretical/analytical and does not report an empirical dataset.
Paper methodology section and abstract state primary tool is an analytical economic model; no empirical data or sample sizes are reported.
The same formal framework can be interpreted as a firm-level model where human skill investment maps onto AI/chatbot investment decisions.
Paper provides an alternative interpretation and formally maps agent skill-investment choices into an analogous firm R&D/AI-capital decision problem within the same mathematical framework.
There is a need for standardized metrics and measurement protocols for public-sector productivity and non-market outcomes (service quality, processing time, cost per transaction, transparency, trust).
Methodological critique within the review pointing to heterogeneity of outcome measures across studies and calling for standardized metrics; based on synthesis of reviewed literature.
Much of the literature on public-sector digital/AI interventions is descriptive or case-based; causal, quantitative evidence on net productivity effects is limited and context-dependent.
Methodological assessment within the review noting heterogeneous study designs, reliance on secondary sources, and a lack of randomized or quasi-experimental studies; the review explicitly states this limitation.
Research and monitoring priorities for economists include task-level analyses of substitutability/complementarity, modeling adoption as a function of regulatory costs and reimbursement incentives, and evaluating long-run welfare and distributional effects.
Explicit research recommendations stated in the narrative review, based on gaps identified in the literature and evolving empirical questions.
Policymakers and payers should consider liability reform, reimbursement models that reward safe human–AI collaboration, funding for independent clinical validation, and measures to prevent market concentration.
Policy recommendations and implications derived from the narrative review's synthesis of regulatory, economic, and implementation challenges.
Research priorities include causal studies on AI’s impacts on SME productivity, employment and inequality in LMICs; cost–benefit analyses of financing and policy interventions; evaluation of data governance models; and development of metrics/monitoring systems for inclusive adoption.
Authors' identification of evidence gaps from the structured literature review highlighting areas with insufficient causal or evaluative research.
Empirical causal evidence on long-run welfare, distributional outcomes, and labor effects of AI in LMIC SMEs remains thin.
Gap identified through the structured review: few causal studies (e.g., RCTs, natural experiments) addressing long-run effects in LMIC SME contexts.
Heterogeneity in SME types and sectors limits the generalizability of findings about AI adoption and impacts.
Authors' methodological limitation noted in the review: the evidence base spans diverse firm sizes, sectors, and contexts, constraining broad generalization.
Theoretical framing integrates Resource-Based View (RBV), Dynamic Capabilities (DC), Technology–Organization–Environment (TOE), and Diffusion of Innovation (DOI) to explain how firm resources, learning capacity, organizational and environmental factors shape AI adoption.
Conceptual synthesis performed as part of the literature review; integration based on existing theoretical literature rather than primary empirical testing.
The systematic review followed PRISMA protocol and analyzed a corpus of 103 items (peer‑reviewed articles and institutional reports) published 2010–2024.
Explicit methodological statement in the paper describing PRISMA use and corpus size/timeframe.
Further longitudinal cost-benefit studies, scalability benchmarks, and cross-domain trials are needed to determine when on-prem RAG is the dominant economic choice.
Paper's research & evaluation recommendations calling for additional longitudinal and cross-domain empirical work; presented as a recommendation rather than an empirical finding.
Human-in-the-loop judgments were central to the paper's relevance/usefulness claims rather than relying solely on synthetic benchmarks.
Methods description explicitly states human evaluation by domain experts was used alongside quantitative benchmarks.
Research gaps remain: quantifying welfare gains from specific AI applications in extraction (productivity, safety, emissions), evaluating cost-effectiveness of policy bundles, and estimating dynamic returns to data ecosystems and human capital.
Identification of gaps from literature and data coverage in the comparative analysis; calls for future empirical and modelling work.
The report has limited primary quantitative impact evaluation and relies on policy texts and secondary sources rather than large-scale empirical measurement of AI’s economic effects.
Explicit limitations section in the report describing methods and data constraints.
Significant empirical gaps remain on long-term impacts (wage trajectories, employment composition, firm-level returns), verification/remediation cost quantification, and public-good risks of insecure code proliferation.
Cross-study synthesis explicitly identifying missing longitudinal and firm-level empirical research in the reviewed literature.
The paper's conclusions are limited by reliance on secondary sources, heterogeneous cross‑study comparisons, limited causal identification of long‑run macro effects, and measurement challenges for AI‑driven intangible capital.
Authors' stated limitations section summarizing the nature of evidence used (qualitative literature review, secondary macro indicators, sectoral examples); this is an explicit self‑reported methodological limitation rather than an external empirical finding.
Methodology used in the paper is a narrative review relying on secondary sources (literature, legal cases, policy reports, empirical perception studies) and conceptual synthesis; no new primary data were collected.
Paper's Data & Methods section explicitly states narrative review and secondary-data analysis.
Important empirical research gaps remain (consumer willingness-to-pay for authenticated vs. synthetic content, labor-displacement elasticities, market concentration dynamics, and cost–benefit evaluations of regulatory options).
Explicit statement of limitations and research needs in the paper, based on the authors' narrative review and absence of primary empirical studies within the paper.
The paper's methodology is a secondary-data, narrative (qualitative) literature review; it contains no original empirical data or primary quantitative analysis.
Explicit methodological statement in the paper describing secondary data analysis and narrative synthesis; absence of primary datasets or statistical analyses.
This paper is conceptual/theoretical and does not conduct primary empirical data collection.
Explicit methodological statement in the paper's Data & Methods section.
More granular firm- and household-level panel data are needed to empirically validate the dissertation's theoretical predictions about nonlinear effects and causal channels.
Author recommendation based on limitations noted in Essay 3 (no primary empirical estimation) and the conditional/simulation-based nature of other essays; this is a methodological claim about future research needs rather than an empirical result.
Further causal, experimental research (randomized deployments) is needed to precisely quantify net productivity and labor reallocation effects of AI agents.
Paper's stated research priorities and explicit acknowledgement of limitations from observational design; no randomized trials reported in the study.
There are measurement challenges for quality-adjusted productivity—errors and downstream effects may reduce net benefits of agent automation and are under-measured in the study.
Authors' noted limitations and concerns about quality-adjusted productivity measurement (error rates, downstream externalities) based on observational deployment experience; no formal measurement of downstream costs reported.
Small-scale, domain-specific deployments of Alfred AI limit external validity to other industries or larger firms.
Deployment context described as small-scale e-commerce; authors note generalizability limitations stemming from domain- and scale-specific nature of the experiments.
Because the study is observational and non-randomized, causal claims about the effect of AI agents on productivity and labor are limited.
Study design explicitly described as applied experimentation and observational deployments (non-randomized); potential confounding and selection biases acknowledged by the authors.
Researchers and firms should measure generation throughput, verification throughput, defect accumulation rates, mean time to detection/fix, costs per incident, and the marginal value of additional verification capacity to evaluate the framework's claims.
Prescriptive measurement priorities listed in the paper as recommendations for empirical validation.
The abstract reports no empirical tests, simulations, or field experiments; empirical validation of the framework is left for future work.
Direct observation of the paper's abstract and methods description indicating lack of empirical validation.
The paper's contribution is primarily conceptual/architectural rather than empirical.
Explicit statement in the paper and absence of reported empirical tests, simulations, or field experiments in the abstract and methods section.
Priority research areas include evaluating long‑run distributional impacts of AI diffusion in agriculture, interactions between digital technologies and labor markets, inclusive financing models for adoption, and macroeconomic effects on food prices and trade.
Stated research agenda and gap analysis in the paper’s conclusions, derived from the review of existing literature and identified gaps.
The current evidence base has gaps: more rigorous impact evaluations, long‑term soil and emissions accounting, and studies on distributional outcomes are needed.
Meta‑assessment within the paper noting limitations of existing literature (many short‑term pilots, limited long‑run soil/emissions data, few studies on who captures value); the claim is based on the review's appraisal of methods used in cited studies.
Economists and policymakers should fund long‑run evaluations (RCTs, quasi‑experimental designs) to estimate causal effects of AI interventions on productivity, welfare, and environmental outcomes.
Evidence‑gap analysis and policy recommendations in the paper; explicit call for rigorous impact evaluation methods given current paucity of long‑run causal evidence.
There are limited long‑run randomized controlled trials (RCTs) on AI/IoT impacts for smallholders and scarce cross‑country data on distributional effects.
Literature review and evidence‑gap identification within the study; explicit statement that long‑run RCTs and cross‑country distributional data are scarce.
Heterogeneous contexts mean impacts vary; careful piloting, monitoring, and adaptive policy are necessary to manage uncertainty in outcomes.
Synthesis and explicit discussion of uncertainties; evidence gaps section noting variable results across regions and interventions.
There are limited standardized measures of 'AI capital,' scarce data on firm-level AI investment and implementation quality, and few long-run causal estimates of AI’s effects on managerial productivity and labor outcomes.
Gap analysis based on literature review and methodological discussion within the book; observation about the state of available empirical evidence.
The paper is primarily conceptual/architectural and does not present large empirical studies quantifying the phenomenon across firms or repositories.
Explicit methodological statement in the paper describing its use of thought experiments, mechanism reasoning, and illustrative examples rather than empirical datasets.
Suggested empirical pathways include lab experiments measuring initiation probability/time-to-start with versus without conversational priming, and field A/B tests in productivity apps measuring task starts and completion conditional on start.
Methodological recommendations in the paper (proposed future empirical work); no data provided.
The paper lacks quantitative validation; effects and magnitudes of the proposed initiation channel are unmeasured.
Methodological statement in the paper noting it is conceptual/theoretical and that it does not report systematic empirical analysis or randomized evaluation.
The paper introduces the 'AI Conversation-Based Action Initiation Barrier Reduction Model' as a theoretical framework explaining how conversational AI reduces initiation frictions.
Descriptive/theoretical presentation in the paper (model specification and conceptual framing). No empirical validation provided.
The paper's conclusions are drawn from a mix of evidence types including literature review, surveys/interviews, case studies, usage-log or publication-metric analyses, and controlled experiments—although the abstract does not specify which of these were actually used or the sample sizes.
Explicitly noted in the Data & Methods summary as the likely underlying evidence types; the paper's abstract itself does not document original data or detailed methods.
There is a lack of large‑scale causal evidence on generative AI’s effects; the paper recommends RCTs, difference‑in‑differences, matched employer–employee panels, and longitudinal studies to fill empirical gaps.
Methodological critique and research agenda provided in the review; observation based on the authors' survey of the literature.
Policy interventions are needed for data protection, bias mitigation, model transparency, accountability, and public investments in workforce retraining to smooth transitions and reduce inequality.
Normative policy recommendations grounded in the review's synthesis of risks and distributional concerns; not an empirical claim but a recommendation.
New productivity metrics are needed to capture AI impacts, including time‑use changes, quality‑adjusted output, and accounting for intangible AI capital.
Methodological recommendation from the conceptual synthesis, motivated by limitations of existing measures discussed in the paper.
Static equilibrium and representative-agent models neglect dynamic reallocation, task re-bundling, and firm-level heterogeneity, limiting their realism for forecasting labour outcomes under AI adoption.
Theoretical critique offered in the paper and referenced critiques in the literature; evidence is conceptual and based on model assumptions identified across studies.
Common empirical strategies (cross-sectional exposure correlations and panel-difference analyses) often lack strong causal identification due to endogeneity of adoption and unobserved confounders.
Surveyed analytical strategies and explicit critique in the paper noting endogeneity and confounding; evidence is methodological critique grounded in the literature's reliance on observational exposure measures.
Researchers construct AI exposure indices at the task level to indicate susceptibility to AI automation or augmentation.
Cited examples (Felten et al., 2023; Eloundou et al., 2023) that develop task-level scores; evidence basis is methodological papers that publish indices and mapping procedures (often using O*NET tasks, expert labeling, or model-based scoring).
Commonly used data sources for measuring AI exposure include job postings and descriptions, occupational task databases (O*NET-style), employer/household surveys, administrative payroll data, and firm-level productivity measures.
List of data sources compiled in the paper; evidence is a methodological summary of datasets used across the cited literature rather than novel data collection.
Many studies rely on static assumptions (fixed comparative advantage, no adaptation) and theoretical models, which limits causal inference and makes projections model-dependent.
Methodological critique cited in the paper (e.g., critique of Acemoglu & Restrepo, 2022; Webb, 2020) and the paper's survey of common modeling choices (static equilibrium or representative-agent models); evidence basis is theoretical critique and literature review rather than new causal estimates.