Evidence (1902 claims)
Adoption (5126 claims)
Productivity (4409 claims)
Governance (4049 claims)
Human-AI Collaboration (2954 claims)
Labor Markets (2432 claims)
Org Design (2273 claims)
Innovation (2215 claims)
Skills & Training (1902 claims)
Inequality (1286 claims)
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 369 | 105 | 58 | 432 | 972 |
| Governance & Regulation | 365 | 171 | 113 | 54 | 713 |
| Research Productivity | 229 | 95 | 33 | 294 | 655 |
| Organizational Efficiency | 354 | 82 | 58 | 34 | 531 |
| Technology Adoption Rate | 277 | 115 | 63 | 27 | 486 |
| Firm Productivity | 273 | 33 | 68 | 10 | 389 |
| AI Safety & Ethics | 112 | 177 | 43 | 24 | 358 |
| Output Quality | 228 | 61 | 23 | 25 | 337 |
| Market Structure | 105 | 118 | 81 | 14 | 323 |
| Decision Quality | 154 | 68 | 33 | 17 | 275 |
| Employment Level | 68 | 32 | 74 | 8 | 184 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 85 | 31 | 38 | 9 | 163 |
| Firm Revenue | 96 | 30 | 22 | — | 148 |
| Innovation Output | 100 | 11 | 20 | 11 | 143 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 51 | 61 | 13 | 3 | 128 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 64 | 6 | 28 | 6 | 104 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 10 | 16 | 93 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 71 | 5 | 3 | 1 | 80 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Team Performance | 41 | 8 | 15 | 7 | 72 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 17 | 15 | 9 | 5 | 46 |
| Job Displacement | 5 | 28 | 12 | — | 45 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Developer Productivity | 25 | 1 | 2 | 1 | 29 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 7 | 4 | 9 | — | 20 |
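As a reading aid, the matrix rows can be turned into direction shares. The following is a minimal sketch, not part of the source tooling: the function name `summarize` and the choice of rows are illustrative, and treating a "—" cell as zero is an assumption. In some rows the four direction columns sum to less than Total (for example, Firm Productivity lists 384 claims against a Total of 389); the source does not explain the gap, so the sketch reports it rather than correcting it.

```python
# Rows copied from the Evidence Matrix above:
# (Positive, Negative, Mixed, Null, Total). None stands for a "—" cell.
ROWS = {
    "Firm Productivity": (273, 33, 68, 10, 389),
    "Task Completion Time": (71, 5, 3, 1, 80),
    "Job Displacement": (5, 28, 12, None, 45),
    "Skill Obsolescence": (3, 18, 2, None, 23),
}


def summarize(row):
    """Return direction shares and the count not covered by the four columns."""
    *directions, total = row
    # Assumption: a "—" cell means no claims were recorded in that direction.
    counts = [c or 0 for c in directions]
    listed = sum(counts)
    return {
        "positive_share": counts[0] / listed,
        "negative_share": counts[1] / listed,
        "unlisted": total - listed,  # claims in Total but in no listed direction
    }


for outcome, row in ROWS.items():
    s = summarize(row)
    print(f"{outcome}: positive {s['positive_share']:.0%}, "
          f"negative {s['negative_share']:.0%}, unlisted {s['unlisted']}")
```

Shares are computed against the sum of the listed directions rather than Total, so they stay interpretable for rows where the two differ.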
Skills Training
Labor demand effects are ambiguous: junior/entry-level demand may be reduced for some tasks while demand for verification and higher-skill roles may rise.
Economic reasoning, early observational signals, and theoretical task-reallocation frameworks; empirical longitudinal evidence is limited or absent.
The effectiveness of generative AI depends critically on human-AI workflows: prompt design, iterative refinement, and human vetting materially affect outcomes.
Qualitative analyses of interaction patterns and experiments manipulating prompting/iteration showing variation in outcomes; many studies report improved outputs after iterative prompting and human-in-the-loop refinement.
Persistent declines in self-efficacy after passive AI exposure suggest potential for skill atrophy and slower reversion when tasks must be performed without AI.
Inference from observed persistent reductions in self-efficacy post-return in the experiment; skill atrophy and reversion costs were not directly measured, so this is an implied consequence.
Firms that adopt passive, copy-based AI workflows risk psychological costs that could offset short-run productivity gains from AI.
Inference drawn from experimental findings of reduced efficacy/ownership/meaningfulness under passive use and short-term enjoyment gains; not directly tested for firm-level productivity or turnover—extrapolation from individual-level psychological measures.
Emergent quality hierarchies among agents imply winner-take-most dynamics in informational value and potential market concentration in agent quality.
Observed formation of quality hierarchies in agent interactions and documented economic interpretation; this is a hypothesis/implication drawn from qualitative patterns rather than measured market outcomes.
Large-scale battlegrounds and competitions increase compute demand and associated costs, with implications for budgets and environmental externalities.
Paper notes that the Battling Track dataset (20M+ trajectories), model training for baselines/competitions, and running a living benchmark imply substantial compute; this is an argued implication rather than measured environmental impact.
Unclear liability frameworks increase perceived and real costs and can slow adoption by hospitals and insurers.
Policy analyses and procurement narratives noting liability uncertainty cited as a barrier to procurement and deployment.
Up-front implementation costs commonly include procurement, integration with PACS/EMR, UI/UX development, regulatory compliance, and staff training; recurring costs include monitoring, data labeling, software updates, and cybersecurity.
Implementation reports, vendor and hospital accounts, and qualitative studies documenting cost categories (specific dollar amounts vary across settings and are rarely published in detail).
Uneven organizational supports can concentrate returns to AI in firms and workers that successfully actualize affordances, potentially widening wage and employment disparities; targeted policy and training investments can mitigate these effects.
Theoretical implication from the framework with policy recommendations; no empirical testing or sample reported in the paper.
These trends (job polarization and differential wage/mobility outcomes) may exacerbate economic disparities across regions.
Interpretation and projection based on the observed trends in the reviewed literature and reports; presented as a risk/implication rather than an empirically tested causal finding in the summary.
Without continuous support for upskilling/reskilling and inclusive policies, AI risks becoming a source of exclusion rather than an enabler of human advancement.
Normative conclusion derived from reviewed literature and thematic interpretation in the qualitative study (literature-based; evidence is secondary and not quantified).
Research literature synthesis has an estimated 70-75% automation potential.
Quantitative estimate offered by the authors (70-75%) as part of function-by-function analysis; no described empirical evaluation or sample supporting the figure.
Knowledge transmission (teaching/lecturing) shows 75-80% AI substitutability.
Authors' quantitative estimate presented in the analysis (75-80%); the paper does not detail empirical methods or validation samples for this percentage.
Administrative tasks face 75-80% disruption risk from AI.
Paper provides a quantitative estimate (75-80%) as part of its functional disruption assessment; no empirical methodology, dataset, or sample size is described to support the numeric range.
The remaining difference (roughly 70%) is not explained by the factors observed in the data, indicating additional influences not captured in the survey.
Residual (unexplained) component from decomposition analyses on ESJS data.
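The summary does not state which decomposition the authors used. As an illustration only, assuming a standard two-fold Oaxaca-Blinder decomposition of the mean outcome gap between two groups (the group labels $A$ and $B$ and the choice of $B$ as the reference coefficients are hypothetical), the explained and unexplained parts would be:

```latex
\bar{Y}_A - \bar{Y}_B
  = \underbrace{(\bar{X}_A - \bar{X}_B)^{\top}\hat{\beta}_B}_{\text{explained by observed factors}}
  + \underbrace{\bar{X}_A^{\top}(\hat{\beta}_A - \hat{\beta}_B)}_{\text{unexplained (residual)}}
```

Under that reading, the "roughly 70%" figure is the second term as a share of the total gap $\bar{Y}_A - \bar{Y}_B$.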
Policy-relevant implication (extrapolated): identity heterogeneity implies family- and purpose-driven entrepreneurs may be less likely to pursue AI-enabled innovation after income shocks, suggesting targeted outreach and low-risk entry paths to avoid widening digital divides.
Extrapolation from documented identity-heterogeneous declines in innovation after income shocks (empirical result) to probable patterns in AI adoption; AI adoption is not directly measured in the paper's dataset.
Differential access to higher-quality (paid) versus free GenAI tools and differing ability to engage with the tool could widen inequality among students and institutions.
Authors' implication based on student-reported concerns about limitations of free ChatGPT versions and on heterogeneous gains across disciplines; this is a policy/implication claim not directly measured in the experiment.
Heterogeneous trust levels across firms and schools may produce uneven productivity gains and widen performance gaps.
Logical implication and policy discussion in the paper; the cross-sectional study documents relationships between trust and outcomes but does not provide aggregate diffusion or cross-firm longitudinal evidence to confirm unequal sectoral diffusion.
Overreliance on unvetted AI can propagate biases; economic gains from AI therefore require governance, auditing, and accountability mechanisms.
Framed as a risk and policy recommendation in the discussion; not an empirical finding from the cross-sectional survey reported in the summary.
If FDI brings capital‑intensive, AI‑enabled production without complementary upskilling, it may exacerbate wage inequality and deepen labor market dualism in SSA.
Theoretical inference and analogy from documented patterns of skill‑biased technological change and FDI-driven inequality in the reviewed literature; empirical evidence specific to AI in SSA is lacking in the review.
Centralized provision of high-quality coding models by a few vendors could produce vendor lock-in and increase platform power in software development inputs.
Market-structure analysis and industry observations synthesized in the paper; the claim is forward-looking and not established by longitudinal market data within the review.
If many firms adopt AI generation without matching verification, aggregate fragility in software-dependent infrastructure could rise, increasing downtime costs and systemic economic risk.
Macro-level risk projection and system fragility argument in the paper; no macroeconomic modeling or empirical scenario analysis provided.
Imported AI systems may impose foreign values and norms, risking erosion of indigenous knowledge and social cohesion.
Normative and conceptual argument supported by cited case studies and policy analyses; no original anthropological or sociological fieldwork in the paper.
Deployed AI systems can produce algorithmic bias that harms marginalized groups when models are trained on skewed or non‑representative data.
Synthesis of prior empirical findings and case studies on algorithmic bias and fairness in ML systems; paper does not present new empirical tests.
There are research opportunities to measure returns to 'teaching' (causal impact of configuring agents on human skill accumulation and earnings) and to model agent-platform ecosystems with network effects, spillovers, and endogenous quality hierarchies.
Author-stated research agenda and proposed empirical questions derived from the observed phenomena; not empirical results but recommended directions.
Empirical economics research should use firm-level and pipeline microdata and quasi-experimental designs to estimate causal effects of AI adoption on outcomes like time-to-hit, preclinical attrition, IND filings, and NME approvals per R&D dollar.
Research recommendation offered in the paper based on identified gaps; not an evidence claim but an explicit methodological suggestion.
The study recommends iterative prompt refinement, integration with adaptive learning models, and further exploration of autonomous self-prompting mechanisms.
Concluding recommendations derived from the study's results and interpretation; presented as future directions rather than empirically tested interventions within this study.
Future research should explore sector-specific AI adoption challenges and long-term workforce adaptation strategies.
Author recommendation presented in the paper's discussion/future work section of the summary.
Recommended future research includes scalable interoperability solutions, longitudinal lifecycle value validation, human‑centred adoption strategies, and sustainability assessment methods.
Authors' explicit recommendations at the end of the review based on identified gaps in the literature.
Researchers should combine qualitative studies with administrative/matched employer–employee data and experimental/quasi-experimental designs (pilot rollouts, staggered adoption) to identify causal effects of AI on tasks, productivity, and wages.
Methodological recommendation by authors based on limitations of their qualitative study (15 UX designers) and the need to quantify observed phenomena; not an empirical claim tested in the paper.
Future research priorities include obtaining causal estimates (e.g., field experiments) of productivity gains from trust-mediated AI adoption and conducting cost–benefit analyses of trust-building interventions.
Study’s stated research agenda/recommendations; not an empirical claim but a recommended direction for follow-up research.
Key research priorities include improving measurement of AI usage across countries, causal identification of long-run effects, and evaluation of sectoral reskilling strategies.
Identified gaps and methodological limitations in the reviewed empirical literature (measurement heterogeneity, limited long-run panels, sectoral variation) motivating suggested future research agenda.
To measure and monitor these effects, researchers should track firm-level adoption of AI features, fulfillment automation intensity, platform-mediated market entry, and task-level labor shifts.
Author recommendations based on gaps identified in the case-based and multi-modal empirical work and the sensitivity of results to adoption measures; not an empirical finding but a methodological claim.
Policy priorities should differ by national Skill Imbalance: countries with strong demand for new skills should prioritize education and reskilling, while countries with strong supply should prioritize firm absorption (innovation, financing, technology adoption).
Interpretation of cross-country Skill Imbalance Index and its implications; prescriptive recommendation based on the observed demand–supply patterns rather than causal testing of policies.
Economic evaluations of AI adoption should include psychological and human-capital externalities (effects on self-efficacy, skill depreciation, job satisfaction) to fully account for welfare and productivity dynamics.
Argument grounded in experimental and survey findings showing psychological impacts of AI-use mode; general recommendation for research and evaluation rather than an empirical finding.
The benchmark provides a testbed useful for studying strategic behavior, coordination failures, and market-like interactions among agents, which can inform economic research and policy.
Paper claims the benchmark's multi-agent, strategic tasks can be used as experimental environments for economic and policy research; this is a normative claim supported by the benchmark's design rather than by empirical studies in the paper.
Open-source orchestration lowers entry barriers, broadening participation and potentially compressing rents that would otherwise accrue to well-resourced incumbents.
Paper's discussion section argues that releasing orchestration and evaluation tools publicly reduces the technical overhead for entrants; this is a theoretical/observational claim rather than empirically measured in the paper.
The clear performance gaps indicate high returns to specialized efforts (RL, domain-specific engineering) relative to generalist LLM-only approaches, shaping where teams invest labor and compute.
Paper links benchmarking results (performance gaps between baselines and humans) to economic implications, arguing specialization yields higher returns; this is an interpretive claim based on reported performance differentials.
Benchmarks like PokeAgent will reallocate researcher and industry attention toward multi-agent, partial-observability, and long-horizon planning problems—likely increasing funding and compute investment in RL and hybrid LLM+RL methods.
Paper offers an analysis of economic implications, arguing that introducing such a benchmark changes incentives and investment patterns; this is a reasoned projection rather than an empirical observation.
Embedding LLM coaching tools in platforms (employee onboarding, customer support, peer-support communities) could raise overall conversational quality by improving expressive outcomes rather than only informational accuracy.
Authors' implication drawn from trial results showing improved alignment to empathic norms after personalized coaching; no field deployment evidence provided in the paper.
LLM-driven personalized coaching can cheaply scale soft-skill training (empathy expression) that would otherwise require costly human trainers, suggesting a high-return application of AI in workforce development.
Implication drawn from observed efficacy of brief automated coaching in the trial and the scalable nature of LLM deployment; no direct economic field trial provided in the paper.
Labor market programs should strengthen career counseling and job-matching services, and should consider wage subsidies or transitional support to help workers re-enter labor markets during retraining.
Study's programmatic recommendations based on observed skill mismatches and distributional risks; recommendation is not backed by direct program evaluation within the paper.
Policy should prioritize investments in digital education, foundational data skills, targeted upskilling and retraining, and flexible, modular lifelong learning pathways to reduce inequality from AI-driven changes.
Policy recommendations derived from empirical patterns (occupational vulnerability, skill-demand shifts) and qualitative case studies in the study; these are prescriptive implications rather than tested interventions. No experimental or evaluation evidence is presented for these policies in the Albanian context.
Fee-for-service payment structures may not reward efficiency gains from AI; value-based payment or shared-savings models are better aligned to incentivize adoption that reduces total cost and improves outcomes.
Health policy and reimbursement literature synthesizing incentives under different payment models; limited empirical testing of reimbursement models for AI-assisted services.
Effective human–AI collaboration will shift task content toward complementary activities (supervision, interpretation, creative/problem-solving), increasing demand for these complementary skills and potentially raising skill premia for workers who actualize AI affordances.
Theoretical prediction grounded in complementarity arguments and affordance actualization; no empirical sample or quantification provided.
Productivity gains from AI depend not only on the technology's capabilities but on organizational adaptation and successful affordance actualization; therefore investments in supportive strategy and mentoring can increase the fraction of potential AI productivity realized.
Theoretical implication derived from integrating AST and AAT literatures; recommended for empirical testing but not empirically demonstrated in the paper.
Strategic innovation backing (organizational investments, resource allocation, governance, and incentives) enables experimentation and scaling of human–AI work and thereby increases realized returns to AI investments.
Theoretical proposition based on literature integration and normative argument; no empirical sample or original data presented.
Policy interventions that promote transparency, standardized feedback channels, auditability, and training for oversight roles can improve trust calibration and economic returns to AI investments.
Policy recommendation based on synthesis of interview findings (N=40) regarding enablers of trust calibration and theoretical extension to expected economic impacts; this is a prescriptive inference rather than an empirically tested policy outcome in the study.
The digital transformation of vocational education is economically necessary in the Industry 4.0 era and can provide empirical support for policies to alleviate labor market polarization in Korea and similar East Asian economies.
Policy conclusion drawn from the empirical findings (wage premiums for specialized digital skills and heterogeneous returns across firm types and educational pathways) based on KLIPS-based extended Mincerian wage analyses.
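The summary does not give the paper's exact specification. For orientation, a textbook Mincer wage equation extended with a digital-skill indicator could look like the sketch below, where $S_i$ is years of schooling, $E_i$ is potential experience, and $D_i$ is a hypothetical dummy for holding a specialized digital skill (the variable names and the single-dummy form are assumptions, not the authors' model):

```latex
\ln w_i = \beta_0 + \beta_1 S_i + \beta_2 E_i + \beta_3 E_i^{2} + \delta D_i + \varepsilon_i
```

In a specification of this form, the wage premium associated with the digital skill is approximately $100\,(e^{\delta} - 1)$ percent, holding schooling and experience fixed.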
Organizations can leverage these insights to design training programs, selection criteria, and AI systems that prioritize emergent team performance over standalone capabilities, marking a shift toward optimizing collective intelligence in human-AI teams.
Practical implication drawn from empirical findings (synergy effects, distinct collaborative ability, role of Theory of Mind) reported in the paper; recommendation rather than direct empirical test.