Evidence (7953 claims)
- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding. Row totals can exceed the sum of the four listed directions where claims report a direction outside these categories.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
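A matrix like the one above can be reproduced from a flat list of claim records with a simple tally. The sketch below assumes a hypothetical record schema (`outcome` and `direction` fields), not the database's actual format.

```python
from collections import Counter

# Hypothetical flat claim records; field names are illustrative,
# not the underlying database's actual schema.
claims = [
    {"outcome": "Firm Productivity", "direction": "Positive"},
    {"outcome": "Firm Productivity", "direction": "Mixed"},
    {"outcome": "Error Rate", "direction": "Negative"},
    {"outcome": "Error Rate", "direction": "Negative"},
]

# Tally claims by (outcome, direction), then lay the tallies out
# as one matrix row per outcome category, with a row total appended.
counts = Counter((c["outcome"], c["direction"]) for c in claims)
directions = ["Positive", "Negative", "Mixed", "Null"]

rows = {}
for outcome in {c["outcome"] for c in claims}:
    row = [counts.get((outcome, d), 0) for d in directions]
    rows[outcome] = row + [sum(row)]

print(rows["Error Rate"])  # → [0, 2, 0, 0, 2]
```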
In the sentiment-analysis task, those individual differences do not produce human–AI complementarity: the joint performance of humans and AI did not exceed that of either alone.
Empirical finding reported from the preregistered sentiment-analysis experiment showing no complementarity effect (joint human-AI performance ≤ best individual performance). (Statistical tests and sample size not included in the excerpt.)
We conducted a systematic review and meta-analysis of the literature on AI/HR analytics and organizational decision making, covering 85 publications and grounding the work in Adaptive Structuration Theory (AST) and Socio-Technical Systems (STS) theory.
Paper's methods: systematic review and meta-analysis; sample = 85 publications; theoretical framing explicitly stated as AST and STS.
Claims that Agentic AI produces macroeconomic fiscal moderation remain empirically unvalidated.
Synthesis conclusion from the review noting an absence of empirical evidence that Agentic AI produces macroeconomic fiscal moderation; i.e., no validated studies showing broad fiscal relief effects were identified in the reviewed literature.
By 2024 the RL-FRB/US model produced a federal budget deficit similar to the baseline: -$1,767 billion under RL-FRB/US vs. -$1,758 billion under FRB/US.
Reported fiscal balance (federal budget deficit) simulation outputs for 2024 from comparative model runs in the paper.
No significant differences emerged in the job titles and industries suggested by GPT-5 across genders.
Empirical finding from analysis of GPT-5 outputs comparing suggested job titles and industries for the 24 profiles; exact statistical tests not specified in the summary.
Self-generated (model-authored) Skills provide no average benefit.
Comparison of three evaluation conditions (no Skills, curated Skills, self-authored Skills) across SkillsBench. Averaged pass-rate deltas show that model-authored Skills do not increase average pass rate relative to baseline; analysis used 7,308 trajectories over 86 tasks and 7 agent–model configurations.
AI will not cause permanent mass unemployment at the aggregate level.
Analytical argument and literature synthesis using labor-economics theory (Skill-Biased Technological Change and structural transformation). No primary microdata, no stated empirical identification strategy or sample size in the paper (methodology appears to be theoretical and sectoral synthesis).
Empirical evaluation is needed on how AI-induced productivity gains translate into aggregate demand and labor absorption.
Identified research priority in the paper, based on theoretical uncertainty about demand-side labor absorption and lack of conclusive empirical evidence.
AI will not mechanically cause permanent mass unemployment at the aggregate level.
Theoretical framing and synthesis of existing empirical findings across task-based and macro studies; no single new dataset provided (paper draws on literature and conceptual models).
Occupation-level analyses (e.g., BLS OEWS cross-occupation wage regressions) risk misleading conclusions about AI’s distributional effects because they aggregate over the task- and firm-level heterogeneity that drives the mechanism.
Theoretical argument and empirical illustration in the paper showing how aggregation masks within-task compression and firm-level rent capture; example regressions on OEWS used to demonstrate the limitation.
Testing the model requires within-occupation, within-task panel data on task-level performance and wages linked to firm-level AI adoption, ownership of complementary assets, and measures of rent-sharing; such data are not available at scale.
Author statement about data requirements and current data limitations; empirical illustration and discussion note absence of large-scale linked microdata meeting these criteria.
Occupation-level regressions using BLS OEWS (2019–2023) are insufficient for testing the model’s task-level predictions because aggregation across tasks and firms hides the mechanism.
Empirical illustration in the paper using occupation-level regressions on BLS OEWS 2019–2023 showing that such aggregates do not reveal within-occupation, within-task dispersion or firm-level rent concentration effects; paper argues this is a data-adequacy limitation.
A sensitivity decomposition shows five of the moments (the non‑ΔGini moments) identify internal mechanism rates (how AI changes task production, education responses, screening intensity) but do not determine the aggregate sign of inequality change.
Local identification / sensitivity decomposition performed on the calibrated model; decomposition results reported in the paper attribute mechanism-rate identification to five moments and show they leave the sign of ΔGini indeterminate.
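The logic of such a decomposition can be illustrated on a toy calibrated model: a finite-difference sensitivity matrix (derivatives of moments with respect to parameters) shows which moments load on which mechanism rates, and why matching them can still leave an aggregate sign undetermined. The model, parameter names, and moments below are invented for illustration, not taken from the paper.

```python
import numpy as np

# Toy calibrated model: two parameters (a "mechanism rate" r and a
# counteracting force g) and three moments. All names are illustrative;
# the paper's actual model and moments are not reproduced here.
def moments(theta):
    r, g = theta
    m_mechanism = r          # moment pinned down by r alone
    m_education = 2.0 * g    # moment pinned down by g alone
    delta_gini = r - g       # aggregate: its sign depends on r vs g
    return np.array([m_mechanism, m_education, delta_gini])

# Finite-difference sensitivity matrix S[i, j] = d m_i / d theta_j,
# the same object a local-identification decomposition inspects.
def sensitivity(theta, h=1e-6):
    base = moments(theta)
    cols = []
    for j in range(len(theta)):
        bumped = np.array(theta, dtype=float)
        bumped[j] += h
        cols.append((moments(bumped) - base) / h)
    return np.column_stack(cols)

S = sensitivity([0.5, 0.4])
# Each mechanism moment loads on a single parameter, so those moments
# identify the rates; but matching them leaves r - g, and hence the
# sign of the aggregate change, free to flip.
print(np.round(S, 3))
```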
The paper introduces a novel taxonomy that separates patenting into three domains: core AI, traditional robotics, and AI-enhanced robotics.
Methodological contribution of the paper: construction and application of a classification scheme that assigns patent filings (1980–2019) into three domains (core AI, traditional robotics, AI-enhanced robotics). Data source: patent filings 1980–2019 (aggregate counts by domain and country). Exact number of patents not provided in the summary.
The proposed uncertainty measure connects to classical value-of-information concepts, bridging security mechanism analysis and economic theories of information, signaling, and screening.
Analytical comparison and discussion in the paper linking the entropy-style residual uncertainty metric to value-of-information literature (theoretical linkage).
AI did not significantly moderate the relationship between workplace stress and job performance.
Moderation test in PLS-SEM (SmartPLS 4.0) on N = 350; reported non-significant AI × Stress → Performance moderator (paper reports no significant moderating effect).
Use of AI raises the need for traceability, explainability, and continuous validation to maintain compliance and avoid error propagation in curricular decisions.
Paper's AI governance recommendations (prescriptive), referencing general AI risk principles rather than empirical study.
There is no accepted integrative digital model that maps measured or perceived value to algorithmic pricing.
Absence of such a model in the SLR sample of 30 articles and thematic coding that identified this gap explicitly.
There is no evidence of nonlinearities in the relationship between digital trade and urban house prices (the effect is linear across the sample).
Explicit tests for nonlinearity reported in the econometric analysis (details of test specification not provided in the summary).
When green-technology innovation is low (below the threshold), the main measurable effect of DE is on improving carbon emission efficiency (CEE), but DE does not yet reduce per capita emissions (PCE).
Results from the threshold-regression models on the 278-city panel (2011–2022) show that in the low-green-innovation regime DE coefficients are significant for CEE but not for PCE; mediating-effect models corroborate the efficiency channel in low-innovation contexts.
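A minimal sketch of the regime-split logic behind such threshold regressions, on fully simulated data (the 278-city panel is not reproduced): the DE coefficient is estimated separately below and above the green-innovation threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations: de = digital-economy index, green =
# green-technology innovation (the threshold variable), cee =
# carbon-emission efficiency. All values are simulated.
n = 400
de = rng.uniform(0, 1, n)
green = rng.uniform(0, 1, n)
threshold = 0.5
# By construction, DE raises CEE only weakly below the threshold
# and strongly above it.
beta = np.where(green < threshold, 0.2, 0.8)
cee = 1.0 + beta * de + rng.normal(0, 0.05, n)

# Estimate the DE slope separately in each regime, as a
# threshold regression effectively does.
def ols_slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

low = green < threshold
slope_low = ols_slope(de[low], cee[low])
slope_high = ols_slope(de[~low], cee[~low])
print(round(slope_low, 2), round(slope_high, 2))
```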
Realising DT value requires upfront investment in sensors, integration, standards, and skills; economic viability depends on contract structures and how gains are allocated between investors, owners, contractors, and operators.
Synthesis of cost/benefit discussions and case descriptions in the reviewed literature; policy and procurement examples referenced.
HCI has explored usable consent, but there is no systematic framework for consent in the AI era.
Literature synthesis and gap identification from workshop participants and solicited position papers; no systematic review or meta-analysis with counted studies reported in the summary.
Privacy-leak framing (risk vs ambiguity or privacy-threatening vs neutral) did not change participants' subsequent bargaining behavior with pricing algorithms.
The experiment measured downstream bargaining behavior with algorithms after the adoption/label tasks (N = 610) and reports no detectable effect of the privacy/leak framing on those bargaining outcomes.
Under truthful bidding, the decentralised price-based market matches a centralised value-optimal benchmark (i.e., decentralised allocation equals centralised value-optimal allocation).
Paper presents both a theoretical argument (mechanism properties under quasilinear utilities and discrete slices) and empirical validation in simulation by comparing decentralised outcomes to a centralised value-optimal baseline across configurations in the ablation study.
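A minimal single-item analogue of this equivalence, assuming a sealed-bid second-price auction rather than the paper's multi-slice market: under truthful bidding, the decentralised winner coincides with the centralised value-optimal assignment.

```python
# Toy bidder values; illustrative only, not the paper's setup.
values = {"a": 3.0, "b": 7.0, "c": 5.0}

# Decentralised outcome: truthful bidding is a dominant strategy in a
# second-price auction, so bids equal values; the highest bid wins
# and pays the second-highest bid.
bids = dict(values)
winner = max(bids, key=bids.get)
price = sorted(bids.values())[-2]

# Centralised value-optimal benchmark: assign the item to the
# highest-value bidder directly.
optimal = max(values, key=values.get)

assert winner == optimal  # decentralised allocation = centralised optimum
print(winner, price)  # → b 5.0
```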
No clear evidence that project phase systematically shifts sentiment perception.
Project-phase indicators were collected each round and included in correlation and repeated-measures analyses; no consistent, systematic association between project phase and sentiment labeling was found.
Predictors of negative labeling are weak and at best trend-level (e.g., task conflict shows only weak/trend-level association with negative labels).
Correlation analyses and GEE models testing multiple predictors (mood states, life circumstances, team dynamics including task conflict) on negative vs other labels; effects for negative labeling were small and lacked robustness.
Experiments used realistic channel and beamforming datasets reflecting varying elevation angles and dynamic LEO link conditions.
Dataset description in the paper states use of realistic channel and beamforming data including varying elevation angles and dynamic links; no dataset size or public dataset identifiers provided in the summary.
There is a need for causal studies (randomized pilots, phased rollouts) to quantify net welfare effects including patient trust, equity, legal risk, and long-run labor impacts.
Authors' recommendation based on gaps identified in the mixed-methods evidence and acknowledged limitations around causal identification and long-term measurement.
Under the current estimated parameters, dynamics converge toward equilibria—implying convergent, policy-mediated adjustment rather than endogenous cyclical instability.
Inference from stability classification (stable-node equilibria) and model dynamics simulated or linearized around equilibria using 2016–2023–estimated parameters.
Equilibrium points of the estimated three-stock system are classified as stable nodes (no persistent endogenous cycles under the estimated parameters).
Stability analysis: equilibria computed from estimated parameters and local stability assessed via Jacobian eigenvalues; eigenvalues indicate stable nodes.
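The eigenvalue check itself is mechanical; the sketch below uses an invented three-stock Jacobian, not the paper's estimated one, to show the stable-node criterion (all eigenvalues real and negative).

```python
import numpy as np

# Illustrative Jacobian of a three-stock system linearised at an
# equilibrium; the entries are made up, not estimated values.
J = np.array([
    [-0.50,  0.10,  0.00],
    [ 0.20, -0.30,  0.05],
    [ 0.00,  0.10, -0.40],
])

eigs = np.linalg.eigvals(J)

# A stable node requires all eigenvalues real and negative. Complex
# eigenvalues with negative real parts would instead indicate damped
# oscillation (a stable focus); any positive real part, instability.
is_stable_node = bool(np.all(np.isreal(eigs)) and np.all(eigs.real < 0))
print(is_stable_node)  # → True
```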
Results are robust across alternative AI index specifications, occupational classifications, and standard controls (country and year fixed effects, macroeconomic covariates).
Paper reports robustness checks across different index constructions and occupational taxonomies, with standard controls included in regressions.
Liability for harm from AI remains unresolved; current regulatory frameworks (notably in the EU) continue to emphasize human responsibility and require conformity and clinical validation.
Regulatory and legal analyses, with emphasis on European Union device regulation and liability principles, as reviewed in the paper.
On-Premise RAG matches commercial (cloud) RAG on standard quantitative retrieval and generation metrics.
Empirical comparative analysis using standard retrieval/generation benchmarks comparing three systems (zero-shot baseline, GPT RAG cloud, Open-source On-Prem RAG) under representative SME workloads; specific metric names and sample sizes not reported in the summary.
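For concreteness, one standard retrieval metric of the kind such comparisons report is recall@k; the documents, rankings, and relevance labels below are invented.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top-k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

# Per-query rankings from two hypothetical systems.
relevant = ["d1", "d4"]
onprem_ranking = ["d1", "d7", "d4", "d2"]
cloud_ranking = ["d4", "d1", "d9", "d3"]

print(recall_at_k(onprem_ranking, relevant, 3),
      recall_at_k(cloud_ranking, relevant, 3))  # → 1.0 1.0
```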
State-level advances in worker-protective AI measures exist but are uneven and many proposed state bills aimed at strengthening workers’ rights related to AI have stalled.
Review of state legislative proposals and enacted laws as compiled in the commentary (state-level policy scan); no systematic quantitative legislative count or sample reported.
Domain adaptation techniques (transfer learning, fine-tuning on local data) are underutilized in low-resource African contexts despite their potential to improve generalization to local populations and care processes.
Thematic coding of methodological sections across the reviewed literature showed relatively few studies employing transfer learning or local fine-tuning approaches in African or other low-resource settings; evidence comes from counts/qualitative summaries within the literature review rather than a formal meta-analysis.
Research priorities include causal studies on productivity gains from AI, firm‑level adoption dynamics, sectoral labor reallocation, long‑run general equilibrium effects, and heterogeneous impacts across regions and demographic groups.
Set of empirical research recommendations drawn from gaps identified in the literature review and limitations section; not an empirical claim but a prioritized research agenda based on secondary evidence.
Growth‑accounting frameworks and measurement approaches must be updated to capture AI/robotics as intangible and embodied capital, including quality improvements and spillovers.
Methodological argument grounded in literature on measurement challenges and examples of intangible capital; no new measurement exercise or empirical re‑estimation is provided in the paper.
Backtesting the proposed models against historical technological transitions (e.g., ATMs, robotics) and recent AI adoption episodes can validate model performance.
Recommended validation strategy; paper does not report backtest results but prescribes holdout/pseudo‑counterfactual experiments and calibration with administrative outcomes.
Scenario modelling in the reviewed literature typically uses counterfactual simulations with different adoption speeds, policy responses, and initial conditions to bound possible employment, wage, and productivity trajectories.
Description and citations of scenario-modelling practices by think tanks and organisations (TBI, IPPR, IMF) and academic work referenced; evidence is methodological and report-based.
NLP/LLM pipelines are used to extract tasks and skills from free-text job ads and to map those tasks to AI capabilities.
Described methods and citations (Xu et al., 2025; Hampole et al., 2025); evidence is methodological application of transformer-based models to job-ad text in recent studies.
Methods increasingly apply advanced NLP and large language models (BERT, LSTM, GPT-4) to parse job descriptions, map skills/tasks, and predict automation risk.
Cited methodological examples in the paper (Xu et al., 2025; Hampole et al., 2025) and discussion of common pipelines using transformer-based models to extract tasks from free-text job ads and to map tasks to AI capabilities; evidence is methodological and based on recent studies rather than a single benchmarked dataset.
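The task-mapping step can be sketched without a transformer: plain bag-of-words cosine similarity stands in for embeddings here, since the underlying idea (score each job-ad snippet against a task taxonomy and keep the best match) is the same. The taxonomy and ad text below are invented.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Illustrative task taxonomy; not drawn from any cited dataset.
# Real pipelines would use transformer embeddings in place of bow().
tasks = {
    "draft legal documents": "draft review legal documents contracts",
    "analyze sales data": "analyze sales data reports forecasts",
}

ad_snippet = "analyze weekly sales data and prepare reports"
best = max(tasks, key=lambda t: cosine(bow(ad_snippet), bow(tasks[t])))
print(best)  # → analyze sales data
```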
Functional domains show varying maturity: procurement, for example, has more applied work than other functions.
Reviewer observation from the systematic search and screening of 2020–2025 literature, noting an uneven distribution of empirical/applied studies across functions.
A centralized policy engine for access control, data handling rules, and change management is a necessary control point in the reference pattern.
Prescriptive recommendation in the paper supported by best-practice synthesis and case anecdotes; no direct empirical comparison of centralized vs federated policy engines provided.
Research gaps include the need for standardized evaluation metrics, robustness- and consistency-focused XAI methods, domain-informed explanation frameworks, and longitudinal/clinical impact studies.
Recommendations section of the review synthesizing recurring deficits across papers and proposing priorities.
Recommendation for research and modeling: economic models of AI markets should incorporate institutional regime types (centralized vs decentralized), enforcement uncertainty, and legitimacy effects as parameters affecting data access costs, R&D productivity, and market concentration.
Normative recommendation based on the comparative typology and inferred mechanisms from the document analysis; not empirically validated within the study.
Theoretical contribution: the paper extends modular coordination theory by treating openness–security trade‑offs as layered, adaptive institutional processes embedded in political regimes and 'legitimacy economies.'
Argumentative/theoretical development in the paper grounded in document analysis and literature on coordination and legitimacy.
Providing optional LLM access without training did not increase average exam scores versus no LLM access.
Intent-to-treat comparisons across randomized arms reported in the study: comparison of optional-access-without-training arm to no-access arm showed no average score improvement (sample n = 164).
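The intent-to-treat comparison amounts to a difference in mean scores across randomized arms with a confidence interval; the scores below are simulated (the study's n = 164 data are not reproduced) and the null result is built in by construction.

```python
import math
import statistics

# Simulated exam scores by randomized arm; illustrative only.
treatment = [72, 68, 75, 70, 69, 74, 71, 73]   # optional LLM access
control = [71, 69, 74, 70, 72, 68, 73, 75]     # no LLM access

# ITT estimate: difference in arm means, with a normal-approximation
# standard error from the two sample variances.
diff = statistics.mean(treatment) - statistics.mean(control)
se = math.sqrt(statistics.variance(treatment) / len(treatment)
               + statistics.variance(control) / len(control))

# A 95% CI covering zero is consistent with "no average improvement".
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(lo < 0 < hi)  # → True
```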
Cross-border coordination is crucial because platform services and data flows often transcend jurisdictions.
Policy analysis and descriptive examples of cross-border platform operations in the reviewed literature; not empirically quantified in the paper.
Standardized metrics for 'inclusive outcomes' are needed beyond account ownership—e.g., active usage, quality of credit, stability of access, and welfare effects.
Critical assessment of measurement shortcomings in existing financial inclusion literature; prescriptive recommendation rather than empirical evidence.
Realizing AI’s potential for circular-economy and energy-efficiency goals requires coordinated interventions across environmental regulation, digital infrastructure, and workforce skill formation.
Policy interpretation drawn from heterogeneity results (regulation and infrastructure amplify AI effects) and the identified labor-market mechanism (skill composition matters); recommendation rather than direct causal estimate.