Evidence (8570 claims)
| Topic | Claims |
|---|---|
| Adoption | 8570 |
| Productivity | 7631 |
| Governance | 6869 |
| Human-AI Collaboration | 6491 |
| Org Design | 4175 |
| Innovation | 4114 |
| Labor Markets | 3566 |
| Skills & Training | 2966 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
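
The matrix can be read row by row as shares of each outcome's total. The sketch below does that arithmetic for three rows copied from the table; the function name and dictionary layout are illustrative assumptions, not part of the source. It also reports how many claims in a row are not covered by the four direction columns, since some totals exceed the sum of Positive, Negative, Mixed, and Null. The source does not explain that remainder.

```python
# Three rows copied from the evidence matrix above:
# (positive, negative, mixed, null, total)
rows = {
    "Other": (758, 199, 100, 900, 2007),
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def summarize(positive, negative, mixed, null, total):
    """Return direction shares of the row total and the unexplained remainder."""
    directed = positive + negative + mixed + null
    return {
        "positive_share": positive / total,
        "negative_share": negative / total,
        # Claims counted in Total but in none of the four direction columns.
        "unaccounted": total - directed,
    }


for outcome, counts in rows.items():
    s = summarize(*counts)
    print(
        f"{outcome}: {s['positive_share']:.1%} positive, "
        f"{s['negative_share']:.1%} negative, "
        f"{s['unaccounted']} claims outside the four columns"
    )
```

Reading shares rather than raw counts makes small categories such as Job Displacement comparable with large ones such as Other.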
Adoption
All models are severely overconfident: their 95% intervals contain the true value only 9–44% of the time, far below the expected 95%.
Analysis of model-produced 95% credible intervals across elicited population statistics, measuring empirical coverage rates reported between 9% and 44%.
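
Empirical coverage, the quantity behind this claim, is the fraction of stated intervals that contain the true value. The following is a minimal sketch of that calculation, not the paper's evaluation code; the function name and the five example intervals are hypothetical.

```python
def empirical_coverage(intervals, truths):
    """Fraction of (low, high) intervals that contain the matching true value."""
    if len(intervals) != len(truths):
        raise ValueError("intervals and truths must have the same length")
    if not intervals:
        raise ValueError("at least one interval is required")
    hits = sum(
        1 for (low, high), truth in zip(intervals, truths) if low <= truth <= high
    )
    return hits / len(intervals)


# Hypothetical elicited 95% intervals and the corresponding true statistics.
intervals = [(10.0, 12.0), (0.2, 0.4), (95.0, 105.0), (3.0, 3.5), (40.0, 60.0)]
truths = [15.0, 0.3, 80.0, 4.1, 50.0]

coverage = empirical_coverage(intervals, truths)
print(f"Empirical coverage: {coverage:.0%} (nominal: 95%)")
```

A well-calibrated model would cover roughly 95% of true values; coverage far below that, as in the 9–44% range reported, indicates intervals that are too narrow.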
Policy enforcement reduces total spending by 27.3%.
Quantitative result reported from the paper's experiments across baselines and scenarios (paper reports a 27.3% reduction attributed to policy enforcement).
In many deployment contexts, especially countries with strong real-time fiat systems like UPI, relying on crypto rails is misaligned with regulatory and infrastructure realities.
Contextual/argumentative claim in the paper contrasting crypto reliance with fiat systems such as UPI (no empirical country-level sample reported).
Traditional questionnaires yielded slightly higher accuracy in risk assessment.
Result reported from the two experiments comparing traditional questionnaires to adaptive ARQuest versions; no numeric accuracy or sample size provided in the excerpt.
Insurers must blindly trust users' responses, increasing the chances of fraud.
Stated as a motivating problem in the paper; presented as logical/empirical concern rather than supported by a reported study within the paper.
Insurance application processes often rely on lengthy and standardized questionnaires that struggle to capture individual differences.
Descriptive claim in paper introduction arguing limitations of standard questionnaires; no experiment or sample size reported for this assertion.
AI's disproportionate benefits for lagging regions help narrow interprovincial emission gaps.
Heterogeneity analysis reported in the provincial panel (2003–2021) showing stronger AI-related reductions in emissions inequality for lagging regions compared to advanced regions.
Green innovation is concentrated in coastal provinces and has not effectively diffused to inland areas, limiting its ability to reduce regional carbon inequality.
Spatial distribution analysis within the provincial panel showing geographic concentration of green innovation activity in coastal provinces and limited diffusion inland.
AI reduces carbon inequality primarily through improved energy efficiency, enhanced environmental monitoring, and more efficient resource allocation, disproportionately benefiting lagging regions and narrowing interprovincial emission gaps.
Mechanism analysis reported in the paper based on the provincial panel (2003–2021) linking AI development to proximate channels (energy efficiency, monitoring, resource allocation) and heterogeneous impacts across regions.
AI development significantly reduces carbon inequality, particularly when measured by the Gini index.
Empirical analysis using a provincial panel dataset covering 2003–2021; carbon inequality measured with the Gini index; reported statistically significant negative association between AI development and Gini-measured carbon inequality.
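
For readers unfamiliar with the measure, the sketch below computes a plain, unweighted Gini index over a list of regional values. It is an illustration only: the source does not say whether the paper weights provinces (for example by population), and the emissions figures are hypothetical.

```python
def gini(values):
    """Gini index of non-negative values: 0 is perfect equality, values near 1 are highly concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    # Rank-weighted form, equivalent to the mean-absolute-difference definition.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-capita emissions for six provinces (illustrative units).
emissions = [2.1, 3.4, 3.9, 5.0, 8.7, 12.5]
print(f"Carbon Gini: {gini(emissions):.3f}")
```

Under this definition, a reduction in carbon inequality corresponds to the index moving toward 0 as provincial emissions become more even.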
Cross-equipment generalization is poor, with 42.7% performance on held-out datasets.
Paper reports held-out dataset evaluation showing 42.7% (presumably accuracy or task completion) for cross-equipment generalization.
Multi-asset reasoning causes a 14.9 percentage point degradation in performance.
Paper reports a 14.9 percentage point performance degradation attributed to multi-asset reasoning in comparative analyses.
There are systematic failures in tool orchestration, with 23% incorrect sequencing.
Paper reports a measured incorrect sequencing rate of 23% during evaluation of agent tool orchestration across scenarios.
Even top-performing configurations achieve only 68% task completion.
Reported aggregated performance result from the benchmark evaluation across the tested frameworks and LLMs (paper statement). The benchmark contains 75 scenarios (used as evaluation instances).
Improvements in operational resilience (OR) effectively reduce corporate operational risk.
Further analysis reported in the paper linking higher OR to lower operational risk measures for firms in the sample.
AI promotes operational resilience by reducing management agency conflicts.
Mechanism (mediation) tests reported in the paper showing AI associated with reductions in measures of agency/management conflict, which in turn relate to OR improvements.
Specific occupations such as credit analysts, judges, and sustainability specialists reach ATE scores of 0.43-0.47 by 2030.
Reported model outputs / ATE score estimates for individual occupations within the paper's 2025-2030 regional application.
Applying the ATE framework across five major US technology regions (Seattle-Tacoma, San Francisco Bay Area, Austin, New York, and Boston) over a 2025-2030 horizon, 93.2% of the 236 analyzed occupations across six information-intensive SOC groups cross the moderate-risk threshold (ATE >= 0.35) in Tier 1 regions by 2030.
Modeling/application of the ATE score to O*NET-derived tasks for 236 occupations in six SOC groups across five named US regions with forecasts for 2025-2030; explicit numeric result reported (93.2%).
Agentic AI systems execute end-to-end workflows (multi-step reasoning, tool invocation, autonomous decision-making) and substantially expand occupational displacement risk beyond what existing task-level analyses capture.
Theoretical extension of the Acemoglu-Restrepo task exposure framework described in the paper; conceptual argument contrasting prior automation (subtask substitution) with agentic AI (end-to-end workflow automation). No empirical sample size reported for this conceptual claim.
Agent contributions are associated with more churn over time compared to human-authored code.
Longitudinal comparison between agent-generated and human-authored contributions reported in the paper (churn/survival estimates described; association between agent contributions and higher churn asserted).
Informal workers cannot capture augmentation rents: the estimated coefficient on H^A in the informal sector is negative (beta_2 = -0.044).
Subsample or interaction estimate from the augmented Mincer regression using the same merged dataset (N = 105,517); reported coefficient beta_2 = -0.044 for informal workers.
Unbalanced or poorly governed adoption of Big Data and AI contributes to increased systemic risk, cybersecurity vulnerability, regulatory fragmentation and third-party dependence on BigTech platforms.
Argument based on qualitative literature review and synthesis of international empirical studies and comparative sector analysis; no single-sample empirical study in this paper.
Extreme automation (high AI intensity) causes employment decline.
Part of the U-shaped relationship reported by the paper's empirical results; described qualitatively in the abstract/summary.
Task orchestration is the most under-researched dimension among the five workplace-design components.
Finding from the PRISMA-guided systematic review of 120 papers, which mapped coverage across the five dimensions and identified task orchestration as having the least research attention.
Decision authority allocation emerges as the binding constraint for Society 5.0 transitions.
Result synthesized from the systematic review and theoretical analysis mapping the five workplace-design dimensions; stated as the binding constraint in the paper's findings.
The environmental impact of AI is weaker in energy-efficient countries.
Heterogeneity analysis in the paper dividing sample by energy-efficiency status (energy-efficient vs. energy-inefficient countries) shows a smaller AI→CO2 association in energy-efficient countries (104-country panel, 2000–2023).
Advanced digital infrastructure (DII) significantly mitigates the positive effect of AI on CO2 emissions.
Moderation analysis in the panel regressions (104 countries, 2000–2023) including interaction terms between AI adoption and digital infrastructure; results reported that stronger DII reduces the environmental impact of AI.
High institutional quality (GQI) significantly mitigates the positive effect of AI on CO2 emissions.
Moderation analysis in the panel regressions (same 104-country sample, 2000–2023) including interaction terms between AI adoption and governance quality; reported results indicate the AI→CO2 effect is weaker when GQI is stronger.
The literature shows persistent gaps in empirical validation, standardized evaluation methods, and sector-specific comparative analyses of agentic AI in financial services.
Review-level assessment noting limited empirical studies, heterogeneous evaluation metrics, and few direct cross-sector comparisons up to mid-2024.
Significant implementation barriers persist, notably workforce transformation challenges, legacy system integration difficulties, and trust deficits.
Thematic synthesis across empirical and conceptual papers in the review reporting implementation barriers and change management issues.
Ethical concerns (including bias, lack of transparency, and regulatory compliance risks) remain critical for agentic AI in financial services and necessitate layered governance and human-AI collaboration.
Collation of ethical, legal, and governance issues reported across the reviewed multidisciplinary studies and normative discussions.
Insurance is comparatively underrepresented in the literature and in reported agentic AI deployments compared with banking and investment.
Review finding (counts/themes across included studies indicating fewer studies/applications in insurance relative to banking and investment).
When predictions from the two sources conflict, the AI agent aligns more frequently with the prompt, despite its lower accuracy.
Analysis of cases where prompt-based and revealed-data-based AI predictions differed; reported frequency with which the AI's action matched the prompt versus the revealed-preference prediction.
Task complexity shapes substitution: low-complexity tasks see high substitution, while high-complexity tasks favor limited partial automation.
Calibration of the model to O*NET tasks, an expert survey, and GPT-4o task decompositions; implementation results reported for computer vision show that substitution varies with task complexity.
AI systems exhibit predictable but diminishing returns to data, compute, and model size (scaling-law experiments), implying the cost of higher accuracy is convex: good performance may be inexpensive, but near-perfect accuracy is disproportionately costly.
Scaling-law experiments estimating performance as a function of data, compute, and model size; described experimental estimation of production function.
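
The convexity argument can be made concrete with a toy power law. The sketch below assumes error falls as `a * compute ** (-b)`, a common functional form for scaling laws; the form and the values of `a` and `b` are illustrative assumptions, not the paper's estimates.

```python
def compute_for_error(target_error, a=1.0, b=0.25):
    """Compute needed to reach target_error if error = a * compute ** (-b).

    The power-law form and the defaults for a and b are hypothetical.
    """
    if not 0 < target_error <= a:
        raise ValueError("target_error must be in (0, a]")
    return (a / target_error) ** (1.0 / b)


for err in (0.2, 0.1, 0.05):
    print(f"error {err:.2f} -> compute {compute_for_error(err):,.0f}")

# Each halving of the error multiplies the required compute by 2 ** (1 / b).
ratio = compute_for_error(0.1) / compute_for_error(0.2)
print(f"cost multiplier per halving of error: {ratio:.1f}")
```

With `b = 0.25`, each halving of the error costs 16 times more compute, which is the sense in which good performance can be cheap while near-perfect accuracy is disproportionately expensive.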
Indonesia's current employment law framework is reactive, focusing on post-termination compensation, and has not been able to address the long-term impact of AI disruption.
Normative analysis of legislation and findings from the reviewed literature; conclusion reported by the study's authors.
Indonesia's current employment law framework contains no explicit provisions on retraining obligations or on mechanisms for distributing the benefits of technology fairly.
Finding from the analysis of national legislation (the Job Creation Law, UU Cipta Kerja, and its implementing regulations) and the literature examined in the normative study.
AI adoption raises legal challenges concerning the protection of workers' rights, social justice, and the sustainability of the employment system.
Normative analysis of the socio-economic consequences of AI, synthesized from national (SINTA) and international literature; the conceptual and comparative approaches are described in the methods.
The rapid development of Artificial Intelligence (AI) has fundamentally changed the structure of Indonesia's labor market, with a growing risk that human jobs will be replaced by automation technology.
Background statement supported by a literature review of SINTA-indexed national journals and reputable international journals (method: normative legal research using statutory, conceptual, and comparative approaches).
The common claim that generative AI simply amplifies the Dunning–Kruger effect is too coarse to capture the available evidence.
Paper's synthesis of heterogeneous empirical findings from human–AI interaction, learning research, and model evaluation used to critique the uniform-amplification interpretation; no single empirical countertest reported.
LLM use degrades metacognitive accuracy and flattens the classic competence–confidence gradient across skill groups (i.e., reduces calibration and narrows differences in self-assessed confidence by skill level).
Synthesis of studies from human–AI interaction and learning research reported in the paper that document worsened calibration and a reduction in the competence–confidence gradient when users rely on LLM outputs; the paper does not report a single combined sample size.
New technologies are initially skill intensive (demand more college-educated workers) but become less so as they age (they get standardized and accessible to less-skilled workers).
Empirical descriptive evidence from novel text-based data combining patent text and job postings (building on Kalyani et al., 2025) tracking technologies and their changing demand for skills as they age.
Observed declines in browsing time due to ChatGPT adoption are concentrated in website categories such as search and news, which are highly exposed to substitution by generative AI.
Category-level changes in browsing time across website classifications; declines are concentrated in categories identified, via web scraping and LLM-based site-level classification, as overlapping heavily with chatbot capabilities.
High-income and younger households adopt generative AI substantially faster than low-income and older counterparts, and this gap is widening over time ('generative AI divide').
Descriptive heterogeneity analysis using Comscore household demographics (income and age bins) and observed adoption trajectories across 2021–2024; authors report widening gap rather than convergence.
Most of today's agents remain isolated tools or closed-ecosystem orchestrators rather than socially integrated participants in open networks.
Author claim/assessment presented as current-state analysis; no empirical breakdown or study sample provided in the text.
Prominent studies predict substantial job displacement due to automation.
Paper asserts this as background, referencing the existence of prominent studies in the literature (no specific citations or sample sizes provided in the abstract).
The literature singles out endemic data quality issues, algorithmic bias, governance frameworks, and regulatory compliance as concerns that require trusted AI and sustainable digital finance ecosystems.
Synthesis from the reviewed literature noting recurring concerns and limitations reported across studies; the paper lists these as major challenges identified in the field.
AI can worsen financial and market performance if it crowds out normal R&D.
Paper's empirical analysis and interpretation linking AI dependence to poorer financial/market performance through displacement of standard R&D activities; presented as a study finding.
High AI dependency disclosed in financial reports does not improve firms' financial health and may even endanger it.
Empirical results drawn from the study's analysis of listed new energy vehicle and automobile manufacturers (2013–2023); statement appears in the paper's findings/conclusions.
AI dependency reduces financial safety for listed new energy vehicle and automobile manufacturers.
Empirical analysis of a sample of listed new energy vehicle and automobile manufacturers covering 2013–2023; the paper reports data analysis showing AI dependency reduces financial safety.