Evidence (4560 claims)
Claim counts by topic area:

| Topic | Claims |
|---|---|
| Adoption | 5267 |
| Productivity | 4560 |
| Governance | 4137 |
| Human-AI Collaboration | 3103 |
| Labor Markets | 2506 |
| Innovation | 2354 |
| Org Design | 2340 |
| Skills & Training | 1945 |
| Inequality | 1322 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Productivity (filtered claims)
The paper is a position/normative piece (not an empirical study) that relies on conceptual analysis, literature synthesis, and prescriptive roadmapping rather than new quantitative experiments or datasets.
Explicit methodological statement in the paper summarizing genre and methods used; absence of reported original data or controlled evaluations.
There is a need for longitudinal and cross‑country empirical research to measure how hybrid work and AI tools affect promotion rates, network centrality, productivity, privacy harms, trust, and long‑term career trajectories.
Statement of research gaps derived from the paper's methodological approach (conceptual synthesis and secondary case studies) and absence of longitudinal/cross‑cultural primary data.
Highly Autonomous Cyber-Capable Agents (HACCAs) are AI systems able to plan and execute multi-stage cyber campaigns across the full attack lifecycle with minimal or no human direction.
Conceptual definition provided in the report; constructed via literature review and threat-framework formulation (no empirical sample; definitional/analytic).
Potential risks of deploying such models include fairness/bias, privacy concerns from employee-level predictions, and adverse morale effects if interventions are unevenly applied.
Authors' discussion of risks and ethical considerations when applying predictive XAI models to employee data; this is a stated limitation/risk discussion rather than an empirical finding.
Generalizability is limited: results based on the IBM dataset may differ for real green-workforce populations, industries, or countries.
Authors' stated limitation regarding external validity and representativeness of the IBM HR Analytics dataset as a proxy for sustainability roles.
Counterfactual simulations reported are predictive rather than causal; estimated effects require causal validation (e.g., randomized trials) before large-scale policy rollout.
Authors' methodological caveat noting that simulation-based changes in model-predicted probabilities do not establish causality and recommending causal evaluation methods for policy adoption.
The IBM HR Analytics dataset was used as a proxy for sustainability-focused (green) roles, relying on objective HR records rather than self-report surveys.
Data statement in the paper: model trained and evaluated on the IBM HR Analytics dataset; authors explicitly treat it as a proxy for sustainability-oriented roles for purposes of demonstration.
The study shifts retention analysis from descriptive correlations and surveys toward actionable, employee-level predictions and policy evaluation.
Combination of objective HR records (IBM dataset), predictive modeling (logistic regression), calibration, XAI tools (SHAP, LIME), and counterfactual policy simulations to evaluate intervention effects at individual and aggregate levels.
Local explainability (SHAP and LIME) can identify employee-specific intervention levers for targeted retention actions.
Use of SHAP and LIME for local explanations of individual predictions; counterfactual simulations applied at the employee level to estimate impact of feature changes on that employee's calibrated attrition probability.
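A minimal sketch of the employee-level counterfactual re-scoring step described above, using synthetic data and placeholder feature names rather than the paper's IBM HR Analytics setup (the SHAP/LIME explanation calls are omitted; only the counterfactual logic on a calibrated model is shown):

```python
import numpy as np
import pandas as pd
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder HR features; the paper works with the IBM HR Analytics dataset.
X = pd.DataFrame({
    "monthly_income": rng.normal(5000, 1500, 500),
    "overtime": rng.integers(0, 2, 500),
    "years_at_company": rng.integers(0, 20, 500),
})
y = rng.binomial(1, 0.16, 500)  # synthetic attrition labels

# Calibrated logistic regression, mirroring the paper's calibrated probabilities.
model = CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=5).fit(X, y)

employee = X.iloc[[0]].copy()
p_base = model.predict_proba(employee)[0, 1]

# Counterfactual lever: remove overtime for this employee and re-score.
p_cf = model.predict_proba(employee.assign(overtime=0))[0, 1]
print(f"predicted attrition: {p_base:.3f} -> {p_cf:.3f} (predictive, not causal)")
```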
Practical recommendations for firms and policymakers include investing in training for AI curation/evaluation/coordination, experimenting with decentralised decision rights and governance safeguards, and monitoring competitive dynamics related to model/platform providers.
Policy and practitioner takeaways explicitly presented in the discussion/implications sections, deriving from the conceptual framework and mapped literature.
The paper recommends a research agenda for AI economists: causal microeconometric studies (DiD, IVs, RCTs), structural models with hybrid human–AI agents, measurement work on GenAI use, distributional analysis and policy evaluation.
Explicit recommendations listed in the implications and research agenda sections; logical follow‑on from bibliometric findings about gaps in causal and measurement evidence.
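Purely as an illustration of the difference-in-differences piece of that agenda (variable names and effect sizes are placeholders, not from the paper), the canonical two-way interaction regression looks like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),   # firm adopted GenAI tools
    "post": rng.integers(0, 2, n),      # observation after rollout
})
# Synthetic outcome with a built-in 0.3 treatment effect for illustration.
df["productivity"] = (
    1.0 + 0.2 * df.treated + 0.1 * df.post
    + 0.3 * df.treated * df.post + rng.normal(0, 1, n)
)

fit = smf.ols("productivity ~ treated * post", data=df).fit()
# The coefficient on treated:post is the difference-in-differences estimate.
print(fit.params["treated:post"])
```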
Bibliometric mapping profiles the intellectual structure and evolution of the field but does not establish causal effects of GenAI on organisational outcomes.
Methodological limitation explicitly stated in the paper; bibliometric approach (co‑word, citation, thematic mapping) is descriptive and historical in scope.
Co‑word and thematic analyses reveal six coherent conceptual clusters that bridge technical AI topics (e.g., LLMs, GANs) with managerial themes (e.g., autonomy, coordination, decision‑making).
Thematic mapping and co‑word network analysis performed on the 212‑paper corpus; identification of six clusters reported in results.
Bibliometric and conceptual tools (VOSviewer, Bibliometrix) were used to identify performance trends, co‑word structures, thematic maps, and conceptual evolution in the GenAI–organisation literature.
Methods section: use of VOSviewer for network visualization and Bibliometrix for bibliometric statistics, co‑word analysis, thematic mapping and Sankey thematic evolution.
The study analysed a corpus of 212 Scopus‑indexed publications covering 2018–2025 to map emergent literature on Generative AI and organisational change.
Bibliometric dataset constructed from Scopus; sample size = 212 peer‑reviewed articles; time window 2018–2025; analyses performed with Bibliometrix and VOSviewer.
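The paper's co-word analysis was run in Bibliometrix and VOSviewer; purely to illustrate the underlying logic, a keyword co-occurrence matrix can be built from per-article keyword lists as follows (synthetic keywords, not the actual 212-paper corpus):

```python
from collections import Counter
from itertools import combinations

# Each inner list stands in for one article's author keywords.
papers = [
    ["generative AI", "organisational change", "LLMs"],
    ["LLMs", "decision-making", "autonomy"],
    ["generative AI", "coordination", "decision-making"],
    ["generative AI", "LLMs", "decision-making"],
]

cooccurrence = Counter()
for keywords in papers:
    for a, b in combinations(sorted(set(keywords)), 2):
        cooccurrence[(a, b)] += 1

# Keyword pairs above a frequency threshold become edges of the co-word network
# that clustering algorithms then partition into thematic clusters.
edges = [(a, b, w) for (a, b), w in cooccurrence.items() if w >= 2]
print(edges)
```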
Outcomes reported are primarily self-reported psychological measures rather than objective productivity metrics.
Paper reports measurement instruments focused on self-reported self-efficacy, psychological ownership, meaningfulness, and enjoyment/satisfaction; no primary objective productivity metrics reported.
The experiment was pre-registered, used occupation-specific writing tasks, and employed a between-subjects design with three conditions (No-AI, Passive AI, Active collaboration).
Study design reported in the paper: pre-registration statement, N = 269, between-subjects assignment to three conditions using occupation-specific writing tasks.
Active, collaborative AI use preserves perceived meaningfulness of work at levels comparable to independent work and does not produce the lasting psychological costs seen with passive use.
Pre-registered experiment (N = 269) with post-manipulation and post-return measures; Active-collaboration condition matched No-AI on meaningfulness and showed no persistent declines after returning to manual tasks.
Active, collaborative AI use preserves psychological ownership of outputs at levels comparable to independent work.
Pre-registered experiment (N = 269); Active-collaboration condition reported ownership levels similar to No-AI condition on self-report scales.
Active, collaborative AI use (the human drafts first, then uses AI to refine) preserves self-efficacy at levels comparable to independent (no-AI) work.
Pre-registered experiment (N = 269) comparing Active-collaboration and No-AI conditions; no statistically meaningful differences in self-efficacy between them (self-reported measures).
The authors propose research priorities for economists: quantify productivity gains from closing the actionability gap; estimate firm-level heterogeneity in evaluation capability and its effect on adoption; and model investment trade-offs between building evaluation-to-action pipelines versus accepting reduced LLM performance.
Paper's concluding recommendations for future research directions (explicitly listed by the authors).
The paper produces as primary outcomes a taxonomy of ten evaluation practices, the articulation of the results-actionability gap, and recommended strategies observed among successful teams.
Authors report these as the main outcomes of their thematic analysis and synthesis of the 19 interviews.
The study method consisted of semi-structured qualitative interviews with 19 practitioners across multiple industries and roles, analyzed via thematic coding.
Explicit methods section of the paper stating sample size (n=19), participant diversity, interview approach, and coding/analysis procedure.
AI-economics research should treat quantum capability as a distinct, gradually diffusing factor of production with sectoral specificity, and should model complementarities and policy counterfactuals endogenously.
Modeling recommendations grounded in sensitivity of macro outcomes to diffusion patterns, complementarities, and policy choices observed in the scenario and counterfactual analyses.
Model parameters are calibrated using historical diffusion of enabling technologies (cloud computing, GPUs, AI toolchains), industry case studies, and expert elicitation where hard data are lacking.
Empirical grounding section describing calibration sources: historical diffusion, case studies (materials discovery, optimization), and expert elicitation.
Uncertainty quantification is performed by running Monte Carlo or scenario ensembles and conducting sensitivity and robustness checks.
Methodological claim in the uncertainty quantification section describing Monte Carlo/scenario ensemble approach.
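A minimal sketch of that ensemble approach, with made-up parameter ranges and a deliberately simplified outcome function (not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 10_000

# Draw uncertain inputs: diffusion speed, peak adoption share, TFP gain per adopter.
speed = rng.uniform(0.2, 0.8, n_draws)
peak_adoption = rng.uniform(0.1, 0.5, n_draws)
tfp_gain = rng.uniform(0.005, 0.03, n_draws)

# Toy outcome: productivity effect after 10 years under a logistic uptake curve.
years = 10
adoption_at_t = peak_adoption / (1 + np.exp(-speed * (years - 5)))
gdp_effect = adoption_at_t * tfp_gain

lo, med, hi = np.percentile(gdp_effect, [5, 50, 95])
print(f"GDP effect (toy units): 5%={lo:.4f}, median={med:.4f}, 95%={hi:.4f}")
```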
Sectoral TFP shocks are integrated into computational general equilibrium (CGE) or multi-sector growth models (and optionally DSGE variants) to simulate GDP, sector output, trade impacts, and labor reallocation.
Method section stating integration of sectoral TFP shocks into CGE/multi-sector growth models with optional DSGE short-run dynamics.
Sectoral adoption is translated into total factor productivity (TFP) shocks or sector-specific Hicks-neutral productivity improvements based on micro evidence of quantum advantages.
Methodological description of productivity mapping linking adoption to TFP shocks using micro evidence and case studies.
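A stylized sketch of that mapping, with assumed adoption shares and Cobb-Douglas sector production standing in for the CGE/growth-model integration described above:

```python
import numpy as np

sectors = ["chemicals", "logistics", "finance"]
adoption = np.array([0.30, 0.15, 0.05])      # assumed sectoral adoption shares
micro_gain = np.array([0.04, 0.02, 0.01])    # assumed per-adopter productivity gain

# Hicks-neutral TFP shock per sector: adoption share times micro-level gain.
tfp_shock = adoption * micro_gain

A = np.ones(3)                   # baseline TFP
K = np.array([1.0, 0.8, 1.2])    # capital inputs (illustrative)
L = np.array([1.0, 1.5, 0.9])    # labor inputs (illustrative)
alpha = 0.33

baseline = A * K**alpha * L**(1 - alpha)
shocked = A * (1 + tfp_shock) * K**alpha * L**(1 - alpha)

for s, y0, y1 in zip(sectors, baseline, shocked):
    print(f"{s}: output {y0:.3f} -> {y1:.3f}")
```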
The paper uses empirical diffusion functions (logistic/S-curve, Bass model) calibrated to analogous technologies to project uptake over time.
Methodological description: diffusion modeling section explicitly states use of logistic/S-curve and Bass models and calibration to past technologies (cloud, GPUs).
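As a sketch, the Bass cumulative-adoption curve used in such projections can be written as follows (the p and q values here are illustrative, not the paper's calibrated coefficients):

```python
import numpy as np

def bass_cumulative(t, p, q, m=1.0):
    """Bass model cumulative adoption F(t), scaled by market potential m."""
    e = np.exp(-(p + q) * t)
    return m * (1 - e) / (1 + (q / p) * e)

t = np.arange(0, 21)
# Illustrative coefficients of innovation (p) and imitation (q).
print(bass_cumulative(t, p=0.03, q=0.38))
```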
The analysis used sentence‑transformer models to produce dense vector representations of article text and UMAP to project those embeddings into a low‑dimensional thematic map for cluster identification and gap detection.
Methods section specifying use of sentence‑transformer embeddings and UMAP for dimensionality reduction/visualization of article text.
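A minimal sketch of that embedding-and-projection pipeline, assuming the sentence-transformers and umap-learn packages and an arbitrary model checkpoint (the paper does not specify one here):

```python
from sentence_transformers import SentenceTransformer
import umap

texts = [
    "NLP-based sentiment analysis of bank customer reviews",
    "Chatbots for retail bank marketing campaigns",
    "Topic modeling of marketing communications",
    "Credit-card churn prediction from call-center transcripts",
]

# Dense vector representations of article text (checkpoint name is an assumption).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(texts)

# Project embeddings to 2D for thematic mapping and visual cluster inspection.
reducer = umap.UMAP(n_components=2, n_neighbors=2, random_state=0)
coords = reducer.fit_transform(embeddings)
print(coords.shape)  # (4, 2)
```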
The study followed a PRISMA protocol for literature selection and included peer‑reviewed journal articles published between 2014 and 2024, with a final sample size of n = 109.
Explicit methodological statement in the paper describing the literature search, inclusion/exclusion criteria, and final sample.
Twenty‑seven papers study marketing in banking without using NLP methods.
PRISMA systematic review; categorization of the 109 selected articles into the three coverage groups (8, 74, 27).
Seventy‑four papers study NLP in marketing more broadly (not specifically banking).
Same PRISMA‑based systematic review and manual categorization of the final sample n = 109 into topical buckets (NLP in marketing vs. NLP in bank marketing vs. marketing in banking without NLP).
Only 8 peer‑reviewed papers directly examine NLP in bank marketing (out of a final sample of 109 articles published 2014–2024).
Systematic review following PRISMA protocol; final sample n = 109 peer‑reviewed journal articles published 2014–2024; manual screening and categorization yielding counts by topic.
The study's findings are qualitative and case-driven (Xiaomi and Deloitte); generalizability is limited by case selection and the absence of standardized quantitative metrics.
Methods section explicitly states case analysis and literature review as primary methods and notes lack of large-scale quantitative measurement.
The analysis in the paper is primarily qualitative and descriptive; it does not empirically quantify AI’s effects on trade flows or welfare.
Explicit statement in the methods/data description noting a mixed qualitative approach (theoretical analysis, comparative legal analysis, case studies, scenario reasoning) and absence of empirical quantification.
The study is qualitative and law-focused and uses Vietnam as a focused case study without collecting primary quantitative field data.
Explicit Data & Methods statement in the paper indicating doctrinal legal analysis, comparative institutional analysis, and normative framework development; no primary quantitative sample.
The study recommends empirical metrics for future evaluation of reforms, including processing time per case, reversal rates on appeal, administrative litigation frequency, compliance and procurement costs, investment flows into public-sector AI, and changes in labor composition and wages in administrative agencies.
Methodological recommendation arising from the paper's normative and comparative analysis.
The paper's argument is principally theoretical and prescriptive and requires empirical validation across domains and at scale.
Author-stated limitation in the Data & Methods section noting that the work is primarily conceptual and that empirical validation is needed.
Operationalizing DSS requires building domain ontologies/knowledge graphs, designing synthetic curricula, training compact domain models, benchmarking against monolithic LLMs, and measuring total cost-of-ownership (energy, latency, bandwidth, infrastructure).
Paper's recommended experimental and measurement agenda (procedural/methodological prescriptions); this is a proposed research plan rather than an empirical result.
Field experiments (A/B testing) and willingness-to-pay experiments are necessary to quantify monetary benefits, adoption curves, and optimal pricing for alignment capabilities.
Paper explicitly recommends these empirical approaches in the recommendations for economists and product teams; this is a methodological recommendation rather than an empirical finding.
Recommended evaluation directions include automatic metrics (embedding similarity, task success, turn counts), human evaluation (satisfaction, perceived collaboration), and A/B testing in deployed settings (latency, compute, retention).
Paper's explicit evaluation proposals and recommended metrics listed in the Data & Methods and Evaluation Directions sections; these are prescriptive recommendations rather than executed experiments.
The paper focuses on architecture and conceptual arguments rather than reporting large-scale empirical datasets or results.
Data & Methods section and overall document framing emphasize architecture description and proposed evaluations; explicitly notes absence of large-scale empirical results in the provided summary.
Alignment verification can be implemented using semantic embeddings (cosine similarity) or learned classifiers with threshold-based decision branching.
Paper describes these as recommended implementation approaches for the alignment verification component; no empirical benchmark comparing methods is reported.
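A minimal sketch of the cosine-similarity variant with threshold-based branching (the embedding source, threshold value, and branch labels are placeholders):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_alignment(response_emb: np.ndarray, intent_emb: np.ndarray,
                     threshold: float = 0.75) -> str:
    """Branch on semantic similarity between the response and the user's intent."""
    score = cosine_similarity(response_emb, intent_emb)
    if score >= threshold:
        return "accept"          # aligned: pass the response through
    return "clarify_or_retry"    # misaligned: trigger clarification or regeneration

# Toy vectors standing in for real sentence embeddings.
print(verify_alignment(np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.2, 0.1])))
```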
Temporal decay in the retrieval component can be modeled with functions such as exponential decay and a tunable half-life parameter applied to dialogue-turn embeddings.
Methodological description in the paper specifying temporal decay modeling options (exponential decay example) and tunable parameters; descriptive claim about intended implementation (no empirical comparison of decay functions provided).
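A sketch of the exponential-decay weighting with a tunable half-life, applied to retrieval scores over dialogue turns (parameter values are illustrative):

```python
import numpy as np

def decayed_scores(similarities: np.ndarray, ages_in_turns: np.ndarray,
                   half_life: float = 10.0) -> np.ndarray:
    """Down-weight older dialogue-turn embeddings: weight halves every half_life turns."""
    weights = 0.5 ** (ages_in_turns / half_life)
    return similarities * weights

sims = np.array([0.82, 0.79, 0.91])   # raw similarity to the current query
ages = np.array([1, 5, 30])           # turns since each memory was stored
print(decayed_scores(sims, ages))     # the 30-turn-old memory is heavily discounted
```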
Research agenda items for economists include: quantifying willingness-to-pay for verifiable reasoning, studying labor-market impacts for validators, designing contracts/mechanisms to incentivize truthful argument provision, and evaluating regulatory interventions.
Paper's stated research and policy agenda; prescriptive rather than empirical.
Evaluation currently lacks metrics and benchmarks for argument quality, fidelity, contestability, and human trust; developing these is necessary.
Paper notes the gap and proposes evaluation metrics and experimental designs; no new benchmarks introduced.
Evaluation metrics for the architecture should include sample efficiency, generalization across tasks, robustness to distribution shift, autonomy (fraction of learning decisions made internally), transfer speed, lifelong retention, and safety/constraint adherence.
Explicit recommendations for evaluation metrics in the paper.
This paper is a conceptual/theoretical architecture proposal rather than an empirical study; empirical validation should follow via suggested experiments.
Explicit statement in the paper about nature of contribution.
Results are from role-play contexts and short-term interventions; economic estimates of benefit require validation in field settings, across diverse populations, and with different LLM models.
Authors' caveats and limitations stated in the paper noting external validity concerns and the experimental context (role-play, short-term follow-up).