Evidence (7953 claims)
Claim counts by category (a single claim may be tagged with multiple categories, so counts overlap):

| Category | Claims |
|---|---|
| Adoption | 5539 |
| Productivity | 4793 |
| Governance | 4333 |
| Human-AI Collaboration | 3326 |
| Labor Markets | 2657 |
| Innovation | 2510 |
| Org Design | 2469 |
| Skills & Training | 2017 |
| Inequality | 1378 |
Evidence Matrix
Claim counts by outcome category and direction of finding. Row totals can exceed the sum of the four listed directions where some claims carry other or unclassified directions.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Study limitations: cross-sectional design, self-reported intentions, potential unobserved confounders, and generalizability limited to three cities (Beijing, Guangzhou, Lanzhou).
Explicit methodological statements in the paper describing data and design: cross-sectional survey of 889 respondents from three cities and reliance on self-reported employment intentions.
Because the study is cross-sectional and relies on self-report, causal claims are limited and generalizability is restricted to Generation Z (a limitation noted in the paper).
Authors' limitations: cross-sectional/self-report design and sample restricted to Generation Z; these constraints are reported in the paper.
Study design: cross-sectional self-report survey of 450 Generation Z consumers analyzed with Structural Equation Modeling (SPSS AMOS).
Methods section reporting sample size (n = 450), target population (Generation Z), cross-sectional survey design, and analysis technique (SEM using SPSS AMOS).
The measurement and structural models show good-to-excellent fit and reliable constructs (CFI = 0.980, TLI = 0.974, RMSEA = 0.062, SRMR = 0.031).
Reported psychometric/model-fit indices from SEM analysis (SPSS AMOS) on sample of 450 respondents.
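For context, these indices are deterministic functions of the model and baseline chi-square statistics, which the summary does not report. A minimal Python sketch with hypothetical chi-square values (chosen only to land near the reported RMSEA, not taken from the paper) shows the conventional formulas:

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Conventional SEM fit indices from model (m) and baseline (b) chi-squares."""
    rmsea = math.sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))
    cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_m - df_m, chi2_b - df_b, 0)
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1)
    return {"RMSEA": round(rmsea, 3), "CFI": round(cfi, 3), "TLI": round(tli, 3)}

# Hypothetical values for illustration only (n matches the reported sample of 450).
print(fit_indices(chi2_m=180.0, df_m=65, chi2_b=4500.0, df_b=78, n=450))
```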
Outcomes reported are primarily self-reported psychological measures rather than objective productivity metrics.
Paper reports measurement instruments focused on self-reported self-efficacy, psychological ownership, meaningfulness, and enjoyment/satisfaction; no primary objective productivity metrics reported.
The experiment was pre-registered, used occupation-specific writing tasks, and employed a between-subjects design with three conditions (No-AI, Passive AI, Active collaboration).
Study design reported in the paper: pre-registration statement, N = 269, between-subjects assignment to three conditions using occupation-specific writing tasks.
Active, collaborative AI use preserves perceived meaningfulness of work at levels comparable to independent work and does not produce the lasting psychological costs seen with passive use.
Pre-registered experiment (N = 269) with post-manipulation and post-return measures; Active-collaboration condition matched No-AI on meaningfulness and showed no persistent declines after returning to manual tasks.
Active, collaborative AI use preserves psychological ownership of outputs at levels comparable to independent work.
Pre-registered experiment (N = 269); Active-collaboration condition reported ownership levels similar to No-AI condition on self-report scales.
Active, collaborative AI use (the human drafts first, then uses AI to refine) preserves self-efficacy at levels comparable to independent (no-AI) work.
Pre-registered experiment (N = 269) comparing Active-collaboration and No-AI conditions; no statistically meaningful differences in self-efficacy between them (self-reported measures).
The paper identifies future research directions, including empirical causal studies of how DPP+AI interventions (digital product passports combined with AI) change recycling rates, second‑hand market prices, and firm investment in circular processes, and modeling of firm strategy around proprietary versus shared DPP data.
Stated research agenda and gaps in the paper informed by the study's findings and limitations; these are recommendations rather than empirical claims.
The study used a mixed-methods design focused on the Italian fashion and cosmetics industries, employing two online surveys, k‑means clustering (consumer segmentation), principal component analysis (to identify underlying dimensions of DPP functionalities and sustainability practices), and logistic regression (to identify adoption drivers).
Methods section summary provided in the paper; explicit statement of methods and industry context. Note: sample sizes and survey instrument details are not provided in the summary.
Two consumer segments were identified: 'aware' consumers (environmentally attuned and receptive to digital innovation and sustainability information) and 'unaware' consumers (prioritize immediate, tangible benefits like price and convenience over sustainability information).
K‑means cluster analysis applied to consumer responses from one of the online surveys in the Italian fashion and cosmetics context; summary identifies two clusters; sample sizes not reported.
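A minimal sketch of this segmentation-and-drivers pipeline using scikit-learn; the data matrix, item counts, component count, and outcome variable below are synthetic stand-ins, since the paper's survey details are not reported in the summary:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical survey matrix: rows = respondents, columns = Likert items on
# DPP functionalities and sustainability practices (not the paper's data).
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(300, 12)).astype(float)

X_std = StandardScaler().fit_transform(X)

# PCA to identify underlying dimensions of the survey items.
components = PCA(n_components=3).fit_transform(X_std)

# k-means with k=2 mirrors the two-segment ('aware'/'unaware') solution.
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(components)

# Logistic regression on a hypothetical binary adoption outcome
# to identify adoption drivers from the component scores.
adopt = rng.integers(0, 2, size=300)
drivers = LogisticRegression().fit(components, adopt)
print(drivers.coef_)
```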
This work is a conceptual/policy analysis rather than an original empirical study.
Explicit statement in the paper's Data & Methods section.
Study limitations include single-country (China) listed‑firm sample and reliance on secondary/administrative proxies for digitalization and innovation, which may miss internal qualitative aspects and introduce measurement error.
Authors’ stated limitations: sample restricted to Chinese A-share listed firms (2012–2022) and measures of digitalization/innovation derived from administrative/secondary data rather than direct observation/survey of internal practices.
No new primary empirical tests were performed in this paper; conclusions rest on secondary analysis and are broad and diagnostic rather than demonstrations of causal mechanisms.
Explicit methodological statement in the Data & Methods and Limitations sections of the paper describing it as a qualitative literature review and synthesis.
Research should prioritize causal identification (IV, difference‑in‑differences, regression discontinuity) to disentangle whether ESG causes better financial outcomes or instead proxies for unobserved firm quality.
Methodological recommendation based on limitations in the reviewed literature (many observational/correlational studies); the paper argues for stronger causal designs going forward.
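As a concrete illustration of one recommended design, a difference-in-differences regression takes only a few lines; the panel below is synthetic and the variable names (`roa`, `esg`, `post`) are placeholders, not drawn from any reviewed study:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical firm-year panel: 'esg' marks firms adopting an ESG practice,
# 'post' marks years after adoption.
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "roa": rng.normal(0.05, 0.02, n),
    "esg": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})

# Difference-in-differences: the esg:post interaction is the causal estimand,
# valid under the parallel-trends assumption.
model = smf.ols("roa ~ esg * post", data=df).fit()
print(model.params["esg:post"])
```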
The authors propose research priorities for economists: quantify productivity gains from closing the actionability gap; estimate firm-level heterogeneity in evaluation capability and its effect on adoption; and model investment trade-offs between building evaluation-to-action pipelines versus accepting reduced LLM performance.
Paper's concluding recommendations for future research directions (explicitly listed by the authors).
The paper produces as primary outcomes a taxonomy of ten evaluation practices, the articulation of the results-actionability gap, and recommended strategies observed among successful teams.
Authors report these as the main outcomes of their thematic analysis and syntheses from the 19 interviews.
The study method consisted of semi-structured qualitative interviews with 19 practitioners across multiple industries and roles, analyzed via thematic coding.
Explicit methods section of the paper stating sample size (n=19), participant diversity, interview approach, and coding/analysis procedure.
AI-economics research should treat quantum capability as a distinct, gradually diffusing, sector-specific factor of production, and should model complementarities and policy counterfactuals endogenously.
Modeling recommendations grounded in sensitivity of macro outcomes to diffusion patterns, complementarities, and policy choices observed in the scenario and counterfactual analyses.
Model parameters are calibrated using historical diffusion of enabling technologies (cloud computing, GPUs, AI toolchains), industry case studies, and expert elicitation where hard data are lacking.
Empirical grounding section describing calibration sources: historical diffusion, case studies (materials discovery, optimization), and expert elicitation.
Uncertainty quantification is performed by running Monte Carlo or scenario ensembles and conducting sensitivity and robustness checks.
Methodological claim in the uncertainty quantification section describing Monte Carlo/scenario ensemble approach.
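A minimal sketch of a Monte Carlo ensemble of this kind, with a stub response function and illustrative parameter ranges standing in for the paper's calibrated pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)
N_RUNS = 10_000

def gdp_effect(p, q, advantage):
    """Stub standing in for the full diffusion -> TFP -> CGE pipeline."""
    return advantage * (p + q)  # placeholder response surface

# Parameter ranges are illustrative assumptions, not the paper's calibration.
p = rng.uniform(0.01, 0.05, N_RUNS)    # innovation coefficient
q = rng.uniform(0.2, 0.5, N_RUNS)      # imitation coefficient
adv = rng.normal(0.02, 0.005, N_RUNS)  # sectoral quantum advantage

outcomes = gdp_effect(p, q, adv)
print(np.percentile(outcomes, [5, 50, 95]))  # uncertainty band across the ensemble
```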
Sectoral TFP shocks are integrated into computational general equilibrium (CGE) or multi-sector growth models (and optionally DSGE variants) to simulate GDP, sector output, trade impacts, and labor reallocation.
Method section stating integration of sectoral TFP shocks into CGE/multi-sector growth models with optional DSGE short-run dynamics.
Sectoral adoption is translated into total factor productivity (TFP) shocks or sector-specific Hicks-neutral productivity improvements based on micro evidence of quantum advantages.
Methodological description of productivity mapping linking adoption to TFP shocks using micro evidence and case studies.
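One simple illustrative functional form for this mapping (the paper's exact specification is not given in the summary) scales the shock by adoption share, task exposure, and the measured advantage:

```python
def tfp_shock(adoption_share, quantum_advantage, task_exposure):
    """Map sectoral adoption to a Hicks-neutral TFP shock.
    Illustrative assumption: the shock is the product of the adoption share,
    the fraction of the sector's tasks that benefit, and the measured
    advantage on those tasks."""
    return adoption_share * quantum_advantage * task_exposure

# e.g., 30% adoption, 5% advantage, 10% of tasks exposed -> 0.15% TFP shock
print(tfp_shock(0.30, 0.05, 0.10))
```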
The paper uses empirical diffusion functions (logistic/S-curve, Bass model) calibrated to analogous technologies to project uptake over time.
Methodological description: diffusion modeling section explicitly states use of logistic/S-curve and Bass models and calibration to past technologies (cloud, GPUs).
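For reference, the Bass model's cumulative adoption share is F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), with innovation coefficient p and imitation coefficient q. A short sketch with illustrative parameter values (the paper calibrates to cloud/GPU diffusion; these numbers are not its estimates):

```python
import numpy as np

def bass_cumulative(t, p, q):
    """Bass cumulative adoption share F(t), innovation p and imitation q."""
    e = np.exp(-(p + q) * t)
    return (1 - e) / (1 + (q / p) * e)

# Illustrative parameters in the range often estimated for enabling technologies.
years = np.arange(0, 21)
print(bass_cumulative(years, p=0.03, q=0.38))  # S-curve from 0 toward 1
```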
The analysis used sentence‑transformer models to produce dense vector representations of article text and UMAP to project those embeddings into a low‑dimensional thematic map for cluster identification and gap detection.
Methods section specifying use of sentence‑transformer embeddings and UMAP for dimensionality reduction/visualization of article text.
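A minimal sketch of that pipeline using the sentence-transformers and umap-learn libraries; the model name and parameters below are assumptions, not those reported in the paper:

```python
from sentence_transformers import SentenceTransformer
import umap

# Placeholder corpus sized to the review's final sample (n = 109).
texts = [f"Abstract of article {i} ..." for i in range(109)]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(texts)            # one dense vector per article

reducer = umap.UMAP(n_components=2, random_state=0)
coords = reducer.fit_transform(embeddings)  # 2-D thematic map for clustering
print(coords.shape)                         # (109, 2)
```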
The study followed a PRISMA protocol for literature selection and included peer‑reviewed journal articles published between 2014 and 2024, with a final sample size of n = 109.
Explicit methodological statement in the paper describing the literature search, inclusion/exclusion criteria, and final sample.
Twenty‑seven papers study marketing in banking without using NLP methods.
PRISMA systematic review; categorization of the 109 selected articles into the three coverage groups (8, 74, 27).
Seventy‑four papers study NLP in marketing more broadly (not specifically banking).
Same PRISMA‑based systematic review and manual categorization of the final sample n = 109 into topical buckets (NLP in marketing vs. NLP in bank marketing vs. marketing in banking without NLP).
Only 8 peer‑reviewed papers directly examine NLP in bank marketing (out of a final sample of 109 articles published 2014–2024).
Systematic review following PRISMA protocol; final sample n = 109 peer‑reviewed journal articles published 2014–2024; manual screening and categorization yielding counts by topic.
The study's findings are qualitative and case-driven (Xiaomi and Deloitte); generalizability is limited by case selection and the absence of standardized quantitative metrics.
Methods section explicitly states case analysis and literature review as primary methods and notes lack of large-scale quantitative measurement.
The methodology is normative-philosophical argumentation supplemented by interdisciplinary synthesis (phenomenology, deconstruction, object-oriented ontology (OOO), and the STS/material turn); this is not an empirical causal study and contains no quantitative datasets.
Author-declared methods and limits: statement that the intervention is theory-driven and qualitative; absence of quantitative analysis reported.
The paper’s empirical grounding consists of illustrative case studies and vignettes from healthcare robotics, autonomous vehicles, and algorithmic governance used to demonstrate distributed agency and responsibility.
Author-stated methodology: qualitative vignettes/case illustrations across three domains; no reported sample sizes or systematic data collection.
The analysis in the paper is primarily qualitative and descriptive; it does not empirically quantify AI’s effects on trade flows or welfare.
Explicit statement in the methods/data description noting a mixed qualitative approach (theoretical analysis, comparative legal analysis, case studies, scenario reasoning) and absence of empirical quantification.
The study is qualitative and law-focused and uses Vietnam as a focused case study without collecting primary quantitative field data.
Explicit Data & Methods statement in the paper indicating doctrinal legal analysis, comparative institutional analysis, and normative framework development; no primary quantitative sample.
The study recommends empirical metrics for future evaluation of reforms, including processing time per case, reversal rates on appeal, administrative litigation frequency, compliance and procurement costs, investment flows into public-sector AI, and changes in labor composition and wages in administrative agencies.
Methodological recommendation arising from the paper's normative and comparative analysis.
The paper's argument is principally theoretical and prescriptive and requires empirical validation across domains and at scale.
Author-stated limitation in the Data & Methods section noting that the work is primarily conceptual and that empirical validation is needed.
Operationalizing DSS requires building domain ontologies/knowledge graphs, designing synthetic curricula, training compact domain models, benchmarking against monolithic LLMs, and measuring total cost-of-ownership (energy, latency, bandwidth, infrastructure).
Paper's recommended experimental and measurement agenda (procedural/methodological prescriptions); this is a proposed research plan rather than an empirical result.
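A toy total-cost-of-ownership comparison between a compact domain model and a monolithic LLM, in the spirit of the proposed measurement agenda; every figure below is an illustrative assumption, and latency would be benchmarked separately rather than priced directly:

```python
def tco(energy_kwh, price_per_kwh, infra_per_month, months,
        bandwidth_gb, price_per_gb):
    """Annualized TCO from energy, infrastructure, and bandwidth components."""
    return (energy_kwh * price_per_kwh
            + infra_per_month * months
            + bandwidth_gb * price_per_gb)

# Hypothetical deployments: all inputs are made-up illustrative values.
compact = tco(1_200, 0.15, 300, 12, 50, 0.09)
monolithic = tco(40_000, 0.15, 4_000, 12, 800, 0.09)
print(compact, monolithic)
```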
Analysis compared responses across 16 predefined dimension pairs (ethical dimensions or response axes) and used repeated measures and qualitative coding to characterize system behavior.
Methods and Analysis sections reporting use of 16 dimension-pair comparisons, repeated-measures tests for delta between blind and declared administrations, and qualitative coding to derive D3 failure taxonomy.
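A minimal sketch of one such dimension-pair comparison; the scores are hypothetical, and a nonparametric paired test (Wilcoxon) stands in for whatever repeated-measures test the paper actually used:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-item scores for one dimension pair, blind vs. declared.
# The study reports zero delta on all 16 pairs, mirrored here.
blind    = np.array([4.0, 3.5, 4.2, 3.8, 4.1, 3.9])
declared = np.array([4.0, 3.5, 4.2, 3.8, 4.1, 3.9])

delta = declared - blind
if np.allclose(delta, 0):
    print("zero delta: no measurable change with declaration status")
else:
    print(wilcoxon(blind, declared))  # paired nonparametric test otherwise
```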
Probe administration included operational controls: runs were administered by two human raters across three machines to ensure operational consistency.
Methods statement describing administration by two human raters on three machines.
The ceiling discrimination probe used Gemini Pro (Google) and Copilot Pro (Microsoft) as independent judges.
Methods: reported use of Gemini Pro and Copilot Pro as independent judges for the ceiling probe.
Primary blind scoring was performed by Claude (Anthropic) used as an LLM judge.
Methods: primary blind scoring explicitly performed by Claude.
Re-administration under declared conditions produced zero delta across all 16 dimension-pair comparisons (no measurable change when declaration status changed).
Reported repeated-measures comparisons across 16 predefined dimension pairs between blind and declared administrations, with reported zero delta.
Series 2 consisted of local and API open-source systems (n = 6) administered blind and declared, with four systems re-administered under declared conditions.
Methods description detailing Series 2 composition, modes (blind and declared), and that four systems were re-tested under declared conditions.
Series 1 consisted of frontier commercial systems administered blind (n = 7).
Methods description specifying Series 1 composition and blind administration.
The study employed 24 experimental conditions spanning 13 distinct LLM systems across two series.
Study design reported in Methods: Series 1 (frontier commercial, blind, n=7), Series 2 (local/API open-source, blind and declared, n=6), plus re-administered declared runs and ceiling-probe runs summing to 24 conditions.
The paper does not report proprietary deployment metrics beyond qualitative field observations; instead, experimental formalizations are provided for reproducible evaluation.
Authors explicitly note they document how to reproduce experiments but do not claim proprietary deployment metrics beyond qualitative field observations.
The paper recommends tracking specific operational and economic metrics: MTTR for tool failures, per-invocation latency variance, per-interaction operational cost, frequency of identity-related incidents, human remediation hours per 1,000 incidents, and SLA breach rates.
Explicit list of recommended metrics in the implications and metrics-to-track sections of the paper.
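A small sketch computing several of these metrics from hypothetical logs; all field names and values are assumptions, not figures from the paper:

```python
import statistics

# Hypothetical incident and invocation logs.
failure_downtimes_h = [0.5, 2.0, 0.25, 1.5]        # downtime per tool failure
invocation_latencies_ms = [120, 95, 310, 101, 98]  # per-invocation latency
remediation_hours, incidents = 42.0, 1_750
sla_breaches, sla_windows = 3, 400

mttr_h = sum(failure_downtimes_h) / len(failure_downtimes_h)   # MTTR
latency_var = statistics.variance(invocation_latencies_ms)     # latency variance
remediation_per_1k = remediation_hours / incidents * 1_000     # hours / 1,000 incidents
sla_breach_rate = sla_breaches / sla_windows                   # SLA breach rate

print(mttr_h, latency_var, remediation_per_1k, sla_breach_rate)
```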
The paper provides a production-readiness checklist and instructions for reproducible evaluation alongside the proposed mechanisms.
Deliverables enumerated in the paper include a production-readiness checklist and reproducible experimental methodology.
All three proposed mechanisms (CABP, ATBA, SERF) are formalized as testable hypotheses with reproducible experimental methodology (benchmarks, latency/error models, broker pipeline semantics).
Paper includes formal descriptions and reproducible evaluation instructions and benchmarks; authors state methods to reproduce experiments are provided.