
Firms that better harness data produce more AI patents, and data access especially helps low‑productivity Chinese digital firms close the innovation gap; the positive link holds across alternate productivity measures, though causality is not fully pinned down.

The level of data element utilization in the integration of the digital and real economies drives AI technological innovation

Chenjie Liu · April 15, 2026 · Economics & Business Management

Source: openalex · Paper type: quasi_experimental · Evidence: medium · Relevance: 7/10

Higher firm-level data element utilization is associated with greater AI patent output among Chinese digital-economy listed firms, with stronger effects for low-TFP (late‑entrant) firms indicating a catch-up dynamic.

Using a sample of Chinese A-share listed companies in core digital economy industries from 2015 to 2024, this study examines how data element utilization drives AI technological innovation. Employing a panel fixed‑effects regression model, we find that the level of data factor utilization has a significant positive impact on AI patent output. This effect is more pronounced in firms with low total factor productivity (TFP), exhibiting a "contrarian" catch‑up characteristic. The conclusions remain robust after substituting different TFP measurement methods. This study reveals the unique mechanism through which data elements enable late‑entrant firms to catch up technologically, providing empirical evidence for deepening data element market reforms.

Summary

Main Finding

Digital transformation (measured by firm annual‑report keyword intensity for digital/AI-related terms) significantly increases corporate green technological innovation among Chinese A‑share manufacturing firms (2019–2024). The effect is robust to multiple checks and operates mainly through three channels: easing financing constraints (resource effect), reducing agency problems (governance effect), and strengthening firms’ growth/absorptive capacity (multiplier effect). The positive effect is stronger for non‑high‑tech firms and firms in heavily polluting industries, and applies to both green invention patents and green utility model patents.

Key Points

Effect size (economic meaning): a one standard‑deviation rise in the digitalization measure is associated with a ~5.5% increase in independently obtained green patents (GPatent1) and a ~12.8% increase in jointly obtained green patents (GPatent2), as reported in the paper.
Mechanisms identified:
- Resource effect: digitalization reduces information asymmetry and search costs, improving access to external finance and public support for green R&D.
- Governance effect: digital management and disclosure reduce managerial discretion and agency conflicts, improving implementation of green projects.
- Multiplier effect: digital technologies improve supply‑chain integration, knowledge sharing and market responsiveness, raising firms’ capacity to undertake sustained green innovation.
Heterogeneity: the positive digital → green innovation effect is more pronounced in non‑high‑tech enterprises and in heavily polluting sectors.
Robustness: results hold under propensity score matching (PSM), instrumental variable approaches (reported), alternate variable constructions, extended observation windows, multi‑fixed effects, and sample scope adjustments.
Policy recommendations (from paper): promote firm digital transformation—especially for non‑high‑tech and polluting firms—build digital infrastructure, improve information disclosure, and subsidize digitalized green R&D.
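
The effect-size bullet above converts a regression coefficient into a percentage change. A minimal sketch of that arithmetic follows, assuming a log-transformed outcome; the function name and the input values are hypothetical and are not taken from the paper. Because the outcome here is ln(1 + patents), the percentage strictly applies to (1 + patents), so it is an approximation for the patent count itself.

```python
import math

def pct_change_from_log_outcome(beta: float, sd_x: float) -> float:
    """Percent change in a log-scale outcome implied by a one-SD rise in x.

    With ln(outcome) = ... + beta * x, raising x by sd_x multiplies the
    outcome by exp(beta * sd_x).
    """
    return 100.0 * (math.exp(beta * sd_x) - 1.0)

# Hypothetical inputs for illustration only; NOT the paper's estimates.
beta_example = 0.05   # assumed coefficient on the digitalization measure
sd_example = 1.2      # assumed standard deviation of that measure

print(f"{pct_change_from_log_outcome(beta_example, sd_example):.2f}% change")
```

For small values of beta * sd_x the result is close to 100 * beta * sd_x, which is why such effects are often quoted directly from the coefficient.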

Data & Methods

Sample: A‑share listed manufacturing firms on Shanghai and Shenzhen exchanges, initial period 2019–2024; green patents measured one period ahead (effectively using patents in the subsequent year). Final sample: 5,810 firm‑year observations after screening and winsorization.
Data sources: digital keyword frequencies from WinGo Financial Text Data Platform; green patent counts from CNRDS; firm financials, governance, industry classification and high‑tech status from CSMAR.
Dependent variables:
- GPatent1 = ln(1 + number of green patents independently obtained next period) (invention + utility models).
- GPatent2 = ln(1 + number of green patents jointly obtained next period).
Core independent variable:
- Digword = ln(1 + total frequency of 94 digital‑related keywords in firm annual reports). Keywords built from policy seed words expanded via Word2Vec.
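
The Digword definition above can be sketched as a keyword count followed by a log transform. This is a minimal illustration under stated assumptions: the three keywords are a hypothetical stand-in for the 94-term list (which is not reproduced here), matching is plain case-insensitive substring counting, and the Word2Vec expansion step is not shown.

```python
import math
import re

# Hypothetical keyword subset; the actual 94-term list is not given in this summary.
KEYWORDS = ["artificial intelligence", "big data", "cloud computing"]

def digword(report_text: str, keywords=KEYWORDS) -> float:
    """Return ln(1 + total keyword frequency) for one annual report."""
    text = report_text.lower()
    # Count non-overlapping occurrences of each keyword and sum them.
    total = sum(len(re.findall(re.escape(k.lower()), text)) for k in keywords)
    return math.log(1 + total)

sample = "We invest in big data and cloud computing; big data supports our roadmap."
print(digword(sample))
```

The ln(1 + x) form keeps reports with zero keyword hits in the sample (they map to 0) while compressing the long right tail of keyword counts.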
Empirical strategy:
- Baseline: OLS regressions with year fixed effects and a battery of firm controls (size, leverage, age, ROA, SOE indicator, top shareholder concentration, institutional ownership, cash ratio, Tobin’s Q, R&D intensity, etc.).
- Robustness/causal checks: propensity score matching (1:1 nearest neighbor), instrumental variable estimation (reported), alternative variable definitions, extended observation window, multiple fixed effects, and adjusted sample scopes.
Main baseline estimates: the Digword coefficients are positive and statistically significant (e.g., Digword ≈ 0.020 with t ≈ 3.49 in some specifications); reported R‑squared values are modest, as is typical for firm patent regressions.
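
The baseline design described above (a one-period-ahead log patent outcome regressed on Digword with year fixed effects) can be sketched as follows. This is a simplified illustration, not the paper's code: it uses a single regressor with no firm controls, and the panel layout and function names are assumptions. With one regressor, OLS with year dummies gives the same slope as regressing year-demeaned y on year-demeaned x.

```python
import math
from collections import defaultdict

def build_rows(panel):
    """Pair each firm-year's Digword with ln(1 + green patents) of the NEXT year.

    panel: dict mapping (firm, year) -> {"digword": float, "green_patents": int}
    (a hypothetical layout). Firm-years without a following year are dropped.
    """
    rows = []
    for (firm, year), rec in panel.items():
        nxt = panel.get((firm, year + 1))
        if nxt is not None:
            rows.append((year, rec["digword"], math.log(1 + nxt["green_patents"])))
    return rows

def year_fe_slope(rows):
    """Slope of y on x after subtracting year means from both variables."""
    by_year = defaultdict(list)
    for year, x, y in rows:
        by_year[year].append((x, y))
    sxy = sxx = 0.0
    for obs in by_year.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx

# Toy rows (year, x, y) with a different intercept per year and a common slope.
toy = [(2019, 1.0, 1.5), (2019, 3.0, 2.5), (2020, 2.0, 4.0), (2020, 4.0, 5.0)]
print(year_fe_slope(toy))
```

Adding the firm controls listed above would require a multivariate solver; the demeaning step stays the same.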

Implications for AI Economics

Microeconomic channel: AI and related digital technologies embedded in firm operations (captured by textual indicators) act as tangible inputs that lower frictions (information asymmetry, coordination costs) and raise firms’ capacity to adopt and generate green innovations. This provides micro‑level empirical support for theories that AI/digital adoption can induce green transition via both efficiency and organizational channels.
Financial markets & investment: improved disclosure and data flows associated with digitalization can mobilize capital toward green projects. For AI economics, this highlights a measurable feedback loop—AI/digital adoption improves financing conditions, which in turn funds more R&D and innovation.
Policy design: targeted digitalization subsidies or infrastructure investments for less digitalized, polluting, or non‑high‑tech firms may yield outsized green innovation returns. Policies that couple AI/digital vouchers with green R&D incentives could be particularly effective.
Measurement & methods: the paper illustrates a replicable approach for quantifying firm digitalization using annual‑report text analysis (seed keywords + Word2Vec expansion). Researchers in AI economics can reuse/extend this textual approach to isolate AI‑specific adoption signals and link them to economic or environmental outcomes.
Research avenues:
- Disentangle AI‑specific channels from broader “digitalization” (e.g., isolate mentions of machine learning/AI versus cloud/big data/IoT).
- Study longer‑term causal dynamics (e.g., event studies on major AI investments) and firm‑level heterogeneity (size, market power, international exposure).
- Explore complementary market effects (labor reallocation, product market competition) and welfare implications of digital‑enabled green innovation.
- Link granular measures of AI adoption (software/hardware spending, AI patenting, deployment cases) with patent quality (citations), not only counts.


Assessment

Paper Type: quasi_experimental
Evidence Strength: medium. Within-firm fixed effects and robustness to alternate TFP measures strengthen the association between data utilization and AI patent output, but the analysis lacks a clear exogenous source of variation (e.g., instrument, policy shock, or difference-in-differences) to rule out reverse causality or time-varying confounders, so causal claims are plausible but not strongly established.
Methods Rigor: medium. Use of panel FE is appropriate and robustness checks on TFP measurement are useful; however, the paper does not appear to address potential endogeneity from reverse causality (more AI patents could drive greater data utilization), omitted time-varying factors, measurement error in the data-utilization metric, or dynamic persistence in patenting with methods like IV, event study, or system GMM.
Sample: Chinese A‑share listed firms in core digital-economy industries observed 2015–2024; firm-level measures of data factor utilization (proprietary/constructed index), AI patent counts as the innovation outcome, and firm total factor productivity (TFP) estimated by different methods for heterogeneity analysis; sample restricted to listed/digital-sector firms (sample size not stated).
Themes: innovation adoption
Identification: Panel fixed-effects regression using firm-level panel data (2015–2024) to exploit within-firm variation in measured data element utilization; robustness checks include alternative TFP measurement methods. No exogenous shock, instrument, or natural experiment reported to address time-varying endogeneity.
Generalizability:
- Results pertain to listed Chinese firms in core digital-economy industries and may not generalize to private, small, or non-Chinese firms.
- The 2015–2024 period captures rapid AI diffusion in China; effects may differ in earlier or later periods or in other regulatory environments.
- AI patent counts are an imperfect proxy for AI technological progress and commercial impact.
- Measurement of 'data factor utilization' may be context- and method-specific, limiting replication across datasets.
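
The identification strategy described in this assessment (within-firm variation, compared across low- and high-TFP firms) can be sketched as below. This is an illustration under stated assumptions, not the paper's procedure: it uses a single regressor, firm fixed effects only, and a median split on a hypothetical firm-level TFP value; the paper's actual grouping rule and controls are not given in this summary.

```python
from collections import defaultdict
from statistics import median

def within_slope(obs):
    """Firm fixed-effects (within) slope of y on x.

    obs: list of (firm, x, y). Each firm's own means are subtracted, so only
    variation over time inside a firm identifies the slope.
    """
    by_firm = defaultdict(list)
    for firm, x, y in obs:
        by_firm[firm].append((x, y))
    sxy = sxx = 0.0
    for pts in by_firm.values():
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        sxy += sum((x - mx) * (y - my) for x, y in pts)
        sxx += sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

def slopes_by_tfp(obs, firm_tfp):
    """Estimate the within slope separately for low- and high-TFP firms.

    firm_tfp: dict firm -> TFP (hypothetical); firms at or below the median
    are treated as low-TFP. The median split is an assumption of this sketch.
    """
    cut = median(firm_tfp.values())
    low = [o for o in obs if firm_tfp[o[0]] <= cut]
    high = [o for o in obs if firm_tfp[o[0]] > cut]
    return within_slope(low), within_slope(high)

# Toy data: low-TFP firms A, B respond more strongly than high-TFP firms C, D.
toy_obs = [("A", 1, 0.8), ("A", 2, 1.6), ("B", 1, 5.8), ("B", 3, 7.4),
           ("C", 0, 1.0), ("C", 2, 1.4), ("D", 1, -0.8), ("D", 2, -0.6)]
toy_tfp = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
print(slopes_by_tfp(toy_obs, toy_tfp))
```

A larger slope in the low-TFP group is the pattern the summary calls catch-up. As the assessment notes, firm demeaning removes only time-invariant confounders, so the comparison remains an association.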

Claims (5)

Claim	Category	Direction	Confidence	Outcome	Details
The level of data factor utilization has a significant positive impact on AI patent output.	Innovation Output	positive	high	AI patent output	0.48
The positive effect of data factor utilization on AI patent output is more pronounced in firms with low total factor productivity (TFP), exhibiting a 'contrarian' catch-up characteristic.	Innovation Output	positive	high	AI patent output (differential effect by firm TFP level)	0.48
The conclusions remain robust after substituting different methods for measuring total factor productivity (TFP).	Innovation Output	positive	high	AI patent output (robustness to TFP measurement method)	0.48
Data elements provide a unique mechanism that enables late‑entrant firms to catch up technologically.	Innovation Output	positive	medium	technological catch‑up (proxied by AI patent output increases among late entrants)	0.05
The study analyzes Chinese A-share listed companies in core digital economy industries from 2015 to 2024 using a panel fixed‑effects regression model.	Other	null_result	high	not applicable (methodological/sample description)	0.48