Technological heft, not network centrality, exposes hidden core innovators in China’s AI sector; simulations show removing high-capability firms damages knowledge networks more than targeting structural hubs.

Technological capability and innovation network resilience: evidence from the AI industry in China

Xin Yuan, Xinmin Tian, Wei Jiang · April 27, 2026 · Humanities and Social Sciences Communications

OpenAlex · descriptive · medium evidence · 7/10 relevance

Using LDA on 282,778 Chinese AI patents to build a composite technological-capability metric, the paper finds that innovators with high technological value are sometimes peripheral in structural networks and that removing such high-capability actors degrades knowledge-network resilience more than removing topological hubs.

In an era of escalating technological complexity, identifying core innovators is critical for mapping industrial trajectories and sustaining network resilience. Existing assessments predominantly rely on patent statistics and structural network centralities. However, these metrics inherently dilute substantive technological strengths and influence, thereby obscuring hidden core innovators in knowledge-intensive domains such as the Artificial Intelligence (AI) industry. To bridge this theoretical and methodological gap, this study develops a multidimensional, knowledge-driven evaluation framework that integrates text mining with complex network analysis. Leveraging 282,778 Chinese AI patents, this study deploys Latent Dirichlet Allocation to delineate fine-grained technological domains. Our work constructs a composite technological capability metric to identify core innovators and simulate targeted disruptions across collaboration and knowledge networks. The empirical results suggest that some innovators with substantial technological value are not necessarily located at the structural center of the network, indicating that network position alone may not fully capture the technological importance of innovators. Specifically, deliberate disruption simulations show that targeted attacks based on intrinsic technological capability led to a more pronounced decline in the knowledge network than attacks based on topological baselines. These findings suggest that substantive technological competencies play an important role in shaping network resilience and complement structure-based perspectives in understanding innovation networks.

Summary

Main Finding

The paper develops a knowledge-driven, multidimensional framework (combining topic modeling, pretrained language models, and network analysis) to identify core innovators in China’s AI industry using 282,778 patents. It shows that substantive technological capability — measured across scale, quality and semantic novelty within fine-grained topic domains — does not always coincide with topological centrality in collaboration networks. Deliberate-removal simulations demonstrate that targeting innovators by intrinsic technological capability produces larger disruptions in the knowledge network than targeting by conventional topological centrality measures. Thus technological competence is a distinct and important determinant of innovation-network resilience.

Key Points

Motivation: Conventional patent-count and structural centrality measures can miss “hidden” core innovators who are peripheral socially but hold high-value, non-substitutable technical knowledge.
Three research questions:
- How to delineate fine-grained technological domains from patent text?
- How to integrate multidimensional patent features (including semantics) into a capability metric?
- Do peripheral-but-technically-strong innovators matter for network resilience?
Methods summary:
- LDA topic modeling to extract fine-grained technological knowledge domains from patent texts.
- A three-dimensional Composite Technological Capability (CTC) metric integrating scale, quality, and novelty; semantic features drawn from pretrained language models augment structured patent indicators.
- Construction of dual innovation networks: collaboration networks (co-inventorship) and knowledge networks (coupling of technological knowledge elements).
- Deliberate-attack simulations comparing removals based on CTC versus topological baselines.
Empirical finding: Some innovators with high CTC are not structurally central; removing those by CTC causes a sharper decline in the knowledge network’s functioning than removing topologically central nodes.
Contribution: Demonstrates the value of combining semantic patent analysis with network methods to reveal capability-driven vulnerabilities and to identify core innovators that structure technological trajectories.

Data & Methods

Data: 282,778 Chinese AI patent documents; China's AI industry is the empirical setting.
Text processing: Standard preprocessing (cleaning, tokenization, stopword removal, stemming) applied to patent texts.
Topic delineation: Latent Dirichlet Allocation (LDA) used to identify latent technological knowledge domains at a fine-grained level, producing topic assignments that reflect cognitive/semantic structure rather than coarse IPC categories.
Capability metric (Composite Technological Capability, CTC):
- Multi-dimensional: scale (e.g., patent output volume within domains), quality (structured indicators such as citations/claims/other quality proxies), and novelty (semantic/latent novelty measured using pretrained language models and topic-contextual metrics).
- Integration: combines structured patent statistics and unstructured semantic features to assess an innovator’s substantive technological value within each domain.
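As a rough illustration of how three such dimensions might be folded into one score, the sketch below min-max normalizes hypothetical scale, quality, and novelty indicators within a single domain and averages them. The equal weights, the normalization, the function names `min_max` and `composite_capability`, and the toy data are all assumptions made for illustration; the paper's actual CTC formula and weights are not given here.

```python
from typing import Dict, Tuple


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def composite_capability(
    indicators: Dict[str, Tuple[float, float, float]],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> Dict[str, float]:
    """Combine (scale, quality, novelty) per innovator into one score.

    ASSUMPTION: equal weights and min-max scaling are placeholders,
    not the paper's CTC formula.
    """
    # Normalize each of the three dimensions across innovators.
    columns = [
        min_max({name: vals[i] for name, vals in indicators.items()})
        for i in range(3)
    ]
    # Weighted sum of the normalized dimensions.
    return {
        name: sum(w * col[name] for w, col in zip(weights, columns))
        for name in indicators
    }


# Hypothetical innovators in one topic domain:
# (patent count, mean citations, semantic novelty score)
toy = {
    "firm_a": (120.0, 2.0, 0.10),
    "firm_b": (15.0, 9.0, 0.80),
    "lab_c": (40.0, 5.0, 0.45),
}
scores = composite_capability(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data the innovator with the fewest patents ranks first because it leads on the quality and novelty indicators, which is the kind of case a volume-only count would miss.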
Network construction:
- Collaboration network: based on co-inventorship links among organizations/individuals.
- Knowledge network: built from couplings between knowledge elements/topics inferred from patent texts (captures knowledge recombination pathways).
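One plausible reading of "coupling" is topic co-occurrence: two knowledge elements are linked whenever the same patent draws on both. The sketch below builds a weighted edge list on that basis. The co-occurrence rule, the function name `build_knowledge_network`, and the toy topic labels are assumptions, not the paper's stated construction.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_knowledge_network(
    patent_topics: Iterable[Set[str]],
) -> Dict[Tuple[str, str], int]:
    """Count how many patents couple each unordered pair of topics.

    ASSUMPTION: co-occurrence within one patent stands in for the
    paper's coupling rule, which is not specified here.
    """
    edges: Counter = Counter()
    for topics in patent_topics:
        # Sorting makes (a, b) and (b, a) the same edge key.
        for a, b in combinations(sorted(topics), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical topic assignments, e.g. the dominant LDA topics per patent
patents = [
    {"vision", "chips"},
    {"vision", "speech"},
    {"vision", "chips", "robotics"},
    {"speech"},
]
network = build_knowledge_network(patents)
print(network)
```

Edge weights here are raw patent counts; a real pipeline would likely threshold or normalize them before measuring resilience.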
Attack simulations:
- Targeted removals (node deletions) executed under different ranking rules (CTC-based vs topology-based centrality like degree/betweenness).
- Outcome metrics: degradation of the knowledge network (connectivity, information flow, or analogous resilience measures). The paper reports a more pronounced decline under CTC-targeted attacks than under topology-based ones.
Analytical novelty: integrates topic-level semantic granularity with network resilience analysis to detect capability-driven structural vulnerabilities.
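The mechanics of a targeted-removal comparison can be sketched as follows, assuming a single undirected network and using the share of nodes in the largest connected component as the resilience measure. That metric, the function names, the toy graph, and the capability scores are all hypothetical; the paper's own simulation removes innovators and tracks the knowledge network, which this simplified version does not reproduce.

```python
from typing import Dict, List, Set


def largest_component_share(adj: Dict[str, Set[str]], removed: Set[str]) -> float:
    """Fraction of all original nodes in the largest surviving component."""
    seen: Set[str] = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj)


def attack(adj: Dict[str, Set[str]], ranking: List[str], k: int) -> List[float]:
    """Remove the top-k nodes of a ranking one by one; record connectivity."""
    removed: Set[str] = set()
    curve = [largest_component_share(adj, removed)]
    for node in ranking[:k]:
        removed.add(node)
        curve.append(largest_component_share(adj, removed))
    return curve


def clique(nodes: List[str]) -> List[tuple]:
    """All unordered pairs among the given nodes."""
    return [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]


# Toy graph: two 4-node cliques joined only through a low-degree bridge "x".
edges = clique(["p", "q", "r", "s"]) + clique(["t", "u", "v", "w"])
edges += [("x", "p"), ("x", "t")]
adj: Dict[str, Set[str]] = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# HYPOTHETICAL capability scores; unlisted nodes default to 0.1.
capability = {"x": 0.9, "p": 0.6, "t": 0.5}

by_degree = sorted(adj, key=lambda n: (-len(adj[n]), n))
by_capability = sorted(adj, key=lambda n: (-capability.get(n, 0.1), n))

degree_curve = attack(adj, by_degree, k=1)
capability_curve = attack(adj, by_capability, k=1)
print(degree_curve, capability_curve)
```

The toy graph is deliberately built so that the highest-scoring node is a low-degree bridge; it shows how the two rankings can diverge and says nothing about the magnitude of the effect in the patent data.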

Implications for AI Economics

Rethinking leader identification: Policymakers, firms, and investors should supplement traditional patent-count and centrality metrics with domain-specific, semantic-aware capability measures to identify genuinely influential innovators (including peripheral but high-CTC actors).
Industrial policy & resilience: Supporting and protecting high-CTC innovators — even if they are not central in collaboration networks — can be critical for maintaining the knowledge-base and long-term resilience of national AI ecosystems.
Competition and regulation: Antitrust and strategic-tech policies should account for technological substitutability and semantic novelty (not just market share or network position). Removing or constraining capability-rich nodes can disproportionately damage technological trajectories.
R&D strategy & partnerships: Firms seeking partners or M&A targets should evaluate partners’ domain-specific CTC to better assess complementarities and non-substitutable knowledge assets.
Funding & innovation financing: Venture capital and public R&D grants might misallocate if they rely on topology or volume alone; CTC-informed screening can improve targeting of high-impact, resilience-relevant innovators.
Risk assessment & supply-chain security: For techno-strategic planning (e.g., export controls, critical tech supply chains), capability-based mapping highlights single points of failure in knowledge space that topology-only analyses miss.
Limitations to consider: Results are patent-based (subject to patenting strategy bias and time lags) and focused on China’s AI patents; further validation across countries, industries, and using additional outcome measures would strengthen generalizability.

Assessment

Paper Type: descriptive
Evidence Strength: medium. The paper analyzes a very large administrative dataset (282,778 Chinese AI patents) and combines text mining with network simulations to produce descriptive and counterfactual evidence. However, it does not employ a research design that identifies causal effects of real-world interventions, it relies on patents as an imperfect proxy for "technological capability", and its simulated attack outcomes depend on model choices and assumptions that may not be empirically validated.
Methods Rigor: medium. The workflow (LDA topic modeling to define technological domains, construction of a composite capability metric, and targeted-disruption simulations on collaboration and knowledge networks) is methodologically appropriate and leverages large-scale data. The approach is nonetheless sensitive to specification choices (topic number and labeling, weighting in the composite metric, network construction rules), and the available description gives no evidence of extensive robustness checks, validation of the capability metric against external benchmarks, or sensitivity analyses of simulation parameters.
Sample: 282,778 patents filed in China and identified as AI-related. Patent text is used for LDA topic modeling to delineate technological domains, patent assignees are mapped to innovators, and patent citation and collaboration data are used to build the knowledge and collaboration networks. The timeframe and exact years covered are not specified in the summary.
Themes: innovation, org_design
Generalizability:
- Patents are an imperfect and selective proxy for substantive technological capability (they omit tacit knowledge, trade secrets, and open-source contributions).
- The analysis is limited to Chinese AI patents and may not generalize to other countries or to multinational innovation networks.
- Definitions and coverage of "AI" within patent classification may bias which technologies and firms are included.
- Results depend on modeling choices (LDA topic count, composite-metric weights, network edge definitions) that may not hold in other datasets or sectors.
- Simulated attack dynamics are model-dependent and may not reflect real-world organizational responses, entry and exit, or policy interventions.

Claims (7)

Claim	Category	Direction	Confidence	Outcome	Details
This study develops a multidimensional, knowledge-driven evaluation framework that integrates text mining with complex network analysis to identify core innovators.	Innovation Output	positive	high	identification of core innovators	n=282778 0.18
Latent Dirichlet Allocation (LDA) on the patent texts delineates fine-grained technological domains within the Chinese AI patent corpus.	Innovation Output	positive	high	granular technological domain delineation	n=282778 0.18
A composite technological capability metric can be constructed (from textual and network information) to identify core innovators beyond simple topological measures.	Innovation Output	positive	high	ability to identify core innovators	n=282778 0.18
Some innovators with substantial technological value are not located at the structural center of the collaboration/knowledge network, indicating network position alone may not fully capture technological importance.	Innovation Output	negative	high	correspondence between technological value and network centrality	n=282778 0.18
Targeted disruption simulations based on intrinsic technological capability cause a more pronounced decline in the knowledge network than targeted attacks based on topological (structural) baselines.	Innovation Output	negative	high	decline in knowledge network (network resilience/connectivity under targeted node removal)	n=282778 0.18
Substantive technological competencies play an important role in shaping network resilience and complement structure-based perspectives in understanding innovation networks.	Innovation Output	positive	high	role of technological competency in network resilience	n=282778 0.18
Existing assessments that rely predominantly on patent statistics and structural network centralities dilute substantive technological strengths and thus can obscure hidden core innovators in knowledge-intensive domains such as AI.	Innovation Output	negative	medium	adequacy of patent-count and centrality-based assessments to capture technological importance	n=282778 0.11