Technological heft, not network centrality, exposes hidden core innovators in China’s AI sector; simulations show removing high-capability firms damages knowledge networks more than targeting structural hubs.
In an era of escalating technological complexity, identifying core innovators is critical for mapping industrial trajectories and sustaining network resilience. Existing assessments predominantly rely on patent statistics and structural network centralities. However, these metrics inherently dilute substantive technological strengths and influence, thereby obscuring hidden core innovators in knowledge-intensive domains such as the Artificial Intelligence (AI) industry. To bridge this theoretical and methodological gap, this study develops a multidimensional, knowledge-driven evaluation framework that integrates text mining with complex network analysis. Leveraging 282,778 Chinese AI patents, this study deploys Latent Dirichlet Allocation to delineate fine-grained technological domains. It then constructs a composite technological capability metric to identify core innovators and simulates targeted disruptions across collaboration and knowledge networks. The empirical results suggest that some innovators with substantial technological value are not necessarily located at the structural center of the network, indicating that network position alone may not fully capture the technological importance of innovators. Specifically, deliberate disruption simulations show that targeted attacks based on intrinsic technological capability lead to a more pronounced decline in the knowledge network than attacks based on topological baselines. These findings suggest that substantive technological competencies play an important role in shaping network resilience and complement structure-based perspectives in understanding innovation networks.
Summary
Main Finding
The paper develops a knowledge-driven, multidimensional framework (combining topic modeling, pretrained language models, and network analysis) to identify core innovators in China’s AI industry using 282,778 patents. It shows that substantive technological capability — measured across scale, quality and semantic novelty within fine-grained topic domains — does not always coincide with topological centrality in collaboration networks. Deliberate-removal simulations demonstrate that targeting innovators by intrinsic technological capability produces larger disruptions in the knowledge network than targeting by conventional topological centrality measures. Thus technological competence is a distinct and important determinant of innovation-network resilience.
Key Points
- Motivation: Conventional patent-count and structural centrality measures can miss “hidden” core innovators who are peripheral socially but hold high-value, non-substitutable technical knowledge.
- Three research questions:
  - How to delineate fine-grained technological domains from patent text?
  - How to integrate multidimensional patent features (including semantics) into a capability metric?
  - Do peripheral-but-technically-strong innovators matter for network resilience?
- Methods summary:
  - LDA topic modeling to extract fine-grained technological knowledge domains from patent texts.
  - A three-dimensional Composite Technological Capability (CTC) metric integrating scale, quality, and novelty; semantic features drawn from pretrained language models augment structured patent indicators.
  - Construction of dual innovation networks: collaboration networks (co-inventorship) and knowledge networks (coupling of technological knowledge elements).
  - Deliberate-attack simulations comparing removals based on CTC versus topological baselines.
- Empirical finding: Some innovators with high CTC are not structurally central; removing innovators in order of CTC causes a sharper decline in the knowledge network's functioning than removing topologically central nodes.
- Contribution: Demonstrates the value of combining semantic patent analysis with network methods to reveal capability-driven vulnerabilities and to identify core innovators that structure technological trajectories.
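The summary does not give the CTC formula or its weights, so the following is only a minimal sketch of how a three-dimensional capability score could be combined and compared with network position. The equal weights, the min-max normalisation, the `Innovator` fields, and the toy values are all assumptions made for illustration; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Innovator:
    name: str
    scale: float    # hypothetical: patent volume within a topic domain
    quality: float  # hypothetical: a citation- or claims-based proxy
    novelty: float  # hypothetical: a semantic novelty score
    degree: int     # number of partners in the collaboration network


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_capability(innovators, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of normalised scale, quality and novelty.

    Equal weights are an assumption, not the paper's weighting.
    """
    fields = ("scale", "quality", "novelty")
    cols = [min_max([getattr(i, f) for i in innovators]) for f in fields]
    return {
        inn.name: sum(w * col[k] for w, col in zip(weights, cols))
        for k, inn in enumerate(innovators)
    }


def hidden_core(innovators, top_k=2):
    """Names in the top_k by capability that are not in the top_k by degree."""
    ctc = composite_capability(innovators)
    by_ctc = sorted(innovators, key=lambda i: ctc[i.name], reverse=True)[:top_k]
    by_degree = sorted(innovators, key=lambda i: i.degree, reverse=True)[:top_k]
    central = {i.name for i in by_degree}
    return [i.name for i in by_ctc if i.name not in central]


# Toy data, invented purely to exercise the functions.
toy = [
    Innovator("A", scale=100, quality=2.0, novelty=0.2, degree=9),
    Innovator("B", scale=40, quality=8.0, novelty=0.9, degree=1),
    Innovator("C", scale=60, quality=5.0, novelty=0.5, degree=7),
    Innovator("D", scale=10, quality=1.0, novelty=0.1, degree=2),
]

print(hidden_core(toy))
```

In this toy set, `B` has few collaborators but scores highest on quality and novelty, which is the kind of "hidden" profile the Key Points describe. Any real application would need the paper's actual indicators and weighting.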
Data & Methods
- Data: 282,778 Chinese AI patent documents (the paper uses China's AI industry as its empirical setting).
- Text processing: Standard preprocessing (cleaning, tokenization, stopword removal, stemming) applied to patent texts.
- Topic delineation: Latent Dirichlet Allocation (LDA) used to identify latent technological knowledge domains at a fine-grained level, producing topic assignments that reflect cognitive/semantic structure rather than coarse IPC categories.
- Capability metric (Composite Technological Capability, CTC):
  - Multi-dimensional: scale (e.g., patent output volume within domains), quality (structured indicators such as citations/claims/other quality proxies), and novelty (semantic/latent novelty measured using pretrained language models and topic-contextual metrics).
  - Integration: combines structured patent statistics and unstructured semantic features to assess an innovator's substantive technological value within each domain.
- Network construction:
  - Collaboration network: based on co-inventorship links among organizations/individuals.
  - Knowledge network: built from couplings between knowledge elements/topics inferred from patent texts (captures knowledge recombination pathways).
- Attack simulations:
  - Targeted removals (node deletions) executed under different ranking rules (CTC-based vs topology-based centrality like degree/betweenness).
  - Outcome metrics: degradation of the knowledge network (connectivity, information flow, or analogous resilience measures); the paper reports a more pronounced decline under CTC-targeted attacks.
- Analytical novelty: integrates topic-level semantic granularity with network resilience analysis to detect capability-driven structural vulnerabilities.
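The mechanics of a targeted-removal simulation can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the summary does not specify the exact resilience measure, so the size of the largest connected component is used here as one common choice, and the toy graph and the `capability_ranking` list are hypothetical.

```python
from collections import deque


def largest_component_fraction(adj, removed, n_total):
    """Share of the original nodes in the largest surviving component."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best / n_total


def attack_curve(adj, ranking, steps):
    """Resilience after removing the first `steps` nodes of `ranking`, one by one."""
    removed = set()
    curve = [largest_component_fraction(adj, removed, len(adj))]
    for node in ranking[:steps]:
        removed.add(node)
        curve.append(largest_component_fraction(adj, removed, len(adj)))
    return curve


# Toy knowledge network: two triangles joined through a low-degree bridge "B".
edges = [("A", "C"), ("A", "D"), ("C", "D"), ("B", "C"),
         ("B", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Topological baseline: rank by degree (ties keep insertion order).
degree_ranking = sorted(adj, key=lambda n: len(adj[n]), reverse=True)
# Hypothetical capability ranking, invented for illustration only.
capability_ranking = ["B", "E", "C", "A", "D", "F", "G"]

print(attack_curve(adj, degree_ranking, steps=2))
print(attack_curve(adj, capability_ranking, steps=2))
```

Which ranking does more damage depends on the graph and on how many nodes are removed; in this toy case the bridge node splits the network at the first step, while the degree-based attack does more damage by the second. The comparison reported in the paper is an empirical result on the patent data and cannot be inferred from a sketch like this.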
Implications for AI Economics
- Rethinking leader identification: Policymakers, firms, and investors should supplement traditional patent-count and centrality metrics with domain-specific, semantic-aware capability measures to identify genuinely influential innovators (including peripheral but high-CTC actors).
- Industrial policy & resilience: Supporting and protecting high-CTC innovators — even if they are not central in collaboration networks — can be critical for maintaining the knowledge-base and long-term resilience of national AI ecosystems.
- Competition and regulation: Antitrust and strategic-tech policies should account for technological substitutability and semantic novelty (not just market share or network position). Removing or constraining capability-rich nodes can disproportionately damage technological trajectories.
- R&D strategy & partnerships: Firms seeking partners or M&A targets should evaluate partners’ domain-specific CTC to better assess complementarities and non-substitutable knowledge assets.
- Funding & innovation financing: Venture capital and public R&D grants might misallocate if they rely on topology or volume alone; CTC-informed screening can improve targeting of high-impact, resilience-relevant innovators.
- Risk assessment & supply-chain security: For techno-strategic planning (e.g., export controls, critical tech supply chains), capability-based mapping highlights single points of failure in knowledge space that topology-only analyses miss.
- Limitations to consider: Results are patent-based (subject to patenting strategy bias and time lags) and focused on China’s AI patents; further validation across countries, industries, and using additional outcome measures would strengthen generalizability.
Assessment
Claims (7)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| This study develops a multidimensional, knowledge-driven evaluation framework that integrates text mining with complex network analysis to identify core innovators. (Innovation Output) | positive | high | identification of core innovators | n=282778; 0.18 |
| Latent Dirichlet Allocation (LDA) on the patent texts delineates fine-grained technological domains within the Chinese AI patent corpus. (Innovation Output) | positive | high | granular technological domain delineation | n=282778; 0.18 |
| A composite technological capability metric can be constructed (from textual and network information) to identify core innovators beyond simple topological measures. (Innovation Output) | positive | high | ability to identify core innovators | n=282778; 0.18 |
| Some innovators with substantial technological value are not located at the structural center of the collaboration/knowledge network, indicating network position alone may not fully capture technological importance. (Innovation Output) | negative | high | correspondence between technological value and network centrality | n=282778; 0.18 |
| Targeted disruption simulations based on intrinsic technological capability cause a more pronounced decline in the knowledge network than targeted attacks based on topological (structural) baselines. (Innovation Output) | negative | high | decline in knowledge network (network resilience/connectivity under targeted node removal) | n=282778; 0.18 |
| Substantive technological competencies play an important role in shaping network resilience and complement structure-based perspectives in understanding innovation networks. (Innovation Output) | positive | high | role of technological competency in network resilience | n=282778; 0.18 |
| Existing assessments that rely predominantly on patent statistics and structural network centralities dilute substantive technological strengths and thus can obscure hidden core innovators in knowledge-intensive domains such as AI. (Innovation Output) | negative | medium | adequacy of patent-count and centrality-based assessments to capture technological importance | n=282778; 0.11 |