A new high‑precision patent classifier shows AI patenting has surged and converged in the U.S. and China, with China recently overtaking the U.S. in annual counts; however, the U.S. remains concentrated in large private incumbents and hubs while China’s AI patenting is more diffuse and university/SOE‑led. AI patents are associated with a substantial market‑value premium in both countries, and cross‑border citations point to continued technological interdependence rather than decoupling.

AI Patents in the United States and China: Measurement, Organization, and Knowledge Flows

Hanming Fang, Xian Gu, Hanyin Yan, Wu Zhu · April 12, 2026

arXiv · descriptive · medium evidence · 8/10 relevance

A high‑precision AI patent classifier applied to U.S. and Chinese patent corpora documents rapid, converging growth in AI patenting, with China recently outpacing the U.S. in annual counts. It also finds sharp differences in organization (the U.S. concentrated in large private incumbents and hubs; China more geographically diffuse, with larger university and SOE roles), a robust market‑value premium for AI patents among listed firms, and sustained cross‑border citation linkages in which Chinese inventors rely more on U.S. frontier knowledge than the reverse.

We develop a high-precision classifier to measure artificial intelligence (AI) patents by fine-tuning PatentSBERTa on manually labeled data from the USPTO's AI Patent Dataset. Our classifier substantially improves the existing USPTO approach, achieving 97.0% precision, 91.3% recall, and a 94.0% F1 score, and it generalizes well to Chinese patents based on citation and lexical validation. Applying it to granted U.S. patents (1976-2023) and Chinese patents (2010-2023), we document rapid growth in AI patenting in both countries and broad convergence in AI patenting intensity and subfield composition, even as China surpasses the United States in recent annual patent counts. The organization of AI innovation nevertheless differs sharply: U.S. AI patenting is concentrated among large private incumbents and established hubs, whereas Chinese AI patenting is more geographically diffuse and institutionally diverse, with larger roles for universities and state-owned enterprises. For listed firms, AI patents command a robust market-value premium in both countries. Cross-border citations show continued technological interdependence rather than decoupling, with Chinese AI inventors relying more heavily on U.S. frontier knowledge than vice versa.

Summary

Main Finding

The paper builds a high-precision AI-patent classifier (the FGYZ classifier) by fine-tuning PatentSBERTa on the USPTO’s manually labeled seed/anti-seed data and applies it to U.S. and Chinese patent universes. The FGYZ classifier substantially improves on the USPTO’s LSTM-based approach (FGYZ: precision 97.0%, recall 91.3%, F1 94.0% vs. USPTO: precision 40.5%, recall 37.5%, F1 ≈39%), generalizes to Chinese patents, and reveals (a) rapid, convergent growth in AI patenting in the U.S. and China, (b) sharp differences in organization and geography of AI innovation (U.S. concentrated in large private incumbents and established hubs; China more geographically diffuse with large roles for universities and SOEs), (c) an AI-related market-value premium for listed firms in both countries, and (d) continued cross-border technological interdependence with asymmetric knowledge flows (Chinese inventors rely more on recent U.S. frontier knowledge than vice versa).
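As a quick arithmetic check on the figures above: F1 is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the reported aggregate precision and recall. Because the inputs are rounded (and the paper's F1 may be aggregated differently, for example across subfields), agreement is only expected to within a few tenths of a point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Aggregate precision/recall as reported in this summary, as fractions.
fgyz_f1 = f1_score(precision=0.970, recall=0.913)
uspto_f1 = f1_score(precision=0.405, recall=0.375)

print(f"FGYZ F1:  {fgyz_f1:.3f}")
print(f"USPTO F1: {uspto_f1:.3f}")
```

This prints about 0.941 for FGYZ and 0.389 for the USPTO classifier, consistent with the reported 94.0% and ≈39% given rounded inputs.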

Key Points

Measurement advance
- FGYZ (fine-tuned PatentSBERTa) achieves precision 97.0%, recall 91.3%, and F1 94.0% on held-out test sets from the USPTO AI Patent Dataset (AIPD); it outperforms the USPTO LSTM classifier, which misses a majority of true AI patents and yields many false positives.
- Evolutionary Computation subfield excluded from main analysis due to limited labeled examples.
Scale and trends
- Identified AI patents: 876,668 U.S. patents (1976–2023) and 651,630 Chinese patents (2010–2023), matched to ~400,000 unique inventors.
- Rapid acceleration of AI patenting since the mid-2010s; China surpasses the U.S. in recent annual AI patent counts.
- Broad convergence in AI patenting intensity and subfield composition (planning, vision, hardware prominent); differences in timing/intensity in NLP and other subfields.
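A rough sense of overall AI patenting intensity can be had by dividing the AI counts above by the approximate corpus totals listed under Raw data. This is an illustrative sketch only: it assumes intensity means AI patents as a share of all granted patents, pools each country's full window (the windows differ in length), and uses rounded totals. The paper's annual intensity series, not this pooled ratio, are the basis for the convergence claim.

```python
# AI patent counts and approximate corpus totals taken from this summary.
ai_counts = {"US 1976-2023": 876_668, "China 2010-2023": 651_630}
granted_totals = {"US 1976-2023": 7_700_000, "China 2010-2023": 5_400_000}


def ai_intensity(ai_patents: int, all_patents: int) -> float:
    """Share of granted patents classified as AI (assumed definition)."""
    return ai_patents / all_patents


intensity = {k: ai_intensity(ai_counts[k], granted_totals[k]) for k in ai_counts}
for window, share in intensity.items():
    print(f"{window}: {share:.1%}")
```

Pooled over the full windows, both shares come out at roughly 11 to 12 percent.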
Organization and geography
- U.S.: concentrated AI patenting among a few large private incumbents (e.g., IBM, Microsoft, Google, Amazon) and stable early hubs.
- China: more institutionally diverse (prominent roles for universities and SOEs alongside large tech firms like Tencent, Baidu, Huawei) and faster spatial diffusion from pioneer cities to secondary locations.
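"Concentrated" versus "diffuse" can be made concrete with standard concentration measures. The sketch below computes the top-k assignee share and the Herfindahl-Hirschman index (HHI) on hypothetical assignee counts; these are common textbook measures and are assumptions here, not necessarily the measures the paper uses. The same functions apply unchanged to patent counts by city or region for geographic concentration.

```python
from collections import Counter


def top_k_share(counts: Counter, k: int) -> float:
    """Fraction of all patents held by the k largest assignees."""
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(k)) / total


def hhi(counts: Counter) -> float:
    """Herfindahl-Hirschman index: sum of squared shares (ranges 1/N to 1)."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical assignee patent counts, for illustration only.
concentrated = Counter({"A": 50, "B": 30, "C": 10, "D": 5, "E": 5})
diffuse = Counter({name: 10 for name in "ABCDEFGHIJ"})

for label, counts in [("concentrated", concentrated), ("diffuse", diffuse)]:
    print(label, round(top_k_share(counts, 2), 3), round(hhi(counts), 3))
```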
Economic value
- AI patents produce a robust stock-market valuation premium for listed firms in both countries, especially in data/software-intensive subfields (machine learning, NLP); absolute values higher in the U.S. but relative AI premia present in both markets.
Knowledge flows and cross-border links
- Validation shows FGYZ-identified patents are more tightly embedded in AI knowledge networks (citation connectivity, lexical similarity).
- Cross-border citations indicate sustained technological coupling: Chinese AI inventors cite U.S. frontier technologies heavily; U.S. citations to Chinese patents are more selective and concentrated outside core AI domains.
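The directional asymmetry described above amounts to comparing, for each citing country, the share of its AI patents' citations that go to the other country. A minimal sketch, assuming citations are available as (citing country, cited country) pairs; the toy data are invented, and the paper's exact normalization is not given in this summary.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def citation_shares(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """For each citing group, the share of its citations going to each cited group."""
    made = defaultdict(Counter)
    for citing, cited in pairs:
        made[citing][cited] += 1
    shares = {}
    for citing, row in made.items():
        total = sum(row.values())
        shares[citing] = {cited: n / total for cited, n in row.items()}
    return shares


# Hypothetical toy citations (not from the paper).
toy = [("CN", "US")] * 6 + [("CN", "CN")] * 4 + [("US", "US")] * 9 + [("US", "CN")] * 1
shares = citation_shares(toy)
print(shares["CN"]["US"], shares["US"]["CN"])
```

The same tabulation works with assignee types (firm, university, SOE) in place of countries, which is one way to express the university and SOE linkages with industry described under Role of non-market institutions.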
Role of non-market institutions
- U.S. universities act largely as academic enclaves (limited direct citation uptake by industry).
- Chinese universities and SOEs show dense reciprocal citation linkages with industry, implying non-market actors in China contribute substantively to economically relevant AI technologies.

Data & Methods

Raw data
- USPTO: ~7.7 million granted patents (1976–2023); CNIPA: ~5.4 million granted patents (2010–2023).
- Text fields used: abstracts and claims (plus full text where available). Forward and backward citations retrieved from Google Patents. Assignee types for Chinese patents classified using SAIC (State Administration for Industry and Commerce) data.
Classifier development (FGYZ)
- Base model: PatentSBERTa (a Sentence-BERT variant pre-trained on patent corpora).
- Training data: USPTO AIPD seed and anti-seed manual labels (eight AI subfields).
- Procedure: fine-tune PatentSBERTa with a contrastive objective; train per-subfield binary classifiers; use an 80/20 train–test split, five-fold cross-validation within the training set, hyperparameter tuning, and early stopping.
- Performance (aggregate): precision 97.0%, recall 91.3%, F1 94.0%; robust across seven of eight subfields (Evolutionary Computation underpowered).
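The data-splitting part of the procedure above can be sketched in a few lines: a reproducible 80/20 hold-out split, then five folds over the training portion for tuning and early stopping. The model itself is omitted. The split here is unstratified, and `is_ai_patent` is a hypothetical aggregation rule (a patent counts as AI if any per-subfield classifier fires) that this summary does not state; the paper may combine subfield predictions differently.

```python
import random
from typing import Dict, Iterable, Iterator, List, Tuple


def train_test_split(ids: Iterable[int], test_frac: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle ids reproducibly and hold out test_frac of them for testing."""
    pool = list(ids)
    random.Random(seed).shuffle(pool)
    n_test = round(len(pool) * test_frac)
    return pool[n_test:], pool[:n_test]


def k_fold(ids: List[int], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) pairs; every id is validated exactly once."""
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def is_ai_patent(subfield_predictions: Dict[str, bool]) -> bool:
    """Hypothetical rule: AI if any per-subfield classifier predicts positive."""
    return any(subfield_predictions.values())


train_ids, test_ids = train_test_split(range(100))
for fold_train, fold_val in k_fold(train_ids):
    # Hyperparameter tuning / early stopping would use fold_val here.
    print(len(fold_train), len(fold_val))
```

With 100 labeled patents this gives 80 training and 20 test items, and each fold trains on 64 and validates on 16. The test set is never touched during tuning, which is what makes the reported precision and recall out-of-sample.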
Validation strategies
- Citation-based connectivity: compared connectivity of patents classified as AI by FGYZ, by USPTO, by both, or by neither; FGYZ-only patents show stronger links to the high-confidence AI benchmark.
- Lexical similarity: TF–IDF–weighted word-distribution comparisons showing FGYZ-only patents align more closely in vocabulary to the AI benchmark than USPTO-only patents.
- Cross-country validation: applied FGYZ to Chinese patents and compared citation/lexical alignment with U.S. AI patents—showed strong alignment, supporting out-of-sample generalization.
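The two validation checks above can be sketched as follows, under stated assumptions: citation connectivity is taken to be the share of a patent's citation links that touch a benchmark set of high-confidence AI patents (averaged within a group), and lexical similarity is the cosine between TF-IDF vectors of pooled group vocabulary and the benchmark. The smoothed idf and both metric definitions are illustrative choices, not the paper's documented formulas; all inputs are invented toy data.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights with smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def connectivity(links: Set[str], benchmark: Set[str]) -> float:
    """Share of a patent's citation links that touch the benchmark AI set."""
    return len(links & benchmark) / len(links) if links else 0.0


def group_connectivity(group: List[str], links: Dict[str, Set[str]],
                       benchmark: Set[str]) -> float:
    scores = [connectivity(links.get(p, set()), benchmark) for p in group]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical pooled vocabularies per group (not the paper's data).
benchmark_text = "neural network training model learning".split()
fgyz_only_text = "neural network model image".split()
uspto_only_text = "database query model storage".split()
vecs = tfidf_vectors([benchmark_text, fgyz_only_text, uspto_only_text])
sim_fgyz, sim_uspto = cosine(vecs[1], vecs[0]), cosine(vecs[2], vecs[0])

# Hypothetical citation links (backward and forward pooled).
benchmark_ids = {"b1", "b2"}
citation_links = {"p1": {"b1", "b2", "x"}, "p2": {"b1", "y"}, "p3": {"x", "y"}}
print(round(sim_fgyz, 3), round(sim_uspto, 3))
print(group_connectivity(["p1", "p2"], citation_links, benchmark_ids))
```

The smoothed idf keeps terms shared by every document from receiving zero weight, which matters when only a handful of group-level documents are compared.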
Downstream analyses
- Longitudinal and geographic analyses of AI patenting intensity and diffusion.
- Institutional ownership analyses (firms, universities, SOEs).
- Market-value analyses: event-study-style valuation of patent disclosures following Kogan et al. (2017) to estimate the AI premium.
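For readers unfamiliar with the Kogan et al. (2017) approach, the sketch below implements the per-patent value formula as it is commonly stated: the firm's excess return R around the grant announcement is modeled as R = v + ε, with v a normal truncated at zero and ε normal noise. The patent's value is then (1 − π̄)⁻¹ · E[v | R] · M / N, where M is market capitalization, N is the number of patents the firm receives that day, and π̄ is the unconditional grant probability. The default π̄ = 0.56, the variance inputs, and all numbers are assumptions for illustration; this summary does not report the paper's parameter choices (including any China-specific values) or how the AI premium regression is specified.

```python
import math


def _norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _norm_sf(x: float) -> float:
    """Standard normal survival function 1 - Phi(x), via erfc for stability."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def expected_value_given_return(r: float, sigma_v: float, sigma_eps: float) -> float:
    """E[v | R=r] with v ~ N(0, sigma_v^2) truncated at 0 and R = v + eps."""
    delta = sigma_v**2 / (sigma_v**2 + sigma_eps**2)  # signal share of return variance
    s = math.sqrt(delta) * sigma_eps                    # posterior std. dev. of v
    alpha = -delta * r / s                              # standardized truncation point
    return delta * r + s * _norm_pdf(alpha) / _norm_sf(alpha)


def kpss_patent_value(r: float, market_cap: float, n_patents: int,
                      sigma_v: float, sigma_eps: float, pi_bar: float = 0.56) -> float:
    """Dollar value per patent among n_patents granted to the firm that day."""
    ev = expected_value_given_return(r, sigma_v, sigma_eps)
    return ev * market_cap / ((1 - pi_bar) * n_patents)


# Hypothetical inputs: 1% excess return, $1bn market cap, 2 patents granted.
print(round(kpss_patent_value(0.01, 1e9, 2, sigma_v=0.01, sigma_eps=0.02)))
```

The truncation makes E[v | R] positive even for negative returns, so every granted patent receives a positive value that rises with the market reaction. An AI premium would then come from comparing these values for AI and non-AI patents, typically with controls.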

Implications for AI Economics

Measurement matters
- Improved classification (FGYZ) reduces severe attenuation and misclassification bias from prior datasets; results built on USPTO AIPD alone may substantially understate AI activity or mis-attribute technological specialization.
- Researchers should use higher-precision classifiers (or the FGYZ labels, if available) when studying firm-level innovation, market value effects, or country comparisons.
Reassessing comparative innovation performance
- The finding that China now surpasses the U.S. in recent annual AI patent counts, together with the observed convergence in subfield composition, calls for re-evaluating narrative claims about “leadership” that rest only on older or noisy patent measures.
- Yet institutional and geographic divergence matters: concentration in U.S. incumbents vs. broader institutional mix in China implies different innovation dynamics and industrial policy implications.
Firm valuation and investment
- The AI-related patent premium across countries and subfields signals market recognition of AI-related intangible assets—important for corporate valuation, investor screening, and financing of AI R&D.
- Strong premia in ML/NLP suggest higher expected returns or strategic value in data- and software-intensive AI technologies.
Policy and industrial strategy
- China’s prominent roles for universities and SOEs, and dense industry–nonmarket linkages, suggest that state-sector actors can play an active role in producing economically relevant AI knowledge. This bears on how policymakers design R&D funding, technology transfer, and IP strategies.
- Persistent cross-border knowledge flows (and asymmetric dependence) imply that attempts at full technological decoupling would be costly and difficult; trade and research policies should account for continued international interdependence in knowledge.
Regional development and innovation policy
- Different spatial dynamics—U.S. persistence of early hubs vs. China’s rapid diffusion—imply different regional policies: U.S. policies might target sustaining hub strengths and mitigating winner-take-all effects, while Chinese-style diffusion suggests policies that can leverage broader geographic spillovers.
Future research and data use
- The FGYZ approach (fine-tuning a pretrained patent language model on patent text) is a promising template for building high-quality technology-specific patent indicators beyond AI.
- Researchers should combine citation, lexical, and market-value evidence when assessing technological importance, and be cautious using raw or low-precision machine labels for causal inference.


Assessment

Paper Type: descriptive
Evidence Strength: medium. The paper presents strong measurement work (a high‑precision classifier validated on multiple datasets) and comprehensive descriptive evidence on patent counts, geography, and institutional composition; however, claims about economic impact (e.g., the market-value premium) are correlational rather than causally identified, and patent counts are an imperfect proxy for innovation or productivity.
Methods Rigor: high. Methods combine manually labeled training data from the USPTO AIPD, fine‑tuning of a modern patent language model (PatentSBERTa), quantitative validation (precision/recall/F1), cross‑language validation using lexical and citation checks, and systematic application to large administrative patent corpora spanning long periods and two countries.
Sample: Training: manually labeled subset of the USPTO AI Patent Dataset, used to fine‑tune PatentSBERTa. Validation: lexical and citation checks, including on Chinese patents. Analysis: granted U.S. patents (1976–2023) and Chinese patents (2010–2023), plus a subset of listed firms for market‑value analyses and cross‑border citation networks.
Themes: innovation, org_design
Generalizability:
- Patents are an imperfect proxy for AI innovation and omit non‑patented AI activity (software, trade secrets, open source).
- Cross‑country differences in patenting incentives, legal regimes, and industry composition may affect comparability.
- The Chinese patent window (2010–2023) is shorter than the U.S. window (1976–2023), limiting long‑run comparisons.
- The market‑value premium analysis is restricted to listed firms and may not generalize to private firms or smaller entities.
- The classifier may retain biases from its training labels and from language differences despite validation.
- Counts ignore patent quality; growth in filings does not necessarily imply proportional technological or productive gains.

Claims (9)

| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We develop a high-precision classifier to measure artificial intelligence (AI) patents by fine-tuning PatentSBERTa on manually labeled data from the USPTO's AI Patent Dataset. | Other | positive | high | ability to classify patents as AI-related (classifier development) | 0.18 |
| Our classifier substantially improves the existing USPTO approach, achieving 97.0% precision, 91.3% recall, and a 94.0% F1 score. | Other | positive | high | classification performance (precision, recall, F1) | 97.0% precision; 91.3% recall; 94.0% F1 score; 0.3 |
| The classifier generalizes well to Chinese patents based on citation and lexical validation. | Other | positive | high | generalization / validity of classifier on Chinese patents | 0.18 |
| Applying the classifier to granted U.S. patents (1976-2023) and Chinese patents (2010-2023), we document rapid growth in AI patenting in both countries. | Innovation Output | positive | high | number of granted AI patents over time (patent counts) | 0.18 |
| There is broad convergence in AI patenting intensity and subfield composition between the United States and China. | Innovation Output | positive | high | AI patenting intensity and distribution across AI subfields | 0.18 |
| China surpasses the United States in recent annual AI patent counts. | Innovation Output | positive | high | annual number of AI patents (patent counts) | 0.18 |
| The organization of AI innovation differs sharply: U.S. AI patenting is concentrated among large private incumbents and established hubs, whereas Chinese AI patenting is more geographically diffuse and institutionally diverse, with larger roles for universities and state-owned enterprises. | Market Structure | mixed | high | assignee concentration, geographic diffusion, institutional composition (share of patents by firm type and location) | 0.18 |
| For listed firms, AI patents command a robust market-value premium in both countries. | Firm Revenue | positive | high | market-value premium for listed firms associated with AI patents | 0.18 |
| Cross-border citations show continued technological interdependence rather than decoupling, with Chinese AI inventors relying more heavily on U.S. frontier knowledge than vice versa. | Innovation Output | mixed | high | cross-border patent citation patterns (directional reliance on frontier knowledge) | 0.18 |