The Commonplace

A new concordance maps patents to market-facing trademarks, giving researchers a practical way to trace how patented technologies diffuse into products and markets; the validated mapping can be used to study technology adoption at firm, regional and national levels.

A concordance between patent and trademark classes to link technologies to markets
Milad Abbasiharofteh, Carolina Castaldi, Sergio Petralia · May 23, 2026 · Scientific Data
openalex · descriptive · evidence: n/a · relevance: 7/10
The authors create and validate a novel concordance linking patent technology classes to trademark market classes, enabling researchers to trace how patented technologies map into commercial markets across firms and regions.

Patent data is the preferred source of information for tracking technological change. However, the economic impact of patented technologies remains unclear unless patent data is linked to other data, which can reveal the mechanisms through which new technology diffuses. One such data source comes from the trademark filings that accompany the market introduction of new goods and services. Patent and trademark data can be combined to link given technologies to specific markets. Yet, differences in their classification systems represent a challenge. To enable linking patent and trademark data, we develop, validate and share a novel concordance between technology classes in patent records and market classes in trademark records. The concordance can be used to track the diffusion of patented technologies at the technology, firm, region, or country level, with many relevant applications in research and policy analyses of innovation.

Summary

Main Finding

The authors produce and validate a content‑based, probabilistic concordance (PAT2TM) that links patent technology classes (4‑digit CPC; 662 classes) to fine‑grained trademark market subclasses (616 Nice subclasses derived from EUIPO HDB terms). The concordance enables direct many‑to‑many mapping from patented technologies to the markets where they are commercialized, allowing researchers and policymakers to trace technology diffusion from invention to market deployment.

Key Points

  • Two public data sources: PATSTAT Global (DOCDB patent families, 1996–2020; ~21.6M documents) and EUIPO trademark filings (1996–2020; ~1.65M filings).
  • The standard 45 Nice classes were expanded into 616 Nice subclasses by clustering EUIPO HDB terms (85k terms → 62,944 unique HDB codes) using Doc2Vec (semantic vectors) within each Nice class and K‑means (elbow method).
  • Patent texts (title + abstract) and HDB terms were preprocessed (lowercasing, stopword removal, stemming, removal of sentences containing explicit negation, etc.).
  • Matching strategy: extract bigrams from HDB subclass terms and search patent titles/abstracts to create weighted co‑occurrence links (result: ~400k CPC–NiceSubclass relations before probabilistic weighting).
  • Convert co‑occurrence counts into probabilistic links using a null expectation model, producing many‑to‑many CPC → NiceSub probabilities (PAT2TM).
  • Validation: content‑based links are compared against firm‑level patent–trademark co‑occurrences and prior indirect concordances (patent ↔ industry ↔ trademark) to assess plausibility (authors report validation steps and consistency with firm portfolios).
  • Outputs shared: the PAT2TM concordance and the Nice subclass taxonomy (HBS2Nicesub).
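
The preprocessing and bigram-matching steps listed above can be illustrated with a minimal sketch. It is not the authors' code: the stopword and negation lists, the crude `stem` helper, the subclass labels ("9.3", "42.1") and the toy patent texts are all hypothetical stand-ins, and a real pipeline would use a proper stemmer and the full HDB term lists.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists (assumptions, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "is", "with", "by"}
NEGATIONS = {"not", "no", "without", "except", "excluding"}

def stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, drop sentences containing a negation word,
    strip punctuation/numbers/stopwords, then stem."""
    tokens = []
    for sentence in re.split(r"[.;!?]", text.lower()):
        words = re.findall(r"[a-z]+", sentence)
        if NEGATIONS & set(words):
            continue  # skip the whole negated sentence
        tokens.extend(stem(w) for w in words if w not in STOPWORDS)
    return tokens

def bigrams(tokens):
    """Set of adjacent token pairs."""
    return set(zip(tokens, tokens[1:]))

def count_links(patents, subclass_terms):
    """patents: list of (cpc4, text); subclass_terms: {subclass_id: [term, ...]}.
    Returns a Counter of matched-bigram counts keyed by (cpc4, subclass_id)."""
    subclass_bigrams = {
        sub: set().union(*(bigrams(preprocess(t)) for t in terms))
        for sub, terms in subclass_terms.items()
    }
    links = Counter()
    for cpc4, text in patents:
        found = bigrams(preprocess(text))
        for sub, wanted in subclass_bigrams.items():
            hits = len(found & wanted)
            if hits:
                links[(cpc4, sub)] += hits
    return links

# Toy inputs (hypothetical).
patents = [
    ("G06N", "Neural network for image recognition. The method does not use cloud storage."),
    ("A61B", "Surgical instruments for image recognition of tissue."),
]
subclass_terms = {
    "9.3": ["image recognition software", "neural network processors"],
    "42.1": ["cloud storage services"],
}
links = count_links(patents, subclass_terms)
print(dict(links))
```

Because the sentence mentioning cloud storage contains a negation, it is dropped before matching, so the toy G06N patent is linked only to the software-related subclass.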

Data & Methods

  • Data
    • Patents: EPO PATSTAT Global (DOCDB families), patents filed 1996–2020; CPC classification at 4‑digit level (662 classes).
    • Trademarks: EUIPO filings 1996–2020; Nice classification plus harmonized database (HDB) of descriptors (85,233 terms; 62,944 HDB codes).
  • Preprocessing
    • Text normalization: lowercase, remove stopwords/punctuation/numbers, remove sentences with explicit negation, stemming.
  • Creating Nice subclasses
    • For each Nice class, embed HDB terms using Doc2Vec to capture semantic similarity in context.
    • Cluster embeddings with K‑means; choose cluster count by elbow method; produce 616 Nice subclasses, each labeled with Nice class, subclass index, and share of terms.
  • Linking patents to Nice subclasses
    • Extract bigrams from HDB terms for each subclass.
    • Search patent titles and abstracts for these bigrams (title+abstract chosen for standard practice and to reduce noise).
    • Build a weighted bipartite graph of CPC 4‑digit classes ↔ Nice subclasses based on matched counts (roughly 400k initial relations).
    • Compute probabilistic link weights by comparing observed co‑occurrence to a null expectation (accounts for baseline frequencies), yielding conditional probabilities/intensities rather than binary matches.
  • Validation
    • Compare content‑based CPC ↔ NiceSubclass links with firm‑level patent–trademark portfolios (companies that file both) and with existing indirect patent↔industry↔trademark concordances.
    • The authors report diagnostics that confirm meaningful overlap and surface where content links differ from firm‑based links (useful for ecosystem analyses).
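
One way to picture the step from raw co-occurrence counts to probabilistic weights is the sketch below. The summary does not give the authors' exact formula, so this assumes a simple independence null (expected count = row total × column total / grand total), keeps only links that exceed it, and normalizes the excess within each CPC class. The function name, the input layout and the subclass labels are hypothetical.

```python
from collections import defaultdict

def probabilistic_links(counts):
    """counts: {(cpc4, subclass): observed co-occurrence count}.

    Returns {cpc4: {subclass: probability}}. A link is kept only if its
    observed count exceeds the independence expectation
    row_total * col_total / grand_total (an assumed null model).
    """
    row = defaultdict(float)
    col = defaultdict(float)
    for (cpc4, sub), n in counts.items():
        row[cpc4] += n
        col[sub] += n
    total = sum(row.values())

    excess = defaultdict(dict)
    for (cpc4, sub), n in counts.items():
        expected = row[cpc4] * col[sub] / total
        if n > expected:
            excess[cpc4][sub] = n - expected

    # Normalize the excess within each CPC class so weights sum to 1.
    return {
        cpc4: {sub: e / sum(subs.values()) for sub, e in subs.items()}
        for cpc4, subs in excess.items()
    }

# Toy counts (hypothetical).
counts = {
    ("G06N", "9.3"): 30,
    ("G06N", "42.1"): 10,
    ("A61B", "9.3"): 10,
    ("A61B", "10.2"): 50,
}
pat2tm = probabilistic_links(counts)
print(pat2tm)
```

In this toy case the A61B–9.3 link is dropped because 10 observed matches fall below the 24 expected under independence, which is the sense in which a null expectation accounts for baseline frequencies.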

Implications for AI Economics

  • Mapping AI inventions to markets
    • Researchers can identify which market subclasses are most likely downstream destinations of AI‑related patent classes (e.g., G06F, G06N, subfields in ML/AI), enabling quantitative tracking of AI commercialization.
  • Measuring diffusion and commercialization rates
    • Combine patent counts in AI CPC classes with PAT2TM probabilities to estimate expected trademark activity in particular markets — useful for measuring how patenting translates into market introductions over time and across geographies.
  • Firm strategy and commercialization gaps
    • At the firm level, compare a firm’s patent portfolio (AI technologies) to its trademark footprint (markets). Probabilistic links highlight markets where a firm’s technologies are under‑exploited (commercialization opportunities) or where other firms are active market converters.
  • Regional and industrial policy targeting
    • Use concordance to detect regional complementarities (tech specialization vs market specialization). For AI policy, this can identify local markets likely to absorb AI inventions (e.g., healthcare services, automotive systems, fintech), and inform place‑based innovation policy or ecosystem building.
  • Labor, skills, and demand forecasting
    • Link AI technology classes to downstream markets to forecast sectoral demand for skills, services, and complementary inputs as AI diffuses into consumer and business markets.
  • Competition, market structure, and welfare
    • Trace how patented AI capabilities map to consumer services and products (trademark markets) to study market concentration, barriers to entry, and the role of IP complementarities across ecosystems (specialized technology firms vs commercializer firms).
  • Advantages over indirect concordances
    • Direct, content‑based, many‑to‑many probabilistic links avoid relying on coarse industry codes; better suited for rapidly evolving domains like AI where industry boundaries blur.
  • Caveats and limitations for AI economics users
    • Geographic/office coverage: PAT2TM is based on PATSTAT global patents and EUIPO trademark filings — trademark side is EU‑centric; results may underrepresent markets outside EU trademark filings.
    • Time window: the data run through 2020, so post‑2020 AI commercialization trends (e.g., LLM deployment, new platforms) are not captured.
    • Text matching limits: matching relies on bigrams within titles/abstracts and HDB terms; some AI inventions with opaque abstracts or different phrasing could be missed, and false positives may occur.
    • Granularity choices: 4‑digit CPC and clustering parameters (Doc2Vec/K‑means) influence results — users should inspect subclass labels and probabilistic weights for their focal AI topics.
    • Validation: concordance is probabilistic and should be used as a signal (not deterministic proof) of tech→market linkages; where possible, triangulate with firm‑level portfolios, product datasets, or web/app data.
  • Practical suggestions
    • For empirical analyses, weight trademark counts by PAT2TM probabilities to estimate expected market exposure of specific AI technology classes.
    • Use the Nice subclass taxonomy to refine market categories when constructing dependent variables (e.g., market entry events, product launches).
    • Combine PAT2TM with firm identifiers (patent assignees, trademark owners) to study firm roles across technology development vs market commercialization in AI ecosystems.
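
The expected-exposure and commercialization-gap ideas above can be sketched as follows. This is an illustration under assumptions, not a published recipe: it treats the concordance as a nested dictionary of P(subclass | CPC class), and the function names, probabilities, subclass labels and firm counts are all hypothetical.

```python
def expected_market_exposure(patent_counts, pat2tm):
    """patent_counts: {cpc4: n_patents}; pat2tm: {cpc4: {subclass: probability}}.
    Returns {subclass: expected exposure} = sum over classes of n_patents * probability."""
    exposure = {}
    for cpc4, n in patent_counts.items():
        for sub, p in pat2tm.get(cpc4, {}).items():
            exposure[sub] = exposure.get(sub, 0.0) + n * p
    return exposure

def commercialization_gaps(exposure, trademark_counts):
    """Subclasses with expected exposure but no trademark filings, largest first."""
    gaps = {s: e for s, e in exposure.items() if trademark_counts.get(s, 0) == 0}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)

# Toy concordance and firm portfolio (hypothetical values).
pat2tm_toy = {
    "G06N": {"9.3": 0.75, "42.1": 0.25},
    "G06F": {"9.3": 0.5, "35.2": 0.5},
}
firm_patents = {"G06N": 8, "G06F": 6}
firm_trademarks = {"9.3": 5}  # filings per Nice subclass

exposure = expected_market_exposure(firm_patents, pat2tm_toy)
gaps = commercialization_gaps(exposure, firm_trademarks)
print(exposure)
print(gaps)
```

Subclasses that appear in `gaps` are markets the firm's patent portfolio points to but where it holds no trademarks. Given the caveats above, this should be read as a signal to investigate rather than as proof of an unexploited opportunity.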


Assessment

Paper Type: descriptive

Evidence Strength: n/a. This is a methodological/data-construction paper that does not attempt causal inference about economic outcomes; it produces a concordance and validates it rather than testing causal hypotheses.

Methods Rigor: high. The paper develops a systematic concordance between patent and trademark classification schemes and reports validation steps; creating such linkages typically requires careful mapping, lexical and statistical matching, and robustness checks across classes and jurisdictions, which the authors state they perform and share.

Sample: Administrative patent records (technology classification codes such as IPC/CPC) and trademark filings (market-classification codes such as the Nice classes) drawn from national and/or international registries; the concordance maps technology classes to trademark market classes to enable linkage at the technology, firm, region, or country level (time period and exact jurisdictions not specified in the abstract).

Themes: innovation adoption

Generalizability:
  • Concordance performance may vary across jurisdictions due to differences in patent/trademark classification practices and filing behavior.
  • Only links patented and trademarked activity; excludes non-patented innovation, trade secrets, and markets without trademark filings.
  • Changes in classification systems over time may reduce mapping accuracy for historical or future data unless updated.
  • Sectors with weak patenting or trademarking propensities (e.g., some services, informal markets) will be underrepresented.
  • Mapping errors or ambiguous class matches can bias downstream analyses if not accounted for.

Claims (8)

Claim | Direction | Confidence | Outcome | Details
Patent data is the preferred source of information for tracking technological change. [Adoption Rate] | positive | high | usefulness of patent data for tracking technological change | 0.09
The economic impact of patented technologies remains unclear unless patent data is linked to other data, which can reveal the mechanisms through which new technology diffuses. [Firm Productivity] | negative | high | clarity of economic impact of patented technologies | 0.09
Trademark filings that accompany the market introduction of new goods and services are a data source that can reveal the market introduction of technologies. [Adoption Rate] | positive | high | ability to detect market introduction of goods/services via trademark filings | 0.18
Patent and trademark data can be combined to link given technologies to specific markets. [Adoption Rate] | positive | high | linkage between technologies (patents) and markets (trademarks) | 0.18
Differences in patent and trademark classification systems represent a challenge to linking patent and trademark data. [Adoption Rate] | negative | high | difficulty of linking patent and trademark records due to classification differences | 0.18
We develop, validate and share a novel concordance between technology classes in patent records and market classes in trademark records. [Adoption Rate] | positive | high | existence and release of a concordance mapping patent technology classes to trademark market classes | 0.18
The concordance can be used to track the diffusion of patented technologies at the technology, firm, region, or country level. [Adoption Rate] | positive | high | ability to track diffusion of patented technologies across multiple aggregation levels | 0.18
The concordance has many relevant applications in research and policy analyses of innovation. [Governance And Regulation] | positive | high | potential applicability of the concordance for research and policy | 0.09

Notes