A shared AI professional vocabulary crystallized quickly in early 2024, but practitioners never coalesced into a distinct occupation; instead, AI skills diffused into existing jobs and no new 'AI Engineer' class emerged.
Occupations form and evolve faster than classification systems can track. We propose that a genuine occupation is a self-reinforcing structure (a bipartite co-attractor) in which a shared professional vocabulary makes practitioners cohesive as a group, and the cohesive group sustains the vocabulary. This co-attractor concept enables a zero-assumption method for detecting occupational emergence from resume data, requiring no predefined taxonomy or job titles: we test vocabulary cohesion and population cohesion independently, with ablation to test whether the vocabulary is the mechanism binding the population. Applied to 8.2 million US resumes (2022-2026), the method correctly identifies established occupations and reveals a striking asymmetry for AI: a cohesive professional vocabulary formed rapidly in early 2024, but the practitioner population never cohered. The pre-existing AI community dissolved as the tools went mainstream, and the new vocabulary was absorbed into existing careers rather than binding a new occupation. AI appears to be a diffusing technology, not an emerging occupation. We discuss whether introducing an "AI Engineer" occupational category could catalyze population cohesion around the already-formed vocabulary, completing the co-attractor.
Summary
Main Finding
The paper introduces a zero-assumption NLP method that defines an occupation as a bipartite "co-attractor" of people and vocabulary, and operationalizes tests for its emergence in resume data. Applied to 8.2 million US resumes (Aug 2022–Jan 2026), the method finds that AI exhibits a striking asymmetry: a tightly cohesive AI tooling vocabulary crystallized rapidly in early 2024, but the practitioner population never formed a mutually sustaining occupational cohort. In short, AI behaves like a diffusing technology absorbed across existing careers, not as a newly emergent occupation — though creating an “AI Engineer” occupational category could, in principle, catalyze population cohesion around the already-formed vocabulary.
Key Points
- Co-attractor concept: an occupation is a mutually sustaining pair — a cohesive vocabulary and a cohesive population — each necessary for the other.
- Push vs. Pull: compression or algorithmic artifacts ("push", e.g., frequent tokens like "ai") can produce apparent topics, so genuine semantic co-occurrence among terms and genuine population similarity ("pull") must be separated from these artifacts.
- Dual, symmetric tests:
  - Vocabulary cohesion: permutation-tested co-occurrence on XᵀX (term-term) to detect pull beyond frequency expectations.
  - Population cohesion: permutation-tested co-occurrence on XXᵀ (document-document) to detect groups of practitioners who are similar beyond focal terms.
- Ablation: removing the candidate vocabulary from resumes and retesting population cohesion determines whether vocabulary is necessary for the observed population cluster (tests mutual dependence).
- Trifactor NMF (X ≈ F S Gᵀ): used to independently identify document groups F (populations), term groups G (vocabularies), and the coupling S; a large diagonal entry S[k,k] indicates strong within-topic population–vocabulary coupling.
- Empirical AI result: three independent timelines show (1) rapid vocabulary lock-in (early 2024), (2) dissolution/absence of a cohesive AI practitioner population, and (3) population signals driven mainly by generic signaling terms rather than tooling vocabulary.
- Validation: the method recovers established occupations in held-out tests (the paper reports validation on two known occupations).
- Interpretation: AI is observed as a diffusing general-purpose technology rather than a self-contained emerging occupation; prior AI practitioner communities fragmented as AI tools became mainstream.
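To make the trifactor decomposition X ≈ F S Gᵀ more concrete, here is a minimal NumPy sketch. It assumes a plain squared-Frobenius objective minimized with standard multiplicative updates; the paper's exact objective, constraints (such as orthogonality), initialization, and stopping rule are not given here, and the function name `trifactor_nmf`, the toy matrix, and the iteration count are illustrative assumptions. F holds resume-to-population loadings, G holds term-to-vocabulary loadings, and S holds the coupling between the two sets of groups.

```python
import numpy as np

def trifactor_nmf(X, k, n_iter=300, eps=1e-9, seed=0):
    """Sketch of nonnegative tri-factorization X ~ F @ S @ G.T (hypothetical helper).

    Assumes a squared-Frobenius loss and multiplicative updates; no orthogonality
    or sparsity constraints are imposed, so the paper's variant may differ.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    F = rng.random((n_docs, k))   # resume -> population-group loadings
    S = rng.random((k, k))        # population-group x vocabulary-group coupling
    G = rng.random((n_terms, k))  # term -> vocabulary-group loadings
    losses = []
    for _ in range(n_iter):
        # Each factor is multiplied by (negative part) / (positive part) of its
        # gradient, which keeps every entry nonnegative; eps avoids division by zero.
        F *= (X @ G @ S.T) / (F @ (S @ (G.T @ G) @ S.T) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ (F.T @ F) @ S) + eps)
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
        losses.append(float(np.linalg.norm(X - F @ S @ G.T) ** 2))
    return F, S, G, losses

# Toy X (assumed data): resumes 0-2 use terms 0-2, resumes 3-5 use terms 3-5.
X = np.zeros((6, 6))
X[:3, :3] = 1.0
X[3:, 3:] = 1.0

F, S, G, losses = trifactor_nmf(X, k=2)
print("population groups:", F.argmax(axis=1))
print("vocabulary groups:", G.argmax(axis=1))
print("coupling S:\n", np.round(S, 2))
print("final loss:", losses[-1])
```

In an unconstrained sketch like this, the columns of F and G are identified only up to permutation, so the strongest coupling for a population group may sit off the diagonal of S; reading S[k,k] as within-topic coupling assumes the columns of F and G have first been aligned.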
Data & Methods
- Data
  - Source: BOLD platform ecosystem resumes (users of resume-builder tools).
  - Sample: 8.2 million US resumes, August 2022 – January 2026.
  - Temporal resolution: monthly windows (42 months).
- Preprocessing / representations
  - Document-term matrix X (rows = resumes, columns = terms/skills/vocabulary tokens).
  - Cosine similarity used for pairwise co-occurrence measures.
- Push vs. pull filtration (compressionless co-occurrence test)
  - Null construction: independently permute each column of X 200 times to preserve term frequencies but destroy co-occurrence structure.
  - For each term pair (or document pair), compare observed similarity to the empirical permutation distribution; mark edges significant after Benjamini–Hochberg correction.
  - Result: Boolean masks M_voc and M_pop that keep only edges driven by pull (co-occurrence beyond frequency).
- Validation of groups (hypergeometric density test)
  - Given the pull-filtered graph, compute whether a candidate vocabulary or population group has more significant internal edges than expected by chance using a hypergeometric/urn model; report density ratio (>1 indicates over-connection).
- Topic/co-cluster discovery
  - Trifactor NMF (X ≈ F S Gᵀ) to identify document clusters (F), vocabulary clusters (G), and coupling strengths (S).
  - Use S to measure within-topic coupling; use ablation (zero out vocabulary terms in X and retest population cohesion on XXᵀ) to test necessity of vocabulary for population coherence.
  - Multiple complementary diagnostics: concentration of F[:,k], cohesion on raw XXᵀ, significant vocabulary density on XᵀX, and ablation-driven changes.
- Statistical thresholds and corrections reported (permutation repeats = 200; multiple testing control via Benjamini–Hochberg).
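The pull filtration and the hypergeometric density test above can be sketched at toy scale as follows. This is an illustration under stated assumptions, not the paper's code: the helper names (`cosine_sim`, `similarity`, `pull_mask`, `density_ratio_test`), the one-sided empirical p-value (exceed + 1)/(n_perm + 1), the FDR level 0.05, and the specific urn model (the group's internal pairs drawn without replacement from all pairs, some of which are significant edges) are all assumptions. It builds dense similarity matrices, so it only suits small samples, not millions of resumes.

```python
import math
import numpy as np

def cosine_sim(A):
    """Cosine similarity between the columns of A."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # all-zero columns get similarity 0
    An = A / norms
    return An.T @ An

def similarity(X, side):
    # "vocab": term-term similarity (XᵀX side); "pop": resume-resume (XXᵀ side)
    return cosine_sim(X if side == "vocab" else X.T)

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean array of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    passes = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        cutoff = np.nonzero(passes)[0].max()  # largest rank meeting its threshold
        reject[order[: cutoff + 1]] = True
    return reject

def pull_mask(X, side="vocab", n_perm=200, alpha=0.05, seed=0, tol=1e-12):
    """Keep only pairs whose similarity beats a column-permutation null (pull)."""
    rng = np.random.default_rng(seed)
    observed = similarity(X, side)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Shuffle each term column independently: term frequencies are kept,
        # but which resumes share which terms is destroyed.
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
        exceed += similarity(Xp, side) >= observed - tol  # tol absorbs float ties
    pvals = (exceed + 1) / (n_perm + 1)  # one-sided empirical p-values
    n = observed.shape[0]
    iu = np.triu_indices(n, k=1)  # test each unordered pair once
    mask = np.zeros((n, n), dtype=bool)
    mask[iu] = benjamini_hochberg(pvals[iu], alpha)
    return mask | mask.T

def density_ratio_test(mask, group):
    """Over-connection of a candidate group under an assumed hypergeometric urn."""
    n, g = mask.shape[0], list(group)
    total_pairs = n * (n - 1) // 2
    total_edges = int(mask[np.triu_indices(n, k=1)].sum())
    group_pairs = len(g) * (len(g) - 1) // 2
    group_edges = int(mask[np.ix_(g, g)][np.triu_indices(len(g), k=1)].sum())
    expected = group_pairs * total_edges / total_pairs
    ratio = group_edges / expected if expected > 0 else float("inf")
    # P(at least group_edges edges among group_pairs pairs drawn without
    # replacement from total_pairs pairs, of which total_edges are edges)
    p = sum(
        math.comb(total_edges, i) * math.comb(total_pairs - total_edges, group_pairs - i)
        for i in range(group_edges, min(total_edges, group_pairs) + 1)
    ) / math.comb(total_pairs, group_pairs)
    return ratio, p

# Toy resumes (assumed data): terms 0-2 co-occur in resumes 0-9, terms 3-5 in
# resumes 10-19, and term 6 (a ubiquitous buzzword like "ai") is in every resume.
X = np.zeros((20, 7))
X[:10, :3] = 1
X[10:, 3:6] = 1
X[:, 6] = 1

M_voc = pull_mask(X, side="vocab")
print(M_voc[0, 1], M_voc[0, 6])  # pull edge vs. frequency-driven (push) pair
print(density_ratio_test(M_voc, [0, 1, 2]))
```

In the toy data, term 6 has a raw cosine of about 0.71 with every block term, yet it keeps no edges because its similarity is fully explained by frequency (push), while terms 0-2 keep their edges because their co-occurrence beats the permutation null (pull). For the ablation diagnostic, one could zero out a candidate vocabulary (for example `X_abl = X.copy(); X_abl[:, [0, 1, 2]] = 0`) and rerun `pull_mask(X_abl, side="pop")` to see whether the population edges survive.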
Implications for AI Economics
- Measurement and monitoring
  - Existing taxonomies (SOC, O*NET) and five-year surveys lag true occupational dynamics; the paper demonstrates a scalable, high-frequency resume-based approach to detect nascent occupations or diffusion in near real time.
  - The co-attractor tests separate buzzword-driven signals from genuine occupational formation, which is important for accurate labor market measurement and forecasting.
- Labor market interpretation
  - If AI is primarily diffusing across occupations rather than forming its own occupation, policies focusing on retraining should emphasize upskilling within existing career ladders (role-specific tool adoption, integration into domain tasks) rather than creating separate AI-only career tracks.
  - Employer classification and hiring taxonomies that treat AI expertise as a cross-cutting skill (rather than a discrete occupational label) may better reflect current labor-market structure, unless active steps are taken to institutionalize a new occupation.
- Policy and credentialing
  - Introducing a formal occupational category (e.g., “AI Engineer”) through standard occupational classification, credentialing programs, or industry hiring taxonomies could act as a coordination mechanism and potentially catalyze population cohesion around the already-formed vocabulary (i.e., complete the co-attractor). Whether this is desirable depends on trade-offs (e.g., labor market segmentation vs. clearer training pipelines).
- Workforce forecasting and education
  - Forecasts that assume rapid emergence of a unified AI occupation may overstate labor reallocation; instead, expect AI skill diffusion to alter task content and productivity within many occupations.
  - Educational and training programs should prioritize embedding AI tool fluency into domain curricula and continuing education for incumbent workers rather than primarily funding standalone “AI degrees.”
- Research and policy caveats
  - The method relies on resume text (a supply-side signal); employer-side uptake and job-posting dynamics matter for wages and demand. A combined analysis (resumes + job postings + employer surveys) would give a fuller picture.
  - Data-source considerations: BOLD’s resume sample is described as broader than LinkedIn’s, but all resume platforms have selection biases that should be acknowledged when generalizing to the whole labor force.
  - Ablation establishes the necessity of vocabulary for population cohesion in the data, not causal mechanisms in the broader labor market.
- Practical recommendation
  - Statistical occupational monitoring systems (national statistical agencies, workforce boards, and research centers) should adopt dual-side co-occurrence tests (vocabulary and population) and ablation diagnostics to distinguish occupation formation from technology diffusion. For AI-specific policy, prioritize cross-occupation upskilling pathways while tracking whether institutional actions (classification, credentials) are producing emergent co-attractor dynamics.
Assessment
Claims (8)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Occupations form and evolve faster than classification systems can track. | Adoption Rate | positive | high | speed of occupation formation / evolution relative to classification updates | n=8,200,000; 0.18 |
| A genuine occupation is a self-reinforcing structure (a bipartite co-attractor) in which a shared professional vocabulary makes practitioners cohesive as a group, and the cohesive group sustains the vocabulary. | Other | positive | high | conceptual definition of occupation formation (vocabulary ↔ population cohesion) | 0.03 |
| The co-attractor concept enables a zero-assumption method for detecting occupational emergence from resume data, requiring no predefined taxonomy or job titles: we test vocabulary cohesion and population cohesion independently, with ablation to test whether the vocabulary is the mechanism binding the population. | Adoption Rate | positive | high | ability to detect occupational emergence (via vocabulary cohesion and population cohesion metrics) | n=8,200,000; 0.18 |
| Applied to 8.2 million US resumes (2022-2026), the method correctly identifies established occupations. | Adoption Rate | positive | high | accuracy / correctness of detected occupations (established occupations identified) | n=8,200,000; 0.18 |
| For AI: a cohesive professional vocabulary formed rapidly in early 2024, but the practitioner population never cohered. | Adoption Rate | mixed | high | vocabulary cohesion (rapid formation) and population cohesion (absence of cohesion) | n=8,200,000; 0.18 |
| The pre-existing AI community dissolved as the tools went mainstream, and the new vocabulary was absorbed into existing careers rather than binding a new occupation. | Employment | negative | medium | population cohesion / absorption into existing careers (dissolution of standalone AI community) | n=8,200,000; 0.05 |
| AI appears to be a diffusing technology, not an emerging occupation. | Adoption Rate | negative | high | status of AI as technology diffusion versus occupation formation | n=8,200,000; 0.18 |
| Introducing an 'AI Engineer' occupational category could catalyze population cohesion around the already-formed vocabulary, completing the co-attractor. | Governance And Regulation | positive | high | potential for creating population cohesion (policy intervention effect) | 0.03 |