Occupation-exposure scores built from AI chat logs reflect who uses the platforms, not the workforce, and shift estimated employment effects nearly twofold across vendors and channels. Correcting for workforce composition cuts estimated impacts by up to 93%, and theoretical bounds show the bias typically masks substitution rather than exaggerating it.
A growing literature uses artificial intelligence platform conversation logs to measure occupation exposure. We show that these scores partly measure the platform user base rather than the workforce. Holding outcome, sample, controls, and estimator fixed while varying only the platform input changes the post-ChatGPT employment coefficient by a factor of 1.9, and within-vendor consumer-versus-enterprise channels produce estimates that disagree in sign. Reweighting to Bureau of Labor Statistics workforce shares attenuates estimates by 42 to 93 percent. We formalize the non-classical measurement error and derive probability limits and partial-identification bounds for employment elasticities. The bias understates substitution more than augmentation.
Summary
Main Finding
Platform-derived occupational AI-exposure scores (constructed from conversation logs) mix true occupational exposure with the platform’s user composition. That mixture creates non-classical, platform-specific measurement error that materially changes downstream estimates: holding everything else fixed, changing only the platform input alters the post‑ChatGPT employment coefficient by a factor of 1.9 and can even flip its sign. Reweighting platform shares to BLS workforce shares attenuates published effects by 42–93%. The authors formalize the bias, derive probability limits and partial‑identification bounds for the structural employment elasticity, and show the bias tends to understate substitution relative to augmentation.
Key Points
- What’s wrong with platform measures
- Platform conversation shares reflect both (a) which occupations use that platform (between‑occupation selection) and (b) which tasks platform users perform (within‑occupation selection). These are economically distinct.
- The platform selection parameter (ψi,p = platform conversation share / workforce share) departs systematically from 1 and correlates with AI‑applicability scores, producing non‑classical measurement error that is sign‑preserving but magnitude‑distorting.
- Formal decomposition (informal summary)
- Proxy exposure: Êi,p = ψi,p Ei + ηi,p + ui,p, where Ei is true exposure, ηi,p captures within‑occupation task selection, and ui,p is classical noise.
- Probability limit of OLS DiD on the proxy (equation summarized): plim β̂p = β · (λp κp) / (λp^2 κp + 1), where λp = Cov(Êp, E)/Var(E), κp = Var(E)/σv,p^2 is the signal‑to‑noise ratio, and σv,p^2 is the variance of the proxy's error term.
- Consequences: each platform has its own plim; bias is irreducible from within‑platform data alone; bias evolves over time as platform user composition changes.
- Empirical regularities and magnitudes
- Cross‑platform and within‑platform (consumer vs enterprise, and across release waves) coefficient instability: absolute employment coefficients vary by a factor of 1.9 and can differ in sign.
- Example: the Anthropic Claude consumer coefficient moved monotonically from −0.116 (late 2024) to −0.222 (early 2026).
- Platform coverage is highly non‑representative: the ratio of platform conversation density to workforce employment density spans a factor of ≈72 across major SOC groups; e.g., Computer & Mathematical occupations are massively overrepresented on some channels.
- Reweighting results: reweighting the platform shares to BLS workforce shares attenuates composite (Anthropic‑weighted) coefficients by up to 93%, rendering them statistically indistinguishable from zero, and attenuates Microsoft Copilot coefficients by 42%; the Copilot estimates remain significant because Copilot's user base is closer to workforce composition.
- Directional asymmetry
- Platforms more readily observe augmentation (ongoing users who benefit from AI) than substitution (workers displaced and no longer using the platform). Thus platform measures tend to bias toward detecting augmentation and understate substitution.
- Practical implication example
- A hypothetical $10B retraining allocation ranked by platform‑weighted exposure would direct 39% of the funds toward occupations not flagged by a workforce‑weighted ranking, misdirecting those resources.
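The probability-limit formula in the decomposition above can be checked numerically. The following is a minimal sketch, not the paper's code: it assumes a simple cross-sectional regression rather than the full DiD design, normal draws, Var(E) = 1, and a single proxy error uncorrelated with E. The function names, the two platforms, and every parameter value are hypothetical.

```python
import numpy as np


def plim_formula(beta, lam, kappa):
    """Probability limit of the OLS slope on the proxy, per the summarized formula."""
    return beta * (lam * kappa) / (lam**2 * kappa + 1.0)


def simulated_slope(beta, lam, kappa, n=200_000, seed=0):
    """OLS slope of the outcome on a simulated proxy (illustrative setup only)."""
    rng = np.random.default_rng(seed)
    true_exposure = rng.normal(0.0, 1.0, n)   # assumption: Var(E) = 1
    sigma_v = np.sqrt(1.0 / kappa)            # kappa = Var(E) / sigma_v^2
    proxy = lam * true_exposure + rng.normal(0.0, sigma_v, n)
    outcome = beta * true_exposure + rng.normal(0.0, 1.0, n)
    cov = np.cov(proxy, outcome)              # 2x2 sample covariance matrix
    return cov[0, 1] / cov[0, 0]


# Hypothetical platforms: same true beta, different (lambda, kappa).
beta_true = -0.2
platforms = {"platform_a": (1.0, 4.0), "platform_b": (2.0, 1.0)}
for name, (lam, kappa) in platforms.items():
    print(name, plim_formula(beta_true, lam, kappa),
          simulated_slope(beta_true, lam, kappa))
```

With these made-up parameters the formula implies limits of −0.16 and −0.08 for the same true β = −0.2. This illustrates how each platform can carry its own plim while the sign is preserved whenever λp is positive.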
Data & Methods
- Exposure measures and inputs
- Ten platform‑derived exposure variants: Anthropic Claude (consumer and enterprise channels across multiple sampling waves), Microsoft Copilot, and composite measures constructed by weighting the Eloundou et al. (2024) task rubric with platform conversation shares (approach similar to Massenkoff & McCrory 2026).
- Eloundou et al. rubric (LLM‑rated task applicability) held fixed in the main exercises; platform conversation shares provide the weighting.
- Benchmark and outcome data
- American Community Survey (ACS) panel 2015–2024: 13.1 million person‑year observations merged to six‑digit SOC occupations.
- Bureau of Labor Statistics Occupational Employment and Wage Statistics (OEWS) used for workforce occupational shares (benchmark for reweighting).
- Supplementary sources referenced: Anthropic AEI mappings, OpenAI/Copilot mappings, Bick, Blandin & Deming (2026) survey micro release (self‑reported at‑work AI use).
- Estimation design
- Difference‑in‑differences (DiD) specification in the style used across the AI‑and‑labor literature to capture post‑ChatGPT employment responses, run repeatedly holding outcome, controls, estimator, and rubric constant while varying only the platform weighting.
- Reweighting: replace platform conversation shares f_p(i) with BLS workforce shares f(i) (equivalently divide out ψi,p) to remove between‑occupation selection; remaining bias then reflects within‑occupation task selection.
- Theoretical derivations: algebraic decomposition of measurement error, probability limits of OLS estimates under non‑classical error, and partial‑identification bounds (baseline OLS and workforce‑reweighted estimate as endpoints), with proofs and extensions in appendices.
- Key reported metrics
- Variation in estimated employment coefficient across platforms and waves (factor 1.9, sign changes).
- Between‑occupation density ratios (up to 9.41 for specific groups; overall span ~72).
- Attenuation percentages from reweighting (42–93%).
- Partial‑identification intervals constructed under a maintained ordering assumption (that within‑occupation selection runs in the same direction as between‑occupation selection).
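The reweighting step and the interval construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the three occupation groups, all share and rubric values, the two coefficients, and the helper names are made up, and exposure is taken to be a simple product of rubric score and share.

```python
def selection_ratio(platform_share, workforce_share):
    """psi = platform conversation share / workforce share, per occupation group."""
    return {occ: platform_share[occ] / workforce_share[occ] for occ in workforce_share}


def weighted_exposure(rubric_score, shares):
    """Exposure measure: rubric score weighted by a set of occupation shares."""
    return {occ: rubric_score[occ] * shares[occ] for occ in rubric_score}


def attenuation(beta_platform, beta_reweighted):
    """Fractional shrinkage of the coefficient after reweighting."""
    return 1.0 - beta_reweighted / beta_platform


def identification_interval(beta_platform, beta_reweighted):
    """Interval with the two estimates as endpoints (ordering assumption maintained)."""
    return (min(beta_platform, beta_reweighted), max(beta_platform, beta_reweighted))


# Made-up inputs for three illustrative occupation groups.
rubric_score = {"computer_math": 0.8, "office_admin": 0.6, "transport": 0.1}
platform_share = {"computer_math": 0.40, "office_admin": 0.35, "transport": 0.25}
workforce_share = {"computer_math": 0.05, "office_admin": 0.15, "transport": 0.80}

psi = selection_ratio(platform_share, workforce_share)
raw = weighted_exposure(rubric_score, platform_share)
reweighted = weighted_exposure(rubric_score, workforce_share)
# Dividing the platform-weighted measure by psi recovers the workforce-weighted one.
recovered = {occ: raw[occ] / psi[occ] for occ in raw}

print(psi)
print(reweighted, recovered)
# Hypothetical raw and reweighted coefficients, not values from the paper.
print(attenuation(-0.20, -0.05), identification_interval(-0.20, -0.05))
```

In this toy setup, replacing the platform shares with workforce shares gives the same measure as dividing out ψ, which is the equivalence stated in the reweighting bullet. Any remaining bias would come from within‑occupation task selection, which this sketch does not model.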
Implications for AI Economics
- Measurement caution
- Platform‑derived exposure scores should not be treated as fixed occupational characteristics; they are confounded with platform user composition and can produce misleading point estimates and inference.
- Reporting a single point estimate from one platform (or one wave) is fragile: different platforms/waves can yield different magnitudes and even signs.
- Recommended practices for researchers
- Always report sensitivity to the platform input: run analyses across multiple platform inputs and waves when possible.
- Reweight platform shares to workforce shares as a baseline robustness check; report workforce‑reweighted estimates alongside raw platform estimates.
- Use partial‑identification intervals (baseline platform estimate and workforce‑reweighted estimate as endpoints) when external instruments are unavailable.
- Where feasible, pool multiple platforms or aggregate across independent platform samples to reduce Var(ψi,p) (cross‑platform aggregation can help if selection parameters are imperfectly correlated).
- Invest in mapping task‑level survey/administrative data to platform task taxonomies (build crosswalks between survey categories and O*NET tasks) or seek external instruments for identification.
- Policy and normative consequences
- Policymakers and funders using platform‑derived rankings (for retraining, targeting, regulation) risk misallocating resources because platforms overrepresent white‑collar, college‑educated occupations and underrepresent frontline/manual/disability‑prevalent occupations.
- Platform measures may understate the extent of substitution risk among vulnerable groups and thus underinform safety‑net or retraining policy.
- Broader research implications
- Heterogeneity in the empirical literature (divergent findings on employment and wages) may partly reflect differing exposure inputs rather than only real economic differences.
- Future empirical work on AI and labor should treat platform‑derived measures as noisy, platform‑specific proxies and either correct for selection (reweight, aggregate) or adopt methods that deliver bounds or identification via external variation.
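The pooling recommendation above rests on a simple variance argument, sketched here numerically. The assumptions are strong and purely illustrative: two platforms, deviations of the selection parameter from its mean drawn from a bivariate normal with equal variances and correlation rho, and an equal-weight average. None of these values or names come from the paper.

```python
import numpy as np


def selection_variances(rho, sigma=1.0, n_occ=50_000, seed=0):
    """Variance of one platform's selection deviation vs. a two-platform average."""
    rng = np.random.default_rng(seed)
    cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
    # Each row holds one occupation's deviations on the two hypothetical platforms.
    dev = rng.multivariate_normal([0.0, 0.0], cov, size=n_occ)
    single = dev[:, 0].var()
    pooled = dev.mean(axis=1).var()
    return single, pooled


for rho in (0.3, 0.9):
    single, pooled = selection_variances(rho)
    # Theory for an equal-weight average of two: sigma^2 * (1 + rho) / 2.
    print(rho, round(single, 3), round(pooled, 3), "theory:", 0.5 * (1 + rho))
```

Under these assumptions the pooled variance is σ²(1 + ρ)/2, so the gain from aggregation shrinks as the platforms' selection patterns become more correlated and disappears when ρ = 1.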
Short takeaway: platform conversation logs are valuable but endogenous proxies for occupational exposure. Without adjustments (reweighting, aggregation, or external instruments), estimates using them can be substantially biased and time‑varying; researchers and policymakers should treat single‑platform exposure measures with caution and report robustness to platform selection.
Assessment
Claims (6)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| AI platform conversation-log exposure scores partly measure the platform user base rather than the underlying workforce. [Other] | mixed | high | occupation exposure scores derived from AI platform conversation logs | 0.48 |
| Holding outcome, sample, controls, and estimator fixed while varying only the platform input changes the post-ChatGPT employment coefficient by a factor of 1.9. [Employment] | mixed | high | post-ChatGPT employment coefficient | factor of 1.9; 0.48 |
| Within-vendor consumer-versus-enterprise channels produce estimates that disagree in sign. [Employment] | mixed | high | estimated employment (or employment-related) effects derived from channel-specific exposure measures | 0.48 |
| Reweighting platform-based exposure measures to Bureau of Labor Statistics workforce shares attenuates estimates by 42 to 93 percent. [Employment] | negative | high | magnitude of employment estimates (attenuation after reweighting) | 42 to 93 percent; 0.48 |
| The paper formalizes the non-classical measurement error, deriving probability limits and partial-identification bounds for employment elasticities. [Employment] | mixed | high | employment elasticities (probability limits and partial-identification bounds) | 0.8 |
| The measurement bias understates substitution effects more than it understates augmentation effects. [Job Displacement] | negative | high | relative bias in estimated substitution versus augmentation effects on employment | 0.48 |