Occupation-exposure scores built from AI chat logs reflect who uses the platforms rather than the workforce, and estimated employment effects shift nearly twofold across vendors and channels. Correcting for workforce composition cuts estimated impacts by up to 93%, and theoretical bounds show the bias typically masks substitution rather than exaggerating it.

Who Uses AI? Platforms, Workforce, and AI Exposure

Michelle Yin, Burhan Ogut · May 20, 2026

arXiv · quasi_experimental · medium evidence · 8/10 relevance

AI platform-derived occupation-exposure scores largely capture platform user composition rather than the workforce. As a result, estimated post-ChatGPT employment effects vary substantially across platforms and channels, and they attenuate by 42–93% after reweighting to BLS workforce shares. Formal measurement-error bounds show the bias tends to understate substitution.

A growing literature uses artificial intelligence platform conversation logs to measure occupation exposure. We show that these scores partly measure the platform user base rather than the workforce. Holding outcome, sample, controls, and estimator fixed while varying only the platform input changes the post-ChatGPT employment coefficient by a factor of 1.9, and within-vendor consumer-versus-enterprise channels produce estimates that disagree in sign. Reweighting to Bureau of Labor Statistics workforce shares attenuates estimates by 42 to 93 percent. We formalize the non-classical measurement error and derive probability limits and partial-identification bounds for employment elasticities. The bias understates substitution more than augmentation.

Summary

Main Finding

Platform-derived occupational AI-exposure scores (constructed from conversation logs) mix true occupational exposure with the platform’s user composition. That mixture creates non-classical, platform-specific measurement error that materially changes downstream estimates: holding everything else fixed, changing only the platform input alters the post‑ChatGPT employment coefficient by a factor of 1.9 and can even flip its sign. Reweighting platform shares to BLS workforce shares attenuates published effects by 42–93%. The authors formalize the bias, derive probability limits and partial‑identification bounds for the structural employment elasticity, and show the bias tends to understate substitution relative to augmentation.

Key Points

What’s wrong with platform measures
- Platform conversation shares reflect both (a) which occupations use that platform (between‑occupation selection) and (b) which tasks platform users perform (within‑occupation selection). These are economically distinct.
- The platform selection parameter (ψi,p = platform conversation share / workforce share) departs systematically from 1 and correlates with AI‑applicability scores, producing non‑classical measurement error that is sign‑preserving but magnitude‑distorting.
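As a minimal sketch of the selection parameter described above, the ratio can be computed per occupation group from two share vectors. The function name, group labels, and share values are hypothetical placeholders and are not taken from the paper.

```python
def selection_ratios(platform_shares, workforce_shares):
    """Return psi = platform conversation share / workforce share per occupation."""
    ratios = {}
    for occ, workforce_share in workforce_shares.items():
        if workforce_share <= 0:
            raise ValueError(f"workforce share must be positive for {occ!r}")
        # Occupations absent from the platform data get a platform share of zero.
        ratios[occ] = platform_shares.get(occ, 0.0) / workforce_share
    return ratios


# Hypothetical shares for three illustrative occupation groups (each sums to 1).
platform_shares = {"computer_math": 0.30, "office_admin": 0.50, "food_service": 0.20}
workforce_shares = {"computer_math": 0.10, "office_admin": 0.50, "food_service": 0.40}

psi = selection_ratios(platform_shares, workforce_shares)
for occ, value in psi.items():
    print(f"{occ}: psi = {value:.2f}")
```

A ratio above 1 marks a group that is overrepresented on the platform relative to the workforce, and a ratio below 1 marks an underrepresented group.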
Formal decomposition (informal statement)
- Proxy exposure: Êi,p = ψi,p Ei + ηi,p + ui,p, where Ei is true exposure, ηi,p captures within‑occupation task selection, and ui,p is classical noise.
- Probability limit of OLS DiD on the proxy (equation summarized): plim β̂p = β · (λp κp) / (λp^2 κp + 1), where λp = Cov(Êp, E)/Var(E) and κp = Var(E)/σv,p^2 is the signal‑to‑noise ratio (σv,p^2 being the error variance).
- Consequences: each platform has its own plim; bias is irreducible from within‑platform data alone; bias evolves over time as platform user composition changes.
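The probability limit quoted in the list above can be checked numerically. The sketch below is a simplification and not the authors' design: it assumes a plain cross-sectional regression instead of a DiD, a linear proxy Ê = λE + v with independent normal noise, and hypothetical parameter values. Under those assumptions the OLS slope is Cov(Ê, y)/Var(Ê) = βλVar(E)/(λ²Var(E) + σv²), which reduces to the quoted expression after dividing through by σv².

```python
import random


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def plim_slope(beta, lam, kappa):
    """Probability limit quoted above: beta * lam * kappa / (lam^2 * kappa + 1)."""
    return beta * lam * kappa / (lam ** 2 * kappa + 1)


# Hypothetical parameter values chosen only for illustration.
BETA, LAM, SIGMA_V, N = -0.5, 1.5, 1.0, 200_000
rng = random.Random(0)

true_exposure = [rng.gauss(0.0, 1.0) for _ in range(N)]             # Var(E) = 1
proxy = [LAM * e + rng.gauss(0.0, SIGMA_V) for e in true_exposure]  # E_hat = lam*E + v
outcome = [BETA * e + rng.gauss(0.0, 1.0) for e in true_exposure]   # y = beta*E + eps

kappa = 1.0 / SIGMA_V ** 2  # Var(E) / sigma_v^2 with Var(E) = 1
estimate = ols_slope(proxy, outcome)
limit = plim_slope(BETA, LAM, kappa)
print(f"simulated slope = {estimate:.4f}, formula = {limit:.4f}")
```

With these values the estimate keeps the sign of β but is shrunk well below its magnitude, which illustrates the "sign‑preserving but magnitude‑distorting" behavior under the stated assumptions.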
Empirical regularities and magnitudes
- Cross‑platform and within‑platform (consumer vs enterprise, and across release waves) coefficient instability: absolute employment coefficients vary by a factor of 1.9 and can differ in sign.
- Example: the Anthropic Claude consumer coefficient moved monotonically from −0.116 (late 2024) to −0.222 (early 2026).
- Platform coverage is highly non‑representative: the ratio of platform conversation density to workforce employment density spans a factor of ≈72 across major SOC groups; e.g., Computer & Mathematical occupations are massively overrepresented on some channels.
- Reweighting results: reweighting the platform shares to BLS workforce shares attenuates composite (Anthropic‑weighted) coefficients by up to 93% (rendering them statistically indistinguishable from zero) and attenuates Microsoft Copilot coefficients by 42% (retaining significance because Copilot's user base is closer to workforce composition).
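The two magnitudes in this list (a ratio between coefficients and a percentage attenuation) are simple arithmetic, sketched below. The helper names are hypothetical. The first call uses the two Claude consumer coefficients quoted above; whether that particular pair underlies the reported factor of 1.9 is not stated here. The raw and reweighted values in the second call are invented for illustration.

```python
def magnitude_ratio(coef_a, coef_b):
    """Ratio of the larger absolute coefficient to the smaller one."""
    low, high = sorted([abs(coef_a), abs(coef_b)])
    if low == 0:
        raise ValueError("ratio undefined when a coefficient is zero")
    return high / low


def attenuation_pct(raw, reweighted):
    """Percent reduction in magnitude moving from the raw to the reweighted estimate."""
    if raw == 0:
        raise ValueError("attenuation undefined when the raw estimate is zero")
    return 100.0 * (1.0 - abs(reweighted) / abs(raw))


# Claude consumer coefficients quoted above (late 2024 and early 2026).
claude_ratio = magnitude_ratio(-0.116, -0.222)
print(f"ratio of magnitudes: {claude_ratio:.2f}")

# Hypothetical raw and reweighted coefficients, not values from the paper.
example_attenuation = attenuation_pct(-0.200, -0.014)
print(f"attenuation: {example_attenuation:.0f}%")
```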
Directional asymmetry
- Platforms more readily observe augmentation (ongoing users who benefit from AI) than substitution (workers displaced and no longer using the platform). Thus platform measures tend to bias toward detecting augmentation and understate substitution.
Practical implication example
- A hypothetical $10B retraining allocation ranked by platform‑weighted exposure would allocate 39% toward occupations not flagged by a workforce‑weighted ranking—i.e., misdirect resources.
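One way such a misdirected share could be computed is sketched below. The allocation rule is an assumption made for illustration (fund the top k occupations by platform‑weighted score, in proportion to that score); the paper's actual rule is not described in this summary. All scores and occupation labels are hypothetical.

```python
def top_k(scores, k):
    """Names of the k highest-scoring occupations."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def misdirected_share(platform_scores, workforce_scores, k):
    """Share of a budget, split in proportion to platform score among the
    platform top-k, that lands on occupations outside the workforce top-k."""
    funded = top_k(platform_scores, k)
    flagged = top_k(workforce_scores, k)
    total = sum(platform_scores[o] for o in funded)
    missed = sum(platform_scores[o] for o in funded - flagged)
    return missed / total


# Hypothetical exposure scores for five occupations.
platform_scores = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.2, "E": 0.1}
workforce_scores = {"A": 0.5, "B": 0.2, "C": 0.1, "D": 0.6, "E": 0.4}

share = misdirected_share(platform_scores, workforce_scores, k=3)
print(f"misdirected share: {share:.1%}")
```

Multiplying the resulting share by the total budget gives the amount directed to occupations that the workforce‑weighted ranking would not have flagged.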

Data & Methods

Exposure measures and inputs
- Ten platform‑derived exposure variants: Anthropic Claude (consumer and enterprise channels across multiple sampling waves), Microsoft Copilot, and composite measures constructed by weighting the Eloundou et al. (2024) task rubric with platform conversation shares (approach similar to Massenkoff & McCrory 2026).
- Eloundou et al. rubric (LLM‑rated task applicability) held fixed in the main exercises; platform conversation shares provide the weighting.
Benchmark and outcome data
- American Community Survey (ACS) panel 2015–2024: 13.1 million person‑year observations merged to six‑digit SOC occupations.
- Bureau of Labor Statistics Occupational Employment and Wage Statistics (OEWS) used for workforce occupational shares (benchmark for reweighting).
- Supplementary sources referenced: Anthropic AEI mappings, OpenAI/Copilot mappings, Bick, Blandin & Deming (2026) survey micro release (self‑reported at‑work AI use).
Estimation design
- Difference‑in‑differences (DiD) specification in the style used across the AI‑and‑labor literature to capture post‑ChatGPT employment responses, run repeatedly holding outcome, controls, estimator, and rubric constant while varying only the platform weighting.
- Reweighting: replace platform conversation shares f_p(i) with BLS workforce shares f(i) (equivalently divide out ψi,p) to remove between‑occupation selection; remaining bias then reflects within‑occupation task selection.
- Theoretical derivations: algebraic decomposition of measurement error, probability limits of OLS estimates under non‑classical error, and partial‑identification bounds (baseline OLS and workforce‑reweighted estimate as endpoints), with proofs and extensions in appendices.
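The reweighting step can be sketched as follows. This assumes, for illustration only, that the proxy is a rubric score weighted multiplicatively by the platform share f_p(i); under that assumption, dividing out ψ is identical to weighting the same rubric by the workforce share f(i). Names and values are hypothetical.

```python
def reweight_exposure(proxy, platform_shares, workforce_shares):
    """Divide out psi = f_p(i) / f(i) from a share-weighted exposure proxy.

    Assumes every platform and workforce share is strictly positive.
    """
    reweighted = {}
    for occ, value in proxy.items():
        psi = platform_shares[occ] / workforce_shares[occ]
        reweighted[occ] = value / psi
    return reweighted


# Hypothetical rubric scores and shares for three occupation groups.
rubric = {"computer_math": 0.8, "office_admin": 0.6, "food_service": 0.1}
platform_shares = {"computer_math": 0.30, "office_admin": 0.50, "food_service": 0.20}
workforce_shares = {"computer_math": 0.10, "office_admin": 0.50, "food_service": 0.40}

# Assumed construction: rubric score weighted multiplicatively by platform share.
proxy = {occ: rubric[occ] * platform_shares[occ] for occ in rubric}
reweighted = reweight_exposure(proxy, platform_shares, workforce_shares)

for occ in rubric:
    direct = rubric[occ] * workforce_shares[occ]  # same rubric, workforce weights
    print(f"{occ}: proxy={proxy[occ]:.3f} reweighted={reweighted[occ]:.3f} direct={direct:.3f}")
```

The reweighted and direct columns agree, which is the sense in which the two descriptions of the correction are equivalent. Any remaining bias would come from within‑occupation task selection, which this step does not address.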
Key reported metrics
- Variation in estimated employment coefficient across platforms and waves (factor 1.9, sign changes).
- Between‑occupation density ratios (up to 9.41 for specific groups; overall span ~72).
- Attenuation percentages from reweighting (42–93%).
- Partial‑identification intervals constructed under a maintained ordering assumption (that within‑occupation selection runs in the same direction as between‑occupation selection).
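Assembling such an interval from two point estimates is mechanical, as the sketch below shows. It only orders the two endpoints; it does not add sampling uncertainty, and its interpretation as a bound rests on the ordering assumption stated above. The function names and coefficient values are hypothetical.

```python
def identification_interval(baseline, reweighted):
    """Return (lower, upper) using the two estimates as endpoints."""
    return (min(baseline, reweighted), max(baseline, reweighted))


def excludes_zero(interval):
    """True when both endpoints share a sign, so zero lies outside the interval."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Hypothetical coefficients, not values reported in the paper.
baseline_estimate = -0.20
reweighted_estimate = -0.05

interval = identification_interval(baseline_estimate, reweighted_estimate)
print(interval, excludes_zero(interval))
```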

Implications for AI Economics

Measurement caution
- Platform‑derived exposure scores should not be treated as fixed occupational characteristics; they are convolved with platform user composition and can produce misleading point estimates and inference.
- Reporting a single point estimate from one platform (or one wave) is fragile: different platforms/waves can yield different magnitudes and even signs.
Recommended practices for researchers
- Always report sensitivity to the platform input: run analyses across multiple platform inputs and waves when possible.
- Reweight platform shares to workforce shares as a baseline robustness check; report workforce‑reweighted estimates alongside raw platform estimates.
- Use partial‑identification intervals (baseline platform estimate and workforce‑reweighted estimate as endpoints) when external instruments are unavailable.
- Where feasible, pool multiple platforms or aggregate across independent platform samples to reduce Var(ψi,p) (cross‑platform aggregation can help if selection parameters are imperfectly correlated).
- Invest in mapping task‑level survey/administrative data to platform task taxonomies (build crosswalks between survey categories and O*NET tasks) or seek external instruments for identification.
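The reasoning behind pooling can be made explicit with a standard variance identity. The symbols P, σ_ψ², and ρ are introduced here for illustration and are not the paper's notation: suppose P platforms share a common variance σ_ψ² of the selection parameter and a common pairwise correlation ρ.

```latex
\[
\operatorname{Var}\!\left(\bar{\psi}_{i}\right)
  = \operatorname{Var}\!\left(\frac{1}{P}\sum_{p=1}^{P}\psi_{i,p}\right)
  = \frac{\sigma_{\psi}^{2}}{P}\,\bigl[\,1 + (P-1)\,\rho\,\bigr]
\]
```

When ρ = 1 the pooled variance equals σ_ψ² and aggregation gains nothing; for any ρ < 1 it is strictly smaller. For example, P = 2 and ρ = 0.5 gives 0.75 σ_ψ².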
Policy and normative consequences
- Policymakers and funders using platform‑derived rankings (for retraining, targeting, regulation) risk misallocating resources because platforms overrepresent white‑collar, college‑educated occupations and underrepresent frontline/manual/disability‑prevalent occupations.
- Platform measures may understate the extent of substitution risk among vulnerable groups and thus underinform safety‑net or retraining policy.
Broader research implications
- Heterogeneity in the empirical literature (divergent findings on employment and wages) may partly reflect differing exposure inputs rather than only real economic differences.
- Future empirical work on AI and labor should treat platform‑derived measures as noisy, platform‑specific proxies and either correct for selection (reweight, aggregate) or adopt methods that deliver bounds or identification via external variation.

Short takeaway: platform conversation logs are valuable but endogenous proxies for occupational exposure. Without adjustments (reweighting, aggregation, or external instruments), estimates using them can be substantially biased and time‑varying; researchers and policymakers should treat single‑platform exposure measures with caution and report robustness to platform selection.

Assessment

Paper Type: quasi_experimental
Evidence Strength: medium. The paper combines theory (formal measurement-error derivations and partial-identification bounds) with empirical checks across multiple platforms and within-vendor channels and uses reweighting to official BLS workforce shares, which strengthens credibility; however, identification is observational and hinges on assumptions about the relationship between platform users and the broader workforce and about the measurement-error structure, so causal claims remain partly conditional on those assumptions.
Methods Rigor: high. The authors explicitly model non-classical measurement error, derive probability limits and partial-identification bounds, run systematic sensitivity checks by varying only the platform input, exploit within-vendor channel variation, and apply reweighting to administrative BLS data, demonstrating careful econometric and robustness work, though empirical identification still depends on representativeness assumptions.
Sample: Occupational exposure scores constructed from multiple AI platform conversation logs (multiple vendors and both consumer and enterprise channels); employment outcome data covering the post-ChatGPT period (aggregated to the labor-market units used in the analysis); and Bureau of Labor Statistics workforce share data used for reweighting and benchmarking.
Themes: labor_markets, adoption
Identification: Compare estimates of the post-ChatGPT employment effect while holding outcome, sample, controls, and estimator fixed and varying only the occupational exposure input derived from different AI platform conversation logs (different vendors and consumer vs enterprise channels); reweight platform-derived exposure measures to Bureau of Labor Statistics workforce shares; and formalize a measurement-error model to derive probability limits and partial-identification bounds for employment elasticities.
Generalizability
- Platform user bases are not representative of the overall workforce, limiting external validity to the general labor market.
- Findings may not generalize beyond the specific platforms, vendors, channels, and time window studied (early post-ChatGPT era).
- Geographic or sectoral biases in platform usage may limit applicability to other countries or industries.
- Results depend on the occupation-mapping and exposure-construction methods used for conversation logs; different coding conventions could change outcomes.
- Reweighting assumes BLS categories align meaningfully with platform-derived exposure measures, which may be imperfect.

Claims (6)

Claim	Direction	Confidence	Outcome	Details
AI platform conversation-log exposure scores partly measure the platform user base rather than the underlying workforce. Other	mixed	high	occupation exposure scores derived from AI platform conversation logs	0.48
Holding outcome, sample, controls, and estimator fixed while varying only the platform input changes the post-ChatGPT employment coefficient by a factor of 1.9. Employment	mixed	high	post-ChatGPT employment coefficient	factor of 1.9 0.48
Within-vendor consumer-versus-enterprise channels produce estimates that disagree in sign. Employment	mixed	high	estimated employment (or employment-related) effects derived from channel-specific exposure measures	0.48
Reweighting platform-based exposure measures to Bureau of Labor Statistics workforce shares attenuates estimates by 42 to 93 percent. Employment	negative	high	magnitude of employment estimates (attenuation after reweighting)	42 to 93 percent 0.48
The paper formalizes the non-classical measurement error, deriving probability limits and partial-identification bounds for employment elasticities. Employment	mixed	high	employment elasticities (probability limits and partial-identification bounds)	0.8
The measurement bias understates substitution effects more than it understates augmentation effects. Job Displacement	negative	high	relative bias in estimated substitution versus augmentation effects on employment	0.48