Augmented Human Capital: A Unified Theory and LLM-Based Measurement Framework for Cognitive Factor Decomposition in AI-Augmented Economies

This paper proposes a decomposition of human capital into three orthogonal components -- physical-manual (H^P), routine-cognitive (H^C), and augmentable-cognitive (H^A) -- and develops a production function in which AI capital interacts asymmetrically with these components: substituting for routine cognitive work while complementing augmentable cognitive work through an amplification function phi(D). I derive a corrected Mincerian wage equation and show that the standard specification is misspecified in AI-augmented economies. Using LLM-generated measures of occupational augmentability for 18,796 O*NET task statements mapped to 440 Colombian occupations, merged with household survey microdata (N = 105,517 workers), I estimate the augmented Mincer equation. The wage return to H^A increases with AI adoption in the formal sector (beta_2 = +0.051, p < 0.001), while informal workers cannot capture augmentation rents (beta_2 = -0.044). A triple interaction confirms formality as the binding mechanism (beta_{AHC x D x Formal} = +0.272, p < 0.001). The augmentation premium is strongest for experienced workers (ages 46-65) and in health and education sectors. These results provide the first developing-country evidence of cognitive factor decomposition in AI-augmented labor markets and demonstrate that the binding constraint on human-AI complementarity in the Global South is not technology access but labor market institutions.

Summary

Main Finding

This paper develops a theory and measurement framework for decomposing human capital into three cognitive factors: physical-manual (H^P), routine-cognitive (H^C, AI-substitutable), and augmentable-cognitive (H^A, AI-complementary). It shows empirically for Colombia that AI adoption raises returns to H^A only where firms invest in digital labor (i.e., in the formal sector). Key empirical results: an H^A level effect (~+9.1% per SD), a positive augmentation interaction in the formal sector (reported estimates range ≈ +0.05 to +0.16; IV 2SLS ≈ +0.234), and a negative/zero interaction in the informal sector (≈ −0.044). A triple interaction (AHC × D × Formal) strongly confirms formality as the binding channel.

Key Points

Theoretical contribution
- Proposes an orthogonal decomposition H_i = H^P ⊕ H^C ⊕ H^A.
- Introduces an amplification function ϕ(D_f) that scales the productivity of H^A with a firm’s digital labor stock D_f.
- Derives an “augmented Mincer” wage specification: log wage depends on H^A, H^A×D, H^C×D with testable signs (β2 > 0 for H^A×D; β3 < 0 for H^C×D).
- Predicts that augmentation rents accrue only where firms invest in digital labor—formal firms in dual labor markets.
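
As a rough sketch, the augmented Mincer specification described above could be written as follows. The terms in H^A, H^A × ln D, and H^C × ln D and the signs on β2 and β3 come from this summary, as do the controls (education years, experience, experience squared, sector fixed effects). The exact functional form, the symbols S_i, X_i, and μ_{s(i)}, and the omission of any further main effects are assumptions made here for illustration, not the paper's stated equation.

```latex
\ln w_i = \alpha
        + \beta_1 H^A_i
        + \beta_2 \left( H^A_i \times \ln D_i \right)
        + \beta_3 \left( H^C_i \times \ln D_i \right)
        + \gamma_1 S_i + \gamma_2 X_i + \gamma_3 X_i^2
        + \mu_{s(i)} + \varepsilon_i,
\qquad \beta_2 > 0, \quad \beta_3 < 0
```

Here S_i stands for years of education, X_i for experience, and μ_{s(i)} for the fixed effect of worker i's sector. Under this form, the return to H^A at adoption level D is β1 + β2 ln D, which is what the formal/informal comparison of β2 tests.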
Measurement innovation
- Builds an Augmented Human Capital index (AHC) by scoring 18,796 O*NET task statements with an LLM (Claude Haiku 4.5) for augmentation potential (0–100).
- Aggregates task scores to occupations using O*NET importance weights and a chained SOC→ISCO→CIUO-08 crosswalk to map to 440 Colombian occupations.
- Validation: high convergent correlations with other AI exposure indices (e.g., r = +0.86 vs. Felten AIOE, r = +0.79 vs. Eloundou GPT), negative correlations with automation measures (Frey–Osborne r ≈ −0.79), and inter-LLM reliability (Krippendorff’s α ≈ 0.71 after adjusting level bias).
Empirical evidence (Colombia, GEIH microdata)
- Sample: 105,517 workers (estimation sample after restrictions), 442 occupations, 20 sectors.
- Constructs sector×occupation AI-adoption proxy D using within-sector indicators: formality rate (0.30), mean education (0.25), mean income (0.20), large-firm share (0.25).
- Main empirical patterns:
  - AHC increases explanatory power of wage regressions substantially (R² rises from ~0.28 to ~0.44 in augmented specifications).
  - H^A × D interaction positive and significant in formal sector; negative/insignificant in informal sector.
  - Triple interaction AHC × D × Formal large and significant (paper reports +0.519, p < 0.001).
  - Heterogeneity: augmentation premium strongest for older/experienced workers (ages 46–65) and in health and education sectors.
- Identification: sector fixed effects, placebo permutation of AHC (yields null), and an IV strategy for D using pre-period (2018–19) sector capital intensity (first-stage F ≈ 229; 2SLS β2 ≈ +0.234, p = 0.008).
- Oaxaca–Blinder: AI adoption explains ~20.9% of the formal/informal wage gap (more than education’s contribution in this decomposition); AHC and AHC×D explain ~5.5%.

Data & Methods

Measurement pipeline
- Tasks scored: 18,796 unique O*NET task statements.
- LLM scoring outputs per task: augmentation potential (a_{ok} ∈ [0,100]), substitution risk (s_{ok} ∈ [0,100]), augmentation type label (decision support, information synthesis, creative amplification, etc.).
- Occupation AHC: weighted average of task augmentation scores using O*NET importance weights.
- Crosswalk: chained SOC → ISCO-08 → CIUO-08 (Colombian adaptation) covering ~99.9% of GEIH employment.
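
The aggregation step above (task scores to an occupation-level AHC via importance weights) can be sketched as a weighted mean. This is a minimal illustration, not the paper's code: the tuple layout, the function name `occupation_ahc`, the occupation codes, and the toy numbers are all hypothetical.

```python
from collections import defaultdict


def occupation_ahc(task_rows):
    """Importance-weighted mean augmentation score per occupation.

    task_rows: iterable of (occupation_code, augmentation_score, importance).
    The field layout is an assumption for illustration only.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for occ, score, importance in task_rows:
        if not 0 <= score <= 100:
            raise ValueError(f"score out of range for {occ}: {score}")
        weighted_sum[occ] += importance * score
        weight_total[occ] += importance
    # Occupations whose weights sum to zero are dropped to avoid dividing by zero.
    return {
        occ: weighted_sum[occ] / weight_total[occ]
        for occ in weighted_sum
        if weight_total[occ] > 0
    }


# Toy data, invented for illustration.
rows = [
    ("2411", 80.0, 4.0),
    ("2411", 40.0, 1.0),
    ("9112", 10.0, 3.0),
]
ahc = occupation_ahc(rows)
```

In this toy case the high-importance task dominates the first occupation's score, which is the intended effect of weighting by importance rather than averaging tasks equally.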
Microdata and construction of D
- Primary microdata: Colombia’s GEIH 2024 wave; final N ≈ 105.5k after age/income restrictions.
- AI-adoption proxy D built at sector × occupation-group cell (677 cells) from observable GEIH indicators (formality rate, mean education, mean income, large-firm share) with pre-specified weights.
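
A composite of this kind might be computed as below. The four weights (0.30, 0.25, 0.20, 0.25) are the ones reported in this summary; everything else is an assumption for illustration. In particular, the summary does not say how the indicators are put on a common scale, so min-max scaling across cells is a stand-in, and the cell names and values are invented.

```python
# Weights as reported in the summary; they sum to 1.
WEIGHTS = {
    "formality_rate": 0.30,
    "mean_education": 0.25,
    "mean_income": 0.20,
    "large_firm_share": 0.25,
}


def min_max(values):
    """Rescale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def adoption_proxy(cells):
    """cells: dict mapping cell id -> dict of the four raw indicators."""
    ids = list(cells)
    # Scaling choice is an assumption, not the paper's documented method.
    scaled = {name: min_max([cells[c][name] for c in ids]) for name in WEIGHTS}
    return {
        c: sum(WEIGHTS[name] * scaled[name][i] for name in WEIGHTS)
        for i, c in enumerate(ids)
    }


# Hypothetical sector x occupation-group cells.
cells = {
    "finance|professionals": {"formality_rate": 0.9, "mean_education": 15.0,
                              "mean_income": 4.0, "large_firm_share": 0.7},
    "retail|sales": {"formality_rate": 0.5, "mean_education": 11.0,
                     "mean_income": 2.5, "large_firm_share": 0.4},
    "agriculture|elementary": {"formality_rate": 0.1, "mean_education": 6.0,
                               "mean_income": 1.0, "large_firm_share": 0.05},
}
D = adoption_proxy(cells)
```

Because the regressions use ln D, a real pipeline would need D strictly positive; min-max scaling assigns 0 to the lowest cell, so an offset or a different normalization would be required there.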
Econometric specifications
- Baseline: augmented Mincer with controls (experience, experience², education years), sector fixed effects, and interactions H^A × ln D and H^C × ln D.
- Robustness/identification: sector FE to absorb sector-level confounders; permutation placebo tests; IV (pre-period capital intensity) to address endogeneity of D; heterogeneity and Oaxaca–Blinder decompositions; re-scoring subsample with alternate LLM (Claude Sonnet 4) to check reliability.
Validation and robustness checks
- Convergent/discriminant external validation against existing AI-exposure/automation indices.
- Inter-rater reliability across LLMs (Pearson r ≈ 0.76, Spearman ρ ≈ 0.75).
- Placebo permutations and model specifications (including restricting to formal workers) show the main pattern is robust.
- 2SLS yields larger point estimate than OLS, suggesting attenuation bias from measurement error in D.
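
The inter-LLM agreement statistics listed above (Pearson r and Spearman ρ) can be computed as in the following sketch. The scores are invented and the function names are hypothetical. It also illustrates why a systematic level bias between two models needs separate attention: shifting one model's scores by a constant leaves both correlations unchanged.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented task scores from two hypothetical scoring models.
model_a = [10.0, 20.0, 30.0, 40.0, 50.0]
model_b = [12.0, 25.0, 27.0, 45.0, 90.0]
r = pearson(model_a, model_b)
rho = spearman(model_a, model_b)
# A constant level bias in one model does not change the correlation.
r_shifted = pearson(model_a, [v + 15.0 for v in model_b])
```

Agreement measures that are sensitive to level differences, such as Krippendorff's α, would pick up the shift that these correlations ignore.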

Implications for AI Economics

Theory and measurement
- Advances human capital theory by treating cognitive capacity as a multidimensional vector (H^P, H^C, H^A), clarifying why AI simultaneously substitutes and complements different cognitive activities.
- Demonstrates a practical LLM-based measurement strategy (AHC) for the augmentable component that can be adapted to other countries and datasets.
Labor markets and distributional effects
- Shows that capture of AI-driven augmentation rents depends on firm-level investment/institutions (formality), not only individual skills or technology access—highlighting institutional constraints as central in developing economies.
- Suggests AI adoption can widen wage gaps across formal/informal sectors and across workers with different cognitive compositions, shaping inequality and wage structure.
Policy and training
- Implication for education and training: policies should target augmentable cognitive skills (H^A: complex communication, contextual judgment, synthesis) and complementarities with firm digital adoption.
- Formalization policies and encouragement of firm digital labor investments may be as important as access to AI tools for enabling workers to capture augmentation rents.
Measurement & research agenda
- Validates the use of LLMs as measurement instruments for latent task-level economic constructs, but underscores the need for inter-model validation and attention to systematic level biases.
- Opens avenues: apply the AHC framework to other developing countries, track dynamics as firm digital investments evolve, and combine task-level behavioral data (on-the-job use of AI) with AHC for stronger causal claims.

Caveats (as noted or implied in the paper)
- AHC relies on LLM task scoring and a constructed sector×occupation D proxy; measurement assumptions matter. The paper addresses this with validation, robustness checks, and IV, but remaining concerns include possible residual confounding and generalizability beyond Colombia.
- The mapping from model scores to real-world augmentability depends on how firms and workers actually deploy AI tools; uptake heterogeneity within cells could attenuate estimated effects.

Assessment

Paper Type: correlational
Evidence Strength: medium. Large sample (105,517 workers) and highly significant interaction coefficients lend statistical credibility, and the paper combines a formal theoretical model with novel LLM-derived task measures; however, identification is observational and potentially confounded by endogenous AI adoption, selection into formality, measurement error in LLM augmentability and O*NET-to-Colombia mapping, so causal claims remain tentative.
Methods Rigor: medium. Rigorous theoretical derivation and thoughtful empirical specification (augmented Mincer, triple interactions, heterogeneity by age and sector) and use of rich microdata improve credibility, but the study lacks quasi-experimental variation (IV, policy shocks, panel/fixed-effects exploiting longitudinal changes) and relies on new LLM-generated measures whose validity and robustness to prompt/mapping choices require further validation.
Sample: LLM-generated augmentability scores for 18,796 O*NET task statements mapped to 440 Colombian occupations, merged with nationally representative household survey microdata covering N = 105,517 workers across formal and informal sectors (cross-sectional; age and sectoral detail; time period not specified in the summary).
Themes: human_ai_collab, labor_markets, adoption
Identification: Cross-sectional augmented Mincer regressions that interact an occupation-level LLM-derived augmentability score with a sectoral/firm-level AI adoption indicator and a formal/informal employment indicator; identification rests on conditional exogeneity of AI adoption and augmentability after covariate controls (no instrument or natural experiment reported).
Generalizability
- Single-country (Colombia): results may not generalize to other developing or developed economies with different labor market institutions.
- O*NET (US)-based task taxonomy mapped to Colombian occupations may mischaracterize local job content.
- LLM-derived augmentability measures are novel and may be sensitive to prompt design and model selection.
- Observational, cross-sectional design limits causal generalization to dynamic adoption or long-run effects.
- Focus on wages excludes employment/occupational mobility outcomes and firm-level heterogeneity.

Claims (11)

Claim	Category	Direction	Confidence	Outcome	Details
The paper proposes a decomposition of human capital into three orthogonal components: physical-manual (H^P), routine-cognitive (H^C), and augmentable-cognitive (H^A).	Skill Acquisition	positive	high	human capital decomposition (H^P, H^C, H^A)	0.05
AI capital interacts asymmetrically with those components: it substitutes for routine cognitive work (H^C) while complementing augmentable cognitive work (H^A) through an amplification function phi(D).	Automation Exposure	positive	high	AI–human capital complementarity / substitution (impact on task allocation/automation exposure)	0.3
I derive a corrected Mincerian wage equation and show that the standard specification is misspecified in AI-augmented economies.	Wages	positive	high	wage equation specification / wage determination	0.5
The empirical analysis uses LLM-generated measures of occupational augmentability for 18,796 O*NET task statements mapped to 440 Colombian occupations, merged with household survey microdata (N = 105,517 workers).	Other	positive	high	occupational augmentability measure / dataset construction	n=18796 0.5
In the estimated augmented Mincer equation, the wage return to augmentable-cognitive capital (H^A) increases with AI adoption in the formal sector (beta_2 = +0.051, p < 0.001).	Wages	positive	high	wages (return to H^A conditional on AI adoption and formality)	n=105517 +0.051 0.3
Informal workers cannot capture augmentation rents: the estimated coefficient for H^A in informal sector is negative (beta_2 = -0.044).	Wages	negative	high	wages (return to H^A for informal workers)	n=105517 -0.044 0.3
A triple interaction confirms formality as the binding mechanism: beta_{AHC x D x Formal} = +0.272 (p < 0.001).	Wages	positive	high	wages (interaction effect showing formal-sector amplification of H^A returns with AI adoption)	n=105517 +0.272 0.3
The augmentation premium (return to H^A with AI) is strongest for experienced workers (ages 46-65).	Wages	positive	high	wages (heterogeneous augmentation premium by age cohort)	n=105517 0.3
The augmentation premium is strongest in the health and education sectors.	Wages	positive	high	wages (sectoral heterogeneity of augmentation premium)	n=105517 0.3
These results provide the first developing-country evidence of cognitive factor decomposition in AI-augmented labor markets.	Other	positive	high	evidence of cognitive factor decomposition in AI-augmented labor markets (novelty claim)	n=105517 0.15
The binding constraint on human–AI complementarity in the Global South is not technology access but labor market institutions (formality).	Governance And Regulation	positive	high	binding constraint on human–AI complementarity (institutional vs. technological)	n=105517 0.3

AI raises wages for workers whose tasks are augmentable — but only in the formal sector; informal workers do not capture augmentation rents, implying labor market institutions, not technology access, are the binding constraint in the Global South.