The Commonplace

A simple dynamical model calibrated to PISA and domain data finds that humans risk abrupt loss of core skills once AI capability reaches roughly 85% of human benchmark performance, but scheduled practice or occasional AI outages can largely avert the collapse.

The enrichment paradox: critical capability thresholds and irreversible dependency in human-AI symbiosis
Jeongju Park, Musu Kim, Sekyung Han · March 25, 2026
arXiv · theoretical · medium evidence · 7/10 relevance
A two-variable dynamical model calibrated to domain data and PISA suggests that there is a critical AI-capability threshold (K* ≈ 0.85) beyond which human capability collapses abruptly, and that periodic AI failures or mandated practice can substantially preserve human capability.

As artificial intelligence assumes cognitive labor, no quantitative framework predicts when human capability loss becomes catastrophic. We present a two-variable dynamical systems model coupling capability (H) and delegation (D), grounded in three axioms: learning requires capability, learning requires practice, and disuse causes forgetting. Calibrated to four domains (education, medicine, navigation, aviation), the model identifies a critical threshold K* ≈ 0.85 (scope-dependent; broader AI scope lowers K*) beyond which capability collapses abruptly: the "enrichment paradox." Validated against 15 countries' PISA data (102 points, R^2 = 0.946, 3 parameters, lowest BIC), the model predicts that periodic AI failures improve capability 2.7-fold and that 20% mandatory practice preserves 92% more capability than the simulation baseline (which includes a 5% background AI-failure rate). These findings provide quantitative foundations for AI capability-threshold governance.
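For readers weighing the "3 parameters, lowest BIC" claim, the standard Gaussian-error forms of the two fit statistics are shown below. This is an assumption about how BIC was computed; the paper's exact likelihood may differ. With n = 102 observations, every extra parameter costs ln 102 ≈ 4.62 BIC points.

```latex
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n,
\qquad
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}

\Delta\mathrm{BIC}_{4\,\text{vs}\,3}
  = 102 \ln\!\left(\frac{\mathrm{RSS}_4}{\mathrm{RSS}_3}\right) + \ln 102 < 0
  \iff \mathrm{RSS}_4 < e^{-\ln(102)/102}\,\mathrm{RSS}_3 \approx 0.956\,\mathrm{RSS}_3
```

So a hypothetical 4-parameter alternative would need to cut the residual sum of squares by more than about 4.4% to beat a 3-parameter model on this criterion.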

Summary

Main Finding

The paper develops a minimal two-variable dynamical model of human capability (H) and delegation to AI (D) that predicts a sharp critical capability threshold K* ≈ 0.85, robust across the calibrated domains. When AI capability K surpasses K*, human capability can collapse abruptly (the "enrichment paradox"), producing near-irreversible dependency. The model is calibrated to multiple empirical domains and to population-level PISA data, and it yields actionable policy conclusions: modest mandatory manual-practice quotas (e.g., 20% of tasks) and/or engineered periodic AI failures can substantially preserve human capability.

Key Points

  • Model summary

    • State variables: human capability H(t) ∈ [0,1] and delegation rate D(t) ∈ [0,1].
    • Core ODEs:
      • dH/dt = α (H + ε)(1 − H)(1 − D) − β H D
      • dD/dt = γ (K − H)(1 − D)D + δ D(1 − D) D̄ (mean-field: D̄ = D)
    • Three minimal axioms underpinning the model: (1) learning requires existing capability, (2) learning requires practice, (3) disuse causes forgetting.
    • The multiplicative learning term α(H + ε)(1 − H)(1 − D), in which capability and practice enter as a product, creates bistability and makes recovery from near-zero H extremely slow (timescale ∼ 1/(α ε)).
  • Principal dynamics and phenomena

    • Bistability with two stable attractors: autonomous (H ≈ 1, D ≈ 0) and dependent (H ≈ 0, D ≈ 1). A saddle separates their basins.
    • Critical threshold K* ≈ 0.85 (found consistently in ABM sweeps across parameter ranges, with values between 0.82 and 0.92) marks a cliff-like transition: small increases in K near K* produce large drops in equilibrium H.
    • Enrichment paradox: improving AI capability (increasing K) can destabilize human–AI coexistence and drive capability collapse.
    • Irreversibility/hysteresis: once H falls below the saddle, removing AI does not restore capability on practical timescales.
    • Antifragility: stochastic/periodic AI failures paradoxically strengthen human capability by keeping agents in the autonomous basin.
  • Quantitative calibration & empirical support

    • Domain-specific forgetting rates β fitted to four empirical cases:
      • Education (Bastani et al.): β ≈ 0.047 per session — 17% drop after 4 sessions.
      • Medical endoscopy (Budzyn et al.): β ≈ 0.02 per week — 21% decline after 12 weeks.
      • Spatial cognition (Dahmani & Bohbot): β ≈ 0.01 per month — ~30% decline over 36 months.
      • Aviation (Casner & Schooler): β ≈ 0.002 per month — 38% failures on certain tasks with prolonged autopilot exposure.
    • Population-level validation: 15-country PISA panel (102 data points, 2003–2022) fitted with three global parameters:
      • α (MLE) = 0.013 (95% CI 0.008–0.038), β_eff = 0.004, H_max = 787 (scale factor mapping H to PISA points).
      • Fit statistics: R^2 = 0.946 (3 parameters), model favored by BIC vs. higher-parameter alternatives.
    • Operationalizing K via benchmark ratios K_d = S_AI / S_human (capped at 1):
      • GPT-4 (March 2023) raised the mean K̄ to ≈ 0.86 (near K*); later models (GPT-4o, Claude 3.5, GPT-4.1) reach K̄ ≈ 0.94–0.96, well into the risky regime (benchmarks may be optimistic upper bounds).
  • Policy-relevant numerical results (from simulations/ABM)

    • K* ~ 0.85 (baseline scope s = 0.7, social contagion δ = 0.5); maximum sensitivity |dH/dK| ≈ 12 at the threshold.
    • Periodic AI failures: at K = 0.9, introducing stochastic failures yields a 2.7× improvement in equilibrium human capability relative to perfectly reliable AI.
    • Mandatory manual practice: requiring 20% of tasks be done without AI preserves ≈ 92% more capability than the baseline simulation (which already included a 5% background AI-failure rate).
    • Sensitivity: the location of K* varies modestly with α, β, δ, and scope s (range ~0.82–0.92). Because β differs by domain, slow-decay domains such as aviation are more resilient.
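The bistability described above can be checked directly from the two ODEs. The linear-stability derivation below is a sketch worked out from the listed equations (with the mean-field substitution D̄ = D and assuming α, β, γ, δ, ε > 0), not a reproduction of the paper's own analysis.

```latex
f(H,D) = \alpha(H+\varepsilon)(1-H)(1-D) - \beta H D, \qquad
g(H,D) = \gamma(K-H)(1-D)D + \delta D^{2}(1-D)

J(H,D) = \begin{pmatrix}
  \alpha(1-2H-\varepsilon)(1-D) - \beta D & -\alpha(H+\varepsilon)(1-H) - \beta H \\
  -\gamma(1-D)D & \gamma(K-H)(1-2D) + \delta(2D-3D^{2})
\end{pmatrix}

J(1,0) = \begin{pmatrix} -\alpha(1+\varepsilon) & -\beta \\ 0 & \gamma(K-1) \end{pmatrix},
\qquad
J(0,1) = \begin{pmatrix} -\beta & -\alpha\varepsilon \\ 0 & -(\gamma K + \delta) \end{pmatrix}
```

Both corner Jacobians are upper triangular, so their eigenvalues sit on the diagonal. The autonomous corner (H = 1, D = 0) is locally stable whenever K < 1, and its restoring rate along D, γ(1 − K), weakens as K approaches 1. The dependent corner (H = 0, D = 1) is locally stable for every K ≥ 0, including K = 0, because the contagion term contributes −δ; this is consistent with the hysteresis point above. Since neither corner loses local stability for K < 1, the reported K* presumably reflects how the basins of attraction (separated by the saddle) shift with K, together with noise in the ABM, rather than a local bifurcation at either corner.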

Data & Methods

  • Modeling approach

    • Minimal ODE model coupling capability and delegation built from three behavioral/neuroscientific axioms.
    • Deterministic mean-field ODE analysis (fixed points, eigenvalues, bifurcation structure) combined with stochastic agent-based model (ABM) implementations to capture noise-induced tipping and variance across agents.
    • A small learning baseline ε = 0.01 keeps H = 0 from being strictly absorbing, but recovery from it is extremely slow, so H = 0 is near-absorbing in practice.
  • Calibration & empirical data sources

    • Single-domain β calibration: fitted to published empirical deskilling measurements in education, endoscopy, spatial cognition, and aviation.
    • Multi-country panel: OECD PISA mathematics time series (2003–2022) from 15 countries; country-specific internet/smartphone adoption used as exogenous driver for delegation D(t).
    • Parameter estimation: nonlinear least squares; profile-likelihood used to resolve identifiability (α unidentifiable from OECD average alone, but identifiable from cross-country panel).
    • Model comparison: ODE vs. linear, exponential, logistic decay on PISA series using R^2, AIC/BIC; ODE favored by parsimony and fit (AIC/BIC).
    • Operational K: mapped using established benchmarks (MMLU, HumanEval, USMLE, Bar exam) to compute K_d per domain and the arithmetic mean K̄ over domains.
  • Computational details

    • ABM sweeps: K varied from 0.50 to 0.99 (50 grid points), 50 stochastic replicates per point, baseline social contagion δ = 0.5, scope s = 0.7, background crisis/failure rates modeled (baseline 5%).
    • Fit results reported with confidence intervals and information-criterion comparisons; sensitivity analyses presented for parameter ranges.
  • Limitations (noted by the authors or implicit in the methods)

    • Benchmarks likely overstate real-world competence and should be read as upper bounds.
    • Mean-field social contagion is a simplifying assumption; richer network effects are omitted.
    • The model is intentionally minimal: it captures qualitative structural risks but abstracts away many micro-level institutional and educational responses.
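The β values listed under Key Points can be roughly reproduced with a back-of-envelope inversion. Under full delegation (D = 1) the learning term vanishes and the model reduces to dH/dt = −βH, so H(t) = H0·e^(−βt) and β = −ln(1 − drop)/t. The sketch below assumes each reported decline is pure forgetting at D = 1; that simplification and the helper name `beta_from_decline` are illustrative, and the authors' actual fitting procedure may differ. Aviation is left out because its 38% figure is a task-failure rate, not a decline in capability.

```python
import math


def beta_from_decline(fractional_drop: float, duration: float) -> float:
    """Invert H(t) = H0 * exp(-beta * t) for beta.

    Assumes full delegation (D = 1), where the model reduces to dH/dt = -beta * H.
    """
    if not 0.0 <= fractional_drop < 1.0:
        raise ValueError("fractional_drop must be in [0, 1)")
    if duration <= 0:
        raise ValueError("duration must be positive")
    return -math.log(1.0 - fractional_drop) / duration


# (fractional drop, duration) as summarized in the Key Points; time units differ by domain.
CASES = {
    "education (per session)": (0.17, 4),
    "endoscopy (per week)": (0.21, 12),
    "spatial cognition (per month)": (0.30, 36),
}

for name, (drop, t) in CASES.items():
    print(f"{name}: beta ~ {beta_from_decline(drop, t):.4f}")
```

The three estimates come out near 0.047, 0.020, and 0.010, in line with the listed values, which suggests those β's are consistent with simple exponential decay over the stated intervals.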

Implications for AI Economics

  • Macroeconomic and labor-market risk framing

    • Automation externalities: adoption of high-K AI generates negative externalities by eroding human skill capital; these externalities are nonlinear and include tipping points and hysteresis (long-run path dependence).
    • Irreversibility implies that welfare analyses and cost–benefit models of automation must incorporate long tails of recovery costs, intergenerational effects, and potentially permanent reductions in human capital.
    • The bistability means small policy/regulatory differences or shocks can produce large, persistent cross-country divergence in labor capabilities and productivity.
  • Policy instruments & regulatory design

    • Focus on capability-gap management rather than binary adoption bans: keep effective K̄ below the critical threshold where possible, or manage delegation D through quotas/mandates.
    • Low-cost, high-impact interventions:
      • Mandatory manual-practice quotas (e.g., 20% of tasks done without AI) can dramatically preserve capability (authors report ≈92% better preservation vs. baseline).
      • Deliberate periodic AI "failures," forced outages, or simulated failure drills can induce antifragility and raise equilibrium human capability (reported 2.7× improvement at K = 0.9).
      • Stress-testing AI for reliability and publishing realistic K estimates (benchmarks adjusted downward for real-world performance) to better inform adoption incentives.
    • Domain-specific regulation: because β varies by skill domain, policy should be targeted—fields with fast skill decay (education, basic knowledge tasks) need stricter practice requirements than slow-decay domains (some procedural or motor skills).
  • Measurement and monitoring

    • Operationalize and track K per domain using benchmark-to-expert ratios; monitor delegation rates D in firms/institutions as a leading indicator for potential tipping.
    • Incorporate measures of social contagion/adoption (δ) and scope (s) into industry risk assessments—network effects can move systems across K* even if K itself is modest.
  • Economic modeling recommendations

    • Macro and micro models of automation should incorporate nonlinearity (bistability, tipping points) and asymmetric recovery costs (hysteresis).
    • Inclusion of human-skill depreciation dynamics (β) and practice dependence (α) is essential when evaluating long-term returns to automation and optimal taxation/subsidy policies.
    • Consider insurance-like mechanisms (liability, certification, mandatory manual-practice credits) to internalize negative externalities from high-K AI deployment.
  • Distributional and strategic considerations

    • Lock-in and coordination failures: once a large fraction of a workforce crosses into the dependent basin, recovery is costly and requires large coordinated investments in retraining and enforced practice.
    • International competitiveness: countries with different adoption timing and policy regimes may permanently diverge in human capital endowments, affecting comparative advantage.
    • Firms may underinvest in human-skill maintenance because they capture the short-run productivity gains from delegation while socializing long-run human-capability losses, which justifies regulatory intervention.
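The monitoring recommendation (track K per domain via benchmark-to-expert ratios) comes down to a few lines of arithmetic. In this sketch the domain names and scores are hypothetical placeholders, not real benchmark results, and the helper names are illustrative. Comparing each domain's K_d against the single baseline K* ≈ 0.85 is also a simplification, since K* is reported to shift with scope, contagion, and β.

```python
from __future__ import annotations

from statistics import mean

# Baseline threshold reported for scope s = 0.7 and contagion delta = 0.5;
# domain-specific K* values may differ (reported range roughly 0.82-0.92).
K_STAR = 0.85


def domain_ratio(ai_score: float, human_score: float) -> float:
    """K_d = S_AI / S_human, capped at 1."""
    if human_score <= 0:
        raise ValueError("human_score must be positive")
    return min(ai_score / human_score, 1.0)


def mean_ratio(scores: dict[str, tuple[float, float]]) -> float:
    """Arithmetic mean K-bar of the per-domain ratios."""
    return mean(domain_ratio(ai, human) for ai, human in scores.values())


def domains_at_or_above(scores: dict[str, tuple[float, float]],
                        k_star: float = K_STAR) -> list[str]:
    """Sorted names of domains whose K_d has reached k_star."""
    return sorted(name for name, (ai, human) in scores.items()
                  if domain_ratio(ai, human) >= k_star)


# Hypothetical (AI score, human-expert score) pairs; NOT real benchmark data.
scores = {
    "domain_a": (70.0, 90.0),
    "domain_b": (95.0, 90.0),
    "domain_c": (80.0, 100.0),
}

print(f"K-bar = {mean_ratio(scores):.3f}")
print("at or above K*:", domains_at_or_above(scores))
```

Here domain_b's raw ratio exceeds 1 and is capped, and the mean K̄ ≈ 0.859 sits just above 0.85 even though only one hypothetical domain crosses the threshold on its own, so averaged and per-domain readings can point in different directions.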

Overall, the paper provides a compact, empirically calibrated dynamical framework that highlights a non-intuitive systemic risk of high-capability AI: across broad parameter ranges, small increases in K around a threshold can produce large and persistent human-capability losses. For AI economics, this implies that evaluations of automation should internalize nonlinear skill-depletion externalities and prioritize interventions that maintain a minimum level of human practice and controlled exposure to AI capability.
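To see the cliff and the effect of a practice quota in the model itself, the sketch below integrates the mean-field ODEs from the Key Points (with D̄ = D) using forward Euler. The parameter values α, β, γ, the initial state (H = 0.8, D = 0.2), the helper names `derivatives` and `simulate`, and the choice to represent a 20% manual-practice quota as a hard cap D ≤ 0.8 (`max_delegation`) are all assumptions for illustration. This is not the authors' calibration or their ABM, and it is not tuned to reproduce K* ≈ 0.85 or the reported 2.7× and 92% figures.

```python
"""Forward-Euler sketch of the two-variable H-D model (mean-field, D-bar = D).

All parameter values and the initial state are illustrative assumptions,
not the paper's fitted values; this is not the authors' ABM.
"""
from __future__ import annotations

ALPHA = 0.1   # learning rate (assumed)
BETA = 0.01   # forgetting rate under delegation (assumed)
GAMMA = 1.0   # delegation response to the capability gap K - H (assumed)
DELTA = 0.5   # social contagion, set to the reported baseline delta = 0.5
EPS = 0.01    # learning baseline epsilon, as reported


def derivatives(h: float, d: float, k: float) -> tuple[float, float]:
    """Return (dH/dt, dD/dt) for capability h, delegation d, AI capability k."""
    dh = ALPHA * (h + EPS) * (1 - h) * (1 - d) - BETA * h * d
    dd = GAMMA * (k - h) * (1 - d) * d + DELTA * d * (1 - d) * d
    return dh, dd


def simulate(k: float, h0: float = 0.8, d0: float = 0.2,
             t_end: float = 600.0, dt: float = 0.05,
             max_delegation: float = 1.0) -> tuple[float, float]:
    """Integrate to t_end and return the final (H, D).

    max_delegation < 1 is a hypothetical way to impose a manual-practice
    quota, e.g. 0.8 for "20% of tasks done without AI".
    """
    h, d = h0, d0
    for _ in range(int(t_end / dt)):
        dh, dd = derivatives(h, d, k)
        # Clip to the model's state space [0, 1] (and to the quota for D).
        h = min(max(h + dt * dh, 0.0), 1.0)
        d = min(max(d + dt * dd, 0.0), max_delegation)
    return h, d


if __name__ == "__main__":
    # Sweep K and report where this initial condition ends up.
    for i in range(11):
        k = 0.5 + 0.049 * i
        h, d = simulate(k)
        print(f"K = {k:.3f} -> H = {h:.3f}, D = {d:.3f}")

    # Same high-K setting with and without a 20% practice quota.
    print("K = 0.99, no quota:", simulate(0.99))
    print("K = 0.99, D <= 0.8:", simulate(0.99, max_delegation=0.8))
```

With these settings, K = 0.1 ends in the autonomous state (H near 1), K = 0.99 ends in the dependent state (H near 0), and the sweep shows where, for this particular initial condition, the outcome flips. Capping delegation at 0.8 at K = 0.99 leaves H at an interior level (about 0.6 for these parameters) instead of collapsing, which is the qualitative point behind the practice-quota recommendation. Forward Euler with a small step only keeps the sketch dependency-free; an adaptive ODE solver would be the usual choice for real analysis.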

Assessment

Paper Type: theoretical
Evidence Strength: medium. The model delivers a tight within-sample fit (R^2 = 0.946, lowest BIC) and plausible cross-domain calibration, but empirical support is limited to observational calibration and simulation counterfactuals; the catastrophic-threshold claim hinges on model structure and parameter choices rather than independent causal tests or randomized interventions.
Methods Rigor: medium. Formal, transparent dynamical modeling and calibration with few parameters are strengths; however, rigor is limited by reliance on strong axioms, potential sensitivity to functional-form choices, limited external validation beyond PISA, and the absence of robustness checks against alternative structural specifications (beyond the PISA comparison with linear, exponential, and logistic decay curves) or causal identification strategies.
Sample: Calibration uses domain-specific data for four domains (education, medicine, navigation, aviation); validation uses PISA data from 15 countries comprising 102 data points fitted with three free parameters (reported R^2 = 0.946); simulations explore counterfactuals including periodic AI failures and mandated practice rates.
Themes: human_ai_collab, skills_training, governance
Identification: Derives a causal threshold from a two-variable dynamical system (capability H and delegation D) based on three axioms; parameters are calibrated to domain-specific data and validated by fitting to observational PISA data (102 points, 3 free parameters); counterfactuals come from model simulations rather than experimental or quasi-experimental identification.
Generalizability:
  • Calibrated domains (education, medicine, navigation, aviation) may not represent all economic activities or adult workplace skills.
  • PISA measures school-age learning outcomes and has cross-country comparability issues; it does not measure workforce-level capability or firm productivity.
  • The model assumes homogeneous agents and simple two-variable dynamics, omitting heterogeneity, institutional responses, and skill complementarities.
  • Results depend on chosen functional forms and parameter values; the threshold K* is scope- and model-dependent.
  • In the PISA fit, delegation follows exogenous internet/smartphone adoption rather than endogenous adoption; AI failure/background rates are fixed, and technology improvement and policy responses are not modeled.

Claims (9)

  • We present a two-variable dynamical systems model coupling capability (H) and delegation (D), grounded in three axioms: learning requires capability, learning requires practice, and disuse causes forgetting.
    Category: Skill Obsolescence · Direction: positive · Confidence: high · Outcome: human capability as a dynamical variable (H) and delegation level (D) · Details: 0.02
  • The model identifies a critical threshold K* ≈ 0.85 (scope-dependent; broader AI scope lowers K*) beyond which capability collapses abruptly: the 'enrichment paradox.'
    Category: Skill Obsolescence · Direction: negative · Confidence: high · Outcome: critical AI-capability threshold (K*) at which human capability collapses · Details: n=4; K* ≈ 0.85; 0.12
  • Broader AI scope lowers the critical threshold K* (i.e., more general AI reduces the K* value at which capability collapse occurs).
    Category: Skill Obsolescence · Direction: negative · Confidence: high · Outcome: change in critical threshold K* with AI scope · Details: n=4; 0.12
  • The model was calibrated to four domains: education, medicine, navigation, and aviation.
    Category: Skill Obsolescence · Direction: positive · Confidence: high · Outcome: model parameter fits across domains · Details: n=4; 0.12
  • Validated against 15 countries' PISA data (102 points), the model achieves R^2 = 0.946 with 3 parameters and attains the lowest BIC among compared specifications.
    Category: Skill Acquisition · Direction: positive · Confidence: high · Outcome: fit of model to PISA data (explained variance, model selection via BIC) · Details: n=102; R^2 = 0.946; 0.2
  • The model predicts that periodic AI failures improve human capability 2.7-fold (relative improvement reported in simulations).
    Category: Skill Acquisition · Direction: positive · Confidence: high · Outcome: human capability (H) under periodic AI-failure regime · Details: 2.7-fold; 0.12
  • A policy of 20% mandatory practice preserves 92% more capability than the simulation baseline (baseline includes a 5% background AI-failure rate).
    Category: Skill Acquisition · Direction: positive · Confidence: high · Outcome: preserved human capability under mandatory practice policy vs. baseline · Details: 92% more capability; 0.12
  • These findings provide quantitative foundations for AI capability-threshold governance.
    Category: Governance and Regulation · Direction: positive · Confidence: medium · Outcome: usefulness of model results for governance design · Details: 0.01
  • As artificial intelligence assumes cognitive labor, no existing quantitative framework predicts when human capability loss becomes catastrophic.
    Category: Skill Obsolescence · Direction: negative · Confidence: high · Outcome: absence of prior quantitative frameworks for catastrophic human capability loss · Details: 0.06

Notes