A simple dynamical model calibrated to PISA and domain data finds humans risk abrupt loss of core skills once AI performs roughly 85% of tasks, but scheduled practice or occasional AI outages can largely avert the collapse.
As artificial intelligence assumes cognitive labor, no quantitative framework predicts when human capability loss becomes catastrophic. We present a two-variable dynamical systems model coupling capability (H) and delegation (D), grounded in three axioms: learning requires capability, learning requires practice, and disuse causes forgetting. Calibrated to four domains (education, medicine, navigation, aviation), the model identifies a critical threshold K* of approximately 0.85 (scope-dependent; broader AI scope lowers K*) beyond which capability collapses abruptly (the "enrichment paradox"). Validated against 15 countries' PISA data (102 points, R^2 = 0.946, 3 parameters, lowest BIC), the model predicts that periodic AI failures improve capability 2.7-fold and that 20% mandatory practice preserves 92% more capability than the simulation baseline (which includes a 5% background AI-failure rate). These findings provide quantitative foundations for AI capability-threshold governance.
Summary
Main Finding
The paper develops a minimal two-variable dynamical model of human capability (H) and delegation to AI (D) that predicts a sharp, domain-robust critical threshold K* ≈ 0.85 in AI capability K. When K surpasses K*, human capability can collapse abruptly (the "enrichment paradox"), producing near-irreversible dependency. The model is calibrated to multiple empirical domains and population-level PISA data, and yields actionable policy conclusions: modest mandatory manual-practice quotas (e.g., 20% of tasks) and/or engineered periodic AI failures can substantially preserve human capability.
Key Points
Model summary
- State variables: human capability H(t) ∈ [0,1] and delegation rate D(t) ∈ [0,1].
- Core ODEs:
  - dH/dt = α (H + ε)(1 − H)(1 − D) − β H D
  - dD/dt = γ (K − H)(1 − D) D + δ D (1 − D) D̄ (mean-field: D̄ = D)
- Three minimal axioms underpinning the model: (1) learning requires existing capability, (2) learning requires practice, (3) disuse causes forgetting.
- Multiplicative learning term (H·(1−D)) creates bistability and makes recovery from near-zero H extremely slow (timescale ∼ 1/(α ε)).
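The two ODEs above can be integrated numerically with a few lines of code. The following is a minimal sketch, not the authors' implementation: it uses the mean-field closure D̄ = D and a classical fourth-order Runge-Kutta step. The default values of `alpha`, `beta`, `gamma` and the step size are illustrative assumptions (the summary does not report γ); only ε = 0.01 and δ = 0.5 are taken from the text.

```python
def rhs(H, D, K, alpha=0.05, beta=0.02, gamma=0.1, delta=0.5, eps=0.01):
    """Right-hand side of the two-variable model (mean-field: D_bar = D).

    alpha, beta and gamma are illustrative assumptions, not fitted values.
    """
    dH = alpha * (H + eps) * (1 - H) * (1 - D) - beta * H * D
    dD = gamma * (K - H) * (1 - D) * D + delta * D * (1 - D) * D
    return dH, dD


def simulate(H0, D0, K, t_end=500.0, dt=0.1, **params):
    """Integrate from (H0, D0) with classical RK4; returns the final (H, D)."""
    H, D = H0, D0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(H, D, K, **params)
        k2 = rhs(H + 0.5 * dt * k1[0], D + 0.5 * dt * k1[1], K, **params)
        k3 = rhs(H + 0.5 * dt * k2[0], D + 0.5 * dt * k2[1], K, **params)
        k4 = rhs(H + dt * k3[0], D + dt * k3[1], K, **params)
        H += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        D += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return H, D


if __name__ == "__main__":
    for K in (0.5, 0.9):
        H, D = simulate(H0=0.8, D0=0.1, K=K)
        print(f"K={K}: H={H:.3f}, D={D:.3f}")
```

Two limiting cases are easy to check by hand: with D = 0 delegation never starts and H grows logistically toward 1, while with D = 1 the learning term vanishes and H decays as exp(−βt).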
Principal dynamics and phenomena
- Bistability with two stable attractors: autonomous (H ≈ 1, D ≈ 0) and dependent (H ≈ 0, D ≈ 1). A saddle separates their basins.
- Critical threshold K* ≈ 0.85 (found robustly in ABM sweeps across parameter ranges, with numerical values in 0.82–0.92) marks a cliff-like transition: small increases in K around K* produce large drops in equilibrium H.
- Enrichment paradox: improving AI capability (increasing K) can destabilize human–AI coexistence and drive capability collapse.
- Irreversibility/hysteresis: once H falls below the saddle, removing AI does not restore capability on practical timescales.
- Antifragility: stochastic/periodic AI failures paradoxically strengthen human capability by keeping agents in the autonomous basin.
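The stability of the two corner states can be checked by linearizing the ODEs. The sketch below assumes the mean-field closure D̄ = D, writes out the Jacobian analytically, and tests whether both eigenvalues have negative real part. The parameter defaults other than ε and δ are illustrative assumptions, and the code does not locate the saddle or the basin boundary.

```python
import cmath


def jacobian(H, D, K, alpha=0.05, beta=0.02, gamma=0.1, delta=0.5, eps=0.01):
    """Analytic Jacobian of (dH/dt, dD/dt) with D_bar = D.

    alpha, beta and gamma are illustrative assumptions.
    """
    j11 = alpha * (1 - D) * (1 - 2 * H - eps) - beta * D
    j12 = -alpha * (H + eps) * (1 - H) - beta * H
    j21 = -gamma * (1 - D) * D
    j22 = gamma * (K - H) * (1 - 2 * D) + delta * (2 * D - 3 * D * D)
    return j11, j12, j21, j22


def eigenvalues(j11, j12, j21, j22):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = j11 + j22
    det = j11 * j22 - j12 * j21
    root = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root


def is_stable(H, D, K, **params):
    """Linearly stable if every eigenvalue has a negative real part."""
    return all(ev.real < 0 for ev in eigenvalues(*jacobian(H, D, K, **params)))


if __name__ == "__main__":
    print("autonomous corner (H=1, D=0):", is_stable(1.0, 0.0, K=0.9))
    print("dependent corner  (H=0, D=1):", is_stable(0.0, 1.0, K=0.9))
```

Under these assumptions the eigenvalues are −α(1 + ε) and γ(K − 1) at the autonomous corner, and −β and −(γK + δ) at the dependent corner, so both corners are linearly stable for any K < 1, which is the bistability described above.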
Quantitative calibration & empirical support
- Domain-specific forgetting rates β fitted to four empirical cases:
  - Education (Bastani et al.): β ≈ 0.047 per session; 17% drop after 4 sessions.
  - Medical endoscopy (Budzyn et al.): β ≈ 0.02 per week; 21% decline after 12 weeks.
  - Spatial cognition (Dahmani & Bohbot): β ≈ 0.01 per month; ~30% decline over 36 months.
  - Aviation (Casner & Schooler): β ≈ 0.002 per month; 38% failures on certain tasks with prolonged autopilot exposure.
- Population-level validation: 15-country PISA panel (102 data points, 2003–2022) fitted with three global parameters:
  - α (MLE) = 0.013 (95% CI 0.008–0.038), βeff = 0.004, Hmax = 787 (scale to PISA points).
  - Fit statistics: R^2 = 0.946 (3 parameters), model favored by BIC vs. higher-parameter alternatives.
- Operationalizing K via benchmark ratios Kd = S_AI / S_human (capped at 1):
  - GPT-4 (Mar 2023) raised mean K̄ ≈ 0.86 (near K*); later models (GPT-4o, Claude 3.5, GPT-4.1) reach K̄ ≈ 0.94–0.96, i.e., well into the risky regime (benchmarks may be optimistic upper bounds).
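The forgetting rates listed above can be sanity-checked against the reported declines. The sketch assumes full delegation (D = 1), under which the model reduces to dH/dt = −βH and the fractional loss after n periods is 1 − exp(−βn). This is a consistency check under that assumption, not the authors' fitting procedure. The aviation case is left out because its reported figure is a task-failure rate rather than a capability decline.

```python
import math


def fractional_drop(beta, n_periods):
    """Capability lost after n_periods of pure decay dH/dt = -beta * H (D = 1)."""
    return 1.0 - math.exp(-beta * n_periods)


# (beta per period, number of periods, decline reported in the summary)
cases = {
    "education (sessions)": (0.047, 4, 0.17),
    "endoscopy (weeks)": (0.02, 12, 0.21),
    "spatial cognition (months)": (0.01, 36, 0.30),
}

if __name__ == "__main__":
    for name, (beta, n, reported) in cases.items():
        implied = fractional_drop(beta, n)
        print(f"{name}: implied {implied:.3f}, reported {reported:.2f}")
```

The implied drops come out at roughly 17%, 21% and 30%, matching the reported figures to within rounding.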
Policy-relevant numerical results (from simulations/ABM)
- K* ~ 0.85 (baseline scope s = 0.7, social contagion δ = 0.5); maximum sensitivity |dH/dK| ≈ 12 at the threshold.
- Periodic AI failures: at K = 0.9, introducing stochastic failures yields a 2.7× improvement in equilibrium human capability relative to perfectly reliable AI.
- Mandatory manual practice: requiring 20% of tasks be done without AI preserves ≈ 92% more capability than the baseline simulation (which already included a 5% background AI-failure rate).
- Sensitivity: K* location varies modestly with α, β, δ, scope s (range ~0.82–0.92), meaning domain-specific β matters: slow-decay domains (aviation) are more resilient.
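The two interventions above can be explored with a small stochastic simulation. The summary does not say how the authors' ABM implements them, so this sketch makes its own assumptions: a practice quota q scales the delegation actually exercised to (1 − q)·D, and an AI outage in a given step forces all work to be manual (effective delegation 0). It uses a single mean-field agent rather than a population, and the parameter defaults other than ε and δ are illustrative, so the output will not reproduce the 2.7× or 92% figures.

```python
import random


def step(H, D, K, d_eff, dt, alpha=0.05, beta=0.02, gamma=0.1, delta=0.5, eps=0.01):
    """One forward-Euler step; d_eff is the delegation actually exercised.

    alpha, beta and gamma are illustrative assumptions.
    """
    dH = alpha * (H + eps) * (1 - H) * (1 - d_eff) - beta * H * d_eff
    dD = gamma * (K - H) * (1 - D) * D + delta * D * (1 - D) * D
    return H + dt * dH, D + dt * dD


def run(K, quota=0.0, failure_rate=0.0, H0=0.8, D0=0.1, steps=5000, dt=0.1, seed=0):
    """Final H under a manual-practice quota and/or random AI outages."""
    rng = random.Random(seed)
    H, D = H0, D0
    for _ in range(steps):
        ai_down = rng.random() < failure_rate  # outage: all work is manual
        d_eff = 0.0 if ai_down else (1.0 - quota) * D
        H, D = step(H, D, K, d_eff, dt)
    return H


if __name__ == "__main__":
    print("no intervention :", run(K=0.9))
    print("20% quota       :", run(K=0.9, quota=0.2))
    print("5% outage rate  :", run(K=0.9, failure_rate=0.05))
```

Because lowering the exercised delegation can only raise dH/dt, and higher H can only slow the growth of D, both interventions leave H at or above the no-intervention trajectory in this sketch. How large the gain is depends entirely on the assumed parameters.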
Data & Methods
Modeling approach
- Minimal ODE model coupling capability and delegation built from three behavioral/neuroscientific axioms.
- Deterministic mean-field ODE analysis (fixed points, eigenvalues, bifurcation structure) combined with stochastic agent-based model (ABM) implementations to capture noise-induced tipping and variance across agents.
- Learning baseline ε = 0.01 included to make H = 0 a near-absorbing state (extremely slow recovery).
Calibration & empirical data sources
- Single-domain β calibration: fitted to published empirical deskilling measurements in education, endoscopy, spatial cognition, and aviation.
- Multi-country panel: OECD PISA mathematics time series (2003–2022) from 15 countries; country-specific internet/smartphone adoption used as exogenous driver for delegation D(t).
- Parameter estimation: nonlinear least squares; profile-likelihood used to resolve identifiability (α unidentifiable from OECD average alone, but identifiable from cross-country panel).
- Model comparison: ODE vs. linear, exponential, logistic decay on PISA series using R^2, AIC/BIC; ODE favored by parsimony and fit (AIC/BIC).
- Operational K: mapped using established benchmarks (MMLU, HumanEval, USMLE, Bar exam) to compute Kd and arithmetic mean K̄ over domains.
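The operational definition of K is simple enough to write down directly: a per-domain ratio Kd = S_AI / S_human capped at 1, then an arithmetic mean over domains. In the sketch below the function names and the example scores are hypothetical and chosen only to show the cap in action; they are not values from the paper.

```python
def k_domain(ai_score, human_score):
    """K_d = S_AI / S_human, capped at 1."""
    if human_score <= 0:
        raise ValueError("human_score must be positive")
    return min(ai_score / human_score, 1.0)


def k_mean(scores):
    """Arithmetic mean of K_d; scores maps domain -> (S_AI, S_human)."""
    values = [k_domain(ai, human) for ai, human in scores.values()]
    return sum(values) / len(values)


# Hypothetical scores for illustration only; not taken from the paper.
example = {
    "benchmark_a": (72.0, 90.0),  # K_d = 0.8
    "benchmark_b": (95.0, 88.0),  # ratio above 1, capped at 1.0
}

if __name__ == "__main__":
    print(round(k_mean(example), 3))  # 0.9
```

The cap means that superhuman performance in one domain cannot offset a gap in another, so the mean reaches 1 only when AI matches or exceeds the human reference everywhere.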
Computational details
- ABM sweeps: K varied from 0.50 to 0.99 (50 grid points), 50 stochastic replicates per point, baseline social contagion δ = 0.5, scope s = 0.7, background crisis/failure rates modeled (baseline 5%).
- Fit results reported with confidence intervals and information-criterion comparisons; sensitivity analyses presented for parameter ranges.
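For readers unfamiliar with the information criteria used in the model comparison, the sketch below gives the standard least-squares forms of AIC and BIC, assuming independent Gaussian errors and dropping additive constants. Whether the authors used exactly these forms is not stated, and the residual sums of squares in the example are hypothetical.

```python
import math


def aic(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors (up to a constant)."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """BIC under the same assumptions; each parameter costs ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


def r_squared(rss, tss):
    """Coefficient of determination from residual and total sums of squares."""
    return 1.0 - rss / tss


if __name__ == "__main__":
    n = 102  # number of PISA data points in the panel
    # Hypothetical RSS values: a 5-parameter model fitting 5% better
    # than a 3-parameter model.
    print("3 params:", bic(rss=100.0, n=n, k=3))
    print("5 params:", bic(rss=95.0, n=n, k=5))
```

With n = 102 each extra parameter costs ln(102) ≈ 4.6 BIC units, so in this hypothetical comparison a 5% reduction in RSS does not pay for two additional parameters.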
Limitations noted by authors (implicit in methods)
- Benchmarks may overestimate real-world competence (benchmarks are upper bounds).
- Mean-field social contagion is a simplifying assumption; richer network effects omitted.
- The model is intentionally minimal: it captures qualitative structural risks but abstracts away many micro-level institutional and educational responses.
Implications for AI Economics
Macroeconomic and labor-market risk framing
- Automation externalities: adoption of high-K AI erodes human skill capital, a negative externality that is nonlinear, with tipping points and hysteresis (long-run path dependence).
- Irreversibility implies that welfare analyses and cost–benefit models of automation must incorporate long tails of recovery costs, intergenerational effects, and potentially permanent reductions in human capital.
- The bistability means small policy/regulatory differences or shocks can produce large, persistent cross-country divergence in labor capabilities and productivity.
Policy instruments & regulatory design
- Focus on capability-gap management rather than binary adoption bans: keep effective K̄ below the critical threshold where possible, or manage delegation D through quotas/mandates.
- Low-cost, high-impact interventions:
  - Mandatory manual-practice quotas (e.g., 20% of tasks done without AI) can dramatically preserve capability (authors report ≈92% better preservation vs. baseline).
  - Deliberate periodic AI "failures", forced outages, or simulated failure drills can induce antifragility and increase equilibrium human capability (reported 2.7× improvement at K = 0.9).
  - Stress-testing AI for reliability and publishing realistic K estimates (benchmarks adjusted downward for real-world performance) to better inform adoption incentives.
- Domain-specific regulation: because β varies by skill domain, policy should be targeted. Fields with fast skill decay (education, basic knowledge tasks) need stricter practice requirements than slow-decay domains (some procedural or motor skills).
Measurement and monitoring
- Operationalize and track K per domain using benchmark-to-expert ratios; monitor delegation rates D in firms/institutions as a leading indicator for potential tipping.
- Incorporate measures of social contagion/adoption (δ) and scope (s) into industry risk assessments—network effects can move systems across K* even if K itself is modest.
Economic modeling recommendations
- Macro and micro models of automation should incorporate nonlinearity (bistability, tipping points) and asymmetric recovery costs (hysteresis).
- Inclusion of human-skill depreciation dynamics (β) and practice dependence (α) is essential when evaluating long-term returns to automation and optimal taxation/subsidy policies.
- Consider insurance-like mechanisms (liability, certification, mandatory manual-practice credits) to internalize negative externalities from high-K AI deployment.
Distributional and strategic considerations
- Lock-in and coordination failures: once a large fraction of a workforce crosses into the dependent basin, recovery is costly and requires large coordinated investments in retraining and enforced practice.
- International competitiveness: countries with different adoption timing and policy regimes may permanently diverge in human capital endowments, affecting comparative advantage.
- Firms may underinvest in human-skill maintenance because they capture short-run productivity gains from delegation while socializing long-run human-capability losses, which justifies regulatory intervention.
Overall, the paper provides a compact, empirically calibrated dynamical framework that highlights a non-intuitive systemic risk of high-capability AI: across broad parameter ranges, small increases in K around a threshold can produce large and persistent human-capability losses. For AI economics, this implies that evaluations of automation should internalize nonlinear skill-depletion externalities and prioritize interventions that maintain a minimum level of human practice and controlled exposure to AI capability.
Assessment
Claims (9)
| Claim | Topic | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| We present a two-variable dynamical systems model coupling capability (H) and delegation (D), grounded in three axioms: learning requires capability, practice, and disuse causes forgetting. | Skill Obsolescence | positive | high | human capability as a dynamical variable (H) and delegation level (D) | 0.02 |
| The model identifies a critical threshold K* approximately 0.85 (scope-dependent; broader AI scope lowers K*) beyond which capability collapses abruptly (the 'enrichment paradox'). | Skill Obsolescence | negative | high | critical delegation/capability threshold (K*) at which human capability collapses | n=4; K* approximately 0.85; 0.12 |
| Broader AI scope lowers the critical threshold K* (i.e., more general AI reduces the K* value at which capability collapse occurs). | Skill Obsolescence | negative | high | change in critical threshold K* with AI scope | n=4; 0.12 |
| The model was calibrated to four domains: education, medicine, navigation, and aviation. | Skill Obsolescence | positive | high | model parameter fits across domains | n=4; 0.12 |
| Validated against 15 countries' PISA data (102 points), the model achieves R^2 = 0.946 with 3 parameters and attains the lowest BIC among compared specifications. | Skill Acquisition | positive | high | fit of model to PISA data (explained variance, model selection via BIC) | n=102; R^2 = 0.946; 0.2 |
| The model predicts that periodic AI failures improve human capability 2.7-fold (relative improvement reported in simulations). | Skill Acquisition | positive | high | human capability (H) under periodic AI-failure regime | 2.7-fold; 0.12 |
| A policy of 20% mandatory practice preserves 92% more capability than the simulation baseline (baseline includes a 5% background AI-failure rate). | Skill Acquisition | positive | high | preserved human capability under mandatory practice policy vs baseline | 92% more capability; 0.12 |
| These findings provide quantitative foundations for AI capability-threshold governance. | Governance And Regulation | positive | medium | usefulness of model results for governance design | 0.01 |
| As artificial intelligence assumes cognitive labor, no existing quantitative framework predicts when human capability loss becomes catastrophic. | Skill Obsolescence | negative | high | absence of prior quantitative frameworks for catastrophic human capability loss | 0.06 |