Resilience Meets Autonomy: Governing Embodied AI in Critical Infrastructure

Critical infrastructure increasingly incorporates embodied AI for monitoring, predictive maintenance, and decision support. However, AI systems designed to handle statistically representable uncertainty struggle with cascading failures and crisis dynamics that exceed their training assumptions. This paper argues that embodied AI's resilience depends on bounded autonomy within a hybrid governance architecture. We outline four oversight modes and map them to critical infrastructure sectors based on task complexity, risk level, and consequence severity. Drawing on the EU AI Act, ISO safety standards, and crisis management research, we argue that effective governance requires a structured allocation of machine capability and human judgement.

Summary

Main Finding

Embodied AI (EAI) can materially improve resilience and operations in critical infrastructure (CI) but is vulnerable to systemic surprise, cascading failures, and adversarial manipulation that exceed typical training assumptions. Robust resilience requires bounded autonomy within a hybrid governance architecture that combines four oversight modes (fully automated, human-on-the-loop, human-in-the-loop, human-in-command). The appropriate mode depends on task complexity, risk, time constraints, and consequence severity; systems must be designed to switch or combine modes, supported by standards, operator training, and lifecycle governance.

Key Points

Problem framing
- CI faces systemic uncertainty (cascades, interdependence, unknown unknowns) that AI trained on historical distributions cannot always handle.
- Failures often arise from socio-technical misalignment (errors of omission and commission), not only from component malfunctions.
- EAI magnifies fragilities because physical action ties perception errors to real-world harm and because infrastructures are tightly coupled (normal-accident dynamics).
Vulnerability taxonomy
- Exogenous: hostile environment, dynamic conditions, adversarial interference (data poisoning, backdoors).
- Endogenous: sensor failures, hardware wear, algorithmic brittleness, SLAM drift and data-association errors.
- Mixed: external pressures exploit internal weaknesses, leading to coupled failures across perception, control, and action.
Oversight taxonomy (four ideal types)
- Fully AI-Automated (Human-out-of-the-Loop): high autonomy for ultra-fast, low-latency tasks (e.g., millisecond load balancing). Requires strong functional safety and fail-safes.
- Human-on-the-Loop (HOTL): passive supervision with the ability to monitor, interrupt, and override. Suited to steady-state monitoring and predictive maintenance.
- Human-in-the-Loop (HITL): active human approval is mandatory for high-impact actions (service restoration, reconfiguration).
- Human-in-Command (HIC): humans set goals, constraints, and escalation rules for strategic, high-uncertainty decisions (crisis management).
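The selection logic implied by these four modes can be sketched in code. The following is a minimal illustration only, not the paper's method: the `TaskProfile` fields, the 0.0-1.0 scoring scale, and every threshold are hypothetical assumptions that would need domain-specific calibration.

```python
from dataclasses import dataclass
from enum import IntEnum


class OversightMode(IntEnum):
    """The four oversight modes, ordered from least to most human control."""
    FULLY_AUTOMATED = 0
    HUMAN_ON_THE_LOOP = 1
    HUMAN_IN_THE_LOOP = 2
    HUMAN_IN_COMMAND = 3


@dataclass(frozen=True)
class TaskProfile:
    # Hypothetical 0.0-1.0 ratings assigned by an operator or risk analyst.
    complexity: float
    risk: float
    consequence_severity: float
    # Time available before the action must be taken, in seconds.
    response_window_s: float


def select_mode(task: TaskProfile) -> OversightMode:
    """Pick an oversight mode using illustrative, uncalibrated thresholds."""
    # Assumption: sub-second tasks leave no room for human reaction, so they
    # fall to automation and must rely on functional safety and fail-safes.
    if task.response_window_s < 1.0:
        return OversightMode.FULLY_AUTOMATED
    # Otherwise the worst of the three scores drives the degree of control.
    worst = max(task.complexity, task.risk, task.consequence_severity)
    if worst >= 0.8:
        return OversightMode.HUMAN_IN_COMMAND
    if worst >= 0.5:
        return OversightMode.HUMAN_IN_THE_LOOP
    return OversightMode.HUMAN_ON_THE_LOOP


# Example profiles loosely based on the tasks named in the list above.
load_balancing = TaskProfile(0.2, 0.3, 0.4, response_window_s=0.005)
service_restoration = TaskProfile(0.6, 0.6, 0.7, response_window_s=600)
print(select_mode(load_balancing).name)       # FULLY_AUTOMATED
print(select_mode(service_restoration).name)  # HUMAN_IN_THE_LOOP
```

Ordering the modes as an `IntEnum` makes "at least this much human control" a simple comparison, which is convenient when a system has to escalate from one mode to a stricter one.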
Operational mapping
- Energy: UAVs/USVs for inspection → HOTL/HITL; HIC during large outages.
- Transport: autonomous vehicles, MASS → HOTL for routine, HITL/HIC in emergencies.
- Water/wastewater/digital infra: AUVs, crawling robots → HOTL/HITL depending on consequences.
- Banking/finance/public admin: mainly cybersecurity-focused automation; supervisory roles remain central.
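One way to make such a mapping machine-readable is a lookup table keyed by sector and operating regime. The sketch below is an assumption-laden illustration: the `SECTOR_MODES` table encodes only the energy, transport, and water entries listed above, and the rule that the least restrictive listed mode is the required minimum is a hypothetical design choice, not something the paper specifies.

```python
from enum import IntEnum


class OversightMode(IntEnum):
    """Oversight modes ordered from least to most human control."""
    FULLY_AUTOMATED = 0
    HUMAN_ON_THE_LOOP = 1
    HUMAN_IN_THE_LOOP = 2
    HUMAN_IN_COMMAND = 3


HOTL = OversightMode.HUMAN_ON_THE_LOOP
HITL = OversightMode.HUMAN_IN_THE_LOOP
HIC = OversightMode.HUMAN_IN_COMMAND

# Modes listed for each sector under routine and emergency conditions.
# The summary gives no routine/emergency split for water, so both regimes
# reuse the same pair here (an assumption).
SECTOR_MODES = {
    "energy": {"routine": (HOTL, HITL), "emergency": (HIC,)},
    "transport": {"routine": (HOTL,), "emergency": (HITL, HIC)},
    "water": {"routine": (HOTL, HITL), "emergency": (HOTL, HITL)},
}


def minimum_mode(sector: str, emergency: bool) -> OversightMode:
    """Return the least restrictive mode listed for the sector and regime."""
    regime = "emergency" if emergency else "routine"
    return min(SECTOR_MODES[sector][regime])


def is_permitted(mode: OversightMode, sector: str, emergency: bool) -> bool:
    """Assumption: any mode with at least the minimum human control is allowed."""
    return mode >= minimum_mode(sector, emergency)


print(minimum_mode("energy", emergency=False).name)  # HUMAN_ON_THE_LOOP
print(minimum_mode("energy", emergency=True).name)   # HUMAN_IN_COMMAND
print(is_permitted(HOTL, "transport", emergency=True))  # False
```

Keeping the table as data rather than branching logic means a change in sector policy is an edit to one entry, and the switch from routine to emergency operation becomes a single lookup.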
Governance/standards context
- EU AI Act (2024) treats many CI AI systems as high-risk, requiring lifecycle obligations and human oversight design.
- ISO work (ISO/IEC TR 5469:2024; ISO/IEC TS 8200:2024) emphasizes controllability, observability, transfer-of-control, and functional safety.
- Policy instruments set boundary conditions but rarely operationalize mode selection or mode-switching mechanics.
Human factors and resilience
- Humans add contextual interpretation, normative judgment, and improvisational capacity in crises; AI adds speed, scale, and pattern recognition.
- Cognitive overload from continuous AI alerts is a real risk; design must manage operator workload and training (simulation exercises).

Data & Methods

Analytical approach: conceptual and normative analysis anchored in interdisciplinary literature (safety science, crisis management, AI governance, robotics).
Evidence base: synthesis of prior studies, standards, and policy texts (EU AI Act, EU Directive on Resilience of Critical Entities, NSM-25, recent ISO documents), and domain examples (SLAM literature, EAI deployments in energy/transport/water/space/health).
Outputs: a taxonomy of oversight modes, vulnerability mapping (exogenous/endogenous/mixed), and sectoral mappings linking oversight modes to representative EAI applications (summarized in paper tables).
Limitations: no primary empirical or quantitative experiments; arguments rely on literature synthesis, normative reasoning, and illustrative mappings rather than statistical measurement. Operational prescriptions require domain-specific calibration and empirical validation.

Implications for AI Economics

Investment and deployment trade-offs
- Bounded autonomy and hybrid governance increase upfront and recurring costs (human oversight staffing, operator training, simulation exercises, fail-safe engineering, compliance documentation).
- These costs are investments in reducing tail risks and increasing system resilience; whether they pay off depends on the magnitude and probability of cascades and the externalities they generate.
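The trade-off can be illustrated with a simple expected-loss comparison. This is a sketch under strong assumptions, not a model from the paper: it treats a cascade as a single annual event with a fixed loss, ignores risk aversion and correlation across sites, and every number in the example is hypothetical.

```python
def expected_avoided_loss(p_without: float, p_with: float, loss: float) -> float:
    """Expected annual loss avoided when oversight lowers cascade probability."""
    return (p_without - p_with) * loss


def net_benefit(p_without: float, p_with: float, loss: float,
                oversight_cost: float) -> float:
    """Avoided expected loss minus the annual cost of oversight."""
    return expected_avoided_loss(p_without, p_with, loss) - oversight_cost


def break_even_risk_reduction(loss: float, oversight_cost: float) -> float:
    """Smallest drop in cascade probability that pays for the oversight."""
    if loss <= 0:
        raise ValueError("loss must be positive")
    return oversight_cost / loss


# Hypothetical figures: oversight cuts annual cascade probability from 2% to
# 0.5%, a cascade costs 500 million, and oversight costs 2 million per year.
benefit = net_benefit(p_without=0.02, p_with=0.005,
                      loss=500e6, oversight_cost=2e6)
threshold = break_even_risk_reduction(loss=500e6, oversight_cost=2e6)
print(f"net benefit: {benefit:,.0f}")
print(f"break-even probability reduction: {threshold:.4f}")
```

With these illustrative inputs the avoided expected loss exceeds the oversight cost, but the conclusion flips quickly if the cascade probability or loss estimate is smaller, which is why the magnitude and probability of cascades drive the result.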
Labor and skill composition
- Demand shifts from manual operational roles toward supervisory, interpretative, and crisis-management skills (higher wages for skilled supervisors; retraining needs).
- Emergence of new occupations (AI safety engineers, oversight operators, simulation/training designers) and changes in labor bargaining over responsibility and liability.
Regulation, compliance, and market structure
- Stricter regulation (EU AI Act-style) raises compliance costs and may favor larger incumbents who can absorb certification and monitoring expenses, potentially slowing entry by smaller firms.
- Standardization (ISO) can lower transaction costs and increase interoperability, facilitating economies of scale in safe-by-design EAI solutions.
Liability, insurance, and externalities
- Blurred responsibility across designers, operators, infrastructure owners, and automated systems complicates liability allocation; clearer oversight modes can help assign legal and economic responsibility.
- Insurance markets will need new actuarial models for correlated/systemic failure risk; premiums may rise for high-autonomy deployments without robust oversight, and insurers may demand mode-specific mitigation measures.
Innovation incentives and regulatory arbitrage
- Requirements for human oversight and safety-by-design can slow rapid deployment but may increase social welfare by internalizing systemic risk.
- Differing regulatory regimes across jurisdictions create incentives for regulatory arbitrage; harmonized standards reduce inefficiencies but require coordination.
Resilience as a public good
- Systemic risks and cascading failures generate externalities that private firms may under-invest in mitigating; public provision (testing infrastructure, incident reporting, shared simulation platforms) and subsidies for oversight capabilities could be justified.
- Public-sector procurement can shape market incentives by requiring specific oversight modes and rigorous lifecycle governance.
Research agenda for AI economics
- Quantify costs and benefits of different oversight modes across sectors (including insurance cost impacts).
- Model externalities from cascading failures to derive optimal regulation and subsidy levels.
- Study labor market impacts and training/transition policy effectiveness.
- Develop econometric measures of resilience gains from EAI under bounded autonomy and map heterogeneous firm responses to regulation.

Overall, the paper argues that economically efficient and socially acceptable deployment of embodied AI in CI requires explicit governance design that prices the cost of oversight against the avoided systemic risk, aligns incentives through standards and procurement, and anticipates labor and insurance market adjustments.

Assessment

Paper Type: theoretical
Evidence Strength: low. The paper is primarily conceptual and normative, synthesizing policy texts, standards, and crisis literature rather than presenting new empirical analyses or causal inference; claims are plausible and grounded in existing literature but untested against field data or counterfactuals.
Methods Rigor: medium. The authors provide a structured synthesis of relevant legal, technical, and crisis-management sources and a clear analytical taxonomy (bounded autonomy, oversight modes, triage criteria), but methods rely on thought experiments and illustrative mappings without formal modeling, empirical validation, or quantitative sensitivity analysis.
Sample: No original empirical sample; draws on secondary sources including the EU AI Act, ISO safety standards, and empirical/theoretical literature on crisis dynamics, cascading failures, and safety governance; uses illustrative mappings and thought experiments to apply concepts to critical infrastructure sectors.
Themes: governance, human_ai_collab
Generalizability
- Conceptual framework not empirically validated; real-world applicability depends on untested assumptions about system interactions and tail risks.
- Sectoral heterogeneity: critical infrastructures (energy, transport, water, telecoms) differ in technical architectures and risk profiles, limiting one-size-fits-all application.
- Jurisdictional and regulatory variation (EU vs US vs other regions) affects feasibility and compliance costs.
- Rapid technological change may alter capability-risk tradeoffs, requiring frequent updates to oversight modes.
- Organizational differences (size, resources, risk appetite) constrain adoption; small operators may face disproportionate compliance burdens.
- Interaction effects with third-party automated systems and supply chains are context-dependent and not fully characterized.

Claims (16)

Claim	Category	Direction	Confidence	Outcome	Details
Embodied AI in critical infrastructure is vulnerable to cascading failures and crisis dynamics outside training distributions.	AI Safety and Ethics	negative	medium	vulnerability to cascading/systemic failures (probability or severity of cascade when confronted with out-of-distribution crises)	0.04
Modern critical infrastructure increasingly uses embodied AI for monitoring, predictive maintenance, and decision support, but these systems are typically trained for statistically representable uncertainty rather than systemic, cascading crises.	AI Safety and Ethics	mixed	medium	mismatch between training uncertainty assumptions and real-world systemic crisis conditions (out-of-distribution performance degradation)	0.04
Purely capability-driven autonomy can exacerbate crises when AI actions interact with novel dynamics or other automated systems.	AI Safety and Ethics	negative	medium	change in crisis propagation/severity attributable to autonomous AI decisions (increase in cascade size or speed)	0.04
Robust resilience stems from 'bounded autonomy': constraining what an AI may decide and when humans must intervene.	AI Safety and Ethics	positive	medium	system resilience metrics (ability to avoid cascades, graceful degradation, containment of failures) under bounded-autonomy regimes	0.04
The paper defines and specifies four oversight modes (spanning near-full autonomy to strict human control) and provides criteria for selecting modes based on task complexity, risk level, and consequence severity.	Governance and Regulation	null_result	high	existence and specification of four oversight modes and their mapping criteria (paper-internal descriptive outcome)	0.06
Governance should be hybrid and structured: legal/regulatory frameworks (e.g., EU AI Act), technical standards (ISO safety norms), and crisis-management practices must be combined to allocate responsibilities and intervention authority.	Governance and Regulation	positive	medium	degree to which governance arrangements allocate responsibility and intervention authority effectively (qualitative governance effectiveness)	0.04
Allocation decisions should be explicit, auditable, and adaptive, with provisions for overriding, fallbacks, and graceful degradation during unanticipated conditions.	Regulatory Compliance	positive	low	auditability, adaptability, and existence of override/fallback mechanisms in deployed governance arrangements	0.02
Requiring bounded autonomy and hybrid governance raises upfront costs (designing constraints, verification, auditing) and ongoing operational costs (human oversight, training, compliance), which will affect deployment timing and scale across sectors.	Adoption Rate	negative	medium	change in deployment costs and timing (capital and operational expenditures, time-to-deploy) attributable to governance requirements	0.04
Demand will grow for tools and services that enable oversight (auditability, explainability, safe fallbacks), creating markets for verification, certification, safety middleware, and human-in-the-loop platforms.	Adoption Rate	positive	low	market growth for oversight-enabling products and services (demand, number of vendors, revenue in verification/certification sectors)	0.02
Insurers will price systemic-tail risks differently from routine failure risk, potentially increasing premiums for high-autonomy deployments or requiring minimum oversight modes for coverage.	Market Structure	negative	low	insurance pricing and coverage conditions for high-autonomy deployments (premiums, coverage exclusions, oversight requirements)	0.02
Increased need for oversight changes labor demand: growth in roles for system supervisors, incident managers, and auditors; potential reduction in purely operational positions but increased value for crisis-experienced expertise.	Employment	mixed	low	labor demand shifts (employment levels by occupation, wages for oversight and crisis-experienced roles, decline in operational roles)	0.02
Aligning deployments with frameworks like the EU AI Act will influence cross-border competitiveness and create compliance costs that small operators may struggle to bear, possibly concentrating deployment among larger firms or those using third-party governance services.	Market Structure	negative	medium	market concentration and competitiveness effects (number/size distribution of deploying firms, cross-border competitiveness indices) due to compliance requirements	0.04
Bounded-autonomy governance internalizes some externalities from automated interactions, reducing the probability of cascading failures and associated economic damages, but misaligned or heterogeneous governance across firms/sectors can still generate systemic vulnerabilities.	AI Safety and Ethics	mixed	medium	net effect on systemic risk (probability and expected loss from cascades) under bounded-autonomy governance versus heterogeneous governance	0.04
Policymakers must weigh productivity gains from higher autonomy against increased systemic risk and governance costs; optimal allocation will vary by sector (high-consequence systems justify stricter human oversight; lower-consequence tasks may tolerate more autonomy).	Governance and Regulation	mixed	medium	policy-optimal oversight allocation by sector (trade-off between productivity gains and expected systemic risk/costs)	0.04
New metrics are needed to value resilience (robustness to out-of-distribution events, graceful degradation) in procurement and contracting; performance-based contracts and regulated minimums for oversight mode selection can help align incentives.	Governance and Regulation	positive	low	existence and use of resilience metrics in procurement/contracts and resulting alignment of incentives (contract terms, procurement criteria adoption)	0.02
Methodology is primarily conceptual and normative: the paper synthesizes policy texts, safety standards, and crisis-management literature and relies on illustrative mappings and thought experiments rather than new empirical field data.	Other	null_result	high	methodological characterization (use of conceptual synthesis vs. empirical data collection)	0.06

Unbounded autonomy can amplify crises: embodied AI in critical infrastructure must be constrained and paired with human oversight to prevent cascading failures, with clear, auditable allocations of machine capability and human judgement guiding deployment and regulation.