Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing

International shipping produces approximately 3% of global greenhouse gas emissions, yet voyage routing remains dominated by heuristic methods. We present PIER (Physics-Informed, Energy-efficient, Risk-aware routing), an offline reinforcement learning framework that learns fuel-efficient, safety-aware routing policies from physics-calibrated environments grounded in historical vessel tracking data and ocean reanalysis products, requiring no online simulator. Validated on one full year (2023) of AIS data across seven Gulf of Mexico routes (840 episodes per method), PIER reduces mean CO2 emissions by 10% relative to great-circle routing. However, PIER's primary contribution is eliminating catastrophic fuel waste: great-circle routing incurs extreme fuel consumption (>1.5x median) in 4.8% of voyages; PIER reduces this to 0.5%, a 9-fold reduction. Per-voyage fuel variance is 3.5x lower (p<0.001), with bootstrap 95% CI for mean savings [2.9%, 15.7%]. Partial validation against observed AIS vessel behavior confirms consistency with the fastest real transits while exhibiting 23.1x lower variance. Crucially, PIER is forecast-independent: unlike A* path optimization whose wave protection degrades 4.5x under realistic forecast uncertainty, PIER maintains constant performance using only local observations. The framework combines physics-informed state construction, demonstration-augmented offline data, and a decoupled post-hoc safety shield, an architecture that transfers to wildfire evacuation, aircraft trajectory optimization, and autonomous navigation in unmapped terrain.

Summary

Main Finding

PIER (Physics-Informed, Energy-efficient, Risk-aware routing) is an offline RL framework that, when trained in an AIS- and reanalysis-calibrated physics environment, preserves transit performance while substantially reducing extreme fuel-wasting voyages. Mean CO2 per voyage falls about 10% in simulation versus great-circle routing (mean savings of 18.2 t CO2), but the principal operational value is variance reduction: per-voyage CO2 variance is 3.5× lower (p < 0.001; the standard deviation is about 1.9× lower), worst-case single-voyage CO2 is reduced by about 70%, and the frequency of extreme (>1.5× median) fuel events falls 9-fold (4.8% → 0.5%). PIER is also robust to forecast uncertainty: its performance holds constant using only local observations, whereas A* wave protection degrades 4.5×.

Key Points

Core idea: combine physics-informed state features from AIS + ocean reanalysis with offline RL (IQL), demonstration-augmented datasets, and a post-hoc safety shield.
Components:
- Physics-informed state: fused AIS kinematics with wave/wind/current reanalysis; fitted speed-loss model and Hull-Fatigue (HF) exposure metric.
- Training data: A*-optimal teacher trajectories mixed with stochastic rollouts sampled from an AIS-calibrated environment, so the offline RL dataset is generated rather than limited to logged vessel behavior.
- Safety shield: lightweight post-hoc constraint enforcement (prevents land collisions and hazardous wave exposure), applied instead of encoding constraints in the reward.
Evaluation:
- Data: full year (2023) AIS + Copernicus/NOAA reanalysis across 7 Gulf of Mexico routes; 840 simulated evaluation episodes per method; 1,132 arrived voyages on 5 core routes used for CO2 analysis.
- Performance vs baselines (great-circle, greedy, CQL, BC, DQN):
  - Arrival rate: PIER 83.3% vs great-circle 78.0%.
  - Mean transit time: PIER 45.6 h vs great-circle 49.8 h (≈8% faster).
  - Mean CO2: PIER 171.6 t/voyage vs great-circle 189.8 t (−9.6%).
  - Median CO2: nearly unchanged (0.6%); the improvement in the mean is driven by the tail.
- Tail & variance effects:
  - 95th percentile CO2 down 6.4% (242.9 t vs 259.5 t).
  - Max single-voyage CO2 reduced 69.8% (470.8 t vs 1,560.4 t).
  - Voyages >1.5× median: great-circle 4.8% → PIER 0.5% (9× reduction).
  - CO2 SD: PIER 76.4 t vs great-circle 141.9 t; Levene’s F = 13.5, p < 0.001.
- Forecast independence: A* path planning degrades under realistic forecast uncertainty (wave protection performance falls 4.5×); PIER maintains performance using only local observations.
Ablation insights:
- The safety shield is the most critical component (removing it reduces arrivals by 54 episodes).
- Physics-informed features and HF-risk awareness materially affect arrival and safety; teacher demonstrations are less critical but still help.
Model details and uncertainty:
- Speed-loss regression: ΔU = a·(Hs/Tp)·cos(µ)^1.5 + b·Hs^2 + c·Ctail + d·Vtail + e; cargo coefficients given; R^2 = 0.02 (captures directional trends, not operational noise).
- CO2 calibration: Admiralty coefficient + SFOC 170 g/kWh + 3.151 t CO2/t fuel.
- Bootstrap 95% CI for mean savings: [2.9%, 15.7%]; Monte Carlo CI on variance ratio [1.5×, 8.1×].
Partial real-world validation: on the Mobile→Tampa corridor, PIER’s CO2 estimates (95 ± 15 t) are consistent with the fastest observed transits (105–108 t), with 23.1× lower variance than AIS-observed direct-transit behavior.
Limitations flagged by authors: simulator-based evaluation (no vessel has yet sailed a PIER route), modest R^2 in speed-loss model, grid resolution limits on narrow coastal routes, single-vessel-class calibration, and lack of direct comparison to proprietary commercial routing tools.
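
The percentage and ratio figures quoted in the Key Points can be rechecked from the raw values listed there. The sketch below is only an arithmetic check on the reported numbers, not a re-analysis of the underlying data; the dictionary keys are illustrative names. It also shows that the 3.5× figure is the ratio of variances implied by the two reported standard deviations, not the ratio of the standard deviations themselves.

```python
# Values copied from the Key Points list (t CO2 per voyage; tail in % of voyages).
great_circle = {"mean": 189.8, "p95": 259.5, "max": 1560.4, "sd": 141.9, "tail": 4.8}
pier = {"mean": 171.6, "p95": 242.9, "max": 470.8, "sd": 76.4, "tail": 0.5}


def pct_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


mean_cut = pct_reduction(great_circle["mean"], pier["mean"])  # reported as 9.6%
p95_cut = pct_reduction(great_circle["p95"], pier["p95"])     # reported as 6.4%
max_cut = pct_reduction(great_circle["max"], pier["max"])     # reported as 69.8%

sd_ratio = great_circle["sd"] / pier["sd"]        # ratio of standard deviations
var_ratio = sd_ratio ** 2                         # ratio of variances
tail_ratio = great_circle["tail"] / pier["tail"]  # extreme-event frequency ratio

print(f"mean -{mean_cut:.1f}%, p95 -{p95_cut:.1f}%, max -{max_cut:.1f}%")
print(f"SD ratio {sd_ratio:.2f}x, variance ratio {var_ratio:.2f}x, tail ratio {tail_ratio:.1f}x")
```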

Data & Methods

Data sources:
- AIS vessel tracking (full 2023) for Gulf of Mexico routes.
- Ocean reanalysis products (Copernicus Marine Service; NOAA CoastWatch) for waves, winds, currents.
Environment construction:
- Physics-calibrated grid environment (0.1° grid; 0.05° noted as needed for narrow coastal corridors).
- Computed physics-informed features per grid cell: speed-loss estimate, Hull-Fatigue exposure E_HF = (Hs/Tp) · max(0, cos µ)^1.5, energy indicators, along-track current/wind components.
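
The Hull-Fatigue exposure feature can be written directly from the formula above. This is a minimal sketch, not the authors' code: the function name, the units (Hs in metres, Tp in seconds), and the reading of µ as a relative wave angle in degrees for which cos µ = 1 is the most exposed heading are all assumptions, since the summary does not define them.

```python
import math


def hull_fatigue_exposure(hs_m, tp_s, mu_deg):
    """E_HF = (Hs / Tp) * max(0, cos(mu)) ** 1.5.

    hs_m:   significant wave height Hs (assumed metres)
    tp_s:   peak wave period Tp (assumed seconds)
    mu_deg: relative wave angle mu (assumed degrees)
    """
    if tp_s <= 0:
        raise ValueError("peak period must be positive")
    # Clamp the directional term so headings with cos(mu) < 0 contribute nothing.
    directional = max(0.0, math.cos(math.radians(mu_deg)))
    return (hs_m / tp_s) * directional ** 1.5


# Steepness proxy Hs/Tp = 0.5 at three relative angles.
for angle in (0.0, 60.0, 180.0):
    print(angle, round(hull_fatigue_exposure(3.0, 6.0, angle), 4))
```

The clamp at zero means exposure is counted only on one side of the vessel's heading, and the 1.5 exponent makes the feature fall off faster than the cosine itself as the angle opens.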
Offline dataset generation:
- Teacher demonstrations: A*-optimal trajectories encoding domain knowledge.
- Stochastic behavioral rollouts: exploratory trajectories sampled from calibrated environment to broaden state-action coverage.
Learning algorithm:
- IQL (Implicit Q-Learning) offline RL with physics-informed states; post-hoc safety shield applied at evaluation.
- Baselines included CQL (offline RL), BC (behavioral cloning), online DQN, heuristic greedy and great-circle.
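
IQL trains its state-value function with an asymmetric (expectile) squared loss on the residual Q(s, a) − V(s), which lets it approximate the value of good in-dataset actions without querying actions outside the data. The sketch below shows only that loss in plain Python; the function names and the expectile parameter tau = 0.7 are illustrative assumptions, as the summary does not report PIER's hyperparameters or implementation.

```python
def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    diff is Q(s, a) - V(s). With tau > 0.5, cases where V underestimates Q
    (diff > 0) are penalised more than overestimates, pulling V toward an
    upper expectile of the in-dataset Q-values.
    """
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def value_loss(q_values, v_values, tau=0.7):
    """Mean expectile loss over a batch of (Q, V) pairs."""
    losses = [expectile_loss(q - v, tau) for q, v in zip(q_values, v_values)]
    return sum(losses) / len(losses)


# Same magnitude of error, different sign: the underestimate costs more.
print(expectile_loss(1.0), expectile_loss(-1.0))
```

Setting tau = 0.5 recovers an ordinary (scaled) mean-squared error; values closer to 1 make the value estimate more optimistic about the best actions seen in the dataset.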
Safety:
- Safety shield enforces hard navigational constraints (land collision avoidance; hazardous wave exposure) at evaluation time, decoupled from reward shaping.
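
One way to picture a decoupled post-hoc shield is as a filter over the policy's ranked actions: the learned policy proposes, and the shield returns the best-ranked action whose next cell passes the hard constraints. The sketch below is a hypothetical illustration under simplifying assumptions (4-connected grid moves, a boolean land mask, and a fixed significant-wave-height limit of 4.0 m); none of these names or thresholds come from the paper.

```python
# Hypothetical grid moves as (row, column) offsets.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def is_safe(cell, land, hs, hs_limit):
    """A cell is safe if it is on the grid, is water, and Hs is within the limit."""
    row, col = cell
    in_bounds = 0 <= row < len(land) and 0 <= col < len(land[0])
    return in_bounds and not land[row][col] and hs[row][col] <= hs_limit


def shield(position, ranked_actions, land, hs, hs_limit=4.0):
    """Return the highest-ranked action leading to a safe cell, or None."""
    for action in ranked_actions:
        d_row, d_col = MOVES[action]
        next_cell = (position[0] + d_row, position[1] + d_col)
        if is_safe(next_cell, land, hs, hs_limit):
            return action
    return None  # no admissible move from this cell


land = [[False, True], [False, False]]  # True marks a land cell
hs = [[1.0, 1.0], [2.0, 1.0]]           # significant wave height per cell (m)
print(shield((0, 0), ["E", "S", "N", "W"], land, hs))  # policy prefers E, which is land
```

Because the check runs after the policy has ranked its actions, the constraints can be changed or tightened without retraining, which is the practical benefit of keeping safety out of the reward.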
CO2 & fuel calibration:
- Admiralty coefficient method for reference Panamax bulk carrier (MCR 10,000 kW, service 14.0 kts), SFOC 170 g/kWh, VLSFO emission factor 3.151 t CO2/t fuel.
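
The calibration chain above (speed to power to fuel to CO2) can be sketched as follows. With a constant Admiralty coefficient at fixed displacement, required power scales with the cube of speed. The sketch assumes, as a simplification not stated in the summary, that the reference point is full MCR at the 14.0 kt service speed; it ignores weather-induced speed loss, so it will not reproduce the reported per-voyage figures.

```python
MCR_KW = 10_000.0          # maximum continuous rating (from the summary)
SERVICE_SPEED_KTS = 14.0   # service speed (from the summary)
SFOC_G_PER_KWH = 170.0     # specific fuel oil consumption (from the summary)
CO2_T_PER_T_FUEL = 3.151   # VLSFO emission factor (from the summary)


def shaft_power_kw(speed_kts):
    """Cubic speed-power law implied by a constant Admiralty coefficient.

    Assumption: the reference point is MCR at service speed; power is capped at MCR.
    """
    power = MCR_KW * (speed_kts / SERVICE_SPEED_KTS) ** 3
    return min(power, MCR_KW)


def voyage_co2_t(speed_kts, hours):
    """Tonnes of CO2 for a calm-water leg at constant speed."""
    energy_kwh = shaft_power_kw(speed_kts) * hours
    fuel_t = energy_kwh * SFOC_G_PER_KWH / 1e6  # grams to tonnes
    return fuel_t * CO2_T_PER_T_FUEL


print(round(voyage_co2_t(14.0, 1.0), 4))  # one hour at service speed
print(round(voyage_co2_t(7.0, 1.0), 4))   # half speed needs one eighth of the power
```

The cubic dependence is why slow, weather-delayed voyages and high-power recoveries dominate the fuel tail: small changes in achieved speed translate into large changes in power demand.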
Statistical analysis:
- Per-voyage comparisons, percentiles, SD; Levene’s test for equality of variances; bootstrap for CI; Monte Carlo for propagation of speed-loss coefficient uncertainty.
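
Two of the statistics above, the bootstrap interval for mean savings and the share of voyages above 1.5× the median, are simple enough to sketch with the standard library. This is an illustrative percentile bootstrap that resamples the two methods independently; whether the authors used a paired or unpaired scheme is not stated, and the voyage values below are synthetic placeholders rather than data from the paper.

```python
import random
import statistics


def tail_fraction(values, factor=1.5):
    """Share of voyages whose CO2 exceeds `factor` times the median."""
    threshold = factor * statistics.median(values)
    return sum(v > threshold for v in values) / len(values)


def bootstrap_mean_savings_ci(baseline, policy, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the percent reduction in mean CO2."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each method's voyages with replacement (unpaired assumption).
        b = [rng.choice(baseline) for _ in baseline]
        p = [rng.choice(policy) for _ in policy]
        mean_b = sum(b) / len(b)
        mean_p = sum(p) / len(p)
        stats.append(100.0 * (mean_b - mean_p) / mean_b)
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: a baseline with one extreme voyage, and a tighter policy.
baseline_co2 = [180.0, 175.0, 190.0, 185.0, 178.0, 182.0, 188.0, 176.0, 184.0, 600.0]
policy_co2 = [178.0, 174.0, 186.0, 183.0, 177.0, 180.0, 185.0, 175.0, 182.0, 200.0]
print(tail_fraction(baseline_co2), tail_fraction(policy_co2))
print(bootstrap_mean_savings_ci(baseline_co2, policy_co2, n_boot=500))
```

In the synthetic example a single extreme voyage drives most of the difference in means, which mirrors the pattern described in the summary: a near-unchanged median alongside a wide confidence interval on mean savings.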

Implications for AI Economics

Economic value of variance reduction exceeds that of average improvement:
- Fleet operators care more about predictability (fuel budget stability, CII compliance, contractual penalties) than modest median gains. Eliminating rare but catastrophic fuel events can have outsized financial and regulatory impact.
- Example scale: using AIS voyage counts, potential annual Gulf savings estimated between ~24,000 t and ~332,000 t CO2 depending on tail-frequency assumptions; even conservative median-based estimates are non-negligible for budgeting.
Risk management & insurance:
- Systems that materially reduce tail risk (extreme fuel usage) can lower downside exposure, reduce insurance premiums, and stabilize provisioning/logistics costs.
Investment & deployment considerations:
- Offline RL with physics-informed states lowers the barrier to ML deployment in safety-critical domains lacking high-fidelity simulators, potentially accelerating R&D investment into operational routing tools.
- Forecast-independence reduces reliance on expensive long-horizon forecast infrastructure, shifting investment toward robust local sensing and environment-calibration pipelines.
Market adoption pathways:
- Operators and vendors may prefer solutions that improve worst-case outcomes and predictability even if median savings are small; this aligns incentives for trial deployments and commercial uptake.
- However, adoption requires field validation, vessel-specific calibration, high-resolution grids for short/coastal routes, and integration with scheduling/port constraints; these non-model frictions affect realized ROI.
Regulatory and policy impacts:
- Tools that reduce tail emissions and improve predictability could help shipping firms meet IMO CII targets and national/regional reporting requirements; regulators might encourage or standardize physics-informed offline evaluation for compliance claims.
Transferability to other domains:
- The recipe (physics-informed states + offline RL + demonstration augmentation + post-hoc safety shields) is broadly applicable to other transport/evacuation/trajectory problems where simulators are poor or absent, opening new economic opportunities for ML in infrastructure and logistics.
Caveats for economic models:
- Uncertainty in the speed-loss model, simulation-to-reality gap (crew behavior, engine degradation, port constraints), and route/grid resolution constraints mean economic impact estimates should be treated as scenario projections rather than guaranteed savings. Field trials and vessel-specific calibration are essential prior to large-scale investment decisions.

Assessment

Paper Type: other

Evidence Strength: medium. The paper evaluates an offline RL routing policy using a physics-calibrated environment and one year of historical AIS plus ocean reanalysis, with bootstrapped CIs and statistical tests; this provides credible performance evidence in-sample and on held-out AIS transits, but the results are based on simulated deployments (no randomized or real-world field trial), limited geographic coverage (seven Gulf of Mexico routes), and depend on modeling choices (environment calibration, baselines), which limit causal claims about operational fuel savings beyond the evaluated scenarios.

Methods Rigor: medium. Strengths include use of physics-informed state construction, calibration to reanalysis data, large episode counts (840 episodes per method), statistical inference (bootstrap CIs, p-values), and partial validation against observed AIS behavior; weaknesses include reliance on an offline simulator rather than live trials, potential unobserved operational constraints (port schedules, charterer instructions, vessel heterogeneity), unclear handling of vessel heterogeneity and uncertainty in environment calibration, and limited geographic/time scope making robustness to other seas, ship types, and commercial realities uncertain.

Sample: One full year (2023) of AIS vessel tracking data for seven predefined Gulf of Mexico routes, combined with ocean reanalysis products (waves, winds, currents) to build a physics-calibrated offline environment; reported evaluation uses 840 episodes per method (routes × transits) and compares PIER to great-circle routing and A*-style path optimization, with partial validation against observed fastest real transits in the AIS record.

Themes: productivity, innovation

Generalizability:
- Geographic limitation: tested only on seven Gulf of Mexico routes; may not generalize to open-ocean, polar, or congested-strait conditions.
- Vessel heterogeneity: unclear how results vary by ship type, size, loading condition, or propulsion system; may not generalize across a fleet.
- Operational constraints: the simulator may not capture commercial/operational constraints (schedules, pilotage, traffic separation schemes, regulatory routing) that affect real routing decisions.
- Modeling and calibration risk: performance depends on the fidelity of physics calibration and reanalysis data; errors there can change outcomes.
- Baseline choice: comparison to great-circle and A* may overstate gains if real-world operators already use richer heuristics or commercial routing services.
- Time scope: one-year validation (2023) may not capture interannual variability in weather/climate patterns.
- Deployment gap: offline/simulated evaluation lacks evidence from live trials or an economic cost–benefit analysis including crew behavior and compliance.

Claims (10)

Claim	Direction	Confidence	Outcome	Details
PIER reduces mean CO2 emissions by 10% relative to great-circle routing. Firm Productivity	positive	medium	mean CO2 emissions per voyage (percent reduction vs great-circle routing)	n=840 10% mean CO2 reduction vs great-circle routing 0.07
PIER eliminates catastrophic fuel waste: great-circle routing produces extreme fuel consumption (>1.5× median) in 4.8% of voyages, while PIER reduces this to 0.5% (a 9-fold reduction). Firm Productivity	positive	medium	fraction of voyages with fuel consumption >1.5× median	n=840 Incidence of extreme fuel consumption reduced from 4.8% to 0.5% (≈9× reduction) 0.07
PIER reduces per‑voyage fuel consumption variance by a factor of 3.5 (p < 0.001). Firm Productivity	positive	high	variance of per-voyage fuel consumption	n=840 Per-voyage fuel consumption variance reduced by factor 3.5 (p < 0.001) 0.12
Bootstrap 95% confidence interval for PIER mean CO2 savings relative to great-circle routing is [2.9%, 15.7%]. Firm Productivity	positive	high	95% bootstrap confidence interval for mean percent CO2 savings	n=840 Bootstrap 95% CI for mean percent CO2 savings: [2.9%, 15.7%] 0.12
Partial validation against observed AIS vessel behavior shows PIER is consistent with the fastest real transits while exhibiting 23.1× lower variance. Firm Productivity	positive	medium	variance of transit times or fuel use compared to fastest observed AIS transits	23.1× lower variance compared to fastest observed AIS transits 0.07
PIER is forecast‑independent: unlike A* path optimization whose wave protection degrades 4.5× under realistic forecast uncertainty, PIER maintains constant performance using only local observations. Firm Productivity	mixed	medium	robustness of routing performance under forecast uncertainty (degradation factor)	A* performance degrades 4.5× under forecast uncertainty; PIER maintains constant performance 0.07
PIER is an offline reinforcement learning framework that learns fuel‑efficient, safety‑aware routing policies from physics‑calibrated environments grounded in historical vessel tracking data and ocean reanalysis products, requiring no online simulator. Other	positive	high	requirement for online simulator (method characteristic)	0.12
Voyage routing remains dominated by heuristic methods. Adoption Rate	negative	low	prevalence of heuristic methods in operational voyage routing (qualitative claim)	0.04
International shipping produces approximately 3% of global greenhouse gas emissions. Fiscal And Macroeconomic	null_result	medium	share of global greenhouse gas emissions attributable to international shipping (percentage)	approximately 3% 0.07
The PIER architecture (physics-informed state construction, demonstration-augmented offline data, decoupled post‑hoc safety shield) transfers to wildfire evacuation, aircraft trajectory optimization, and autonomous navigation in unmapped terrain. Other	positive	low	transferability of the PIER architecture to other domains (qualitative claim)	0.04

A physics-informed offline RL routing system cuts mean voyage CO2 by about 10% and slashes extreme fuel-waste events ninefold in simulated Gulf of Mexico transits, while remaining robust to forecast uncertainty.