AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework

AI inference is becoming a persistent and geographically distributed source of electricity demand. Unlike many traditional electrical loads, inference workloads can sometimes be executed away from the user-facing service location, provided that latency, state locality, capacity, and regulatory constraints remain acceptable. This paper studies when such digital relocation of computation can be interpreted as latency-constrained relocation of electricity demand. We develop an energy-geography framework for geo-distributed AI inference. The framework models a three-layer architecture of clients, service nodes, and compute nodes, and formulates inference placement as a constrained optimization problem over electricity prices, marginal carbon intensity, power usage effectiveness, compute capacity, network latency, and migration frictions. The key object is the energy-latency frontier: the marginal cost and carbon benefit unlocked by relaxing inference latency budgets. The paper makes four contributions. First, it distinguishes physical electricity transmission from digital relocation of electricity-consuming computation. Second, it formulates a geo-distributed inference placement model with feasibility masks and migration frictions. Third, it introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition. Fourth, it provides a transparent stylized simulation over representative global compute regions to show how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers. The results show that latency relaxation expands feasible geography, while migration frictions, egress costs, state locality, legal constraints, and capacity limits can sharply reduce realized benefits.

Summary

Title: AI Inference as Relocatable Electricity Demand: A Latency‑Constrained Energy‑Geography Framework
Authors: Xubin Luo, Yang Cheng (Southwestern University of Finance and Economics)
Preprint, April 2026

Main Finding

AI inference can be meaningfully interpreted as a relocatable form of electricity demand for the subset of workloads that are latency‑tolerant, stateless (or cheaply migratable), legally feasible to move, and not blocked by remote capacity limits. Relaxing latency budgets expands the geographic feasible set for execution and unlocks economic and carbon benefits, but those gains are sharply reduced or reversed once migration frictions (state transfer, cache loss, egress fees, replica costs), legal masks, and capacity limits are included. The paper provides a formal optimization framework and operational metrics (RID, ERL, CRL, break‑even condition) to quantify when and how inference relocation changes where electricity is consumed.

Key Points

Conceptual distinction: Digital relocation of compute is not the same as physical electricity transmission. Still, for tasks that can be executed in multiple regions, placement decisions determine where the electricity is drawn and thus where emissions and costs are realized.
Three‑layer system model: Client → service node → compute node. The controllable geographic variable is the service→compute hop latency.
Formal optimization: A binary assignment of tasks to compute nodes minimizes a weighted objective that combines energy cost, carbon emissions, a latency penalty, and migration friction, subject to latency SLOs, capacity limits, and legal and system feasibility masks.
Migration frictions: Explicitly model state transfer, KV‑cache loss, egress charges, and replication costs; these can dominate benefits for stateful or multi‑round interactions.
Operational metrics:
- Relocatable Inference Demand (RID): the share of total inference energy that can be executed away from the local site.
- Energy Return on Latency (ERL) and Carbon Return on Latency (CRL): the marginal energy cost and carbon emissions saved per unit of relaxed latency budget.
- Relocation break‑even condition: remote execution is worthwhile only if the net benefit (local objective minus remote objective, after migration friction and latency penalties) is positive.
Workload stratification: Heterogeneous latency budgets τ imply a hierarchical execution geography: local (low τ), regional (moderate τ), and energy‑oriented/far (high τ).
Simulation: Stylized, transparent global compute‑region scenarios illustrate qualitative patterns; not calibrated for production magnitudes. Results show substantial sensitivity to latency tolerance, migration costs, legal masks, and capacity constraints.

Data & Methods

Modeling approach:
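- Task tuple $k = (u_k, s_k, \tau_k, E_k, D_k, m_k)$: client $u_k$, service node $s_k$, latency budget $\tau_k$, energy $E_k$, compute demand $D_k$, number of rounds $m_k$.
- Compute node $i$ attributes: electricity price $P_i(t)$, marginal operating emissions rate $\mathrm{MOER}_i(t)$, power usage effectiveness $\mathrm{PUE}_i(t)$, available capacity $\mathrm{Cap}_i(t)$.
- Latency decomposition: $L_k(i) = L^{cs}_k + m_k \cdot L^{sc}_{s_k,i} + L^{q}_k(i) + L^{inf}_k(i)$, with feasibility constraint $L_k(i) \le \tau_k$.
- Facility energy and impacts: $\mathrm{FacilityEnergy}_{k,i} = E_k \cdot \mathrm{PUE}_i$; $\mathrm{EnergyCost}_{k,i} = E_k \cdot \mathrm{PUE}_i \cdot P_i$; $\mathrm{CarbonCost}_{k,i} = E_k \cdot \mathrm{PUE}_i \cdot \mathrm{MOER}_i$.
- Binary assignment variables $x_{k,i} \in \{0,1\}$ with capacity and feasibility constraints.
- Objective: minimize $\sum_{k,i} x_{k,i} \left[ \alpha \cdot \mathrm{EnergyCost}_{k,i} + \beta \cdot \mathrm{CarbonCost}_{k,i} + \gamma \cdot \mathrm{DelayPenalty}_{k,i} + \eta \cdot \mathrm{MigrationCost}_{k,i} \right]$.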
- Weights α, β, γ, η represent operator/policy objectives (cost vs carbon vs latency discipline vs migration friction).
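The latency decomposition and per-task objective can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the authors' code: the class and function names, the units, the default weight values, treating queueing and inference time as node attributes rather than task-node quantities, and the linear delay penalty are all assumptions, since the summary does not specify the functional form of DelayPenalty.

```python
from dataclasses import dataclass


@dataclass
class Task:
    tau_ms: float      # latency budget tau_k
    energy_kwh: float  # IT energy E_k
    rounds: int        # service-to-compute round trips m_k
    l_cs_ms: float     # client-to-service latency L^cs_k


@dataclass
class Node:
    price: float       # electricity price P_i ($/kWh)
    moer: float        # marginal operating emissions MOER_i (kgCO2/kWh)
    pue: float         # power usage effectiveness PUE_i
    l_sc_ms: float     # service-to-compute latency L^sc_{s_k,i}
    l_q_ms: float      # queueing delay L^q_k(i), simplified to a node attribute
    l_inf_ms: float    # inference time L^inf_k(i), simplified to a node attribute


def latency_ms(task: Task, node: Node) -> float:
    """L_k(i) = L^cs_k + m_k * L^sc_{s_k,i} + L^q_k(i) + L^inf_k(i)."""
    return task.l_cs_ms + task.rounds * node.l_sc_ms + node.l_q_ms + node.l_inf_ms


def is_feasible(task: Task, node: Node) -> bool:
    """Latency SLO check L_k(i) <= tau_k."""
    return latency_ms(task, node) <= task.tau_ms


def objective(task: Task, node: Node, migration_cost: float,
              alpha=1.0, beta=0.1, gamma=0.001, eta=1.0) -> float:
    """alpha*EnergyCost + beta*CarbonCost + gamma*DelayPenalty + eta*MigrationCost.

    The default weights are hypothetical placeholders.
    """
    facility_kwh = task.energy_kwh * node.pue   # FacilityEnergy = E_k * PUE_i
    energy_cost = facility_kwh * node.price     # E_k * PUE_i * P_i
    carbon_cost = facility_kwh * node.moer      # E_k * PUE_i * MOER_i
    delay_penalty = latency_ms(task, node)      # assumption: linear in latency (ms)
    return (alpha * energy_cost + beta * carbon_cost
            + gamma * delay_penalty + eta * migration_cost)


task = Task(tau_ms=200, energy_kwh=1.0, rounds=2, l_cs_ms=10)
local = Node(price=0.20, moer=0.5, pue=1.5, l_sc_ms=1, l_q_ms=5, l_inf_ms=50)
remote = Node(price=0.05, moer=0.1, pue=1.2, l_sc_ms=60, l_q_ms=5, l_inf_ms=50)
print(latency_ms(task, local), latency_ms(task, remote))
print(objective(task, local, 0.0), objective(task, remote, 0.05))
```

With these made-up numbers the remote node takes 185 ms, so it is feasible at τ = 200 ms but not at 150 ms, and its objective (about 0.307) stays below the local one (about 0.442) even after a 0.05 migration charge. Note that the round count $m_k$ multiplies the service-to-compute hop, which is why multi-round interactions lose remote feasibility quickly.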
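Migration friction decomposition: $\mathrm{MigrationCost}_{k,i} = M^{\mathrm{state}}_{k,i} + M^{\mathrm{cache}}_{k,i} + M^{\mathrm{egress}}_{k,i} + M^{\mathrm{replica}}_{k,i}$ (state transfer, KV‑cache loss, egress charges, replica costs).
Feasibility masks: $a^{\mathrm{legal}}_{k,i}, a^{\mathrm{system}}_{k,i} \in \{0,1\}$ model legal and system constraints; a value of 0 forbids placing task $k$ on node $i$.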
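For a toy instance, the binary assignment with capacity limits, feasibility masks, and migration friction can be solved by enumerating every $x_{k,i}$ plan. The sketch below is hypothetical: it assumes each task is placed on exactly one node, that latency infeasibility has already been folded into the system mask, and that the cost inputs are already weighted by α, β, γ, and η. The function names and numbers are illustrative only. Enumeration grows as (nodes)^(tasks), so a realistic instance would need an integer-programming solver instead.

```python
from itertools import product


def migration_cost(state, cache, egress, replica):
    """M = M^state + M^cache + M^egress + M^replica (already eta-weighted)."""
    return state + cache + egress + replica


def assign(base, mig, demand, cap, legal, system):
    """Exhaustive search over assignments x_{k,i} for a small instance.

    base[k][i]: weighted energy + carbon + delay terms for task k on node i
    mig[k][i]:  weighted migration friction (0 on the local node)
    demand[k]:  compute demand D_k;  cap[i]: capacity Cap_i
    legal, system: 0/1 feasibility masks (latency folded into `system`)
    Returns (plan, total), or (None, inf) if no plan is feasible.
    """
    n_nodes = len(cap)
    best_plan, best_total = None, float("inf")
    for plan in product(range(n_nodes), repeat=len(base)):
        # A mask value of 0 forbids the placement.
        if not all(legal[k][i] and system[k][i] for k, i in enumerate(plan)):
            continue
        load = [0.0] * n_nodes
        for k, i in enumerate(plan):
            load[i] += demand[k]
        if any(used > limit for used, limit in zip(load, cap)):
            continue
        total = sum(base[k][i] + mig[k][i] for k, i in enumerate(plan))
        if total < best_total:
            best_plan, best_total = plan, total
    return best_plan, best_total


base = [[0.40, 0.10], [0.40, 0.10]]  # node 0 = local, node 1 = remote
mig = [[0.0, 0.05], [0.0, migration_cost(0.30, 0.10, 0.05, 0.05)]]
ones = [[1, 1], [1, 1]]
print(assign(base, mig, [1, 1], [2, 1], ones, ones))
print(assign(base, mig, [1, 1], [2, 1], [[1, 0], [1, 1]], ones))
```

In this example the nearly stateless task 0 moves to the cheaper remote node while the stateful task 1 (friction 0.5) stays local, for a total of about 0.55; setting the legal mask to forbid task 0 on node 1 sends both tasks to the local node at a total of 0.8.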
Metrics:
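- RID = share of total inference energy executed off the local node.
- ERL(Δτ) and CRL(Δτ) are computed as marginal reductions in cost and carbon per unit of additional latency slack.
- Net benefit $NB_{k,i} = J_{k,\mathrm{local}} - J_{k,i}$, where $J$ is the task's weighted objective; relocating task $k$ to node $i$ passes the break‑even test only if $NB_{k,i} > 0$.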
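RID, ERL, and CRL can be made concrete on a toy frontier. In this hypothetical sketch each task goes to the cheapest region that meets its latency budget, capacity and migration friction are ignored, PUE is folded into per-region facility price and carbon factors, the same slack Δτ is added to every task's budget, and ERL and CRL are finite differences between the plans at τ and τ + Δτ. All names and numbers are illustrative, not the paper's parameters.

```python
# Hypothetical regions: name -> (service-to-compute latency ms,
#                                facility $/kWh = P_i * PUE_i,
#                                facility kgCO2/kWh = MOER_i * PUE_i)
REGIONS = {
    "local": (20, 0.30, 0.6),
    "regional": (60, 0.15, 0.4),
    "far": (150, 0.06, 0.1),
}
# Hypothetical tasks: (energy E_k in kWh, latency budget tau_k in ms)
TASKS = [(1.0, 50), (2.0, 100), (3.0, 200)]


def place(slack_ms=0.0):
    """Send each task to its cheapest latency-feasible region.

    Capacity and migration friction are ignored. Returns (cost, carbon, RID).
    """
    cost = carbon = off_local = total = 0.0
    for energy, tau in TASKS:
        feasible = [n for n, (lat, _, _) in REGIONS.items() if lat <= tau + slack_ms]
        best = min(feasible, key=lambda n: REGIONS[n][1])  # "local" is always feasible here
        _, price, moer = REGIONS[best]
        cost += energy * price
        carbon += energy * moer
        total += energy
        if best != "local":
            off_local += energy
    return cost, carbon, off_local / total


def erl_crl(delta_tau_ms):
    """Finite-difference ERL and CRL: savings per extra ms of latency budget."""
    c0, e0, _ = place(0.0)
    c1, e1, _ = place(delta_tau_ms)
    return (c0 - c1) / delta_tau_ms, (e0 - e1) / delta_tau_ms


print(place(0.0), place(50.0), erl_crl(50.0))
```

With these numbers the three tasks sort into local, regional, and far regions at their base budgets (RID = 5/6), and 50 ms of extra slack moves every task off the local node (RID = 1.0), giving ERL ≈ 0.0066 $/ms and CRL ≈ 0.016 kgCO2/ms.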
Simulation:
- Stylized global compute regions (representative heterogeneity in prices, carbon intensity, latency).
- Scenario‑based parameter sweeps over τ budgets, migration friction magnitudes, capacity constraints, and legal masks.
- Purpose: illustrate structural mechanisms and sorting of workloads, not to forecast absolute volumes.
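A sweep over migration-friction magnitudes can be illustrated with a toy break-even calculation. Everything here is hypothetical and not calibrated to the paper's simulation: the three regions, their facility prices, the task mix, and especially the assumption that friction is a flat charge per GB of state to migrate. The objective is reduced to energy cost plus friction, and a task relocates only when its best net benefit $NB_{k,i}$ is strictly positive.

```python
# Hypothetical regions: (name, service-to-compute latency ms, facility $/kWh = P_i * PUE_i)
REGIONS = [("local", 20, 0.30), ("regional", 60, 0.15), ("far", 150, 0.06)]
# Hypothetical tasks: (energy E_k in kWh, latency budget tau_k in ms, state to migrate in GB)
TASKS = [(1.0, 50, 0.0), (2.0, 100, 1.0), (3.0, 200, 4.0)]


def choose_node(energy, tau, state_gb, friction_per_gb):
    """Index of the node with the largest positive net benefit, else 0 (local)."""
    local_cost = energy * REGIONS[0][2]
    best_idx, best_nb = 0, 0.0
    for idx, (_, lat_ms, price) in enumerate(REGIONS[1:], start=1):
        if lat_ms > tau:  # violates the latency budget
            continue
        remote_cost = energy * price + friction_per_gb * state_gb
        nb = local_cost - remote_cost  # NB_{k,i} = J_local - J_i (cost-only objective)
        if nb > best_nb:  # break-even test: relocate only when NB > 0
            best_idx, best_nb = idx, nb
    return best_idx


def rid(friction_per_gb):
    """Share of inference energy executed off the local node."""
    total = sum(e for e, _, _ in TASKS)
    off_local = sum(e for e, tau, s in TASKS
                    if choose_node(e, tau, s, friction_per_gb) != 0)
    return off_local / total


for f in (0.0, 0.2, 0.5):
    print(f"friction {f:.1f} $/GB -> RID {rid(f):.2f}")
```

With these made-up inputs RID drops from about 0.83 to 0.33 to 0 as the per-GB friction rises from 0 to 0.2 to 0.5: the stateful, most latency-tolerant task is the first to fall back to the local node, even though it has the widest feasible geography.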

Implications for AI Economics

Market and procurement signals:
- Latency budgets act like a “price” on relocation: small increases in allowable latency can yield disproportionate energy/cost/carbon gains (quantified by ERL/CRL). Cloud providers and clients could monetize or trade latency flexibility (e.g., discounted inference for relaxed SLOs).
- Electricity price and marginal carbon heterogeneity across regions create arbitrage opportunities that routing systems can exploit — but only up to latency, capacity, legal, and friction limits.
Carbon accounting and policy:
- Carbon savings from relocation depend on marginal operating emissions (MOER) and PUE; proper accounting must use marginal (not average) grid signals and include PUE and migration overheads.
- Claims that moving inference “to greener regions” reduces overall emissions should be qualified: relocation changes where electricity is consumed, and whether total emissions fall depends on how the shifted load alters the dispatch of marginal generation in both the origin and destination grids; system‑level interactions (e.g., demand response) therefore matter.
- Regulation and procurement standards (e.g., green tariffs, contractual egress rules, data residency laws) materially change feasible relocation and therefore the real emissions/cost outcomes.
Infrastructure siting and capacity planning:
- Operators planning regional capacity should consider workload stratification: low‑latency services need local edge capacity; workloads with intermediate tolerance can be handled regionally; very latency‑tolerant workloads can be concentrated in energy‑cheap/low‑carbon hubs (subject to migration costs).
- Investments in transmission or local renewable buildout versus strategic compute placement represent alternative ways to achieve low‑carbon outcomes; the optimal mix depends on latency elasticity of demand and migration frictions.
Platform design and pricing:
- System design choices that reduce migration friction (e.g., state synchronization, KV‑cache sharing, lower egress fees, smart replica placement) can unlock more relocatable demand — but lowering frictions may create rebound effects (more remote execution) altering grid impacts and costs.
- Differential pricing for latency tiers and explicit markets for latency slack could align incentives: buyers of inference could pay premiums for low τ while sellers optimize location to minimize energy/carbon for more relaxed tiers.
Market structure and externalities:
- If many providers route to low‑price/low‑carbon regions, local capacity and grid impacts (congestion, marginal emissions) change; the model should be embedded in system‑level grid dispatch to evaluate second‑order effects.
- Policymakers should be cautious about simplistic “move computation to X region” mandates; legal/data‑residency and grid dynamics critically shape whether movement produces net carbon or cost benefits.
Practical takeaways for economists and operators:
- Use RID, ERL, CRL and break‑even tests to identify which portions of demand are realistically relocatable and to quantify marginal returns to relaxing latency.
- Incorporate migration frictions and legal masks into cost–benefit and contract design; neglecting them overstates the mobility of demand.
- Consider dynamic pricing contracts for latency flexibility as an efficient mechanism to reveal and capture the energy–latency tradeoff.
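The carbon-accounting caution about marginal signals, PUE, and migration overhead can be written out from the framework's own CarbonCost definition. A minimal derivation, assuming task $k$ moves from its local node $\ell$ to node $i$ and introducing a hypothetical term $C^{\mathrm{mig}}_{k,i}$ for the emissions of migration overhead (state transfer, replicas):

```latex
\Delta \mathrm{CO_2}_{k,i}
  = E_k \cdot \mathrm{PUE}_\ell \cdot \mathrm{MOER}_\ell
  - E_k \cdot \mathrm{PUE}_i \cdot \mathrm{MOER}_i
  - C^{\mathrm{mig}}_{k,i}
```

For illustration only, with $E_k = 1$ kWh, $\mathrm{PUE}_\ell = 1.5$, $\mathrm{MOER}_\ell = 0.6$ kgCO2/kWh, $\mathrm{PUE}_i = 1.2$, and $\mathrm{MOER}_i = 0.1$ kgCO2/kWh, the gross saving is $0.9 - 0.12 = 0.78$ kgCO2 before $C^{\mathrm{mig}}_{k,i}$ is subtracted. Substituting average for marginal intensities changes both terms and can overstate or understate this saving, and the expression still ignores the grid feedbacks that the authors list as a limitation.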

Limitations emphasized by the authors
- Stylized simulation (not trace‑calibrated); results are structural/qualitative rather than production‑scale estimates.
- The framework abstracts away dynamic grid feedbacks (how relocated load affects marginal generation and prices) and system‑level equilibrium effects.
- Real deployments may face additional institutional, contractual, and operational constraints not fully captured by the feasibility masks.

Short summary line
The paper formalizes when and how AI inference can act as relocatable electricity demand, provides operational metrics to measure the energy–latency tradeoff, and shows that latency tolerance, migration friction, legal constraints, and capacity jointly determine whether relocation yields real cost or carbon benefits.

Assessment

Paper Type: theoretical

Evidence Strength: low. Findings are based on a stylized optimization framework and simulated scenarios over representative global regions rather than on empirical or experimental data; results are illustrative and sensitive to model assumptions and parameter choices.

Methods Rigor: medium. The paper presents a formal three-layer model, a clear objective and constraints, well-defined operational metrics (e.g., energy/carbon return on latency), and a transparent simulation exercise, but it relies on simplifying assumptions, limited calibration to real-world operational data, and no empirical validation or robustness checks against observed relocation behavior.

Sample: A stylized simulation using representative global compute regions and parameters (regional electricity prices, marginal carbon intensities, power usage effectiveness, compute capacities, network latencies, feasibility masks, egress/migration frictions, and latency budgets); no primary observational or experimental dataset.

Themes: adoption, governance

Generalizability:
- Simulated and stylized parameters may not match real-world operator costs, market structures, or time-varying grid conditions.
- Ignores dynamic effects (temporal variability in prices/carbon intensity, diurnal demand, spot markets).
- Simplifies legal, contractual, and data-locality constraints that vary across jurisdictions.
- Assumes known and fixed latency budgets and migration frictions that may differ across applications and operators.
- Does not model user behavior, demand elasticity, or multi-tenant interactions at scale.

Claims (9)

Claim	Category	Direction	Confidence	Outcome	Details
AI inference is becoming a persistent and geographically distributed source of electricity demand.	Other	positive	high	electricity demand (geographic distribution and persistence)	0.12
Inference workloads can sometimes be executed away from the user-facing service location, provided that latency, state locality, capacity, and regulatory constraints remain acceptable.	Task Allocation	null_result	high	feasibility of relocating inference workload execution given constraints (latency, state locality, capacity, regulatory constraints)	0.12
We develop an energy-geography framework for geo-distributed AI inference that models a three-layer architecture of clients, service nodes, and compute nodes, and formulates inference placement as a constrained optimization problem over electricity prices, marginal carbon intensity, power usage effectiveness, compute capacity, network latency, and migration frictions.	Task Allocation	null_result	high	inference placement feasibility and optimization across energy and latency dimensions	0.2
The paper distinguishes physical electricity transmission from digital relocation of electricity-consuming computation.	Other	null_result	high	conceptual differentiation between transmission of electrons and relocation of computational demand	0.2
The paper formulates a geo-distributed inference placement model with feasibility masks and migration frictions.	Task Allocation	null_result	high	modeling of placement feasibility including migration friction effects	0.2
The paper introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition.	Task Allocation	null_result	high	definitions/metrics for relocatability, energy and carbon return per unit of relaxed latency, and break-even conditions	0.2
The paper provides a transparent stylized simulation over representative global compute regions to show how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers.	Task Allocation	positive	high	assignment of workloads into execution layers (local, regional, energy-oriented) based on latency tolerance	0.12
Latency relaxation expands feasible geography for placing inference workloads.	Task Allocation	positive	high	geographic feasibility of relocating inference demand as a function of latency budget relaxation	0.12
Migration frictions, egress costs, state locality, legal constraints, and capacity limits can sharply reduce realized benefits from relocating inference workloads.	Task Allocation	negative	high	realized energy/carbon/cost benefits from relocation after accounting for migration frictions and constraints	0.12

Relaxing latency constraints can shift AI inference to cheaper, lower‑carbon regions, but real‑world frictions, from egress charges and capacity limits to regulatory and state‑locality rules, sharply cut the potential electricity‑cost and emissions gains.