Rising AI rack power densities risk 'stranding' datacenter power — installed megawatts can vastly overstate usable capacity. Simulations using Azure data show multi-resource stranding cuts deployable capacity and inflates effective capital costs, so long-run planning must prioritize deployable capacity across generations, not simply installed power.

Designing Datacenter Power Delivery Hierarchies for the AI Era

Grant Wilkins, Fiodar Kazhamiaka, Alok Gautam Kumbhare, Chaojie Zhang, Ricardo Bianchini · May 15, 2026

arXiv · descriptive · medium evidence · relevance 7/10

Using Azure production data and deployment projections, the paper shows that multi-resource stranding from rising rack power density materially reduces deployable capacity and raises effective capex, so datacenter planners should optimize for deployable capacity over installed megawatts.

Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1 MW per deployment by 2027. This poses a major challenge for datacenter power delivery designers. As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned. Designs must remain efficient over long datacenter lifetimes and multiple hardware generations. Power utilization is particularly important as grid power capacity is a scarce resource in the AI era. Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix. Moreover, these factors evolve over time, have interdependencies across multiple resource dimensions, and generally do not lend themselves to closed-form analysis. To address this challenge, we develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences. The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure. Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance, and quantify how rising density from rack- and pod-scale AI systems shapes these outcomes. For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.

Summary

Main Finding

Datacenter power-delivery designs must be evaluated by the capacity they can actually deploy over a multi-year lifecycle (deployable capacity), not by installed MW or $/W at commissioning. As rack and pod power densities rise for AI (projected toward ~1 MW per deployment), topology and redundancy choices (distributed vs block) interact with arrival sequences, oversubscription, and placement rules to create large amounts of stranded (undeployable) capacity. These dynamics materially change effective CapEx and delivered workload throughput (tokens/sec/W), producing economically important tradeoffs that are missed by static commissioning metrics.

Key Points

Key metric: deployable capacity over time (how much new demand a hall/fleet can admit across generations), rather than installed MW or first-order $/W installed.
Two structural stranding mechanisms:
- Distributed-redundant designs (xN/y): produce reserve fragmentation — usable headroom is spread across many parents so a single large deployment can fail admission even when aggregate slack exists.
- Block-redundant designs (p+q): produce line-up quantization — usable capacity is coarser, so divisibility thresholds create sharp jumps in stranded capacity when deployment quanta exceed block granularity.
Quantitative impacts:
- Throughput per watt (tokens/s/W) and effective cost per watt vary widely across design/workload/density combinations: the paper reports >20x variation in throughput/W and >20% variation in cost/W across scenarios.
- A concrete example: two halls with similar installed HA capacity (4N/3 distributed vs 3+1 block) appear comparable on commissioning metrics (~3% cost difference), but over an 8-year fleet lifecycle the 3+1 option develops higher tail stranding, which raises the effective CapEx difference to 5.8% and forces construction of ~23 additional halls to serve the same demand.
Placement policy matters: heuristics that minimize variance in parent utilization substantially reduce line-up-level stranding versus naive/random placement.
Multi-resource stranding: power stranding interacts with cooling, space, and networking constraints; considering power alone understates deployability problems for coarse AI deployments (pods/racks with joint cooling and networking needs).
Pod and model economics: larger tightly-coupled GPU pods can increase model throughput efficiency sufficiently to justify higher infrastructure cost in some regimes, but the break-even depends on the interaction between pod size, topology-induced stranding, and density growth trajectories.
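
The two stranding mechanisms listed above can be illustrated with a toy calculation. This is a minimal sketch, not the paper's simulator: the function names, the headroom and block sizes, and the rule that a deployment must fit within a single parent or a single block are all assumptions made for illustration. The usable-fraction helpers encode one common reading of the "xN/y" and "p+q" notation, which the summary does not define explicitly.

```python
def usable_fraction_distributed(x: int, y: int) -> float:
    """Assumed reading of 'xN/y': x sources, loaded so that any y of them can carry the load."""
    return y / x


def usable_fraction_block(p: int, q: int) -> float:
    """Assumed reading of 'p+q': p active blocks backed by q reserve blocks."""
    return p / (p + q)


def fragmented_admission(headroom_kw, deployment_kw):
    """Reserve fragmentation (toy): the deployment needs ONE parent with enough headroom.

    Returns (aggregate headroom, whether any single parent can admit the deployment).
    """
    aggregate = sum(headroom_kw)
    admitted = any(h >= deployment_kw for h in headroom_kw)
    return aggregate, admitted


def quantized_stranding(block_kw, deployment_kw):
    """Line-up quantization (toy): only whole deployments fit in a block.

    Returns (deployments that fit, fraction of the block left stranded).
    """
    units = int(block_kw // deployment_kw)
    stranded_kw = block_kw - units * deployment_kw
    return units, stranded_kw / block_kw


# Hypothetical numbers, chosen only to make the effects visible.
print(usable_fraction_distributed(4, 3), usable_fraction_block(3, 1))
print(fragmented_admission([300, 300, 300, 300], deployment_kw=800))
for size_kw in (600, 1100):
    print(size_kw, quantized_stranding(block_kw=2000, deployment_kw=size_kw))
```

Under these assumptions the 4N/3 and 3+1 layouts expose the same usable fraction, yet an 800 kW deployment is rejected despite 1,200 kW of aggregate slack, and the stranded share of a 2,000 kW block jumps once the deployment size stops dividing the block evenly.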

Data & Methods

Framework: a lifecycle simulation that models hall-level power-delivery hierarchies as trees (substation → UPS → switchboards/line-ups → row → rack), redundancy topologies (distributed xN/y and block p+q), and hierarchical placement feasibility (a placement must satisfy capacity and redundancy constraints at every ancestor).
Workload classes: three IT classes simulated — GPUs, general compute (CPU), and storage; racks/pods are indivisible placement units, with GPU pods potentially spanning multiple racks and requiring HA and busbar capacity.
Inputs and calibration:
- Rack power and density projections informed by Azure production telemetry and public roadmaps (showing rapid growth: P99 accelerator racks >150 kW now, projections toward ~1 MW by ~2027).
- Cooling modeled via fixed conversions (e.g., 165 CFM/kW air, 2 LPM per rack D2C liquid); busbar and row limits explicitly modeled; harvesting/oversubscription behavior included.
- Rack lifetimes sampled (Normal) per hardware class; arrivals, decommissioning, harvesting modeled over an 8-year horizon.
Experiments:
- Single-hall Monte Carlo saturation tests (single-SKU and mixed-SKU) to reveal structural stranding patterns.
- Fleet-scale lifecycle simulations with arrival sequences, heterogeneous SKUs and generations, and different redundancy topologies.
- Placement policies compared (min-variance, min-utilization, random, round-robin).
- Cost model includes component-level CapEx (ballpark $/MW figures are given, e.g., ~$10M/MW for 4N/3); throughput is modeled for LLM inference (tokens/s/W) to compute effective delivered performance.
Metrics reported: deployable capacity over time, stranded (undeployable) provisioned power fraction, effective cost per deployable watt, tokens/sec/W delivered by fleet.
Simplifications/assumptions: instant commissioning of new halls when needed (isolates topology effects), outage probabilities not modeled (availability constraints enforced via redundancy rules), constant conversion factors for cooling, and abstraction to rack granularity (pods treated as multi-rack units).
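
The hierarchical feasibility rule and the min-variance placement policy described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the `Node` class, the three-level tree, and all capacities are hypothetical, only power capacity is checked (redundancy, cooling, and busbar constraints are omitted), and "parent" is taken to mean the line-up directly above each row.

```python
from dataclasses import dataclass, field
from statistics import pvariance
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality/hash; avoids recursing through parent links
class Node:
    """One level of a hypothetical power tree (UPS, line-up, or row)."""
    name: str
    capacity_kw: float
    load_kw: float = 0.0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node):
    """Yield the node itself and every ancestor up to the root."""
    while node is not None:
        yield node
        node = node.parent


def feasible(row, rack_kw):
    """A placement is feasible only if every ancestor keeps load <= capacity."""
    return all(n.load_kw + rack_kw <= n.capacity_kw for n in ancestors(row))


def place(row, rack_kw):
    """Commit the rack's power draw at the row and at every ancestor."""
    for n in ancestors(row):
        n.load_kw += rack_kw


def min_variance_row(rows, rack_kw):
    """Return the feasible row whose choice minimizes variance of line-up utilization."""
    parents = list(dict.fromkeys(r.parent for r in rows))  # de-duplicated, order kept
    best, best_var = None, None
    for row in rows:
        if not feasible(row, rack_kw):
            continue
        utils = [
            (p.load_kw + (rack_kw if p is row.parent else 0.0)) / p.capacity_kw
            for p in parents
        ]
        var = pvariance(utils)
        if best_var is None or var < best_var:
            best, best_var = row, var
    return best  # None means the rack cannot be admitted anywhere


# Hypothetical hall: one UPS, two line-ups, three rows.
ups = Node("ups", 1000)
lineup_a = ups.add_child(Node("lineup-a", 600))
lineup_b = ups.add_child(Node("lineup-b", 600))
rows = [
    lineup_a.add_child(Node("a1", 400)),
    lineup_a.add_child(Node("a2", 400)),
    lineup_b.add_child(Node("b1", 400)),
]

place(rows[0], 300)                  # first 300 kW rack lands on row a1
choice = min_variance_row(rows, 300)  # policy balances the two line-ups
print(choice.name if choice else None)
```

In this toy hall the policy steers the second rack to the empty line-up rather than filling the first one, which is the balancing behavior the summary credits with reducing line-up-level stranding. A random or round-robin policy could be compared by swapping out `min_variance_row`.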

Implications for AI Economics

Grid power is a scarce, high-value input for AI datacenters: effective economic value is determined by deployable MW over time, not installed MW. Mis-estimating deployability leads to overpaying for stranded capacity or under-building and losing workload revenue.
CapEx and ROI decisions should use lifecycle deployable-capacity simulations, not only $/installed MW. Small static CapEx differences can magnify over years due to stranding and force materially different infrastructure spending (additional halls).
Provider pricing and capacity strategies: cloud providers should internalize topology-driven deployment friction into capacity-planning and SLA/pricing. Providers with more deployable capacity (given their topology and placement policies) have an advantage in hosting high-density AI workloads and extracting rental value.
Network effects between hardware and infrastructure choices: accelerator pod size, networking/cooling integration, and redundancy topology interact. Choosing larger pod designs for model efficiency can be economically optimal only if the power-delivery design and placement policy permit deployability without excessive stranding.
Investment trade-offs:
- Investing in more flexible power distribution (e.g., additional busbars, cross-row feeds, finer-grained UPS/line-ups) or smarter placement tooling can reduce stranding and increase effective capacity yield per $ invested.
- Block redundancy can give higher instantaneous usable capacity per active block but can increase tail risk of stranded capacity when deployments are coarse; distributed redundancy smooths that risk but fragments usable headroom. The right choice depends on expected rack/pod sizes, growth trajectories, and workload mix.
Policy and market-level consequences: as AI accelerators push densities up, data-center-level inefficiencies may increase demand for alternatives — flexible grid connections, local generation/storage, or more modular/portable datacenter designs — changing where and how AI capacity is supplied.
Operational practices: placement policies that minimize variance in parent utilization materially reduce stranding; oversubscription and harvesting strategies must be coordinated with placement/topology decisions to maximize deployable capacity and revenue.

Overall, the paper argues that economic evaluations of datacenter investments for AI workloads must move from single-point commissioning metrics to lifecycle, topology-aware simulations that quantify deployable capacity, stranded power, and the downstream effects on effective CapEx and delivered AI throughput.
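
To make the gap between commissioning cost and effective cost concrete, the following sketch divides installed CapEx by the deployable share of capacity and counts the halls needed to serve a fixed demand. It is an illustrative calculation, not the paper's cost model: designs "A" and "B", their stranded fractions, the 50 MW hall size, and the 1,000 MW demand are all hypothetical, and only the ~$10M/MW order of magnitude comes from the summary above.

```python
import math


def effective_cost_per_deployable_mw(capex_per_installed_mw, stranded_fraction):
    """Spread installed CapEx over only the capacity that can actually be deployed."""
    if not 0.0 <= stranded_fraction < 1.0:
        raise ValueError("stranded_fraction must be in [0, 1)")
    return capex_per_installed_mw / (1.0 - stranded_fraction)


def halls_needed(demand_mw, hall_installed_mw, stranded_fraction):
    """Number of halls required when each hall loses a share of capacity to stranding."""
    deployable_per_hall = hall_installed_mw * (1.0 - stranded_fraction)
    return math.ceil(demand_mw / deployable_per_hall)


# Hypothetical designs: (CapEx in $M per installed MW, long-run stranded fraction).
designs = {"A": (10.0, 0.10), "B": (9.7, 0.15)}

for name, (capex, stranded) in designs.items():
    eff = effective_cost_per_deployable_mw(capex, stranded)
    halls = halls_needed(demand_mw=1000, hall_installed_mw=50, stranded_fraction=stranded)
    print(f"{name}: installed ${capex:.1f}M/MW, effective ${eff:.2f}M/MW, halls {halls}")
```

With these made-up inputs, design B looks cheaper per installed MW but becomes more expensive per deployable MW and needs an extra hall, which mirrors the qualitative reversal the paper reports without reproducing its figures.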

Assessment

Paper Type: descriptive
Evidence Strength: medium. The paper combines operational production data from Microsoft Azure with projection models and sequence-based simulations to produce realistic counterfactuals; however, results depend on model assumptions (future deployment mixes, oversubscription policies, hardware generations) and are not validated against independent out-of-sample deployments, limiting causal inference.
Methods Rigor: medium. Methodology appears systematic and grounded in production telemetry and realistic arrival/decommissioning sequences, but it relies on modeling choices and projections that introduce uncertainty and potential sensitivity to assumptions (e.g., workload mixes, topology rules, regional differences) without randomized or quasi-experimental identification.
Sample: Simulation framework combining projection models for GPU, compute, and storage deployments with operational factors extracted from Microsoft Azure production data; experiments run over realistic arrival, oversubscription, and decommissioning sequences and future density projections (up to ~2027).
Themes: adoption, productivity
Generalizability:
- Based mainly on Microsoft Azure operational data; other cloud providers, colocation facilities, and on-prem datacenters may have different topologies, placement policies, and cost structures.
- Relies on projections of GPU/AI hardware density and deployment mixes that are uncertain and may diverge by vendor or region.
- Does not fully account for policy, market, or grid-infrastructure changes that could alter power availability or cost.
- Specific oversubscription and placement rules modeled may not match all operators’ operational practices.
- Findings focused on rack/pod-scale AI deployments; may be less applicable to smaller-scale or edge deployments.

Claims (12)

Claim	Category	Direction	Confidence	Outcome	Details
Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1MW per deployment by 2027.	Adoption Rate	positive	high	rack power density (MW per deployment)	approaching 1MW per deployment by 2027; 0.18
This poses a major challenge for datacenter power delivery designers.	Organizational Efficiency	negative	high	difficulty/challenge for datacenter power delivery design	0.03
As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned.	Organizational Efficiency	negative	high	power stranding (unused provisioned power)	0.18
Designs must remain efficient over long datacenter lifetimes and multiple hardware generations.	Organizational Efficiency	positive	high	design efficiency over time	0.03
Power utilization is particularly important as grid power capacity is a scarce resource in the AI era.	Fiscal And Macroeconomic	negative	high	grid power scarcity/importance of power utilization	0.09
Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix.	Organizational Efficiency	negative	high	difficulty/complexity of designing efficient power delivery hierarchies	0.09
These factors evolve over time, have inter-dependencies across multiple resource dimensions, and generally do not lend themselves to closed-form analysis.	Organizational Efficiency	null_result	high	tractability of closed-form analysis for power delivery design	0.09
We develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences.	Organizational Efficiency	positive	high	throughput, power utilization, cost metrics over deployment sequences	0.18
The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure.	Adoption Rate	positive	high	realism/grounding of projection models (use of Azure production data)	0.18
Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance.	Adoption Rate	negative	high	deployable capacity / effective capex / delivered performance (primary: deployable capacity)	0.18
Rising density from rack- and pod-scale AI systems shapes these outcomes (deployable capacity, capex, performance); we quantify how density changes these outcomes.	Adoption Rate	mixed	high	impact of rising rack/pod-scale density on deployable capacity, capex, performance	0.18
For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.	Adoption Rate	positive	high	planning objective (deployed capacity over time vs installed MW)	0.18