GPU Power & Cooling Planning for Enterprise Data Centers in 2026
How to plan power delivery and cooling for modern GPU clusters. B300 Ultra (1000W), MI355X (1400W), and H100 (700W) power requirements, cooling options, and facility upgrade costs.
The most common reason GPU deployment projects fail is not budget or technical complexity; it is underestimating the facility requirements. Modern high-performance GPUs draw between 700W and 1,400W each. Deploying 64 of them means removing 45–90kW of heat from the GPUs alone, before server overhead, in a space no larger than a standard server room. At the upper end of that range, conventional air cooling cannot keep up.
This guide is for data center managers, facilities teams, and infrastructure architects planning GPU cluster deployments in 2026. We will cover power delivery, cooling options, costs, and practical recommendations based on real deployments.
Power Requirements by GPU
| GPU | TDP per GPU (W) | 8-GPU Node, GPUs Only (W) | 8-GPU Node incl. Overhead |
|---|---|---|---|
| NVIDIA H100 SXM5 | 700 | 5,600 | ~7.5 kW |
| NVIDIA H200 SXM | 700 | 5,600 | ~7.5 kW |
| AMD MI300X | 750 | 6,000 | ~8.0 kW |
| NVIDIA B200 | 1,000 | 8,000 | ~10.5 kW |
| NVIDIA B300 Ultra | 1,000 | 8,000 | ~10.5 kW |
| AMD MI355X | 1,400 | 11,200 | ~14.5 kW |
The "overhead" covers server chassis, CPU, memory, storage, and networking, typically 30–40% on top of GPU TDP; the table figures use roughly 30–34%. A single 8-GPU MI355X node draws about as much power as 20 standard 1U servers at roughly 725W each. Plan accordingly.
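The node-level arithmetic can be sketched in a few lines of Python. This is a minimal estimate, not a vendor sizing tool: the TDP values come from the table above, while the function name `node_power_kw` and the default 35% overhead (the midpoint of the 30–40% range) are assumptions chosen for illustration.

```python
# GPU TDP values in watts, taken from the table above.
GPU_TDP_W = {
    "H100 SXM5": 700,
    "H200 SXM": 700,
    "MI300X": 750,
    "B200": 1000,
    "B300 Ultra": 1000,
    "MI355X": 1400,
}


def node_power_kw(gpu: str, gpus_per_node: int = 8, overhead: float = 0.35) -> float:
    """Estimate node power in kW as GPU power plus a fractional overhead.

    `overhead` is an assumed fraction (0.35 = 35%) covering chassis, CPU,
    memory, storage, and networking.
    """
    gpu_watts = GPU_TDP_W[gpu] * gpus_per_node
    return gpu_watts * (1 + overhead) / 1000


if __name__ == "__main__":
    for name in GPU_TDP_W:
        low = node_power_kw(name, overhead=0.30)
        high = node_power_kw(name, overhead=0.40)
        print(f"{name}: {low:.1f}-{high:.1f} kW per 8-GPU node")
```

Printing a low and high bound rather than a single number makes the uncertainty in the overhead figure visible; measured draw from a vendor power calculator should replace these estimates before ordering circuits.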
Cooling Options and When to Use Each
Air Cooling (Traditional CRAC/CRAH)
Air cooling works for deployments up to approximately 15–20kW per rack in well-designed facilities. This means you can air-cool a rack of H100 or H200 nodes (two 8-GPU nodes per rack at ~7.5kW each = 15kW/rack) in a modern data center with hot aisle/cold aisle containment.
For B200, B300 Ultra, or MI355X, air cooling a full rack becomes problematic. You either limit to one 8-GPU node per rack (wasteful) or accept increased cooling costs and potential thermal throttling during warm ambient conditions.
Use air cooling when: Deploying H100/H200 nodes in an existing facility, rack density ≤ 20kW, or when liquid cooling infrastructure investment is not justified by deployment size.
Rear-Door Heat Exchangers (RDHx)
RDHx units mount to the back of a rack and use chilled water to capture heat before it enters the room. They can handle 20–40kW per rack without requiring floor modifications — just chilled water supply to each rack. Cost: $5,000–$15,000 per rack for the RDHx unit, plus chilled water piping.
RDHx is the lowest-friction upgrade path for existing air-cooled data centers that need to add GPU density. It does not require changes to the server hardware itself and works with any GPU.
Use RDHx when: Upgrading an existing air-cooled facility for higher-density GPU nodes, 20–40kW per rack targets, budget-constrained facilities that cannot do a full liquid cooling build-out.
Direct Liquid Cooling (DLC)
Direct liquid cooling circulates coolant through cold plates mounted directly on the GPUs and other hot components. Coolant reaches the cold plates through manifolds inside the server chassis and is supplied by CDUs (Coolant Distribution Units), located either in the rack or at facility level. DLC handles 40–100kW+ per rack and is now standard for B200, B300 Ultra, and MI355X deployments.
NVIDIA and AMD now provide DLC-capable reference server designs for their flagship GPUs. NVIDIA's HGX B200/B300 modules support DLC via integrated coolant manifolds. AMD's MI355X OAM modules similarly support direct liquid cooling.
Facility cost for DLC: $30,000–$80,000 per rack for CDU, piping, and installation. For a 20-rack deployment, that is $600K–$1.6M in cooling infrastructure on top of GPU hardware costs, and it grows with every additional rack. Budget for this explicitly; it is almost always underestimated.
Use DLC when: Deploying B200, B300 Ultra, or MI355X at any significant scale. Targeting rack densities above 40kW. Building a new GPU-optimized facility.
Immersion Cooling
Full immersion (servers submerged in dielectric fluid) handles effectively unlimited heat density and can achieve PUE below 1.03 — nearly all input power goes to compute. Infrastructure cost is $50,000–$150,000+ per tank, and server hardware must be compatible (standard servers can be immersed but require stripping plastics and fans).
Immersion is appropriate for hyperscale deployments (1,000+ GPUs) where the PUE savings and density benefits justify the infrastructure investment. For deployments under 500 GPUs, DLC with standard CDUs provides better economics.
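The selection guidance in this section can be condensed into a rough decision helper. The thresholds (20kW, 40kW, 1,000 GPUs) are the ones quoted above; the function name `suggest_cooling` and its return strings are hypothetical, and a real decision would also weigh existing infrastructure and budget.

```python
def suggest_cooling(rack_kw: float, total_gpus: int = 0) -> str:
    """Map a target rack density (kW) to the cooling option suggested in the text.

    Thresholds follow the article: air up to ~20 kW/rack, RDHx for 20-40 kW,
    DLC above 40 kW, with immersion considered only at 1,000+ GPUs.
    """
    if rack_kw <= 20:
        return "air"
    if rack_kw <= 40:
        return "RDHx"
    if total_gpus >= 1000:
        return "DLC or immersion"
    return "DLC"


if __name__ == "__main__":
    # Two H100 nodes per rack at ~7.5 kW each.
    print(suggest_cooling(15.0))
    # Four B200 nodes per rack at ~10.5 kW each.
    print(suggest_cooling(42.0, total_gpus=128))
```

Treat the output as a starting point for quotes, not a final answer: a rack at 19kW in a warm climate may still need RDHx, for example.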
Power Delivery Infrastructure
GPU clusters require three-phase power at high amperage. Planning notes:
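- PDU sizing: A 32A three-phase PDU at 208V provides ~11.5kW at full rating (√3 × 208V × 32A), and less once continuous-load derating is applied. A single MI355X node at ~14.5kW draws about 40A at 208V three-phase, so one 32A feed is not enough: plan for a 50–60A circuit or split the node across multiple feeds. At 400–415V three-phase, a 32A feed provides roughly 22–23kW and covers the node with headroom.
- UPS sizing: Size your UPS for GPU TDP + 30% overhead + any network infrastructure. A 64-GPU H100 cluster is eight nodes at ~7.5kW each, or about 60kW of UPS capacity before networking and headroom.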
- Generator backup: For production AI infrastructure, generator backup is not optional. Size for full cluster load. Diesel generator lead times in 2025–2026 are 16–20 weeks — order early.
- Power factor correction: Modern GPU servers have power factors of 0.95–0.99, but verify with your facility team. Low power factor inflates apparent power (kVA) requirements and can overload circuits.
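Circuit sizing follows from the standard three-phase relation P = √3 × V × I × PF, where V is the line-to-line voltage. The sketch below applies it in both directions; the function names and the optional `derate` parameter (for example 0.8 where local code requires continuous loads to stay at 80% of breaker rating) are assumptions for illustration, so confirm the applicable rules with your electrician.

```python
import math


def three_phase_kw(volts_ll: float, amps: float,
                   power_factor: float = 1.0, derate: float = 1.0) -> float:
    """Usable power (kW) of a three-phase feed at line-to-line voltage `volts_ll`."""
    return math.sqrt(3) * volts_ll * amps * power_factor * derate / 1000


def required_amps(load_kw: float, volts_ll: float,
                  power_factor: float = 1.0, derate: float = 1.0) -> float:
    """Breaker rating (A) needed to carry `load_kw` on a three-phase feed."""
    return load_kw * 1000 / (math.sqrt(3) * volts_ll * power_factor * derate)


if __name__ == "__main__":
    print(f"32A @ 208V: {three_phase_kw(208, 32):.1f} kW")
    print(f"14.5 kW @ 208V: {required_amps(14.5, 208):.1f} A")
    print(f"14.5 kW @ 208V, 80% derate: {required_amps(14.5, 208, derate=0.8):.1f} A")
    print(f"32A @ 415V: {three_phase_kw(415, 32):.1f} kW")
```

Running the numbers this way shows why voltage matters: the same 32A feed delivers about twice the power at 415V as at 208V.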
PUE Targets and Energy Cost Modeling
Power Usage Effectiveness (PUE) is total facility power divided by IT load power; a PUE of 1.0 would mean no power is spent on cooling or other overhead. Target PUE by cooling technology:
- Air cooling (modern facility): 1.3–1.5 PUE
- Air cooling with hot aisle containment: 1.2–1.35 PUE
- RDHx: 1.15–1.25 PUE
- DLC: 1.05–1.15 PUE
- Immersion: 1.02–1.05 PUE
Energy cost example for a 64× H100 SXM5 cluster over 3 years at $0.10/kWh (GPU TDP only, excluding server overhead):
- IT load: 64 × 700W × 8,760 hrs/yr × 3 yrs ≈ 1,177 MWh ≈ $117,700
- With 1.3 PUE: total facility energy cost ≈ $153,100
- With 1.1 PUE (DLC): total facility energy cost ≈ $129,500, saving about $23,500 over 3 years
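The same model is easy to rerun for other cluster sizes, tariffs, or PUE targets. The sketch below assumes constant full-load operation, as the example does; the function name `energy_cost_usd` is hypothetical, and small differences from the rounded figures in the text come from rounding.

```python
def energy_cost_usd(it_load_kw: float, pue: float, years: float = 3,
                    usd_per_kwh: float = 0.10, hours_per_year: int = 8760) -> float:
    """Total facility energy cost, assuming the IT load runs flat out all year."""
    facility_kwh = it_load_kw * pue * hours_per_year * years
    return facility_kwh * usd_per_kwh


if __name__ == "__main__":
    it_load_kw = 64 * 700 / 1000  # 64 GPUs at 700 W TDP, GPUs only
    air = energy_cost_usd(it_load_kw, pue=1.3)
    dlc = energy_cost_usd(it_load_kw, pue=1.1)
    print(f"PUE 1.3: ${air:,.0f}")
    print(f"PUE 1.1: ${dlc:,.0f}")
    print(f"Savings: ${air - dlc:,.0f}")
```

Substituting the full node load (about 60kW for eight H100 nodes) instead of GPU TDP alone raises every figure by roughly a third, which is the more realistic input for budgeting.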
At larger scales (256+ GPUs), DLC's energy savings pay back the infrastructure investment within 18–30 months.
Practical Checklist Before Ordering GPUs
- Confirm available power capacity per rack and total facility in kW
- Measure or calculate current PUE and headroom in cooling capacity
- Confirm three-phase power availability at required amperage
- Get quotes for cooling upgrades (RDHx or DLC) before finalizing GPU model choice
- Verify that your network switches support the bandwidth required (400G InfiniBand for B300/MI355X)
- Add 20% power headroom to all capacity calculations for thermal transients and future expansion
- Plan for liquid cooling drain and fill procedures in your datacenter runbooks
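The 20% headroom item from the checklist can be applied consistently with a small helper. This is a sketch under stated assumptions: `required_capacity_kw` is a hypothetical name, and the 3kW network figure in the usage example is a placeholder, not a measured value.

```python
def required_capacity_kw(nodes: int, node_kw: float,
                         network_kw: float = 0.0, headroom: float = 0.20) -> float:
    """Capacity to provision: node load plus network load, scaled by headroom."""
    base_kw = nodes * node_kw + network_kw
    return base_kw * (1 + headroom)


if __name__ == "__main__":
    # Eight H100 nodes at ~7.5 kW each, no network load included.
    print(f"{required_capacity_kw(8, 7.5):.1f} kW")
    # Same cluster with a placeholder 3 kW for switches (assumed value).
    print(f"{required_capacity_kw(8, 7.5, network_kw=3.0):.1f} kW")
```

Using one function for power, UPS, and cooling capacity keeps the headroom assumption identical across all three calculations.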
The GPU hardware is often the most straightforward part of an enterprise GPU deployment. The facility work — power, cooling, networking — determines whether the deployment succeeds on schedule and within budget. Start the facility assessment before you order the GPUs.