Cost Analysis · 2026-03-05 · 16 min read

How to Calculate GPU Total Cost of Ownership Without Fooling Yourself

Most GPU TCO estimates are wrong because they leave out half the costs. Here is a framework from someone who has reviewed dozens of GPU procurement proposals.

I review GPU procurement proposals for a living, and I can tell you that roughly 7 out of 10 TCO analyses I see are materially wrong. Not wrong in the sense of using the wrong GPU price — that number is easy to get right. Wrong in the sense of omitting entire cost categories that end up representing 30-40% of the actual three-year spend.

The consequences are predictable: the project goes over budget, the CFO loses trust in the infrastructure team's estimates, and the next GPU procurement request faces additional scrutiny and delay. It is a cycle I have watched play out at organizations of every size.

This guide is my attempt to break that cycle by documenting every cost component that should be in your TCO model, with real numbers from deployments I have reviewed over the past eighteen months.

Category 1: Hardware — What Most People Get (Mostly) Right

GPU Units

Current street pricing as of March 2026, based on actual procurement quotes (not MSRP):

| GPU | MSRP | Actual Street Price | Lead Time |
|---|---|---|---|
| NVIDIA B300 Ultra | $40,000 | $42,000-$45,000 | 16-20 weeks |
| NVIDIA B200 | $30,000 | $31,000-$33,000 | 8-12 weeks |
| NVIDIA H200 SXM | $25,000 | $24,000-$26,000 | 4-6 weeks |
| NVIDIA H100 SXM5 | $25,000 | $20,000-$22,000 | 2-4 weeks |
| NVIDIA A100 80GB | $15,000 | $8,000-$10,000 (refurb) | Immediate |
| AMD MI325X | $22,000 | $20,000-$22,000 | 4-6 weeks |
| AMD MI300X | $15,000 | $13,000-$15,000 | 2-4 weeks |

Notice that street prices diverge from MSRP, especially for high-demand parts (B300 Ultra trades above MSRP due to constrained supply) and previous-gen parts (H100 now trades well below original MSRP as H200 takes over). Always use actual quotes from your procurement channel, not list prices from press releases.

Server Nodes

GPUs do not exist in isolation. They live in server chassis that include CPUs, system RAM, NVMe storage, power supplies, and management controllers. A typical 8-GPU DGX-style node adds $40,000-$80,000 on top of the GPU cost, depending on the CPU configuration and storage options.

Common mistake: quoting only the GPU cost to leadership. I have seen proposals that list "$2.56M for 64 B300 Ultra GPUs" but fail to mention the additional $480,000-$640,000 for the server nodes that house them. When the full purchase order comes through 25% over the initial estimate, credibility suffers.

Spare Parts and Warranty

Over a 3-year deployment, expect 3-5% of GPUs to fail (ECC errors, thermal degradation, manufacturing defects). For a 64-GPU cluster, that is 2-3 replacement GPUs. Budget $60,000-$120,000 for spares and warranty extensions. Some organizations keep cold spares on-site; others rely on next-business-day warranty replacements from the vendor. The choice depends on your uptime requirements — if every hour of downtime costs $10,000+ in lost productivity, on-site spares pay for themselves quickly.
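The spares sizing above is simple expected-value math. Here is a minimal sketch in Python; the 3-5% failure rate and 64-GPU count are the figures from this section, and everything else is illustrative:

```python
# Expected GPU failures over a deployment, from the 3-5% three-year rate above.
def expected_failures(num_gpus: int, three_year_failure_rate: float) -> float:
    """Expected number of GPU failures over the deployment lifetime."""
    return num_gpus * three_year_failure_rate

low = expected_failures(64, 0.03)   # ~1.9 -> keep roughly 2 spares
high = expected_failures(64, 0.05)  # ~3.2 -> keep roughly 3 spares
```

Whether those spares sit on your shelf or in the vendor's warehouse is the uptime tradeoff described above.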

Category 2: Networking — The Consistently Underestimated Line Item

I cannot overstate how often networking costs are underestimated or omitted entirely from GPU TCO proposals. Multi-GPU training requires high-bandwidth, low-latency interconnects between GPUs, and enterprise networking hardware is expensive.

InfiniBand Fabric

For clusters of 16+ GPUs doing distributed training, InfiniBand is the standard. Here is what a 64-GPU cluster typically requires:

| Component | Quantity | Unit Cost | Total |
|---|---|---|---|
| InfiniBand NDR switch (400Gbps, 32-port) | 4 | $25,000 | $100,000 |
| InfiniBand NDR cables (2m) | 72 | $400 | $28,800 |
| Top-of-rack Ethernet switch (mgmt network) | 4 | $8,000 | $32,000 |
| Fiber patch panels and cabling | | | $15,000 |
| Network configuration and testing | | | $10,000 |
| **Total Networking** | | | **$185,800** |

That is nearly $186,000 for networking alone — a figure that represents 7-10% of total hardware cost and is routinely omitted from first-draft TCO proposals. If you downgrade to InfiniBand HDR (200Gbps), switch costs drop by roughly 40%, but you sacrifice 15-20% of multi-node training throughput. Whether that tradeoff is worthwhile depends on your workload's sensitivity to communication bandwidth.
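As a sanity check, the BOM above can be summed in a few lines of Python. The prices are the table's figures; the 40% HDR discount applied to the switch line is the rough estimate from the text, not a quote:

```python
# Summing the 64-GPU InfiniBand NDR bill of materials from the table above.
ndr_bom = [
    ("NDR switch (32-port)", 4, 25_000),
    ("NDR cable (2m)", 72, 400),
    ("ToR Ethernet switch", 4, 8_000),
    ("Patch panels + cabling", 1, 15_000),
    ("Config + testing", 1, 10_000),
]
total = sum(qty * unit for _, qty, unit in ndr_bom)  # $185,800

# Rough HDR-downgrade option: ~40% off the switch line, per the text.
hdr_switch_savings = 4 * 25_000 * 0.40  # ~$40,000, at a 15-20% throughput cost
```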

Category 3: Power and Cooling — It Adds Up Faster Than You Expect

Power cost calculations are deceptively simple in theory and frequently wrong in practice. Here is how to do them properly.

Step 1: Total IT Load

Start with GPU TDP, but do not stop there. The server node has CPUs, RAM, fans, storage, and power conversion overhead. A good rule of thumb: total node power = GPU TDP × number of GPUs × 1.3 (for system overhead).

For an 8x H100 node: 700W × 8 × 1.3 = 7,280W
For an 8x B300 Ultra node: 1000W × 8 × 1.3 = 10,400W

Add networking equipment: typically 500-1,000W per rack for switches.

Step 2: Apply PUE

Power Usage Effectiveness (PUE) is the ratio of total facility power to IT power; it captures the overhead the facility spends on cooling, lighting, and power distribution. Modern purpose-built data centers achieve PUE 1.1-1.2 with liquid cooling. Older air-cooled facilities or repurposed office space can run PUE 1.5-1.8.

Facility power = IT load × PUE

For 8x B300 Ultra nodes (64 GPUs) at PUE 1.3:
IT load = (10,400W × 8 nodes) + (800W networking) = 84,000W
Facility power = 84,000 × 1.3 = 109,200W = 109.2kW

Step 3: Annual Cost

Annual energy = 109.2kW × 8,760 hours = 956,592 kWh
At $0.09/kWh (Virginia): $86,093/year
At $0.12/kWh (Texas): $114,791/year
At $0.18/kWh (California): $172,186/year

Over 3 years, power alone costs $258,000-$517,000 depending on location. This is a material portion of TCO that swings significantly based on your electricity rate and PUE. If you are comparing locations for a new deployment, electricity cost should be a primary factor.
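The three steps can be sketched as a short Python calculation. The constants below (1,000W B300 Ultra TDP, 1.3 system-overhead factor, 1.3 PUE, 800W of networking, the per-kWh rates) are this section's example figures; substitute your own site's numbers:

```python
# Step 1: total node power = GPU TDP x GPU count x system-overhead factor.
def node_power_watts(gpu_tdp_w: float, gpus_per_node: int, overhead: float = 1.3) -> float:
    """Total node draw including CPUs, RAM, fans, and power conversion."""
    return gpu_tdp_w * gpus_per_node * overhead

# Steps 2-3: apply PUE, then price the annual energy at 8,760 hours/year.
def annual_power_cost(it_load_w: float, pue: float, rate_per_kwh: float) -> float:
    """Facility power = IT load x PUE; returns annual dollar cost."""
    facility_kw = it_load_w * pue / 1000
    return facility_kw * 8760 * rate_per_kwh

b300_node = node_power_watts(1000, 8)          # ~10,400 W per 8x B300 Ultra node
it_load = b300_node * 8 + 800                  # 8 nodes + networking ~= 84,000 W
print(annual_power_cost(it_load, 1.3, 0.09))   # ~$86,093/year (Virginia)
print(annual_power_cost(it_load, 1.3, 0.18))   # ~$172,186/year (California)
```

The same two functions reproduce the H100 node figure: `node_power_watts(700, 8)` gives roughly 7,280W.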

Liquid Cooling Infrastructure

B300 Ultra and MI355X at their respective power levels effectively require liquid cooling. If your facility does not already have liquid cooling infrastructure, budget for installation:

  • Rear-door heat exchangers: $8,000-$12,000 per rack
  • In-row cooling units: $15,000-$25,000 each
  • Direct-to-chip liquid cooling (CDU + manifolds): $20,000-$35,000 per rack
  • Chilled water loop modifications: $30,000-$100,000 depending on facility

For a 64-GPU B300 Ultra deployment requiring 8 racks of liquid cooling, expect $160,000-$280,000 in cooling infrastructure. This is a one-time cost that amortizes over the 3-year deployment, but it must be in the budget.

Category 4: Data Center Space

High-density GPU racks consume more power per rack unit than traditional servers, which means you need racks rated for 30-50kW instead of the standard 5-10kW. Not all colocation facilities can accommodate this, and those that can charge accordingly.

Monthly colocation rates for GPU-density racks:

| Market | Standard Rack (10kW) | High-Density Rack (40kW) |
|---|---|---|
| Northern Virginia (Ashburn) | $800-$1,200/mo | $2,000-$3,500/mo |
| Dallas / Phoenix | $600-$1,000/mo | $1,500-$2,500/mo |
| Bay Area | $1,200-$2,000/mo | $3,000-$5,000/mo |
| Chicago | $700-$1,100/mo | $1,800-$3,000/mo |

A 64-GPU deployment typically requires 6-10 racks including networking, storage, and management infrastructure. Annual colocation cost ranges from $108,000 (Dallas, tight packing) to $420,000 (Bay Area, generous spacing). The location decision alone can swing TCO by $900,000 over three years.
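The colocation arithmetic above is worth writing down explicitly; this sketch uses the low-end Dallas and high-end Bay Area figures from the table:

```python
# Annual colocation cost: racks x monthly rate x 12 months.
def annual_colo_cost(racks: int, monthly_rate_per_rack: float) -> float:
    return racks * monthly_rate_per_rack * 12

dallas = annual_colo_cost(6, 1_500)   # tight packing, low-end rate: $108,000/yr
bay = annual_colo_cost(10, 3_500)     # generous spacing, high-end rate: $420,000/yr
swing_3yr = (bay - dallas) * 3        # ~$936,000 over three years
```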

Category 5: Personnel — The Elephant in the Room

This is the cost component that gets omitted from TCO models most frequently, and it is often the single largest ongoing expense after hardware depreciation.

Running a GPU cluster requires human beings who understand GPU hardware, Linux system administration, InfiniBand networking, job scheduling (Slurm/Kubernetes), and ML frameworks. Here is what a realistic staffing model looks like:

| Role | FTE Allocation | Loaded Annual Cost |
|---|---|---|
| GPU Systems Administrator | 0.5 - 1.0 | $100,000 - $180,000 |
| MLOps / DevOps Engineer | 0.5 - 1.0 | $120,000 - $200,000 |
| Network Engineer | 0.1 - 0.3 | $15,000 - $50,000 |
| Vendor Management / Procurement | 0.1 - 0.2 | $12,000 - $30,000 |
| **Total Annual Personnel** | **1.2 - 2.5 FTE** | **$247,000 - $460,000** |

Over 3 years, personnel costs add $741,000 - $1,380,000 to your TCO. For a 64-GPU cluster with a hardware cost of $2.5-$3.5M, personnel represents 20-40% of the total spend. Omitting this from your TCO model is like buying a car and forgetting to budget for fuel, insurance, and maintenance.
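Pulling the five categories together, here is a minimal 3-year TCO aggregator. The figures passed in are illustrative midpoints drawn from this guide, not quotes; the structure (one-time capex plus recurring opex times years) is the point:

```python
# Minimal 3-year TCO model covering the five categories in this guide.
def three_year_tco(hardware: float, networking: float, cooling_capex: float,
                   annual_power: float, annual_colo: float,
                   annual_personnel: float, years: int = 3) -> float:
    """One-time capex plus recurring opex over the deployment horizon."""
    one_time = hardware + networking + cooling_capex
    recurring = (annual_power + annual_colo + annual_personnel) * years
    return one_time + recurring

total = three_year_tco(
    hardware=3_000_000,       # GPUs + server nodes + spares (Category 1)
    networking=186_000,       # InfiniBand fabric (Category 2)
    cooling_capex=220_000,    # liquid cooling buildout (Category 3)
    annual_power=115_000,     # Texas-rate power (Category 3)
    annual_colo=200_000,      # colocation (Category 4)
    annual_personnel=350_000, # staffing (Category 5)
)
```

With these midpoints, the recurring categories add nearly $2M to a ~$3.4M capex outlay, which is exactly the 30-40% gap the introduction warns about.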

Cloud deployments reduce but do not eliminate personnel costs. You still need MLOps engineers to manage training pipelines, optimize instance usage, and handle failures. The savings amount to approximately 0.5-1.0 FTE, mainly the systems administration and network engineering roles that the cloud provider handles.

The Breakeven Framework

After modeling dozens of on-premise vs cloud comparisons, here is the framework I use:

Below 30% GPU utilization: Cloud wins decisively. You are paying to power and cool GPUs that sit idle most of the time. Use on-demand instances and only pay for what you use. This is the right model for research teams with bursty workloads.

30-50% utilization: A hybrid approach typically wins. Reserve cloud instances for your baseline load (whatever you know you will need every day), and burst to on-demand for peaks. This gives cost predictability without the capital commitment of ownership.

50-70% utilization: The gray zone. On-premise starts to compete with cloud on a 2-3 year horizon, but the upfront capital and operational complexity may not be worth the modest savings. Run the full TCO model with every cost category listed above. If the savings are less than 15-20%, the operational flexibility of cloud probably outweighs the cost difference.

Above 70% utilization: On-premise wins within 12-18 months even with all hidden costs included. At 70%+ utilization, you are effectively renting a GPU 24/7 at cloud pricing, which is 3-5x more expensive than ownership. The math is unambiguous at this level — but be honest about whether you can actually sustain 70% utilization. Measure it before you assume it.
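The framework reduces to a simple threshold function. The cut points are the utilization bands described above; treat the output as a starting recommendation, not a substitute for the full model:

```python
# Map measured GPU utilization (0-1) to this section's recommendation bands.
def deployment_recommendation(utilization: float) -> str:
    if utilization < 0.30:
        return "cloud on-demand"
    if utilization < 0.50:
        return "hybrid: reserved baseline + on-demand burst"
    if utilization < 0.70:
        return "gray zone: run full TCO model"
    return "on-premise"
```

Feed this function a *measured* utilization number, for the reasons below.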

The Most Common Mistake

The single most common TCO mistake I encounter: organizations assume 80-90% GPU utilization to justify an on-premise purchase, then run at 35-45% in practice. The gap between assumed utilization and actual utilization can flip the entire cloud-vs-own decision.

Before committing to a multi-million dollar on-premise deployment, spend 3-6 months tracking your actual GPU usage on cloud instances. Log the hours. Calculate the utilization percentage. Use that real number — not an aspirational one — in your TCO model.
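The utilization calculation itself is one line: busy GPU-hours divided by available GPU-hours. The 58,000-hour figure in the example below is hypothetical, chosen only to illustrate the computation:

```python
# Utilization = busy GPU-hours / (GPU count x wall-clock hours tracked).
def measured_utilization(busy_gpu_hours: float, num_gpus: int, wall_hours: float) -> float:
    """Fraction of available GPU-hours actually consumed."""
    return busy_gpu_hours / (num_gpus * wall_hours)

# Hypothetical example: 64 GPUs tracked for 90 days (2,160 hours),
# with 58,000 busy GPU-hours logged by the scheduler.
u = measured_utilization(58_000, 64, 2_160)  # ~0.42 -> model 40-45%, not 80%
```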

If you do not have historical usage data, be conservative. Assume 40-50% utilization for your first year. If you end up higher, you can always add capacity. If you assumed 80% and land at 40%, you have spent millions on hardware that would have been cheaper to rent.

Run your own numbers with our GPU TCO Calculator — it accounts for all the cost categories described in this guide and lets you adjust utilization, power rate, and provider pricing to match your specific scenario.

TCO · cost analysis · infrastructure planning · cloud vs on-premise · GPU procurement
