Methodology & Data Sources
Enterprise GPU decisions are high-stakes. This page documents exactly where every number on GPU Advisor comes from, how often it is updated, and where our data has known limitations.
- Weekly pricing updates
- 100% public sources
- 0 vendor-paid rankings
- Last full audit: Apr 2026
How We Calculate Each Metric
Performance Benchmarks
- All FP16/BF16 throughput figures use the decode phase of autoregressive LLM generation (output tokens per second), not prefill. Decode is the metric that dominates in production inference workloads.
- NVIDIA and AMD figures are sourced from MLPerf Inference or reproducible community vLLM benchmarks at batch size 1 (latency-bound) and batch size 16–32 (throughput-bound). We display the throughput-bound numbers; a sketch of what a reproducible run looks like follows this list.
- Google TPU figures use JAX/JetStream benchmarks and are NOT directly comparable to NVIDIA/AMD CUDA/ROCm numbers due to different framework overhead, memory allocation strategies, and programming models. This is clearly noted wherever TPU data appears.
- FP8 TFLOPS are peak theoretical values and assume full hardware utilization. Real-world FP8 inference depends on model support and kernel availability.
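For readers who want to reproduce the community numbers, the sketch below shows the shape of such a run: output (decode) tokens per second at batch size 16 via vLLM's Python API. The model ID and prompt are illustrative assumptions, not our benchmark configuration; published MLPerf results go through a much stricter harness.

```python
# Minimal sketch of a reproducible decode-throughput run with vLLM.
# The model ID, prompt, and batch size are illustrative, not our
# benchmark configuration.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # any HF model ID
params = SamplingParams(temperature=0.0, max_tokens=256)
prompts = ["Summarize the history of the GPU."] * 16  # batch size 16

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count only generated (decode) tokens, never prompt (prefill) tokens.
decode_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
# Wall time includes the short prefill, so this slightly understates
# pure decode throughput.
print(f"{decode_tokens / elapsed:.1f} output tok/s at batch 16")
```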
Cloud Pricing Collection
- All prices are public list prices in USD for the us-east-1 / us-central1 / East US regions unless otherwise noted. Prices in other regions vary by 5–25%.
- On-demand pricing is verified manually each week against each provider's official pricing page. We do not accept pricing data from providers directly.
- Spot pricing is inherently volatile and represents a recent average, not a guarantee. Actual spot prices can be 30–70% of the listed on-demand rate.
- CoreWeave and Lambda Labs prices are per-GPU. AWS, GCP, and Azure prices are per-instance (8-GPU node) divided by 8 for per-GPU comparisons; see the sketch after this list.
- We do not receive commissions or placement fees that influence pricing data. Affiliate links (clearly disclosed) appear only in 'Deploy' buttons.
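As an illustration of the per-GPU normalization and the automated diff alerts used in our weekly checks, here is a hypothetical sketch. The class and function names, SKUs, and all dollar figures are invented for the example:

```python
# Hypothetical sketch of the weekly price-diff check: normalize
# per-instance list prices to per-GPU and flag week-over-week moves.
# Field names, SKUs, and all dollar figures are invented examples.
from dataclasses import dataclass


@dataclass
class Listing:
    provider: str
    sku: str
    hourly_usd: float  # public list price, USD per billed unit
    gpus: int          # GPUs per billed unit (1 for per-GPU providers)

    @property
    def per_gpu_usd(self) -> float:
        return self.hourly_usd / self.gpus


def diff_alerts(last_week: dict[str, float],
                this_week: list[Listing],
                threshold: float = 0.01) -> list[str]:
    """Return a message for every SKU whose per-GPU price moved >1%."""
    alerts = []
    for item in this_week:
        old = last_week.get(item.sku)
        if old and abs(item.per_gpu_usd - old) / old > threshold:
            alerts.append(f"{item.provider} {item.sku}: "
                          f"${old:.2f} -> ${item.per_gpu_usd:.2f}/GPU-hr")
    return alerts


# Example: an 8-GPU hyperscaler node vs. a per-GPU provider rate.
this_week = [Listing("AWS", "p5.48xlarge", 98.32, 8),
             Listing("CoreWeave", "H100-HGX", 4.25, 1)]
print(diff_alerts({"p5.48xlarge": 12.00, "H100-HGX": 4.25}, this_week))
```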
TCO & ROI Calculations
- Training cost estimates use the Chinchilla scaling law (optimal token count ≈ 20× parameter count) and 35% Model FLOPs Utilization (MFU), which is typical for production clusters without extreme optimization. A worked sketch of these formulas follows this list.
- Inference cost per token assumes 70% GPU utilization, single-GPU throughput at batch size 16, and 730 hours/month (a full calendar month). Real costs vary by batch size, model quantization, and actual utilization.
- ROI comparisons assume equivalent GPU generations (e.g., H100 vs. H100) across providers. Cross-generation comparisons (H100 vs. B200) account for throughput differences using benchmark-derived correction factors.
- On-premise TCO excludes power and cooling costs by default. Enable the 'Include OpEx' toggle in the TCO calculator to add an estimated $0.10/kWh power cost at a 1.4 PUE.
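The sketch below reproduces these three calculations in plain Python, using the standard ≈6·N·D approximation for training FLOPs alongside the Chinchilla token count. It is a simplified model, not the production calculator; the example inputs (a 70B-parameter model, ~990 dense FP16 TFLOPS, $2.50/GPU-hr, 700 W TDP) are assumptions for illustration:

```python
# Simplified sketch of the three TCO formulas above; not the production
# calculator. Example inputs (70B params, 990 dense FP16 TFLOPS,
# $2.50/GPU-hr, 700 W TDP) are assumptions for illustration.

CHINCHILLA_TOKENS_PER_PARAM = 20  # Chinchilla-optimal token count
MFU = 0.35                        # assumed Model FLOPs Utilization
HOURS_PER_MONTH = 730             # full calendar month


def training_cost_usd(params: float, peak_tflops: float,
                      gpu_hourly_usd: float) -> float:
    """Training cost via the standard ~6*N*D FLOPs approximation."""
    tokens = CHINCHILLA_TOKENS_PER_PARAM * params
    total_flops = 6 * params * tokens
    effective_flops_per_s = peak_tflops * 1e12 * MFU
    gpu_hours = total_flops / effective_flops_per_s / 3600
    return gpu_hours * gpu_hourly_usd


def inference_cost_per_1m_tokens(tokens_per_s: float, gpu_hourly_usd: float,
                                 utilization: float = 0.70) -> float:
    """Cost per 1M output tokens at batch-16 throughput, 70% utilization."""
    tokens_per_hour = tokens_per_s * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1e6


def monthly_opex_usd(tdp_watts: float, kwh_usd: float = 0.10,
                     pue: float = 1.4) -> float:
    """Optional on-prem power + cooling OpEx per GPU per month."""
    kwh = tdp_watts / 1000 * HOURS_PER_MONTH * pue
    return kwh * kwh_usd


print(f"training:  ${training_cost_usd(70e9, 990, 2.50):,.0f}")
print(f"inference: ${inference_cost_per_1m_tokens(1500, 2.50):.2f} per 1M tok")
print(f"opex:      ${monthly_opex_usd(700):,.0f}/GPU/month")
```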
What We Don't Cover
- Spot pricing SLAs: spot instances can be preempted without notice. We show historical averages only.
- Private / negotiated enterprise pricing: hyperscaler contract pricing for large deployments can be 20–50% below list price. Our numbers are list price.
- Multi-region latency, compliance, and data residency requirements: these are workload-specific and outside our scope.
- Fine-tuned model performance: all benchmarks use standard pretrained model weights. Fine-tuned or quantized models will show different throughput.
Update Schedule
| Data | Update cadence | Source / method |
|------|----------------|-----------------|
| Cloud GPU on-demand pricing | Weekly (every Monday) | Manual spot-check of provider pricing pages + automated diff alerts |
| Spot / preemptible pricing | Weekly | Provider APIs where available; manual otherwise |
| Reserved / committed pricing | Monthly | Major provider pricing pages; rarely changes mid-month |
| GPU performance benchmarks | On new MLPerf release | MLCommons publishes ~2× per year; incorporated within 2 weeks |
| Hardware specifications | On product announcement | Updated within 24–48 hrs of official manufacturer announcement |
| TCO model assumptions | Quarterly | Power costs, rack/colocation rates reviewed each quarter |
Primary Data Sources
GPU Performance Benchmarks
- MLPerf Inference: primary throughput source for H100, A100, L40S, MI300X
- MLPerf Training: training throughput for BERT, ResNet, GPT-3
- NVIDIA datasheets: FP8, FP16, INT8 TFLOPS specifications
- AMD datasheets: official MI300X, MI325X, MI355X specs
- Google Cloud TPU documentation: TPU v5e, v5p, v6e BF16 and INT8 figures
- Community vLLM benchmarks: LLM inference tok/s from reproducible runs
Cloud GPU Pricing
- AWS: p5, p4d, and g5 families (on-demand, spot, 1-yr/3-yr reserved)
- Google Cloud: a3-highgpu, a4-highgpu, and TPU v4/v5 families
- Azure: ND and NCads GPU families
- Lambda Labs: H100, A100, A10 on-demand rates
- CoreWeave: H100, H200, B200, MI300X per-GPU rates
- RunPod: Secure Cloud and Community Cloud GPU rates
Hardware Specifications
- NVIDIA: VRAM, TDP, NVLink bandwidth, architecture
- AMD: HBM3/3e capacity, xGMI interconnect specs
- Google: matrix multiply units, HBM capacity per chip
- NVIDIA and AMD architecture whitepapers: die-level details for Blackwell and CDNA4
Industry Research
- Supply chain, pricing trends, and TCO models
- Training compute scaling laws and hardware trends
- Real-world LLM API latency and throughput
Commercial Disclosure
GPU Advisor earns referral commissions through affiliate links on "Deploy" buttons and provider sign-up links. These commercial relationships do NOT influence our benchmark rankings, pricing data, or editorial analysis. Affiliate links are clearly marked. If you believe any data has been influenced by commercial interests, please contact us.