GPU Cloud Pricing in 2026: We Compared 7 Providers So You Don't Have To
An honest breakdown of GPU cloud pricing across AWS, Azure, GCP, Lambda Labs, CoreWeave, Together AI, and Crusoe — including the hidden costs that vendor pricing pages won't mention.
Every quarter, I update a spreadsheet that tracks GPU cloud pricing across every major provider. It started as a personal reference tool, but over the past two years it has become something I share with every client who asks "where should we rent GPUs?" The answer is never simple, because the listed per-GPU-hour price is just the beginning of the cost conversation.
This article is the public version of that spreadsheet, with context and analysis that raw numbers cannot convey.
The Raw Numbers: H100 SXM5 Pricing
| Provider | On-Demand ($/GPU/hr) | 1-Year Reserved | Spot/Preemptible | Egress ($/GB) |
|---|---|---|---|---|
| AWS (p5.48xlarge) | $12.29 | ~$8.50 | ~$4.90 | $0.09 |
| Azure (ND H100 v5) | $11.58 | ~$7.90 | ~$4.60 | $0.087 |
| GCP (a3-highgpu) | $11.28 | ~$7.60 | ~$3.80 | $0.12 |
| Lambda Labs | $2.49 | N/A | N/A | Free |
| CoreWeave | $4.76 | ~$3.80 | N/A | $0.05 |
| Together AI | $3.50 | N/A | N/A | Free |
| Crusoe | $2.80 | ~$2.20 | N/A | $0.02 |
The spread is dramatic. Lambda Labs charges $2.49/GPU/hr while AWS charges $12.29 — a 4.9x difference for access to the same silicon. But if the decision were simply "pick the cheapest number," this article would be one paragraph long. Let me explain what you are actually paying for at each price tier.
What Hyperscalers Sell That Specialists Don't
I have had many conversations with CTOs who see the Lambda Labs pricing and ask "why would anyone use AWS?" The answer becomes clear when you look beyond the compute cost.
Security and Compliance Certifications
AWS, Azure, and GCP maintain SOC 2 Type II, HIPAA, FedRAMP, PCI-DSS, and ISO 27001 certifications. If your organization handles health data, financial data, or government contracts, these certifications are not optional — they are legally mandated. GPU cloud specialists generally have SOC 2, but may lack the specialized certifications that regulated industries require.
I worked with a healthcare AI startup that wanted to use Lambda Labs for training. Their compliance team flagged that Lambda would not sign a HIPAA Business Associate Agreement (BAA), which meant any model trained on patient data — even de-identified data — needed to stay on AWS. The $2.49 vs $12.29 price difference was irrelevant because only one provider was legally viable.
Global Availability and Redundancy
AWS has GPU instances in 8+ regions worldwide. If your training data lives in eu-west-1 and you need to comply with GDPR data residency requirements, you need a provider with European GPU capacity. Lambda Labs currently operates exclusively in US data centers. CoreWeave has limited European presence. For organizations with global operations, hyperscaler reach matters.
Integrated Ecosystem
Training a model is not an isolated activity. You need data pipelines (pulling training data from S3, GCS, or Azure Blob), experiment tracking (integrated with SageMaker, Vertex AI, or Azure ML), model registries, serving infrastructure, and monitoring. On AWS, all of these services are in the same network, share the same IAM, and communicate without egress charges. On a standalone GPU provider, you need to build or buy each component separately and pay egress to move data between systems.
For a mature ML organization with existing hyperscaler infrastructure, the "cost" of switching to a cheaper GPU provider includes re-architecting data pipelines, setting up new security controls, managing a second set of credentials and access policies, and training the team on a new platform. These migration costs are real and can exceed the compute savings for the first 6-12 months.
What Specialists Sell That Hyperscalers Don't
Price-Performance, Obviously
A 70B model training run on 32 GPUs for 6 days costs approximately $11,500 on Lambda Labs versus $56,700 on AWS. That is $45,200 in savings on a single run. For a research team running 4-6 training runs per month during architecture search, the annual savings can exceed $1 million. That funds 5-6 additional ML engineers.
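The arithmetic behind those figures is easy to reproduce; here is a minimal sketch using the on-demand rates from the table above (the run size and duration come from the example, not from any provider quote):

```python
def run_cost(gpus: int, hours: float, rate_per_gpu_hr: float) -> float:
    """Total compute cost of a multi-GPU training run, billed per GPU-hour."""
    return gpus * hours * rate_per_gpu_hr

# 32 GPUs for 6 days (144 hours) at the on-demand rates from the table.
lambda_cost = run_cost(32, 6 * 24, 2.49)   # ≈ $11,474
aws_cost    = run_cost(32, 6 * 24, 12.29)  # ≈ $56,632

print(f"Lambda: ${lambda_cost:,.0f}  AWS: ${aws_cost:,.0f}  "
      f"savings: ${aws_cost - lambda_cost:,.0f}")
```

Swap in your own GPU count, duration, and negotiated rates; the gap scales linearly with run size, which is why the savings compound so quickly during architecture search.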
Simplicity
Lambda Labs gives you SSH access to a GPU server. There is no managed ML platform, no configuration wizard, no 47-page IAM policy to debug. You ssh in, run your training script, and pull your checkpoints when it is done. For experienced ML engineers, this simplicity is a feature, not a limitation. You bring your own tooling (Weights & Biases for tracking, your own checkpoint management) and avoid the lock-in that comes with hyperscaler ML platforms.
GPU Density and Specialization
CoreWeave's infrastructure is designed from the ground up for GPU workloads. Their GPU-to-storage networking is optimized for the large sequential reads that training data loading requires, and their Kubernetes-native orchestration is built for GPU scheduling. Hyperscalers have retrofitted GPU support onto general-purpose infrastructure, which sometimes results in suboptimal storage performance or networking topology for GPU-heavy workloads.
The Hidden Costs Nobody Talks About
Egress Fees
This is the sleeper cost that catches teams off-guard. AWS charges $0.09/GB for data leaving their network. If you are training a model on AWS but your data team works on GCP, or your inference stack runs on CoreWeave, every checkpoint and model artifact transfer incurs egress charges.
Concrete example: Training a 70B model with checkpoints saved every 1,000 steps for 50,000 steps = 50 checkpoints × 140 GB each = 7 TB of checkpoint data. If you download all of that from AWS: 7,000 GB × $0.09 = $630 just in egress. It is not catastrophic, but it adds up across dozens of training runs, especially when you factor in training data uploads, evaluation data transfers, and model deployment artifacts.
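To estimate this for your own checkpointing schedule, a quick sketch using the egress rates from the pricing table (checkpoint count and size mirror the illustrative example above):

```python
def egress_cost(checkpoints: int, gb_each: float, rate_per_gb: float) -> float:
    """Cost of pulling all checkpoints out of a provider's network."""
    return checkpoints * gb_each * rate_per_gb

# 50 checkpoints at 140 GB each, downloaded in full.
aws_egress    = egress_cost(50, 140, 0.09)  # ≈ $630 at AWS's $0.09/GB
crusoe_egress = egress_cost(50, 140, 0.02)  # ≈ $140 at Crusoe's $0.02/GB

print(f"AWS: ${aws_egress:,.0f}  Crusoe: ${crusoe_egress:,.0f}")
```

In practice most teams only pull a subset of checkpoints, so the realistic number sits somewhere below the full-download figure, but the per-provider ratio holds either way.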
Lambda Labs and Together AI include egress in their pricing. Crusoe charges $0.02/GB, less than a quarter of AWS's $0.09 rate. This difference is worth calculating for data-heavy workflows.
Storage Costs
GPU instances on hyperscalers charge separately for attached storage. A training run that needs 10TB of fast NVMe storage (for datasets and checkpoints) on AWS costs approximately $1,200/month for io2 Block Express volumes. This is on top of the compute cost. Lambda Labs and CoreWeave typically include generous local NVMe storage with GPU instances.
Idle Time
Cloud GPUs charge by the hour whether you are using them or not. If your training job runs for 22 hours and your team does not start the next job for 6 hours (because it is 2am and nobody is monitoring), you just paid for 6 hours of idle GPU time at full price. On 8 GPUs at AWS pricing, that is $590 wasted.
Sophisticated teams build automation to immediately terminate instances after training completes and restart them on demand. But this automation itself requires engineering time to build and maintain. The "true" cloud cost includes both the compute price and the operational overhead of managing it efficiently.
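The idle-time arithmetic from the example above is worth running against your own team's schedule; here is a sketch (the one-gap-per-working-day frequency is my assumption for illustration, not a measured figure):

```python
def idle_waste(gpus: int, idle_hours: float, rate_per_gpu_hr: float) -> float:
    """Cost of GPUs sitting idle between jobs, billed at full price."""
    return gpus * idle_hours * rate_per_gpu_hr

# 8 GPUs idle for 6 hours overnight at AWS's $12.29/GPU/hr.
per_gap = idle_waste(8, 6, 12.29)  # ≈ $590 per gap
monthly = per_gap * 20             # assuming one such gap per working day

print(f"per gap: ${per_gap:,.0f}  per month: ${monthly:,.0f}")
```

At roughly $12k/month of waste under these assumptions, even a week of engineering time spent on auto-termination pays for itself quickly.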
Our Provider Recommendations by Use Case
Research and experimentation: Lambda Labs or Together AI. Lowest cost, minimal setup overhead, no commitment required. Best for teams that need GPUs for hours or days at a time with gaps between runs.
Production training (unregulated): CoreWeave with reserved pricing. Good balance of cost, reliability, and Kubernetes-native tooling. Reserved instances at $3.80/GPU/hr are competitive while offering better uptime guarantees than purely on-demand providers.
Production training (regulated): AWS or Azure with reserved instances. The compliance certifications and enterprise support justify the premium when regulatory requirements constrain your options.
Inference serving: CoreWeave or Crusoe for cost-optimized serving, AWS/GCP for global low-latency serving. Inference workloads benefit from geographic distribution (serving users from nearby regions), which currently favors hyperscalers.
Hybrid approach (our most common recommendation): Use a hyperscaler for data storage, experiment tracking, and inference serving. Use a GPU specialist for training. Move data in, train, move the model out. The training cost savings typically dwarf the egress costs, and you keep your operational infrastructure on a platform your team already knows.
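To sanity-check the claim that training savings dwarf egress costs, a back-of-envelope comparison using the earlier example's figures (32 GPUs, 6 days, 7 TB of artifacts moved back out; rates from the table, with Crusoe standing in as the specialist since it charges for egress):

```python
gpus, hours = 32, 6 * 24   # the 70B training run from earlier
artifact_gb = 7_000        # checkpoints and final model moved out (illustrative)

aws_only = gpus * hours * 12.29                      # train where the data lives
hybrid   = gpus * hours * 2.80 + artifact_gb * 0.02  # specialist compute + egress

print(f"AWS-only: ${aws_only:,.0f}  hybrid: ${hybrid:,.0f}")
```

Note the hybrid path also pays AWS egress to move the training set out in the first place; even adding, say, 10 TB of training data at $0.09/GB (about $900) does not change the conclusion.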
Explore real-time GPU cloud pricing on our Cloud Pricing Dashboard.
Try Our GPU Tools
Compare GPUs, calculate TCO, and get AI-powered recommendations.