Methodology & Data Sources
Enterprise GPU decisions are high-stakes. This page documents exactly where every number on GPU Advisor comes from, how often it is updated, and where our data has known limitations.
- Weekly pricing updates
- 100% public sources
- 0 vendor-paid rankings
- Last full audit: Apr 2026
How We Calculate Each Metric
Performance Benchmarks
- All FP16/BF16 throughput figures use the decode phase of autoregressive LLM generation (output tokens per second), not prefill. Decode is the metric that dominates in production inference workloads.
- NVIDIA and AMD figures are sourced from MLPerf Inference or reproducible community vLLM benchmarks at batch size 1 (latency-bound) and batch size 16–32 (throughput-bound). We display the throughput-bound numbers; a sketch of what a reproducible run looks like follows this list.
- Google TPU figures use JAX/JetStream benchmarks and are NOT directly comparable to NVIDIA/AMD CUDA/ROCm numbers due to different framework overhead, memory allocation strategies, and programming models. This is clearly noted wherever TPU data appears.
- FP8 TFLOPS are peak theoretical values and assume full hardware utilization. Real-world FP8 inference depends on model support and kernel availability.
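For readers who want to reproduce the community numbers, the sketch below shows the shape of such a run: output (decode) tokens per second at batch size 16 via vLLM's Python API. The model ID and prompt are illustrative assumptions, not our benchmark configuration; published MLPerf results go through a much stricter harness.

```python
# Minimal sketch of a reproducible decode-throughput run with vLLM.
# The model ID, prompt, and batch size are illustrative, not our
# benchmark configuration.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # any HF model ID
params = SamplingParams(temperature=0.0, max_tokens=256)
prompts = ["Summarize the history of the GPU."] * 16  # batch size 16

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count only generated (decode) tokens, never prompt (prefill) tokens.
decode_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
# Wall time includes the short prefill, so this slightly understates
# pure decode throughput.
print(f"{decode_tokens / elapsed:.1f} output tok/s at batch 16")
```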
Cloud Pricing Collection
- All prices are public list prices in USD for the us-east-1 / us-central1 / East US regions unless otherwise noted. Prices in other regions vary by 5–25%.
- On-demand pricing is verified manually each week against each provider's official pricing page. We do not accept pricing data from providers directly.
- Spot pricing is inherently volatile and represents a recent average, not a guarantee. Actual spot prices can be 30–70% of the listed on-demand rate.
- CoreWeave and Lambda Labs prices are per-GPU. AWS, GCP, and Azure prices are per-instance (8-GPU node) divided by 8 for per-GPU comparisons; see the sketch after this list.
- We do not receive commissions or placement fees that influence pricing data. Affiliate links (clearly disclosed) appear only in 'Deploy' buttons.
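As an illustration of the per-GPU normalization and the automated diff alerts used in our weekly checks, here is a hypothetical sketch. The class and function names, SKUs, and all dollar figures are invented for the example:

```python
# Hypothetical sketch of the weekly price-diff check: normalize
# per-instance list prices to per-GPU and flag week-over-week moves.
# Field names, SKUs, and all dollar figures are invented examples.
from dataclasses import dataclass


@dataclass
class Listing:
    provider: str
    sku: str
    hourly_usd: float  # public list price, USD per billed unit
    gpus: int          # GPUs per billed unit (1 for per-GPU providers)

    @property
    def per_gpu_usd(self) -> float:
        return self.hourly_usd / self.gpus


def diff_alerts(last_week: dict[str, float],
                this_week: list[Listing],
                threshold: float = 0.01) -> list[str]:
    """Return a message for every SKU whose per-GPU price moved >1%."""
    alerts = []
    for item in this_week:
        old = last_week.get(item.sku)
        if old and abs(item.per_gpu_usd - old) / old > threshold:
            alerts.append(f"{item.provider} {item.sku}: "
                          f"${old:.2f} -> ${item.per_gpu_usd:.2f}/GPU-hr")
    return alerts


# Example: an 8-GPU hyperscaler node vs. a per-GPU provider rate.
this_week = [Listing("AWS", "p5.48xlarge", 98.32, 8),
             Listing("CoreWeave", "H100-HGX", 4.25, 1)]
print(diff_alerts({"p5.48xlarge": 12.00, "H100-HGX": 4.25}, this_week))
```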
TCO & ROI Calculations
- Training cost estimates use the Chinchilla scaling law (optimal token count ≈ 20× parameter count) and 35% Model FLOPs Utilization (MFU), which is typical for production clusters without extreme optimization. A worked sketch of these formulas follows this list.
- Inference cost per token assumes 70% GPU utilization, single-GPU throughput at batch size 16, and 730 hours/month (a full calendar month). Real costs vary by batch size, model quantization, and actual utilization.
- ROI comparisons assume equivalent GPU generations (e.g., H100 vs. H100) across providers. Cross-generation comparisons (H100 vs. B200) account for throughput differences using benchmark-derived correction factors.
- On-premise TCO excludes power and cooling costs by default. Enable the 'Include OpEx' toggle in the TCO calculator to add an estimated $0.10/kWh power cost at a 1.4 PUE.
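The sketch below reproduces these three calculations in plain Python, using the standard ≈6·N·D approximation for training FLOPs alongside the Chinchilla token count. It is a simplified model, not the production calculator; the example inputs (a 70B-parameter model, ~990 dense FP16 TFLOPS, $2.50/GPU-hr, 700 W TDP) are assumptions for illustration:

```python
# Simplified sketch of the three TCO formulas above; not the production
# calculator. Example inputs (70B params, 990 dense FP16 TFLOPS,
# $2.50/GPU-hr, 700 W TDP) are assumptions for illustration.

CHINCHILLA_TOKENS_PER_PARAM = 20  # Chinchilla-optimal token count
MFU = 0.35                        # assumed Model FLOPs Utilization
HOURS_PER_MONTH = 730             # full calendar month


def training_cost_usd(params: float, peak_tflops: float,
                      gpu_hourly_usd: float) -> float:
    """Training cost via the standard ~6*N*D FLOPs approximation."""
    tokens = CHINCHILLA_TOKENS_PER_PARAM * params
    total_flops = 6 * params * tokens
    effective_flops_per_s = peak_tflops * 1e12 * MFU
    gpu_hours = total_flops / effective_flops_per_s / 3600
    return gpu_hours * gpu_hourly_usd


def inference_cost_per_1m_tokens(tokens_per_s: float, gpu_hourly_usd: float,
                                 utilization: float = 0.70) -> float:
    """Cost per 1M output tokens at batch-16 throughput, 70% utilization."""
    tokens_per_hour = tokens_per_s * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1e6


def monthly_opex_usd(tdp_watts: float, kwh_usd: float = 0.10,
                     pue: float = 1.4) -> float:
    """Optional on-prem power + cooling OpEx per GPU per month."""
    kwh = tdp_watts / 1000 * HOURS_PER_MONTH * pue
    return kwh * kwh_usd


print(f"training:  ${training_cost_usd(70e9, 990, 2.50):,.0f}")
print(f"inference: ${inference_cost_per_1m_tokens(1500, 2.50):.2f} per 1M tok")
print(f"opex:      ${monthly_opex_usd(700):,.0f}/GPU/month")
```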
What We Don't Cover
- Spot pricing SLAs: spot instances can be preempted without notice. We show historical averages only.
- Private / negotiated enterprise pricing: hyperscaler contract pricing for large deployments can be 20–50% below list price. Our numbers are list price.
- Multi-region latency, compliance, and data residency requirements: these are workload-specific and outside our scope.
- Fine-tuned model performance: all benchmarks use standard pretrained model weights. Fine-tuned or quantized models will show different throughput.
Update Schedule
| Data | Update cadence | Source / method |
|------|----------------|-----------------|
| Cloud GPU on-demand pricing | Weekly (every Monday) | Manual spot-check of provider pricing pages + automated diff alerts |
| Spot / preemptible pricing | Weekly | Provider APIs where available; manual otherwise |
| Reserved / committed pricing | Monthly | Major provider pricing pages; rarely changes mid-month |
| GPU performance benchmarks | On new MLPerf release | MLCommons publishes ~2× per year; incorporated within 2 weeks |
| Hardware specifications | On product announcement | Updated within 24–48 hrs of official manufacturer announcement |
| TCO model assumptions | Quarterly | Power costs, rack/colocation rates reviewed each quarter |
Primary Data Sources
GPU Performance Benchmarks
- MLPerf Inference: primary throughput source for H100, A100, L40S, MI300X
- MLPerf Training: training throughput for BERT, ResNet, GPT-3
- NVIDIA datasheets: FP8, FP16, INT8 TFLOPS specifications
- AMD datasheets: official MI300X, MI325X, MI355X specs
- Google Cloud TPU documentation: TPU v5e, v5p, v6e BF16 and INT8 figures
- Community vLLM benchmarks: LLM inference tok/s from reproducible runs
Cloud GPU Pricing
- AWS: p5, p4d, and g5 families (on-demand, spot, 1-yr/3-yr reserved)
- Google Cloud: a3-highgpu, a4-highgpu, and TPU v4/v5 families
- Azure: ND and NCads GPU families
- Lambda Labs: H100, A100, A10 on-demand rates
- CoreWeave: H100, H200, B200, MI300X per-GPU rates
- RunPod: Secure Cloud and Community Cloud GPU rates
Hardware Specifications
- NVIDIA: VRAM, TDP, NVLink bandwidth, architecture
- AMD: HBM3/3e capacity, xGMI interconnect specs
- Google: matrix multiply units, HBM capacity per chip
- NVIDIA and AMD architecture whitepapers: die-level details for Blackwell and CDNA4
Industry Research
- Supply chain, pricing trends, and TCO models
- Training compute scaling laws and hardware trends
- Real-world LLM API latency and throughput
Commercial Disclosure
GPU Advisor earns referral commissions through affiliate links on "Deploy" buttons and provider sign-up links. These commercial relationships do NOT influence our benchmark rankings, pricing data, or editorial analysis. Affiliate links are clearly marked. If you believe any data has been influenced by commercial interests, please contact us.