Data Transparency

Methodology & Data Sources

Enterprise GPU decisions are high-stakes. This page documents exactly where every number on GPU Advisor comes from, how often it is updated, and where our data has known limitations.

  • Pricing updates: Weekly
  • Public sources: 100%
  • Vendor-paid rankings: 0
  • Last full audit: Apr 2026

How We Calculate Each Metric

Performance Benchmarks

  • All FP16/BF16 throughput figures use the decode phase of autoregressive LLM generation (output tokens per second), not prefill. This is the metric that dominates in production inference workloads; a minimal measurement sketch follows this list.
  • NVIDIA and AMD figures are sourced from MLPerf Inference or reproducible community vLLM benchmarks at batch size 1 (latency-bound) and batch size 16–32 (throughput-bound). We display throughput-bound numbers.
  • Google TPU figures use JAX/JetStream benchmarks and are NOT directly comparable to NVIDIA/AMD CUDA/ROCm numbers due to different framework overhead, memory allocation strategies, and programming models. This is clearly noted wherever TPU data appears.
  • FP8 TFLOPS figures are theoretical peaks that assume full hardware utilization. Real-world FP8 inference performance depends on model support and kernel availability.
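
To make the decode metric concrete, here is a minimal sketch of one way to measure output tokens per second with vLLM. This is not our exact benchmark harness: the model name, batch size, and token counts are illustrative assumptions, and the wall time includes a prefill component that decode dominates at these generation lengths.

```python
# Minimal sketch: decode-phase throughput (output tokens/s) with vLLM.
# Illustrative only; model name, batch size, and token counts are assumptions.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical model choice

prompts = ["Summarize the history of datacenter GPUs."] * 16  # throughput-bound batch
params = SamplingParams(max_tokens=256, temperature=0.0)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count only generated (decode) tokens, not prompt (prefill) tokens.
output_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{output_tokens / elapsed:.1f} output tok/s across the batch")
```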
💰 Cloud Pricing Collection

  • All prices are public list prices in USD for the us-east-1 / us-central1 / East US regions unless otherwise noted. Prices in other regions vary by 5–25%.
  • On-demand pricing is verified manually each week against the official pricing page of each provider. We do not accept pricing data from providers directly.
  • Spot pricing is inherently volatile and represents a recent average, not a guarantee. Actual spot prices can be 30–70% of the listed on-demand rate.
  • CoreWeave and Lambda Labs prices are quoted per GPU. AWS, GCP, and Azure prices are per-instance (8-GPU node) and are divided by 8 for per-GPU comparisons; see the normalization sketch after this list.
  • We do not receive commission or placement fees that influence pricing data. Affiliate links (clearly disclosed) are only present in 'Deploy' buttons.
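
To illustrate the per-GPU normalization described above, here is a minimal sketch; the function and all prices are hypothetical placeholders, not live data.

```python
# Sketch of the per-GPU price normalization described above.
# All prices below are hypothetical placeholders, not live data.

def per_gpu_hourly(list_price_usd: float, gpus_per_unit: int) -> float:
    """Normalize a list price to a per-GPU hourly rate.

    AWS, GCP, and Azure quote per-instance prices for 8-GPU nodes
    (gpus_per_unit=8); CoreWeave and Lambda Labs already quote per-GPU
    rates (gpus_per_unit=1).
    """
    return list_price_usd / gpus_per_unit

print(per_gpu_hourly(98.32, 8))  # hypothetical 8-GPU instance -> 12.29/GPU-hr
print(per_gpu_hourly(2.49, 1))   # hypothetical per-GPU rate passes through
```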
📊 TCO & ROI Calculations

  • Training cost estimates use the Chinchilla scaling law (optimal token count ≈ 20× parameter count) and 35% Model FLOPs Utilization (MFU), which is typical for production clusters without extreme optimization; a worked example follows this list.
  • Inference cost per token assumes 70% GPU utilization, single-GPU throughput at batch size 16, and 730 hours/month (full calendar month). Real costs vary by batch size, model quantization, and actual utilization.
  • ROI comparisons assume equivalent GPU generations (e.g., H100 vs H100) across providers. Cross-generation comparisons (H100 vs B200) account for throughput differences using benchmark-derived correction factors.
  • On-premise TCO excludes power and cooling costs by default. Enable the 'Include OpEx' toggle in the TCO calculator to add an estimated $0.10/kWh electricity rate at a 1.4 power usage effectiveness (PUE).
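
The sketch below works both formulas end to end. Every input (a 7B-parameter model, 989 TFLOPS dense BF16 peak, 35% MFU, $3.00/GPU-hr, 2,500 tok/s at batch 16, 70% utilization) is an illustrative assumption, not site data.

```python
# Worked sketch of the training-cost and inference-cost formulas above.
# All inputs are illustrative assumptions, not site data.

def training_cost_usd(params: float, peak_flops: float, mfu: float,
                      usd_per_gpu_hour: float) -> float:
    """Chinchilla-optimal training cost: D = 20 * N tokens,
    total compute ~= 6 * N * D FLOPs."""
    tokens = 20 * params
    total_flops = 6 * params * tokens
    gpu_hours = total_flops / (peak_flops * mfu) / 3600
    return gpu_hours * usd_per_gpu_hour

def inference_usd_per_mtok(tok_per_sec: float, utilization: float,
                           usd_per_gpu_hour: float) -> float:
    """Cost per million output tokens at the stated utilization."""
    tokens_per_hour = tok_per_sec * utilization * 3600
    return usd_per_gpu_hour / tokens_per_hour * 1e6

# Hypothetical 7B model, 989 TFLOPS dense BF16 peak, 35% MFU, $3.00/GPU-hr:
print(f"${training_cost_usd(7e9, 989e12, 0.35, 3.00):,.0f} to train")  # ~$14,156
# Hypothetical 2,500 tok/s (batch 16) at 70% utilization, $3.00/GPU-hr:
print(f"${inference_usd_per_mtok(2500, 0.70, 3.00):.2f} per 1M tokens")  # ~$0.48
```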
⚠️ What We Don't Cover

  • Spot pricing SLA — spot instances can be preempted without notice. We show historical averages only.
  • Private / negotiated enterprise pricing — hyperscaler contract pricing for large deployments can be 20–50% below list price. Our numbers are list price.
  • Multi-region latency, compliance, and data residency requirements — these are workload-specific and outside our scope.
  • Fine-tuned model performance — all benchmarks use standard pretrained model weights. Fine-tuned or quantized models will show different throughput.

Update Schedule

| Data Type | Frequency | Method |
| --- | --- | --- |
| Cloud GPU on-demand pricing | Weekly (every Monday) | Manual spot-check of provider pricing pages + automated diff alerts |
| Spot / preemptible pricing | Weekly | Provider APIs where available; manual otherwise |
| Reserved / committed pricing | Monthly | Major provider pricing pages; rarely changes mid-month |
| GPU performance benchmarks | On new MLPerf release | MLCommons publishes ~2× per year; incorporated within 2 weeks |
| Hardware specifications | On product announcement | Updated within 24–48 hrs of official manufacturer announcement |
| TCO model assumptions | Quarterly | Power costs and rack/colocation rates reviewed each quarter |

Primary Data Sources

GPU Performance Benchmarks

  • MLCommons MLPerf Inference v4.1: Primary throughput source for H100, A100, L40S, MI300X
  • MLCommons MLPerf Training v3.1: Training throughput for BERT, ResNet, GPT-3
  • NVIDIA Official Product Briefs: FP8, FP16, INT8 TFLOPS specifications
  • AMD GPU Technical Specifications: MI300X, MI325X, MI355X official specs
  • Google Cloud TPU Documentation: TPU v5e, v5p, v6e BF16 and INT8 figures
  • Independent vLLM Benchmarks: LLM inference tok/s from community-reproducible runs

Cloud GPU Pricing

  • AWS EC2 Pricing API: p5, p4d, g5 families; on-demand, spot, and 1-yr/3-yr reserved
  • GCP Compute Engine Pricing: a3-highgpu, a4-highgpu, TPU v4/v5 families
  • Azure VM Pricing: ND, NCads GPU families
  • Lambda Labs API Pricing: H100, A100, A10 on-demand rates
  • CoreWeave Pricing Page: H100, H200, B200, MI300X per-GPU rates
  • RunPod GPU Pricing: Secure Cloud and Community Cloud GPU rates

Hardware Specifications

  • NVIDIA Data Center Product Page: VRAM, TDP, NVLink bandwidth, architecture
  • AMD Instinct Product Page: HBM3/HBM3e capacity, xGMI interconnect specs
  • Google TPU Architecture Overview: Matrix multiply units, HBM capacity per chip
  • Hot Chips Presentations (IEEE): Die-level architecture details for Blackwell, CDNA4

Industry Research

  • SemiAnalysis GPU Reports: Supply chain, pricing trends, TCO models
  • Epoch AI Compute Trends: Training compute scaling laws, hardware trends
  • Artificial Analysis: Real-world LLM API latency and throughput

⚠️ Commercial Disclosure

GPU Advisor earns referral commissions through affiliate links on "Deploy" buttons and provider sign-up links. These commercial relationships do NOT influence our benchmark rankings, pricing data, or editorial analysis. Affiliate links are clearly marked. If you believe any data has been influenced by commercial interests, please contact us.