vLLM Deployment Calculator
Select your model, GPU, and target throughput — get exact GPU count, monthly cloud cost, and cost per million tokens across providers. Based on vLLM benchmarks at batch size 16.
Deployment Plan
GPUs Needed
1×
VRAM-constrained
Model VRAM
70GB
FP8 quantization
Throughput
1,800
tokens/sec
Cost per Hour
$2.19
1× GPUs
Monthly Cost
$639
at 40% utilization
Cost per 1M Tokens
$0.34
output tokens
RECOMMENDED CONFIG
1× H100 SXM5 80GB on Lambda Labs running Llama 3.1 70B (FP8) via vLLM. Delivers 1,800 tok/s at $639/mo.
Provider Cost Comparison (1× H100 SXM5 80GB)
How the Calculator Works
Minimum GPUs for VRAM
The model must fit in GPU memory. GPU count = ⌈model_vram / gpu_vram⌉. For Llama 3.1 70B FP8 (70GB) on H100 (80GB): 1 GPU minimum.
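The VRAM constraint can be sketched in a few lines of Python; the 70 GB / 80 GB figures are the Llama 3.1 70B FP8 on H100 example from the text:

```python
import math

def min_gpus_for_vram(model_vram_gb: float, gpu_vram_gb: float) -> int:
    """Minimum GPU count so the model weights fit in aggregate VRAM."""
    return math.ceil(model_vram_gb / gpu_vram_gb)

# Llama 3.1 70B at FP8 (~70 GB) on an 80 GB H100:
print(min_gpus_for_vram(70, 80))  # → 1
```

Note this counts weights only; as the assumptions below mention, KV cache is budgeted separately.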
GPUs for Target Throughput
Throughput scales linearly with GPU count (with some overhead). GPUs for throughput = ⌈target_tok_s / single_gpu_tok_s⌉. Final count = max(VRAM constraint, throughput constraint).
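Combining the two constraints, the final GPU count is the larger of the VRAM minimum and the throughput minimum. A minimal sketch (example values carried over from the text):

```python
import math

def gpus_needed(model_vram_gb: float, gpu_vram_gb: float,
                target_tok_s: float, single_gpu_tok_s: float) -> int:
    """Final GPU count = max(VRAM constraint, throughput constraint)."""
    vram_gpus = math.ceil(model_vram_gb / gpu_vram_gb)
    throughput_gpus = math.ceil(target_tok_s / single_gpu_tok_s)
    return max(vram_gpus, throughput_gpus)

# 70 GB model on 80 GB GPUs, targeting 1,800 tok/s at 1,800 tok/s per GPU:
print(gpus_needed(70, 80, 1800, 1800))  # → 1
```

A real sizing pass would also discount `single_gpu_tok_s` by the ~85% tensor-parallel efficiency noted in the assumptions once the count exceeds one GPU.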
Monthly Cost
Monthly cost = GPU_count × hourly_rate × 730 hours × utilization_factor. Utilization accounts for idle time during off-peak hours.
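The monthly-cost formula reproduces the headline figure above ($2.19/hr at 40% utilization):

```python
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(gpu_count: int, hourly_rate: float,
                 utilization: float = 0.40) -> float:
    """Monthly cost = GPU_count × hourly_rate × 730 × utilization_factor."""
    return gpu_count * hourly_rate * HOURS_PER_MONTH * utilization

# 1× H100 at $2.19/hr, 40% utilization:
print(round(monthly_cost(1, 2.19)))  # → 639
```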
Cost per Million Tokens
$/1M tokens = (hourly_cost / (tokens_per_sec × 3600)) × 1,000,000. This is the output-token cost at sustained throughput — actual cost depends on prompt/completion ratio.
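The per-token formula, checked against the $0.34 figure shown above:

```python
def cost_per_million_tokens(hourly_cost: float, tokens_per_sec: float) -> float:
    """$/1M tokens = (hourly_cost / (tokens_per_sec × 3600)) × 1,000,000."""
    return hourly_cost / (tokens_per_sec * 3600) * 1_000_000

# $2.19/hr at a sustained 1,800 output tok/s:
print(round(cost_per_million_tokens(2.19, 1800), 2))  # → 0.34
```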
Assumptions & Limitations
- Throughput figures are vLLM community benchmarks at batch size 16, decode phase only (output tokens). Actual throughput depends on prompt length and batch configuration.
- KV cache memory is not included in the VRAM estimate — for small models at low concurrency, this is fine. For 100+ concurrent users or long context (32K+), add KV cache budget separately.
- Multi-GPU tensor parallelism efficiency is assumed at ~85% (15% overhead). Real overhead varies by model architecture and NVLink bandwidth.
- Throughput figures are for FP8 and FP16 with standard vLLM defaults. TensorRT-LLM can deliver 20–50% more throughput on NVIDIA GPUs at the cost of longer compilation times.
- Prices are public on-demand rates as of May 2026. Spot/reserved pricing is 30–60% lower.