LLM Inference Benchmarks

Independent benchmarks for production deployments. Throughput, latency, and capacity tested across hardware configurations.

Qwen3.6-35B-A3B-MTP
Format: FP8 | Parameters: 35B | Released: 4/16/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x MI300X | 192GB | 376 tok/s | 1 - 3 | 1K - 128K | 2 users

Qwen3.6-35B-A3B
Format: FP8 | Parameters: 35B | Released: 4/16/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x MI300X | 192GB | 256 tok/s | 1 - 3 | 1K - 128K | 3 users

Qwen3.6-27B-MTP
Format: FP8 | Parameters: 27B | Released: 4/22/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x MI300X | 192GB | 254 tok/s | 1 - 3 | 1K - 128K | not listed

Qwen3.6-27B
Format: FP8 | Parameters: 27B | Released: 4/22/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x MI300X | 192GB | 143 tok/s | 1 - 3 | 1K - 128K | not listed

Gemma-4-26B-A4B
Format: FP8 | Parameters: 26B | Released: 4/2/2026 | Organization: Google | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 674 tok/s | 1 - 10 | 1K - 256K | 13 users

Gemma-4-31B
Format: FP8 | Parameters: 31B | Released: 4/2/2026 | Organization: Google | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 129 tok/s | 1 - 4 | 1K - 96K | 1 user

Gemma-4-31B
Format: NVFP4 | Parameters: 31B | Released: 4/2/2026 | Organization: Google | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 126 tok/s | 1 - 4 | 1K - 128K | 2 users

Mistral-Small-4-119B-2603
Format: NVFP4 | Parameters: 119B | Released: 3/17/2026 | Organization: Mistral AI | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 262 tok/s | 1 - 5 | 1K - 256K | 4 users

Nemotron-3-Super-120B-A12B
Format: NVFP4 | Parameters: 120B | Released: 3/11/2026 | Organization: NVIDIA | License: NVIDIA | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 178 tok/s | 1 - 5 | 1K - 512K | 6 users

Qwen3.5-397B-A17B
Format: FP8 | Parameters: 397B | Released: 2/17/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
8x RTX Pro 6000 Blackwell | 768GB | 244 tok/s | 1 - 5 | 1K - 256K | 4 users

Qwen3.5-122B-A10B
Format: FP8 | Parameters: 122B | Released: 2/24/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
2x RTX Pro 6000 Blackwell | 192GB | 237 tok/s | 1 - 5 | 1K - 256K | 7 users

Qwen3.5-27B
Format: FP8 | Parameters: 27B | Released: 2/24/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 2
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 102 tok/s | 1 - 4 | 1K - 256K | 3 users
1x H100 SXM | 80GB | 312 tok/s | 1 - 5 | 1K - 256K | 6 users

Qwen3.5-35B-A3B
Format: FP8 | Parameters: 35B | Released: 2/24/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 4
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 598 tok/s | 1 - 10 | 1K - 256K | 34 users
1x H100 SXM | 80GB | 908 tok/s | 1 - 10 | 1K - 256K | 45 users
2x RTX Pro 6000 Blackwell | 192GB | 1,164 tok/s | 1 - 15 | 1K - 256K | 27 users
1x H200 SXM | 141GB | 1,479 tok/s | 1 - 15 | 1K - 256K | 62 users

Ministral-3-3B-Instruct-2512
Format: FP8 | Parameters: 3B | Released: 12/2/2025 | Organization: Mistral AI | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 1,030 tok/s | 1 - 6 | 1K - 256K | 23 users

Ministral-3-8B-Instruct-2512
Format: FP8 | Parameters: 8B | Released: 12/2/2025 | Organization: Mistral AI | License: Apache 2.0 | Configs: 1
Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 547 tok/s | 1 - 5 | 1K - 256K | 14 users
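To compare rows across hardware, a minimal Python sketch can normalize peak throughput per GPU and pick the smallest configuration that meets a target user count. It hard-codes the Qwen3.5-35B-A3B (FP8) rows from the table. The `Config` type, both helper functions, and the "fewest GPUs, then least VRAM" cost proxy are illustrative assumptions, not part of the published benchmarks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Config:
    hardware: str
    gpus: int
    vram_gb: int
    peak_tok_s: int
    users_32k: int  # Chatbot User Capacity at 32K context


# Qwen3.5-35B-A3B (FP8) rows, copied from the table above.
QWEN35_35B_A3B = [
    Config("1x RTX Pro 6000 Blackwell", 1, 96, 598, 34),
    Config("1x H100 SXM", 1, 80, 908, 45),
    Config("2x RTX Pro 6000 Blackwell", 2, 192, 1164, 27),
    Config("1x H200 SXM", 1, 141, 1479, 62),
]


def per_gpu_throughput(cfg: Config) -> float:
    """Peak throughput divided evenly by GPU count (a rough normalization)."""
    return cfg.peak_tok_s / cfg.gpus


def smallest_fit(configs: List[Config], users_needed: int) -> Optional[Config]:
    """Hypothetical cost proxy: fewest GPUs, then least VRAM, among configs meeting the target."""
    fits = [c for c in configs if c.users_32k >= users_needed]
    if not fits:
        return None
    return min(fits, key=lambda c: (c.gpus, c.vram_gb))


if __name__ == "__main__":
    for cfg in QWEN35_35B_A3B:
        print(f"{cfg.hardware}: {per_gpu_throughput(cfg):.0f} tok/s per GPU")
    print(smallest_fit(QWEN35_35B_A3B, 40))
```

For example, `smallest_fit(QWEN35_35B_A3B, 40)` selects the 1x H100 SXM row (45 users), and raising the target to 50 users moves the choice to 1x H200 SXM. Dividing peak throughput by GPU count is only a rough comparison, and the 2x RTX Pro 6000 Blackwell row (27 users versus 34 on one card) shows that more GPUs do not guarantee more chatbot capacity.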
Methodology

How We Test

Each configuration runs 50+ test scenarios that sweep context length and concurrency up to the limits the model and hardware can support. Beyond standard throughput and latency metrics, we run capacity tests that increase the number of concurrent requests until performance thresholds are exceeded.
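The capacity test described above can be sketched as a linear ramp: add one concurrent user at a time and stop at the first level that breaks a threshold. This Python sketch assumes two hypothetical thresholds (a time-to-first-token ceiling and a per-user decode-speed floor) with made-up values, and it uses a toy `fake_measure` function in place of a real load generator. The page does not publish its actual thresholds or harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Measurement:
    ttft_s: float          # time to first token, in seconds
    per_user_tok_s: float  # decode speed each user sees, in tokens/s


# Hypothetical thresholds; the benchmark's real limits are not published here.
MAX_TTFT_S = 2.0
MIN_PER_USER_TOK_S = 20.0


def meets_thresholds(m: Measurement) -> bool:
    return m.ttft_s <= MAX_TTFT_S and m.per_user_tok_s >= MIN_PER_USER_TOK_S


def find_capacity(measure: Callable[[int], Measurement], max_users: int = 512) -> int:
    """Add one concurrent user at a time; return the last passing level (0 if none)."""
    capacity = 0
    for users in range(1, max_users + 1):
        if not meets_thresholds(measure(users)):
            break
        capacity = users
    return capacity


def fake_measure(users: int) -> Measurement:
    """Toy load model: 600 tok/s aggregate shared evenly, TTFT growing with load."""
    return Measurement(ttft_s=users / 20, per_user_tok_s=600 / users)


if __name__ == "__main__":
    print(find_capacity(fake_measure))
```

With `fake_measure`, the per-user rate drops below 20 tok/s at 31 users, so `find_capacity(fake_measure)` returns 30. Real measurements are noisy, so a production harness would likely repeat each level or confirm a failure before stopping the ramp.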

Prompts use calibrated token counts. No prompt caching. No speculative decoding unless specifically noted.
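One way to produce prompts with calibrated token counts is to repeat filler text past the target, truncate at the token level, and re-check the count after decoding. The sketch below accepts any `encode`/`decode` pair (for instance, a Hugging Face tokenizer's `encode(text, add_special_tokens=False)` and `decode(ids)`). `calibrated_prompt` and `WordTokenizer` are hypothetical names; the toy whitespace tokenizer exists only so the example runs on its own.

```python
from typing import Callable, List, Sequence

Encode = Callable[[str], Sequence[int]]
Decode = Callable[[Sequence[int]], str]


def calibrated_prompt(encode: Encode, decode: Decode, filler: str, target_tokens: int) -> str:
    """Return text that encodes to exactly `target_tokens` tokens."""
    if target_tokens <= 0:
        raise ValueError("target_tokens must be positive")
    if not encode(filler):
        raise ValueError("filler must encode to at least one token")
    # Repeat the filler until the text is at least as long as the target.
    text = filler
    while len(encode(text)) < target_tokens:
        text = f"{text} {filler}"
    # Cut at the token level, then turn the ids back into text.
    prompt = decode(list(encode(text))[:target_tokens])
    # Decoding and re-encoding is not always lossless for subword tokenizers,
    # so verify the final count instead of assuming it.
    actual = len(encode(prompt))
    if actual != target_tokens:
        raise ValueError(f"prompt has {actual} tokens, expected {target_tokens}")
    return prompt


class WordTokenizer:
    """Toy whitespace tokenizer that assigns ids to new words on the fly."""

    def __init__(self) -> None:
        self.vocab: dict = {}
        self.words: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab[word] = len(self.words)
                self.words.append(word)
            ids.append(self.vocab[word])
        return ids

    def decode(self, ids: Sequence[int]) -> str:
        return " ".join(self.words[i] for i in ids)


if __name__ == "__main__":
    tok = WordTokenizer()
    prompt = calibrated_prompt(tok.encode, tok.decode, "the quick brown fox", 1000)
    print(len(tok.encode(prompt)))
```

Subword tokenizers can merge tokens differently once truncated text is decoded and re-encoded, which is why the function raises on a mismatch instead of silently returning a prompt that is a few tokens off.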


Not Sure What You Need?

We work with teams to figure out the right model + hardware combination for your throughput, latency, and budget requirements.
