LLM Inference Benchmarks

Independent benchmarks for production deployments. Throughput, latency, and capacity tested across hardware configurations.

Mistral-Small-4-119B-2603
Format: NVFP4 | Parameters: 119B | Released: 3/17/2026 | Organization: Mistral AI | License: Apache 2.0 | Configs: 1

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 262 tok/s | 1 - 5 | 1K - 256K | 4 users

Nemotron-3-Super-120B-A12B
Format: NVFP4 | Parameters: 120B | Released: 3/11/2026 | Organization: NVIDIA | License: NVIDIA | Configs: 1

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 178 tok/s | 1 - 5 | 1K - 512K | 6 users

Qwen3.5-397B-A17B
Format: FP8 | Parameters: 397B | Released: 2/17/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 1

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
8x RTX Pro 6000 Blackwell | 768GB | 244 tok/s | 1 - 5 | 1K - 256K | 4 users

Qwen3.5-122B-A10B
Format: FP8 | Parameters: 122B | Released: 2/24/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 1

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
2x RTX Pro 6000 Blackwell | 192GB | 237 tok/s | 1 - 5 | 1K - 256K | 7 users

Qwen3.5-27B
Format: FP8 | Parameters: 27B | Released: 2/24/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 2

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 102 tok/s | 1 - 4 | 1K - 256K | 3 users
1x H100 SXM | 80GB | 312 tok/s | 1 - 5 | 1K - 256K | 6 users

Qwen3.5-35B-A3B
Format: FP8 | Parameters: 35B | Released: 2/24/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 4

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 598 tok/s | 1 - 10 | 1K - 256K | 34 users
1x H100 SXM | 80GB | 908 tok/s | 1 - 10 | 1K - 256K | 45 users
2x RTX Pro 6000 Blackwell | 192GB | 1,164 tok/s | 1 - 15 | 1K - 256K | 27 users
1x H200 SXM | 141GB | 1,479 tok/s | 1 - 15 | 1K - 256K | 62 users

Ministral-3-3B-Instruct-2512
Format: FP8 | Parameters: 3B | Released: 12/2/2025 | Organization: Mistral AI | License: Apache 2.0 | Configs: 1

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 1,030 tok/s | 1 - 6 | 1K - 256K | 23 users

Ministral-3-8B-Instruct-2512
Format: FP8 | Parameters: 8B | Released: 12/2/2025 | Organization: Mistral AI | License: Apache 2.0 | Configs: 1

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 547 tok/s | 1 - 5 | 1K - 256K | 14 users

Ministral-3-14B-Instruct-2512
Format: FP8 | Parameters: 14B | Released: 12/2/2025 | Organization: Mistral AI | License: Apache 2.0 | Configs: 2

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 302 tok/s | 1 - 4 | 1K - 256K | 7 users
2x RTX Pro 6000 Blackwell | 192GB | 648 tok/s | 1 - 6 | 1K - 256K | 8 users

MiniMax-M2.5
Format: FP8 | Parameters: 230B | Released: 2/13/2026 | Organization: MiniMax | License: Modified MIT | Configs: 2

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
4x RTX Pro 6000 Blackwell | 384GB | 226 tok/s | 1 - 4 | 1K - 192K | 6 users
4x H200 SXM | 564GB | 498 tok/s | 1 - 8 | 1K - 192K | 15 users

GLM-4.7-Flash
Format: BF16 | Parameters: 30B | Released: 1/19/2026 | Organization: Z.ai | License: MIT | Configs: 1

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x H200 SXM | 141GB | 458 tok/s | 1 - 4 | 1K - 200K | 12 users

Qwen3-Coder-Next
Format: FP8 | Parameters: 80B | Released: 2/3/2026 | Organization: Qwen | License: Apache 2.0 | Configs: 3

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 306 tok/s | 1 - 4 | 1K - 256K | 5 users
2x RTX Pro 6000 Blackwell | 192GB | 765 tok/s | 1 - 10 | 1K - 256K | 6 users
1x H200 SXM | 141GB | 853 tok/s | 1 - 10 | 1K - 256K | 8 users

Devstral-Small-2-24B-Instruct-2512
Format: FP8 | Parameters: 24B | Released: 12/9/2025 | Organization: Mistral AI | License: Apache 2.0 | Configs: 3

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 102 tok/s | 1 - 3 | 1K - 256K | 5 users
1x H100 SXM | 80GB | 274 tok/s | 1 - 3 | 1K - 256K | 8 users
1x H200 SXM | 141GB | 564 tok/s | 1 - 5 | 1K - 256K | 8 users

Qwen3-Coder-30B-A3B-Instruct
Format: FP8 | Parameters: 30B | Released: 7/31/2025 | Organization: Qwen | License: Apache 2.0 | Configs: 3

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 334 tok/s | 1 - 4 | 1K - 256K | 10 users
1x H100 SXM | 80GB | 584 tok/s | 1 - 6 | 1K - 192K | 15 users
1x H200 SXM | 141GB | 600 tok/s | 1 - 6 | 1K - 256K | 17 users

gpt-oss-120b
Format: MXFP4 | Parameters: 117B | Released: 8/5/2025 | Organization: OpenAI | License: Apache 2.0 | Configs: 4

Hardware | VRAM | Peak Throughput | Concurrency Tested | Context Length Tested | Chatbot User Capacity (32K)
1x RTX Pro 6000 Blackwell | 96GB | 361 tok/s | 1 - 4 | 1K - 128K | 5 users
1x H100 SXM | 80GB | 511 tok/s | 1 - 4 | 1K - 96K | 7 users
2x RTX Pro 6000 Blackwell | 192GB | 664 tok/s | 1 - 6 | 1K - 128K | 8 users
1x H200 SXM | 141GB | 849 tok/s | 1 - 10 | 1K - 128K | 26 users
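As a rough sanity check on capacity figures like those above, aggregate decode throughput can be related to a per-user generation-speed floor. The sketch below is illustrative only, assuming a hypothetical 20 tok/s per-user floor; it is not the formula behind the capacity column, and measured capacities are typically lower than this naive bound because per-stream speed degrades under concurrent load and latency thresholds bind first.

```python
# Naive upper-bound estimate of concurrent chat users from aggregate
# throughput. The 20 tok/s per-user floor is an assumed value, not a
# figure taken from these benchmarks.

def estimate_users(aggregate_tok_s, per_user_tok_s=20.0):
    """Aggregate decode throughput divided by the per-user speed floor."""
    return int(aggregate_tok_s // per_user_tok_s)

# Example with assumed numbers: 600 tok/s aggregate at 20 tok/s per user.
print(estimate_users(600))  # 30
```

Comparing this bound against the measured "Chatbot User Capacity" column shows how much headroom a configuration loses to queueing and latency constraints in practice.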
Methodology

How We Test

Each configuration runs 50+ test scenarios across different context lengths and concurrency levels, up to the limits the model and hardware support. We measure standard metrics such as throughput and latency, and run capacity tests that increase concurrent requests until performance thresholds are exceeded.

Prompts use calibrated token counts. No prompt caching. No speculative decoding unless specifically noted.
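The capacity test described above can be sketched as a simple concurrency sweep. This is an illustrative simulation, not the actual harness: `measure_p95_latency` is a hypothetical callback standing in for real measurements against an inference server, and `simulated_p95` is a made-up latency model used only to demonstrate the loop.

```python
# Illustrative capacity sweep: raise concurrency until a latency
# threshold is exceeded, then report the last passing level.

def find_user_capacity(measure_p95_latency, threshold_s, max_concurrency=128):
    """Return the highest concurrency whose p95 latency stays under threshold_s."""
    capacity = 0
    for concurrency in range(1, max_concurrency + 1):
        if measure_p95_latency(concurrency) > threshold_s:
            break  # threshold exceeded; previous level is the capacity
        capacity = concurrency
    return capacity

# Stand-in latency model: p95 grows superlinearly as queueing builds up.
def simulated_p95(concurrency):
    return 0.5 + 0.05 * concurrency ** 1.5

print(find_user_capacity(simulated_p95, threshold_s=2.0))  # 9
```

A production harness would replace the linear sweep with measured request batches at each level and average over repeated runs, but the stopping rule is the same.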

Read Full Methodology →

Not Sure What You Need?

We work with teams to find the right model and hardware combination for their throughput, latency, and budget requirements.

Get a Recommendation