MLPerf Inference v5.1 offers rigorous benchmarks for AI inference across LLMs, vision, and multimodal tasks. This analysis covers NVIDIA’s Blackwell Ultra results and AMD’s Instinct MI300X/MI325X/MI355X submissions, including detailed benchmark data, software optimizations, architectural strategies, and implications for hyperscalers and enterprises.
MLPerf Inference benchmarks serve as the scoreboard for AI accelerators, providing a standardized, apples-to-apples method for measuring performance across image classification, language models, and recommendation systems. For enterprises deploying GPUs at scale, these numbers often guide massive purchasing decisions.
MLPerf Inference is governed by MLCommons and defines Closed vs Open Division rules. Scenarios include:
The v5.1 workload suite included:
Throughput charts (tokens/sec or samples/sec) typically highlight NVIDIA’s continued lead in absolute numbers, especially per GPU. However, AMD’s gains in relative efficiency and scaling smoothness show it is closing the gap round over round, particularly in server scenarios and multi-node deployments.
Blackwell Ultra systems achieved record throughput across all new workloads. Key enablers:
The bar chart contrasting Hopper and Blackwell Ultra per GPU shows a nearly 5× throughput uplift on DeepSeek-R1, which suggests that fine-grained data formats (NVFP4) and rack-wide bandwidth scale well for massive inference workloads. Another figure, built around the record-setting interactive Llama-405B results, demonstrates how NVIDIA uses the NVLink fabric and disaggregated serving to hold latency SLAs while breaking past previous throughput limits. The takeaway: NVIDIA remains the industry pace-setter for raw speed and latency-sensitive workloads.
AMD targeted efficiency and flexibility:
The chart comparing MI325X at FP8 with MI355X at FP4 illustrates the benefit of FP4: higher tokens/sec with minimal accuracy loss, evidence that FP4 is deployment-ready rather than experimental.
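To make the FP4 trade-off concrete, here is a minimal sketch of block-scaled 4-bit quantization in plain Python. It assumes the E2M1 value grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and one shared scale per block. The function names, the block size of 16, and the use of an ordinary Python float for the scale are illustrative assumptions, not a description of how NVIDIA's or AMD's kernels actually implement NVFP4 or FP4.

```python
# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block_fp4(block):
    """Quantize one block of floats to the FP4 grid with a shared scale.

    Returns (scale, codes), where each code is a signed FP4 grid value.
    The scale is kept as a Python float here (an assumption for clarity);
    real formats store it in a compact type.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    # Map the largest magnitude in the block onto the largest FP4 value (6.0).
    scale = max_abs / FP4_E2M1_MAGNITUDES[-1]
    codes = []
    for x in block:
        target = abs(x) / scale
        # Round to the nearest representable magnitude.
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def dequantize_block_fp4(scale, codes):
    """Reconstruct approximate floats from a scale and FP4 codes."""
    return [scale * c for c in codes]


def quantize_fp4(values, block_size=16):
    """Split values into blocks and quantize each block independently."""
    return [
        quantize_block_fp4(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]
```

Because each block carries its own scale, a single outlier only degrades the precision of its own block. That is one reason small-block 4-bit formats can keep accuracy loss low while halving weight storage relative to FP8.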
The scaling curve from 1 to 8 nodes shows near-linear scaling, something hyperscalers value because it translates into predictable expansion costs. Meanwhile, the structured pruning schematic shows that AMD’s focus is not only on hardware brute force but also on algorithmic efficiency, which is crucial for real-world inference clusters constrained by power and space.
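"Near-linear" can be quantified as scaling efficiency: per-node throughput at N nodes divided by per-node throughput at the smallest measured node count. The sketch below shows the calculation; the function name and the throughput figures are hypothetical placeholders, not MLPerf results.

```python
def scaling_efficiency(throughput_by_nodes):
    """Return {nodes: efficiency} relative to the smallest node count measured.

    An efficiency of 1.0 means perfectly linear scaling.
    """
    base_nodes = min(throughput_by_nodes)
    base_per_node = throughput_by_nodes[base_nodes] / base_nodes
    return {
        nodes: (tput / nodes) / base_per_node
        for nodes, tput in sorted(throughput_by_nodes.items())
    }


# Hypothetical tokens/sec at 1, 2, 4, and 8 nodes (illustrative only).
example = {1: 1000.0, 2: 1960.0, 4: 3840.0, 8: 7440.0}

if __name__ == "__main__":
    for nodes, eff in scaling_efficiency(example).items():
        print(f"{nodes} node(s): {eff:.0%} of linear")
```

With these placeholder numbers, efficiency falls from 100% at one node to 93% at eight. For capacity planning, the useful output is that efficiency figure rather than the raw throughput, since it indicates how much extra hardware each increment of demand will cost.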
| Metric | NVIDIA Blackwell Ultra | AMD MI355X | Notes | Winner |
| --- | --- | --- | --- | --- |
| Tokens/sec per GPU | 5842 (DeepSeek-R1) | ~2200 (scaled FP4) | NVIDIA has higher raw performance | NVIDIA |
| Memory per GPU | 192 GB HBM3e | 288 GB HBM3e | AMD fits a 520B-parameter model on a single GPU | AMD |
| Precision support | FP16, FP8, NVFP4 | FP16, FP8, FP4 | Both support 4-bit formats | Tie |
| Scaling fabric | 72-GPU NVLink | Linear node scaling to 8 | Different scaling strategies | Tie |
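The memory row can be sanity-checked with simple arithmetic. The sketch below estimates the footprint of model weights alone at a given precision and compares it with HBM capacity. It deliberately ignores KV cache, activations, and quantization scale factors, so it gives a lower bound on required memory; the helper names are illustrative.

```python
GB = 1e9  # decimal gigabytes, as used in GPU memory specs


def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GB."""
    return num_params * bits_per_param / 8 / GB


def fits(num_params: float, bits_per_param: int, hbm_gb: float) -> bool:
    """True if the weights alone fit in the given HBM capacity."""
    return weight_footprint_gb(num_params, bits_per_param) <= hbm_gb


if __name__ == "__main__":
    params = 520e9  # the 520B-parameter model from the table
    for bits in (16, 8, 4):
        size = weight_footprint_gb(params, bits)
        print(f"{bits}-bit: {size:.0f} GB, fits in 288 GB: {fits(params, bits, 288)}")
    # Per-GPU throughput ratio from the table (the AMD figure is approximate).
    print(f"NVIDIA / AMD tokens/sec: {5842 / 2200:.2f}x")
```

At 4 bits per parameter, 520B parameters occupy about 260 GB. That fits within 288 GB but not 192 GB, and the single-GPU claim only holds at 4-bit precision. The throughput ratio from the table works out to roughly 2.66×, bearing in mind that the AMD number is a scaled estimate.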
Side-by-side visualizations emphasize the trade-offs.
NVIDIA continues to set the pace in raw throughput, with H200 GPUs leading inference in ResNet-50 and BERT across closed division benchmarks. That said, AMD’s MI300 doesn’t trail far behind, posting competitive numbers in server and offline scenarios while also showing strong gains in power efficiency. The real story here isn’t just that NVIDIA remains on top, but that AMD has significantly closed the gap compared to just one MLPerf round ago. For buyers, this means the days of looking only at green GPUs may be over: ROCm is maturing, and MI300 is viable in real-world inference deployments.