
Broadcom Tomahawk Ultra Switch Targets AI Scale-Up with Lossless Ethernet

by Harold Fritts

Broadcom ships Tomahawk Ultra, a 51.2 Tbps switch with 250ns latency and lossless Ethernet, redefining networking for AI and HPC workloads.

Broadcom has officially announced the shipment of the Tomahawk Ultra Ethernet Switch, a product set to redefine high-performance computing (HPC) and artificial intelligence (AI) networking. Designed for ultra-low latency, high throughput, and lossless operation, Tomahawk Ultra establishes a new standard for Ethernet switching in demanding technical environments.

Broadcom Tomahawk Ultra

Ram Velaga, Senior Vice President and General Manager of Broadcom’s Core Switching Group, emphasized that Tomahawk Ultra is the result of a multi-year engineering effort involving hundreds of specialists. This launch underscores Broadcom’s ongoing commitment to advancing Ethernet technology for the next generation of high-performance and AI-driven workloads.

Shattering Myths, Redefining Ethernet

Historically, Ethernet has been viewed as a high-latency, lossy technology, unsuitable for the most demanding compute clusters. Tomahawk Ultra challenges this perception by delivering:

  • Ultra-low latency: Achieves 250ns switch latency at a full 51.2 Tbps throughput, enabling real-time data movement for tightly coupled compute environments.
  • High performance: Supports line-rate switching for even the smallest 64-byte packets, handling up to 77 billion packets per second (a quick back-of-envelope check of that figure follows this list).
  • Optimized Ethernet headers: Reduces header size from 46 bytes to just 10 bytes while maintaining full Ethernet compliance, increasing network efficiency, and allowing for application-specific enhancements.
  • Lossless fabric: Implements Link Layer Retry (LLR) and Credit-Based Flow Control (CBFC) to eliminate packet loss and ensure reliable data delivery.
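
The 77 billion packets-per-second figure follows almost directly from the line rate and the minimum frame size. Here is a minimal back-of-envelope check, assuming the standard 20 bytes of per-packet wire overhead (preamble, start-of-frame delimiter, and inter-packet gap); the exact overhead assumption is ours, not Broadcom's:

```python
# Back-of-envelope check of the packets-per-second figure quoted above.
LINE_RATE_BPS = 51.2e12          # 51.2 Tbps aggregate throughput
FRAME_BYTES = 64                 # minimum Ethernet frame size
WIRE_OVERHEAD_BYTES = 20         # preamble (7) + SFD (1) + inter-packet gap (12)

bits_per_packet = (FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
packets_per_second = LINE_RATE_BPS / bits_per_packet
print(f"{packets_per_second / 1e9:.1f} billion packets/s")
# ~76 billion/s, in the same ballpark as the ~77 Bpps Broadcom quotes.
```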

Purpose-Built for HPC and AI Scale-Up

Tomahawk Ultra is optimized for the low-latency, high-bandwidth communication patterns found in HPC systems and AI clusters. Its architecture is designed to deliver predictable, high-efficiency performance for large-scale simulations, scientific computing, and synchronized AI model training and inference.

When deployed with Scale-Up Ethernet (SUE), Tomahawk Ultra achieves sub-400ns XPU-to-XPU communication latency, including switch transit time, setting a new standard for tightly synchronized AI compute at scale.

The reduction of Ethernet header overhead from 46 bytes to 10 bytes, while maintaining compliance, significantly enhances network efficiency. This streamlined, adaptable header provides both flexibility and performance improvements across a range of HPC and AI workloads.
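To see why trimming headers matters for the small messages typical of scale-up traffic, consider payload efficiency on the wire. The sketch below assumes an illustrative 64-byte payload; that payload size is our assumption, not a figure from Broadcom:

```python
# Illustrative only: fraction of wire bytes carrying payload with the
# standard ~46-byte header overhead versus the optimized 10-byte header.
PAYLOAD_BYTES = 64  # assumed small-message size, typical of scale-up traffic

def payload_efficiency(header_bytes: int, payload_bytes: int = PAYLOAD_BYTES) -> float:
    """Share of transmitted bytes that are actual payload."""
    return payload_bytes / (payload_bytes + header_bytes)

print(f"46-byte headers: {payload_efficiency(46):.0%} payload efficiency")  # ~58%
print(f"10-byte headers: {payload_efficiency(10):.0%} payload efficiency")  # ~86%
```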

Lossless Fabric for Data-Intensive Workloads

Tomahawk Ultra’s lossless fabric technology is engineered to prevent packet drops during high-volume data transfers. Using LLR, the switch detects link errors with Forward Error Correction (FEC) and automatically retransmits packets, avoiding physical-level drops. CBFC further prevents buffer overflows, a common cause of packet loss. Together, these mechanisms create a truly lossless Ethernet fabric, delivering the reliability required by today’s most data-intensive applications.
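The two mechanisms can be pictured in a few lines of plain Python. This is a toy sketch of the ideas only; the class and function names, error model, and credit counts are illustrative assumptions and bear no relation to Broadcom's implementation:

```python
import random
from collections import deque

class CreditBasedLink:
    """Credit-based flow control in miniature: the sender transmits only while
    the receiver has advertised free buffer slots, so the buffer cannot overflow."""
    def __init__(self, credits: int):
        self.credits = credits           # free receive-buffer slots advertised to the sender
        self.rx_buffer: deque = deque()

    def try_send(self, packet) -> bool:
        if self.credits == 0:
            return False                 # back-pressure on the sender, never a drop
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def deliver(self):
        packet = self.rx_buffer.popleft()
        self.credits += 1                # slot freed, credit returned to the sender
        return packet

def send_with_llr(transmit, packet, error_rate: float = 0.01, max_retries: int = 4) -> bool:
    """Link-layer retry in miniature: if a transmission is detected as corrupted,
    resend the packet at the link layer instead of letting it become a drop."""
    for _ in range(max_retries + 1):
        corrupted = random.random() < error_rate   # stand-in for a detected link error
        if not corrupted:
            transmit(packet)
            return True
    return False                         # escalate only after repeated failures

link = CreditBasedLink(credits=8)
send_with_llr(link.try_send, "packet-0")  # retried automatically if "corrupted"
```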

A significant bottleneck in AI and machine learning workloads is the overhead associated with collective operations, such as AllReduce, Broadcast, and AllGather. Tomahawk Ultra addresses this by performing these operations directly within the switch chip, reducing job completion times and maximizing the utilization of expensive compute resources. Notably, this feature operates independently of endpoints, allowing for rapid integration across diverse system architectures and vendor ecosystems.
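Conceptually, an in-switch AllReduce means each endpoint's contribution is summed once inside the fabric and the single result is fanned back out, rather than being shuffled between endpoints over multiple steps. The sketch below shows only the operation itself and is purely illustrative, not Broadcom's interface:

```python
from typing import List

def in_switch_allreduce(contributions: List[List[float]]) -> List[float]:
    """Element-wise sum computed once in the fabric and returned to every endpoint,
    instead of being passed endpoint-to-endpoint over several communication steps."""
    return [sum(values) for values in zip(*contributions)]

# Example: four XPUs each contribute a small gradient vector.
gradients = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
print(in_switch_allreduce(gradients))  # [22.0, 26.0, 30.0] -- every endpoint gets this same vector
```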

Topology-Aware Routing and Ecosystem Openness

Tomahawk Ultra is designed with advanced, topology-aware routing to support HPC topologies such as Dragonfly, Mesh, and Torus. The switch complies with the Ultra Ethernet Consortium (UEC) standard and leverages the openness and rich ecosystem of Ethernet networking, ensuring broad compatibility and future-proofing for evolving data center architectures.

As part of Broadcom’s Ethernet-forward strategy for AI scaling, the company has introduced SUE-Lite, an optimized version of the SUE specification. SUE-Lite is tailored for power- and area-sensitive accelerator applications, retaining the core low-latency and lossless features of full SUE while further reducing the silicon footprint and power consumption of Ethernet interfaces on AI XPUs and CPUs. This lightweight approach simplifies the integration of standards-compliant Ethernet fabrics into AI platforms, promoting broader adoption of Ethernet as the preferred interconnect for scale-up architectures.

Together with the 102.4 Tbps Tomahawk 6, Tomahawk Ultra forms the backbone of a unified Ethernet architecture, enabling both scalable AI training clusters and expansive HPC and distributed workloads.

Availability

The switch is currently shipping for use in rack-scale AI training clusters and supercomputing environments.
