Dell PowerEdge XE9640 Liquid-Cooled GPU Server Deep Dive

by Harold Fritts

The Dell PowerEdge XE9640 is a four-GPU rack-mount server that uses liquid cooling to deliver AI performance power-efficiently. The XE9640 was announced during SC22 along with the XE8640 and one of our favorites, the 8-way XE9680 GPU server. Today, the XE9640 is generally available, and we’re taking a deep dive into the underlying hardware.

Dell PowerEdge XE9640

Dell GPU Accelerated Server Family

The PowerEdge XE family of servers is purpose-built for complex AI and HPC workloads that require performance and reliability. These servers are designed to be high-performance, insights-driven, and intelligent. Today, the XE server family comprises the XE9680 (Did we mention it was a Jordan favorite?), XE9640, XE8640, and XE8545. The common thread between all these servers is the design to support a wide variety of AI initiatives with a cooling option that makes sense to the customer.

Products: Purpose, Benefits, and Use Cases

XE9680
Purpose: Designed to boost insights with AI acceleration designed for optimal performance and the fastest time-to-value
Benefits: Leverage extreme performance for AI and HPC with 8x NVIDIA H100 or A100 Tensor Core SXM GPUs; Smart air-cooled operation (up to 35°C) maximizes data conversion into outcomes
Use Cases: Large Language Models; Natural Language processing; Large Recommendation engine training; Modeling & Simulation; Digital Twins & Manufacturing

XE9640
Purpose: Purpose-built to drive AI initiatives in a highly dense 2U, smart liquid-cooled server
Benefits: Drive greater outcomes for AI with 4x NVIDIA H100 Tensor Core GPUs or 4x Intel Data Center Max OAM GPUs; Smart liquid-cooled CPUs and GPUs maximize performance; Lower TCO with optimized power utilization efficiency
Use Cases: Natural Language processing; Large Recommendation engine training; Modeling & Simulation; Artificial Intelligence, ML/DL Training for object recognition

XE8640
Purpose: Drive AI, HPC, and analytics workloads with superior performance
Benefits: Automate analysis into insights with 4x NVIDIA H100 GPUs for a wide range of applications; Run air-cooled (up to 35°C) to increase power efficiency; Scale up operations with intelligent expansion options
Use Cases: Medium data set language Models; Natural Language processing; Modeling & Simulation; Artificial Intelligence, ML/DL Training and Inferencing, image recognition

XE8545
Purpose: Mainstream AI and graphics application performance
Benefits: Boost Training and inferencing performance with 4x NVIDIA A100 GPUs; Air-cooled operation (up to 35°C) drives efficient operation; Lower TCO with a balanced performance/watt solution
Use Cases: Modeling and simulation, including Seismic analysis; Artificial Intelligence, ML/DL Training and Inferencing, image recognition, and Chatbot

Dell PowerEdge XE9640 – GPU Diversity and Liquid Cooling

The PowerEdge XE9640 offers direct liquid cooling (DLC) to the GPUs and CPUs thanks to a deep partnership with CoolIT. The server does retain a few fans to ensure the DRAM, storage, and PCIe expansion cards receive sufficient airflow and cooling. That said, these fans do not need to operate at max RPM, saving substantial power.

Dell PowerEdge XE9640 – NVIDIA GPU Tray

The PowerEdge XE9640 brings GPU diversity to the table, offering a choice between 4x NVIDIA H100 SXM5 700W GPU modules interconnected with NVLink or 4x Intel Data Center GPU Max 1550 600W Open Compute Project (OCP) Accelerator Modules (OAM) interconnected with Intel Xe Link.
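To put the two tray options side by side, here is a minimal Python sketch that totals the module power ratings quoted above. The `GpuTray` class and its field names are our own illustration, and the figures are the per-module ratings from this article rather than measured draw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuTray:
    """One GPU tray option; names and fields are illustrative only."""
    name: str
    gpu_count: int
    watts_per_gpu: int  # per-module power rating quoted in the article

    @property
    def total_watts(self) -> int:
        # Combined rating of all GPU modules on the tray.
        return self.gpu_count * self.watts_per_gpu


TRAYS = [
    GpuTray("NVIDIA H100 SXM5 (NVLink)", 4, 700),
    GpuTray("Intel Data Center GPU Max 1550 OAM (Xe Link)", 4, 600),
]

for tray in TRAYS:
    print(f"{tray.name}: {tray.total_watts} W")
```

By these ratings, the NVIDIA tray totals 2,800W and the Intel tray 2,400W for the GPUs alone, before CPUs, memory, storage, and fans are counted.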

Thanks to technologies like NVIDIA NVLink and Intel Xe Link, these GPUs communicate seamlessly, effectively pooling their memory and cores. This is especially beneficial for handling memory-coherent workloads like Large Language Models (LLMs). Having a choice of accelerators makes the XE9640 suitable for a wide array of AI workloads. Of course, the obvious question is, “What about AMD Instinct?” Dell is constantly evaluating additional GPU support but does not offer an AMD solution in this server at launch.

Dell PowerEdge XE9640 – Intel GPU Tray

This GPU diversity lets users address the escalating demands of generative AI, industrial simulation modeling, and cutting-edge scientific research. For instance, the PowerEdge XE9640’s Intel Data Center GPU Max accelerator capabilities have been put to use at the Texas Advanced Computing Center (TACC) for their Stampede3 supercomputer.

In terms of density and cooling, the PowerEdge XE9640 has been engineered to make efficient use of rack space while boosting performance. With its compact 2U profile, this server offers impressive GPU capacity per rack, maximizing valuable data center space. Employing DLC, the PowerEdge XE9640 outperforms traditional air-cooled systems in terms of efficiency and cost-effectiveness.

Dell PowerEdge XE9640 – The Rest of the Hardware

Beyond “just” the GPUs, the engineering behind the XE series is first-class. We just posted a video review of the XE9640 and XE8640. The video provides excellent detail about the design, from drive access and routing the DLC pipes on the XE9640 to closed loop GPU liquid cooling on the XE8640 and the path for future enhancements on all the XE servers. It’s embedded below for reference.

On the XE9640, removing the bezel provides easy access to the NVMe drives. Two slots on the right side of the chassis support the NVMe boot-optimized storage subsystem (BOSS) drives and include hardware RAID 1 via 2x M.2 SSDs. Today, primary storage is provided via 4x U.2 Gen4 NVMe drives. This will double in a future release thanks to an option for an SSD tray to support 8x E3.S Gen5 NVMe drives. The platform doesn’t support hardware RAID for the U.2 NVMe bays, although most won’t need it. Many of these GPU boxes pull massive datasets from external storage, so local storage won’t be the primary source for them.

Of course, there are good expansion options in the back of the server for AI professionals needing to tap into massive storage arrays. The XE9640 supports four PCIe Gen5 slots, two half-height and two full-height. Additionally, you have a PCIe Gen3 OCP NIC slot.

Dell PowerEdge XE9640 Front

To calm fears of a leak, liquid-cooled servers include leak detection reporting in iDRAC. Dell’s method to detect leaks is pretty incredible when you drill into different parts of the chassis. For example, in the shot of the CPU cooling plate below, you can make out fine copper traces in a looped pattern around the entire water block. If any water drips onto these connections, the open wiring loop detects a small short, and the system knows a leak has occurred. Braided wire rope is used with a similar detection method in other parts of the chassis. This is visible in our photo of the main liquid distribution block with the numerous hoses at the front of the chassis. Additionally, the CoolIT CDUs and the rest of the loop also have leak detection reporting at many points along the way.
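The open-loop idea is easy to picture in code. The toy model below is only a sketch of the principle described above, not Dell’s or CoolIT’s implementation: a dry sensing loop reads as an open circuit (effectively infinite resistance), and water bridging the traces pulls the reading down. The threshold value, function name, and zone names are all hypothetical.

```python
# Hypothetical trip point; the real sensing threshold is not published in this article.
ASSUMED_LEAK_THRESHOLD_OHMS = 1_000_000.0


def leak_detected(resistance_ohms: float,
                  threshold_ohms: float = ASSUMED_LEAK_THRESHOLD_OHMS) -> bool:
    """Return True when the resistance across the sensing loop drops below the threshold.

    A dry loop is an open circuit, so its resistance is effectively infinite.
    Water bridging the traces creates a small short and lowers the reading.
    """
    return resistance_ohms < threshold_ohms


# Made-up readings for two illustrative sensing zones.
readings = {
    "cpu0_cold_plate": float("inf"),   # dry: open circuit
    "distribution_block": 50_000.0,    # wet: traces bridged by water
}

alerts = [zone for zone, ohms in readings.items() if leak_detected(ohms)]
print(alerts)
```

In this made-up data set, only the distribution block zone trips the check, which is the kind of per-zone event a management controller could then report.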

Dell PowerEdge XE9640 Leak Detection

Incidentally, the air-cooled XE8640 and XE9680 also include closed-loop GPU liquid cooling with the same leak detection through iDRAC.

Optimizing Power-Per-Rack

The PowerEdge XE9640 offers customers the opportunity to fine-tune their power-per-rack utilization. With nine servers per rack, a peak load of accelerated computing might demand around 41kW of power, utilizing a three-phase power distribution for balanced performance. For scaling up, data centers can deploy racks housing 12, 18, or even 21 PowerEdge XE9640 servers, achieving power levels of approximately 54kW, 81kW, and 95kW, respectively. This adaptability empowers data centers to optimize their rack power usage according to specific requirements.
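The rack figures above scale almost linearly, which a few lines of Python can make explicit. This is a rough sketch: the article does not state a per-server peak draw, so the 4.5kW constant below is our own back-calculation from the quoted totals (for example, 54kW across 12 servers), and the function name is illustrative.

```python
# Assumption: ~4.5 kW peak per server, back-calculated from the rack totals
# quoted in the article. It is not a Dell specification.
ASSUMED_PEAK_KW_PER_SERVER = 4.5
SERVER_HEIGHT_U = 2  # the XE9640 is a 2U chassis


def rack_estimate(servers: int) -> dict:
    """Estimate rack space and peak power for a given number of servers."""
    return {
        "servers": servers,
        "rack_units": servers * SERVER_HEIGHT_U,
        "peak_kw": servers * ASSUMED_PEAK_KW_PER_SERVER,
    }


for count in (9, 12, 18, 21):
    est = rack_estimate(count)
    print(f"{count:>2} servers: {est['rack_units']:>2}U, ~{est['peak_kw']:.1f} kW")
```

Under that assumption, each estimate lands within about half a kilowatt of the quoted 41kW, 54kW, 81kW, and 95kW figures, and the 21-server configuration fills 42U of rack space.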

Dell PowerEdge XE9640 Interior Liquid Cooling Manifold

Dell has compiled a variety of resources to provide customers with a comprehensive insight into the PowerEdge XE9640. This includes unboxing videos and detailed product reviews. The unboxing video showcases the server’s design and features, offering customers a visual tour of its capabilities.

To further emphasize the advantages of the PowerEdge XE9640, Dell has crafted an infographic that compares it with its air-cooled counterpart, the PowerEdge XE8640. This infographic highlights the notable distinctions of the PowerEdge XE9640, particularly its efficiency in liquid cooling and impressive GPU capacity per rack.

Dell PowerEdge XE9640 Rear

Final Thoughts

The PowerEdge XE9640 is integral to Dell’s expanding Generative AI Solutions, designed to revolutionize AI workloads and foster innovation. Dell’s Generative AI solution combines cutting-edge technology, innovation, and services offered by Dell Technologies to deliver more intelligent and expedited outcomes. By harnessing the capabilities of generative AI, organizations can achieve fresh insights, expedite their transformation efforts, and enhance workforce efficiency.

While the XE9680 may remain our favorite of the Dell GPU servers, the XE9640 has earned its way into our hearts purely from an efficiency and design perspective. The 2U chassis squeezes in a tremendous amount of engineering; data centers on liquid cooling will clearly gravitate to these power-efficient boxes. The four NVIDIA modules take 2800W for themselves, so anything Dell can do to help data centers be more rack- and power-efficient is a giant leap forward.
