Dell PowerEdge XE9680 Review: My Favorite Server Ever Tested

We usually wait until the end of the article to paint the whole picture and complete the review. However, the Dell PowerEdge XE9680 is such an exciting piece of hardware that we couldn’t wait to share our enthusiasm: this is a positive review. Dell’s design is centered around the needs of AI, providing an immense amount of computational power in a 6U form factor. Thanks to its partnership with Intel and NVIDIA on the XE9680, Dell has come up with an offering that is not only powerful but highly efficient.

Dell PowerEdge XE9680

Its specs are nothing short of impressive: two 4th Generation Intel Xeon Scalable processors with 56 cores each and 2TB of DDR5 RAM provide the CPU backbone for the AI accelerators. Add in eight NVIDIA HGX H100 or A100 GPUs in SXM form factor, linked together through NVLink, and the server is equipped to handle the largest models and data workloads.

The PowerEdge XE9680’s capacity for large RAM volumes (up to 4TB) provides a significant competitive edge in handling AI workloads. Such large memory footprints allow for the training of more complex models, leading to higher performance and more accurate results.
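
To put that memory headroom in perspective, here is a back-of-the-envelope sketch (our own rule of thumb, not a Dell or NVIDIA figure) of the training state a model carries around under mixed-precision Adam: fp16 weights and gradients plus fp32 master weights and two optimizer moments work out to roughly 16 bytes per parameter, before activations.

```python
# Rough training-state estimate under mixed-precision Adam (our own
# rule of thumb): fp16 weights (2 B) + fp16 grads (2 B) +
# fp32 master weights (4 B) + Adam first/second moments (4 B + 4 B).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16 bytes per parameter

def training_state_tib(params_billions: float) -> float:
    """Approximate optimizer/weight state in TiB, ignoring activations."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM
    return total_bytes / 2**40

for size in (7, 13, 65):
    print(f"{size}B params -> ~{training_state_tib(size):.2f} TiB of state")
```

By this estimate, even a 65B-parameter model’s full training state comes in under 1 TiB, comfortably inside the XE9680’s 4TB ceiling with room left for data staging.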

Our configurations include 8x U.2 NVMe SSD bays in the front. But just as we saw with the R660, Dell intends to offer an E3.S backplane as well, with 16x E3.S SSDs. The server also supports the NVMe BOSS-N1 boot module on the rear of the server.

It’s About More Than Just Power

The PowerEdge XE9680 is not just about power; it also prioritizes security and manageability. With features like cryptographically signed firmware, Data at Rest Encryption, and Secure Boot, the server ensures your data is always protected. The embedded iDRAC9 system provides an easy-to-use management interface, offering a variety of tools and integrations that make managing the server straightforward and hassle-free.

We put this system to the test when we wanted to change the OS installed on the server (more on this later); the Cryptographic Erase function in iDRAC gave us a clean system to work with in only a few clicks.
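
The same sanitization can be scripted: iDRAC9 exposes drive actions through its Redfish API, including the standard `Drive.SecureErase` action. A hedged sketch of how such a request is built (the storage controller and drive IDs below are hypothetical placeholders; real IDs must be enumerated from the Storage collection on your system):

```python
import json

def secure_erase_request(host: str, storage_id: str, drive_id: str):
    """Build the Redfish Drive.SecureErase action request for iDRAC9.

    NOTE: storage_id and drive_id here are hypothetical placeholders;
    enumerate the real ones from
    /redfish/v1/Systems/System.Embedded.1/Storage on your server.
    """
    url = (f"https://{host}/redfish/v1/Systems/System.Embedded.1"
           f"/Storage/{storage_id}/Drives/{drive_id}"
           f"/Actions/Drive.SecureErase")
    payload = json.dumps({})  # SecureErase takes no required parameters
    return url, payload

url, body = secure_erase_request("idrac.example.com",
                                 "RAID.SL.3-1",        # hypothetical ID
                                 "Disk.Bay.0")         # hypothetical ID
print(url)
# An actual invocation would POST this body with iDRAC credentials over
# HTTPS (e.g., via urllib.request or Dell's Redfish tooling).
```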

nvidia-smi on the XE9680 with H100 cards

Previously, we looked at Dell’s whitepaper on the XE9680 and its performance against the image-generation latency benchmarks set by Lambda. The server delivered roughly double the throughput, a testament to the power and efficiency of the PowerEdge XE9680.

MLPerf scores are widely accepted as a good indicator of relative performance for systems in this class, so to take advantage of our hands-on time with the XE9680 A100 and H100 servers, we decided to run a head-to-head comparison of fine-tuning Meta’s LLaMA on the two systems. To do this, we followed Stanford’s Alpaca training steps, which they originally accomplished on a 4x A100 system.

Alpaca Training on the XE9680

We want to thank the teams at NVIDIA and Dell for their assistance with this project. This is such a cutting-edge technology from a hardware and software perspective that without the guidance of industry experts from both companies, it would have been a much more drawn-out, intensive process.

On the A100 system, using the process outlined in the Stanford Alpaca GitHub repository, we were able to reproduce the steps to create the Alpaca checkpoints, completing the three epochs of training in approximately 90 minutes on average.
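
For reference, the Stanford Alpaca recipe drives training through Hugging Face’s trainer with FSDP via `torchrun`. A hedged sketch of the kind of launch command involved, adapted for an eight-GPU node (paths and hyperparameters are illustrative placeholders, not our exact configuration, and flags may differ between repository revisions):

```shell
# Sketch of an Alpaca-style fine-tuning launch on an 8-GPU node,
# adapted from the Stanford Alpaca README. Model/output paths are
# placeholders; consult the repository for the current flag set.
torchrun --nproc_per_node=8 --master_port=29500 train.py \
    --model_name_or_path /models/llama-7b \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir /output/alpaca-run \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --learning_rate 2e-5 \
    --fsdp "full_shard auto_wrap"
```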

Moving to the H100 system, we saw an improvement, with runs completed in around 70 minutes each. Due to the high demand for and limited availability of these systems, we did not have an opportunity to tune the code itself to explore possible performance improvements, but it was clear that with refinement and time dedicated to development, an enterprise team could achieve rapid turnaround times for fine-tuning.
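
Those rounded run times work out to a meaningful generational gain; a quick sanity check of the arithmetic:

```python
# Speedup implied by the rounded run times from our testing:
# ~90 minutes (A100) vs. ~70 minutes (H100) for three epochs.
a100_minutes = 90
h100_minutes = 70

speedup = a100_minutes / h100_minutes                      # ~1.29x
time_saved_pct = (1 - h100_minutes / a100_minutes) * 100   # ~22% less wall time

print(f"H100 speedup: {speedup:.2f}x")
print(f"Wall-clock time saved: {time_saved_pct:.0f}%")
```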

Officially Unofficial

On the A100-flavored XE9680, we had an opportunity to do some outside-the-box and highly unorthodox testing: we installed Windows Server 2022 on the system! This involved some iDRAC drive wipes to remove the Linux installation and some tricks with iDRAC .iso files and virtual media to sideload Intel network drivers; from there, it was off to the races using the chipset drivers from a comparable Dell PowerEdge system and the official NVIDIA A100 drivers.

The system works and is stable with no issues. However, this is a highly unusual use case that Dell does not officially support. So, armed with our fresh Windows install and eight of the best GPUs on the market, we did what we do best: benchmark Pi!

The eight NVIDIA A100 cards easily crushed GPU-Pi world records without any tuning, and the Xeon Platinum CPUs threw up some great numbers on both y-cruncher and Cinebench. We tried a few of our other standard CPU/GPU benchmarks with little hope of them working, and as expected, we ran into software/encoder issues that prevented successful runs. Procyon, for example, didn’t even recognize that it had tensor-core GPUs available for the test.

Again, we need to reiterate that this was simply a test of an unsupported configuration, and the fact that we got anything working at all is impressive and a testament to Dell’s ability to produce consistent hardware across platforms. Using Windows Server on this type of system in any production environment would be unwise.

Test                     Result
Cinebench Multi          90,710
Cinebench Single         174
Cinebench MP Ratio       77.24
Geekbench 6 GPU          197,669
Geekbench 6 Single       1,678
Geekbench 6 Multi        16,425
Blender Monster          855.080461 samples per minute
Blender Junkshop         546.636998 samples per minute
Blender Classroom        394.441850 samples per minute
GPU Pi 3.2 1 Billion     0.394 seconds
GPU Pi 3.3 1 Billion     0.317 seconds

It’s worth noting that such impressive performance doesn’t just translate into speed; it also impacts the practical application of AI. With faster retraining and fine-tuning, businesses can significantly improve their agility, allowing them to swiftly respond to market changes, customer needs, and internal requirements. For instance, design teams can evaluate and refine concepts in real time, significantly reducing time to market, and compliance teams can continually fine-tune updated models with the latest policies and procedures for an assistant-type LLM.

Transforming Operations Using Generative AI

The Dell PowerEdge XE9680 can facilitate generative AI in transforming several industry operations. Imagine a retail scenario where AI can swiftly generate realistic images of various product configurations or color options based on customer preferences or a construction business creating visualizations of new buildings for planning and sales presentations. The possibilities are fascinating.

The Dell PowerEdge XE9680 offers a fantastic blend of power, efficiency, and versatility. It is a high-performance server that can handle demanding AI workloads with ease. And while it is specifically designed to cater to AI workloads, its capabilities give it the potential to be useful for other applications.

Coupled with Dell’s commitment to aiding organizations in executing their AI projects via Project Helix, the PowerEdge XE9680 is an exciting proposition and one of the best hardware innovations we’ve seen all year. With its ability to deliver raw power, refined finesse, and enterprise-grade lifecycle management, it’s no wonder that the Dell PowerEdge XE9680 is fast becoming an aspirational favorite among enterprise servers.

Jordan Ranous

AI Specialist; navigating you through the world of Enterprise AI. Writer and Analyst for Storage Review, coming from a background of Financial Big Data Analytics, Datacenter Ops/DevOps, and CX Analytics. Pilot, Astrophotographer, LTO Tape Guru, and Battery/Solar Enthusiast.
