December 11th, 2018 by Michael Rink
IBM Announces Spectrum Scale On NVIDIA DGX-1 Server
IBM recently announced a converged appliance developed with NVIDIA: IBM SpectrumAI with NVIDIA DGX. The offering combines NVIDIA’s DGX-1 server with IBM’s Spectrum Scale file system on all-flash storage.
IBM’s Spectrum Scale is a well-known cluster file system. Originally called IBM General Parallel File System (GPFS), it was renamed Spectrum Scale in 2015. It provides concurrent access to a single file system or set of file systems from multiple nodes. It supports nodes that are SAN attached, network attached, a mixture of the two, or arranged in a shared-nothing cluster configuration. On each node in the cluster, IBM Spectrum Scale consists of three basic components: administration commands, a kernel extension, and a multithreaded daemon. Together these provide a global namespace, shared file system access among IBM Spectrum Scale clusters, simultaneous file access from multiple nodes, high recoverability and data availability through replication, and the ability to make changes while a file system is mounted.
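In practice, cluster creation and concurrent mounting are driven by the administration commands mentioned above. A minimal sketch of a typical flow follows; the node list, stanza file, and file system name here are hypothetical examples, not part of the announcement:

```shell
# Create a Spectrum Scale (GPFS) cluster from a list of nodes
# (nodes.list is a hypothetical file naming the member nodes).
mmcrcluster -N nodes.list

# Define Network Shared Disks from a stanza file, then build a
# file system on them (fs1 and nsd.stanza are example names).
mmcrnsd -F nsd.stanza
mmcrfs fs1 -F nsd.stanza

# Mount the file system on every node at once -- each node now
# sees the same global namespace and can access files concurrently.
mmmount fs1 -a

# Verify which nodes have the file system mounted.
mmlsmount fs1 -L
```

Once mounted, every node reads and writes through the same namespace, which is what lets a rack of DGX-1 servers share one pool of training data.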
NVIDIA targets the DGX-1 at AI research, and it is impressively well suited to that task. The system has eight Tesla V100 GPUs that provide a total of 256GB of GPU memory, driven by a pair of Intel Xeon processors. With 40,960 CUDA cores and 5,120 Tensor cores, this beast can handle demanding AI models. Four 1.92TB SSDs in a RAID 0 setup provide local storage. NVIDIA also preloads software onto the DGX designed to wring the most performance possible from its GPUs.
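The aggregate figures above follow directly from the published per-V100 specs (5,120 CUDA cores, 640 Tensor cores, and 32GB of HBM2 per GPU); a quick sanity check:

```python
# Per-GPU specifications for the Tesla V100 (SXM2, 32GB variant).
CUDA_CORES_PER_GPU = 5120
TENSOR_CORES_PER_GPU = 640
MEMORY_GB_PER_GPU = 32

GPUS_PER_DGX1 = 8

# Aggregate figures for one DGX-1, matching the numbers quoted above.
print(GPUS_PER_DGX1 * CUDA_CORES_PER_GPU)    # 40960 CUDA cores
print(GPUS_PER_DGX1 * TENSOR_CORES_PER_GPU)  # 5120 Tensor cores
print(GPUS_PER_DGX1 * MEMORY_GB_PER_GPU)     # 256 GB GPU memory
```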
Combined, they provide a complete solution of systems and software. IBM SpectrumAI with NVIDIA DGX is designed for data science productivity and IT simplicity. Because it is software-defined, it can be configured to meet current and growing business requirements: IBM Spectrum Scale can be deployed in configurations ranging from a single IBM Elastic Storage Server (ESS) supporting a few NVIDIA DGX-1 servers, to a full rack of nine servers with 72 Tesla V100 Tensor Core GPUs, to multi-rack configurations. In a full-rack configuration, IBM SpectrumAI with NVIDIA DGX has demonstrated 120GB/s of data throughput, enough to support multiple users and models simultaneously.
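To put the full-rack number in perspective, dividing the demonstrated throughput across the rack gives a rough per-server and per-GPU feed rate (a back-of-the-envelope figure, assuming throughput is shared evenly; real workloads will not split perfectly):

```python
# Figures from the full-rack configuration described above.
RACK_THROUGHPUT_GBPS = 120   # GB/s demonstrated for a full rack
DGX1_PER_RACK = 9
GPUS_PER_DGX1 = 8

gpus_per_rack = DGX1_PER_RACK * GPUS_PER_DGX1          # 72 GPUs
per_server = RACK_THROUGHPUT_GBPS / DGX1_PER_RACK      # ~13.3 GB/s per DGX-1
per_gpu = RACK_THROUGHPUT_GBPS / gpus_per_rack         # ~1.67 GB/s per GPU

print(f"{per_server:.1f} GB/s per server, {per_gpu:.2f} GB/s per GPU")
```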