November 18th, 2019 by Adam Armstrong
AMD Makes Several Announcements At SC19
Today at SC19 in Denver, AMD made a slew of HPC-related announcements. The company announced new AMD-powered Amazon EC2 instances with AWS. And, as we’ve previously covered, GIGABYTE announced five new servers that leverage 2nd generation AMD EPYC, bringing its total to 28 servers.
Since the release of its EPYC Rome CPUs, AMD has seen solid industry adoption. The new CPUs shattered world records at launch and can deliver better performance in a single socket than the competition’s dual-socket setups. Another big advantage of AMD EPYC Rome is that it allows servers to leverage PCIe 4.0 devices. In a world that leans on GPUs more and more, this is a huge leg up over the competition. With these advantages, AMD is pushing its way into the HPC/supercomputing market, where they will be put to use quickly. PCIe 4.0 devices that can now be leveraged include:
- Broadcom Thor NIC for 200Gb Ethernet.
- Mellanox ConnectX-6 NIC showing ~400Gb/s InfiniBand performance.
- Samsung PM1733 Gen4 NVMe SSD, delivering 2x the IOPS of Samsung’s Gen3 SSD.
- Xilinx Alveo U50 and U280 FPGAs.
AMD is expanding its footprint in the cloud with new AWS EC2 compute-optimized instances, C5a and C5ad. Powered by EPYC Rome CPUs, the C5a and C5ad will come in eight virtualized sizes with up to 96 vCPUs, giving customers additional choices to optimize both cost and performance for a variety of compute-intensive workloads, including batch processing, distributed analytics, and web applications. C5a and C5ad will both be available in bare-metal variants (c5an.metal and c5adn.metal) that will offer twice as much memory and double the vCPU count of comparable instances. The new instances will be available soon across multiple AWS regions.
In more cloud news, Microsoft Azure is previewing Azure HBv2 virtual machines for high-performance computing. These VMs leverage the AMD EPYC 7742 processor. Azure and AMD state that these VMs can give customers supercomputer-class performance, with support for 200Gb/s HDR InfiniBand and up to 80,000 cores for a single job.
AMD released version 3.0 of its ROCm Open Software Platform. New features include:
- New innovations including HIP-clang, a compiler built upon LLVM; improved CUDA conversion capability with hipify-clang; and library optimizations for both HPC and ML.
- ROCm upstream integration into leading TensorFlow and PyTorch machine learning frameworks for applications like reinforcement learning, autonomous driving, and image and video detection.
- Expanded acceleration support for HPC programming models and applications like OpenMP programming, LAMMPS, and NAMD.
- New support for system and workload deployment tools like Kubernetes, Singularity, SLURM, TAU and others.