March 8th, 2017 by Adam Armstrong
Facebook Refreshes Its Server Hardware Fleet
Today at the Open Compute Summit (OCP) 2017, Facebook is announcing an end-to-end refresh of its server hardware fleet. This announcement includes all new storage chassis, GPU servers, and compute servers, as well as the latest version of Yosemite, Facebook’s multi-node compute platform. All design specifications are available through the Open Compute Project site, and a comprehensive set of hardware design files for all systems will be released.
Facebook has been and still is evolving. The social media and social networking service still offers the status updates and photo sharing that shot it to success, and since its beginning it has added several other features, including the acquisition of the photo and video sharing service Instagram a few years ago. Today Facebook sees 100 million hours of video watched daily, 95 million photos and videos posted to Instagram, and over 400 million people using voice and video chat on Facebook Messenger. To keep up with this pace and keep users happy, Facebook needs to refresh its server fleet to improve performance and scale.
With storage, Facebook is replacing the Open Vault chassis it has been using since 2013. The new chassis, Bryce Canyon, is designed for high-density storage of data such as photos and videos. Bryce Canyon is a 4 OU (Open Rack unit) chassis that supports up to 72 3.5" HDDs (12Gb/s SAS or 6Gb/s SATA), a 20% density increase over the previous design. Bryce Canyon has a modular design that lets it be flexibly configured, for example as a JBOD or as a powerful storage server. As a storage server, the new platform supports more powerful processors and up to four times the memory footprint of the previous storage device.
All of those photos and videos need a great GPU system powering them. Facebook is replacing its Big Sur GPU server with the new Big Basin. Big Basin is a JBOG (just a bunch of GPUs), designed to disaggregate CPU compute from the GPUs. This separation means that Big Basin needs separate compute and networking blocks, but it also means they can be scaled independently. Big Basin supports eight GPUs (NVIDIA Tesla P100 GPU accelerators). It has seen a per-GPU memory increase from 12GB to 16GB and nearly a 100% increase in throughput over Big Sur.
With the modular design of the above, there needs to be some sort of head unit for compute. Facebook has been using Leopard for a variety of compute services. Today it is introducing Tioga Pass, a dual-socket motherboard that uses the same 6.5" by 20" form factor and supports both single-sided and double-sided designs. Tioga Pass can maximize memory configurations through its double-sided design, with DIMMs on both sides. Facebook has also replaced the mSATA connector with M.2 slots that support NVMe. The PCIe slots have been upgraded from x24 to x32, allowing for two x16 slots, or one x16 slot and two x8 slots. This doubles the available PCIe bandwidth and makes the server a more flexible head unit for both Big Basin and the Lightning JBOF. Facebook has also added a 100G NIC to enable higher-bandwidth access to flash.
Facebook has also updated its multi-node compute platform, Yosemite. The latest version uses a new 4 OU vCubby chassis design; however, it is still compatible with Open Rack v2. Each cubby supports four 1S server cards, or two servers plus two device cards, and each of the four servers can connect to either a 50G or 100G multi-host NIC. Another interesting new feature is hot-service support: servers no longer need to be taken offline to be serviced. The new Yosemite supports both Mono Lake and Twin Lakes 1S servers, and offers support for the Glacier Point SSD carrier card and Crane Flat device carrier card.
Specifications for each of the above will be available on the Open Compute Project site.