There has been a lot of discussion about the issues surrounding boot drives in the hyperscale space over the past several years. While hyperscalers don’t want to spend too much on them, they need a basic minimum performance threshold among other necessary specifications.
There’s also the question of who’s actually going to continue to make small-capacity M.2 NVMe boot drives, as the enterprise SSD manufacturers have mostly exited this space. Solidigm has no modern M.2 boot drive in its storage portfolio, and most modern options offered by Samsung, KIOXIA, and Micron raise cost concerns due to their high capacity. Then there’s the performance need: while the demands aren’t high, a boot drive still needs to reliably deliver a minimum result.
As you can see in the chart below, the capacity of both boot and data drives is continuously growing, which means more expenditures for organizations.
At the OCP Summit, Google and Meta representatives discussed these issues in a joint presentation and, most importantly, what they are doing to address them.
An example of a hyperscale boot SSD was shown representing activity for one day, which included I/O reads and writes as well as TRIM transactions. Most noticeable is the high throughput of TRIMs, which demonstrates short-lived data (created and then deleted soon after). If not properly designed, TRIMs will lead to latency stalls and interfere with read and write traffic. Most of the traffic is also random reads and writes.
The session outlined a number of hurdles that hyperscale NVMe boot drives face. This is quite a diverse range of issues, and addressing them can be a complicated process if not done correctly.
As is the main drive and purpose behind OCP, the only way to solve these issues is through collaboration and open specifications. To that end, Meta and Google have joined forces to combine requirements and create the Hyperscale NVMe Boot SSD Specification (Version 1.0), marking an important milestone for boot drive efficiency. It was submitted early this year and is available through the OCP website.
There are many benefits that come with this specification. Ultimately, it allows the market to better understand the features hyperscalers need and use in their boot devices, and ensures industry alignment on SSD boot drive adoption. In addition, it gives organizations open-source tools to manage boot SSDs, which enables the development of third-party test suites that can verify all requirements.
During the session, they also indicated there are two ways to approach creating a hyperscale boot SSD: either pare down an enterprise-class SSD or upgrade a consumer-grade SSD, as the requirements lie somewhere between these two spaces.
We’ve started to add a boot bench performance section in our SSD reviews, which is a workload profile adopted by OCP to gauge SSDs that are designed for server boot duty. This boot workload executes a relatively intense test plan that fills the drive entirely with writes before testing a read-heavy workload sequence.
For each test, it performs 32K random read async operations alongside a 15MiB/s synchronous 128K random write, as well as a 5MiB/s synchronous 128K random write/trim background workload. The script starts the random-read activity at 4 jobs and scales up to 256 jobs at its peak. The final result is the read operations performed during the peak run.
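The workload described above maps naturally onto an fio job file. The sketch below is our illustrative approximation, not the actual OCP script: the job names, device path, and runtime are assumptions, and the read job is shown at a single job count rather than the full 4-to-256 scaling sweep.

```ini
; Hypothetical fio sketch of the boot bench workload described above.
; WARNING: filename targets a raw device and is destructive; adjust before use.
[global]
filename=/dev/nvme0n1
direct=1
time_based=1
runtime=60

[randread-32k]
ioengine=libaio      ; asynchronous 32K random reads
rw=randread
bs=32k
numjobs=4            ; the real script scales this up to 256 jobs

[randwrite-128k]
ioengine=sync        ; synchronous 128K random writes
rw=randwrite
bs=128k
rate=15m             ; throttled to ~15MiB/s

[trim-background]
ioengine=sync        ; background random write/trim traffic
rw=randtrim
bs=128k
rate=5m              ; throttled to ~5MiB/s
```

The throttled write and trim jobs run continuously in the background, so the reported read IOPS reflect how well the drive services reads while absorbing the short-lived-data churn that TRIM-heavy boot duty creates.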
The OCP goal for this benchmark is a pass/fail threshold of 60K read IOPS. Most drives we test far exceed that minimum, but the results are instructive regardless. What was most interesting in our testing is that performance-oriented NVMe SSD models far exceeded the IOPS threshold, while many slower models fell into the non-passing category; even a Samsung 970 EVO Plus 2TB reported a non-qualifying result.
| SSD | Read IOPS |
|---|---|
| SK hynix Platinum P41 | 220,884 |
| WD SN850X | 219,883 |
| Solidigm P44 Pro | 211,999 |
| Fantom VENOM8 | 190,573 |
| Samsung 990 Pro | 176,677 |
| Sabrent Rocket 4 Plus | 162,230 |
| Samsung 970 EVO Plus 2TB | 52,005 |
| Corsair MP600 GS | DNF |
| Solidigm P41 Plus | DNF |
During the session, they also compared two different drives: one that is more generic and one that conforms more to the OCP hyperscale specifications. Overall, they found that there was a dramatic improvement in latency across the board with the latter drive, which is something very important to the hyperscale space.
In the real world, this means that there is a noticeable improvement in time to market when trying to deploy a drive more attuned to the specifications.
While some companies were previously creating their own hyperscale boot drives to suit their own specific needs (and specific requests from their customers), these specifications were not shared among the industry. This resulted in vendors having to produce their own customized hardware/firmware in order to meet the needs of their customers.
We’ve come a long way since then, as OCP has now made version 1.0 of their Hyperscale NVMe Boot SSD Specification officially available. This allows system makers and SSD providers to align on a common set of requirements all the while encouraging further collaboration.
OCP is calling on all OEMs (i.e., system manufacturers) and hyperscalers to join the cause and promises to keep evolving and improving the specification as the storage landscape changes.
The boot drive need is not restricted to hyperscale use cases. M.2 is the de facto boot drive standard now for most servers and storage arrays. While the drive doesn’t have to do a ton in most cases, it does need to be reliable, somewhat performant, and no larger (more expensive) than is absolutely required. Hopefully we will see storage vendors respond to this initiative with a boot-specific SSD so infrastructure providers can maintain some degree of standardization.