OCP’s Hyperscale NVMe Boot SSD Specification Brings Performance, Affordability and Efficiency

There has been a lot of discussion about the issues surrounding boot drives in the hyperscale space over the past several years. While hyperscalers don’t want to spend too much on them, they need a basic minimum performance threshold among other necessary specifications.

There’s also the question of who’s actually going to continue to make small-capacity M.2 NVMe boot drives, as the enterprise SSD manufacturers have mostly exited this space. Solidigm has no modern M.2 boot drive in its storage portfolio, and most modern options offered by Samsung, KIOXIA, and Micron raise cost concerns due to their high capacity. Then there’s the performance need. A boot drive doesn’t need great performance, but it still needs to reliably deliver a minimum level.

As you can see in the chart below, the capacity of both boot and data drives is continuously growing, which means more expenditures for organizations.

Hyperscale NVMe Boot Drive Requirements and Hurdles

At the OCP Summit, representatives from Google and Meta discussed these issues during a presentation and, most importantly, explained what they are doing to address them.

An example of a hyperscale boot SSD was shown representing activity for one day, which included I/O reads and writes as well as TRIM transactions. Most noticeable is the high throughput of TRIMs, which indicates short-lived data (created and then deleted soon after). If the drive is not properly designed to handle them, TRIMs will lead to latency stalls and interfere with read and write traffic. Most of the traffic also consists of random reads and writes.

Some hurdles that Hyperscale NVMe Boot faces include:

  • Hyperscale workloads are sensitive to latency, so sustained performance is very important for delivering an efficient user experience.
  • Debugging at scale is challenging, so detailed monitoring metrics are paramount for both predicting and detecting failures.
  • Endurance is very important for boot SSDs. Once a system design has been finalized (which might take some time), boot drives with high endurance can last for the entire product life cycle. This helps prevent early wear-out and reduces the need for repairs.
  • Most importantly, hyperscale customers place high importance on privacy and security, and it is sometimes difficult to meet all of these standards.

This is quite a diverse range of issues, so addressing these can be a complicated process if not done correctly.

Addressing the Issues Facing Boot Drives

In keeping with the main purpose behind OCP, the way to solve these issues is through collaboration and open specifications. As such, Meta and Google have joined forces to combine requirements and create the Hyperscale NVMe Boot SSD Specification (Version 1.0), marking an important milestone for boot drive efficiency. It was submitted early this year and is available through the OCP website.

There are many benefits that come with this specification. Ultimately, it allows the market to better understand the features that hyperscalers need and use for their boot devices, and it helps ensure industry alignment on SSD boot drive adoption. In addition, it gives organizations open-source tools to manage boot SSDs, which leads to the development of 3rd-party test suites that can verify all of the requirements.

During the session, they also indicated there are two ways to approach creating a hyperscale boot SSD: either downgrade an enterprise-class SSD or upgrade a consumer-grade SSD, as the requirements lie somewhere in the middle of these two spaces.

Boot Bench

We’ve started to add a boot bench performance section to our SSD reviews. It uses a workload profile adopted by OCP to gauge SSDs that are designed for server boot duty. This boot workload executes a relatively intense test plan that fills the drive entirely with writes before testing a read-heavy workload sequence.

For each test, it performs an asynchronous 32K random read operation alongside a 15MiB/s synchronous 128K random write and a 5MiB/s synchronous 128K random write/trim background workload. The script starts with the random-read activity at a 4-job level and scales up to 256 jobs at its peak. The final result is the read IOPS measured during the peak run.
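To make the workload mix easier to picture, here is a minimal Python sketch that models the three streams described above and works out how many 128K operations per second the two throttled background streams amount to. It is not the OCP script. The names `Stream`, `boot_bench_streams`, and `job_levels` are hypothetical, the read stream is assumed to be unthrottled, and the doubling of job counts between 4 and 256 is an assumption, since only the start and peak levels are stated.

```python
from dataclasses import dataclass
from typing import List, Optional

KIB = 1024
MIB = 1024 * KIB


@dataclass(frozen=True)
class Stream:
    """One I/O stream in the boot workload (hypothetical model, not the OCP script)."""
    name: str
    pattern: str
    block_bytes: int
    synchronous: bool
    rate_limit_bytes_per_s: Optional[int]  # None = assumed unthrottled


def boot_bench_streams() -> List[Stream]:
    # Block sizes and rate limits are taken from the description above.
    return [
        Stream("foreground-read", "random read", 32 * KIB, False, None),
        Stream("background-write", "random write", 128 * KIB, True, 15 * MIB),
        Stream("background-write-trim", "random write/trim", 128 * KIB, True, 5 * MIB),
    ]


def job_levels(start: int = 4, peak: int = 256) -> List[int]:
    # Assumption: the job count doubles at each step between start and peak.
    levels = []
    n = start
    while n <= peak:
        levels.append(n)
        n *= 2
    return levels


def throttled_ops_per_second(stream: Stream) -> float:
    # Operations per second implied by a bandwidth cap at a fixed block size.
    if stream.rate_limit_bytes_per_s is None:
        raise ValueError(f"{stream.name} has no rate limit")
    return stream.rate_limit_bytes_per_s / stream.block_bytes


if __name__ == "__main__":
    for s in boot_bench_streams():
        if s.rate_limit_bytes_per_s is not None:
            print(f"{s.name}: {throttled_ops_per_second(s):.0f} ops/s")
    print("read job levels (assumed doubling):", job_levels())
```

Under these assumptions, the background streams come to only 120 and 40 operations per second, so the read IOPS figure is mostly a measure of how well the drive keeps serving random reads while that write and trim traffic runs alongside.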

The OCP goal for this benchmark is a pass/fail at 60K read IOPS. Most drives we test will far exceed the minimum, but the results are instructive regardless. What was most interesting in our testing is that performance-oriented NVMe SSD models far exceeded the IOPS threshold, while slower models did not. Many slower SSD models fall easily into the non-passing category, some without finishing the run at all (DNF), although the Samsung 970 EVO Plus 2TB did complete it and reported a result below the qualifying threshold.

SSD                          Read IOPS
SK hynix Platinum P41        220,884
WD SN850X                    219,883
Solidigm P44 Pro             211,999
Fantom VENOM8                190,573
Samsung 990 Pro              176,677
Sabrent Rocket 4 Plus        162,230
Samsung 970 EVO Plus 2TB     52,005
Corsair MP600 GS             DNF
Solidigm P41 Plus            DNF
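As a rough illustration of how the pass/fail criterion applies to these results, the Python sketch below classifies each drive against the 60K read IOPS goal and converts IOPS into approximate read bandwidth at the 32K block size. The helper names are hypothetical. Treating a result exactly equal to the threshold as a pass is an assumption, and DNF entries are represented as `None`.

```python
from typing import Dict, Optional

PASS_THRESHOLD_IOPS = 60_000      # OCP goal cited above
READ_BLOCK_BYTES = 32 * 1024      # 32K random reads

# Results from the table above; None marks a run that did not finish (DNF).
RESULTS: Dict[str, Optional[int]] = {
    "SK hynix Platinum P41": 220_884,
    "WD SN850X": 219_883,
    "Solidigm P44 Pro": 211_999,
    "Fantom VENOM8": 190_573,
    "Samsung 990 Pro": 176_677,
    "Sabrent Rocket 4 Plus": 162_230,
    "Samsung 970 EVO Plus 2TB": 52_005,
    "Corsair MP600 GS": None,
    "Solidigm P41 Plus": None,
}


def verdict(iops: Optional[int]) -> str:
    """Classify a result; meeting the threshold exactly is assumed to pass."""
    if iops is None:
        return "DNF"
    return "PASS" if iops >= PASS_THRESHOLD_IOPS else "FAIL"


def read_mib_per_s(iops: int) -> float:
    """Approximate read bandwidth implied by an IOPS figure at 32K blocks."""
    return iops * READ_BLOCK_BYTES / (1024 * 1024)


if __name__ == "__main__":
    for name, iops in RESULTS.items():
        bandwidth = "" if iops is None else f" (~{read_mib_per_s(iops):.0f} MiB/s)"
        print(f"{name}: {verdict(iops)}{bandwidth}")
```

At 32K per read, the 60K IOPS goal works out to 1,875 MiB/s of random read bandwidth, which gives a sense of how far the faster drives sit above the bar.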

Hyperscale Workload Use Case Example

During the session, they also compared two different drives: one that is more generic and one that conforms more to the OCP hyperscale specifications. Overall, they found that there was a dramatic improvement in latency across the board with the latter drive, which is something very important to the hyperscale space.

In the real world, this means that there is a noticeable improvement in time to market when trying to deploy a drive more attuned to the specifications.

Moving Forward with the OCP Hyperscale NVMe Boot SSD Specifications

While some companies were previously creating their own hyperscale boot drives to suit their own specific needs (and specific requests from their customers), these specifications were not shared among the industry. This resulted in vendors having to produce their own customized hardware/firmware in order to meet the needs of their customers.

We’ve come a long way since then, as OCP has now made version 1.0 of its Hyperscale NVMe Boot SSD Specification officially available. This allows system makers and SSD providers to align on a common set of requirements while encouraging further collaboration.

OCP is calling all OEMs (i.e., system manufacturers) and hyperscalers to join the cause and promises to keep evolving and improving the specifications as the storage landscape changes.

Impact on the Enterprise

The need for boot drives is not restricted to hyperscale use cases. M.2 is now the de facto boot drive standard for most servers and storage arrays. While the drive doesn’t have to do a ton in most cases, it does need to be reliable, somewhat performant, and no larger (more expensive) than is absolutely required. Hopefully we will see storage vendors respond to this initiative with a boot-specific SSD so infrastructure providers can maintain some degree of standardization.

Lyle Smith

Lyle is a staff writer for StorageReview, covering a broad set of end user and enterprise IT topics.
