April 20th, 2015 by Adam Armstrong
Methods of SMR Data Management
SMR uses a mapping system that translates LBAs the host writes randomly into sequential writes on the media. Similar to an SSD’s Flash Translation Layer (FTL), SMR HDDs use what is sometimes called an SMR (or Shingle) Translation Layer (STL), which is a similar concept. With SMR, however, much more is to be gained by making the host aware of the underlying SMR technology. The industry is in the final stages of the standardization process for SMR, with ZBC (Zoned Block Commands) being the standard for SAS and ZAC (Zoned ATA Commands) being the standard for SATA. These standards define a Zoned Block Device in which the LBA space is divided into independent Zones. Within each Zone, writes should be sequential. In order to overwrite data, it is necessary to first reset the zone, similar to an erase block in an SSD. What happens when non-sequential writes are sent to a Zone varies depending on the type of SMR implementation.
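To make the Zoned Block Device model concrete, here is a minimal sketch (not any vendor's or standard's actual implementation, and the class and method names are illustrative): each zone keeps a write pointer, accepts only writes at that pointer, and must be reset before its LBAs can be rewritten.

```python
class Zone:
    """One zone of a Zoned Block Device: writes must be sequential."""

    def __init__(self, start_lba, size):
        self.start = start_lba
        self.size = size
        self.write_pointer = start_lba  # next LBA that may be written

    def write(self, lba, nblocks):
        # Only a write at the current write pointer is sequential.
        if lba != self.write_pointer:
            raise IOError(f"non-sequential write at LBA {lba}, "
                          f"expected {self.write_pointer}")
        if lba + nblocks > self.start + self.size:
            raise IOError("write crosses zone boundary")
        self.write_pointer += nblocks

    def reset(self):
        # Analogous to erasing an SSD block: the zone can be rewritten.
        self.write_pointer = self.start


class ZonedDevice:
    """The LBA space divided into fixed-size, independent zones."""

    def __init__(self, zone_size, zone_count):
        self.zone_size = zone_size
        self.zones = [Zone(i * zone_size, zone_size)
                      for i in range(zone_count)]

    def zone_for(self, lba):
        return self.zones[lba // self.zone_size]
```

With this model, two back-to-back writes at the write pointer succeed, while rewriting LBA 0 fails until the zone is reset.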
There are three categories that SMR drives fall into or, more accurately, three types of management that drive vendors can employ. Each has its own set of advantages and disadvantages.
The first type is known as drive managed, also called transparent. Simply put, the SMR drive manages all requests from the host, just like a traditional HDD does today. Drive managed has the advantage of not needing an SMR-aware host; drive managed SMR drives are compatible with almost everything, making them the simplest to deploy. The zoned nature of the underlying SMR HDD is completely hidden from the host. This is the type of SMR management we expect to see generally available in the initial consumer market release, as there are no commercially available OSs or file systems that support SMR drives as of this writing. However, as more testing is done and SMR technology becomes more pervasive, we will see widely available OSs and software stacks that support SMR.
The downside to drive managed is that performance is unpredictable: the drive handles its background processes when it needs to, regardless of the IO requests. Additionally, since inbound random writes are not coalesced into sequential writes on the host side, the drive is under more duress, and thus lower performing in sustained workloads, than it would be if the host were SMR aware. Drive managed SMR drives cope with these shortcomings by leveraging a “landing zone” of sorts, where random writes can be staged before being written to disk. The ways of incorporating this space on SMR drives vary widely, though, leading to significantly different performance profiles depending on the target market of each drive and manufacturer.
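A hypothetical sketch of the “landing zone” idea (the class, field names, and flush policy here are illustrative assumptions, not any manufacturer's design): random host writes land in a small staging area, and when it fills, the drive coalesces them and lays them down sequentially on the shingled media, with an STL-style map recording where each logical block ended up.

```python
class DriveManagedSMR:
    """Toy model of a drive managed SMR drive with a landing zone."""

    def __init__(self, landing_capacity=4):
        self.landing = {}               # LBA -> data, the staging area
        self.landing_capacity = landing_capacity
        self.stl_map = {}               # LBA -> physical block on media
        self.next_physical = 0          # write pointer on shingled media

    def write(self, lba, data):
        # Random writes are absorbed by the landing zone first.
        self.landing[lba] = data
        if len(self.landing) >= self.landing_capacity:
            self.flush()                # background work, invisible to host

    def flush(self):
        # Coalesce staged writes and lay them down sequentially.
        for lba in sorted(self.landing):
            self.stl_map[lba] = self.next_physical
            self.next_physical += 1
        self.landing.clear()
```

The host-visible symptom of this design is exactly what the paragraph above describes: latency depends on when the drive decides to flush, which the host cannot predict.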
The next type of management is known as host managed. With this type of management, the host uses commands and Zone information to optimize the behavior of the SMR drive by managing IOs to ensure writes are always sequential within a Zone. If a host sends a non-sequential write within a Zone, the drive will reject it and return an error. This gives the drive more predictable performance, and this model is more likely to be seen initially in enterprise and hyperscale applications.
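One way a host might do this, sketched below under simplifying assumptions (the function name is illustrative, not part of the ZBC/ZAC command set, and a real host would also track each zone's write pointer): group pending writes by zone and issue them in ascending LBA order, so each zone only ever sees sequential writes.

```python
from collections import defaultdict

def schedule_writes(pending, zone_size):
    """Order (lba, data) pairs so writes are sequential within each zone.

    Assumes the pending writes within a zone are contiguous; a real
    host-managed stack would consult zone write pointers as well.
    """
    by_zone = defaultdict(list)
    for lba, data in pending:
        by_zone[lba // zone_size].append((lba, data))

    ordered = []
    for zone_id in sorted(by_zone):
        # Ascending LBA within a zone means every write lands at the
        # zone's write pointer, so the drive never returns an error.
        ordered.extend(sorted(by_zone[zone_id]))
    return ordered
```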
The downside to host managed is that the SMR drives are not compatible with host systems (HBAs, device drivers, file systems, databases, etc.) that are not SMR aware. That means file systems need to be adapted to support SMR drives. This is occurring, first in the hyperscale space, where the largest players in the world have the ability to modify their storage stacks to account for SMR, and now also in the mainstream open source space. The xfs maintainer, Dave Chinner, published a document outlining SMR optimizations for xfs during the Linux Vault conference in Boston in early March. At the same event, Hannes Reinecke of SUSE presented a Zone caching mechanism that can allow current file systems to work with host managed SMR drives. It’s likely that these investments, together with an appetite for capacity, will encourage others to adopt the new open source solutions and pursue modifications to their systems to support SMR drives.
The final type of management is known as host aware. In a nutshell, host aware is a combination of the two types of management above. The SMR drive is self managed, but it also implements the new ZBC/ZAC standards and allows the host to use the new command set to optimize drive behavior. In this case, if the drive receives a non-sequential write from the host, it will accept the request, but the resulting performance can be unpredictable. Host aware has the advantage of being backward compatible while still giving the host some control. Host aware is likely to be the model of choice for most client and traditional enterprise systems, taking over from drive managed deployments, while host managed is starting to appear as the choice for modern distributed storage solutions.
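The contrast with host managed can be sketched in a few lines (a toy model, not the actual command semantics): a write at the zone's write pointer takes the fast, sequential path, while a non-sequential write is still accepted rather than rejected, with the drive left to absorb it internally, which is where the unpredictable performance comes from.

```python
class HostAwareZone:
    """Toy model of one zone on a host aware SMR drive."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.write_pointer = start
        self.internally_remapped = []   # writes the drive had to absorb

    def write(self, lba, nblocks):
        if lba == self.write_pointer:
            self.write_pointer += nblocks   # fast, sequential path
            return "sequential"
        # Unlike host managed, no error: the drive accepts the write
        # and handles it internally, at an unpredictable cost.
        self.internally_remapped.append((lba, nblocks))
        return "accepted"
```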