December 5th, 2018 by Lyle Smith
Memblaze PBlaze5 910 AIC NVMe SSD Review
The Memblaze PBlaze5 910 is a high-performing NVMe SSD designed to significantly enhance application performance. Equipped with 64-layer 3D NAND, the PBlaze5 offers massive single-disk capacity of up to 15.36TB, 40% more than previous models. Memblaze indicates that this will allow the new SSD to increase rack density in storage server applications, saving space and lowering energy costs. The 910 series is also reported to improve performance-per-watt by 38%, and it supports 16 power mode settings ranging from 10W to 25W, further trimming data center energy consumption.
As far as performance goes, Memblaze quotes the 910 series with up to 6GB/s and 3.8GB/s for reads and writes, respectively, and 1 million IOPS read and 135,000 IOPS write in sustained random throughput.
The PBlaze5 910 series supports AES 256 Data Encryption, Full Data Path Protection, and Enhanced Power Failure protection to help ensure the integrity of enterprise application data. It also features a dual-port function, essentially eliminating the single-path failure problem, as both ports can be accessed simultaneously.
Memblaze PBlaze5 910 NVMe SSD Specifications
|Specification|3.84TB|7.68TB|
|---|---|---|
|Form factor|HHHL AIC|HHHL AIC|
|Interface|PCIe 3.0 x8|PCIe 3.0 x8|
|Sequential Read (128KB)|5.5GB/s|6.0GB/s|
|Sequential Write (128KB)|3.1GB/s|3.8GB/s|
|Sustained Random Read (4KB)|850K IOPS|1 Million IOPS|
|Sustained Random Write (4KB)|99K IOPS|135K IOPS|
|MTBF|2 million hours|2 million hours|
Our Enterprise SSD reviews leverage a Lenovo ThinkSystem SR850 for application tests and a Dell PowerEdge R740xd for synthetic benchmarks. The ThinkSystem SR850 is a well-equipped quad-CPU platform, offering CPU power well in excess of what's needed to stress high-performance local storage. Synthetic tests that don't require a lot of CPU resources use the more traditional dual-processor server. In both cases, the intent is to showcase local storage in the best light possible that aligns with storage vendor maximum drive specs.
Lenovo ThinkSystem SR850
- 4 x Intel Platinum 8160 CPU (2.1GHz x 24 Cores)
- 16 x 32GB DDR4-2666MHz ECC DRAM
- 2 x RAID 930-8i 12Gb/s RAID Cards
- 8 NVMe Bays
- VMware ESXi 6.5
Dell PowerEdge R740xd
- 2 x Intel Gold 6130 CPU (2.1GHz x 16 Cores)
- 16 x 16GB DDR4-2666MHz ECC DRAM
- 1 x PERC H730 2GB 12Gb/s RAID Card
- Add-in NVMe Adapter
Testing Background and Comparables
The StorageReview Enterprise Test Lab provides a flexible architecture for conducting benchmarks of enterprise storage devices in an environment comparable to what administrators encounter in real deployments. The Enterprise Test Lab incorporates a variety of servers, networking, power conditioning, and other network infrastructure that allows our staff to establish real-world conditions to accurately gauge performance during our reviews.
We incorporate these details about the lab environment and protocols into reviews so that IT professionals and those responsible for storage acquisition can understand the conditions under which we have achieved the following results. None of our reviews are paid for or overseen by the manufacturer of equipment we are testing. Additional details about the StorageReview Enterprise Test Lab and an overview of its networking capabilities are available on those respective pages.
Comparables for this review:
- Memblaze PBlaze5 3.2TB
- Memblaze PBlaze4 3.2TB
- Intel P3700 2TB
- Intel P4500 2TB
- HGST SN100 3.2TB
- Toshiba PX04 1.6TB
Application Workload Analysis
In order to understand the performance characteristics of enterprise storage devices, it is essential to model the infrastructure and application workloads found in live production environments. Our benchmarks for the Memblaze PBlaze5 910 are therefore MySQL OLTP performance via SysBench and Microsoft SQL Server OLTP performance with a simulated TPC-C workload. For our application workloads, each drive runs 2-4 identically configured VMs.
SQL Server Performance
Each SQL Server VM is configured with two vDisks: a 100GB volume for boot and a 500GB volume for the database and log files. From a system-resource perspective, we configured each VM with 16 vCPUs and 64GB of DRAM, and leveraged the LSI Logic SAS SCSI controller. While our previously tested Sysbench workloads saturated the platform in both storage I/O and capacity, the SQL test is looking for latency performance.
This test uses SQL Server 2014 running on Windows Server 2012 R2 guest VMs, and is stressed by Quest's Benchmark Factory for Databases. StorageReview’s Microsoft SQL Server OLTP testing protocol employs the current draft of the Transaction Processing Performance Council’s Benchmark C (TPC-C), an online transaction-processing benchmark that simulates the activities found in complex application environments. The TPC-C benchmark comes closer than synthetic performance benchmarks to gauging the performance strengths and bottlenecks of storage infrastructure in database environments. Each instance of our SQL Server VM for this review uses a 333GB (1,500 scale) SQL Server database and measures the transactional performance and latency under a load of 15,000 virtual users.
SQL Server Testing Configuration (per VM)
- Windows Server 2012 R2
- Storage Footprint: 600GB allocated, 500GB used
- SQL Server 2014
- Database Size: 1,500 scale
- Virtual Client Load: 15,000
- RAM Buffer: 48GB
- Test Length: 3 hours
- 2.5 hours preconditioning
- 30 minutes sample period
For our SQL Server transactional benchmark, the Memblaze PBlaze5 910 AIC boasted the top performance with 12,645.1 TPS.
The 910 AIC model also had the lowest latency at just 1.5ms, half the latency of the second-place Huawei ED3000.
The next application benchmark consists of a Percona MySQL OLTP database measured via SysBench. This test measures average TPS (transactions per second), average latency, and average 99th percentile latency.
Each Sysbench VM is configured with three vDisks: one for boot (~92GB), one with the pre-built database (~447GB), and the third for the database under test (270GB). From a system-resource perspective, we configured each VM with 16 vCPUs, 60GB of DRAM and leveraged the LSI Logic SAS SCSI controller.
Sysbench Testing Configuration (per VM)
- CentOS 6.3 64-bit
- Percona XtraDB 5.5.30-rel30.1
- Database Tables: 100
- Database Size: 10,000,000
- Database Threads: 32
- RAM Buffer: 24GB
- Test Length: 3 hours
- 2 hours preconditioning 32 threads
- 1 hour 32 threads
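The configuration above maps to a sysbench OLTP invocation. As a rough sketch, the snippet below assembles a sysbench 1.x `oltp_read_write` command line from those parameters; the exact sysbench version and flags StorageReview used are not stated (the Percona 5.5-era setup likely used the older 0.5 Lua scripts), so treat these flag names and the `sbtest` database name as assumptions.

```python
# Sketch: build a sysbench 1.x command line approximating the review's
# OLTP configuration (100 tables, 10M rows, 32 threads, 1-hour sample).
# Flag names assume sysbench 1.x syntax; the review's exact tool version
# is not specified.

def sysbench_cmd(phase, tables=100, table_size=10_000_000,
                 threads=32, seconds=3600):
    """Return a sysbench oltp_read_write invocation as an argv list."""
    return [
        "sysbench", "oltp_read_write",
        f"--tables={tables}",
        f"--table-size={table_size}",
        f"--threads={threads}",
        f"--time={seconds}",
        "--mysql-db=sbtest",   # hypothetical database name
        "--report-interval=10",
        phase,                 # "prepare", "run", or "cleanup"
    ]

print(" ".join(sysbench_cmd("run")))
```

The 2-hour preconditioning pass would simply be a second `run` phase with `seconds=7200` issued before the measured hour.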
With the Sysbench transactional benchmark, the AIC 910 continued its impressive performance with 9,190.7 TPS, which placed it at the top of the leaderboard.
With Sysbench average latency, the AIC 910 again was the top performer with 13.9ms.
In our worst-case scenario latency benchmark, the AIC 910 found itself at the top of the pack with just 25.9ms.
Houdini by SideFX
The Houdini test is specifically designed to evaluate storage performance as it relates to CGI rendering. The test bed for this application is a variant of the core Dell PowerEdge R740xd server type we use in the lab with dual Intel 6130 CPUs and 64GB DRAM. In this case, we installed Ubuntu Desktop (ubuntu-16.04.3-desktop-amd64) running bare metal. Output of the benchmark is measured in seconds to complete, with fewer being better.
The Maelstrom demo represents a section of the rendering pipeline that highlights the performance capabilities of storage by demonstrating its ability to effectively use the swap file as a form of extended memory. The test does not write out the result data or process the points in order to isolate the wall-time effect of the latency impact to the underlying storage component. The test itself is composed of five phases, three of which we run as part of the benchmark, which are as follows:
- Loads packed points from disk. This is the time to read from disk. This is single threaded, which may limit overall throughput.
- Unpacks the points into a single flat array in order to allow them to be processed. If the points do not have dependency on other points, the working set could be adjusted to stay in-core. This step is multi-threaded.
- (Not Run) Processes the points.
- Repacks them into bucketed blocks suitable for storing back to disk. This step is multi-threaded.
- (Not Run) Writes the bucketed blocks back out to disk.
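The three executed phases above follow a load → unpack → repack pattern. The sketch below is a conceptual illustration of that pattern, not SideFX's actual code; the on-disk point layout (three packed little-endian floats per point) and the bucket size are hypothetical.

```python
# Conceptual sketch of the executed Maelstrom phases (not SideFX's
# implementation): single-threaded load, multi-threaded unpack into a
# flat array, multi-threaded repack into bucketed blocks.
import struct
from concurrent.futures import ThreadPoolExecutor

def load_packed(path):
    """Phase 1: single-threaded read of packed points from disk."""
    with open(path, "rb") as f:
        return f.read()

def unpack_chunk(chunk):
    """Unpack one chunk of packed floats into (x, y, z) tuples."""
    n = len(chunk) // 12  # 3 floats x 4 bytes per point (assumed layout)
    return list(struct.iter_unpack("<3f", chunk[: n * 12]))

def unpack_points(blob, workers=4, chunk_points=1024):
    """Phase 2: multi-threaded unpack into a single flat array."""
    step = chunk_points * 12
    chunks = [blob[i : i + step] for i in range(0, len(blob), step)]
    flat = []
    with ThreadPoolExecutor(workers) as ex:
        for part in ex.map(unpack_chunk, chunks):
            flat.extend(part)
    return flat

def repack_buckets(points, bucket=256):
    """Phase 4: multi-threaded repack into bucketed blocks for disk."""
    groups = [points[i : i + bucket] for i in range(0, len(points), bucket)]
    pack = lambda g: struct.pack(f"<{len(g) * 3}f",
                                 *[c for p in g for c in p])
    with ThreadPoolExecutor(4) as ex:
        return list(ex.map(pack, groups))
```

Because phase 1 is single-threaded and the working set can exceed DRAM, the wall time of the load step is dominated by storage latency, which is exactly what the benchmark isolates.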
With the Houdini test, the 910 AIC placed in the mid-to-lower end of the pack at 3,077.7 seconds, finishing just ahead of the U.2 model and in line with the other Memblaze drives.
VDBench Workload Analysis
When it comes to benchmarking storage devices, application testing is best, and synthetic testing comes in second place. While not a perfect representation of actual workloads, synthetic tests do help to baseline storage devices with a repeatability factor that makes it easy to do apples-to-apples comparisons between competing solutions. These workloads offer a range of testing profiles, from "four corners" tests and common database transfer-size tests to trace captures from different VDI environments. All of these tests leverage the common VDBench workload generator, with a scripting engine to automate and capture results over a large compute testing cluster. This allows us to repeat the same workloads across a wide range of storage devices, including flash arrays and individual storage devices. Our testing process for these benchmarks fills the entire drive surface with data, then partitions a drive section equal to 25% of the drive capacity to simulate how the drive might respond to application workloads. This is different from full-entropy tests, which use 100% of the drive and take it into steady state. As a result, these figures will reflect higher sustained write speeds.
- 4K Random Read: 100% Read, 128 threads, 0-120% iorate
- 4K Random Write: 100% Write, 64 threads, 0-120% iorate
- 64K Sequential Read: 100% Read, 16 threads, 0-120% iorate
- 64K Sequential Write: 100% Write, 8 threads, 0-120% iorate
- Synthetic Database: SQL and Oracle
- VDI Full Clone and Linked Clone Traces
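The "0-120% iorate" notation above means each workload is first run uncontrolled to find the drive's maximum rate, then replayed at stepped fractions of that maximum to map latency against load. A small sketch of how those step targets might be derived (the 10% step granularity is an assumption for illustration):

```python
# Sketch: derive stepped I/O-rate targets for a 0-120% iorate curve.
# The workload generator measures the uncontrolled maximum first, then
# replays the workload at fixed fractions of it; the 10% step size here
# is an illustrative assumption.

def iorate_steps(max_iops, step_pct=10, top_pct=120):
    """Return (percent, target IOPS) pairs for the rate-curve phase."""
    return [(p, round(max_iops * p / 100))
            for p in range(step_pct, top_pct + 1, step_pct)]

# e.g. with the 910 AIC's measured 4K random read peak:
for pct, iops in iorate_steps(814_640):
    print(f"{pct:3d}% -> {iops:,} IOPS")
```

Running past 100% (to 120%) deliberately overdrives the device, which is where queue-depth-driven latency spikes show up in the charts.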
In our first VDBench workload, Random 4K Read, the Memblaze PBlaze5 910 AIC stayed under 1ms throughout the test, peaking at 814,640 IOPS with a latency of 155.9μs and placing well into first place.
4K random write performance also showed sub-millisecond latency throughout. The 910 AIC placed second among the tested drives (behind the PBlaze5 900) with a peak performance of 550,864 IOPS at a latency of 229.3μs.
Switching over to sequential workloads, the 910 AIC placed at the top of the pack in 64K sequential reads with a peak score of 50,372 IOPS or 3.14GB/s with a latency of 317μs.
Changing places with the PBlaze5 900, the 910 AIC peaked at 42,563 IOPS or 2.66GB/s with a latency of 370μs in 64K sequential write.
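As a sanity check on the sequential figures, bandwidth is roughly peak IOPS multiplied by the transfer size. The sketch below uses 64KiB transfers; the computed values only approximate the reported GB/s because peak IOPS and peak bandwidth are measured averages and are not necessarily sampled at the same instant.

```python
# Sketch: relate sequential IOPS to bandwidth (bandwidth ~= IOPS x
# transfer size). Reported GB/s in the review are measured averages,
# so these computed values only approximate them.
def iops_to_gbps(iops, xfer_bytes=64 * 1024):
    return iops * xfer_bytes / 1e9  # decimal GB/s

read_gbps = iops_to_gbps(50_372)   # 64K sequential read peak
write_gbps = iops_to_gbps(42_563)  # 64K sequential write peak
print(f"read ~ {read_gbps:.2f} GB/s, write ~ {write_gbps:.2f} GB/s")
```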
Moving on to SQL workloads, the 910 AIC peaked at 270,432 IOPS with only 117.8μs latency, which placed it at the top of the leaderboard once again.
In SQL 90-10, the 910 retained the top spot by a large margin with a peak score of 273,321 IOPS and a latency of 116.6μs.
In the SQL 80-20 benchmark, the 910 AIC peaked at 277,815 IOPS with a latency of 114μs, keeping it at the top of the leaderboard.
In our Oracle workload, the 910 AIC continued its dominance with a peak score of 282,326 IOPS and a latency of 126.2μs.
For Oracle 90-10, the 910 AIC showed a peak of 202,695 IOPS and a latency of 108μs for top spot.
In 80-20, the 910 ended its Oracle testing with an impressive peak performance of 210,228 IOPS and a latency of 104μs.
Next, we moved on to our VDI clone benchmark, Full and Linked, where the 910 AIC showed top performance in most of the tests. For VDI Full Clone Boot, the 910 AIC had a peak performance of 219,337 IOPS and a latency of 158.2μs.
VDI FC Initial Login saw the 910 AIC with a peak performance of 150,121 IOPS and a latency of 197μs for top spot.
With VDI FC Monday Login, the 910 AIC ended in 2nd place just behind the Huawei with 101,128 IOPS and a latency of 156.3μs.
Switching over to Linked Clone (LC), we first looked at the boot test. In this scenario, the 910 AIC came in first place with 98,284 IOPS and a latency of 161.3μs.
VDI LC Initial Login showed virtually identical performance between the 910 AIC and the Huawei with 55,061 IOPS and a latency of 143.4μs.
In the VDI LC Monday Login test, the Huawei and 910 AIC again showed neck-and-neck performance, with the 910 AIC peaking at 77,721 IOPS at a latency of 203.8μs, placing fourth.
Available in both U.2 and AIC form factors, the Memblaze PBlaze5 910 is the company's newest NVMe data center drive, built on 64-layer 3D NAND. We looked at the AIC model for this review, which comes in capacities of 3.84TB and 7.68TB with quoted performance of up to 6.0GB/s read, 3.8GB/s write, and up to 1 million IOPS in random read. The 910 AIC has the power necessary to excel in virtually any application, coupled with a range of protection features including AES 256 data encryption and TRIM support at up to 8TB/s.
Diving into the performance details showed the 910 AIC at the top of the leaderboard in many of our tested workloads. In our SQL Server benchmark, it came in first with 12,645.1 TPS and an average latency of just 1.5ms. In Sysbench, the drive again led with 9,190.7 TPS, an average latency of 13.9ms, and a worst-case-scenario latency of 25.9ms. In our Houdini by SideFX benchmark, the 910 AIC recorded 3,077.7 seconds, slightly better than the U.2 model and in line with the other Memblaze products. In VDBench, the 910 AIC maintained sub-millisecond latency throughout all of our tests and was the top performer in most categories. In random 4K read and write, it hit 814,640 IOPS and 550,864 IOPS, respectively, while the 64K sequential workloads posted 3.14GB/s read and 2.66GB/s write. The SQL results held around a quarter million IOPS (270,000-278,000), while the Oracle tests ranged from roughly 203,000 to 282,000 IOPS.
Moving from an x4 to an x8 PCIe interface, the Memblaze PBlaze5 910 offers a huge boost in performance compared to the U.2 form-factor version in the same family. Across the board, the AIC 910 showed its strengths over interface-limited U.2 models, and it shows customers what is possible when a given application can leverage the larger form factor.