by Kevin OBrien

LSI Nytro WarpDrive BLP4-400 Application Accelerator Review

The LSI Nytro WarpDrive BLP4-400 is a half-height, half-length (HHHL) PCIe application accelerator that offers 400GB of eMLC NAND. Like the 200GB SLC WLP4-200 we reviewed previously, the BLP4-400 combines four SandForce-controlled NAND pools into a single storage volume. The drive is designed to be easy to deploy: the universal form factor fits most servers, and thanks to LSI's background in HBAs and RAID cards, the WarpDrive usually doesn't require additional software or drivers. The WarpDrive family is as close as it gets to plug and play in an enterprise flash storage environment.

Since we reviewed the 200GB SLC model last year and much of that review applies here, we won't go into as much detail this time. It is helpful, though, to understand that LSI offers several variants within the Nytro WarpDrive family, each designed for different use cases. The high-endurance SLC models come in 200GB and 400GB capacities, while the more mainstream eMLC drives come in 400GB, 800GB, and 1.6TB. While the most obvious use case for the Nytro WarpDrive line is in-server storage, the card is also widely deployed by NetApp and others as a cache in front of attached storage. LSI also bundles its own caching software with the 400GB and 800GB eMLC cards; in that configuration the drives are dubbed Nytro XD. For enterprise buyers who need help choosing a caching solution and gauging how much of their data is hot, LSI is one of the few companies that offers a dedicated tool: Nytro Predictor.

LSI Nytro WarpDrive Specifications

  • Single Level Cell (SLC)
    • 200GB Nytro WarpDrive WLP4-200
      • Sequential IOPS (4K) - 238,000 Read, 133,000 Write
      • Sequential Read and Write IOPS (8K) - 189,000 Read, 137,000 Write
      • Bandwidth (256K) - 2.0GB/s Read, 1.7GB/s Write
    • 400GB Nytro WarpDrive WLP4-400
      • Sequential IOPS (4K) - 238,000 Read, 133,000 Write
      • Sequential Read and Write IOPS (8K) - 189,000 Read, 137,000 Write
      • Bandwidth (256K) - 2.0GB/s Read, 1.7GB/s Write
  • Enterprise Multi Level Cell (eMLC)
    • 400GB Nytro WarpDrive BLP4-400
      • Sequential IOPS (4K) - 218,000 Read, 75,000 Write
      • Sequential Read and Write IOPS (8K) - 183,000 Read, 118,000 Write
      • Bandwidth (256K) - 2.0GB/s Read, 1.0GB/s Write
    • 800GB Nytro WarpDrive BLP4-800
      • Sequential IOPS (4K) - 218,000 Read, 75,000 Write
      • Sequential Read and Write IOPS (8K) - 183,000 Read, 118,000 Write
      • Bandwidth (256K) - 2.0GB/s Read, 1.0GB/s Write
    • 1600GB Nytro WarpDrive BLP4-1600
      • Sequential IOPS (4K) - 218,000 Read, 75,000 Write
      • Sequential Read and Write IOPS (8K) - 183,000 Read, 118,000 Write
      • Bandwidth (256K) - 2.0GB/s Read, 1.0GB/s Write
  • Average Latency < 50 microseconds
  • Interface - x8 PCI Express 2.0
  • Power Consumption - <25 watts
  • Form Factor - Low Profile (half-length, MD2)
  • Environmentals - Operational at 0 to 45C
  • OS Compatibility
    • Microsoft: Windows XP, Vista, 2003, 7; Windows Server 2003 SP2, 2008 SP2, 2008 R2 SP1
    • Linux: CentOS 6; RHEL 5.4, 5.5, 5.6, 5.7, 6.0, 6.1; SLES: 10SP1, 10SP2, 10SP4, 11SP1; OEL 5.6, 6.0
    • UNIX: FreeBSD 7.2, 7.4, 8.1, 8.2; Solaris 10U10, 11 (x86 & SPARC)
    • Hypervisors: VMware 4.0 U2, 4.1 U1, 5.0
  • End of Life Data Retention - >6 months SLC, >3 months eMLC
  • Product Health Monitoring - Self-Monitoring, Analysis and Reporting Technology (SMART) commands, plus additional SSD monitoring
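The spec sheet mixes IOPS ratings (4K, 8K) with bandwidth ratings (256K). Since throughput is simply IOPS multiplied by transfer size, a few lines of Python put them on a common scale. This is a minimal sketch that assumes "4K" means 4,096 bytes and reports decimal MB/s; the function names are illustrative, not part of any LSI tool.

```python
# Spec-sheet ratings for the eMLC BLP4-400 (from the list above).
SPEC_IOPS = {
    ("4K", "read"): 218_000,
    ("4K", "write"): 75_000,
    ("8K", "read"): 183_000,
    ("8K", "write"): 118_000,
}
# Assumption: "4K" and "8K" mean 4,096 and 8,192 bytes.
BLOCK_BYTES = {"4K": 4 * 1024, "8K": 8 * 1024}


def iops_to_mb_per_s(iops: float, block_bytes: int) -> float:
    """Throughput in decimal MB/s implied by an IOPS rating at a fixed block size."""
    return iops * block_bytes / 1_000_000


def mb_per_s_to_iops(mb_per_s: float, block_bytes: int) -> float:
    """IOPS implied by a bandwidth rating (decimal MB/s) at a fixed block size."""
    return mb_per_s * 1_000_000 / block_bytes


if __name__ == "__main__":
    for (size, op), iops in SPEC_IOPS.items():
        mbps = iops_to_mb_per_s(iops, BLOCK_BYTES[size])
        print(f"{size} {op}: {iops:,} IOPS ~ {mbps:.0f} MB/s")
    # The 2.0GB/s 256K read rating, expressed as IOPS.
    print(f"256K read: ~{mb_per_s_to_iops(2000, 256 * 1024):,.0f} IOPS")
```

Under these assumptions, the 218,000 IOPS 4K read rating works out to roughly 893 MB/s, well under the 2.0GB/s large-block rating.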

Build and Design

The LSI Nytro WarpDrive is a half-height, half-length x8 PCI Express card composed of four custom form-factor SSDs connected in RAID0 to a main interface board. Because it is a half-height card, the Nytro WarpDrive can also fit full-height slots by simply swapping the mounting bracket, making it compatible with more servers. At the heart of the Nytro WarpDrive are four SATA 6.0Gb/s SandForce SF-2500 processors, one per SSD. The SSDs are housed in pairs in two sandwiched heatsink "banks," which connect to the main board with a small ribbon cable. To interface these controllers with the host computer, LSI uses its own SAS2008 PCIe-to-SAS bridge, which has wide driver support across multiple operating systems.
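The four SSDs appear to the host as a single volume because the card stripes them in RAID0. The sketch below shows how a generic RAID0 layout maps a logical sector to a member drive. The review does not state the WarpDrive's stripe size, so the 64 KiB stripe unit (128 sectors of 512 bytes) and the `raid0_map` helper are hypothetical.

```python
from typing import NamedTuple

MEMBERS = 4           # four SandForce-based SSDs, per the review
STRIPE_SECTORS = 128  # hypothetical stripe unit: 64 KiB in 512-byte sectors


class Location(NamedTuple):
    member: int      # which SSD in the stripe set (0..members-1)
    member_lba: int  # sector offset on that SSD


def raid0_map(lba: int, members: int = MEMBERS, stripe: int = STRIPE_SECTORS) -> Location:
    """Map a logical sector on the striped volume to (member SSD, sector on that SSD)."""
    if lba < 0:
        raise ValueError("lba must be non-negative")
    stripe_index, offset = divmod(lba, stripe)  # which stripe unit, and offset within it
    member = stripe_index % members             # stripe units rotate round-robin across SSDs
    row = stripe_index // members               # complete rows of stripe units before this one
    return Location(member, row * stripe + offset)
```

For example, `raid0_map(643)` lands on member 1 at sector 131. A transfer spanning four consecutive stripe units touches all four SSDs, which is how striping lets one volume aggregate their throughput.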

Unlike the first-generation WarpDrive, this model uses passive heatsinks: the NAND and SandForce controllers shed heat into the heatsinks first, which are then cooled by airflow in the server chassis. This reduces hot spots and helps keep hardware performance stable over the life of the product. Viewed from above, the card shows tightly sandwiched aluminum plates below, between, and on top of the custom SSDs that power the Nytro WarpDrive. The Nytro also supports legacy HDD indicator lights for those who want this level of monitoring to be visible externally.

Each of the four SSDs powering the 400GB eMLC LSI Nytro WarpDrive has one SandForce SF-2500 controller and eight 16GB Toshiba MLC Toggle NAND packages. This gives each SSD a raw capacity of 128GB, of which roughly 22% is reserved as over-provisioning, leaving a usable capacity of 100GB per SSD (400GB for the card). The LSI Nytro WarpDrive is fully PCIe 2.0 x8 power compliant and draws less than 25 watts during operation.
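The capacity figures can be checked with simple arithmetic. This sketch takes the package count and sizes at face value (treating the 16GB packages and 100GB usable capacity as the same unit, as the text does) and computes the spare area two ways, since over-provisioning is quoted either as a share of raw capacity or as a share of user capacity. The `overprovisioning` helper is illustrative.

```python
from typing import Tuple


def overprovisioning(packages: int, package_gb: float, usable_gb: float) -> Tuple[float, float, float]:
    """Return (raw capacity, spare/raw, spare/usable) for one SSD.

    Units are taken at face value, as in the review's text.
    """
    raw = packages * package_gb
    spare = raw - usable_gb
    return raw, spare / raw, spare / usable_gb


raw, op_of_raw, op_of_usable = overprovisioning(packages=8, package_gb=16, usable_gb=100)
print(f"raw {raw:.0f}GB, spare = {op_of_raw:.1%} of raw, {op_of_usable:.1%} of usable")
print(f"card total: {4 * raw:.0f}GB raw, {4 * 100}GB usable")
```

The review's 22% matches the first convention (28GB spare out of 128GB raw, or 21.875%); measured against the 100GB user capacity, the same spare area is 28%.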

Testing Background and Comparables 

All PCIe application accelerators compared in this review are tested on our second-generation enterprise testing platform, an Intel Romley-based Lenovo ThinkServer RD630. This platform is configured with both Windows Server 2008 R2 SP1 and Linux CentOS 6.3 so we can test each accelerator in the various environments its drivers support. Each operating system is tuned for maximum performance: the Windows power profile is set to high performance, and cpuspeed is disabled in CentOS 6.3 to lock the processor at its highest clock speed. For synthetic benchmarks, we use FIO version 2.0.10 on Linux and version 2.0.12.2 on Windows, with the same test parameters in each OS where permitted.

StorageReview Lenovo ThinkServer RD630 Configuration:

  • 2 x Intel Xeon E5-2620 (2.0GHz, 15MB Cache, 6-cores)
  • Intel C602 Chipset
  • Memory - 16GB (2 x 8GB) 1333MHz DDR3 Registered RDIMMs
  • Windows Server 2008 R2 SP1 64-bit, Windows Server 2012 Standard, CentOS 6.3 64-Bit
  • 100GB Micron RealSSD P400e Boot SSD
  • LSI 9211-4i SAS/SATA 6.0Gb/s HBA (For boot SSDs)
  • LSI 9207-8i SAS/SATA 6.0Gb/s HBA (For benchmarking SSDs or HDDs)

For comparables, we chose recent top-performing application accelerators: the SLC-based 200GB Nytro WarpDrive and the eMLC-based 800GB Intel SSD 910. These were selected based on individual performance characteristics as well as price range. Where applicable, we include both stock and high-performance benchmark results if the manufacturer offers that level of software configuration to target different use cases.

200GB LSI Nytro WarpDrive WLP4-200

  • Released: 1H2012
  • NAND Type: SLC
  • Controller: 4 x LSI SandForce SF-2500 through LSI SAS2008 PCIe to SAS Bridge
  • Device Visibility: Fixed Hardware RAID0
  • LSI Windows: 2.10.51.0
  • LSI Linux: Native CentOS 6.3 driver
  • Preconditioning Time: 6 hours

400GB LSI Nytro WarpDrive BLP4-400

  • Released: 1H2012
  • NAND Type: eMLC
  • Controller: 4 x LSI SandForce SF-2500 through LSI SAS2008 PCIe to SAS Bridge
  • Device Visibility: Fixed Hardware RAID0
  • LSI Windows: v07.00.00.00
  • LSI Linux: Native CentOS 6.3 driver
  • Preconditioning Time: 6 hours

800GB Intel SSD 910

  • Released: 1H2012
  • NAND Type: eMLC
  • Controller: 4 x Intel EW29AA31AA1 through LSI SAS2008 PCIe to SAS Bridge
  • Device Visibility: JBOD, software RAID depending on OS
  • Intel Windows: 13.0
  • Intel Linux: Native CentOS 6.3 driver

Enterprise Synthetic Workload Analysis

Our approach to PCIe storage goes deeper than traditional burst or steady-state performance. Averaging performance over a long period hides the details of how the device behaves throughout that period. Since flash performance varies greatly over time, our benchmarking process analyzes total throughput, average latency, peak latency, and standard deviation across the entire preconditioning phase of each device. With high-end enterprise products, latency is often more important than throughput, so we go to great lengths to show the full performance characteristics of each device we put through our Enterprise Test Lab.

We also compare how each device performs under its driver set on both Windows and Linux. For Windows, we use the latest drivers available at the time of the original review, and each device is tested under 64-bit Windows Server 2008 R2. For Linux, we use 64-bit CentOS 6.3, which every Enterprise PCIe Application Accelerator in this comparison supports. Our main goal is to show how performance differs between operating systems, since an OS being listed as compatible on a product sheet doesn't always mean performance is equal across them.

Flash performance varies throughout the preconditioning phase of each storage device. Because designs and capacities differ, our preconditioning process lasts either 6 or 12 hours, depending on how long the device needs to reach steady-state behavior. Our goal is to ensure each drive is fully in steady state before the primary tests begin. Each comparable device is secure erased using the vendor's tools, then preconditioned into steady state under a heavy load of 16 threads with an outstanding queue of 16 per thread, using the same workload it will be tested with. It is then tested at set intervals across multiple thread/queue depth profiles to show performance under both light and heavy usage.
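The preconditioning step can be approximated with FIO, the tool used for these benchmarks. The review doesn't publish its job files, so this is a hypothetical sketch: `fio_precondition_args` only builds an argument list from standard fio options (`--rw`, `--bs`, `--numjobs`, `--iodepth`, `--ioengine`, `--time_based`, `--runtime`), and the device path is a placeholder. Writing to a raw device destroys its data, and the vendor secure erase that precedes preconditioning is not shown.

```python
import shlex
from typing import List


def fio_precondition_args(device: str, rw: str = "randwrite", bs: str = "4k",
                          threads: int = 16, queue_depth: int = 16,
                          hours: float = 6, linux: bool = True) -> List[str]:
    """Hypothetical fio argv for a 16T/16Q preconditioning run (builds, does not run)."""
    return [
        "fio",
        "--name=precondition",
        f"--filename={device}",           # placeholder raw device: its contents are overwritten
        f"--rw={rw}",                     # random writes for the 4K preconditioning profile
        f"--bs={bs}",
        "--direct=1",                     # bypass the OS page cache
        f"--ioengine={'libaio' if linux else 'windowsaio'}",
        "--thread",                       # run jobs as threads rather than forked processes
        f"--numjobs={threads}",           # 16 threads ...
        f"--iodepth={queue_depth}",       # ... each with 16 outstanding I/Os
        "--time_based",
        f"--runtime={int(hours * 3600)}", # 6 hours, in seconds
        "--group_reporting",              # report all threads as one aggregate
    ]


if __name__ == "__main__":
    print(shlex.join(fio_precondition_args("/dev/sdX")))
```

Other profiles would swap the `rw` and `bs` arguments (plus fio's `--rwmixread` for mixed read/write workloads).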

Attributes Monitored In Preconditioning and Primary Steady-State Tests:

  • Throughput (Read+Write IOPS Aggregate)
  • Average Latency (Read+Write Latency Averaged Together)
  • Max Latency (Peak Read or Write Latency)
  • Latency Standard Deviation (Read+Write Standard Deviation Averaged Together)
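The four monitored attributes combine per-direction results. The list's wording suggests a simple unweighted mean of the read and write figures; this sketch follows that reading (an IOPS-weighted mean would be an equally plausible alternative) and skips a direction that issued no I/O, so pure read or write tests are not diluted by zeros. `DirectionStats` and `aggregate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DirectionStats:
    """Per-direction results for one test interval (hypothetical structure)."""
    iops: float
    avg_latency_ms: float
    max_latency_ms: float
    stdev_latency_ms: float


def aggregate(read: DirectionStats, write: DirectionStats) -> Dict[str, float]:
    # Only average over directions that actually issued I/O, so a 100% read
    # test is not diluted by an idle write side reporting zeros.
    active = [d for d in (read, write) if d.iops > 0]
    if not active:
        raise ValueError("no I/O recorded in either direction")
    n = len(active)
    return {
        "throughput_iops": sum(d.iops for d in active),                 # read + write IOPS
        "avg_latency_ms": sum(d.avg_latency_ms for d in active) / n,    # simple mean
        "max_latency_ms": max(d.max_latency_ms for d in active),        # peak of either side
        "stdev_latency_ms": sum(d.stdev_latency_ms for d in active) / n,
    }
```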

Our Enterprise Synthetic Workload Analysis includes four profiles based on real-world tasks. These profiles make it easier to compare against our past benchmarks as well as widely published values such as max 4K read and write speed and 8K 70/30, which is commonly used for enterprise drives. We also include two legacy mixed workloads, the traditional File Server and Webserver, each offering a wide mix of transfer sizes.

  • 4K
    • 100% Read or 100% Write
    • 100% 4K
  • 8K 70/30
    • 70% Read, 30% Write
    • 100% 8K
  • File Server
    • 80% Read, 20% Write
    • 10% 512b, 5% 1k, 5% 2k, 60% 4k, 2% 8k, 4% 16k, 4% 32k, 10% 64k
  • Webserver
    • 100% Read
    • 22% 512b, 15% 1k, 8% 2k, 23% 4k, 15% 8k, 2% 16k, 6% 32k, 7% 64k, 1% 128k, 1% 512k
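The two mixed profiles can be expressed as size/percentage tables. This sketch checks that each mix sums to 100%, renders it in the `bssplit` syntax fio accepts for mixed block sizes, and computes the mean transfer size. The helper names are illustrative, and treating `1k` as 1,024 bytes follows fio's default.

```python
from typing import Dict

# Transfer size in bytes -> percentage of requests, from the profile list above.
FILE_SERVER = {512: 10, 1024: 5, 2048: 5, 4096: 60, 8192: 2,
               16384: 4, 32768: 4, 65536: 10}
WEBSERVER = {512: 22, 1024: 15, 2048: 8, 4096: 23, 8192: 15,
             16384: 2, 32768: 6, 65536: 7, 131072: 1, 524288: 1}


def _label(size: int) -> str:
    # fio treats "k" as 1024 by default, so 4096 bytes -> "4k".
    return f"{size // 1024}k" if size >= 1024 else str(size)


def _check(mix: Dict[int, int]) -> None:
    total = sum(mix.values())
    if total != 100:
        raise ValueError(f"mix sums to {total}%, expected 100%")


def bssplit(mix: Dict[int, int]) -> str:
    """Render a mix as an fio bssplit string, e.g. '512/10:1k/5:...'."""
    _check(mix)
    return ":".join(f"{_label(size)}/{pct}" for size, pct in sorted(mix.items()))


def mean_transfer_bytes(mix: Dict[int, int]) -> float:
    """Request-weighted average transfer size in bytes."""
    _check(mix)
    return sum(size * pct for size, pct in mix.items()) / 100


for name, mix in (("File Server", FILE_SERVER), ("Webserver", WEBSERVER)):
    print(f"{name}: bssplit={bssplit(mix)}, mean={mean_transfer_bytes(mix):,.0f} bytes")
```

The File Server mix averages about 11.1 KiB (11,345.92 bytes) per request and the Webserver mix about 15.7 KiB, so multiplying a reported IOPS figure by that mean gives an approximate bandwidth.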

In our first workload we look at a fully random 4K write preconditioning profile with an outstanding workload of 16T/16Q. In this test, the 400GB LSI Nytro WarpDrive offered a burst speed of 81,000 IOPS in Windows and 58,000 IOPS in Linux. After nearing steady-state, the eMLC Nytro WarpDrive leveled off to around 14,000 IOPS in both Windows and Linux.

In our preconditioning 4K random write 16T/16Q workload, average latency on the 400GB eMLC LSI Nytro WarpDrive ranged from 3.1-4.4ms in burst to 17.4-18ms in steady-state.

Looking at max latency in our 4K preconditioning workload, the 400GB WarpDrive had peak response times starting at 50-60ms in burst and increased to 100-150ms as it neared steady-state.

Comparing latency standard deviation, the eMLC Nytro WarpDrive scaled much higher than both the Intel SSD 910 and the SLC-based Nytro WarpDrive, indicating less consistent response times.

After the 6-hour preconditioning period, the 400GB LSI Nytro WarpDrive measured a peak steady-state 4K random write speed of 14,295 IOPS in Windows and a random read speed of 124,261 IOPS. By comparison, the Intel SSD 910 offered 219,795 IOPS read and 121,850 IOPS write in steady-state.

Comparing average latency with a heavy 16T/16Q workload with 100% 4K random read activity, the 400GB LSI Nytro WarpDrive measured 2.058ms in Windows and 3.277ms in Linux. Average steady-state write latency measured 17.9ms in Windows and 18.244ms in Linux.
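The latency and IOPS figures in these tests are tied together by Little's law: with a fixed number of outstanding requests, average latency is approximately outstanding I/Os divided by IOPS. At 16T/16Q there are 256 requests in flight, so this sketch should roughly reproduce the reported averages from the reported IOPS. It idealizes the test as a closed loop that always keeps every queue slot full; the function names are illustrative.

```python
def latency_ms_from_iops(outstanding: int, iops: float) -> float:
    """Little's law (L = X * W) solved for W: average latency in ms."""
    return outstanding / iops * 1000


def iops_from_latency_ms(outstanding: int, latency_ms: float) -> float:
    """Little's law solved for X: throughput in IOPS."""
    return outstanding / (latency_ms / 1000)


OUTSTANDING = 16 * 16  # 16 threads x queue depth 16, kept full (idealized closed loop)

print(f"4K write: {latency_ms_from_iops(OUTSTANDING, 14_295):.1f} ms")   # reported: 17.9 ms
print(f"4K read:  {latency_ms_from_iops(OUTSTANDING, 124_261):.2f} ms")  # reported: 2.058 ms
```

With 14,295 write IOPS the estimate is about 17.9ms, and with 124,261 read IOPS about 2.06ms, in line with the Windows averages above.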

When comparing max latency in our 4k steady-state test, the 400GB LSI Nytro WarpDrive had a peak write latency of 104ms in Windows and 172ms in Linux. Read latency measured 31.74ms in Windows and 63.78ms in Linux.

Comparing latency standard deviation between the MLC-based Nytro WarpDrive and the MLC-based Intel SSD 910, the Nytro was less consistent in write activity and landed mid-pack in read latency consistency.

Our next test switches to an 8K 70/30 mixed workload, where the 400GB Nytro WarpDrive had burst speeds of 84,000 IOPS in Linux and 120,000 IOPS in Windows before leveling off to 36,000-43,000 IOPS in steady-state.

Comparing average latency in our 8k 70/30 preconditioning 16T/16Q workload, the 400GB LSI Nytro WarpDrive offered burst latency between 2.1-3ms which increased to 6.0-6.9ms near steady-state.

With an 8k 70/30 workload, peak latency from the 400GB LSI Nytro WarpDrive ranged from 30-40ms during burst to 50-80ms as the drive neared steady-state.

Comparing latency consistency in our 8k 70/30 preconditioning workload, the MLC-based LSI Nytro WarpDrive had standard deviation that scaled higher than the Intel SSD 910 in steady-state, as well as higher than the SLC-based WarpDrive. 

Compared to the fixed 16-thread, 16-queue max workload we used in the 100% 4K write test, our mixed workload profiles scale performance across a wide range of thread/queue combinations, spanning workload intensity from 2 threads with a queue depth of 2 up to 16 threads with a queue depth of 16. In our expanded 8K 70/30 test, the 400GB LSI Nytro WarpDrive scaled from 11-11.2k IOPS at 2T/2Q in Windows and Linux to 36.8k-42.7k IOPS at 16T/16Q in Linux and Windows respectively. This scaled lower than both the Intel SSD 910 and the SLC-based Nytro WarpDrive.
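The scaled tests sweep a grid of thread and queue depth combinations. The review gives only the 2T/2Q and 16T/16Q endpoints, so the intermediate levels (2, 4, 8, 16) below are an assumption, as are the per-step runtime and the device argument. The sketch orders the grid by total outstanding I/O (threads × queue depth) and builds fio arguments for one 8K 70/30 step using standard fio options; `tq_matrix` and `fio_mixed_args` are hypothetical names.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical ladder: the review states only the 2T/2Q and 16T/16Q endpoints.
LEVELS = (2, 4, 8, 16)


def tq_matrix(levels: Tuple[int, ...] = LEVELS) -> List[Tuple[int, int]]:
    """Every (threads, queue depth) pair, ordered by total outstanding I/O."""
    return sorted(product(levels, levels), key=lambda tq: (tq[0] * tq[1], tq[0]))


def fio_mixed_args(device: str, threads: int, queue_depth: int, runtime_s: int = 300) -> List[str]:
    """fio argv for one 8K 70/30 step; runtime_s is a hypothetical per-step duration."""
    return [
        "fio", "--name=8k-70-30", f"--filename={device}",
        "--rw=randrw", "--rwmixread=70", "--bs=8k",   # 70% reads, 30% writes, all 8K
        "--direct=1", "--ioengine=libaio", "--thread",
        f"--numjobs={threads}", f"--iodepth={queue_depth}",
        "--time_based", f"--runtime={runtime_s}", "--group_reporting",
    ]


if __name__ == "__main__":
    for threads, qd in tq_matrix():
        print(f"{threads}T/{qd}Q -> {threads * qd} outstanding I/Os")
```

Ordering by outstanding I/O is useful because combinations such as 4T/2Q and 2T/4Q place the same total load on the device: 8 requests in flight.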

In the scaled average latency segment of our 8k 70/30 test, we found that the 400GB LSI Nytro WarpDrive scaled from 0.35ms at 2T/2Q and increased to 5.9-6.9ms at 16T/16Q in Linux and Windows.

Max latency in our 8k 70/30 main test measured higher on the MLC-based LSI Nytro WarpDrive, ranging from 32-142ms in peak response times.

Comparing latency consistency of the mainstream Intel SSD 910 and MLC-based LSI Nytro WarpDrive, the WarpDrive in Linux scaled higher than the SSD 910, but in Windows offered an edge under higher workloads.

The File Server workload applies a wider spectrum of transfer sizes to each device, so instead of settling into a static 4k or 8k workload, the drive must cope with requests ranging from 512b to 64k. In this workload the MLC-based LSI Nytro WarpDrive offered a higher burst speed than the Intel SSD 910, measuring 69.7-83k IOPS, but as it neared steady-state its performance dropped to the bottom of the group, measuring 23.9-27.7k IOPS.

Average latency in our File Server preconditioning test measured 3-3.6ms in burst and increased to 9.2-10.6ms as the drive neared steady-state.

During the preconditioning stage of our File Server test, peak response times from the MLC-based LSI Nytro WarpDrive ranged from 40-50ms in burst mode and increased to 60-140ms as the drive neared steady-state.

Comparing latency consistency between the 400GB LSI Nytro WarpDrive and the Intel SSD 910, in burst mode the Nytro had lower latency standard deviation, although as it neared steady-state its performance in Linux fell behind the SSD 910.

After the File Server preconditioning process had completed with a constant 16T/16Q load, we moved to our main tests, which measure performance at set levels between 2T/2Q and 16T/16Q. In our main File Server workload the 400GB MLC-based LSI Nytro WarpDrive scaled from ~7,500 IOPS at 2T/2Q in Windows and Linux up to 23.7-27.2k IOPS in Linux and Windows respectively at 16T/16Q.

Average latency from the 400GB LSI Nytro WarpDrive ranged from 0.52-0.53ms in Linux and Windows at 2T/2Q and increased to 9.39-10.76ms at 16T/16Q.

Comparing max latency between the MLC-based LSI Nytro WarpDrive and the Intel SSD 910, the Nytro ranged higher in our File Server main test, with peak response times located in a band between 75-150ms.

Moving from peak latency to latency standard deviation, the MLC-based Nytro WarpDrive lagged behind the group for most of the test, although it had a slight advantage over the Intel SSD 910 in some areas in Windows.

In our last synthetic workload, a Web Server profile that is traditionally a 100% read test, we apply 100% write activity to fully precondition each drive before our main tests. Under this stressful preconditioning test the 400GB MLC-based LSI Nytro WarpDrive had burst speeds similar to the Intel SSD 910, measuring 29.6-35.6k IOPS, although as it neared steady-state its performance dipped to the bottom of the group at 5.6-5.7k IOPS.

Average latency in our stressful Web Server preconditioning test started at 7.1-8.6ms in burst and increased to 44-45ms as the Nytro neared steady-state.

As the MLC-based LSI Nytro WarpDrive neared steady-state, its peak response times ranged between 240-360ms, compared to the Intel SSD 910 which measured between 80-250ms.

Latency consistency of the MLC-based LSI Nytro WarpDrive lagged behind the Intel SSD 910 as well as the SLC-based Nytro, scaling much higher as the drive neared steady-state conditions.

Switching to the main segment of our Web Server test with a 100% read profile, the 400GB LSI Nytro WarpDrive had performance scaling from 11.7-12k IOPS at 2T/2Q which increased to a peak of 47.5-57.6k IOPS at 16T/16Q. This compared to the Intel SSD 910 which ranged from 15-15.4k IOPS at 2T/2Q and increased to a peak of 57.4-64.6k IOPS at 16T/16Q.

In our read-heavy Web Server main test, the MLC-based Nytro offered an average latency scaling from 0.33ms at 2T/2Q up to 4.4-5.3ms at 16T/16Q.

The MLC-based LSI Nytro WarpDrive scaled slightly higher in peak response times compared to the Intel SSD 910. Max latency measured between 25-70ms over the course of the workload.

While peak response times were higher from the Nytro WarpDrive compared to the SSD 910, switching to latency consistency the WarpDrive offered much better latency standard deviation in both low and high workloads.

Conclusion

The LSI Nytro WarpDrive BLP4-400 is a mainstream application accelerator designed for a wider range of uses than the SLC version we previously reviewed. The 400GB capacity works well for in-server storage of small databases or as a caching card to accelerate a slower tier of hard drives. The half-height, half-length card design also makes it a universal fit for most servers, and LSI has built the WarpDrive on a longstanding HBA platform known for driverless server compatibility.

That said, LSI's multi-controller design leaves it a little soft compared to more modern application accelerators released since LSI launched the Nytro WarpDrive line. In our 8k 70/30 and File Server tests, the 400GB Nytro trails the Intel SSD 910 by 30-40%. We also noted lower performance in Linux, with the WarpDrive favoring Windows; this gap was less noticeable with the Intel SSD 910. One explanation for this difference is that the Intel relies on software RAID, whereas the WarpDrive uses fixed hardware RAID0.

Still, the LSI product has gained traction with many enterprise users and solution resellers because of its ease of use, reliability, and compatibility. While these qualities are harder to quantify than performance metrics, they are arguably just as important in the many use cases where knowing the card will work with minimal fuss matters more than hands-on tuning for maximum IOPS.

Pros

  • High degree of compatibility
  • Universal HHHL form factor
  • Can operate as a boot drive

Cons

  • Trails competitors in performance

Bottom Line

The LSI Nytro WarpDrive BLP4-400 400GB eMLC flash card is one of the easier application accelerators to deploy by presenting itself as a bootable, single volume featuring a universal HHHL form-factor. It is also one of the most compatible solutions, with built-in support from operating systems including Windows and Linux.
