by StorageReview Enterprise Lab

Violin WFA-64 Windows Flash Array Review

Violin’s Windows Flash Array (WFA) is an all-flash SMB and NFS storage solution that combines Violin Memory’s Flash Fabric Architecture with Windows Storage Server 2012 R2 to offer a straightforward appliance-style application server storage solution with 10Gb Ethernet and 56Gb FDR InfiniBand connectivity. Violin and Microsoft collaborated on the WFA's development, including Windows Server kernel optimizations that allow the WFA to fully leverage the SMB 3.0 protocol, with support for SMB Direct over RDMA-enabled network interfaces.

The WFA is built on the 3U Violin All Flash Array 6000 platform, with dual blades that run Windows Storage Server as a 2-node cluster that can scale to a raw capacity of 280TB. The system scales by adding new WFA appliances to the Windows cluster in increments of 35 or 70TB of raw capacity, up to 4 arrays or 8 nodes. Violin uses a server leasing and ‘pay-as-you-grow’ licensing model designed to take advantage of the platform’s non-disruptive scaling capabilities, allowing users to license a smaller capacity rather than a full array and increase utilization over time. This review is based on the performance of the WFA-64, the largest array in the Windows Flash Array lineup at 64 x 1TiB Violin Inline Memory Modules (VIMMs).

Windows Flash Array Model             WFA-64          WFA-48          WFA-32          WFA-24          WFA-16
Form Factor/Flash Type                3U/MLC          3U/MLC          3U/MLC          3U/MLC          3U/MLC
Raw Capacity (TB)                     70              52              35              26              17.5
Usable Capacity (TB, 84% format)      44              33              22              16              11
I/O Connectivity                      40GbE, 56Gb IB  40GbE, 56Gb IB  40GbE, 56Gb IB  40GbE, 56Gb IB  40GbE, 56Gb IB
Max. 4KB IOPS                         1.1M            1.1M            800K            800K            800K
Max. Bandwidth                        4GB/s           4GB/s           4GB/s           4GB/s           4GB/s
Nominal Latency                       <500 μsec       <500 μsec       <500 μsec       <500 μsec       <500 μsec

One of the main selling points for the Windows Flash Array is its comprehensive support for the SMB 3.0 protocol via Windows Server 2012 R2. For example, SMB 3.0 includes Multichannel support to aggregate multiple network ports for failover and increased performance. Unlike block-based port bonding and link aggregation, which must keep each packet intact and assign it to a single interface, SMB Multichannel can split a single session's traffic for transmission over multiple links. Depending on the environment and workload, this form of aggregation has the potential to improve latency as well as throughput.

Storage and File System
  • Data Deduplication
  • Compression
  • NTFS Availability
  • Offloaded Data Transfer (ODX)
  • Thin Provisioning
  • Encryption

File and Block Access
  • SMB 3.0
  • NFS 3.0 and NFS 4.1
  • Support for VMware VMs over NFS
  • Scale-out File Server (SOFS)
  • VSS for Remote SMB File Shares (snaps)

Networking
  • SMB Direct (RDMA)
  • SMB Multichannel
  • Encryption
  • Transparent Failover

Clustering
  • Cluster Shared Volumes v2
  • DFS Replication

Virtualization
  • Live Storage Migration
  • New VHDX standard

Management
  • Microsoft System Center
  • PowerShell

Full support for SMB 3.0 also means that the Windows Flash Array can leverage the new addition of Remote Direct Memory Access (RDMA) to SMB, a feature dubbed SMB Direct. SMB Direct allows network interfaces to directly access system RAM rather than passing through the operating system, reducing network latency and CPU utilization. According to Microsoft, SMB Direct stands to reduce the application server's CPU consumption by 30%, with I/O-intensive workloads benefiting the most. Violin is also quick to point out that this increased CPU efficiency has a financial upshot for applications which assess license fees on a per-core basis.

Our review model is the Violin WFA-64, with an MSRP of around $585,000.

Violin WFA-64 Specifications

  • Flash Type: MLC
  • Raw Capacity: 64TiB / 70TB
  • Maximum Usable Capacity: 40 TiB / 44TB
  • Maximum 4K IOPS: 1,100,000
  • Minimum Latency: 220 μsec
  • VIMM Count (Data + Hot Spares): 60+4
  • Reliability/Resiliency: Highly available hardware configuration; system-level hardware-based vRAID; dual or quad vRAID Controller Modules; 2 Array Controller Modules and Memory Gateways; 99.999% availability
  • I/O Connectivity: 8 x 56Gb FDR InfiniBand or 8 x 40Gb Ethernet
  • Height: 3RU
  • Width: 17.5"
  • Depth: 27"
  • Cable Management: 6"
  • Weight: 92lbs
  • Power: 1500W
  • Cooling: 4961 BTU/hr
  • Flash Endurance: Covered under 3 year warranty or maintenance contract, whichever is greater

Build and Design

The Windows Flash Array incorporates two server blades running Windows Server 2012 R2, located along the left side of the chassis. By deploying the WFA with RDMA-enabled network interfaces, located just behind the server blades, the array can make use of SMB Direct to improve performance and lower latency. The front of the chassis is primarily a large intake grille for the cooling fans, along with a sturdy handle and status LEDs.

The WFA's Violin Intelligent Memory Modules (VIMMs) are located behind the fans in the center of the chassis. VIMMs are Violin's alternative to SSD storage and manage garbage collection, wear leveling, and error/fault management for their underlying storage media. VIMMs are composed of a logic-based flash controller, management processor, DRAM for metadata, and NAND flash for storage. Each is hot-swappable for ease of maintenance and uses a card form factor rather than a traditional 2.5" SSD.

From the rear of the chassis we see the primary power and network connectivity.

Management and Operating System

The core of the Windows Flash Array management experience is the platform’s tight integration with the Windows Server 2012 R2 instances that run from the array’s dual server blades. WFA deployments are designed to be managed via Microsoft System Center and PowerShell, which allows organizations that already have Microsoft administrative capacity to streamline their processes by avoiding the overhead of another management environment.

This approach allows Violin to get the jump on competing arrays which have yet to provide support for Microsoft’s SMB Direct to increase array and application server performance. According to Violin, the WFA with SMB Direct can reduce SQL Server CPU utilization by up to 30%, while sustaining a throughput of 1.1 million 4K IOPS and 4GB/s of bandwidth in manufacturer benchmarks.

The Windows Flash Array offers granular control over data service deployment, allowing deduplication and other features to be selectively activated for nodes and shares.

The WFA operates as a Windows Failover Cluster in an active-active configuration and can make use of SMB Multichannel to detect connectivity failures and reroute traffic. It also offers Hyper-V Replica for asynchronous replication of virtual machines along with live VM migration. Much of this functionality is focused on the SMB protocol; Live Migration is only available via SMB.

Performance Testing

The purpose of getting the Violin WFA in the lab was multi-faceted. First, we wanted to work with many of our great partners: we leveraged Dell's experience to get the most out of the PowerEdge R920 testing platform, Mellanox contributed InfiniBand configuration support, and Microsoft was available to ensure SMB 3.0 best practices were used. Second, we wanted to deploy a more intensive benchmark in our lab, designed to stress high-end all-flash configurations like the WFA and the rest of the Violin line. We thus partnered with Stream Financial to replicate their DataFusion performance test in our lab. Lastly, we wanted to beat the results that Violin had produced with this test previously, setting a new high-water mark for what flash storage is capable of.

DataFusion, in its simplest form, is designed to demonstrate the processing and aggregation of more than one trillion rows of risk data comprising 13 trillion data points. The test looks at a very real big data use case, where decision-making can be hampered by the time it takes to process the data. It mimics a trading environment with risk data containing risk buckets for delta, gamma, vega and theta for trading books over a 12-year period. To simulate a typical business view, the data was aggregated using SQL ‘where’, ‘like’ and ‘group by’ queries to show bucketed risk exposure by risk type, currency and counterparty. The highly-compressed database footprint is a little over 8TB; expanded, it exceeds 100TB. For the purposes of this test, the database is run without indexing, forcing the server and storage to process all data in real-time.
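To illustrate the shape of those queries, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table name, and sample rows are hypothetical stand-ins (Stream Financial's actual DataFusion schema is not public); the point is the WHERE/LIKE/GROUP BY aggregation pattern the benchmark describes, run against an unindexed table so every row must be scanned.

```python
import sqlite3

# Hypothetical schema: the real test spans 1T+ rows of delta/gamma/vega/theta
# risk buckets; this sketch only demonstrates the query shape.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE risk (
    trade_date TEXT, book TEXT, counterparty TEXT,
    currency TEXT, risk_type TEXT, exposure REAL)""")
rows = [
    ("2013-06-01", "rates-01", "ACME Bank",  "USD", "delta", 1200.0),
    ("2013-06-01", "rates-01", "ACME Bank",  "USD", "gamma",  -75.5),
    ("2013-06-02", "fx-02",    "Globex Ltd", "EUR", "vega",   310.2),
    ("2013-06-02", "fx-02",    "ACME Bank",  "USD", "theta",  -12.9),
]
conn.executemany("INSERT INTO risk VALUES (?, ?, ?, ?, ?, ?)", rows)

# Bucketed risk exposure by risk type, currency and counterparty. No index
# is defined, so the engine must scan the full table, as in the benchmark.
query = """
    SELECT risk_type, currency, counterparty, SUM(exposure) AS total
    FROM risk
    WHERE counterparty LIKE 'ACME%'
    GROUP BY risk_type, currency, counterparty
    ORDER BY risk_type
"""
for row in conn.execute(query):
    print(row)
```

At benchmark scale the same pattern is what forces the storage to stream the entire data set through the server in real time, which is why the test is so sensitive to both CPU and interconnect bandwidth.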

The initial testing, done by The Test People out of the UK, was modest in comparison to the R920 configuration in our lab. Their run used a Violin WFA-32 interfacing with a single Intel Xeon E5-2690 v2 CPU @ 3.00GHz, and the test process took 4 hours and 19 minutes. They further commented that "process time could be reduced further by scaling the servers and arrays."

With the gauntlet thrown and Violin providing us with a WFA-64 for a few weeks, we sought to see how hard we could push the Violin flash, Windows, and the InfiniBand fabric. We leveraged a Dell PowerEdge R920 to see how far we could drop the processing times using just one incredibly powerful server. Our R920 configuration offered 138GHz of aggregate CPU processing power, versus the 30GHz used in the original press release.

Dell PowerEdge R920

  • Four Intel E7-4870 v2 CPUs (2.3GHz, 15 cores, 30MB cache)
  • 512GB RAM (8GB x 64 DDR3, 128GB per CPU)
  • 2 x 300GB 10K SAS RAID1 Boot
  • 4 x Mellanox ConnectX-3 Dual-Port InfiniBand Adapters

With our new test platform chosen and configured with Windows Server 2012 R2, we were able to completely saturate the R920 during the benchmark. CPU utilization was 90-100% over the course of the test, with 2-3GB/s of traffic over the wire. When all was said and done, we completed with an incredibly low time of 56 minutes and 16 seconds. That shaved about 80% off the original processing time, showing the benefit of a quad-CPU server such as the Dell PowerEdge R920 in compute-heavy tasks paired with a fast interconnect such as our Mellanox InfiniBand fabric. While the benchmark time did improve dramatically, the WFA-64 still had headroom left on both controllers and available bandwidth to be tapped into.
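The improvement works out as follows, using the two completion times reported in this review (4 hours 19 minutes in the original test versus our 56 minutes 16 seconds):

```python
# Completion times from the two benchmark runs described above.
original = 4 * 3600 + 19 * 60   # WFA-32 + single E5-2690 v2: 4h 19m, in seconds
ours = 56 * 60 + 16             # WFA-64 + PowerEdge R920: 56m 16s, in seconds

reduction = 1 - ours / original  # fraction of processing time shaved off
speedup = original / ours        # how many times faster the new run completed

print(f"reduction: {reduction:.1%}, speedup: {speedup:.1f}x")
# → reduction: 78.3%, speedup: 4.6x
```

The 78.3% reduction is the "just under 80%" figure cited in the conclusion; note that the 4.6x speedup matches the roughly 4.6x increase in aggregate CPU clock (138GHz vs 30GHz) reasonably closely, consistent with the benchmark being compute-bound on the server side.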

Conclusion

All-flash storage arrays are fundamentally an exercise in squeezing the maximum performance possible from a single platform. Violin’s Windows Flash Array takes a very specific approach to maximizing the performance of Violin’s all-flash array platform by focusing on tweaks and integrations for organizations that need storage for Windows-based application server workloads and the SMB protocol. Violin’s argument will sound convincing to many administrators: by fully committing to the Windows Server feature set and management paradigm, the Windows Flash Array will be simpler and less expensive to deploy and manage. For Windows shops this is probably true; for those on other platforms, the Violin 7000 Flash Storage Platform is a more traditional array that would be a better fit.

Our testing in this review is somewhat limited due to the time it took to set up the new testing environment and our limited access to the array. While not intended to be comprehensive, the data points we found are encouraging. Our test, albeit with substantially improved hardware, lowered the benchmark's completion time by just under 80%. That's pretty impressive given the overall density of the array and the R920 combined. With some headroom left over on the WFA-64, faster or newer compute hardware could achieve even greater results. Given new quad-CPU platforms like the R930, we'd expect even more performance could be squeezed from the Violin WFA, which isn't even running Intel's latest Haswell CPUs.

The WFA is not without compromises: the CPUs haven't been updated to Intel's latest offering, and aside from Violin's hardware design advantages, there's not much "special sauce" on the software side beyond what Microsoft provides. That's not necessarily a problem, and in Windows environments it's probably a benefit. The question simply comes down to how badly an enterprise needs this tier of performance versus more traditional SAN offerings. From what we've seen in this limited interaction, though, the WFA really screams if you have enough compute to throw at it and an application that is highly sensitive to latency. We've seen nothing from other Windows boxes or even all-flash DIY solutions that leads us to believe there's a better option in this category.

Violin Windows Flash Array Product Page
