VMware VMmark Virtualization Benchmark

The VMmark Virtualization Benchmark is a comprehensive multi-host datacenter virtualization benchmark designed to mimic the behavior of complex consolidation environments. Legacy benchmarking methodologies designed for single-workload performance and scalability are insufficient for server consolidation, which gathers a collection of workloads onto a virtualization platform consisting of a set of physical servers with access to shared storage and network infrastructure.

The ability to virtualize irregular workloads, automatically load-balance them, automate workload provisioning, and handle a wider range of administrative tasks has transformed how servers are used. Accordingly, VMmark focuses on user-centric application performance and accounts for the effect of this infrastructure activity, which can affect CPU, network, storage, or other resources, on overall platform performance.

VMmark 2.x's benchmarking approach uses a series of sub-tests derived from commonly used load-generation tools and commonly initiated virtualization administration tasks. The benchmark implements a tile-based scheme for measuring application performance. A tile, the benchmark's unit of work, is a collection of VMs that together run a diverse set of workloads, with those workloads encapsulated in a diverse set of VMs.

VMmark 2.x also executes ubiquitous platform infrastructure workloads such as VM cloning and deployment, automatic VM load balancing across a datacenter, VM live migration (vMotion), and dynamic datastore relocation (Storage vMotion). These operations complement the conventional application-level workloads. A data center's consolidation capacity, which reflects both scalability and individual application performance, is thus measured as the number of tiles the platform can handle while simultaneously supporting the required administrative operations. The overall benchmark score is determined by the performance of each workload in every tile the multi-host platform can accommodate, combined with the performance of the infrastructure operations.

Fully compliant VMmark benchmark tests are designed to run for a minimum of 3 hours, with workload metrics reported every minute. After a benchmark run, each tile's metrics are computed and aggregated into a score for that tile. To aggregate them, the test first normalizes each metric against a reference system, so that metrics in different units (such as MB/s and database commits/second) can be combined. The geometric mean of the normalized metrics then becomes the tile's score, and the per-tile scores are summed to form the application workload portion of the final metric. Infrastructure workloads follow a similar process for their portion of the metric. Unlike the application workloads, however, the infrastructure workloads are scaled by the size of the underlying server cluster rather than explicitly by the user. As a result, the infrastructure workloads are scored as a single group, and no multi-tile sums are required. Finally, the overall benchmark score is computed as a weighted average in which application workloads account for 80% and infrastructure workloads for 20%. These weights reflect the relative contribution of infrastructure and application workloads to overall resource demands.
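The aggregation steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the VMmark harness's actual code: it assumes every metric is a higher-is-better throughput value and that reference-system values are supplied by the caller, and the metric names in the example (`mail_actions_per_min`, `web_ops_per_sec`, `vmotions_per_hour`) are hypothetical placeholders.

```python
# Sketch of VMmark-style score aggregation (not the official harness).
# Assumption: all metrics are higher-is-better throughput values.
from math import prod

APP_WEIGHT = 0.8    # application workloads: 80% of the final score
INFRA_WEIGHT = 0.2  # infrastructure workloads: 20% of the final score


def geometric_mean(values: list[float]) -> float:
    return prod(values) ** (1.0 / len(values))


def group_score(metrics: dict[str, float], reference: dict[str, float]) -> float:
    """Normalize each metric by the reference system, then take the geometric mean."""
    ratios = [metrics[name] / reference[name] for name in reference]
    return geometric_mean(ratios)


def overall_score(tiles, app_reference, infra_metrics, infra_reference):
    """Return (application portion, infrastructure portion, weighted overall score)."""
    # Application portion: one geometric mean per tile, summed across tiles.
    app = sum(group_score(tile, app_reference) for tile in tiles)
    # Infrastructure portion: scored once for the whole cluster, no per-tile sum.
    infra = group_score(infra_metrics, infra_reference)
    return app, infra, APP_WEIGHT * app + INFRA_WEIGHT * infra


# Hypothetical example: two identical tiles and made-up reference values.
app_reference = {"mail_actions_per_min": 100.0, "web_ops_per_sec": 50.0}
tiles = [
    {"mail_actions_per_min": 200.0, "web_ops_per_sec": 400.0},
    {"mail_actions_per_min": 200.0, "web_ops_per_sec": 400.0},
]
app, infra, total = overall_score(
    tiles, app_reference, {"vmotions_per_hour": 20.0}, {"vmotions_per_hour": 10.0}
)
# prints: application=8.00 infrastructure=2.00 overall=6.80
print(f"application={app:.2f} infrastructure={infra:.2f} overall={total:.2f}")
```

One property of the geometric mean worth noting: a tile that scores twice the reference on one metric and half the reference on another nets out to 1.0, so no single normalized metric can dominate a tile's score.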

Running the VMmark Virtualization Benchmark carries substantial hardware requirements even at the minimum, and those requirements grow as the number of tiles under test increases.

VMmark Virtualization Benchmark Minimum Specifications

  • ESXi Virtual Server Hosts (vMotion compatible)
    • 2-host cluster (homogeneous systems not required)
    • 4 logical CPUs per server
    • 27GB of memory
    • 320GB of shared storage
    • vMotion networking (10Gb/s network recommended)
  • vCenter Server installed on a separate, dedicated server
  • Client System Per Tile
    • Two CPU cores (recommended minimum)
    • 4GB of RAM
    • 15GB of available local disk space
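To see how the client-side requirements scale with tile count, the sketch below multiplies the per-tile client minimums listed above and checks a host cluster against the host minimums. It is a hypothetical helper, not part of VMmark: the `HostSpec` type and the function names are illustrative, and it treats the 27GB memory figure as a cluster-wide total, which the list does not state explicitly.

```python
# Hypothetical pre-flight helper based on the minimums listed above.
# Assumption: the 27GB memory minimum is a cluster-wide total.
from dataclasses import dataclass

# Client system minimums, per tile.
CLIENT_CORES_PER_TILE = 2
CLIENT_RAM_GB_PER_TILE = 4
CLIENT_DISK_GB_PER_TILE = 15

# ESXi host cluster minimums.
MIN_HOSTS = 2
MIN_LOGICAL_CPUS_PER_HOST = 4
MIN_CLUSTER_MEMORY_GB = 27
MIN_SHARED_STORAGE_GB = 320


@dataclass
class HostSpec:
    logical_cpus: int
    memory_gb: int


def client_requirements(tiles: int) -> dict[str, int]:
    """Total client-side resources needed for the given number of tiles."""
    return {
        "cpu_cores": CLIENT_CORES_PER_TILE * tiles,
        "ram_gb": CLIENT_RAM_GB_PER_TILE * tiles,
        "disk_gb": CLIENT_DISK_GB_PER_TILE * tiles,
    }


def meets_host_minimums(hosts: list[HostSpec], shared_storage_gb: int) -> bool:
    """Check a host cluster against the listed minimums (not a full VMmark check)."""
    return (
        len(hosts) >= MIN_HOSTS
        and all(h.logical_cpus >= MIN_LOGICAL_CPUS_PER_HOST for h in hosts)
        and sum(h.memory_gb for h in hosts) >= MIN_CLUSTER_MEMORY_GB
        and shared_storage_gb >= MIN_SHARED_STORAGE_GB
    )


print(client_requirements(10))  # {'cpu_cores': 20, 'ram_gb': 40, 'disk_gb': 150}
print(meets_host_minimums([HostSpec(4, 16), HostSpec(4, 16)], 320))  # True
```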

VMware's VMmark 2.5 uses a wide range of software and operating systems to reflect a real-world virtualized environment. Below is an overview of the VMs included in each VMmark tile and the applications and operating systems they run.

VMmark 2.5 Tile Configuration

  • Client
    • Microsoft Windows Server 2008 Enterprise Edition R2, 64-bit
    • VMmark 2.x harness
    • STAF framework and STAX execution engine
    • LoadGen
    • Microsoft Outlook 2007 (standalone or included in Microsoft Office 2007)
    • Microsoft Exchange 2007 management tools
    • Cygwin
    • A Java JDK
    • Rain Workload Toolkit
  • Mailserver
    • Microsoft Windows Server 2008 Enterprise Edition R2, 64-bit
    • Microsoft Exchange 2007
    • 1000 heavy profile users
  • Standby
    • Microsoft Windows Server 2003 SP2 Enterprise Edition, 32-bit
  • OlioDB
    • SUSE Linux Enterprise Server 11, 64-bit
    • Olio database on MySQL
  • OlioWeb
    • SUSE Linux Enterprise Server 11, 64-bit
    • Olio workload
  • DS2DB
    • SUSE Linux Enterprise Server 11, 64-bit
    • MySQL database
  • DS2WebA
    • SUSE Linux Enterprise Server 11, 64-bit
    • Apache 2.2 Web server
    • DS2 Web tier A
  • DS2WebB
    • SUSE Linux Enterprise Server 11, 64-bit
    • Apache 2.2 Web server
    • DS2 Web tier B
  • DS2WebC
    • SUSE Linux Enterprise Server 11, 64-bit
    • Apache 2.2 Web server
    • DS2 Web tier C
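Because every tile is identical, the layout above can be treated as data and multiplied out to see how many systems a run involves. The sketch below is illustrative only: the `TILE_SYSTEMS` mapping and `systems_by_os` helper are hypothetical names, and the OS strings are abbreviated from the list above.

```python
# Illustrative model of one VMmark 2.5 tile, taken from the list above.
from collections import Counter

# System name -> guest operating system (one entry per system in a tile).
TILE_SYSTEMS = {
    "Client": "Windows Server 2008 R2 Enterprise, 64-bit",
    "Mailserver": "Windows Server 2008 R2 Enterprise, 64-bit",
    "Standby": "Windows Server 2003 SP2 Enterprise, 32-bit",
    "OlioDB": "SUSE Linux Enterprise Server 11, 64-bit",
    "OlioWeb": "SUSE Linux Enterprise Server 11, 64-bit",
    "DS2DB": "SUSE Linux Enterprise Server 11, 64-bit",
    "DS2WebA": "SUSE Linux Enterprise Server 11, 64-bit",
    "DS2WebB": "SUSE Linux Enterprise Server 11, 64-bit",
    "DS2WebC": "SUSE Linux Enterprise Server 11, 64-bit",
}


def systems_by_os(tiles: int) -> Counter:
    """Count systems per guest OS for a run with the given number of tiles."""
    per_tile = Counter(TILE_SYSTEMS.values())
    return Counter({os_name: count * tiles for os_name, count in per_tile.items()})


ten_tiles = systems_by_os(10)
print(sum(ten_tiles.values()))  # 90 systems: 10 clients plus 80 workload VMs
for os_name, count in ten_tiles.items():
    print(f"{count:3d} x {os_name}")
```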

VMware VMmark Testing Environment

Storage solutions are tested with the VMware VMmark benchmark in the StorageReview Enterprise Test Lab, using multiple servers connected over a high-speed network. We use Lenovo ThinkServers based on Intel Sandy Bridge and Westmere processors for different segments of the VMmark environment: four ThinkServer RD630s as the VMmark 2.5.1 hosts, two ThinkServer RD630s hosting multiple virtual clients, one ThinkServer operating as the physical prime client, one ThinkServer RD240 running the VMware vCenter Appliance, and one ThinkServer RD240 serving as a temporary staging ground for each tile used in our VMmark testing. The Lenovo ThinkServer line was a top choice when designing this new platform because it builds on Intel's processor and chipset lineup to deliver strong performance while still offering good value. The ThinkServer line also offers excellent hardware compatibility, which is essential as we incorporate different storage and networking technologies into our testing platform. As with our other testing platforms, our goal is to show the realistic performance customers can expect from mid-range server platforms, rather than from the top-spec servers generally used in most competitive benchmarks. Another advantage of this 4-host VMmark platform is that it provides more host-side resources in aggregate than a top-spec 2-host setup, which keeps the stress on the storage product under test without the hosts becoming CPU-bound.

For local storage in this VMmark environment, we chose OCZ Talos 2 R SSDs, which feature dual LSI SandForce SF-2500 controllers and a dual-port SAS 6Gb/s interface. These SSDs are used on both the VM server and virtual client sides of our VMmark testing layout, supporting the local storage demands of each ThinkServer RD630. With up to six Windows Server 2008 R2-based virtual clients running on each of our ESXi vSphere hosts to stress the VM servers under test, we wanted to rule out any possibility of the hosts becoming I/O-bound during the benchmark. These SSDs also include in-flight data protection in the event of a power interruption and advanced error correction, as well as strong endurance thanks to the R-series' 22% over-provisioning combined with the low write amplification of the SandForce controllers.

Mellanox 56Gb/s InfiniBand interconnects provide high performance and network efficiency on each ESXi vSphere host, ensuring that the connected VMs are not network-limited. Each host uses one single-port Mellanox ConnectX-3 NIC operating in IPoIB mode, with multiple VM networks running on a single vSwitch. This removes network constraints and reduces the complexity of our multi-use testing infrastructure.

For our storage fabric we support multiple interconnect standards, including 1GbE and 10/40GbE iSCSI, 8/16Gb Fibre Channel, and 56Gb/s InfiniBand. In our testing environment, we use Netgear switching for 1GbE management as well as 1GbE and 10GBase-T iSCSI storage. These switches include the 52-port Netgear ProSafe GS752TS (management and iSCSI VLANs) and the 24-port ProSafe M7100 (10GBase-T). For our Twinax 10GbE and 40GbE infrastructure we use the Mellanox SX1036 switch, which supports native 40GbE as well as 10GbE via QSFP+-to-four-SFP+ fan-out cables. For attached Fibre Channel equipment, our lab supports both traditional 8Gb FC and the latest 16Gb equipment: 8Gb equipment connects through a QLogic SB5800V, while faster 16Gb end-to-end solutions are tested through a Brocade 6510 switch. For the top-tier storage solutions that drive the fastest speeds and lowest latency over InfiniBand, we use a 56Gb/s Mellanox SX6036 switch, which also acts as the backbone for our VMmark VM network.

VMware VMmark Virtualization Benchmark Equipment

  • Lenovo ThinkServer RD630 VMware ESXi vSphere 4-node Cluster
    • Eight Intel E5-2650 CPUs for 127GHz in cluster (Two per node, 2.0GHz, 8-cores, 20MB Cache) 
    • 512GB RAM (128GB per node, 8GB x 16 DDR3, 64GB per CPU)
    • 400GB OCZ Talos 2 SAS SSD x 4 (via LSI 9207-8i)
    • 4 x Mellanox ConnectX-3 InfiniBand Adapter (vSwitch for vMotion and VM network)
    • 4 x QLogic QLE2672 16Gb FC Adapter
    • VMware ESXi vSphere 5.1 / Enterprise Plus 8-CPU
  • Lenovo ThinkServer RD630 VMware ESXi vSphere Virtual Client Hosts (2)
    • Four Intel E5-2650 CPUs (Two per node, 2.0GHz, 8-cores, 20MB Cache)
    • 256GB RAM (128GB per node, 8GB x 16 DDR3, 64GB per CPU)
    • 400GB OCZ Talos 2 SAS SSD x 2 (via LSI 9207-8i)
    • 2 x Mellanox ConnectX-3 InfiniBand Adapter
    • VMware ESXi vSphere 5.1 / Enterprise Plus 4-CPU
  • Lenovo ThinkServer RD240 (Prime Client)
    • Two Intel Xeon X5650 CPUs (2.66GHz, 6-cores, 12MB Cache)
    • 16GB RAM (8GB x 4 DDR3, 8GB per CPU)
    • 600GB 10K SAS HDD in RAID1 (via LSI 9260-8i)
    • Mellanox ConnectX-3 InfiniBand Adapter
    • Windows Server 2008 R2 64-bit
  • Lenovo ThinkServer RD240 (vCenter Appliance Host)
    • Two Intel Xeon X5650 CPUs (2.66GHz, 6-cores, 12MB Cache)
    • 24GB RAM (8GB x 2 + 4GB x 2 DDR3, 12GB per CPU)
    • Hosted on shared SSD iSCSI LUN (125GB Thin Provisioned)
    • VMware ESXi vSphere 5.1 / Enterprise Plus 2-CPU
  • Lenovo ThinkServer RD240 (VMmark Tile Storage)
    • Two Intel Xeon X5650 CPUs (2.66GHz, 6-cores, 12MB Cache)
    • 32GB RAM (8GB x 4 DDR3, 16GB per CPU)
    • 8TB x 8 WD RE4 SAS RAID6 + Hot Spare (20TB usable via LSI 9260-8i)
    • VMware ESXi vSphere 5.1 / Enterprise Plus 2-CPU
  • Mellanox SX6036 InfiniBand Switch
    • 36 FDR (56Gb/s) ports
    • 4Tb/s aggregate switching capacity
  • Mellanox SX1036 10/40GbE Switch
    • 36 40GbE ports
  • Netgear ProSafe M7100 10GBase-T Switch
    • 24 10GBase-T RJ45 ports
    • 480Gb/s aggregate switching capacity
  • Netgear ProSafe GS752TS 1GbE Switch
    • 48 1GbE RJ45 ports
    • 104Gb/s aggregate switching capacity
  • Brocade 6510 16Gb FC Switch
    • 48 16Gb FC ports
    • 768Gb/s aggregate switching capacity
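The aggregate switching figures above follow from simple port arithmetic. The sketch below assumes capacity is quoted as ports × line rate, doubled when both traffic directions are counted; under that assumption the SX6036, M7100, and GS752TS figures check out (the GS752TS when all 52 ports of the switch are counted, not only the 48 RJ45 ports), while the Brocade 6510 figure matches a single-direction count. The function name is hypothetical.

```python
# Port-count arithmetic behind the switch figures above (illustrative only).

def aggregate_capacity_gbps(ports: int, line_rate_gbps: int, full_duplex: bool = True) -> int:
    """Ports x line rate, doubled when both traffic directions are counted."""
    return ports * line_rate_gbps * (2 if full_duplex else 1)


# name -> (ports, line rate in Gb/s, counted in both directions?)
SWITCHES = {
    "Mellanox SX6036": (36, 56, True),          # 4032 Gb/s, quoted as 4Tb/s
    "Netgear ProSafe M7100": (24, 10, True),    # 480 Gb/s
    "Netgear ProSafe GS752TS": (52, 1, True),   # 104 Gb/s
    "Brocade 6510": (48, 16, False),            # 768 Gb/s
}

for name, (ports, rate, duplex) in SWITCHES.items():
    print(f"{name}: {aggregate_capacity_gbps(ports, rate, duplex)} Gb/s")
```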

VMmark Performance Results

We test a wide range of storage solutions that meet the minimum requirements of our VMmark testing environment. To qualify for testing, a storage device must offer more than 300GB of usable capacity per tile and be designed to operate under demanding enterprise conditions. For a high-performance storage array, this translates into roughly 3TB of usable capacity for a 10-tile configuration. When testing multiple PCIe application accelerators or groups of SATA or SAS enterprise SSDs, we attach the storage through our InfiniBand-based SCST host to present it as shared storage to our servers. Listed below are the normalized VMmark 2.5.1 results for all devices tested to date. Note that these values cannot be compared to official VMmark 2.5.1 scores, because we test devices on our unique 4-host platform running at less than full utilization to simulate how users typically operate in a production environment. In individual product reviews we go into greater detail and compare competing products head to head, while the list below shows how different storage solutions stratify.


Device | Configuration | Highest Normalized VMmark 2.5.1 Application Score | Highest Normalized VMmark 2.5.1 Overall Score | 1-Tile Normalized VMmark 2.5.1 Application Score | 1-Tile Normalized VMmark 2.5.1 Overall Score
Fusion-io ION Accelerator | 3.2TB ioScale x 4 in RAID10, 16Gb FC x 4 | 15.40 (10 tiles) | 12.66 (10 tiles) | 1.7 | 1.58
Infortrend ESDS S16F-R2651 | (2) 400GB Smart Optimus x 8 RAID5 Pools, 16Gb FC x 4 | 15.39 (10 tiles) | 12.66 (10 tiles) | 1.7 | 1.58
Infortrend ESDS S16F-R2651 | (2) 400GB Smart Optimus Eco x 8 RAID5 Pools, 16Gb FC x 4 | 15.29 (10 tiles) | 12.57 (10 tiles) | 1.68 | 1.58
Synology RackStation RS10613xs+ | 400GB Hitachi SSD800MM x 10 RAID5 Pool, 10GbE x 2 | 10.15 (7 tiles) | 8.47 (7 tiles) | 1.59 | 1.49
Dell EqualLogic PS6110XS | 400GB SAS SSD x 7, 600GB 10K SAS HDD x 14 RAID6 Accelerated Pool, 10GbE x 1 | 5.05 (4 tiles) | 4.39 (4 tiles) | 1.62 | 1.50
Synology RackStation RS10613xs+ | 147GB Toshiba MK01GRRB x 10 RAID0 Pool, 10GbE x 2 | 1.00 (1 tile) | 1.00 (1 tile) | 1.00 | 1.00

All VMmark result folders are available for download upon request. Our baseline is the Synology RackStation RS10613xs+ with ten 15K SAS HDDs in RAID0: its raw 1-tile scores of 1.10 (application) and 1.11 (overall VMmark2) are each set to 1.00, and all other results are normalized against them.
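Normalization here is a simple ratio: each raw score is divided by the matching baseline score, so the baseline configuration lands at exactly 1.00. A minimal sketch follows; the baseline values come from the text above, while the function name and the 2.20 raw score are hypothetical.

```python
# Normalizing raw VMmark scores against the baseline described above.
BASELINE_APP = 1.10      # baseline raw 1-tile application score
BASELINE_OVERALL = 1.11  # baseline raw 1-tile overall (VMmark2) score


def normalize(raw_score: float, baseline: float) -> float:
    """Express a raw score as a multiple of the baseline configuration's score."""
    return raw_score / baseline


print(f"{normalize(BASELINE_APP, BASELINE_APP):.2f}")          # 1.00
print(f"{normalize(BASELINE_OVERALL, BASELINE_OVERALL):.2f}")  # 1.00
print(f"{normalize(2.20, BASELINE_APP):.2f}")  # 2.00 for a hypothetical raw score of 2.20
```

Because every result in a column is divided by the same constant, the ratios between devices are unchanged; application and overall scores are each normalized against their own baseline value.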
