MarkLogic MarkMail NoSQL Database Storage Benchmark
MarkLogic 7 is an Enterprise NoSQL (“Not Only SQL”) database that has the flexibility and scalability to handle today’s data challenges that SQL-based databases were not designed to handle. It also has enterprise-grade capabilities like search, ACID transactions, failover, replication, and security to run mission-critical applications. MarkLogic combines database functionality, search, and application services in a single system. It provides the functionality enterprises need to deliver value. MarkLogic leverages existing tools, knowledge, and experience while providing a reliable, scalable, and secure platform for mission-critical data.
Companies and organizations across industries including the public sector, media, and financial services have benefited from MarkLogic’s unique architecture. Any environment that faces a combination of data volume, velocity, variety, and complexity—a data challenge known as Big Data—can be enhanced with MarkLogic. Example solutions built on MarkLogic include intelligence analysis, real-time decision support, risk management, digital asset management, digital supply chain, and content delivery.
MarkLogic MarkMail Benchmark
MarkMail is a free service for searching mailing list archives. It is powered by MarkLogic Server. Each email and attachment is stored internally as an XML document. The public webpage provides search, faceted navigation, and related features across millions of emails from public archives. It runs on a small cluster hosted by MarkLogic Corporation.
The MarkMail benchmark creates a MarkMail database and ingests millions of emails and attachments into this database using the MarkLogic Content Pump (mlcp). The data set we are using in this benchmark is a large email corpus created by MarkLogic. The MarkMail load is scaled by increasing the number of MarkLogic nodes and / or available I/O bandwidth. As the MarkLogic resources are scaled the number of mlcp threads is increased to achieve higher ingest rates.
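As a rough illustration of how the mlcp thread count might be scaled with the cluster, the sketch below builds an mlcp import command line in Python. The threads-per-node figure, host name, credentials, and input path are hypothetical placeholders, not values used in the benchmark; the flags shown are standard mlcp import options.

```python
from typing import List

# Hypothetical scaling rule: the article only says the thread count rises
# with cluster resources; this per-node figure is an illustrative assumption.
THREADS_PER_NODE = 8


def mlcp_import_args(host: str, port: int, user: str, password: str,
                     input_path: str, nodes: int) -> List[str]:
    """Build an argument list for an mlcp import run scaled to `nodes`."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    threads = nodes * THREADS_PER_NODE
    return [
        "mlcp.sh", "import",
        "-host", host,
        "-port", str(port),
        "-username", user,
        "-password", password,
        "-input_file_path", input_path,
        "-thread_count", str(threads),
    ]


if __name__ == "__main__":
    # Placeholder connection details for illustration only.
    args = mlcp_import_args("ml-node1", 8000, "admin", "secret",
                            "/data/markmail", nodes=8)
    print(" ".join(args))
```

The argument list can be passed to `subprocess.run` on a host where mlcp is installed; keeping the scaling rule in one constant makes it easy to tune per cluster.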
The ingestion is I/O intensive. I/O is broken down into three categories:
- Initially documents are ingested into in-memory stands and the only disk writes are Journal saves.
- In-memory stands quickly overflow and are continually written as on-disk stands. This is saving activity.
- As the number of on-disk stands increases, MarkLogic must merge them to reduce query overhead. Merging involves reading multiple on-disk stands, writing back a single merged version, and deleting the originals.
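The three I/O categories above can be sketched with a toy model. This is not MarkLogic's implementation: the class name, the in-memory stand limit, and the merge threshold are all assumptions chosen only to show how journal writes, saves, and merges follow from one another.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyForest:
    """Toy model of journal, save, and merge activity (sizes are doc counts)."""
    in_memory_limit: int = 4   # assumed capacity of an in-memory stand
    merge_threshold: int = 3   # assumed on-disk stand count that triggers a merge
    in_memory: int = 0
    on_disk: List[int] = field(default_factory=list)
    journal_writes: int = 0
    saves: int = 0
    merges: int = 0

    def ingest(self, docs: int = 1) -> None:
        for _ in range(docs):
            self.journal_writes += 1      # every update is journaled
            self.in_memory += 1
            if self.in_memory >= self.in_memory_limit:
                self._save()

    def _save(self) -> None:
        self.on_disk.append(self.in_memory)  # flush in-memory stand to disk
        self.in_memory = 0
        self.saves += 1
        if len(self.on_disk) >= self.merge_threshold:
            self._merge()

    def _merge(self) -> None:
        merged = sum(self.on_disk)        # read the stands, write one back
        self.on_disk = [merged]           # the originals are deleted
        self.merges += 1


if __name__ == "__main__":
    forest = ToyForest()
    forest.ingest(20)
    print(forest.journal_writes, forest.saves, forest.merges, forest.on_disk)
```

With these assumed limits, ingesting 20 documents produces 20 journal writes, 5 saves, and 2 merges, which shows why sustained ingestion mixes small sequential writes with larger read and write bursts.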
To ensure the highest level of accuracy and to force each device into steady-state, we repeat the ingestion and query phases 24 times for flash-based devices. For PCIe Application Accelerators, each interval takes between 60 and 120 minutes to complete, putting the total test time in a range of 24 to 48 hours. For devices with lower I/O throughput, the total test time can span days. Our focus in this test is the overall latency of each storage solution across four areas of interest: journal writes (J-lat), save writes (S-lat), merge reads (MR-lat), and merge writes (MW-lat).
In the diagram above we see the I/O paths and latencies in MarkLogic:
- Journal writes record the deltas to the database. When an update request runs, all the changes it made to the state of the database are recorded in the journal. Those changes can be applied again from the journal, without running the request again. Updates can be additions, replacements, or deletions of documents. The journal protects against outages because it is guaranteed to survive a system crash. The latency of Journal writes is captured in the J-lat metric.
- After enough documents are loaded, the in-memory stand will fill up and be flushed to disk, written out as an on-disk stand. This flush to disk is called a Save. The latency of Save writes is captured in S-lat.
- As the total number of on-disk stands grows, an efficiency issue threatens to emerge. To read a single term list, MarkLogic must read the term list data from each individual stand and unify the results. To keep the number of stands to a manageable level, MarkLogic runs merges in the background. A merge reads some of the stands on disk (Merge Read) and creates a single new stand out of them (Merge Write), coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments. The latency of Merge reads is captured in MR-lat and the latency of Merge writes in MW-lat.
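The interval arithmetic and the four latency metrics described above can be checked with a short sketch. Averaging per-interval samples is an assumption about how an overall figure could be derived, not a statement of how the published numbers are computed, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

INTERVALS = 24  # repetitions for flash-based devices, per the text
METRICS = ("J-lat", "S-lat", "MR-lat", "MW-lat")


def total_test_hours(min_minutes: float, max_minutes: float,
                     intervals: int = INTERVALS) -> Tuple[float, float]:
    """Return the (shortest, longest) total test time in hours."""
    return (intervals * min_minutes / 60, intervals * max_minutes / 60)


def average_latency(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each metric's per-interval latency samples (assumed ms)."""
    return {metric: mean(samples[metric]) for metric in METRICS}


if __name__ == "__main__":
    # 60 to 120 minutes per interval over 24 intervals.
    print(total_test_hours(60, 120))
```

For intervals of 60 to 120 minutes, `total_test_hours` gives the 24 to 48 hour range quoted for PCIe Application Accelerators.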
During ingestion, MarkLogic also indexes all of the documents, creates term lists, etc. This activity requires CPU cycles, which makes the benchmark a good balance between high I/O and high CPU utilization.
MarkLogic Testing Environment
Storage solutions are tested with the MarkLogic NoSQL benchmark in the StorageReview Enterprise Test Lab utilizing multiple servers connected over a high-speed network. We utilize servers from Lenovo for different segments of the MarkLogic NoSQL Testing Environment, and for the fabric that connects the equipment, we use Mellanox InfiniBand Switching and NICs.
The testing environment is broken up into three sections: the storage host, the MarkLogic NoSQL Database Cluster, and the MarkLogic Database Client. For the storage host, we use a 2U Lenovo ThinkServer RD630 to present PCIe Application Accelerators and groups of eight or sixteen SATA/SAS SSDs, and to act as a host that presents NAS/SAN equipment on the InfiniBand fabric. For the MarkLogic Database Cluster, we use a Lenovo ThinkServer RD630 octal-node server cluster equipped with sixteen Intel Xeon E5-2650 CPUs to provide the compute resources needed to effectively stress the fastest storage devices. These ThinkServer RD630 servers have proven to have remarkably strong performance for a variety of demanding workloads like MarkMail, and we continue to trust them as a flexible, reliable platform to run our benchmarks. On the client side, we use 1U Lenovo ThinkServer RD530 servers that provide the working data, which is loaded into system memory and pushed to the NoSQL Database cluster over our high-speed network. Linking all of these servers together is a Mellanox 56Gb/s InfiniBand fabric, including both switch and NICs, that gives us the highest transfer speeds and lowest latency so that the network does not limit the performance of high-performance storage devices.
The Lenovo ThinkServer brand was a top choice when designing this new platform, leveraging Intel's powerful processor and chipset lineup to offer the best performance and still driving great value. The ThinkServer line also offers excellent hardware compatibility, which is an absolute must as we incorporate different forms of storage and networking technology into our testing platform. As with our other testing platforms, our goal is to show realistic performance customers can expect from mid-range server platforms, versus the top-spec servers generally leveraged in most competitive benchmarks.
For local storage in this MarkLogic MarkMail environment, we picked OCZ Talos 2 R SSDs which offer dual LSI SandForce SF-2500 controllers and a dual-port SAS 6Gb/s interface. These SSDs are utilized on the cluster side of our testing layout, supporting the local storage demands of each ThinkServer RD630. With logging and other tasks written to the local storage, we wanted to rule out all possibility of the hosts becoming I/O bound during the benchmark. These SSDs also include in-flight data protection in the event of a power interruption, advanced error correction, as well as strong endurance with the R-series 22% over-provisioning combined with the low write-amplification of the SandForce controllers.
Mellanox InfiniBand interconnects were used to provide the highest performance and greatest network efficiency to ensure that the devices connected are not network-limited. Looking at just PCIe storage solutions, a single PCIe Application Accelerator can easily drive 1-3GB/s or more onto the network. Ramp up to an all-flash storage appliance with peak transfer speeds in excess of 10-20GB/s and you can quickly see how network link capacity can be easily saturated, limiting the overall performance of the entire platform. InfiniBand’s high-bandwidth links allow for the greatest amount of data to be moved over the least number of links, allowing for the full system capabilities to be realized.
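The link saturation argument can be made concrete with a small calculation. The sketch below uses raw line rates and ignores encoding and protocol overhead, so real link counts would be somewhat higher; the 10Gb/s figure is an assumed comparison point that is not part of this test environment.

```python
import math


def links_needed(target_gbytes_per_s: float, link_gbits_per_s: float) -> int:
    """Minimum number of links to carry a target throughput at raw line rate."""
    link_gbytes_per_s = link_gbits_per_s / 8  # 8 bits per byte
    return math.ceil(target_gbytes_per_s / link_gbytes_per_s)


if __name__ == "__main__":
    for target in (3, 10, 20):  # GB/s figures mentioned in the text
        print(target, "GB/s:",
              links_needed(target, 56), "x 56Gb/s links vs",
              links_needed(target, 10), "x 10Gb/s links (assumed comparison)")
```

At raw line rate a 56Gb/s link carries 7GB/s, so a 20GB/s appliance would need at least three such links, compared with sixteen links at 10Gb/s.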
In addition to higher network throughput, InfiniBand also enables higher overall cluster efficiency. InfiniBand uses iSER (iSCSI Extensions for RDMA) and SRP (SCSI RDMA Protocol) to replace the inefficient iSCSI TCP stack with Remote Direct Memory Access (RDMA) functionality, allowing near-native access times for external storage. iSER and SRP enable greater efficiency throughout the clustered environment by allowing network traffic to bypass the systems’ CPUs and allowing data to be copied from the sending system’s memory directly to the receiving system’s memory. In comparison, traditional iSCSI operation routes network traffic through a complex multiple-copy and transfer process, eating up valuable CPU cycles and memory space, and drastically increasing data transfer latencies. In our MarkLogic NoSQL environment, we utilize the SCSI RDMA Protocol to connect each node to a SCSI target subsystem for Linux (SCST) running on our storage host.
MarkLogic Benchmark Equipment
- Lenovo ThinkServer RD630 Octal-Node Database Cluster
  - Sixteen Intel E5-2650 CPUs (Two per node, 2.0GHz, 8-cores, 20MB Cache)
  - 1024GB RAM (128GB per node, 64GB per CPU)
  - 200GB OCZ Talos 2 SAS SSD x 8 (via LSI 9207-8i)
  - 8 x Mellanox ConnectX-3 InfiniBand Adapter
  - CentOS 6.3
- Lenovo ThinkServer RD530 Database Client
  - Dual Intel E5-2640 CPUs (2.5GHz, 6-cores, 15MB Cache)
  - 64GB RAM (8GB x 8 Micron DDR3, 32GB per CPU)
  - 900GB x 6 Hitachi 10k SAS RAID6 (via LSI 9260-8i)
  - 1 x Mellanox ConnectX-3 InfiniBand Adapter
  - CentOS 6.3
- Lenovo ThinkServer RD630 Storage Host
  - Dual Intel E5-2680 CPUs (2.7GHz, 8-cores, 20MB Cache)
  - 32GB RAM (8GB x 4 DDR3, 16GB per CPU)
  - 100GB Micron RealSSD P400e SSD (via LSI 9207-8i)
  - 1 x Mellanox ConnectX-3 InfiniBand Adapter
  - CentOS 6.3
- Mellanox SX6036 InfiniBand Switch
  - 36 FDR (56Gb/s) ports
  - 4Tb/s aggregate switching capacity
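As a quick consistency check on the figures in the equipment list, the sketch below totals the cluster resources and the switch capacity. Counting both directions of every port toward the aggregate switching capacity is an assumption about how that figure is derived.

```python
# Figures copied from the equipment list above.
NODES = 8
CPUS_PER_NODE = 2
CORES_PER_CPU = 8
RAM_GB_PER_NODE = 128
SWITCH_PORTS = 36
PORT_GBITS = 56


def cluster_totals() -> dict:
    """Total CPUs, cores, and RAM across the database cluster."""
    return {
        "cpus": NODES * CPUS_PER_NODE,
        "cores": NODES * CPUS_PER_NODE * CORES_PER_CPU,
        "ram_gb": NODES * RAM_GB_PER_NODE,
    }


def switch_capacity_tbits(full_duplex: bool = True) -> float:
    """Aggregate switch capacity in Tb/s."""
    # Assumption: the aggregate counts both directions of every port.
    directions = 2 if full_duplex else 1
    return SWITCH_PORTS * PORT_GBITS * directions / 1000


if __name__ == "__main__":
    print(cluster_totals())
    print(switch_capacity_tbits())
```

Under that assumption, 36 ports at 56Gb/s in both directions come to roughly 4Tb/s, in line with the listed switching capacity, and the cluster totals match the sixteen CPUs and 1024GB of RAM stated above.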
The main goal with this platform is highlighting how enterprise storage performs in an actual enterprise environment and workload, instead of relying on synthetic or pseudo-synthetic workloads. Synthetic workload generators are great at showing how well storage devices perform with a continuous synthetic I/O pattern, but they don't take into consideration any of the other outside variables that show how devices actually work in production environments. Synthetic workload generators have the benefit of showing a clean I/O pattern time and time again, but will never replicate a true production environment. Introducing application performance on top of storage products begins to show how well the storage interacts with its drivers, the local operating system, the application being tested, the network stack, the network switching, and external servers. These are variables that a synthetic workload generator simply can't take into account. Capturing them comes at a cost, however: this benchmark is an order of magnitude more resource- and infrastructure-intensive in terms of the equipment required to execute it.
MarkLogic Performance Results
We test a wide range of storage solutions with the MarkLogic NoSQL benchmark that meet the minimum requirements of the testing environment. To qualify for testing, the storage device must have a usable capacity exceeding 1.8TB and be geared towards operating under stressful enterprise conditions. This includes multiple PCIe Application Accelerators, groups of eight or sixteen SAS or SATA enterprise SSDs, as well as large SAN arrays that are network attached. Listed below are the overall latency figures captured from all devices tested to date in this test. In product reviews we dive into greater detail and put competing products head to head while our main list shows the stratification of different storage solutions.