August 30th, 2019 by Brian Beeler
Microsoft Azure Stack HCI for the Edge (2-Node HCI)
In a previous article we looked at the implementation of Microsoft Azure Stack HCI on DataON hardware. DataON is an Intel premier partner and Microsoft certified hardware vendor that has been integrating Microsoft solutions for the past 20 years to provide enterprises with turnkey Microsoft solutions. In this article we look at a specific implementation of Microsoft Azure Stack HCI: a two-node cluster (2NC).
Benefits of 2-Node HCI
DataON was one of the first vendors to recognize the value of 2NCs and to embrace them. In September of 2017, DataON announced the first two commercially available Kepler-47 hyper-converged infrastructure (HCI) systems for Windows Server 2016 Storage Spaces Direct (now Azure Stack HCI).
One of these systems, the Kepler-47S, was designed for cost-conscious deployments whose storage needs can be met with a hybrid mix of SSD and HDD storage, while the Kepler-47P uses all SSDs and was designed for environments that need more performance. As with all DataON products, they are built using Intel servers and storage, and DataON directly handles the first line of support for both software and hardware.
The Kepler-47 has an interesting heritage, as it was designed as a proof of concept by Microsoft during the development of Storage Spaces Direct. The developers wanted to see if they could create a high-performing, resource-efficient 2NC that would be available at a low price point, and they did. DataON said that a Kepler-47 system (both nodes) can be purchased for under $10K USD and deployed in under 15 minutes. The price point on this cluster is even more striking when you consider that DataON builds it using high-quality Intel servers and storage, not generic or low-cost hardware. It also uses mini-tower server cases, allowing both nodes to be housed in the same space it would take to house a single 2U server.
In order to drive the price down while keeping reliability at enterprise standards, a Kepler-47 cluster has some unique hardware features, such as using Intel’s Thunderbolt 3 over USB Type-C, rather than Ethernet, for the interconnect between the nodes. This eliminates the need for costly high-speed network adapters and switches, while also simplifying deployment and management. Thunderbolt 3 provides up to 40 Gbps of throughput, plenty for replicating storage and live migrating virtual machines from one node to the other.
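To put the 40 Gbps figure in perspective, the short Python sketch below computes a lower bound on how long a transfer would take at that rate. The function name and the 16 GB example size are illustrative assumptions, not figures from DataON, and real-world throughput over the interconnect will be lower than the theoretical link rate.

```python
def min_transfer_seconds(size_gb: float, link_gbps: float = 40.0) -> float:
    """Lower bound on transfer time for size_gb gigabytes over a link_gbps link.

    Assumes decimal units (1 GB = 8 gigabits) and ignores protocol overhead,
    so actual transfers will take longer than this.
    """
    if size_gb < 0 or link_gbps <= 0:
        raise ValueError("size must be non-negative and link rate positive")
    return (size_gb * 8) / link_gbps


# Hypothetical example: a VM with 16 GB of memory to live migrate.
print(min_transfer_seconds(16))  # 3.2
```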
For increased visibility, monitoring, and management, DataON includes its own MUST (Management Utility Software Tool) to provide SAN-like storage monitoring features.
DataON MUST Main View
DataON MUST Alert Setup
DataON MUST Alerts
Two-Node HCI Cluster Management
Having a two-node cluster in a small footprint at a low cost is meaningless, though, if it doesn’t provide adequate resiliency and robustness. DataON’s Kepler-47 systems provide both by using Intel hardware for the servers and leveraging Microsoft software features in their 2NC Microsoft Azure Stack HCI solutions.
A 2NC can tolerate a drive failure and a server failure at the same time. It does this with a RAID 5+1 style layout: parity resiliency within each server, mirrored across to the other server. Microsoft calls this ability “nested resiliency” and added it to Storage Spaces Direct in Windows Server 2019.
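The failure tolerance described above can be illustrated with a toy model in Python. This is only a sketch under a simplifying assumption: each server holds a full copy of the data protected by single parity (so it tolerates one failed drive), and the two servers mirror each other. The function name and inputs are hypothetical and do not reflect how Storage Spaces Direct tracks health internally.

```python
def data_available(server_up, failed_drives):
    """Return True if the data is still readable in the toy model.

    server_up:     list of booleans, one per server (True = online)
    failed_drives: list of ints, failed drive count per server

    Assumed model: a server can serve the data if it is online and has
    lost at most one drive (single parity within the server).
    """
    return any(up and failed <= 1 for up, failed in zip(server_up, failed_drives))


# One server down plus one drive failed in the survivor: still available.
print(data_available([False, True], [0, 1]))  # True

# One server down plus two drives failed in the survivor: unavailable.
print(data_available([False, True], [0, 2]))  # False
```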
Nested resiliency does not require any special RAID hardware and can be implemented in two ways: nested two-way mirror, which offers the best performance, or nested mirror-accelerated parity, which allows for greater data efficiency. Nested two-way mirror keeps a RAID 1 style mirrored copy of the data within each server and mirrors it again on the other node. Nested mirror-accelerated parity also keeps a copy of the data on each server but uses erasure coding, rather than mirroring, for resiliency within the server, except for recent writes, which use two-way mirroring. Nested two-way mirroring has a data efficiency of 25%, as four copies of the data are written to disk, whereas nested mirror-accelerated parity has 33% to 40% data efficiency.
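The efficiency figures above can be reproduced with some simple arithmetic. The Python sketch below is an approximation under stated assumptions: the parity portion is modeled as single parity across all capacity drives in a server, mirrored across two servers, and `mirror_fraction` is the share of the volume's usable capacity placed in the mirror portion. The function names and parameters are illustrative and are not part of any Microsoft API.

```python
def nested_two_way_mirror_efficiency() -> float:
    """Four copies of every block are stored, so one quarter is usable."""
    return 1 / 4


def nested_parity_efficiency(drives_per_server: int) -> float:
    """Single parity across N drives in a server, mirrored on two servers."""
    if drives_per_server < 2:
        raise ValueError("parity needs at least two drives per server")
    n = drives_per_server
    return (n - 1) / (2 * n)


def nested_mirror_accelerated_parity_efficiency(
    drives_per_server: int, mirror_fraction: float
) -> float:
    """Blend the mirror and parity portions of a volume.

    Raw capacity consumed per unit of usable capacity is the sum of each
    portion's usable share divided by that portion's efficiency.
    """
    if not 0 <= mirror_fraction <= 1:
        raise ValueError("mirror_fraction must be between 0 and 1")
    raw_per_usable = (
        mirror_fraction / nested_two_way_mirror_efficiency()
        + (1 - mirror_fraction) / nested_parity_efficiency(drives_per_server)
    )
    return 1 / raw_per_usable


print(f"{nested_two_way_mirror_efficiency():.1%}")                      # 25.0%
print(f"{nested_mirror_accelerated_parity_efficiency(4, 0.10):.1%}")    # 35.7%
print(f"{nested_mirror_accelerated_parity_efficiency(7, 0.10):.1%}")    # 40.0%
```

Under these assumptions, more drives per server and a smaller mirror portion both push efficiency toward the upper end of the quoted range, at the cost of write performance.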
Azure Site Recovery can be used for business continuity and disaster recovery and is integrated into Windows Admin Center. Azure Site Recovery is an Azure service that replicates workloads running on VMs to Azure storage. If a total failure happens or a site goes offline, you can fail over business-critical VMs and run them in the Azure cloud. Azure Site Recovery also supports a sandbox environment for failover testing and Recovery Plans to automate the failover.
One of the benefits of DataON’s 2NC solution is that nodes can be added to it when more capacity is needed. To simplify the addition of nodes, DataON released a switchless 3NC in August of this year.
We haven’t spent a lot of time investigating two-node HCI clusters in the past, as the majority of HCI products require three or four nodes in order to provide the resiliency and robustness required for enterprises. Even in these cases, there’s often a requirement for a cloud-based witness to keep the nodes in sync, or other deployment sacrifices that add the very complexity HCI is trying to eliminate. There are of course other software-defined ways to achieve two-node clusters, but they also mean leaving the comfort of a hardened and well-understood hypervisor. DataON’s two-node solutions not only survive a node failure but also tolerate a drive failure in the surviving node without risk to data.
Although we spent the majority of this article discussing the low-cost Kepler-47 system, DataON showed us that a Kepler-47 system with four NVMe drives per node was able to deliver 600K IOPS and cost less than $40K. Alternatively, Azure Stack HCI can be deployed on traditional rack-mount servers as well, for those environments that have the room or reasonably expect to expand storage or compute in the future. It can go the other way too: the 4-node cluster we are reviewing could easily be converted to a 2-node cluster, switched or switchless, should there be a need to do so.
Two-node HCI clusters are clearly not the best solution for all use cases. However, for those that require a small footprint, low-complexity, low-cost solution that doesn’t give up operational flexibility and reliability, DataON’s Azure Stack HCI two-node offerings are worthy of consideration.