November 17th, 2014 by Adam Armstrong
Seagate ClusterStor Hadoop Workflow Accelerator And ClusterStor Engineered Solution For Lustre Announced
Today Seagate Technology announced the ClusterStor Hadoop Workflow Accelerator, a new solution that provides tools, services, and support for High Performance Computing (HPC) customers. The Hadoop Workflow Accelerator will leverage and enhance the performance of ClusterStor, a scale-out storage system designed for Big Data analytics. Seagate also announced its new ClusterStor Engineered Solution for Lustre. This next-generation solution offers metadata performance improvements of up to 700%.
The Seagate ClusterStor Hadoop Workflow Accelerator will support Hadoop distributions based on open source Apache Hadoop. The Hadoop Workflow Accelerator significantly reduces data transfer times in computationally intensive High Performance Data Analytics environments and will serve both the technical computing and commercial sides of Big Data computing. Hadoop data processing can begin immediately at the start of each job, reducing time to results and eliminating the bulk copying of large amounts of data from a separate data repository. The solution also includes the Hadoop on Lustre Connector, which allows both Hadoop and HPC Lustre clusters to use exactly the same data without having to move it between file systems or storage devices.
Features of Seagate ClusterStor Hadoop Workflow Accelerator include:
- Tests show marked improvements for Apache Hadoop 1.0 distributions over standard storage configurations: Hadoop Workflow Accelerator outperformed Hadoop on HDFS by up to 38%.
- Includes the Seagate-developed Hadoop on Lustre Connector and an array of ClusterStor performance optimization best practices, system tuning methods, installation and configuration management tools, and professional services.
- Expanding compatibility, the Seagate-engineered family of Hadoop on Lustre Connectors extends support to several Hadoop ecosystem packages such as Mahout, Hive and Pig, which can take advantage of the parallel read/write performance of the Lustre file system operating with high-speed networks such as 40 GbE and InfiniBand.
- Compatible with both Hadoop 1.0 and Hadoop 2.0 (YARN) distributions
- Requires no code changes or re-compiling of either Hadoop or Lustre systems.
- Compatible with existing HDFS-based Hadoop installations. There’s no need to migrate data to Seagate ClusterStor prior to using the Accelerator, as users can read from or write to ClusterStor and HDFS interchangeably while running Hadoop jobs.
The new ClusterStor Engineered Solution for Lustre is the only fully engineered solution integrating all aspects of hardware, software and full Lustre support for the latest version of the Lustre parallel file system.
The benefits of this new solution are:
- Improved metadata performance and scalability through implementation of Distributed Namespace Environment (DNE) features in the Lustre 2.5 parallel file system.
- ClusterStor customers now have the option to add up to 16 Lustre DNE metadata servers per single file system, resulting in client metadata performance improvements of up to 700%
- Expanded scalability up to 16 billion files per file system
- Scalability advancements supporting larger clusters
- Improved security capabilities including government security compliance
- Ease of deployment for upcoming Lustre releases
Seagate also announced the ClusterStor Secure Data Appliance (SDA). ClusterStor SDA uses the Kerberos network authentication protocol as its foundation, providing a file system framework for Kerberos-based encryption key management solutions that encrypt network data traffic between the compute client and the storage system. Seagate ClusterStor SDA is the industry’s first compliant and secure scale-out parallel file system solution, providing the means to implement all the necessary Mandatory Access Control (MAC), explicit audit logging and tracking, encryption and support capabilities to enforce “least privilege” access control.
Attributes of SDA with Kerberos enablement include:
- ICD 503 Multi-Level Security for geospatial imagery data capture and analytics
- Performance capability to capture and analyze large image files, providing faster time to results with increased decision support accuracy.
- Framework to enable customized compute-client-to-disk encryption and key management solutions
Seagate ClusterStor Hadoop Workflow Accelerator is expected to be available in January 2015 as a set of distinct product bundles.
Seagate's new ClusterStor Engineered Solution for Lustre is available now.
Seagate ClusterStor SDA with Kerberos is scheduled to be available next month.