AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS
PROOF OF CONCEPT

By Joshua Kwedar, Sr. Systems Engineer
By Steve Horan, Cloud Architect
ATS Innovation Center, Malvern, PA
Dates: October through December 2017
INTRODUCTION

As an IBM premier business partner, we are always looking for creative ways to meet our customers' needs. The demand for low-cost, quickly deployed solutions is ever increasing in today's cloud-first climate. Many of our customers have invested a significant amount of money in HPC solutions with node counts that can reach into the thousands, and have found themselves locked in to a specific vendor with no quick way to elastically grow or shrink based on the demand of their workloads. In terms of hybrid growth, we've researched IBM Spectrum Scale's Transparent Cloud Tiering feature, which allows an S3 object store to serve as a storage tier within the same Scale namespace. While customers see the value in growing their storage footprint on demand, a subset of them view this feature as a stopgap on the way to their ultimate goal of moving their storage, as well as their compute, into the cloud. We have built Scale clusters across multiple cloud providers (Google Cloud Platform, Amazon Web Services), but have focused on AWS after IBM's Spectrum Scale on AWS Quick Start trial evaluation was released in September 2017.

THE GOALS OF THE POC WERE THE FOLLOWING:
- Review the AWS / Spectrum Scale architecture layout.
- Create a Spectrum Scale cluster using IBM's Quick Start guide.
- Review CloudFormation templates and scripts to better understand the steps needed to automate a cluster build in AWS.
- Execute failure scenarios and observe behavior.
- Determine any restrictions in the trial release.
- Improve CloudFormation templates, scripts, and restrictions based on observations.
- Create customized AMIs with updated Spectrum Scale versions.
A Spectrum Scale cluster deployed in AWS consists of the following components:

1. A VPC (Virtual Private Cloud) is defined within AWS. All instances exist in the VPC. Deployment templates allow a cluster to be deployed within an existing VPC, or for a new VPC to be created. The VPC defaults to spanning two availability zones.

2. A single Bastion host is created within a single availability zone in a public subnet. Think of a Bastion host as a jump server or an admin server. It serves no function within the Spectrum Scale cluster other than as a means to SSH into cluster nodes from outside of the AWS VPC.
   a. Note that one of the Bastion hosts is greyed out in the architecture diagram. The Bastion stack has autoscaling configured by default to ensure there is always at least one Bastion host up and running. Without an accessible Bastion host, you would be unable to SSH to any cluster nodes.

3. NAT gateways are defined in the public subnet (Bastion stack) to allow outbound internet access for nodes in the private subnet (Server and Compute instances).

4. An AWS Identity and Access Management (IAM) role and security groups are automatically created to open ports for SSH and the Spectrum Scale daemon.
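The Quick Start creates the IAM role and security groups for you. For readers reproducing the network rules by hand, here is a minimal boto3 sketch, assuming the GPFS daemon listens on its default TCP port 1191; the VPC ID, group name, and CIDR ranges are illustrative placeholders, not values taken from the template.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-2")

    # Illustrative VPC ID; the Quick Start creates or reuses the VPC for you.
    VPC_ID = "vpc-0123456789abcdef0"

    sg = ec2.create_security_group(
        GroupName="scale-cluster-sg",  # illustrative name
        Description="SSH plus Spectrum Scale daemon traffic",
        VpcId=VPC_ID,
    )

    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[
            # SSH, reachable from the public (Bastion) subnet only.
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "10.0.128.0/20"}]},
            # GPFS daemon port (1191 by default) between cluster nodes.
            {"IpProtocol": "tcp", "FromPort": 1191, "ToPort": 1191,
             "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        ],
    )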
5. The IBM Quick Start allows users to specify EC2 instance types and quantities for the Server (NSD Server) and Compute nodes to be configured as part of the cluster.
   a. NSD Servers: minimum = 2, maximum = 64
   b. Compute nodes: minimum = 1, maximum = 64

6. One 100GB disk is allocated per EC2 instance to be used as the root volume. Users can specify the size, quantity, and type of EBS volumes to be used as NSDs (Network Shared Disks). The disk sizes currently supported by the template range from 10GB to 16384GB. EBS volumes for NSD use can only be allocated as gp2 (general purpose SSD), io1 (high performance SSD), or standard (HDD). If a user specifies 2 NSD servers and 1 compute node, a 5GB EBS volume is allocated to the compute node to account for quorum.

7. A filesystem name, block size (all supported Spectrum Scale block sizes can be specified), and number of replicas (max 2) must be provided within the template. The filesystem's default number of replicas and NSD failure group definitions are automatically configured based on user inputs.
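These template inputs can also be supplied programmatically when launching the stack. The boto3 sketch below is illustrative only: the template URL and parameter keys are assumptions, not the Quick Start's actual parameter names, so consult the template itself before adapting it.

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-2")

    # All parameter keys and the template URL below are placeholders;
    # the Quick Start's template defines the real names.
    cfn.create_stack(
        StackName="spectrum-scale-poc",
        TemplateURL="https://s3.amazonaws.com/my-bucket/ibm-spectrum-scale.template",
        Parameters=[
            {"ParameterKey": "ServerNodeCount",  "ParameterValue": "2"},
            {"ParameterKey": "ComputeNodeCount", "ParameterValue": "2"},
            {"ParameterKey": "EBSType",          "ParameterValue": "gp2"},
            {"ParameterKey": "BlockSize",        "ParameterValue": "16M"},
            {"ParameterKey": "DataReplica",      "ParameterValue": "2"},
        ],
        Capabilities=["CAPABILITY_IAM"],  # the stack creates IAM resources
    )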
Notes Regarding Architecture

- The template creates a synchronous, highly available Spectrum Scale cluster across two availability zones, but does not account for third-site quorum. A single site will always hold a majority of the quorum definition when an odd number of quorum nodes is specified using this architecture.
- Only a single EBS volume type can be specified for use. Spectrum Scale allows users to split metadata from data, and metadata is often placed on faster volumes (io1) for fast response during metadata lookups. The cloud template places metadata and data on the same volumes and sets the maximum/default replicas to 1 or 2 based on user input (the stanza sketch after these notes shows how such a split could be expressed).
- The maximum number of replicas for Spectrum Scale filesystems is 3. The template only allows for 2, as only two availability zones can be specified during cluster creation.
- Autoscaling groups are created for the Bastion, Server, and Compute stacks, but need additional configuration to take any action beyond satisfying the minimum number of nodes within each stack. For example, a CPU utilization threshold needs to be user defined.
- There is no input for the GPFS cluster name.
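For those customizing the templates, splitting metadata from data is expressed at NSD creation time through stanza files passed to mmcrnsd. The short sketch below generates such a stanza file, placing metadata on io1-backed devices and data on gp2-backed devices; all device names, server names, and failure groups are illustrative placeholders rather than Quick Start output.

    # Generates an illustrative stanza file for mmcrnsd; every device and
    # server name below is a placeholder.
    nsds = [
        # (device, NSD name, server, usage, failure group)
        ("/dev/xvdf", "nsd_2a_meta_1", "ip-10-0-1-208", "metadataOnly", 1),
        ("/dev/xvdg", "nsd_2a_data_1", "ip-10-0-1-208", "dataOnly",     1),
        ("/dev/xvdf", "nsd_2b_meta_1", "ip-10-0-3-140", "metadataOnly", 2),
        ("/dev/xvdg", "nsd_2b_data_1", "ip-10-0-3-140", "dataOnly",     2),
    ]

    with open("nsd_stanzas.txt", "w") as f:
        for device, name, server, usage, fg in nsds:
            f.write(
                f"%nsd: device={device} nsd={name} servers={server} "
                f"usage={usage} failureGroup={fg}\n"
            )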
The Following Inputs Were Provided to the Template in Order to Create a Test Cluster:

- A filesystem block size of 16M is selected. Allowable values are: 256k, 512k, 1M, 2M, 4M, 8M, 16M.
- The minimum number of NSD Servers (2) and two Compute nodes are selected for testing purposes.
- VPC, Private, and Public CIDR block entries in the Network section of the form are prepopulated but can be changed if desired. A user must select at least 2 availability zones and define an External CIDR block. For the purposes of testing, I have specified 0.0.0.0/0 to allow all public traffic. In an actual implementation, you would specify a corporate network CIDR block range.
- A key pair name, S3 bucket, and operator email must be supplied. Other values are prepopulated but can be modified. While the Bastion instance type can be changed, it is simply a jump/admin server and does not need to be configured with any substantial amount of resources.
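If a key pair does not already exist, one can be created ahead of time with boto3; the key name below is illustrative.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-2")

    # Create a key pair for the cluster and save the private key locally.
    kp = ec2.create_key_pair(KeyName="spectrum-scale-poc-key")
    with open("spectrum-scale-poc-key.pem", "w") as f:
        f.write(kp["KeyMaterial"])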
The remaining default options were accepted. After a review of the user-supplied inputs, the overall stack can be created. The progress of each individual stack, with timestamps, can be monitored within the AWS console.
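Rather than watching the console, stack completion can also be awaited programmatically with boto3's built-in CloudFormation waiter; the stack name below matches the illustrative one used earlier.

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-2")

    # Blocks until every stack resource reports CREATE_COMPLETE,
    # raising an exception if creation fails or rolls back.
    waiter = cfn.get_waiter("stack_create_complete")
    waiter.wait(StackName="spectrum-scale-poc")
    print("Stack is up; cluster nodes are ready for SSH.")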
Once the status of each stack is CREATE_COMPLETE, the EC2 instances containing a fully functioning Spectrum Scale filesystem are accessible. Creation time for each stack depends on the user-supplied inputs. In our example, the entire process took ~10 minutes. Instance/cluster creation time increases as the number of nodes and disks increases. EC2 instance information can be viewed by accessing the AWS console -> EC2 Instances. Our cluster consists of the following hosts:

A column titled Public IP shows that only the LinuxBastion host has been assigned a public address. The Bastion host can be accessed using the ec2-user account and passing your AWS key to the Bastion IP. From there, you can SSH to any Server/Compute node. From a cluster node, we can view the output of mmlscluster and mmlsnsd to review the cluster name, repository type, node names, node designations, NSD names, NSD-to-filesystem allocations, and NSD servers per NSD.
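Locating the Bastion's public address and hopping through it can be scripted as well. The sketch below assumes the Bastion instance carries a Name tag containing "LinuxBastion" (an assumption based on the host name above) and prints an OpenSSH ProxyJump command; the key file and target private IP are illustrative.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-2")

    # Assumes the Bastion's Name tag contains "LinuxBastion", matching
    # the host name shown in the console view above.
    resp = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Name", "Values": ["*LinuxBastion*"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    bastion_ip = resp["Reservations"][0]["Instances"][0]["PublicIpAddress"]

    # Jump through the Bastion to a private cluster node (OpenSSH 7.3+).
    node_ip = "10.0.1.208"  # illustrative private address
    print(f"ssh -i spectrum-scale-poc-key.pem "
          f"-J ec2-user@{bastion_ip} ec2-user@{node_ip}")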
Based on this output, we can determine that one NSD server and one Compute node are placed in the 10.0.1.X private network (Availability Zone us-east-2a), and the other NSD server and compute node are placed in the 10.0.3.X private network (Availability Zone us-east-2b). Note that the two NSD servers and a single compute node are designated as quorum nodes. A single 5GB descOnly disk is allocated to compute node ip-10-0-1-132.us-east-2.compute.internal to serve as a quorum disk. However, if availability zone us-east-2a were to go offline, the filesystem would be inaccessible due to loss of quorum. The script defines NSDs with a naming convention that includes the availability zone, making it much easier for users to determine which zone an NSD resides in. In this example, nsd_2a_1_0 is served out by NSD server ip-10-0-1-208.us-east-2.compute.internal from availability zone us-east-2a.
Total filesystem size is 20G (2 x 10G disks, one per NSD Server) with a replication factor of 2 for data and metadata. The filesystem is created with its maximum number of data and metadata replicas set to 3, the most Spectrum Scale allows, even though the replication factor in use is 2.

Autoscaling groups can be viewed by navigating to Auto Scaling -> Auto Scaling Groups within the EC2 instance view. In our example, three autoscaling groups are created for the Bastion, Server, and Compute stacks. Each stack has a minimum, desired, and maximum definition derived from the user inputs supplied in the CloudFormation template. Other autoscaling rules, tied to CPU or memory utilization for example, can be manually defined; a sketch of one such policy follows.
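As one example, a target-tracking policy that scales the Compute group on average CPU utilization can be attached with boto3; the group name and target value below are illustrative.

    import boto3

    asg = boto3.client("autoscaling", region_name="us-east-2")

    # Keep the Compute group's average CPU near 70%; AWS adds or removes
    # instances (within the group's min/max bounds) to hold that target.
    asg.put_scaling_policy(
        AutoScalingGroupName="ComputeStack-asg",  # illustrative name
        PolicyName="compute-cpu-target",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 70.0,
        },
    )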
A basic test of terminating a compute node demonstrates the functionality of the autoscaling groups that were created by the template. When a compute node is terminated, a new compute instance spins up to satisfy the rules of the Compute stack autoscaling group. Once the new instance is ready for use, we can SSH to the server and verify that it has just been built and that the AMI containing the Spectrum Scale packages was used to build the instance. However, the GPFS daemon is not running.
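A quick way to confirm the daemon's state from the new instance is mmgetstate; a hedged wrapper is shown below (GPFS commands are installed under /usr/lpp/mmfs/bin by default).

    import subprocess

    # On a freshly autoscaled instance, check whether the GPFS daemon is up.
    # mmgetstate reports "active" for a healthy cluster member; on a node
    # that was never added to a cluster it exits with an error instead.
    try:
        out = subprocess.check_output(
            ["/usr/lpp/mmfs/bin/mmgetstate"], stderr=subprocess.STDOUT
        )
        print(out.decode())
    except subprocess.CalledProcessError as e:
        print("GPFS daemon not running or node not in a cluster:")
        print(e.output.decode())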
Attempts to run mmlscluster on the newly created compute node show that it does not belong to a Spectrum Scale cluster. The same command run on an existing NSD Server node verifies that the new node has not been added to the cluster, and that the terminated compute node (ip-10-0-3-222.us-east-2.compute.internal) is still a member of the cluster configuration.

At this time, there is no functionality built into the template/stacks to automatically add and remove newly generated instances from the cluster configuration. Nodes would need to be manually added and removed using the mmaddnode and mmdelnode commands. In the event of quorum node loss, new quorum nodes would need to be designated. In the event of NSD server loss, steps would need to be taken to reestablish optimal striping of data across newly created NSDs (mmrestripefs). Users may want to automate certain functions listed above (such as automatically adding new nodes to the cluster, sketched below), while other administrative tasks (mmrestripefs) may be better run manually during a maintenance window.
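As a starting point for that automation, the sketch below is meant to run on an existing cluster node with passwordless SSH already set up between nodes (a GPFS prerequisite). It compares running compute instances against the current cluster membership and adds any newcomers as client nodes; the Name tag filter is an assumption about how the stacks label their instances.

    import subprocess
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-2")
    MMFS_BIN = "/usr/lpp/mmfs/bin"

    # Assumes compute instances carry a Name tag containing "Compute".
    resp = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Name", "Values": ["*Compute*"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    running = {
        inst["PrivateDnsName"]
        for r in resp["Reservations"]
        for inst in r["Instances"]
    }

    # Hosts already in the cluster, per mmlscluster output.
    members = subprocess.check_output([f"{MMFS_BIN}/mmlscluster"]).decode()

    for host in sorted(running):
        if host and host not in members:
            # Add the node as a client, accept the license, start GPFS.
            subprocess.check_call([f"{MMFS_BIN}/mmaddnode", "-N", host])
            subprocess.check_call(
                [f"{MMFS_BIN}/mmchlicense", "client", "--accept", "-N", host]
            )
            subprocess.check_call([f"{MMFS_BIN}/mmstartup", "-N", host])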
With Regard to the Current Version of the AWS Spectrum Scale Trial CloudFormation Template, the Following Restrictions Exist:
- Protocol support, including the use of Cluster Export Services (CES) nodes and protocol access such as Network File System (NFS), Object, and Server Message Block (SMB).
- Active File Management (AFM).
- Transparent Cloud Tiering (TCT).
- Compression.
- Encryption.
- Data Management API (DMAPI) support, including Hierarchical Storage Management (HSM) to tape.
- Hadoop Distributed File System (HDFS) connector support.
- Multi-cluster support (exporting an IBM Spectrum Scale file system from one Spectrum Scale cluster to another IBM Spectrum Scale cluster).
- GUI.
- User namespace management and quota management.
- Snapshots and clones.
- Replication is restricted to 1X (IBM Spectrum Scale makes a single copy of all data and metadata) or 2X (IBM Spectrum Scale makes two copies of all data and metadata).

Additional Limitations
- Using EBS volume encryption for IBM Spectrum Scale file systems is not supported.
- Archiving and restoring IBM Spectrum Scale data through the use of AWS services is not supported.

Many of the limitations above are a result of the edition packaged within the AMI (Standard). ATS has created custom CloudFormation templates that reflect many of the design considerations called out in this document (third-site quorum, maximum data/metadata replicas, splitting of metadata/data volumes, among others), in addition to creating our own AMI images using the Advanced/Data Management edition of Spectrum Scale. Features such as encryption, compression, and AFM are included in these editions. As a next step, we are looking to implement Protocol nodes within AWS, using Active Directory replication as a means of authentication. Implementing an S3 archive tier using Transparent Cloud Tiering would serve as an attractive option for those looking to leverage cheaper storage for archive purposes, all within AWS. We look forward to partnering with our customers to come up with the next generation of Spectrum Scale cluster implementations in the cloud.
THEATSGROUP.COM/COMPANY/CONTACT

About the ATS Group
Since our founding in 2001, the ATS Group has consulted on thousands of system implementations, upgrades, backups and recoveries. We also support customers by providing managed services, performance analysis and capacity planning. With over 60 industry-certified professionals, we support SMBs, Fortune 500 companies, and government agencies. As experts in IBM, VMware, Oracle and other top vendors, we are experienced in virtualization, storage area networks (SANs), high availability, performance tuning, SDS, enterprise backup and other evolving technologies that operate mission-critical systems on premises, in the cloud, or in a hybrid environment.