Deploying and Managing Dell Big Data Clusters with StackIQ Cluster Manager

Similar documents
How to Execute a Successful Proof-of-Concept in the Dell Solution Centers

Dell Reference Configuration for Hortonworks Data Platform 2.4

Dell PowerVault NX Windows NAS Series Configuration Guide

Storage Consolidation with the Dell PowerVault MD3000i iscsi Storage

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

Microsoft SharePoint Server 2010 on Dell Systems

A Dell technical white paper By Fabian Salamanca, Javier Jiménez, and Leopoldo Orona

OpenStack and Hadoop. Achieving near bare-metal performance for big data workloads running in a private cloud ABSTRACT

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays

A Dell Technical White Paper Dell Virtualization Solutions Engineering

Dell Reference Configuration for Large Oracle Database Deployments on Dell EqualLogic Storage

vstart 50 VMware vsphere Solution Specification

Reference Architecture for Dell VIS Self-Service Creator and VMware vsphere 4

Four-Socket Server Consolidation Using SQL Server 2008

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

Lifecycle Controller 2 Release 1.0 Version Readme

Dell Storage NX Windows NAS Series Configuration Guide

Active System Manager Release 8.2 Compatibility Matrix

Dell EMC PowerEdge and BlueData

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions

Consolidating OLTP Workloads on Dell PowerEdge R th generation Servers

Dell PowerVault NX1950 configuration guide for VMware ESX Server software

Dell PowerEdge R910 SQL OLTP Virtualization Study Measuring Performance and Power Improvements of New Intel Xeon E7 Processors and Low-Voltage Memory

DELL POWERVAULT MD FAMILY MODULAR STORAGE THE DELL POWERVAULT MD STORAGE FAMILY

Using Dell Repository Manager to Update Your Local Repository

DELL Reference Configuration Microsoft SQL Server 2008 Fast Track Data Warehouse

Reduce Costs & Increase Oracle Database OLTP Workload Service Levels:

vstart 50 VMware vsphere Solution Overview

Introduction to Stacki. Greg Bruno, PhD VP Engineering, StackIQ

Competitive Power Savings with VMware Consolidation on the Dell PowerEdge 2950

DELL STORAGE NX WINDOWS NAS SERIES CONFIGURATION GUIDE

Save hands-on IT administrator time using the Dell EMC OpenManage Essentials profile mobility feature

DELL EMC NX WINDOWS NAS SERIES CONFIGURATION GUIDE

Dell PowerVault MD Family. Modular Storage. The Dell PowerVault MD Storage Family

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays

Deployment of VMware ESX 2.5 Server Software on Dell PowerEdge Blade Servers

VMware VAAI Integration. VMware vsphere 5.0 VAAI primitive integration and performance validation with Dell Compellent Storage Center 6.

TALK THUNDER SOFTWARE FOR BARE METAL HIGH-PERFORMANCE SOFTWARE FOR THE MODERN DATA CENTER WITH A10 DATASHEET YOUR CHOICE OF HARDWARE

Microsoft Lync Server 2010 on Dell Systems. Solutions for 500 to 25,000 Users

Deployment of VMware ESX 2.5.x Server Software on Dell PowerEdge Blade Servers

INFOBrief. Dell-IBRIX Cluster File System Solution. Key Points

Performance Comparisons of Dell PowerEdge Servers with SQL Server 2000 Service Pack 4 Enterprise Product Group (EPG)

Enhancing Oracle VM Business Continuity Using Dell Compellent Live Volume

Exchange Server 2007 Performance Comparison of the Dell PowerEdge 2950 and HP Proliant DL385 G2 Servers

Dell OpenManage Cluster Configurator on Dell PowerEdge VRTX

A Reference Architecture for SAP Solutions on Dell and SUSE

Dell DR4000 Replication Overview

Database Solutions Engineering. Best Practices for running Microsoft SQL Server and Microsoft Hyper-V on Dell PowerEdge Servers and Storage

Deployment of VMware Infrastructure 3 on Dell PowerEdge Blade Servers

A Performance Characterization of Microsoft SQL Server 2005 Virtual Machines on Dell PowerEdge Servers Running VMware ESX Server 3.

Shared LOM support on Modular

Cisco and Cloudera Deliver WorldClass Solutions for Powering the Enterprise Data Hub alerts, etc. Organizations need the right technology and infrastr

Best Practices for Deploying Hadoop Workloads on HCI Powered by vsan

Dell Compellent Storage Center

Teradici APEX 2800 for VMware Horizon View

Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms

iscsi Boot from SAN with Dell PS Series

A Comparative Study of Microsoft Exchange 2010 on Dell PowerEdge R720xd with Exchange 2007 on Dell PowerEdge R510

Dell EqualLogic Red Hat Enterprise Linux 6.2 Boot from SAN

Dell Server Migration Utility (SMU)

MidoNet Scalability Report

Proactive maintenance and adaptive power management using Dell OpenManage Systems Management for VMware DRS Clusters

Microsoft SQL Server 2012 Fast Track Reference Architecture Using PowerEdge R720 and Compellent SC8000

Active System Manager Version 8.0 Compatibility Matrix Guide

Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III

Shared Infrastructure: Scale-Out Advantages and Effects on TCO

2 to 4 Intel Xeon Processor E v3 Family CPUs. Up to 12 SFF Disk Drives for Appliance Model. Up to 6 TB of Main Memory (with GB LRDIMMs)

Implementing SharePoint Server 2010 on Dell vstart Solution

Dell Solution for JD Edwards EnterpriseOne with Windows and SQL 2000 for 50 Users Utilizing Dell PowerEdge Servers And Dell Services

Setting Up the DR Series System as an NFS Target on Amanda Enterprise 3.3.5

Lenovo SAN Manager - Provisioning and Mapping Volumes

Reference Architecture Microsoft Exchange 2013 on Dell PowerEdge R730xd 2500 Mailboxes

An Oracle White Paper December Accelerating Deployment of Virtualized Infrastructures with the Oracle VM Blade Cluster Reference Configuration

Accelerating storage performance in the PowerEdge FX2 converged architecture modular chassis

Leveraging Microsoft System Center Configuration Manager 2007 for Dell Factory Customization

A Principled Technologies deployment guide commissioned by QLogic Corporation

12/04/ Dell Inc. All Rights Reserved. 1

InfoBrief. Dell 2-Node Cluster Achieves Unprecedented Result with Three-tier SAP SD Parallel Standard Application Benchmark on Linux

Cloudera Enterprise 5 Reference Architecture

NFV Infrastructure for Media Data Center Applications

Installing Dell OpenManage 4.5 Software in a VMware ESX Server Software 2.5.x Environment

DriveScale-DellEMC Reference Architecture

Virtualized SQL Server Performance and Scaling on Dell EMC XC Series Web-Scale Hyper-converged Appliances Powered by Nutanix Software

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision

Improving Blade Economics with Virtualization

DELL POWERVAULT NX3500. A Dell Technical Guide Version 1.0

Microsoft SharePoint Server 2010

Lifecycle Controller with Dell Repository Manager

VMware Infrastructure Update 1 for Dell PowerEdge Systems. Deployment Guide. support.dell.com

2014 VMware Inc. All rights reserved.

Setting Up the Dell DR Series System as an NFS Target on Amanda Enterprise 3.3.5

Red Hat Cloud Platforms with Dell EMC. Quentin Geldenhuys Emerging Technology Lead

Dell TM PowerVault TM Configuration Guide for VMware ESX/ESXi 3.5

An Oracle White Paper June Enterprise Database Cloud Deployment with Oracle SuperCluster T5-8

Dell EMC SAP HANA Appliance Backup and Restore Performance with Dell EMC Data Domain

Dell Compellent Storage Center. Microsoft Server 2008 R2 Hyper-V Best Practices for Microsoft SCVMM 2012

FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC

Configuring Direct-Connect between a DR Series System and Backup Media Server

INFOBrief. Dell PowerEdge Key Points

Teradata Aster Database Platform/OS Support Matrix, version AD

Transcription:

Deploying and Managing Dell Big Data Clusters with StackIQ Cluster Manager A Dell StackIQ Technical White Paper Dave Jaffe, PhD Solution Architect Dell Solution Centers Greg Bruno, PhD VP Engineering StackIQ, Inc. Tim McIntire Co-Founder StackIQ, Inc.

Executive Summary The Dell Intel Cloud Acceleration Program, a team within the Dell Solution Centers, provides customer engagements on the topics of Cloud and Big Data including proof-ofconcept (POC) testing in the Dell Solution Center lab. In a recent POC for a large provider of financial data, Dell partnered with StackIQ to configure a 60-datanode cluster to compare two different Big Data technologies, Cassandra and HBase. Using StackIQ s Cluster Manager software the customer was able to rapidly provision and re-provision the servers to run the two different applications with various configurations. The use of Cluster Manager enabled the customer to complete more and higher quality tests than originally expected. THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND. 2013 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell. Dell, the DELL logo, and the DELL badge are trademarks of Dell Inc. Intel and Xeon are registered trademarks of Intel Corp. Red Hat is a registered trademark of Red Hat Inc. Linux is a registered trademark of Linus Torvalds. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own. StackIQ Cluster Manager includes software developed by the Rocks Cluster Group at the San Diego Supercomputer Center at the University of California, San Diego and its contributors. June 2013 2 Deploying Dell Big Data Clusters with StackIQ

Table of contents 1 Introduction... 4 2 Dell s Big Data Cluster... 5 3 StackIQ Cluster Manager... 8 4 Installing and Managing the Mixed-Use Cluster... 9 Step Zero... 9 Meeting the Customer Specifications... 9 Automated Scaling... 11 Dynamic Repurposing... 11 Advanced Diagnostics and Repair...12 Success...12 To Learn More...12 Tables Table 1. PowerEdge R720 Infrastructure Node Configuration... 7 Table 2. PowerEdge R720XD Datanode Configuration... 7 Table 3. Cassandra/HBase Tests... 11 Figures Figure 1. Dell 60-Datanode Big Data Cluster... 5 Figure 2. Cluster networking diagram... 6 Figure 3. StackIQ Cluster Manager Monitor Tab... 10 Figure 4. StackIQ Cluster Manager Attributes Tab... 10 3 Deploying Dell Big Data Clusters with StackIQ

1 Introduction The Dell Intel Cloud Acceleration Program (DICAP), a team within the Dell Solution Centers, has the mission of providing customer engagements on the topics of Cloud and Big Data. These engagements can be short briefings, longer architectural design sessions, or one to two-week proof of concept tests in which the customer may bring in their own data or workload to test the Dell solution. For Big Data proof of concepts, the DICAP team maintains a laboratory on its Round Rock, Texas, campus with several hundred Dell PowerEdge R-series and C-series servers that function as management and data nodes for Big Data clusters. A typical Big Data proof of concept involves using Dell tools to stand up a Hadoop cluster with 20, 40 or even more data nodes. Since the Dell customer set has many varied needs in the Big Data space it is often necessary to stand up a cluster with a specialized application, or even to build a cluster that may be repurposed quickly from one application to another. This was the case recently when a large financial data provider was interested in the performance scaling of two competing NoSQL databases: Apache Cassandra and Apache Hadoop s HBase. The provider wanted to run tests on clusters of 10, 20 and 40 data nodes with one application and then quickly switch and run the second application on the same hardware. For cases like this Dell has a strong partnership with StackIQ, whose Cluster Manager product provided the bare metal installation of the cluster nodes, followed by the installation and management of one application, and then the rapid changeover to the other application. Together the team of Dell, StackIQ, DataStax (provider of the Apache Cassandra distribution) and Cloudera (Apache Hadoop provider) was able to provide a large, flexible, test cluster to meet the customer s needs. The Dell Big Data cluster hardware used in the test is described in the next section, followed by details of StackIQ s Cluster Manager. Finally, the use of Cluster Manager to successfully implement the proof of concept is described in Section 4. 4 Deploying Dell Big Data Clusters with StackIQ

2 Dell s Big Data Cluster The cluster used for the Cassandra vs. HBase testing is shown below. Three 52-rack unit tall racks of Dell PowerEdge servers and Dell Force 10 network switches were utilized in the tests. The cluster was managed by three PowerEdge R720 servers with one running the StackIQ Cluster Manager software and two serving as name nodes for the HBase tests (see Table 1 for configuration details). Figure 1. Dell 60-Datanode Big Data Cluster 5 Deploying Dell Big Data Clusters with StackIQ

Sixty PowerEdge R720XD servers with 24 500 GB disks each (see Table 2 for configuration) were employed as data nodes for HBase and Cassandra as well as load drivers for the customer s test application. All servers were connected via 1 Gb Ethernet links as well as 10 Gb Ethernet links using a set of Dell Force 10 S60 1 GbE switches and S4810 10 GbE switches. As shown below, the 1 GbE network provides server management as well as idrac (Dell Remote Access Controller) connectivity while the 10 GbE network serves as the data network. The two Force10 S60 switches within each rack are connected via stacking cables and uplinked to the 10 GbE infrastructure running on six stacked Force10 S4810 switches, two per rack. The S4810 switches are connected using two 40GbE ports per switch. Figure 2. Cluster networking diagram 6 Deploying Dell Big Data Clusters with StackIQ

Table 1. PowerEdge R720 Infrastructure Node Configuration Height 2 Rack Units (3.5 ) Processor 2x Intel Xeon E5-2650 2 GHz 8-core procs Memory 128 GB 1600 MHz Disk 6x 600 GB 15K SAS Drives Network 4x 1GbE LOMs, 2x PCIe 10GbE NICs RAID Controller PowerEdge RAID Controller H710 (PERC) Management Card Integrated Dell Remote Access Controller Enterprise Edition (idrac) Table 2. PowerEdge R720XD Datanode Configuration Height 2 Rack Units (3.5 ) Processor 2x Intel Xeon E5-2667 2.9 GHz 6-core procs Memory 64 GB 1600 MHz Disk 24x 500GB 7200 RPM Nearline SAS drives Network 4x 1GbE LOMs, 2x 10GbE PCIe NICs RAID Controller PowerEdge RAID Controller H710 (PERC) Management Card Integrated Dell Remote Access Controller Enterprise Edition (idrac) 7 Deploying Dell Big Data Clusters with StackIQ

3 StackIQ Cluster Manager StackIQ takes a software defined infrastructure approach to provisioning and managing clusters. The product used herein, StackIQ Cluster Manager, was built around the Rocks project, which began in 2000 at the San Diego Supercomputer Center (a research division of the University of California) with the single purposed goal of "making clusters easy". While Rocks initially focused on High Performance Computing, the underlying structure of the software allows it to easily define, deploy, and manage all types of Linux-based servers. The unique experience this project has had with clustered systems makes the Rocks Cluster Distribution even more interesting today for groups looking to deploy big data and cloud infrastructure. This paper discusses some of the enhancements StackIQ has made to the software for enterprise use, including a step-by-step description of a real-world scenario for installing, scaling, and managing Apache HBase and Apache Cassandra with StackIQ Cluster Manager. StackIQ Cluster Manager has been used in similar deployments between 2 nodes and 4,000+ nodes. StackIQ Cluster Manager manages all of the software that sits between bare metal and a cluster application, such as Hadoop. A dynamic database contains all of the configuration parameters for an entire cluster. A cluster-aware management framework leverages this database to define server configuration, deploy software, manage cluster services, and monitor the environment. Specific features of the Cluster Manager include: Provisioning and managing the operating system from bare metal Configuring host-based network settings throughout the cluster Leveraging hardware resource information (such as CPU, memory, and disk layout) to set cluster application parameters Setting up disk controllers and using this information to programmatically partition disks for specific cluster services Installing and configuring a cluster monitoring system Providing a unified interface (CLI and GUI) for controlling and monitoring all of the above In addition to managing the underlying cluster infrastructure, StackIQ Cluster Manager s capabilities also handle the day-to-day operation of cluster services. Other competing products fail to integrate the management of the cluster stack with the application stack. By contrast, StackIQ leverages its knowledge and control of the underlying cluster infrastructure to manage cluster services such as HDFS, MapReduce, HBase, and Cassandra. Some benefits of this integration include: Faster time to value through automation Consistent, dependable application deployment and management Simplified operation that can be quickly learned without cluster experience Reduced downtime due to configuration errors Reduced total cost of ownership Increased flexibility from the ability to quickly repurpose resources 8 Deploying Dell Big Data Clusters with StackIQ

4 Installing and Managing the Mixed-Use Cluster Step Zero Most installation instructions for cluster applications start with the assumption that a running cluster is already in place, thereby skipping over the complex, time-consuming process of building and managing cluster infrastructure. Since these instructions generally start with Step 1, StackIQ refers to the actual first step as Step Zero. For the Apache Cassandra and Apache HBase tests the StackIQ Cluster Manager node was installed from bare-metal (i.e. there was no pre-requisite software and no Operating System was previously installed) by burning the StackIQ Cluster Core Roll ISO to DVD and booting from it (the StackIQ Cluster Core Roll can be obtained from the Rolls section after registering at http://www.stackiq.com/download). The Core Roll led the user through a few simple forms (e.g., set the IP address of the Cluster Manager, set the gateway, DNS server, etc.) and then asked for a base OS DVD (for this installation, Oracle Linux 6.3 was supplied; similar distributions such as Red Hat Enterprise Linux & CentOS are supported as well). The installer copied all the bits from both DVDs and automatically created a new Oracle Linux distribution by blending the packages from all Rolls together. The user can also optionally select additional Rolls for the cluster; in this case, the Cloudera Roll and the Cassandra Roll were used, which were purpose built for the end-user. After the Cluster Manager was installed and booted (in about 30 minutes), it was put into "discovery" mode using the GUI and all backend nodes were set to PXE-first and booted. The Cluster Manager discovered and installed each backend node in parallel (in about 15 minutes) -- no manual steps were required. As shown in Figure 3, the default view from StackIQ Cluster Manager provides an interface to cluster-wide monitoring and management. The left-hand pane is used to change the view from Global, to Appliance, Rack, or Host, each of which provide context sensitive tabs for monitoring and management. Meeting the Customer Specifications Of the 60 data nodes used in the tests 20 were dedicated to running the customer s workload generator program. The remaining 40 were configured with either Cassandra or the Cloudera Distribution of Hadoop (CDH) with HBase, using the StackIQ Cluster Manager (StackIQ CM) to meet the customer s specifications. This means this configuration could be changed or automatically replicated on any cluster managed by a StackIQ CM. For example, if the current StackIQ CM node were to fail, one could build a new StackIQ CM from bare metal and it would contain all customer-specific settings on first boot. An initial 10-node Cassandra instance and an initial 10-node HBase instance were installed concurrently on separate racks by the StackIQ CM. As shown in Figure 4, the "Attributes" tab provides the ability to add, remove, or change values in the StackIQ database that are used for application and service configuration. This tab is present for Global, Appliance, and Host views so the user can choose the appropriate granularity of each attribute. 9 Deploying Dell Big Data Clusters with StackIQ

Figure 3. StackIQ Cluster Manager Monitor Tab Figure 4. StackIQ Cluster Manager Attributes Tab 10 Deploying Dell Big Data Clusters with StackIQ

Automated Scaling Both instances went through a "step-up" procedure where 10 nodes were incrementally added via the StackIQ CM until the node count reached 20. Both Cassandra and HBase were "stepped-up. When the 20-node tests completed, the 20 HBase nodes were converted into Cassandra nodes via reprovisioning using the StackIQ CM. The Dell PowerEdge RAID Controllers (PERC) were automatically reconfigured for Cassandra (a single RAID 0 boot disk and 12 drives configured as a RAID 10). The new 20-nodes were "perfectly shuffled" into the existing 20-node Cassandra instance to bring the instance up to 40 nodes with the StackIQ CM. A perfect shuffle is the fastest way in which to expand an existing Cassandra instance. Dynamic Repurposing When the customer completed their 40-node Cassandra testing, the 40 Cassandra nodes were flipped into HBase nodes. Using the StackIQ CM, a bare-metal reinstallation of all 40-nodes was kicked off in order to automatically and correctly optimize the underlying software stack and PERCs for HBase. It took 1 hour and 38 minutes to completely reconfigure the system and bring a 40-node instance of HBase online. This required configuring and starting a 40-node HDFS instance (4 commands), a 40-node MapReduce instance (3 commands), a 3-node ZooKeeper instance (3 commands) and a 40-node HBase instance (3 commands). During the switchover, StackIQ Cluster Manager coordinated the parallel reconstruction of almost a quarter petabyte of disk from RAID10 to RAID Bunch of Disks (RBOD) formatted with XFS. After the customer completed their 40-node HBase testing the 40 HBase nodes were flipped back into Cassandra nodes with a third disk configuration, again using the StackIQ CM to perform a bare-metal reinstallation of all 40-nodes in order to automatically and correctly configure the PERCs to the customer's new specifications. During the switchover, StackIQ Cluster Manager again coordinated the parallel reconstruction of almost a quarter petabyte of disk in under two-hours (note: had the cluster contained 400-nodes, rather than 40-nodes the time would have stayed approximately the same due to the parallel nature of StackIQ s installation and management system). See Table 3 for a complete listing of all tests performed. Table 3. Cassandra/HBase Tests Test Workload Driver Nodes Cassandra Nodes HBase Nodes 1 20 10 10 2 20 20 20 3 20 40 0 4 20 0 40 5 20 40 0 11 Deploying Dell Big Data Clusters with StackIQ

Advanced Diagnostics and Repair After the 10-node HBase instance was running, the customer s engineers decided to try a different caching policy on the virtual disks storing the test data. The StackIQ CM was used to change each of the virtual disks from write-through mode to write-back mode, in parallel. Additionally, the new settings were stored in the StackIQ CM local database so when a node is installed, its PERC will be automatically configured with these new settings (for existing nodes and for any new nodes assigned to run HBase). During the 40-node test, one node suffered a hardware failure, as expected in any largescale environment, which caused the Cassandra service on that node to fail. That node was removed from the Cassandra configuration and a spare node was installed as a Cassandra node and then added to the Cassandra instance. Again, the procedure was fully automated with the StackIQ Cluster Manager. Success At the end of the proof-of-concept tests, the customer commented that the rapid redeployment capabilities of the StackIQ Cluster Manager enabled them to: 1) experiment with more configurations than they thought possible in the available time frame; and 2) completely focus on testing Cassandra and HBase rather than spend time thinking about how to deploy and configure the services. In other words, the joint Dell StackIQ solution successfully enables customers to drive more value from their investment and focus on results; a win for all parties. To Learn More For more information on installing and using the StackIQ Cluster Manager, please visit http://www.stackiq.com/support or http://www.youtube.com/stackiq. The StackIQ Cluster Manager demonstration video is a good place to start: http://www.youtube.com/watch?v=gvpzca-yhqy To learn more about Dell and Hadoop: www.dell.com/hadoop To learn more about the Dell Solution Centers: www.dell.com/solutioncenters 12 Deploying Dell Big Data Clusters with StackIQ