
Abstract

Clustering solutions are frequently used in large enterprise and mission-critical applications with high performance and availability requirements. This is achieved by deploying multiple servers for load balancing, or by having redundant servers for high availability. Most cluster management tools focus on the availability of servers and IP connectivity and rely on shared network storage, such as Network Attached Storage (NAS) or a Storage Area Network (SAN), for storage resiliency. Modern enterprise storage systems are complex, with millions of possible configurations. For end-to-end performance and availability, the configuration of storage resources and the interconnection fabric needs to be carefully planned; ad-hoc configuration may lead to poor availability, degraded performance, or higher cost. In this paper, we discuss the challenges associated with provisioning storage resources in a cluster environment and present a list of best practices and novel optimization strategies to make storage more resilient for clusters. We have built a tool called Casper that automates the planning, optimization, and deployment of shared network storage for clusters. Casper distributes the storage load across different storage nodes and across different network paths to ensure load balancing and prevent single points of failure.

Introduction

Many enterprise and scientific applications rely on clustering solutions for their high performance and high availability needs. These applications range from commercial databases and application servers to large parallel applications such as scientific simulations and weather forecasting. Clusters provide a low-cost alternative to a single computer of similar capability. Based on their type and configuration, clusters can be broadly grouped into three categories:

High Availability (HA) clusters: The clustering software aims to reduce service downtime by switching to one of the stand-by nodes when a hardware or software component fails.

Load-balancing clusters: Multiple server instances provide the same set of services, and the load is distributed across all of them.

Compute clusters: These are typically used by large parallel applications rather than transactional applications. The application or data processing is sub-divided into smaller components or datasets, and the tasks are distributed across multiple compute nodes for processing; the partial results may later be combined for further processing.

Different applications have different requirements, and these clusters need to be appropriately provisioned and configured to deliver the required level of performance and availability. Multiple challenges arise in provisioning resources for clusters. First, the overall behavior of the cluster depends on the end-to-end configuration of multiple components, including the applications, servers, network, and storage. Second, even though clustering software can detect failures in software and hardware components and take appropriate actions (such as moving workload to stand-by nodes and rebalancing load), there are no management tools that perform complete end-to-end configuration of the cluster resources. This is typically done by highly skilled administrators using rules of thumb or back-of-the-envelope calculations. Such manual methods tend to be error-prone, time-consuming, and tedious, and may also overprovision resources, which increases cost. Third, many cluster applications require access to shared storage, which gives them a consistent view of their data from all nodes. The most common storage technologies used in cluster environments are Network Attached Storage (NAS), which provides file-based access, and Storage Area Networks (SAN), which provide block-based access to storage. Modern NAS and SAN are complex distributed systems with a large number of possible configurations, which determine the overall application performance, availability, and other characteristics. There are commercially available tools, such as IBM's TotalStorage Productivity Center (TPC) [4] and EMC SAN Architect [5], that automate SAN provisioning, but most current approaches do not take the entire end-to-end picture (application, cluster, and SAN) into account, and as a result the overall storage system design may not be very resilient. For example, in an HA cluster, if both the active and stand-by nodes use the same network path to access shared storage, the failure of any link or switch in that path makes both nodes unavailable, bringing the entire service down. Furthermore, if the same network path is used by multiple cluster nodes to access shared storage, it can quickly become a performance bottleneck. In this paper, we present an integrated tool called Casper that automates the provisioning of shared network storage for clusters. We focus on provisioning Fibre Channel SAN for high availability (HA) clusters, primarily because SAN is the most widely used storage technology in high-end enterprise environments and also one of the most complex. Casper allocates storage and automatically configures the network paths that make SAN storage accessible to the cluster nodes, in a way that balances the overall load and avoids any single point of failure.

The figure above shows a simple example of a highly available (HA) cluster configuration with a SAN backend. Popular HA clustering solutions include IBM HACMP [1], Microsoft Cluster Server (MSCS) [2], Linux-HA, and Veritas Cluster Server [3]. The clustering software on each server node exchanges heartbeat messages with the other nodes over a TCP/IP network, a serial connection, or a shared disk to monitor their health status. If a software or hardware failure is detected, the HA software quickly restarts the failed application on a stand-by node (also called a passive node), limiting application downtime and performance degradation. A cluster resource group is a collection of related cluster resources that defines the actions to be taken during a switchover of the access point of resilient resources; the group effectively describes a recovery domain. To facilitate failover and/or load balancing, cluster applications access data from shared network storage such as a Storage Area Network (SAN) [6], as shown in the figure above. A typical SAN consists of one or more storage subsystems connected to host computers either directly or via a set of Fibre Channel switches. A storage subsystem may be composed of disk arrays, tape libraries, or other storage media. Even though SANs can be used in most enterprise environments, they are particularly well suited to cluster environments because they provide a consistent view of shared storage to all cluster nodes with a high degree of reliability and performance. Provisioning SAN resources involves creating storage units called LUNs (Logical Unit Numbers) and mapping them to the host computers. The LUN mapping process creates and configures one or more Fibre Channel network paths via which the host can access the storage LUNs. Both of these processes (LUN creation and LUN mapping) need to be planned carefully. LUNs are created based on the workload requirements and the characteristics of the storage subsystems. LUN mapping is done based on physical connectivity and on the performance and failover requirements of the workloads. This is non-trivial in a clustered environment, where a set of LUNs needs to be mapped to multiple nodes in the cluster, some active and some stand-by, each imposing a different load on different Fibre Channel network paths.
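To make the LUN creation and mapping concepts above concrete, the following is a minimal sketch of the objects involved; the class and field names are illustrative assumptions, not Casper's or TPC's actual data model.

```python
# Minimal sketch of the provisioning objects described above. All names and
# fields are illustrative assumptions, not the actual Casper/TPC data model.
from dataclasses import dataclass

@dataclass(frozen=True)
class Lun:
    lun_id: str
    subsystem: str       # storage subsystem the LUN is carved from
    raid_type: str       # e.g. "RAID-5"
    capacity_gb: int

@dataclass(frozen=True)
class LunMapping:
    """One mapping of a LUN to a cluster node over a specific FC path."""
    lun: Lun
    initiator_port: str  # FC port (WWPN) on the cluster node
    target_port: str     # FC port on the storage subsystem
    fabric: str          # FC fabric the path traverses
```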

Our proposed solution, Casper, is an intelligent analytic component that helps the administrator plan and provision storage resources for a cluster. We have extended the SAN Planner component available in the IBM TotalStorage Productivity Center (TPC) [4][7] suite to demonstrate this functionality. The existing SAN Planner [8] automates SAN storage provisioning by determining LUN placements in storage subsystems based on capacity and workload requirements; it can also recommend Fibre Channel fabric zoning and multipath configurations. TPC is an integrated operational management solution that automates various aspects of storage resource management (SRM). It discovers and monitors configuration and operational information from its managed environment, which consists of applications, servers, network elements such as Fibre Channel (FC) switches and fabrics, storage controllers, tape libraries, and so on. This information is stored in an operational database that is later queried for management tasks such as device configuration, performance reporting and analysis, problem determination, and chargeback. Casper leverages this information in a policy-based framework to allocate SAN storage and configure network paths based on the performance and resiliency requirements of the cluster workloads.

The figure above shows a high-level view of Casper. The administrator can specify multiple workload requirements and select the policies to be applied in the planning process. Casper plans for the following scenarios:

Given the managed environment, which hosts can potentially be included in a given cluster, based on cluster type, host operating system, application type, and so on?

Given a cluster, which hosts can potentially be included in a cluster resource group, based on application type, failure probability, server performance, and so on?

Given a set of cluster nodes and workload requirements, what is the best allocation of storage and network resources that achieves the required performance and resiliency?
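As an illustration of the first scenario, the sketch below filters managed hosts down to cluster candidates; the attribute names and thresholds are hypothetical and only meant to show the shape of such a policy check.

```python
# Hypothetical host-filtering step for the first planning scenario above.
# Attribute names and thresholds are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    os: str
    supported_apps: set = field(default_factory=set)
    spare_cpu: float = 0.0       # fraction of CPU still unused
    spare_mem_gb: float = 0.0

def candidate_hosts(hosts, required_os, required_app,
                    min_spare_cpu=0.3, min_spare_mem_gb=8.0):
    """Return hosts that could join the cluster (or cluster resource group)."""
    return [h for h in hosts
            if h.os == required_os
            and required_app in h.supported_apps
            and h.spare_cpu >= min_spare_cpu
            and h.spare_mem_gb >= min_spare_mem_gb]
```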

Cluster-aware storage resource provisioning mainly has to address two broad requirements: (i) resiliency and (ii) performance. Resiliency ensures that the service remains available even when some components go down. From the SAN perspective, resiliency has two main components: (a) storage resiliency and (b) data path resiliency. Storage resiliency is achieved using RAID arrays and other redundant device components. Data path resiliency ensures that the path from the storage LUN (Logical Unit Number) to the cluster node is resilient; this is achieved by making sure that there is no single point of failure across the active and stand-by data paths. During a failure, the clustering software fails applications over from the active server to the available stand-by server(s). Complete end-to-end resiliency must ensure that none of the entities in the data path becomes a single point of failure. These entities include the Fibre Channel fabric, the switch connecting to the server, the Fibre Channel switch connecting to the storage controller, the storage controller Fibre Channel port, the host Fibre Channel port, the LUN assignment, and so on. Similarly, the performance factor depends on the performance of the underlying disk subsystems and of the data path. The performance characteristics of a disk subsystem depend on the type of disks (SAS, SATA, SSD, etc.), the RAID configuration, and so on; the configuration of the data paths also impacts performance. Casper ensures balanced load along the Fibre Channel data paths in a clustered environment; the workload has to be optimally balanced along the data paths to avoid performance degradation or hotspots during failover. Based on the workload and the resource availability in a data center, different policies are employed to satisfy the administrator's requirements. Cluster nodes are selected on the basis of the operating system and hardware configuration of the servers, as well as the cluster type, application type, and workload requirements. Similarly, new hosts are added to a cluster resource group based on application requirements and failure characteristics.
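The end-to-end no-single-point-of-failure requirement can be pictured as a set intersection over the components of the active and stand-by data paths. The sketch below assumes flat component identifiers; the actual planner reasons over discovered fabric topology.

```python
# Minimal sketch of the end-to-end SPOF check implied above: any component
# shared by the active and stand-by data paths is a single point of failure.
# The flat string identifiers are an assumption made for illustration.
def shared_components(active_path, standby_path):
    """Each path is a set of component ids (fabric, switches, ports, ...)."""
    return set(active_path) & set(standby_path)

active  = {"fabric:F1", "switch:SW1", "subsys-port:P3", "host-port:H1"}
standby = {"fabric:F2", "switch:SW7", "subsys-port:P8", "host-port:H2"}

# An empty intersection means the two data paths are fully disjoint.
assert shared_components(active, standby) == set()
```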

Once the clusters and cluster resource groups are set up, Casper configures the data paths from the storage LUNs to the cluster nodes. This is done by applying a series of policies that ensure there is no single point of failure in the data path. The diagrams above show the Fibre Channel fabric configuration policies under different scenarios:

Policy I: Active and stand-by hosts should connect to the disk subsystems via different fabrics, to protect against fabric failure.

Policy II.a: In the absence of multiple fabrics, active and stand-by hosts should be connected to different host-end edge switches of the same fabric, to protect against FC switch failure.

Policy II.b: In the absence of multiple fabrics, active and stand-by hosts should access storage through different subsystem-end edge switches of the same fabric, to protect against FC switch failure.

Policy III: Active and stand-by hosts should be assigned (masked/mapped) to different ports of the storage subsystem.

Policy IV: The assignment of active and stand-by host ports should be uniformly distributed across all available storage subsystem ports.

Policy V: Fibre Channel ports of active and stand-by cluster nodes are configured according to their failover characteristics.

The goal of these policies is to create disjoint data paths for the active and stand-by nodes. When an active node goes down because of a failed component in its data path, the stand-by node can still reach the LUN through a different data path, ensuring a high degree of resiliency.
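A minimal sketch of how Policies I through III might be checked for an active/stand-by path pair follows; the path attributes are assumptions, and the real planner evaluates these policies against the discovered fabric topology rather than flat records.

```python
# Sketch of a Policy I-III check on an active/stand-by path pair.
# Field names are assumptions; Policy IV (uniform port distribution) and
# Policy V (failover-aware port configuration) are not modeled here.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataPath:
    fabric: str
    host_edge_switch: str
    subsystem_edge_switch: str
    subsystem_port: str

def paths_satisfy_policies(active: DataPath, standby: DataPath) -> bool:
    if active.fabric != standby.fabric:
        return True                      # Policy I: disjoint fabrics
    # Single fabric: fall back to Policies II.a, II.b and III.
    return (active.host_edge_switch != standby.host_edge_switch
            and active.subsystem_edge_switch != standby.subsystem_edge_switch
            and active.subsystem_port != standby.subsystem_port)
```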

The diagrams above show different policies for LUN assignment to achieve the required performance goals. A LUN assignment is a mapping defined by the tuple <Initiator Port, Target Port, LUN>, where the initiator port is the FC (Fibre Channel) port of the cluster node and the target port is the FC port of the storage subsystem. Along with the performance and resiliency characteristics of the storage subsystems, the following policy is also applied while allocating storage for cluster nodes: if a storage subsystem is already serving storage to a particular cluster, it has higher priority when additional storage is allocated for that cluster. This simplifies many aspects of storage management, such as disaster recovery and copy-service relationships.
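The subsystem-affinity policy above can be pictured as a ranking step during allocation; the sketch below uses invented field names and a simple tie-breaker, so it is an illustration rather than the tool's actual weighting.

```python
# Illustrative ranking step for the subsystem-affinity policy above.
# `free_capacity_gb` and the scoring are assumptions, not Casper's logic.
def rank_subsystems(subsystems, cluster, allocations):
    """allocations maps subsystem name -> set of clusters it already serves."""
    def score(sub):
        affinity = 1 if cluster in allocations.get(sub.name, set()) else 0
        return (affinity, sub.free_capacity_gb)   # affinity first, then space
    return sorted(subsystems, key=score, reverse=True)
```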

When an active cluster node fails, the clustering software moves the load to the stand-by node, which in turn may shift the load in the backend SAN. This shift in I/O traffic may overload some portions of the SAN fabric or the Fibre Channel ports of the storage subsystems. To prevent such bottlenecks, Casper accounts for these failures and load shifts during the planning process. This is done in two steps. First, it computes the expected load $E(l_p)$ on each Fibre Channel port $p$ of the storage subsystems, taking into account the failure characteristics of the cluster nodes. The first component of $E(l_p)$ is the load due to I/O traffic from the active cluster nodes; the other component is the traffic from the stand-by nodes when their corresponding active nodes fail. Note that multiple LUNs may be mapped to hosts via the same port, and a single LUN may be mapped via more than one port. Second, it performs the LUN mapping assignment in such a way that the expected load on each Fibre Channel port is balanced.
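The transcription does not reproduce the original expression for $E(l_p)$; the formulation below is a hedged reconstruction from the description above, with $M_A(p)$ and $M_S(p)$ the sets of LUN mappings routed through port $p$ from active and stand-by nodes, $l_m$ the I/O load of mapping $m$'s LUN, $k_m$ the number of ports over which that LUN's traffic is spread, and $f_m$ the failure probability of the corresponding active node.

```latex
% Hedged reconstruction of the expected per-port load; the exact original
% equation is not present in the transcription, so the symbols are assumptions.
E(l_p) \;=\;
\underbrace{\sum_{m \in M_A(p)} \frac{l_m}{k_m}}_{\text{active nodes}}
\;+\;
\underbrace{\sum_{m \in M_S(p)} f_m \,\frac{l_m}{k_m}}_{\text{stand-by nodes on failover}}
```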

In this section, we use a simple case study to explain how cluster-aware SAN planning works in a data center. As shown in the figure above, a cluster (CL-1) is created with two resource groups (CRG-DB and CRG-AS). CRG-DB is a database cluster resource group that contains servers S2 and S3; S2 is configured as the active server running the database server, and S3 as the stand-by server. CRG-AS is an application server cluster resource group that contains servers S1 and S3; S1 is configured as the active server running the application server, and S3 as the stand-by application server. In this environment, the two cluster resource groups share their stand-by server, on the assumption that the database server and the application server will not fail at the same time. Server S3 is installed with both the database and the application server software bundles. While creating the cluster or the cluster resource groups, the administrator can use Casper to analyze the existing servers and recommend appropriate hosts with sufficient spare capacity, based on the type of cluster (or cluster resource group) and the applications. In this case study, the administrator has defined a cluster with two active servers (S1 and S2), each belonging to a separate cluster resource group. Once the active servers are defined, the administrator uses Casper to perform the following tasks: (i) provision a server that can be added to the cluster and serve as a stand-by for servers S1 and S2 in their respective cluster resource groups (CRG-AS and CRG-DB); (ii) provision storage for the database server with the following specifications: capacity 240 GB, RAID type RAID-5, and an OLTP workload of 1000 transactions/sec. The SAN consists of two Fibre Channel fabrics and two storage subsystems: a high-end enterprise-class disk subsystem (IBM DS8000) and a mid-range disk subsystem (IBM DS4000).
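The database provisioning request in this case study could be expressed as a planner input along the following lines; the key names are hypothetical (not TPC or Casper syntax), and only the values come from the case study itself.

```python
# Case-study storage request expressed as a hypothetical planner input.
# Key names are illustrative; the values are taken from the text above.
db_storage_request = {
    "cluster": "CL-1",
    "resource_group": "CRG-DB",
    "active_host": "S2",
    "standby_host": "S3",
    "capacity_gb": 240,
    "raid_type": "RAID-5",
    "workload": {"type": "OLTP", "transactions_per_sec": 1000},
}
```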

Since an OLTP workload needs to be provisioned, both the performance and the resiliency requirements are high. Casper selects the DS8000 for two main reasons: (i) it can handle a high load, with much higher IOPS and lower response time than the DS4000; and (ii) the DS8000 is connected to two fabrics, whereas the DS4000 is connected to just one. During the planning process, Casper was able to configure paths over disjoint fabrics from the storage LUNs in the DS8000 to servers S2 and S3: server S2 accesses the LUNs through fabric F1, and server S3 accesses the same LUNs through fabric F2. This gives better failover resiliency. The dotted lines in the figure show the configured data paths. If the DS4000 had been selected for storage allocation, both S2 and S3 would have had to access their storage through Fibre Channel fabric F2, and a failure of F2 would cut both servers off from storage. In addition to fabric selection, Casper also creates LUN mappings through multiple subsystem FC ports using different FC switches, which balances the I/O load and protects against a switch failure within a fabric. This example illustrates the importance of LUN data path planning in improving the failure resiliency of clusters.

Summary and future work

We studied the challenges faced by administrators in provisioning shared network storage for cluster environments, and presented an integrated tool that uses best practices and administrator-specified policies to find the best possible SAN storage allocations for cluster nodes. Even though we focused on SAN and HA clusters, similar ideas can be applied to other cluster and storage environments. Most high availability cluster solutions cannot distinguish between different types of SAN failures. If a LUN fails (due to a disk array failure, for example), switching to a stand-by cluster node will not help; on the other hand, if a Fibre Channel switch, port, or fabric fails, it may be possible to keep the application or service available by starting it on the stand-by node. As future work, we will investigate enhancing current HA solutions with awareness of SAN failure types. Another aspect that needs further research is the effect of SAN fabric zoning on cluster resiliency. Zoning is the security feature of a Fibre Channel fabric that restricts communication between different parts of the fabric; a zoning mis-configuration can disrupt storage connectivity and bring down an entire cluster. In this paper, we primarily focused on provisioning for new workloads. As the system evolves, the workload and the system configuration may change. In such cases, dynamic analysis of the cluster and SAN configuration becomes necessary to make sure that the system continues to meet the performance and resiliency requirements of the deployed applications and workloads.
