
Technical Report

A Continuous Availability Solution for Virtual Infrastructure
NetApp and VMware, Inc.
September 2009 | TR-3788

EXECUTIVE SUMMARY

As enterprise customers look to virtualization as a cost-effective means to simplify their server environment, they realize the importance of a highly available storage environment. It is critical to address single-point-of-failure scenarios, and planning a robust high-availability (HA) infrastructure solution for virtual data center environments is of utmost importance.

This document showcases the simplicity of architecting powerful and robust infrastructure solutions in a VMware virtual environment with NetApp storage. The designs described provide the ability to maintain HA for both the computing and storage resources between the sites, the ability to recover completely from the loss of an entire site, and the ability to move the complete operation of the environment between the two sites to minimize disruption during scheduled downtime events.

This document describes various failure scenarios that showcase the value of deploying VMware HA and NetApp MetroCluster to recover from these failures. Each failure scenario simulates real-world operational failures and real disasters. The expected outcome and actual results are described in terms of MetroCluster product operation and the resulting behavior of the VMware ESX Servers and virtual machines.

TABLE OF CONTENTS

1 INTRODUCTION
  1.1 INTENDED AUDIENCE
  1.2 SCOPE
  1.3 ASSUMPTIONS AND PREREQUISITES
2 BACKGROUND
  2.1 BUSINESS CHALLENGE
  2.2 HIGH-AVAILABILITY SOLUTIONS FOR VIRTUAL INFRASTRUCTURE
  2.3 DIFFERENT TIERS OF PROTECTION IN A HIGHLY AVAILABLE VIRTUAL INFRASTRUCTURE
3 ARCHITECTURE OVERVIEW OF THE HIGH-AVAILABILITY SOLUTION USING NETAPP METROCLUSTER AND VMWARE HA
  3.1 HIGH-LEVEL TOPOLOGY DIAGRAM
  3.2 DEPLOYMENT DETAILS
4 SETUP AND CONFIGURATION OF THE HIGH-AVAILABILITY SOLUTION
  4.1 LAB ENVIRONMENT
  4.2 NETAPP STORAGE CONFIGURATION
  4.3 VMWARE HA CLUSTER CONFIGURATION
  4.4 SETUP AND CONFIGURATION OPTIONS FOR THE VCENTER SERVER
5 TESTING THE HIGH-AVAILABILITY SOLUTION IN DIFFERENT FAILURE SCENARIOS
  5.1 FAILURES WITHIN A DATA CENTER
  5.2 FAILURES THAT AFFECT AN ENTIRE DATA CENTER
6 SUMMARY
APPENDIX A: BROCADE SWITCH CONNECTION DETAILS FOR FABRIC METROCLUSTER (SOFTWARE-BASED DISK OWNERSHIP)
APPENDIX B: MATERIALS USED IN THE LAB SETUP
APPENDIX C: REFERENCES
AUTHORS
ACKNOWLEDGEMENT
DISCLAIMER

1 INTRODUCTION

This technical report provides an overview and describes the end-to-end implementation details of a production-class high-availability solution for a virtual infrastructure consisting of VMware ESX Servers and NetApp FAS storage.

1.1 INTENDED AUDIENCE

This document is for:
- Customers and prospects looking to implement a high-availability solution for their virtual infrastructure consisting of VMware ESX Servers and NetApp FAS storage
- End users and management seeking information on a high-availability solution in a production or test and development environment

1.2 SCOPE

This document describes:
- An end-to-end architecture overview of the HA solution for virtual infrastructure
- A detailed design and implementation guide, including configuration best practices
- Reproducible test results that simulate common failure scenarios resulting from operational problems and real disasters

The scope of this document is limited as follows:
- This report does not replace any official manuals and documents from NetApp and VMware on the products used in the solution, or from any other switch vendors referenced in the report.
- This report does not discuss performance impact and analysis from an end-user perspective during a disaster.
- This report does not replace NetApp and VMware professional services documents or services.
- This report does not discuss a regional (long-distance) disaster recovery solution. If you need a regional disaster recovery solution in addition to the high-availability option discussed in this paper, contact your NetApp representative for further assistance.

1.3 ASSUMPTIONS AND PREREQUISITES

This document assumes familiarity with the following:
- VMware's virtualization technologies and products: VMware vCenter Server 2.5 and VMware Infrastructure 3
- NetApp FAS systems and Data ONTAP

2 BACKGROUND

2.1 BUSINESS CHALLENGE

Economic challenges compel businesses to provide high levels of availability and business continuity while simultaneously achieving greater cost savings and reduced complexity. As a result, data center infrastructure is increasingly virtualized, because virtualization provides compelling economic, strategic, operational, and technical benefits. Planning a robust high-availability infrastructure solution for virtual data center environments hosting mission-critical applications is of utmost importance. Key aspects of an effective high-availability virtualized infrastructure include:
- Operational efficiency and management simplicity
- Cost effectiveness
- Architectural simplicity
- High performance

- Resiliency and flexibility

VMware provides high availability for virtual machines with VMware HA. VMware HA provides uniform, cost-effective failover protection against hardware failures within a virtualized IT environment.

NetApp MetroCluster is a cost-effective, synchronous replication solution that combines high availability and disaster recovery in a campus or metropolitan area to protect against both site disasters and hardware outages. MetroCluster provides automatic recovery for any single storage component failure and single-command recovery in case of major site disasters, ensuring zero data loss and making recovery possible within minutes rather than hours.

Combining VMware HA and NetApp MetroCluster technologies offers a strong value proposition: a simple and robust high-availability solution for planned and unplanned downtime in virtual data center environments hosting mission-critical applications. Each of these solutions is discussed briefly in section 2.2.

2.2 HIGH-AVAILABILITY SOLUTIONS FOR VIRTUAL INFRASTRUCTURE

VMWARE SOLUTION: VMWARE HA

VMware cluster technology groups VMware ESX Server hosts into a pool of shared resources for virtual machines and provides the VMware HA feature. When the HA feature is enabled on the cluster, each ESX Server host maintains communication with the other hosts, so that if any ESX host becomes unresponsive or isolated, the HA cluster can negotiate the recovery of the virtual machines that were running on that host among the surviving hosts in the cluster.

Figure 1) VMware HA architecture.

Notes:
1. Setting up an HA cluster requires a vCenter Server, but once the cluster is set up, it can maintain HA without further interaction from the vCenter Server.
2. VMware Distributed Resource Scheduler (DRS) is outside the scope of this document. However, note that VMware DRS can be enabled simultaneously with VMware HA on the same cluster.

NETAPP SOLUTIONS: NETAPP ACTIVE-ACTIVE, SYNCMIRROR, AND METROCLUSTER

NetApp clusters, referred to as active-active HA pairs, consist of two independent storage controllers that provide fault tolerance and high-availability storage for virtual environments. The cluster mechanism provides nondisruptive failover between controllers in the event of a controller failure. Redundant power supplies in each controller maintain constant power, and storage HBAs and Ethernet NICs are all configured redundantly within each controller. RAID-DP protects against the failure of up to two disks in a single RAID group.

Figure 2) Active-active cluster.

The NetApp HA cluster model can be enhanced by synchronously mirroring data at the RAID level using NetApp SyncMirror (see [1] and [4] in appendix C). When SyncMirror is used with HA clustering, the cluster can survive the loss of complete RAID groups or shelves of disks on either side of the mirror.

Figure 3) NetApp SyncMirror.

NetApp MetroCluster further builds on the NetApp cluster model by providing the capability to place the nodes of the cluster at geographically dispersed locations.

Figure 4) NetApp active-active SyncMirror and MetroCluster.

MetroCluster supports distances of up to 100 kilometers. For distances of less than 500 meters, the cluster interconnects, the controllers, and the disk shelves are directly connected. This configuration is referred to as a stretch MetroCluster configuration.

For distances of over 500 meters, MetroCluster uses redundant Fibre Channel switches to provide site-to-site connectivity, with inter-switch links (ISLs) running between the sites. This configuration is referred to as a fabric MetroCluster configuration. In this case, the connections between the controllers and the storage are maintained through the ISLs.

Note: In the logical diagrams above, redundant connections to each component also exist but are not shown, to simplify the diagrams.

2.3 DIFFERENT TIERS OF PROTECTION IN A HIGHLY AVAILABLE VIRTUAL INFRASTRUCTURE

Figure 5) Different tiers of virtual infrastructure protection with NetApp storage.

Table 1 summarizes the protection tiers for this high-availability solution.

Table 1) High-availability solution scenarios.

Tier 1: Data-center-level protection
  VMware component: VMware HA cluster
  NetApp component: NetApp active-active cluster (with or without SyncMirror)
  Scope of protection: Complete protection against common server and storage failures, including but not limited to failure of physical ESX Servers, power supplies, disk drives, disk shelves, cables, storage controllers, and so on.

Tier 2: Cross-campus-level protection
  VMware component: VMware HA cluster
  NetApp component: NetApp stretch MetroCluster
  Scope of protection: VMware HA cluster nodes and NetApp FAS controllers located in different buildings within the same site (up to 500 m). Can handle building-level disasters in addition to the protections provided in tier 1.

Tier 3: Metro (site-level) distance protection
  VMware component: VMware HA cluster
  NetApp component: NetApp fabric MetroCluster
  Scope of protection: VMware HA cluster nodes and NetApp FAS controllers located at different regional sites (up to 100 km). Can handle site-level disasters in addition to the protections provided in tier 1.

Tier 4: Regional protection
  Outside the scope of this paper.

For more information on the cross-campus HA solution using a VMware HA cluster and stretch MetroCluster, see appendix A.

3 ARCHITECTURE OVERVIEW OF THE HIGH-AVAILABILITY SOLUTION USING NETAPP METROCLUSTER AND VMWARE HA

3.1 HIGH-LEVEL TOPOLOGY DIAGRAM

Figure 6 illustrates the architecture of the campus-distance-level (up to 500 meters) and metro-distance-level (up to 100 km) HA solution for a virtual infrastructure made up of VMware ESX Servers, vCenter Server, and NetApp storage. As described in section 2.3, high availability of the infrastructure is provided through the seamless integration of two technologies working at two different layers:
- VMware HA technology for high availability at the server level
- NetApp MetroCluster technology for high availability at the storage level:
  - Stretch MetroCluster setup for campus-distance-level protection
  - Fabric MetroCluster setup for metro-distance-level protection

Each site in this high-availability solution is active, with the VMware ESX Servers at each site running Windows and Linux VMs. In the lab setup, there are three ESX Servers at each site, each running two VMs (one Windows and one Linux). The VMware ESX Servers at each site can access the NetApp storage in the back-end MetroCluster setup using the FC, NFS, or iSCSI protocol. The flexibility to use any protocol is made possible by the unified storage architecture of NetApp FAS systems. As shown in Figure 6, a separate front-end SAN is required to use FC datastores on the VMware ESX Servers, in addition to the back-end fabric for the fabric MetroCluster setup.

Figure 6) Topology diagram of the VMware HA and NetApp stretch MetroCluster solution.

Figure 7) Topology diagram of the VMware HA and NetApp fabric MetroCluster solution.

3.2 DEPLOYMENT DETAILS

MATERIAL LIST FOR HIGH-AVAILABILITY SETUP

See appendix B for details on the materials used in the lab setup.

Table 2) Material list.

Server (any vendor): As per the VMware Compatibility Guide.
Storage (NetApp): NetApp FAS system, fabric MetroCluster configuration, RAID-DP. For the currently supported matrix, see the NetApp support site.
Switch, MetroCluster back end (Brocade; fabric MetroCluster only, not required for stretch MetroCluster): Brocade switch model 200E x 4; 16 ports, full fabric, 4Gb SWL SFPs. For the currently supported matrix, see the NetApp support site.
Switch, front-end SAN (Brocade): Brocade switch model 3800 x 4.
Network adapter (Broadcom): Broadcom NetXtreme II BCM5708 1000Base-T x 2 per server.
HBA (QLogic): QLogic QLA2432 x 2 per server. A minimum of two HBAs is required in a production environment.
Software (NetApp): Data ONTAP 7.2.4 or higher (see section 4.2 and the VMware KB article "VMware VI3 Support with NetApp MetroCluster"); cluster_remote; syncmirror_local.
Software (VMware): VMware ESX Server 3.5 U3; VMware vCenter Server 2.5 U4.

LOW-LEVEL CONNECTION DIAGRAM OF THE NETAPP STRETCH METROCLUSTER SETUP

Figure 8) Stretch MetroCluster connection diagram.

LOW-LEVEL CONNECTION DIAGRAM OF THE NETAPP FABRIC METROCLUSTER SETUP

Figure 9) Fabric MetroCluster connection diagram.

BROCADE SWITCH CONNECTION DETAILS AND CONFIGURATION TABLES

Figure 10) Brocade switch connection details in the fabric MetroCluster setup.

For connection details on the Brocade switches in the MetroCluster setup, see appendix A.

4 SETUP AND CONFIGURATION OF THE HIGH-AVAILABILITY SOLUTION

4.1 LAB ENVIRONMENT

The following infrastructure was used in the lab to verify the functionality of the solution. The resources used in the test environment meet only the minimum requirements; it is strongly recommended to use multiple FC HBAs and NICs. In addition to the NetApp storage controller listed below, refer to the NetApp support site for the MetroCluster support matrix.

Each site (site 1 and site 2) has:
- One FAS3170 storage system running Data ONTAP 7.3.1 (supported for MetroCluster), along with its disk shelves
- Two DS14mk4 disk shelves (144GB, 10K RPM disks)
- Three ESX 3.5 U3 server hosts, each with a minimum of two NICs and one HBA (FC adapter: QLogic 2432); at least two redundant HBAs and multiple NICs are recommended in production environments (see [5] in appendix C)
- Two Brocade FC switches, model 200E, 16-port, full fabric, 4Gb SWL SFPs, for the MetroCluster fabric setup (not required for the stretch MetroCluster setup)
- Two Brocade FC switches, model 3800, for the front-end fabric required by the ESX Servers to access FC datastores

4.2 NETAPP STORAGE CONFIGURATION

1. BEST PRACTICE: Set the Data ONTAP configuration option cf.takeover.change_fsid to OFF. This option is supported on Data ONTAP 7.2.4 and higher.

   In the event of a failure of a complete storage controller and/or all of its disk shelves (the storage controller and its associated local disk shelves), a manual failover of the MetroCluster should be performed. If the change_fsid option is set to OFF on a NetApp FAS storage controller running Data ONTAP 7.2.4 or higher, then after a manual MetroCluster failover the UUIDs of the mirrored LUNs are retained, and no additional steps are required on the ESX Server side to detect the VMFS volumes. Once the VMFS volumes are detected, the VMs can be powered on manually.

   For NetApp FAS storage controllers running Data ONTAP versions older than 7.2.4, this option is not available: after a manual MetroCluster failover, the mirrored LUNs do not keep the same LUN UUIDs as the original LUNs. When these LUNs house the VMFS-3 file system, the volumes are detected by ESX Server 3.x as being on snapshot LUNs. Similarly, if a raw LUN that is mapped as an RDM (raw device mapping) is replicated or mirrored through MetroCluster, the metadata entry for the RDM must be re-created to map to the replicated or mirrored LUN. To make sure the ESX hosts have access to the VMFS volumes on the mirrored LUNs, see VMware KB 1001783.

   Figure 11) Set the cf.takeover.change_fsid configuration option to OFF.

2. The FAS controllers should be licensed for the following features: cluster, cluster_remote, syncmirror_local, iscsi, fcp, and nfs.

3. MetroCluster setup with software-based disk ownership for the NetApp FAS controllers and the Brocade switches is performed in accordance with the guidelines provided by:
   - Data ONTAP 7.3 Active/Active Configuration Guide (part number 210-04192_A0): http://now.netapp.com/now/knowledge/docs/ontap/rel7311/pdfs/ontap/aaconfig.pdf
   - Brocade 200E Switch Configuration Guide
   - MetroCluster Design and Implementation Guide: http://media.netapp.com/documents/tr-3548.pdf

4. On the FAS controllers at both sites, flexible volumes are created inside the same aggregate, corresponding to the two types of ESX datastores: VMFS (FC and iSCSI) and NFS. Figure 12 depicts the physical and logical storage configuration of the NetApp MetroCluster setup as viewed from either site. A consolidated CLI sketch of steps 1 through 4 follows Figure 12 below.

BEST PRACTICE: See NetApp and VMware Virtual Infrastructure 3 Best Practices for best practices related to creating aggregates and NFS and VMFS datastores on NetApp FAS systems for VMware Infrastructure 3.

Figure 12) Physical and logical storage configuration of the NetApp FAS controllers in the MetroCluster setup.
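The following console sketch consolidates the storage-side settings described in steps 1 through 4. It is a minimal illustration only, not the validated lab configuration: the aggregate, volume, LUN, and export names (aggr1, vmfs_fc_vol, nfs_vol, and so on), the sizes, and the license placeholders are all hypothetical, and the referenced configuration guides remain the authoritative procedure for an actual MetroCluster deployment.

  # On each FAS3170 controller (site 1 shown; repeat equivalently on site 2).

  # Step 1: retain LUN UUIDs across a forced takeover (Data ONTAP 7.2.4 or later).
  options cf.takeover.change_fsid off

  # Step 2: enable the required licenses (codes are placeholders).
  license add <cluster_code> <cluster_remote_code> <syncmirror_local_code>
  license add <fcp_code> <iscsi_code> <nfs_code>

  # Step 4: one mirrored aggregate plus one flexible volume per datastore type.
  # The -m flag selects disks from both pools so that SyncMirror creates two plexes.
  aggr create aggr1 -m 16
  vol create vmfs_fc_vol    aggr1 500g   # backing volume for the FC VMFS datastore
  vol create vmfs_iscsi_vol aggr1 500g   # backing volume for the iSCSI VMFS datastore
  vol create nfs_vol        aggr1 500g   # exported to the ESX hosts as an NFS datastore

  # LUNs for the VMFS datastores and the NFS export for the ESX hosts.
  lun create -s 400g -t vmware /vol/vmfs_fc_vol/esx_fc_lun
  lun create -s 400g -t vmware /vol/vmfs_iscsi_vol/esx_iscsi_lun
  exportfs -p rw=<esx_host_network>,root=<esx_host_network> /vol/nfs_vol

  # Verify that both plexes of the mirrored aggregate are online.
  aggr status aggr1 -r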

4.3 VMWARE HA CLUSTER CONFIGURATION

BEST PRACTICE: When adding ESX hosts to a VMware HA cluster, the first five hosts added are considered primary hosts; the remaining hosts added are considered secondary hosts. Primary HA nodes hold node state information, which is synchronized between primary nodes and from the secondary nodes. To make sure that each site contains more than one primary HA node, add the first five nodes to the HA cluster one at a time, alternating between sites. The sixth node and all remaining nodes can then be added in one operation.

The VMware ESX host and NetApp FAS controller network ports are connected to the same subnet, which is shared between site 1 and site 2. The VMware ESX hosts' FC HBAs should be connected to the same fabric, which is shared between site 1 and site 2.

4.4 SETUP AND CONFIGURATION OPTIONS FOR THE VCENTER SERVER

In this setup, the VMware vCenter Server runs inside a virtual machine in the HA cluster. Another way of designing the vCenter Server deployment is to place it on a physical MSCS cluster with one MSCS cluster node at each site. If the storage housing the vCenter MSCS instance is at the failed site, it is necessary to perform the NetApp CFOD (cluster failover on disaster) recovery: first recover MSCS and start vCenter, then continue with the recovery process. For details on the deployment of vCenter Server with an MSCS cluster, see www.vmware.com/pdf/vc_mscs.pdf.
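Before running the failure tests in section 5, each ESX host's view of the shared storage can be checked from the service console so that all six hosts in the HA cluster mount the same datastores under the same labels. This is a hedged sketch only; the controller address, export path, datastore label, and HBA names (vmhba1, vmhba2) are hypothetical and depend on the actual environment.

  # Run on each of the six ESX 3.5 hosts (service console).

  # Mount the NFS export from the MetroCluster controller as a datastore;
  # the same datastore label must be used on every host in the HA cluster.
  esxcfg-nas -a -o 192.168.10.10 -s /vol/nfs_vol nfs_ds01
  esxcfg-nas -l                      # confirm the NFS datastore is mounted

  # Rescan the FC HBAs so the host discovers the VMFS LUNs presented
  # through the front-end fabric, then list the paths that were found.
  esxcfg-rescan vmhba1
  esxcfg-rescan vmhba2
  esxcfg-mpath -l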

5 TESTING THE HIGH-AVAILABILITY SOLUTION IN DIFFERENT FAILURE SCENARIOS

Note:
- The tests below were executed against the high-availability solution with both the stretch and the fabric MetroCluster setup, unless otherwise specifically stated.
- The terms "site 1," "local site," and "failed site" are used interchangeably.
- The terms "site 2" and "remote site" are used interchangeably.
- Refer to figures 6 and 7 for component names, locations, and connectivity.
- In the diagrams of the operational scenarios below, only the setup with fabric MetroCluster is shown. However, the same tests were also performed with the stretch MetroCluster setup (unless otherwise stated).

5.1 FAILURES WITHIN A DATA CENTER

OPERATIONAL SCENARIO 1: COMPLETE LOSS OF POWER TO DISK SHELF

Figure 13) Operational scenario 1.

Table 3) Operational scenario 1: complete loss of power to disk shelf.

Tests performed:
  1. Power off one disk shelf.
  2. Observe the result.
  3. Power it back on.
Expected results:
  No disruption to data availability. The relevant disks go offline and the plex is broken. No change is detected at the ESX Server level, and the VMs run without any interruption. When power is returned to the shelf, the disks are detected, and a resync

  of the plexes occurs without any manual action.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected.
MetroCluster behavior: No MetroCluster event.
VMware HA behavior: No HA event.
Impact to data availability: None.

OPERATIONAL SCENARIO 2: LOSS OF ONE LINK IN ONE DISK LOOP

Figure 14) Operational scenario 2.

Table 4) Operational scenario 2: loss of one link in one disk loop.

Tests performed:
  1. Disconnect the fiber cable on one of the disk shelves.
  2. Observe the results.
  3. Reconnect the fiber.
Expected results:
  No disruption to data availability. The controller displays a message that some disks are connected to only one switch. No change is detected at the ESX Server level, and the VMs run without any interruption. When the fiber is reconnected, the controller displays messages indicating that the disks are again connected to two switches.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected.
MetroCluster behavior: No MetroCluster event.

VMware HA behavior: No HA event.
Impact to data availability: None.

OPERATIONAL SCENARIO 3: LOSS OF ONE BROCADE FABRIC INTERCONNECT SWITCH (APPLICABLE ONLY TO HA SOLUTIONS WITH THE FABRIC METROCLUSTER SETUP)

Figure 15) Operational scenario 3.

Table 5) Operational scenario 3: loss of one Brocade fabric interconnect switch.

Tests performed:
  1. Power off the LOCAL-SW2 Fibre Channel switch.
  2. Observe the results.
  3. Power it back on.
Expected results:
  No disruption to data availability. The controller displays a message that some disks are connected to only one switch and that one of the cluster interconnects is down. No change is detected at the ESX Server level, and the VMs run without any interruption. When power is restored and the switch completes its boot process, the controller displays messages indicating that the disks are again connected to two switches and that the second cluster interconnect is active again.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected.
MetroCluster behavior: No MetroCluster event.
VMware HA behavior: No HA event.

Impact to data availability: None.

OPERATIONAL SCENARIO 4: LOSS OF ONE ISL BETWEEN THE BROCADE FABRIC INTERCONNECT SWITCHES

Figure 16) Operational scenario 4.

Table 6) Operational scenario 4: loss of one ISL between the Brocade fabric interconnect switches.

Tests performed:
  1. Remove the ISL fiber cable between the Brocade fabric interconnect switches.
Expected results:
  No disruption to data availability. The controller displays a message that some disks are connected to only one switch and that one of the cluster interconnects is down. No change is detected at the ESX Server level, and the VMs run without any interruption. When the ISL is reconnected, the controller displays messages indicating that the disks are again connected to two switches and that the second cluster interconnect is active again.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected.
MetroCluster behavior: No MetroCluster event.
VMware HA behavior: No HA event.
Impact to data availability: None.
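For scenarios 1 through 4, a few console commands are useful for confirming the degraded and recovered states that the controller messages describe. This is a hedged sketch: the exact message and output text varies by Data ONTAP and Brocade FOS release, and the aggregate name aggr1 is hypothetical.

  # On the affected FAS controller (Data ONTAP 7-mode console):
  aggr status aggr1 -r     # shows the plex that went offline and, later, the resync progress
  storage show disk -p     # lists the paths (switch/port) seen for each disk
  sysconfig -a             # confirms adapter and shelf connectivity
  cf status                # confirms that the cluster interconnect state matches the console messages

  # On a Brocade switch (FOS CLI), after a switch or ISL event:
  switchshow               # port states, including the ISL (E_Port) status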

OPERATIONAL SCENARIO 5: FAILURE AND FAILBACK OF STORAGE CONTROLLER

Figure 17) Operational scenario 5.

Table 7) Operational scenario 5: failure and failback of storage controller.

Part 1: Controller failure

Tests performed:
  1. Power off one of the controllers by turning off both of its power supplies.
Expected results:
  No disruption to data availability. A slight delay occurs from the host perspective while the datastore (iSCSI/NFS/FC) connection is rebuilt because processing moves from one controller to the other. No interruptions occur in the VMs running on the ESX Servers.
Actual results:
  The partner controller reported the outage and began an automatic takeover. There was a momentary pause in disk activity. When the takeover was complete, activity returned to normal. For details on how ESX hosts respond to a controller failure, see the "What happens to an ESX host in the event of a single storage component failure" section of VMware KB 1001783.
MetroCluster behavior: No MetroCluster event.
VMware HA behavior: No HA event.
Impact to data availability: None.

Part 2: Controller failback

Tests performed:
  1. Power the storage controller back on.
  2. Execute the cf giveback command on the other storage controller to cause the failback to occur.
Expected results:
  No disruption to data availability. There is a momentary drop in disk activity while the giveback occurs. A slight delay occurs from the host perspective while the datastore (iSCSI/NFS/FC) connection is rebuilt because processing moves back from one controller to the other. No interruptions occur in the VMs running on the ESX Servers.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected. For details on how ESX hosts respond to a controller failback, see the "What happens to an ESX host in the event of a single storage component failure" section of VMware KB 1001783.
MetroCluster behavior: The controller at the failed site reclaims its original role prior to the failure; there is no disruption of storage access at either site.
VMware HA behavior: No HA event.
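A hedged console sketch of the scenario 5 sequence follows; it only illustrates the monitoring and giveback commands named above, and the ESX-side check assumes the service console of one of the lab hosts.

  # On the surviving controller, while the partner is down (7-mode console):
  cf status                # should report that the partner is down and has been taken over

  # After the failed controller has been powered back on and is waiting for giveback:
  cf status                # confirms that a giveback is possible
  cf giveback              # returns the aggregates and services to the repaired controller

  # Optional check from an ESX 3.5 host (service console): the datastores
  # should still be mounted and show only a brief pause in I/O.
  vdf -h                   # lists mounted VMFS/NFS datastores and their usage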

5.2 FAILURES THAT AFFECT AN ENTIRE DATA CENTER

OPERATIONAL SCENARIO 6: LOSS OF AN ENTIRE SITE

In the case of a complete site-level disaster, such as a terrorist strike, bomb explosion, war, flood, earthquake, or loss of the power grid, all physical components of the VMware HA and NetApp MetroCluster solution, such as the VMware ESX Servers, the NetApp storage controllers and associated disk shelves, and the fabric switches, can become unavailable simultaneously. In such circumstances, a manual failover of the NetApp MetroCluster must be performed.

One way to simulate a real-world site disaster in the lab is to interrupt the following components of the lab setup in the order given, in rapid succession (so that the partner site components are unable to automatically detect any individual failure). To simulate a site 1 disaster:
1. Disconnect both ISLs on the site 1 side.
2. Remove power from all the disk shelves in site 1.
3. Remove power from all the ESX Servers in site 1.
4. Remove power from the NetApp storage controller in site 1.

Figure 18) Operational scenario 6: simulating a complete site disaster.

Figure 19 illustrates the recovery process involved in a complete site loss scenario.

Figure 19) Steps to be performed during the loss of a site.

Step 1: Verify the site-level disaster

1. If the VMware vCenter Server is still available after a site-level disaster, it displays the unavailability of resources from the failed site. Depending on the location of the VMware vCenter Server, which in the lab setup runs as a virtual machine inside the VMware HA cluster, the following observations can be made regarding its availability after a site-level disaster:

Table 8) Availability of the vCenter Server after a site-level disaster.

- vCenter Server on any ESX Server in site 2, datastore from the NetApp FAS controller in site 2:
  Observation: vCenter Server remains up and running.
- vCenter Server on any ESX Server in site 2, datastore from the NetApp FAS controller in site 1:
  Observation: vCenter Server is not available.
  Recovery: vCenter Server can be powered on after the manual force takeover described in step 2 below.
- vCenter Server on any ESX Server in site 1, datastore from the NetApp FAS controller in site 1:
  Observation: vCenter Server is not available.
  Recovery: vCenter Server can be powered on on a site 2 ESX Server after the manual force takeover described in step 2.
- vCenter Server on any ESX Server in site 1, datastore from the NetApp FAS controller in site 2:
  Observation: vCenter Server becomes available after it is powered on by the VMware HA service on any ESX Server cluster node in site 2.

Optional: If the vCenter Server is available after the disaster (that is, if the vCenter Server and its underlying storage are not at the failed site), the vCenter Server shows the unavailability of the VMs and resources at the failed site.

Figure 20) vCenter Server displaying unavailable resources.

2. The storage controller at the surviving site, site 2, shows that its partner node is down. As mentioned previously, during an entire-site failure, an automated cluster takeover is not initiated by the surviving storage controller node.

Figure 21) Site 2 display.

Step 2: Declare the disaster and perform a force takeover

Declare a site disaster and perform a manual takeover at the surviving site (site 2) by issuing the following command on the NetApp storage controller at site 2:

  cf forcetakeover -f

Figure 22) Output of the forcetakeover command.

Step 3: Verify the availability of storage after the force takeover

After the cf forcetakeover command is executed, all the LUNs and NFS exports of the failed node are made available automatically.

Figure 23) Output of the forcetakeover command.

Step 4: Automatic power-on of virtual machines at the remote site

Note: In this specific test scenario, the disaster is planned, with immediate declaration; therefore, the active storage at the failed site was made available almost immediately to the ESX Server hosts at the surviving site. In a real site disaster, the declaration may not be immediate. Therefore, the virtual machines may need to be manually powered on at the surviving site after the storage is made available through the force takeover.

1. The takeover was successful, and the virtual machines were automatically powered on on the surviving ESX Servers at the remote site.
2. If the vCenter Server was down, it now boots on a surviving ESX Server and can be accessed.

3. From this point, all the VMs that were on the ESX Servers at the failed site are powered on and run normally.

This completes the takeover process for site-level disasters.

Step 5: Perform a giveback once the failed site is back online

1. Power on all the disk shelves connected to the storage controller at site 1.
2. Reconnect the ISLs between the sites so that the storage controller at site 2 can see the disk shelves from site 1. After the connection is restored, the disk shelves on either side automatically begin to resync.
3. Power on all the ESX Servers in site 1. When the ESX Servers are properly online, power on the storage controller in site 1.
4. Use the cf status command to verify that a giveback is possible, and verify that all mirror resynchronization is complete before proceeding with the cf giveback command.

Figure 24) cf giveback command.

5. If necessary, manually migrate virtual machines from the surviving site back to the restored site and their respective ESX Servers.

This completes the giveback process. A consolidated CLI sketch of the takeover and giveback sequence follows.
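The following console sketch summarizes the site-disaster takeover and giveback sequence described in steps 2 through 5. It is a minimal illustration, not a recovery runbook: the aggregate name aggr1 is hypothetical, the exact forcetakeover options should be taken from the MetroCluster documentation for the Data ONTAP release in use, and every command assumes that the disaster has genuinely been declared.

  # Step 2: on the surviving controller (site 2), declare the disaster and force the takeover
  # (this report uses -f; consult the Active/Active Configuration Guide for the options
  # that apply to your release before running this in production).
  cf forcetakeover -f

  # Step 3: verify that the failed node's storage is now served by the survivor.
  partner lun show         # LUNs of the failed node, listed from its taken-over context
  partner exportfs         # NFS exports of the failed node
  cf status                # confirms the takeover state

  # Step 5: after the failed site is powered on again and the ISLs are reconnected,
  # wait for the mirrors to resynchronize, then give back.
  aggr status aggr1 -r     # mirror resync must be complete before giveback
  cf status                # verifies that a giveback is possible
  cf giveback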

OPERATIONAL SCENARIO 7: LOSS OF ALL ESX SERVERS IN ONE SITE

Figure 25) Operational scenario 7.

Table 9) Operational scenario 7: loss of all ESX Servers in one site.

Tests performed:
  Shut down all the ESX Servers in one of the sites by disconnecting the power cables.
Expected results:
  The VMs residing on those ESX Servers are automatically powered on on the ESX Servers at the remote site.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected. In this scenario, all the VMs that were restarted still use storage from their original site.
MetroCluster behavior: No MetroCluster event.
VMware HA behavior: Automatic power-on of the virtual machines from the failed hosts on the surviving nodes.
Impact to data availability: Applications and data on the VMs that were running on the powered-off ESX Servers are unavailable until those VMs come up on the surviving nodes of the VMware HA cluster.
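One hedged way to confirm the VMware HA restarts described in this scenario (and in scenarios 8 and 9-1 below) is to list the registered VMs and their power state from the service console of a surviving ESX 3.5 host; the datastore and VM path shown here are hypothetical examples.

  # On a surviving ESX 3.5 host (service console):
  vmware-cmd -l                                              # lists the .vmx paths of all VMs registered on this host
  vmware-cmd /vmfs/volumes/nfs_ds01/vm01/vm01.vmx getstate   # reports "on" once HA has restarted the VM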

OPERATIONAL SCENARIO 8: LOSS OF ESX SERVERS, CONTROLLER, AND DISK SHELF IN ONE SITE

Figure 26) Operational scenario 8.

Table 10) Operational scenario 8: loss of ESX Servers, controller, and disk shelf in one site.

Tests performed:
  Power off all the ESX Servers, the storage controller, and the disk shelf in one site.
Expected results:
  The VMs residing on the ESX Servers that were shut down automatically migrate to the ESX Servers on the surviving side and boot up. The controller at the other site automatically takes over for the powered-off controller and its disk shelves.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected.
MetroCluster behavior: The controller at the remote site performs an automatic takeover; there is no disruption of data access at either site.
VMware HA behavior: Automatic power-on of the virtual machines from the failed hosts on the surviving nodes.
Impact to data availability: Applications and data on the VMs that were running on the powered-off ESX Servers are unavailable until those VMs come up on the surviving nodes of the VMware HA cluster.

OPERATIONAL SCENARIO 9-1: COMBINATION TEST (SIMULTANEOUS FAILURES AT BOTH SITES)

Figure 27) Operational scenario 9-1.

Table 11) Operational scenario 9-1: combination test (simultaneous failures at both sites).

Tests performed:
  1. Power off the ESX Servers in one site.
  2. Power off the storage controller in the other site.
Expected results:
  The VMs residing on the ESX Servers that were shut down automatically migrate to the surviving ESX Servers and boot up. The surviving storage controller automatically takes over for the powered-off controller.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected.
MetroCluster behavior: The surviving storage controller performs an automatic takeover; there is no disruption of data access at either site.
VMware HA behavior: Automatic power-on of the virtual machines from the failed hosts on the surviving nodes.
Impact to data availability: Applications and data on the VMs that were running on the powered-off ESX Servers are unavailable until those VMs come up on the surviving nodes of the VMware HA cluster.

OPERATIONAL SCENARIO 9-2: COMBINATION TEST (SIMULTANEOUS FAILURES AT BOTH SITES)

Figure 28) Operational scenario 9-2.

Table 12) Operational scenario 9-2: combination test (simultaneous failures at both sites).

Tests performed:
  1. Power off disk pool 0 in site 1.
  2. Power off disk pool 0 in site 2.
Expected results:
  The VMs should not see any change and should continue to operate normally.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected.
MetroCluster behavior: No MetroCluster event; the mirroring process therefore continues.
VMware HA behavior: No HA event.
Impact to data availability: None.

OPERATIONAL SCENARIO 9-3: COMBINATION TEST (SIMULTANEOUS FAILURES AT BOTH SITES)

Figure 29) Operational scenario 9-3.

Table 13) Operational scenario 9-3: combination test (simultaneous failures at both sites).

Tests performed:
  1. Power off the storage controller in site 1.
  2. Power off disk pool 0 in site 2.
Expected results:
  The VMs should not see any change and should continue to operate normally.
Actual results:
  Actual results were in line with the expected behavior, and the tests passed as expected.
MetroCluster behavior: The surviving storage controller performs an automatic takeover; there is no disruption of data access at either site.
VMware HA behavior: No HA event.
Impact to data availability: None.

6 SUMMARY

Table 14) Summary of failure scenarios and their impact to data availability.

1. Complete loss of power to a disk shelf: None.
2. Loss of one link in one disk loop: None.
3. Loss of a Brocade switch: None.
4. Loss of one inter-switch link (ISL): None.
5. Failure and failback of a storage controller: None.
6. Loss of an entire site: Applications and data on the VMs that were running at the failed site become available after the force takeover command is executed from the surviving site and the affected virtual machines are powered on at the surviving site.
7. Loss of all the ESX Servers in one site: Applications and data on the VMs that were running on the powered-off ESX Servers become available after those VMs automatically come up on the surviving nodes of the VMware HA cluster.
8. Loss of ESX Servers, controller, and disk shelf in one site: Applications and data on the VMs that were running on the powered-off ESX Servers become available after those VMs automatically come up on the surviving nodes of the VMware HA cluster.
9-I. Loss of the ESX Servers in one site and loss of the storage controller in the other: Applications and data on the VMs that were running on the powered-off ESX Servers become available after those VMs automatically come up on the surviving nodes of the VMware HA cluster.
9-II. Loss of disk pool 0 at both sites: None.
9-III. Loss of the storage controller in one site and loss of disk pool 0 in the other: None.

The VMware HA and NetApp MetroCluster configuration enables an end-to-end high-availability solution for virtual infrastructure. The combination of a VMware HA cluster and NetApp MetroCluster delivers high levels of availability by minimizing both planned and unplanned downtime. Planned site failovers, at both the VMware HA level and the NetApp MetroCluster level, can be triggered without disrupting the environment, allowing for scheduled maintenance without downtime.

The combined VMware HA and MetroCluster solution delivers complete protection against server and storage failures, including failure of the physical ESX Server, NetApp storage controller, power supplies, disk drives, disk shelves, cables, and so on. Each failure scenario described above showcases the value of deploying VMware HA and NetApp MetroCluster to recover from these failures. Combining VMware HA and NetApp MetroCluster technologies provides a simple and robust high-availability solution for planned and unplanned downtime in virtual data center environments.

This paper is not intended to be a definitive implementation or solutions guide for high-availability solutions in a VMware virtual infrastructure with NetApp storage. Many factors related to specific customer environments are not addressed in this document. Contact NetApp support to speak with one of our virtualization solutions experts about any deployment requirement. Please forward any errors, omissions, differences, new discoveries, and comments about this paper to any of the authors.

APPENDIX A: BROCADE SWITCH CONNECTION DETAILS FOR FABRIC METROCLUSTER (SOFTWARE-BASED DISK OWNERSHIP)

Table 15) Brocade switch connection details for fabric MetroCluster.

Switch SITE1-SW1
  Port 0 (bank/pool 1/0): SITE1 FCVI - cluster interconnect
  Port 1 (bank/pool 1/0): Onboard HBA 0a
  Port 2 (bank/pool 1/0): Onboard HBA 0c
  Port 3 (bank/pool 1/0): no connection listed
  Port 4 (bank/pool 1/1): ISL - inter-switch link
  Ports 5-7 (bank/pool 1/1): no connection listed
  Ports 8-11 (bank/pool 2/0): Site 1 Shelf 1, Pool 0 - disk HBA for bank 2 shelves
  Ports 12-15 (bank/pool 2/1): Site 2 Shelf 1, Mirror Pool 1 - disk HBA for bank 2 shelves

Switch SITE1-SW2
  Port 0 (bank/pool 1/0): SITE1 FCVI - cluster interconnect
  Port 1 (bank/pool 1/0): Onboard HBA 0b
  Port 2 (bank/pool 1/0): Onboard HBA 0d
  Port 3 (bank/pool 1/0): no connection listed
  Port 4 (bank/pool 1/1): ISL - inter-switch link
  Ports 5-7 (bank/pool 1/1): no connection listed
  Ports 8-11 (bank/pool 2/0): Site 1 Shelf 1, Pool 0 - disk HBA for bank 2 shelves
  Ports 12-15 (bank/pool 2/1): Site 2 Shelf 1, Mirror Pool 1 - disk HBA for bank 2 shelves

Switch SITE2-SW3
  Port 0 (bank/pool 1/0): SITE2 FCVI - cluster interconnect
  Port 1 (bank/pool 1/0): Onboard HBA 0a
  Port 2 (bank/pool 1/0): Onboard HBA 0b
  Port 3 (bank/pool 1/0): no connection listed
  Port 4 (bank/pool 1/1): ISL - inter-switch link
  Ports 5-7 (bank/pool 1/1): no connection listed
  Ports 8-11 (bank/pool 2/0): Site 2 Shelf 1, Pool 0 - disk HBA for bank 2 shelves

  Ports 12-15 (bank/pool 2/1): Site 1 Shelf 1, Mirror Pool 1 - disk HBA for bank 2 shelves

Switch SITE2-SW4
  Port 0 (bank/pool 1/0): SITE2 FCVI - cluster interconnect
  Port 1 (bank/pool 1/0): Onboard HBA 0b
  Port 2 (bank/pool 1/0): Onboard HBA 0d
  Port 3 (bank/pool 1/0): no connection listed
  Port 4 (bank/pool 1/1): ISL - inter-switch link
  Ports 5-7 (bank/pool 1/1): no connection listed
  Ports 8-11 (bank/pool 2/0): Site 2 Shelf 1, Pool 0 - disk HBA for bank 2 shelves
  Ports 12-15 (bank/pool 2/1): Site 1 Shelf 1, Mirror Pool 1 - disk HBA for bank 2 shelves

APPENDIX B: MATERIALS USED IN THE LAB SETUP

Table 16) Lab setup.

Servers (IBM): One IBM x3550 server and five IBM x3650 servers; Intel Xeon processors (Intel VT); CPU: 74 GHz total in the cluster; memory: 52GB total in the cluster.
Storage (NetApp): FAS3170, fabric MetroCluster configuration, RAID-DP.
Switches, MetroCluster (Brocade): Four Brocade switches, model 200E, 16 ports, full fabric, 4Gb SWL SFPs. For the currently supported matrix, see the NetApp support site.
Switches, front-end SAN (Brocade): Four Brocade switches, model 3800.
Network adapters (Broadcom): Two Broadcom NetXtreme II BCM5708 1000Base-T per server.
HBA (QLogic): One QLogic QLA2432 per server (note that a production environment requires a minimum of two HBAs).
Software (NetApp): Data ONTAP 7.3.1; cluster_remote; syncmirror_local.
Software (VMware): VMware ESX Server 3.5 U3; VMware vCenter Server 2.5 U4.

Virtual machines: Two VMs per ESX host, configured for functional tests (one VM running Windows 2003 SP1 EE and the other running RHEL 5 U2).

APPENDIX C: REFERENCES

1. Data ONTAP 7.3 Active/Active Configuration Guide (part number 210-04192_A0): http://now.netapp.com/now/knowledge/docs/ontap/rel7311/pdfs/ontap/aaconfig.pdf
2. Brocade 200E Switch Configuration Guide
3. MetroCluster Design and Implementation Guide: http://media.netapp.com/documents/tr-3548.pdf
4. Active-Active Configuration Best Practices: http://media.netapp.com/documents/tr-3450.pdf
5. NetApp and VMware Virtual Infrastructure 3 Best Practices: http://media.netapp.com/documents/tr-3428.pdf
6. VMware HA white paper: http://www.vmware.com/pdf/vmware_ha_wp.pdf
7. Clustering vCenter Server: http://www.vmware.com/pdf/vc_mscs.pdf

AUTHORS

Preetom Goswami, Technical Marketing Engineer, NetApp
Wen Yu, Sr. Technical Alliance Manager, VMware
Sridhara Gangoor, Technical Marketing Engineer, NetApp
Sitakanta Chaudhury, Technical Marketing Engineer, NetApp

ACKNOWLEDGEMENT

Various teams from NetApp and VMware helped to a great extent in developing this paper. Their invaluable guidance and participation in all phases ensured a technical report that relied on their real-world experience and expertise. The authors would particularly like to acknowledge the help received from the MetroCluster experts and the VMware QA and Alliance teams.

DISCLAIMER

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication or with respect to any results that might be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein must be used solely in connection with the NetApp products discussed in this document.

© 2009 NetApp. All rights reserved. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, RAID-DP, Snapshot, MetroCluster, and SyncMirror are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. VMware and VMotion are registered trademarks of VMware, Inc. Linux is a registered trademark of Linus Torvalds. Intel is a registered trademark of Intel Corporation. Windows is a registered trademark of Microsoft Corporation. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. TR-3788