Disaster Recovery with Dell EqualLogic PS Series SANs and VMware vsphere Site Recovery Manager 5

Similar documents
Benefits of Automatic Data Tiering in OLTP Database Environments with Dell EqualLogic Hybrid Arrays

NexentaStor Storage Replication Adapter User Guide

Dell Storage PS Series Administration and Product Overview

Microsoft SQL Server Database Protection Using EqualLogic Auto-Snapshot Manager / Microsoft Edition

VMware vcenter Site Recovery Manager 4.1 Evaluator s Guide EVALUATOR'S GUIDE

Server Fault Protection with NetApp Data ONTAP Edge-T

Site Recovery Manager Administration. Site Recovery Manager 6.0

Site Recovery Manager Administration. Site Recovery Manager 6.1

PS Series Deployment: Configuring and Deploying the Dell EqualLogic Host Integration Toolkit for Linux

SRM Evaluation Guide First Published On: Last Updated On:

VMware Site Recovery Manager with EMC CLARiiON CX3 and MirrorView/S

Dell PowerVault MD Storage Array VMware Storage Replication Adapter (SRA) Installation and Configuration Manual

Data Migration from Dell PS Series or PowerVault MD3 to Dell EMC SC Series Storage using Thin Import

Disaster Recovery-to-the- Cloud Best Practices

NexentaStor VVOL

Configuring ApplicationHA in VMware SRM 5.1 environment

Automated Disaster Recovery. Simon Caruso Senior Systems Engineer, VMware

vrealize Suite 7.0 Disaster Recovery by Using Site Recovery Manager 6.1 vrealize Suite 7.0

VMware vsphere with ESX 4.1 and vcenter 4.1

VMware vcenter Site Recovery Manager disaster recovery best practices

vsphere Replication for Disaster Recovery to Cloud vsphere Replication 8.1

Table of Contents HOL-PRT-1467

vstart 50 VMware vsphere Solution Overview

Implementing disaster recovery solution using IBM SAN Volume Controller stretched cluster and VMware Site Recovery Manager

VMware vcenter Site Recovery Manager 5 Technical

Dell Storage Compellent Integration Tools for VMware

VMware vsphere 5.5 Professional Bootcamp

Setting Up the Dell DR Series System on Veeam

Data Protection Guide

Evaluator Guide. Site Recovery Manager 1.0

Dell Storage Compellent Integration Tools for VMware

PS SERIES BEST PRACTICES DEPLOYING WINDOWS SERVER 2008 WITH PS SERIES SANS

Dell PS Series to Dell EMC SC Series Terminology Dictionary

Dell Storage Integration Tools for VMware

Dell EqualLogic PS Series Arrays: Advanced Storage Features in VMware vsphere

Virtual Storage Console, VASA Provider, and Storage Replication Adapter for VMware vsphere

Setting Up the DR Series System on Veeam

Tintri Deployment and Best Practices Guide

Dell Storage vsphere Web Client Plugin. Version 4.0 Administrator s Guide

TECHNICAL WHITE PAPER - FEBRUARY VMware Site Recovery for VMware Cloud on AWS Evaluation Guide TECHNICAL WHITE PAPER

Microsoft E xchange 2010 on VMware

vrealize Suite 6.0 Disaster Recovery by Using Site Recovery Manager 5.8

Setting Up Cisco Prime LMS for High Availability, Live Migration, and Storage VMotion Using VMware

A Dell Technical White Paper Dell Virtualization Solutions Engineering

Dell EMC Unity Family

SRM 8.1 Technical Overview First Published On: Last Updated On:

GET MORE VIRTUAL WITH DELL AND VMWARE vsphere 4

Best Practice Guide for Implementing VMware vcenter Site Recovery Manager 4.x with Oracle ZFS Storage Appliance

DELL EMC XTREMIO X2 NATIVE REPLICATION ADAPTER FOR VMWARE SITE RECOVERY MANAGER

Using EonStor DS Series iscsi-host storage systems with VMware vsphere 5.x

Site Recovery Manager Installation and Configuration. Site Recovery Manager 5.5

Dell EqualLogic Software All-inclusive software for enterprise power with everyday simplicity

Storage Replication Adapter for VMware vcenter SRM. April 2017 SL10334 Version 1.5.0

Dell PowerVault MD Series VMware Storage Replication Adapter (SRA) Installation and Configuration Manual (Web Client)

SRM 6.5 Technical Overview February 26, 2018

vsphere Replication for Disaster Recovery to Cloud vsphere Replication 6.5

Virtualization with VMware ESX and VirtualCenter SMB to Enterprise

By the end of the class, attendees will have learned the skills, and best practices of virtualization. Attendees

StarWind Virtual SAN. HyperConverged 2-Node Scenario with Hyper-V Cluster on Windows Server 2012R2. One Stop Virtualization Shop MARCH 2018

vsphere Replication for Disaster Recovery to Cloud

Basic Configuration Installation Guide

Dell EqualLogic Software

Data Protection Guide

Release Notes and User Guide: DataCore SANsymphony Storage Replication Adapter 1.1

VMware vsphere 6.5 Boot Camp

Basic Configuration Installation Guide

Virtual Server Agent v9 with VMware. June 2011

HP StorageWorks. EVA Virtualization Adapter administrator guide

HP StoreVirtual Storage Multi-Site Configuration Guide

Data Protection Guide

HC3 Move Powered by Carbonite

Virtual Storage Console, VASA Provider, and Storage Replication Adapter for VMware vsphere

"Charting the Course... VMware vsphere 6.7 Boot Camp. Course Summary

DELL TM PowerVault TM DL Backup-to-Disk Appliance

Storage Manager 2018 R1. Installation Guide

VMware vsphere with ESX 4 and vcenter

vsphere Installation and Setup Update 2 Modified on 10 JULY 2018 VMware vsphere 6.5 VMware ESXi 6.5 vcenter Server 6.5

Virtual Storage Console, VASA Provider, and Storage Replication Adapter for VMware vsphere

Zerto Virtual Replication Release Notes Version 3.0U5

RecoverPoint for Virtual Machines

vsphere Host Profiles 17 APR 2018 VMware vsphere 6.7 VMware ESXi 6.7 vcenter Server 6.7

StarWind Virtual SAN 2-Node Stretched Hyper-V Cluster on Windows Server 2016

Using VMware vsphere Replication. vsphere Replication 6.5

Site Recovery Manager Administration. Site Recovery Manager 8.1

MIGRATING TO DELL EMC UNITY WITH SAN COPY

vsphere Replication for Disaster Recovery to Cloud

StarWind Virtual SAN Configuring HA Shared Storage for Scale-Out File Servers in Windows Server 2016

SnapCenter Software 4.0 Concepts Guide

Business Continuity and Disaster Recovery Disaster-Proof Your Business

Architecting DR Solutions with VMware Site Recovery Manager

Dell EMC Ready Architectures for VDI

VMware vsphere with ESX 6 and vcenter 6

DELL EMC UNITY: REPLICATION TECHNOLOGIES

VMware Site Recovery Technical Overview First Published On: Last Updated On:

VMware vsphere 5.5 Advanced Administration

Introduction to Virtualization

Virtualization with VMware ESX and VirtualCenter SMB to Enterprise

VMware Site Recovery Manager 5.0. Best Practices

How to Deploy vcenter on the HX Data Platform

StarWind Virtual SAN Installing and Configuring SQL Server 2017 Failover Cluster Instance on Windows Server 2016

Transcription:

Technical Report Disaster Recovery with Dell EqualLogic PS Series SANs and VMware vsphere Site Recovery Manager 5 Abstract The virtual datacenter introduces new challenges and techniques for disaster recovery. This technical report details the installation and configuration of Dell EqualLogic PS Series storage with VMware vcenter Site Recovery Manager to help make disaster recovery an automated and manageable part of your virtual environment. TR1073 V1.0

Copyright 2011 Dell Inc. All Rights Reserved. EqualLogic is a registered trademark of Dell Inc. Dell is a trademark of Dell Inc. All trademarks and registered trademarks mentioned herein are the property of their respective owners. Information in this document is subject to change without notice. Reproduction in any manner whatsoever without the written permission of Dell is strictly forbidden. [Sept 2011] WWW.DELL.COM/PSseries

Current Customers Please Note: You may not be running the latest versions of the tools and software. If you are under valid warranty or support agreements for your PS Series array, you are entitled to obtain the latest updates and new releases as they become available. To learn more about any of these products, contact your local sales representative or visit the Dell EqualLogic site at http://www.equallogic.com. To set up a Dell EqualLogic support account to download the latest available PS Series firmware and software kits visit: https://www.equallogic.com/secure/login.aspx

Preface Thank you for your interest in Dell EqualLogic PS Series storage products. We hope you will find the PS Series products intuitive and simple to configure and manage. PS Series arrays optimize resources by automating volume and network load balancing. Additionally, PS Series arrays offer all-inclusive array management software, host software, and firmware updates. The following value-add features and products integrate with PS Series arrays and are available at no additional cost: PS Series Array Software o o o Firmware Installed on each array, this software allows you to manage your storage environment and provides capabilities such as volume snapshots, clones, and replicas to ensure data hosted on the arrays can be protected in the event of an error or disaster. Group Manager GUI: Provides a graphical user interface for managing your array Group Manager CLI: Provides a command line interface for managing your array. Manual Transfer Utility (MTU): Runs on Windows and Linux host systems and enables secure transfer of large amounts of data to a replication partner site when configuring disaster tolerance. You use portable media to eliminate network congestion, minimize downtime, and quick-start replication. SAN HeadQuarters (SANHQ): Provides centralized monitoring, historical performance trending, and event reporting for multiple PS Series groups. Host Software for Linux o Host Integration Tools Kit Host Software for Windows o Remote Setup Wizard Command Line Interface (RSWCLI): Discovers and initializes PS Series arrays, configures and manages access to PS Series arrays, and configures and manages multipathing. Multipath device configuration components: Provides the EqualLogic Connection Manager (ehcmd) daemon to manage multipath devices, a Device Mapper kernel module (dm-switch) to optimize routing of multipathing devices, and a command line interface (ehcmcli) that allows users to review the diagnostic state of EqualLogic multipathing. EqualLogic Host Performance and Tuning Tool Suite (eqltune): A utility used to validate configurable parameters against Dell s recommended practices. Host Integration Tools Kit Remote Setup Wizard (RSW): Initializes new PS Series arrays, configures host connections to PS Series SANs, and configures and manages multipathing. Multipath I/O Device Specific Module (MPIO DSM): Includes a connection awareness-module that understands PS Series network load balancing and facilitates host connections to PS Series volumes. VSS and VDS Provider Services: Allows 3 rd party backup software vendors to perform off-host backups. Auto-Snapshot Manager/Microsoft Edition (ASM/ME): Provides point-in-time SAN protection of critical application data using PS Series snapshots, clones, and replicas of supported applications such as SQL Server, Exchange Server, Hyper-V, and NTFS file shares. PowerShell Tools: Provides a comprehensive set of PowerShell cmdlets for managing one or many PS Series groups.

Host Software for VMware o o o Host Integration Tools Kit o o o Auto-Snapshot Manager/VMware Edition (ASM/VE): Integrates with VMware vcenter Server and PS Series snapshots to allow administrators to enable Smart Copy protection of vcenter folders, datastores, and virtual machines. EqualLogic Datastore Manager: Allows administrators to create and manage datastores on EqualLogic PS Series arrays from within vcenter. Virtual Desktop Deployment Utility: Automates the deployment of virtual desktops in a VMware View environment. Storage Replication Adapter for Site Recovery Manager (SRA): Allows SRM to understand and recognize PS Series replication for full SRM integration. Multipathing Extension Module for VMware vsphere: Provides connection awareness enhancements to the existing VMware multipathing functionality that understands PS Series network load balancing and facilitates host connections to PS Series volumes.

Table of Contents Revision Information... iii Executive Summary... 1 Introduction... 1 VMware vcenter Site Recovery Manager Terminology... 2 Overview and Prerequisites... 2 Configuring Replication for vcenter Site Recovery Manager... 3 Step 1: Configure Replication Partnership between Protected Site Array and Recovery Site Array... 3 Step 2: Configure replication on the Datastore volumes... 5 Step 3: Configure Replication Schedule... 8 Installation and Configuration of VMware vcenter Site Recovery Manager... 9 Step 1: Install Dell EqualLogic Storage Replication Adapter... 10 Step 2: Connect Using VMware vsphere Client...13 Step 3: Configure Site Recovery Manager Connection...13 Step 4: Configure Array Managers... 15 Step 5: Configure Inventory Mappings...20 SRM Protection Groups... 21 Datastore Cluster Considerations in Protection Groups... 25 Recovery Plans... 26 Testing... 29 Recovery...33 Failback... 36 Recovery Scenario 1: Reprotect and Failback...37 Recovery Scenario 2: Re-establish SRM... 38 Integration with iscsi Connected Volumes... 41 Troubleshooting... 43 Summary... 43 Technical Support and Customer Service... 44

Revision Information The following table describes the release history of this Technical Report. Report Date Document Revision 1.0 September 2011 Initial Release The following table shows the software and firmware used for the preparation of this Technical Report. Vendor Model Software Revision VMware vsphere ESXi 5.0 VMware vcenter Server 5.0 VMware vcenter Site Recovery Manager 5.0 Dell Dell EqualLogic PS Series Storage 5.1 Dell Dell EqualLogic Storage Replication Adaptor 2.0 The following table lists the documents referred to in this Technical Report. All PS Series Technical Reports are available on the Customer Support site at: support.dell.com Vendor VMware VMware VMware VMware Dell Dell Document Title VMware vcenter Site Recovery Manager Administration VMware vcenter Site Recovery Product Documentation VMware vcenter Site Recovery Manager Performance and Best Practices VMware vsphere System Administrator Documentation Dell EqualLogic PS Series Installation and Setup Dell EqualLogic PS Series Group Administration Guide iii

Executive Summary Data Protection and Disaster Recovery (DP/DR) is foremost in the minds of datacenter administrators. Virtualization adds increased flexibility and techniques when looking at protection schemes for the environment. Because VMware encapsulates entire servers into a series of files which make up the virtual machine, administrators can now take advantage of block storage based techniques such as clones, snapshots and replicas to protect these servers. Dell EqualLogic PS Series storage arrays offer built-in replication to transfer data, and thus the virtual machines, from one location to another and integration software that combines with VMware vsphere. VMware vcenter Site Recovery Manager (SRM) is a suite of tools that help to automate and test Disaster Recovery plans and depends on the PS Series replication to work. By combining these two platforms, administrators now have a manageable way to not only configure, automate, and test disaster recovery plans, but the means to easily run them in the case of a disaster. This Technical Report details the installation and configuration of VMware vcenter Site Recovery Manager software. As part of this setup, Dell has a storage adapter plug-in, the Storage Replication Adapter (SRA), that is needed to enable communication from VMware vcenter to the PS Series storage. In addition to SRM configuration and the storage adapter, this document will detail how to configure replication between sites and how to setup and test a recovery plan. The last section of this report shows administrators how to perform a full site failover and how to failback when the problems are resolved. Introduction Historically, disaster recovery (DR) solutions have been difficult and costly to implement. DR has been a pain point for many IT administrators, having to maintain consistency across duplicate hardware in various locations, documenting and maintaining detailed run books, reacting to changes in the environment, and scheduling testing that is classically disruptive to the production environment. VMware s vcenter Site Recovery Manager, or Site Recovery Manager (SRM), works with the Dell EqualLogic PS Series built-in replication to make disaster recovery rapid, manageable, reliable and affordable. SRM is a plug-in for vcenter on both the primary and the recovery site. SRM and vcenter at Site A (the Production Site) coordinate with SRM and vcenter residing at Site B (the DR Site) to simplify and automate disaster recovery. SRM allows for automated testing of DR plans, in advance of a disaster, as well as managing the failover and recovery in the event of a real disaster. SRM facilitates the process of bringing an entire virtual environment from one location to another. SRM centralizes the process of configuring a DR plan run book and allows for the testing of the plan without causing any impact to the production environment. This technical report will discuss the installation and configuration of SRM with the Dell EqualLogic PS Series storage arrays. It will cover setup, testing, failover, failback and troubleshooting. This document is designed to be used in conjunction with the SRM Administration Guide and assumes a prior knowledge of VMware vsphere and vcenter environments. 1

VMware vcenter Site Recovery Manager Terminology VMware s vcenter Site Recovery Manager introduces new terms when discussing DR planning and configuration. Because both SRM and the PS Series storage support bidirectional replication and configuration, it is not sufficient to use the terms production site and DR site. For example, consider a company with a virtual environment in New York running Virtual Machines (VMs) in production that is replicating to Chicago as a DR site. Chicago also has its own virtual environment in production and its DR site is in New York. With SRM and bi-directional replication provided by the storage, each site is protected from a disaster. If there is a problem in New York, all of the virtual machines previously hosted there can be recovered in Chicago and brought online. The reverse holds true if there is a problem in Chicago. Another example would be an environment that leverages the many-to-one replication of the PS Series array, where multiple satellite offices are each configured to replicate back to a centralized corporate datacenter. To avoid confusion, VMware uses the terms protected site and recovery site to differentiate between the two sites for a VM. A protected site is the site in which production VMs are up and running. These VMs must be protected and are done so by configuring array-based replication for their Datastores to another site. A recovery site is the site that the protected VMs are replicated to. In the case of a disaster, SRM can bring these VMs online following a clear plan, minimizing the downtime and recovery time. Overview and Prerequisites Site Recovery Manager requires VMware vcenter server to be installed at both the protected site and the recovery site. Each site will have its own VMware vsphere environment and datacenters. The SRM configuration is mirrored to the recovery site s SRM server so that if the primary site goes down, everything that the recovery site needs to resume the primary site s operations is local. VMware vcenter Site Recovery Manager does not automate or configure SAN storage replication between sites. This needs to be configured by the storage administrator as detailed in this document using the storage array management tools. Both PS Series replication and SRM require adequate network connectivity and bandwidth between the protected site and the recovery site. SRM will protect the VMFS datastores that the VMs reside on. All of the VMs must reside on shared storage and be configured for replication to qualify as protected. PS Series Replication is always configured between two EqualLogic groups, even if all the SAN members are in the same physical location. 2

Configuring Replication for vcenter Site Recovery Manager VMware vcenter Site Recovery Manager is deployed with two separate PS Series groups that have replication enabled so that the Virtual Machines that reside on the datastores at the protected site are replicated to the recovery site. In order for a datastore to be protected with SRM the following conditions must be met: The PS Series groups must be configured for replication The volumes must have replication configured The volumes must be part of a replication schedule Once these conditions are met the volume will show up as a protected datastore group inside SRM. Replication capabilities are a standard feature in the PS Series SAN and are straight forward to configure. The steps for configuring replication and configuring the volume are detailed below. These operations are done using the PS Series array Group Manager GUI. More detailed information can be found in the Group Manager under Tools -> Online help -> Welcome To Group Manager GUI Help -> Managing Data Replication. Step 1: Configure Replication Partnership between Protected Site Array and Recovery Site Array 1. From the protected site Group Manager GUI click the Replication management button. 2. Under Replication Partners in the Activities tab click on Configure Partner. 3. Enter in the Group Name of the recovery site (case sensitive), the Group IP Address and a Description. Click Next. 3

4. Enter in contact information in the next screen and click Next. 5. On the next screen there are two password fields. The first field is the Password for partner. This is the password that the protected group will give to the recovery group when establishing a connection for replication. The second field is the Password obtained from partner. This is the password that the recovery group expects to receive from the partner. These passwords need not be the same, and are set by each group s administrator. Enter in the passwords and click Next. 6. The next step in the replication configuration wizard is to specify Delegated Space. This is space that is reserved from local free space on the protected group to store inbound replicas from the partner group. The Dell EqualLogic PS Series SAN supports bi-directional replication as well as many to one replication. If there are going to be no replicas sent to this site from that partner then the value can be left at 0. Choose the amount of delegated space, the storage pool which the space will come from and click Next. 7. Verify all of the information is correct and click Finish. 4

8. This creates the partnership between the protected site and the recovery site. Now follow the same steps to configure the partnership between the recovery site and the protected site. When configuring the dedicated reserve space for replication, take into account the number of volumes being replicated and the total available space. For example, four 200GB volumes with data on them being replicated with 200% reserve space plus a little bigger will mean 1TB of delegated space on the recovery site. Choosing how much space to allocate will depend on various factors such as the number of replicas you wish to keep as well as how much data change is happening between each replica. Step 2: Configure replication on the Datastore volumes Once a partnership is established on both sides for replication, each volume needs to also be configured for replication. 1. From the protected site Group Manager GUI click the Volumes management button. 2. Click on a Datastore volume that you wish to protect with replication. In the Activities tab click on Configure Replication. 3. The next screen is divided into three sections. In the first section, choose the Replication partner that this volume will be replicated to. 5

In the second section, select the Total replica reserve which is a percentage of this volume that is set aside from the delegated space on the partner. This percentage not only accounts for 100% of the initial replica volume but should be configured based on the amount of change and the number of replicas you wish to keep. The third section will set aside a percentage of this volume in the local replica reserve to keep track of changes made during the replica process. This is also where Fast Failback Snapshots are stored in the event you are failing back from a replica set. This can either come from local replica reserve or free space by selecting Allow temporary use of free pool space. Choose your options in each section and click Next. These can always be modified later. 4. In the Advanced Settings screen, you have the option to Keep failback snapshot. Fast Failback keeps a copy of the most recent replica on the protected array. With SRM 5 administrators have the capability of using SRM to reprotect which automates fail-back to the original protected site after the cause of the failover has passed. By leveraging fast failback it will result in shorter recovery time as only the changes at the recovery site need to be replicated back to the protected site. In order to use Fast Failback and 6

Reprotect within SRM this check box must be selected. Make your selection and click Next. 5. View the summary and click Finish to complete the replication configuration. When this is done you will be prompted to optionally start the volume replica process immediately. The dialogue box for "Perform manual replication" refers to the Manual Transfer Utility and is not required for normal over the wire replication. This will begin replicating the base volume and creating a replica set on the partner array, and can either be performed at this stage or by the replication schedule that is configured for the volume. Optionally you can utilize the Manual Transfer Utility (MTU) to perform the initial replica, which allows you to move data from one datacenter to another without using the built in replication. This is often done if the bandwidth between sites is sufficient to accommodate the rate of data change of the volume but not sufficient to create the large initial replica in a timely manner. In 7

order to use the MTU click the checkbox in Perform manual replication See the Manual Transfer Utility User Guide for detailed information. 6. Repeat this process for each volume that will be protected by SRM. SRM will not recognize a Datastore as protected until it is configured for replication and has an active schedule configured for it which is detailed in the next step. Step 3: Configure Replication Schedule Each volume can be set up with different replication properties and schedules. This granular control allows for volumes to have different protection schemes based upon the data that they host. For example, some volumes may be replicated hourly with keeping the last 10 replicas and others may only be replicated once or twice a day. By taking advantage of the per-volume schedules and schemes, administrators can develop a replication strategy that makes sense for the data contained in each volume. 1. From the protected site Group Manager GUI click the Volumes management button. 2. Click on the volume that you need to configure a schedule for. In the Activities tab under Schedules click Create Schedule. 3. Give the schedule a name and choose Replication Schedule. Choose the schedule option: Run once, hourly, daily, or reusing an existing schedule and enable the schedule and then click Next. 8

4. Configure the replication schedule that meets the bandwidth and recovery needs for the VMs on that Datastore volume and click Next. For more information on Replication considerations, see the PS Series Administration Guide and Dell EqualLogic Auto-Replication: Best Practices and Sizing Guide. 5. Verify the summary of the schedule and click Finish. Follow the same procedure on all the Datastore volumes that host VMs that are to be protected in SRM. Once the replication partnership is configured between the protected site and the recovery site, and every Datastore volume that needs to be protected has been configured for replication, including having an active replication schedule, you can proceed with the configuration of SRM. These steps must be repeated any time a new volume is added to the virtual environment that will be hosting VMs that are to be protected by SRM.. Installation and Configuration of VMware vcenter Site Recovery Manager Before VMware vcenter Site Recovery Manager can be installed, both the protected site and recovery site must have their own instance of VMware vcenter Server installed and configured. The server upon which SRM is installed must be able to communicate with the remote SRM server as well as have connectivity to the local SAN through 9

either the management network or iscsi network. For more information consult the VMware vsphere System Administrator Documentation. NOTE: In some instances when using multiple networks for LAN/SAN connectivity, the iscsi NICs may be bound first. This will cause SRM to look on the SAN network instead of the LAN network. Double check NIC bindings so that public LAN bindings are first before SAN. SRM 5 has a new feature during the installation to choose the local IP address to use. This will help in making sure the correct communication IP Address is used. Once the vsphere virtual environment is configured on both the protected site and the recovery site, install VMware vcenter Site Recovery Manager on both sites. SRM requires a new database to be installed on both sites. The database may reside on the same database server with the vcenter database but it cannot use the same database. SRM can be installed on the same server as the vcenter server but installations may differ. For more information consult the SRM Administrator Guide from VMware. Step 1: Install Dell EqualLogic Storage Replication Adapter Once SRM is installed on both sites the Dell EqualLogic Storage Replication Adapter for VMware Site Recovery Manager (sometimes referred to as array scripts ) must be installed on each site. This adapter is necessary to allow SRM to communicate with the EqualLogic PS Series SANs and to coordinate the entire process of testing, cloning and failing over storage resources as part of disaster protection and testing. SRM does not automate the SAN replication configuration and management process, this is done using PS Series management tools as described in the previous steps. 1. You can find the PS Series Array Manager storage replication adapter for SRM download page on the Dell EqualLogic support site, http://www.equallogic.com/support or on the VMware SRM SRA site. Download the installer and install on both servers that are running SRM. If the SRA is downloaded from the VMware SRA site, unzip the package and run the installer. 2. On both sites the installation procedure is the same. Run the executable on each site. 3. Click Next. 10

4. Review and agree to the terms of the license agreements. Select Next. 5. Verify the target installation directory. Click Install. 11

6. When the install is complete select Finish. The storage adapter needs to be installed on both servers running SRM. 7. The SRA still needs to be discovered by the vcenter Site Recovery Manager plug-in. This is covered later in the section 4 of this document. 12

Step 2: Connect Using VMware vsphere Client 1. Now that SRM and the Array storage adapters are installed, connect to the environment using VMware vsphere Client. Install and configure the SRM Plugin for each client being used to manage SRM. 2. From the file menu click on Plug-ins and then Manage Plug-ins. Choose the vcenter Site Recovery Manager Plug-in to install. This will need to be done on any vsphere Client that is used to connect to the environment in which management of SRM is also needed. 3. For more information consult the SRM Administrator Guide from VMware. Step 3: Configure Site Recovery Manager Connection 1. Just as the PS Series SAN needs to have a partnership established for replication, vcenter Site Recovery Manager needs to have a partnership established between the protected site and the recovery site. 2. To get to the SRM configuration screen select Home. Then under Solutions and Applications click Site Recovery. 3. From the Sites section of the SRM Plug-in screen in the GUI click Configure. Enter the hostname or IP address of the SRM server at the recovery site. Type in the credentials to access both sides and the partnership will be established. 4. This is covered in more detail in the SRM Administrator Guide, but once configured it will look similar to the following example. This partnership is only established once and SRM will take care of configuring both partners to see each other. 13

5. From the example screenshot above, note that the protected vcenter server and protected SRM server are both on the same system, 10.124.6.211, and have been given the site name Prod Site. On the recovery site the vcenter server and SRM server reside on 10.124.6.212, and the site name is DR Site. 14

Step 4: Configure Array Managers Once the SRM partnership between the two sites is established the Array Manager needs to be configured. This is done for both sites but the configuration can be done through the protected site. NOTE: Even if there is only replication in one direction, the storage replication adapter must be installed on both the protected site and recovery site. When configuring the array manager on the protected site it needs to communicate to vcenter at the recovery site and see the recovery site array. 1. Click on the Array Managers button from the SRM section of the vsphere client. 2. Since the Dell EqualLogic SRA was already installed previously click the protection site (Prod Site 10.124.6.211 in this example) and click the SRAs tab. Click Rescan SRAs. This will discover the Dell EqualLogic SRA. If this is the first time you scan it will show that it is unable to find the SRA at the paired site. Click on the Recovery site (DR Site 10.124.6.212 in this example) and Rescan SRAs there and it will update the status to OK for both sites. 15

3. Once the SRA has been installed at both locations, each side needs to have its Array Manager configured. This is the PS Series group that is located locally to each SRM instance. In order for the SRM server to see the Array Manager correctly it must have access to either the group management IP address or iscsi group IP address. 4. Select the protection site (Prod Site 10.124.6.211 in this example) and click Add Array Manager above the SRM instances or in the Getting Started tab. 5. Enter in the Display Name for the group on the Protected site the Dell EqualLogic PS Series SRA from the dropdown. Click Next. 16

6. Fill in the local Group IP Address or Hostname and the Group Manager credentials as well as the replicated partner Group IP Address or Hostname and Group Manager credentials. Click Next. 17

7. When finished you will see a success message and the Array Manager show up under the protected site. Click Finish. 18

8. This will establish the protection site array manager. Do the same thing for the recovery site. This can be done from the SRM GUI of the protected site. Click the recovery site (DR Site 10.124.6.211 in this example) and then Configure Array Manager. When this is done the array groups will show up under each SRM instance. 9. Once the Array Managers have been configured on both sides the final step is to pair them with each other in SRM and verify that the replicated volumes show up. Click on the array manager for either site and click the Array Pairs tab. Click Enable to pair the two groups. This will verify all available replicated datastore volume pairs. 10. Click on the Devices tab and verify that he replicated datastores are visible. These are the datastores that will be available for creating protection groups. Anytime new volumes are added and configured for replication, simply click Refresh, to rescan and show the new volumes. As protection groups are established the information will also be displayed here corresponding to the datastore volumes that are part of the protection group. In this example you can see four datastores have been fully configured and are available for protection. 19

Step 5: Configure Inventory Mappings Inventory mappings are global settings that match the protected site resources with the recovery site resources. This can be used to match data centers, resource pools and networks on the protected site with data centers, resource pools and networks on the recovery site. This enables administrators to guarantee that broad configuration choices that are made in the protected site match up on the recovery site when a failover occurs. These global settings can be overwritten at the individual VM level during the creation of the protection group. Consult the SRM Administration Guide on how to configure the inventory mappings for SRM. This can then be used to match networks, hosts, resource pools, and folders from the protected site to the recovery site for testing and failover. To configure inventory mappings click on the Sites button in the SRM GUI and select any of the tabs for mapping: Resource Mappings, Folder Mappings, Network Mappings. Below is an example of Inventory Mappings from the Resource Mappings tab. 20

Once the inventory mappings are created click the Placeholder Datastores tab. Click Configure Placeholder Datastore and choose a location to store the recovery placeholder VM configuration files. This is a datastore on the recovery site. There does not need to be much space allocated here as it is just a temporary space to hold the small.vmx and other configuration files for each of the protected VMs. It is recommended to use a volume datastore that is shared among all of the recovery ESXi servers. Click OK. This also needs to be done on the protected site in the case of a reprotect this will be used to unregister and re-register the VMs on failback. SRM Protection Groups A protection group is a datastore volume or group of datastore volumes that contain virtual machines that need to be protected. A protection group can be configured from either site through the VMware vcenter SRM GUI. More detailed information can be found in the SRM Administration Guide, but the following is an example of the steps taken: 1. From the vcenter Client logged into the protected site click on Home-> Solutions and Applications->Site Recovery. Select the Protection Groups button and click Create a Protection Group from the menu bar or in the Getting Started tab. 21

2. Select the SRM instance in which to create the protection group. This will show the array pair. Select the correct array pair in which the protected datastore resides on and click Next. 3. On the next screen select all of the datastores to be included in the protection group. If a VM has entities that reside across multiple datastores they will all be included in the same group. By selecting each group you will see which VMs are associated with the protection group. Protection groups are used by SRM when creating recovery plans. This granularity allows administrators to create multiple small protection groups and then recovery plans based on them or even a single large protection group that encompasses all replicated datastores. Make your selection and click Next. 22

4. Give the Protection Group a name and a description and click Next. 5. Verify the setting on the final screen and click Finish. 23

6. Follow this same procedure for all of the protection groups that need to be configured. In order to be protected with SRM, each replicated datastore must belong to a protection group. 7. As the protection groups are created, you will see placeholder VMs being registered on the recovery site. SRM will also notify you if there are any errors in protecting the VMs. The following is an example of a finished protection group. 24

8. A more detailed explanation of modifying the protection group can be found in the SRM Administration Guide. To modify individual protection schemes of a VM select the protection group and then click on the Virtual Machines tab. From here right click on the VM you want to modify and select Configure Protection. Datastore Cluster Considerations in Protection Groups VMware vsphere 5 includes a new functionality for datastores called a Datastore Cluster. A datastore cluster is a grouping of datastores that allows for ease of VM placement as well as load balancing and capacity balancing across datastores in the cluster. It leverages storage DRS to move VMs to the appropriate datastore either via a manual recommendation or automated fashion. Because SAN replication is configured at the datastore volume level, this introduces some complexity to SRM. If a VM is storage vmotioned to another datastore the protection of the VM will break inside SRM. Not only will the protection be broken but the VM could possibly be moved to a non replicated volume and interfere with the recovery plan for the 25

organization. This should not dissuade you from using datastore clusters just be aware of the impact to the protection groups. When a VM is moved to another datastore the protection for the VM needs to be established. 1. You will see the VM protection status as invalid in the old Protection Group 2. Click Remove Protection to remove the placeholder VM on the recovery site. 3. If the VM was moved to another Protection Group select the VM and click Configure Protection. This will re-establish the correct placeholder VM for failover. 4. Verify a new PS Series replica has been created since the VM has moved or test/recovery will fail. Run a test and verify the recovery plan recovers the VM properly. Recovery Plans A recovery plan is a run book plan to facilitate and automate the process of testing and failing over virtual machines. The recovery plan is created on the recovery site and can encompass any or all of the protection groups that were created on the protected site. This allows administrators to configure various test scenarios and to run these tests. It also allows for more comprehensive full site failover situations to be run. More detailed information can be found in the SRM Administration Guide, but the following is an example of the steps taken. Recovery plans can be configured from either the protected site or the recovery site GUI and will prompt for the recovery plan location. 1. From the vcenter Client logged into the protected site click on Home-> Solutions and Applications->Site Recovery. Select the Recovery Plans button and click Create Recovery Plan from the menu bar or in the Getting Started tab. 2. Choose the Recovery Site that the plan will exist on and click Next. 3. Select all of the Protection Groups that will make up this Recovery Plan. This process gives administrators multiple granular recovery options for both testing and full site failover. Make your selection and click Next. 26

4. The next screen gives you the option of Test Networks. Test Networks are isolated network environments that are created on the fly during the test to make sure that the VMs being brought online cannot interfere with other running VMs including those in production. You have the option of changing the Test Network if you have additional switching requirements configured at the DR site such as an isolated test LAN with physical systems etc. The default of Auto will allow SRM to automatically create a new vswitch during the test failover. These isolated switches are even isolated from other failover tests. Make your selection and click Next. 5. Give the Recovery Plan a name and a description. Click Next. 6. Verify the information is correct and click Finish. 27

7. Once the Recovery Plan is configured you can adjust individual VM recovery options by clicking on the recovery plan and clicking the Virtual Machines tab. There are many options including recovery priority, customization of individual machines, IP address settings, pre- and post-power on scripts and many other options. The power of SRM lies in the ability to modify each VMs needs for recovery and testing, and then testing these changes without impacting the production environment. To modify the individual protection schemes of a VM right click the VM and modify the options. Every VM can be modified according to the recovery plan requirements. 28

8. Another option that can be used is the suspension of Non-critical VMs at the recovery site. This allows administrators to use the DR location to host Test/Dev machines or VMs that are not necessary in the event of a failover. This option is found under the Recovery Steps tab by clicking Add Non-Critical VM. 9. There can be multiple recovery plans configured with multiple scenarios. The benefit of SRM is the ability to not only configure multiple recovery plans but to also non-disruptively test them to assure they meet the organization s needs. Once there is at least one recovery plan, testing can begin. Testing One of Site Recovery Manager s greatest attributes is the ability to non-disruptively test the recovery plan before there is a failure. This allows administrators the ability to tune their recovery process and make sure that the plan is sound in the case of an actual failover. It also allows for comprehensive auditing of the recovery plans without impacting actual production virtual machines. A test failover scenario is designed to completely eliminate any impact on the production VMs and datastore volumes. When a test is run on the recovery site, the recovery site PS Series group is told to create a clone of the replica volumes and bring the clones online. This allows for no interruption to production replication and still allows exact copies of the VMs to be made available. SRM 5 has the option to even replicate the most recent changes so that the test is using the latest versions of the VMs. SRM will send the information about the recovery ESXi hosts to the SAN group to add to the access control list. Since it creates a clone of the replica, at no time is the 29

production replication impacted. However, there needs to be sufficient free space in the group in order to create a clone and bring it online. If there is not enough free space the clone operation will fail and the test will fail with an error of not being able to see the datastore volumes. Once the clone is created and brought online, SRM will rescan the storage at the ESXi host level. It will resignature the volume as needed to be able to bring that clone online as a new datastore. SRM will then re-register the recovery VMs to point to the new clone volumes and start powering them up in the order specified during the creation of the protection group and recovery plan. During the testing phase SRM will isolate the VMs based on the testing network setting that was configured earlier. By default it will create a srmvsrecovery vswitch with no external NIC connections. Again, this is done along with everything else to protect the production environment. 1. To start the test, log into the recovery site with the vcenter Client. Click on Home-> Solutions and Applications->Site Recovery and click the Recovery Plans button. Select a Recovery plan and click the Recovery Steps tab. This shows the recovery run book for both testing and recovery. 2. There are a few buttons at the top of the page: Test, Cleanup, Recovery, Reprotect and Cancel. To begin the test click Test. During a test you have the option of replicating the recent changes to the recovery site. This is independent of any replication schedules you may have, and so could take several minutes. Select your option and click Next. 30

3. Review the recovery plan steps and click Start to begin. 31

4. The progress of the test will fill up the Recovery Steps screen. It will go through and prepare the storage by creating the clones, bringing the clones online and rescanning the storage subsystem, re-registering the VMs, and bringing them online according to the plan. 5. Once the entire Recovery plan has been run it will pause with a message. At this point you can see if the test was successful by logging into the console of various VMs. Because of the isolated network there may be some limitations to the things that can be done inside the VM, such as RDP sessions, mapped drives or connections to iscsi volumes. These are all processes to be tested and documented in the case of a true failover. If there was an error you can troubleshoot the issue and correct the error and re-run the test. A log of the Recovery Steps can be exported for auditing purposes as well. 6. Once you have verified the test failover of the individual virtual machines, click Cleanup to finish the test and clean up the environment. SRM will unregister the VMs, re-register them to point towards the temporary datastore, and remove the isolated test network. It will then unregister the storage from ESXi and delete the clone volumes, freeing up that space and refreshing the ESXi storage subsystem. Within a few minutes the recovery site environment will be back to the way it was before the test was run. 32

This process allows for not only the fine tuning of the DR recovery plan, but also testing any time without having to bring the production environment down. The other benefit of the testing process is the ability to find errors or mistakes and fix them before a true disaster happens. Recovery In the case of a full site failure, a site migration, or simply wanting to fail over individual protection groups, running the recovery plan follows a slightly different process flow. The first thing SRM will do is see if it can communicate with the protected site vcenter Server. If it can, SRM will shut down any VMs on the protected site to make sure they are not online on both sites. It will also offline the original protected datastores. When running the recovery plan, instead of making a clone of the volumes, the replicas are promoted with the ability to be demoted. SRM will send the ESXi information to the SAN so that the access control list can be populated with the recovery site ESXi server information. Once all the replicas are promoted, SRM will prompt vcenter to rescan all of the storage adapters and bring the promoted volumes online as storage. SRM will then re-register the recovery VMs to point to the newly promoted volumes. Unlike a test though, the network configuration will be changed to match the recovery settings so the srmvs-recovery virtual switch will not be used. Each of the VMs will power on based on their priority level and the rest of the recovery plan will run. Once the entire recovery plan is complete, the protected site s virtual environment will now be up in production on the recovery site. The replicas that were promoted are not fully promoted volumes. They retain the ability to be demoted to utilize the fast failback procedure on the group. Because the promotion is not permanent there are a few actions that cannot be done on the volume, such as renaming it or resizing it. At any point you can make the promotion of the volume permanent however this will impact SRM's ability to reprotect and failback that volume. The failover process will also pause the inbound replication to the group but once the plan is finished inbound replication will be enabled again. 1. To begin a recovery plan, from the SRM GUI click the Recovery Plan button, select a Recovery Plan and Click Recovery. A site migration or a full site recovery is an executive decision and should not be invoked without planning and thought. SRM will verify that this is the option you wish to take by requiring the administrator to check the box acknowledging that the process will change the environment. You can select between two types of recovery processes: Planned Migration - Useful when attempting to migrate between datacenters. The plan will stop on any failure and not complete the site migration. It will also try to get one last replication in to obtain all of the latest changes. Disaster Recovery - Used when the protected site is unavailable or in the event of a true disaster. The plan will attempt to synchronize and power 33

down VMs on the protected site but if there are any errors it will continue moving forward with the plan. Acknowledge the warning, select the Recovery Type and click Next to begin the failover process. 2. There will be one last screen to review before the recovery process begins. Verify the settings and click Start. 34

3. Inside the recovery site PS Series Group Manager GUI you will see all of the promoted volumes show up as recovery volumes. 4. During the recovery process you can follow the steps by selecting the Recovery Steps tab. 35

5. When the recovery process is complete, SRM will show that the recovery is complete in the recovery steps tab. If there are any errors during the recovery they will show up as well. Each of the VMs will show up on the recovery site registered to the correct recovery datastore and be running according to the plan. At this point the recovery failover has been complete and you can utilize the recovery site as full production according to the disaster recovery plans for your organization. NOTE: For a planned failover/migration, in the event of an expected outage at the protected site, it is recommended that a planned downtime occur and the VMs are powered off before the final replica to ensure that all of the data is in a clean and consistent state on the DR site. Failback Failback is the process to bring all of the VMs that are running on the recovery site back to the original protected site after a full recovery plan has been run. There can be multiple reasons for enacting the full recovery plan and moving production VMs from the protected site to the recovery site; anything from power outage, equipment outage, planned migration, to a true disaster. In each of these cases, careful consideration must be given to bringing the existing environment back onto the original protected site. Regardless of the reason that the recovery site is now servicing production VMs, there are two basic scenarios for utilizing failback: the original SAN on the protected site is still in service and has some subset of data from the production environment before the failover; or the SAN is completely new, either because it is new hardware or has been re-initialized. There may even be an instance where both of these techniques are used, depending on the reason for failover. Site Recovery Manager 5 now has the ability to failback to the original protected site using a process called Reprotect. 36

Reprotect is available when the original protected site and the associated data is still available. With careful planning, bringing the recovery site virtual environment back into production on the protected site can happen with very little downtime, of which that downtime is planned. During the discussion, the role of protected site and recovery site may change; so to keep things straight, this section will discuss site A as the original protected site which had data in production and site B as the original recovery site that was failed over to. Recovery Scenario 1: Reprotect and Failback The first scenario discussed is one where the original site A protected site still has a functioning SAN with some subset of data. The failover could have been invoked due to a planned hardware outage or unplanned power failure but nothing involving the underlying server and storage environment. While disaster recovery failovers are seldom planned, the failback process can be planned and controlled to ensure that there is no loss of data and minimum disruption. A controlled failback is when the administrator has the time and ability to schedule downtime and prepare for failing back from site B to site A. Administrators can take their time devising a strategy in which to migrate back to site A with all of the current data that was written since the failover occurred. Because failback is done at the volume and datastore layer, administrators need to ensure that all of the VMs that reside on the volume are shut down to insure data consistency. Also, if there are VMs that span multiple volumes, all of these volumes need to be failed back at the same time to guarantee the VMs operation back at site A. Reprotect SRM 5 has a new functionality called reprotect. Reprotect will automate the process of re-establishing the replica going from the Site B recovery site back to the Site A protected site. It doesn't failback, just configures everything so that you can test going back to the original protected site A. During the reprotect, fast failback can shorten the time period of the replication sync back. The process is as follows: 1. Demote protected Site A volume to an inbound replica set 2. Makes recovery Site B volume promotion permanent 3. Reverses replication setup and configures replication from Site B to Site A for the re-protected volumes. 4. Reverses protection groups and recovery plans so they can be run going from Site B to Site A Once the reprotect is finished the Protection Groups and Recovery Plans will have been switched. This will now make site B the protected site and site A the recovery site for these particular plans. NOTE: As of the writing of this Technical Report, during the reprotect option, replication schedules from the recovery site B back to protected site A are not recreated. Re-establish replication schedules from site B to site A to meet the 37

organization SLAs put in place. Refer to Dell EqualLogic Storage Replication Adapter release notes. Test Now that the reprotect has occurred, run a test recovery to go from the now protected site B to the recovery site A. Verify the test plan is successful before committing to failing back the environment. Failback Once the testing has been verified it is time to failback to the original site A. There is no failback button since the roles of protected site and recovery site were reversed there is just the Recovery option. Run the recovery to bring the virtual machines from protected site B to recovery site A as normal. Reprotect Once the VMs are back on site A the promoted volumes will still need to be made permanent and replication back to site B re-established. To do this run the reprotect function one more time to make site A the protected site again and site B the recovery site. Once this final reprotect is finished your complete site failback is finished. Continue to run tests and update protection groups and recovery plans as needed to protect the environment. NOTE: Due to how vcenter rescans the failover/failback volume, the datastore may show up with the nomenclature snap-****datastorename. During the reprotect stage this datastore name is used for reprotection. If the datastore is renamed, refresh the devices under Array Managers. Recovery Scenario 2: Re-establish SRM If the SAN on the original protected site A is considered a new production environment with no prior data residing on it, the failback process will be much like 38

configuring SRM from the beginning as described in this document with the recovery site B now taking on the role of the protected site and the old protected site A temporarily becoming a recovery site. In this scenario there are a few steps that must be performed on the SAN before starting the process to failback. Make Promotions Permanent 1. During the full recovery the replicas on the recovery site B were promoted with the ability to fail back. Since there is nothing to fail back to, these volumes need to be promoted permanently to reverse the replication process. From the Group Manager GUI on site B select each volume that was promoted during the recovery process. In the Activities tab click Make Promote Permanent. 2. Enter a new name for the volume, select the Storage pool assignment and click Next. 3. In Step 2 you can add an additional iscsi access. The volume was already given the access control list from SRM when the volume was promoted. NOTE: If you select No Access it does not remove the existing access list. Choose No Access or add another access control and click Next. 39

4. Verify the settings are correct and click Finish to make the promotion permanent. Once this is done the volume cannot utilize the Fast Failback option and any failback must include the reconfiguration of replication to the other partner group. 5. Do this step for all volumes that were promoted during the SRM failover. 6. Now that the volumes have been promoted, treat this as a new configuration for SRM with the failover site B becoming the new protected site and the original protected site A becoming a temporary recovery site. Configure Replication Partnership Since this example assumes that there is no partnership established between the current protected site B and the new recovery site A, recreate the replication partnership as detailed earlier. If the new group will have different group information, delete the current partnership that exists on the currently running site B and recreate it with the new information. Configure Volume Replication Configure replication for each of the volumes that were promoted that contain data for the virtual environment. Once the volumes are configured, create a replication schedule for each volume as well. 40