BC/DR Strategy with VMware
VMware vForum, 2014
Andrea Teobaldi, Systems Engineer, @teob77
© 2014 VMware Inc. All rights reserved.

What's on the agenda?
- Defining the problem
- Definitions
- VMware technologies that provide BC and DR
  - vSphere HA and FT
  - vSphere App HA
  - vSphere Data Protection / Advanced
  - vSphere Replication
  - vCenter Site Recovery Manager (SRM)
  - vCloud Hybrid Service - Disaster Recovery

IT Business Continuity
Is It a Real Problem?
What's the Difference?
- Disaster Avoidance
- Disaster Recovery
- Planned vs. Unplanned

Disaster Recovery vs. Business Continuity
Example: Tuesday, August 23, 2011 at 1:51 PM EDT - Magnitude 5.8 earthquake near Mineral, Virginia
- Disaster recovery required? No
- Interruption to business? YES!

Fault Tolerance vs. High Availability
- Fault tolerance: ability to recover from component loss
  - Example: hard drive failure
- High availability: uptime percentage in one year

Uptime percentage       Downtime in one year
99%                     3.65 days
99.9%                   8.76 hours
99.99%                  52 minutes
99.999% (five nines)    5 minutes

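
The downtime figures above follow directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not from the slides):

```python
# Convert an availability target into the downtime it allows per year.
# Assumption: a 365-day year, which matches the figures on the slide.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(target)
        print(f"{target}% uptime -> {hours:.3f} hours ({hours * 60:.1f} minutes) of downtime")
```

The slide rounds 99.99% and 99.999% down to whole minutes; the exact values are slightly higher.
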
Quickly define
- Recovery Point Objective (RPO): amount of data loss that can be incurred
- Recovery Time Objective (RTO): how long it should take to recover
- Work Recovery Time (WRT): the maximum tolerable amount of time needed to verify the system and/or data integrity
- Maximum Tolerable Downtime (MTD): downtime that can occur before significant loss is incurred
  - Examples: financial, reputation

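
One way to see how these objectives relate is a quick sanity check. The sketch below assumes the commonly cited convention that recovery time plus verification time should fit within the maximum tolerable downtime (RTO + WRT <= MTD); the slide does not state this relationship, and the class, field names and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    """Recovery targets for one workload, all expressed in minutes."""
    rpo_minutes: float  # tolerated data loss, expressed as a time window
    rto_minutes: float  # time allowed to bring the system back
    wrt_minutes: float  # time allowed to verify the system and data
    mtd_minutes: float  # downtime before significant loss is incurred

    def total_outage_minutes(self) -> float:
        # Assumption: the business sees recovery and verification as one outage.
        return self.rto_minutes + self.wrt_minutes

    def fits_within_mtd(self) -> bool:
        return self.total_outage_minutes() <= self.mtd_minutes


if __name__ == "__main__":
    # Hypothetical targets for illustration only.
    plan = RecoveryObjectives(rpo_minutes=15, rto_minutes=240, wrt_minutes=60, mtd_minutes=480)
    print(plan.total_outage_minutes(), plan.fits_within_mtd())
```
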
vSphere HA: Keep In Mind
- RTO measured in minutes (not seconds)
- Requires shared storage
- Best practices
  - Use admission control percentage policy
  - Test post-failure performance with host maintenance mode
  - Isolation response: leave powered on
  - Network and storage redundancy

vSphere Fault Tolerance (FT)
- Zero recovery time, zero data loss
- Host hardware failure only
- Does not protect against OS and application failure
- Works fine with HA, App HA
- Why not FT?
  - Resource requirements: does the workload really need it?
  - VM has multiple CPUs
  - No VM snapshots: backups require an agent

Making an Application Service Highly Available
[Diagram: Hyperic agents running in VMs; vFabric Hyperic virtual appliance; vSphere App HA virtual appliance; vCenter Server; vSphere hosts in a vSphere HA cluster]

vSphere App HA (New)
- Protect off-the-shelf apps
- VMware vFabric tc Server
- Policy-based

Data Protection (Backup and Restore)
- Agents? No agents? Both!
  - No agents for the majority of workloads: keep it simple
  - Agents for certain apps
- vSphere Data Protection (VDP) Advanced
  - Backup and recovery for VMware, from VMware
  - Based on proven, mature EMC Avamar
  - Agent-less VM backup and restore
  - Agents for granular tier-1 application protection

VDP Advanced: Keep In Mind
- Uses VADP: VM snapshots, CBT
- Utilizes Windows VSS in VMware Tools
- Works fine with HA, not with FT
- RDM: virtual yes, physical no
- Is it DR? Maybe: depends on RTO, RPO
  - Needs replication offsite, right?

VDP Advanced: Best Practice
- Prepopulate DNS, always use FQDN
- Manage VM snapshots
- Avoid deploying to slow storage
- Do not power off; always shut down gracefully
- Do not schedule backups during the maintenance window

vSphere Replication
- DR: native tool built into the platform
- Per-VM hypervisor replication, managed in vCenter (VC)
- Selectable RPO from 15 min up to 24 hours
- Selectable destination datastore (disk-type agnostic)

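
To make the RPO range concrete, here is a small sketch that validates a requested RPO against the 15-minute to 24-hour range quoted above and flags a replica that has fallen behind it. It is plain Python for illustration and does not call any vSphere Replication API; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Range quoted on the slide for vSphere Replication.
MIN_RPO = timedelta(minutes=15)
MAX_RPO = timedelta(hours=24)


def validate_rpo(rpo: timedelta) -> timedelta:
    """Return the RPO unchanged if it is within the selectable range."""
    if not MIN_RPO <= rpo <= MAX_RPO:
        raise ValueError(f"RPO must be between {MIN_RPO} and {MAX_RPO}, got {rpo}")
    return rpo


def rpo_violated(last_replica_at: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the newest replica is older than the RPO allows."""
    return now - last_replica_at > rpo


if __name__ == "__main__":
    rpo = validate_rpo(timedelta(hours=1))
    last_sync = datetime(2014, 5, 1, 9, 0)
    print(rpo_violated(last_sync, datetime(2014, 5, 1, 10, 30), rpo))
```
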
New Feature: Retain Historical Replicas
- Retention of multiple points in time allows reversion to earlier known good states
- After recovery, use the snapshot manager to revert to earlier points
[Diagram labels: vSphere, VR Agent]

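
The idea of keeping several points in time can be sketched as a bounded history in which the oldest point is dropped once the limit is reached. This is an illustration of the concept only, not how vSphere Replication stores replicas; `ReplicaHistory` and `max_points` are hypothetical names.

```python
from collections import deque
from datetime import datetime
from typing import Optional


class ReplicaHistory:
    """Keeps the most recent point-in-time replicas, oldest dropped first."""

    def __init__(self, max_points: int) -> None:
        if max_points < 1:
            raise ValueError("max_points must be at least 1")
        self._points = deque(maxlen=max_points)

    def record(self, taken_at: datetime) -> None:
        # Appending to a full deque silently discards the oldest entry.
        self._points.append(taken_at)

    def points(self) -> list:
        return list(self._points)

    def latest_at_or_before(self, known_good: datetime) -> Optional[datetime]:
        """Pick the newest retained point that is not later than a known good time."""
        candidates = [p for p in self._points if p <= known_good]
        return max(candidates) if candidates else None


if __name__ == "__main__":
    history = ReplicaHistory(max_points=3)
    for hour in (8, 9, 10, 11):
        history.record(datetime(2014, 5, 1, hour))
    print(history.points())
    print(history.latest_at_or_before(datetime(2014, 5, 1, 10, 30)))
```
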
vSphere Replication Interoperability
- HA, vMotion, DRS
- Storage vMotion and Storage DRS: now supported!
- VDP: mostly no problem! If using VSS, ensure you are using 5.5!
- Fault tolerance: doesn't work with VR. FT conflicts at the vSCSI disk filter level.

vSphere Replication Best Practices
- RPO: only what is necessary! Just because you can
- RTO: don't set one! No testing, no automation, manual process.
- VSS: only if necessary!
- What about bandwidth? Very hard to determine; do a local loopback first
  http://blogs.vmware.com/vsphere/2014/04/vsphere-replication-capacityplanning-appliance.html
- RDMs? Don't use them. If you must, use virtual compatible.
- Don't mix ABR and VR!

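
As a first-order starting point for the bandwidth question, the arithmetic below estimates the average throughput needed to ship one RPO window's worth of changed data within that window. It is a rough sketch under simplifying assumptions (steady change rate, decimal units, no compression or protocol overhead) and is no substitute for measuring; the function name and sample numbers are hypothetical.

```python
def required_mbps(changed_gb_per_window: float, rpo_minutes: float) -> float:
    """Average megabits per second needed to replicate one RPO window of changes."""
    if changed_gb_per_window < 0:
        raise ValueError("changed data cannot be negative")
    if rpo_minutes <= 0:
        raise ValueError("RPO must be positive")
    megabits = changed_gb_per_window * 8 * 1000  # GB -> megabits, decimal units
    seconds = rpo_minutes * 60
    return megabits / seconds


if __name__ == "__main__":
    # Hypothetical workload: 9 GB of changed blocks every 15 minutes.
    print(f"{required_mbps(9, 15):.1f} Mbps")
```

Relaxing the RPO lowers the average rate only if the same blocks are overwritten repeatedly within the window, which is one reason real change rates are hard to predict.
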
SRM
- What is it?
  - A disaster recovery engine
  - A tool that uses externally replicated data (VR or array-based) to speed the RTO of a BCP
  - A product that allows for DR to be tested, automated, planned, repeatable and customizable
- What is it not?
  - A replication engine
  - A tool for systems that need near-instant RPO
  - A disaster avoidance stretched cluster

Key Components of SRM
- vCenter Server: one vCenter Server (Windows or VCVA) per site, same versions
- Replication
- SRM Server: one SRM Server per site, same versions
- vSphere hosts: recommend same versions per site (pre-vSphere 5.x only if using array replication)
- vSphere Essentials Plus and higher editions supported

SRM Replication Options
- SRM can utilize BOTH array-based AND vSphere Replication
- SRM will see existing standalone vSphere Replication protected VMs
- SRM can install vSphere Replication from scratch if needed
[Diagram labels: Storage-based Replication: Multi-tier App, LUN 1, Web, App, Hub, DB, LUN 2. vSphere Replication: Multi-tier App, Web, App, DB]

Recovery Workflows
- Failover Automation
  - User-defined recovery plan
  - Minimize errors
- Non-disruptive Failover Testing
  - Isolated test environment
  - Increase confidence in DR process
- Planned Migration
  - Zero data loss
  - Operational migration
- Failback Automation
  - Re-protect VMs, migrate back

SRM Interoperability
- Works with VR and ABR
- Backups, VADP or other, are fine
- HA is no problem at all
- vMotion and DRS are fine
- Storage vMotion and Storage DRS: sort of, replication dependent
- FT is yellow: array replicated only, and the FT status is not recovered
- Web vs. vSphere Client

vCloud Hybrid Service - Disaster Recovery
- Simple and secure asynchronous replication and failover for vSphere
- Warm standby capacity on vCHS
- Self-service protection, failover and failback workflows per VM
- 15 min (1) to 24 hr recovery point objective (RPO)
- 4 hours or less recovery time objective (RTO)
- Initial data seeding by shipping a disk
- Includes:
  - 2x 7-day DR tests per year
  - 30 days of recovered VM run time
(1) Dependent on available bandwidth
[Diagram labels: Site A (Primary): VMware vCenter Server, VMware vSphere servers, vSphere Replication. vCHS, Site B (Recovery): US East Region, UK London]

Disaster Recovery: New Core Class of Service
- Term lengths: 1m, 12m, 24m, 36m subscriptions
- vCloud Hybrid Service standard service tiers
- Minimum size: 10 GHz vCPU, 20 GB vRAM
- Starts at 1 TB
- Dedicated Cloud Instance
- Virtual Private Cloud Instance
- New instance type as DR service tier
- 10 Mbps allocated
- 2 public IPs
- 2 tests*
- DR-VDC Instance

Disaster Recovery Add-On Options
VMware vCloud Hybrid Service Disaster Recovery
- Standard Storage, Support, Bandwidth
- Compute (subscription)
- Compute (one time)
- IP Address
- Offline Data Transfer
- Direct Connect
- Additional Failover test

vSphere Provides The Best Foundation For Disaster Recovery in the Cloud
- Hybrid Aware: Seamless Integration with vCHS
  - Reduced costs by leveraging the cloud for DR
  - Scale your protection capacity to meet variable demand
- Hardware-Independence: Flexible Infrastructure
  - Eliminate the need for SAN or array-based replication
  - Enable consistent recovery throughout data center lifecycle changes
- Encapsulation: Simple Application Protection
  - Entire system including application, OS, and data is stored as virtual machine files
  - Entire system can be protected with data protection tools

Fully integrated with vSphere Web Client
- Consistent management and operational best practices
- Single interface and common management
- Designed to integrate with vCloud Hybrid Service
- Doesn't require console hopping

Disaster Recovery System Requirements: Primary Data Center
- VMware vSphere 5.0 or above
  - vSphere Essentials Plus
  - vSphere Standard
  - vSphere Enterprise
  - vSphere Enterprise Plus
- VMware vCenter 5.1 or above
  - Includes vSphere Web Client
- vSphere Replication Appliance 5.6
  - 1:1 mapping with vCenter*
- Public internet connectivity
- vCloud Hybrid Service DR subscription (DR Virtual Data Center instance)
CONFIDENTIAL

Questions?
Thank you!
Remember to leave your feedback! We look forward to seeing you at the demo point!