Business Continuity and Disaster Recovery: Disaster-Proof Your Business
Jon Bock, Senior Product Marketing Manager
October 16, 2008
Agenda
> Disaster Recovery Requirements and Challenges
> Disaster Recovery with VMware
> Automating Disaster Recovery with Site Recovery Manager
> Conclusions
Business Continuity: The Big Picture
Business continuity = minimizing downtime
> Availability expectations continue to increase: recovery time objectives (RTOs) are decreasing from more than 24 hours to less than 12 hours
> The cost of downtime continues to rise
> Dependence on x86 infrastructure is increasing
[Chart: Cost of Downtime per Hour ($ millions per hour, 0.0 to 2.5) by industry: Energy, Telecom, Manufacturing, Average, Financial Services, IT, Insurance, Retail, Pharma. Source: META Group]
"Almost 60% of surveyed companies incurred significant financial damage as a result of systems failure in the past year." -- Economist Intelligence Unit
Defining Disaster Recovery
What is a disaster?
> An extended outage that requires an organization to recover IT services using alternate or rebuilt infrastructure
What is not a disaster?
> Failure of an individual server
> A short service interruption
Examples of outage causes: network failure, power failure, disk failure, server failure, storage failure, fire, flood, hurricane, earthquake
Requirements for Disaster Recovery Solutions
> Minimize downtime
> Minimize risk
> Control cost
"93% of companies that lost their data center for ten days or more due to a disaster filed for bankruptcy within one year of the disaster." -- National Archives and Records Administration
"92% of users surveyed acknowledged that their companies would face serious consequences if they had to implement their disaster recovery plans." -- Dynamic Markets Ltd.
"73% of executives expressed concern with the costs associated with maintaining a secondary data centre." -- Beacon Technology Partners
Effective disaster recovery is a business imperative, but it is very difficult to achieve.
Challenges of Traditional Disaster Recovery
Complex recovery processes and infrastructure
Dependent on perfect training, documentation, and execution
Failure to meet recovery requirements:
> Recovery takes days to weeks
> Recovery tests often fail
> Significant IT time and resources are consumed
VMware Infrastructure for Disaster Recovery Customers
> 55% of customers use virtualization for BC/DR, the top reason for virtualization after consolidation/resource utilization
> Best Disaster Recovery Product of 2006 (TechTarget)
"Using VMware Infrastructure in our disaster recovery plans, we've been able to reduce the time it takes to recover our critical systems by 50 percent." -- Ted Duncan, Education Datacenter, Florida Department of Education
Technology Components of Disaster Recovery
> Management: simplify and automate implementation, testing, and execution of the recovery process
> Data: protect configuration, OS, and application data
> Infrastructure: provide the infrastructure necessary to ensure successful recovery at the lowest cost and complexity
Reduce Cost and Complexity of Recovery Infrastructure
Eliminate hardware dependencies
> Reduce risk of failures during recovery
> Reduce ongoing management burden
Reduce infrastructure requirements
> Consolidate production and recovery
> Reuse servers from production for recovery
Turn the recovery site into a productive resource
> Leverage the recovery site for other workloads, such as test/dev
> Resource guarantees ensure predictable resource allocation
Virtualization Simplifies Data Protection
Virtual machines encapsulate an entire system in a few files on disk: a physical server's OS, applications, and data become a virtual machine's files
> Everything about a system is stored in a small number of files
> Simplifies copying and cloning of systems
> Simplifies provisioning systems for recovery and recovery testing
Copyright 2006 VMware, Inc. All rights reserved.
Simplifying the Disaster Recovery Process
Physical recovery (40+ hrs): configure hardware, install OS, configure OS, install backup agent, start single-step automatic recovery
Virtual recovery (< 4 hrs): restore VM, power on VM
Eliminate recovery steps
> No operating system re-install or bare-metal recovery
> No time spent reconfiguring hardware
Standardize the recovery process
> Consistent process independent of operating system and hardware
Hancock Bank: Before Virtualization
The production datacenter (DB, HR, and app servers) sent backup data to a DR hosting provider site
> Unknown hardware makes and models
> Recovery time unable to meet requirements
> Failing back to the primary site takes months
Hancock Bank: With Virtualization
The production datacenter (DB, HR, and app servers) sends image and data backups to the DR hosting provider
> Recovery to any hardware
> Recovery time within the 24-hour objective
> Failback takes days, not months
"Without VMware Infrastructure, it would have taken us weeks to recover our critical systems when Hurricane Katrina hit our datacenter. VMware Infrastructure enabled us to get our critical systems up and running within 24 hours." -- Scott Fontenette, Hancock Bank
VMware Site Recovery Manager
Site Recovery Manager leverages VMware Infrastructure at the production and recovery sites to deliver advanced disaster recovery management and automation
> Simplifies and automates disaster recovery workflows: setup, testing, failover
> Turns manual recovery runbooks into automated recovery plans
> Provides central management of recovery plans from VirtualCenter
> Works with VMware Infrastructure to provide process automation for disaster recovery
Site Recovery Manager Target Scenario
Each site runs Site Recovery Manager, VirtualCenter, and virtual machines on VMware Infrastructure servers and storage, with storage array replication between the sites.
> Site Recovery Manager: manages and monitors recovery plans; tightly integrated with VirtualCenter
> VirtualCenter: requires VirtualCenter 2.5 U1
> VMware Infrastructure: requires ESX 3.0.2 or ESX 3.5 U1
> Storage: iSCSI or Fibre Channel storage
> Partner replication: integrated via replication adapters created, certified, and supported by the storage vendor
Site Recovery Manager: User Interface
> Managed through a VirtualCenter plug-in
> Key configuration steps
Disaster Recovery Setup
> Integrate with replication: identify which virtual machines are protected by the replication configuration
> Map recovery resources: server resources, network resources, management objects
> Create recovery plans for virtual machines, applications, and business units: convert the manual runbook to a preprogrammed response, customizable with scripting and callouts
Site Recovery Manager Replication Interface
Replication adapters support:
> Array discovery
> Replicated LUN discovery
> Test and failover initiation
Storage vendors are responsible for adapters
> Storage partners create, certify, and support replication adapters
> New adapters do not require Site Recovery Manager updates
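In effect, the adapter model means SRM codes against a narrow interface that each storage vendor implements. The following is a minimal Python sketch of that idea. `ReplicationAdapter`, its four method names, `DemoArrayAdapter`, and `snapshot_all_for_test` are all hypothetical. They are not the actual SRM storage replication adapter API, which each vendor ships and supports.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ReplicationAdapter(ABC):
    """Hypothetical interface mirroring the adapter duties listed on the slide."""

    @abstractmethod
    def discover_arrays(self) -> list[str]:
        """Return identifiers of the storage arrays this adapter manages."""

    @abstractmethod
    def discover_replicated_luns(self, array_id: str) -> list[str]:
        """Return the LUNs on an array that are replicated to the recovery site."""

    @abstractmethod
    def start_test(self, lun_id: str) -> str:
        """Expose a snapshot of a replica for testing; return the snapshot id."""

    @abstractmethod
    def start_failover(self, lun_id: str) -> None:
        """Break replication and make the replica usable at the recovery site."""


class DemoArrayAdapter(ReplicationAdapter):
    """Toy in-memory adapter standing in for a vendor-supplied one."""

    def __init__(self, layout: dict[str, list[str]]):
        self.layout = layout              # array id -> replicated LUN ids
        self.snapshots: list[str] = []
        self.promoted: set[str] = set()

    def discover_arrays(self):
        return sorted(self.layout)

    def discover_replicated_luns(self, array_id):
        return list(self.layout[array_id])

    def start_test(self, lun_id):
        snap = f"{lun_id}-snap{len(self.snapshots) + 1}"
        self.snapshots.append(snap)
        return snap

    def start_failover(self, lun_id):
        self.promoted.add(lun_id)


def snapshot_all_for_test(adapter: ReplicationAdapter) -> list[str]:
    """Orchestration side: calls only the interface, never vendor specifics."""
    return [
        adapter.start_test(lun)
        for array in adapter.discover_arrays()
        for lun in adapter.discover_replicated_luns(array)
    ]
```

The orchestration function depends only on the abstract class. Supporting a new array type therefore means writing a new subclass, which mirrors the slide's point that new adapters do not require Site Recovery Manager updates.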
Recovery Infrastructure Management with Site Recovery Manager
Map resources from production to recovery
> Use resource pools to map server resources
> Map virtual machines to the correct port groups and IP addresses
Site Recovery Manager: Resource Mappings
> Network port group mappings
> Resource pool mappings
> VirtualCenter folder mappings
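A resource mapping is essentially a lookup table from production inventory objects to recovery-site objects. The following is a minimal sketch of the three mapping types listed above. It assumes hypothetical names (`Placement`, `map_to_recovery`) and made-up inventory names such as `Prod-VLAN10`, and it does not model IP address changes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    port_group: str
    resource_pool: str
    folder: str


# Hypothetical mapping tables, one per mapping type on the slide.
NETWORK_MAP = {"Prod-VLAN10": "DR-VLAN10", "Prod-VLAN20": "DR-VLAN20"}
POOL_MAP = {"Prod/Tier1": "DR/Tier1", "Prod/Tier2": "DR/Tier2"}
FOLDER_MAP = {"Prod/Finance": "DR/Finance", "Prod/HR": "DR/HR"}


def map_to_recovery(p: Placement) -> Placement:
    """Translate a production placement to the recovery site; fail on any gap."""
    missing = [
        kind
        for kind, table, key in (
            ("port group", NETWORK_MAP, p.port_group),
            ("resource pool", POOL_MAP, p.resource_pool),
            ("folder", FOLDER_MAP, p.folder),
        )
        if key not in table
    ]
    if missing:
        raise KeyError(f"no recovery mapping for: {', '.join(missing)}")
    return Placement(
        NETWORK_MAP[p.port_group], POOL_MAP[p.resource_pool], FOLDER_MAP[p.folder]
    )
```

Raising an error on a missing mapping, rather than falling back to a default, is a deliberate choice. It prevents an unmapped VM from powering on in an unintended network or pool at the recovery site.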
Site Recovery Manager: Creating and Editing Recovery Plans
> Recovery plan editor
> Recovery plans for failure scenarios
Site Recovery Manager: Recovery Plan Example
> VM shutdown
> Attach virtual disks
> High priority VM recovery
> Normal priority VM recovery
> Cleanup after tests
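The example plan is an ordered sequence in which priority determines power-on order. The following is a minimal sketch that generates such a sequence. It assumes a hypothetical `build_recovery_plan` function, a made-up VM-to-priority dictionary, and the two priority levels shown on the slide. Following the slide, protected VMs are shut down only in a real failover, and cleanup runs only after a test.

```python
from __future__ import annotations

PRIORITIES = ("high", "normal")  # power-on order, highest first


def build_recovery_plan(vms: dict[str, str], is_test: bool) -> list[str]:
    """Return ordered plan steps; `vms` maps VM name -> 'high' or 'normal'."""
    unknown = {p for p in vms.values() if p not in PRIORITIES}
    if unknown:
        raise ValueError(f"unknown priorities: {sorted(unknown)}")
    steps: list[str] = []
    if not is_test:
        # Real failover: the plan begins by shutting down the protected VMs.
        steps += [f"shut down {vm} at protected site" for vm in sorted(vms)]
    steps.append("attach replicated virtual disks at recovery site")
    for priority in PRIORITIES:
        # Every high-priority VM is powered on before any normal-priority VM.
        steps += [f"power on {vm}" for vm in sorted(vms) if vms[vm] == priority]
    if is_test:
        # A test ends by tearing down what it created.
        steps.append("clean up test environment")
    return steps
```

For example, `build_recovery_plan({"web": "normal", "db": "high"}, is_test=True)` powers on `db` before `web` and ends with the cleanup step.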
Testing
Replication management
> Snapshot replicated LUNs before the test
> Delete snapshots of replicated LUNs after the test
Network management
> Change all virtual machines to a test port group before powering them on
Customization and extensibility
> Same breakpoints and callouts as the failover sequence
> Extra breakpoints and callouts around the test bubble
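The test bubble can be pictured as a setup and teardown pair whose cleanup always runs. The following is a minimal sketch with several assumptions. It uses a hypothetical `isolated_test_run` context manager that records actions in a list instead of calling real storage or VirtualCenter APIs. The snapshot names and the `Test-Bubble` port group are made up.

```python
from __future__ import annotations

from contextlib import contextmanager


@contextmanager
def isolated_test_run(luns: list[str], vms: list[str], test_port_group: str, log: list[str]):
    """Hypothetical test bubble: snapshot replicas, isolate VMs, always clean up."""
    # Work on snapshots of the replicated LUNs, not on the replicas themselves.
    snapshots = [f"{lun}-testsnap" for lun in luns]
    log.extend(f"snapshot {s}" for s in snapshots)
    # Move every VM to the test port group before powering it on.
    networks = {vm: test_port_group for vm in vms}
    log.extend(f"connect {vm} to {test_port_group}" for vm in vms)
    log.extend(f"power on {vm}" for vm in vms)
    try:
        yield networks
    finally:
        # Cleanup runs even if the body of the test raises an exception.
        log.extend(f"power off {vm}" for vm in vms)
        log.extend(f"delete snapshot {s}" for s in snapshots)
```

The cleanup sits in `finally` so that a failed test still powers off its VMs and deletes its snapshots.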
Testing a Recovery Plan
SRM enables you to test a recovery plan by simulating a failover with zero downtime to the protected VMs in the protected site.
Testing and Executing Recovery Plans
> Steps in recovery plan
> Status and time stamps
> When to execute
> User confirmation message
Failover Automation
Detect site failures
> Raise an alert when the heartbeat is lost
> User confirms the outage and initiates the failover process
> Granular failover initiation
Manage replication failover
> Break replication
> Make the replica visible to recovery hosts
Execute the recovery process
> Use the pre-programmed plan
> Provide visibility into progress
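Detection is deliberately separate from action: a lost heartbeat only raises an alert, and failover starts after a user confirms. The following is a minimal sketch of that gate. It assumes a hypothetical `SiteMonitor` class, a simple timeout rule, and timestamps in seconds. SRM's actual heartbeat intervals and thresholds are not described here.

```python
from __future__ import annotations


class SiteMonitor:
    """Hypothetical monitor: the protected site counts as down after a heartbeat timeout."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_beat: float | None = None

    def heartbeat(self, now: float) -> None:
        self.last_beat = now

    def site_down(self, now: float) -> bool:
        # Down if no heartbeat was ever seen or the last one is older than the timeout.
        return self.last_beat is None or now - self.last_beat > self.timeout_s


def start_failover(monitor: SiteMonitor, now: float, user_confirmed: bool) -> str:
    """Lost heartbeat -> alert; only an explicit confirmation starts failover."""
    if not monitor.site_down(now):
        return "no action: protected site is responding"
    if not user_confirmed:
        return "alert raised: awaiting user confirmation"
    return "failover started: break replication, promote replicas, run recovery plan"
```

Keeping a human confirmation in the loop prevents a transient network problem from triggering a full site failover on its own.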
Executing Failover
Recovery Plan Reports
> SRM maintains a history of recovery plan tests and executions
> Recovery plans and test results can be exported from SRM
Failback
Failback without Site Recovery Manager
> Unregister the protected virtual machines in the protected site's VirtualCenter
> Working with your storage team, reverse data replication
> Re-inventory VMs at the recovery site, then restart and re-IP them (manually or with scripts)
> No recovery plan, limited testing capabilities, no audit trail
Failback with Site Recovery Manager
> SRM does not automatically configure failback; failback is configured after failover
> Working with your storage team, reverse data replication
> Create a recovery plan in SRM to fail VMs from the recovery site back to the protected site
> Provides a documented recovery plan, the ability to test before recovery, and a built-in audit trail
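Failback runs a recovery plan in the opposite direction, so each production-to-recovery mapping has to be reversed. That reversal is unambiguous only when the mapping is one-to-one. The following is a minimal sketch that reverses a mapping and flags the cases that need a manual decision. It assumes a hypothetical `invert_mapping` helper and made-up port group names.

```python
from __future__ import annotations


def invert_mapping(forward: dict[str, str]) -> dict[str, str]:
    """Reverse a production->recovery mapping for use in a failback plan."""
    reverse: dict[str, str] = {}
    for prod, dr in forward.items():
        if dr in reverse:
            # Two production objects share one recovery object: no single way back.
            raise ValueError(
                f"{dr!r} is mapped from both {reverse[dr]!r} and {prod!r}; "
                "choose a failback target manually"
            )
        reverse[dr] = prod
    return reverse
```

For instance, if two production port groups were consolidated onto one recovery port group, the helper refuses to guess which one each VM should return to.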
Datacenter Disaster Recovery
Each site runs VirtualCenter, VMware Infrastructure, and SRM, with replication between the sites.
Customer benefits:
> Automated disaster recovery across a metro area
> Run active workloads at both sites
> Documented, managed recovery plan
Configuration:
> Servers: 6 servers, 50 VMs
> Storage: 3.0 TB SAN storage + replication
> VMware software: VMware Infrastructure, VirtualCenter, Site Recovery Manager
VMware Impact on Disaster Recovery
Expand disaster recovery protection
> Any virtualized workload can be protected with minimal incremental effort and cost
Reduce time to recovery
> A single button kicks off the recovery process
Increase reliability of recovery
> Automation ensures repeatable recovery
> Hardware independence eliminates failures due to different hardware
> Easier testing, based on the actual failover sequence, allows more frequent and more realistic tests
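VMware: The Safest Place to Run Applications
Protection at each layer, grouped by goal (prevent planned outages, minimize downtime from unplanned outages, prevent unplanned outages):
> Component: NIC teaming, multipathing (prevent unplanned outages)
> Server: DRS maintenance mode, VMotion (prevent planned outages); HA (minimize downtime from unplanned outages); Fault Tolerance (prevent unplanned outages)
> Storage: Storage VMotion (prevent planned outages); VCB + ISVs, Data Recovery (minimize downtime from unplanned outages)
> Data: VCB + ISVs, Data Recovery (minimize downtime from unplanned outages)
> Site: Site Recovery Manager (minimize downtime from unplanned outages)
All available across physical hardware, operating system, and application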
Special Promotions (valid until 15 Dec 2008)
> Midsize Acceleration Kit: VI-Ent for 6 processors + VC Foundation + 30 PSO Credits with 1-year Platinum Support & Subscription, USD 17,369
> Enterprise Acceleration Kit: VI-Ent for 8 processors + VCMS with 1-year Platinum SNS, USD 29,044
> SRM Acceleration Kit: VI-Ent and SRM for 6 processors + VCMS with 1-year Platinum SNS, USD 34,792
Visit the VMware booth for details and other promotions
Questions?