Disaster Recovery and Mitigation: Is your business prepared when disaster hits?
Our speaker today: Catherine Roy, Director of PMO at HOSTING
- 15 years of project management experience
- At HOSTING since 2012
- Project manager on over 100 disaster recovery solutions
- PMP, ITIL, CSPO
Outline of Class
- Overview
- The Difference between Business Continuity and Disaster Recovery
- The Five Best Practices for Disaster Recovery Plans
- Common Mistakes in Disaster Recovery Plans
- Common Risks to Application Availability
- The first seven steps you need when Declaring a Disaster
- Actual DR Project Plan and scenario
- DR run book
Overview
Why Invest in Disaster Recovery?
- Gartner estimates that only 35% of SMBs have a comprehensive disaster recovery plan in place
- According to research, companies lose an average of $50,000 to $90,000 for every hour of downtime
- 80% of companies that suffer a major disaster and don't have any form of contingency planning go into liquidation within 18 months
History of DR
[Timeline figure, 2001 to 2017: from Traditional Backups (2001) to End-to-End Solutions, Anywhere (2017). Years marked: 2001, 2007, 2011, 2015, 2016, 2017. Rows: Enabling Technologies, Recovery. Labels shown: File-level backups, Block-level backups, VM Replication, Data Archiving, Off-Site Storage, Application Aware, Array Replication, 3rd Party Cloud, Multi-Cloud Recovery.]
Pain Points Associated with DR
- Data loss
- Multiple culprits and threats
- Lack of a DR plan
- Complex and manual coordination between teams
- Multiple DR solutions/platforms
- Managing multiple storage arrays
- High costs
- High workload overhead
DR Environments
- AWS/Azure: one-way trip to DR
- Rollbacks to production on Azure in 2017?
- SRM/Zerto/Veeam (replication tools)
Business Continuity/DR
Disaster Recovery and Business Continuity
- Not one and the same: two different definitions
- Business Continuity ensures that an organization's business functions will continue to operate and withstand critical incidents and disasters
- Disaster Recovery is part of Business Continuity
- Business Continuity often falls under Compliance, Security, Quality and/or Risk Management
Key Considerations for Business Continuity/Disaster Recovery
- What level of resilience are you building your production environment to?
- What needs to be recovered?
- RPO (Recovery Point Objective)
- RTO (Recovery Time Objective)
- Application dependencies
- Organizational changes/IT changes
  - Upgrades, acquisitions, new employees
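To make RPO and RTO concrete, here is a minimal sketch of how the two objectives could be checked after a test or a real outage. RPO bounds how much recent data (measured as time) you can afford to lose; RTO bounds how long service can stay down. The class name, function names and all timestamps below are illustrative assumptions, not part of any particular DR product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical container for one application's targets."""
    rpo: timedelta  # maximum tolerable data loss, expressed as time
    rto: timedelta  # maximum tolerable time to restore service


def achieved_rpo(last_good_copy: datetime, outage_start: datetime) -> timedelta:
    """Data written after the last good copy is lost, so this gap is the achieved RPO."""
    return outage_start - last_good_copy


def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time between the outage and the restoration of service."""
    return service_restored - outage_start


def meets_objectives(targets: RecoveryObjectives,
                     last_good_copy: datetime,
                     outage_start: datetime,
                     service_restored: datetime) -> dict:
    """Compare what was achieved against the stated targets."""
    return {
        "rpo_met": achieved_rpo(last_good_copy, outage_start) <= targets.rpo,
        "rto_met": achieved_rto(outage_start, service_restored) <= targets.rto,
    }


# Illustrative values only: 15-minute RPO, 4-hour RTO.
targets = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
result = meets_objectives(
    targets,
    last_good_copy=datetime(2017, 1, 1, 1, 50),   # last replicated copy
    outage_start=datetime(2017, 1, 1, 2, 0),      # 10 minutes of data at risk
    service_restored=datetime(2017, 1, 1, 7, 0),  # 5 hours of downtime
)
print(result)  # {'rpo_met': True, 'rto_met': False}
```

In this example the replication kept pace with the RPO, but the five-hour recovery missed the four-hour RTO, which is exactly the kind of gap a DR exercise is meant to surface.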
Key Considerations for Business Continuity
- How will replication load affect the production environment?
- Rate of change
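Rate of change drives replication load: the more data that changes per day, the more bandwidth replication needs. The sketch below is a rough back-of-the-envelope estimate, assuming decimal units and a steady change rate; the function names and the 200 GB/day and 100 Mbps figures are illustrative assumptions. Real replication tools add compression, deduplication and burst behaviour that this ignores.

```python
def required_mbps(changed_gb_per_day: float, window_hours: float = 24.0) -> float:
    """Average bandwidth (megabits per second) needed to ship one day's
    changed data within the given replication window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    megabits = changed_gb_per_day * 8_000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / (window_hours * 3600)


def link_utilization(changed_gb_per_day: float, link_mbps: float,
                     window_hours: float = 24.0) -> float:
    """Fraction of the link consumed by replication traffic alone."""
    return required_mbps(changed_gb_per_day, window_hours) / link_mbps


# Illustrative only: 200 GB of changed data per day.
print(f"{required_mbps(200):.1f} Mbps over 24 hours")               # 18.5 Mbps over 24 hours
print(f"{required_mbps(200, window_hours=8):.1f} Mbps over 8 hours")  # 55.6 Mbps over 8 hours
print(f"{link_utilization(200, link_mbps=100):.0%} of a 100 Mbps link")
```

If the utilization figure approaches the capacity of the link that production traffic also uses, replication will compete with production, which is the concern the slide raises.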
DR Plans
Common Mistakes in Disaster Recovery Plans
- Overdependence on the Cloud
- Failure to set Disaster Recovery Standards
- Failure to Test
Five Best Practices for Disaster Recovery Plans
- Create Thorough Disaster Recovery Plans
- Assign Responsibilities
- Arrange for Alternate Sites and Equipment
- Maintain, Execute and Evaluate Testing Procedures
- IT-Specific Disaster Procedures
Common Risks to Application Availability
- Human Error
- Network Failure
- Cloud provider downtime
- External Threats
- Application scalability limitations
Where do I begin?
- Current State Assessment
- Requirements Gathering
- Initial Disaster Recovery Planning
- Fit Assessment
- Deployment
- Testing and Mitigation
- Don't overdesign
Preparing for Disasters
Preparing for Disasters: key questions to ask
- What constitutes a disaster for your company?
  - Earthquake, fire, tornado, cyber threats, human error
- Rank each disaster and discuss with your team what each response plan should be
- Factor in recovery time
  - Recovery from a site outage may take a few days; recovery from a fire or earthquake could take months
- What business factors will cause the need to re-evaluate DR?
Threat Assessments
Type of Threat | Probability | Severity | Score | Notes
HVAC Outage | 4 | 3 | 12 | We only have one unit
Hurricane | 3 | 3 | 9 |
Fire | 1 | 4 | 4 |
Power Outage | 4 | 1 | 4 |
Earthquake | 1 | 2 | 2 | We have a generator that can be refilled
Tornado | 1 | 2 | 2 | Unlikely for our area
Wild fire | 1 | 1 | 1 | Unlikely for our area
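The scores in the table above are consistent with Score = Probability x Severity, although the slide does not state the formula or the rating scale explicitly. Assuming that formula, a minimal sketch for scoring and ranking threats could look like this; the `Threat` class is a hypothetical helper, and the values are copied from the table.

```python
from typing import NamedTuple


class Threat(NamedTuple):
    """One row of the threat assessment (hypothetical helper type)."""
    name: str
    probability: int
    severity: int

    @property
    def score(self) -> int:
        # Assumed formula: it reproduces every Score value in the table.
        return self.probability * self.severity


threats = [
    Threat("HVAC Outage", 4, 3),
    Threat("Hurricane", 3, 3),
    Threat("Fire", 1, 4),
    Threat("Power Outage", 4, 1),
    Threat("Earthquake", 1, 2),
    Threat("Tornado", 1, 2),
    Threat("Wild fire", 1, 1),
]

# Highest score first; sorted() is stable, so ties keep their input order.
ranked = sorted(threats, key=lambda t: t.score, reverse=True)

for t in ranked:
    print(f"{t.name:<14} P={t.probability} S={t.severity} score={t.score}")
```

Ranking by score gives the team an ordered list to work through when deciding which response plans to write first.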
Scenario Threat Assessment
- PCI company located in San Francisco.
- Your data center is located off site in Silicon Valley, and your DR site is located in AWS in the Western Region.
- You have a 4 TB SQL DB with a high rate of change.
- SQL backups go to a separate data center on the East Coast.
- Office hours are M-F, but you do have an interactive website with weekly maintenance on Saturdays from 1 am to 6 am PST.
- You have 100 critical servers, 200 non-critical servers, and 3 applications hosted elsewhere that are considered critical.
- What threat assessments should you consider?
Threat Assessments
Type of Threat | Probability | Severity | Score | Notes
Checklist for DR Planning: Testing your DR Solution
- In what order do your applications need to come online? (Databases first?)
- Which HIPAA/PCI DSS requirements will you be required to emulate in DR?
- What resources need to be involved in testing?
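The question of start-up order can be answered mechanically once application dependencies are written down. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the application names and their dependencies are hypothetical and stand in for whatever your own inventory contains.

```python
from graphlib import TopologicalSorter

# Hypothetical application tiers. Each key maps to the set of
# components that must already be online before it can start.
depends_on = {
    "database": set(),
    "auth-service": {"database"},
    "app-server": {"database", "auth-service"},
    "web-frontend": {"app-server"},
}

# static_order() yields every component after all of its dependencies.
# It raises graphlib.CycleError if the dependencies are circular.
startup_order = list(TopologicalSorter(depends_on).static_order())
print(startup_order)
# ['database', 'auth-service', 'app-server', 'web-frontend']
```

With this dependency map the database does come up first, matching the slide's hint. Keeping the map in the run book means the order can be regenerated whenever an application is added or changed.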
Checklist for DR testing
- How long can you run Production on your DR solution if required?
- Was networking tested on your DR solution?
- How will your servers be networked?
- How will your users/customers connect to the DR location? (VPNs)
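One small part of testing networking is confirming that the DR endpoints are reachable at all from where users or staff will connect. The following is a minimal sketch of a TCP reachability check using only the standard library; the helper names and the host names in the usage comment are hypothetical, and a successful connection only shows the port is open, not that the application behind it works.

```python
import socket
from typing import Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_endpoints(endpoints: Iterable[Tuple[str, int]]) -> Dict[str, bool]:
    """Check each (host, port) pair and report reachability by 'host:port'."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in endpoints}


# Usage (host names below are placeholders, not real systems):
#   report = check_endpoints([
#       ("dr-db.example.internal", 1433),
#       ("dr-web.example.internal", 443),
#   ])
#   for endpoint, ok in report.items():
#       print(endpoint, "reachable" if ok else "UNREACHABLE")
```

Running a check like this both over the VPN and from outside it helps answer how users and customers will actually reach the DR location.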
Run Book!
- Every DR solution must have a run book
- This is a playbook that walks step by step through what happens in the event of a disaster
- Your business may have compliance, audit, or insurance requirements to maintain a run book
Frequency of Disaster Recovery Testing
- You have a DR cloud solution in place
- How often have you tested your solution?
- Depending on regulations and requirements, testing may take place anywhere from once to four times a year
- Disaster recovery exercises may take a few attempts, depending on the complexity of your solution
- The point of the exercise is to find what was missed or what doesn't work!
Failover Test vs. Disaster Recovery Site Test
- Failover Test: the process of completely failing over a recovery plan with the intention of testing. There is nothing wrong at the production location.
- Disaster Recovery Site Test: very similar to a failover test, the only difference being that the production site remains the system of record. Production stays up and running for production users. Only the customer staff testing and verifying the failover will access the failover site. Any changes made to the virtual machines at the failover site will be lost at the end of the test.
Final thoughts
- Differences between Disaster Recovery and testing Disaster Recovery
  - Local Internet connectivity
  - Communication
    - In a real disaster, your company email could be down
    - If a large disaster is declared, phones may be down. Have you worked out an alternative way to contact the appropriate parties?
Scenario Planning
Sample Company Project Plan
Task Name | Duration | Predecessors | Start | Finish
Production Environment | 46 days | | Wed 1/6/16 | Wed 3/9/16
Storage | 10 days | | Wed 1/6/16 | Tue 1/19/16
NAS (4000GB) | 10 days | | Wed 1/6/16 | Tue 1/19/16
SAN (500GB) for file server second drive | 6 days | | Tue 1/12/16 | Tue 1/19/16
Oracle Installation and Configuration | 28 days | | Mon 2/1/16 | Wed 3/9/16
Oracle installed/configured for Production cluster | 28 days | | Mon 2/1/16 | Wed 3/9/16
Oracle installed/configured for Dev server | 2 days | 6 | Mon 2/15/16 | Tue 2/16/16
Dev restore from backup | 10 days | | Tue 2/16/16 | Mon 2/29/16
DR Environment | 119 days | | Fri 10/16/15 | Wed 3/30/16
Cloud Enterprise to Cloud Enterprise Recovery | 119 days | | Fri 10/16/15 | Wed 3/30/16
Single Firewall | 15 days | | Mon 12/21/15 | Fri 1/8/16
Live DB (physical) | 84 days | | Fri 10/16/15 | Wed 2/10/16
Server in stock | 1 day | | Mon 12/21/15 | Mon 12/21/15
Server racked and cabled | 30 days | 18 | Mon 12/21/15 | Fri 1/29/16
Server linked to network | 1 day | 16,19 | Mon 2/1/16 | Mon 2/1/16
OS deployment | 1 day | 20 | Mon 2/1/16 | Mon 2/1/16
Network troubleshooting | 1 day | 21 | Tue 2/2/16 | Tue 2/2/16
Environment QA | 5 days | 22 | Wed 2/3/16 | Tue 2/9/16
Provisioning complete | 1 day | 23 | Tue 2/9/16 | Wed 2/10/16
Restore backup to DR DB Server for testing | 3 days | | Fri 10/16/15 | Tue 10/20/15
Recovery Environment | 61 days | | Tue 1/5/16 | Wed 3/30/16
Production buildout complete | 0 days | | Fri 1/29/16 | Fri 1/29/16
Complete VPN or ICC to DR site | 0 days | 27 | Tue 1/5/16 | Tue 1/5/16
Deploy Cloud Recovery Infrastructure | 22 days | 28 | Wed 1/6/16 | Thu 2/4/16
Configure VMs for protection | 0 days | | Tue 2/9/16 | Tue 2/9/16
DR Data Sync | 15 days | 30 | Wed 2/10/16 | Tue 3/1/16
Recovery Plans Created | 0 days | 31 | Wed 2/10/16 | Wed 2/10/16
Configure portal functionality | 0 days | 32 | Wed 2/10/16 | Wed 2/10/16
Exercise Qualification Meeting | 1 day | | Mon 3/21/16 | Mon 3/21/16
Runbook complete | 1 day | 34 | Tue 3/22/16 | Tue 3/22/16
Pre-exercise verification meeting | 1 day | 35 | Wed 3/23/16 | Wed 3/23/16
Cloud Recovery Exercise(s) | 5 days | 36,25 | Thu 3/24/16 | Wed 3/30/16
Scenario: begin your planning
- You work for a company that hosts multiple dating websites.
- Your CTO wants you to come up with a budget/plan to implement DR.
- Where do you begin?
- Additional notes:
  - Environment currently hosted in private cloud
  - Some PCI compliance
Declaring a Disaster
Steps for Declaring a Disaster: The first 24 hours
1. Determine degree of disaster
2. Notify senior management
3. Notify the vendor where your DR environment is located
4. Notify your disaster recovery team
5. Notify users of the disruption of service
6. Implement Disaster Recovery Plan
7. Contact external vendor/contacts (software)
RUNBOOK
Q&A