High Availability- Disaster Recovery 101

Similar documents
High Availability- Disaster Recovery 101

Are AGs A Good Fit For Your Database? Doug Purnell

New England Data Camp v2.0 It is all about the data! Caregroup Healthcare System. Ayad Shammout Lead Technical DBA

AlwaysOn Availability Groups: Backups, Restores, and CHECKDB

Microsoft SQL Server

Avoiding the Cost of Confusion: SQL Server Failover Cluster Instances versus Basic Availability Group on Standard Edition

Ryan Adams Blog - Twitter Thanks to our Gold Sponsors

SQL Server Myths and Misconceptions

HIGH-AVAILABILITY & D/R OPTIONS FOR MICROSOFT SQL SERVER

Index. Peter A. Carter 2016 P.A. Carter, SQL Server AlwaysOn Revealed, DOI /

Microsoft SQL Server Database Administration

Performance Monitoring AlwaysOn Availability Groups. Anthony E. Nocentino

Performance Monitoring AlwaysOn Availability Groups. Anthony E. Nocentino

Designing Database Solutions for Microsoft SQL Server (465)

Performance Monitoring Always On Availability Groups. Anthony E. Nocentino

Microsoft SQL AlwaysOn and High Availability

Performance Monitoring AlwaysOn Availability Groups. Anthony E. Nocentino

SQL Server 2012 virtually out, Microsoft talks features, licensing

Business Continuity and Disaster Recovery. Ed Crowley Ch 12

ECE Engineering Robust Server Software. Spring 2018

SQL Server DBA Course Content

SQL Server 2014 Training. Prepared By: Qasim Nadeem

Maintaining a Microsoft SQL Server 2005 Database Course 2780: Three days; Instructor-Led

Performance Monitoring AlwaysOn Availability Groups. Anthony E. Nocentino

DESIGNING DATABASE SOLUTIONS FOR MICROSOFT SQL SERVER CERTIFICATION QUESTIONS AND STUDY GUIDE

Microsoft SQL Server HA and DR with DVX

Eliminate Idle Redundancy with Oracle Active Data Guard

SQL Server Virtualization 201

SQL Server DBA Online Training

SQL Server HA and DR: A Simple Strategy for Realizing Dramatic Cost Savings

Ed Watson, MVP Ambassador of Mayhem. LinkedIn.com/in/WatsonEd

SQL Saturday Jacksonville Aug 12, 2017

MS SQL Server 2012 DBA Course Contents

EXAM Administering Microsoft SQL Server 2012 Databases. Buy Full Product.

6 Months Training Module in MS SQL SERVER 2012

COMPLETE ONLINE MICROSOFT SQL SERVER DATA PROTECTION

Microsoft SQL AlwaysOn and High Availability

Ryan Adams Blog - Twitter MIRRORING: START TO FINISH

Administering Microsoft SQL Server Databases

Windows Clustering 101

Patient C SQL Critical Care

ZYNSTRA TECHNICAL BRIEFING NOTE

SQL 2016 Always On High Availability

Microsoft. Designing Database Solutions for Microsoft SQL Server 2012

Microsoft Azure Windows Server Microsoft System Center

ScaleArc for SQL Server

SQL Server Availability Groups

Patient A SQL Critical Care Part 1: Health Triage Findings

Administering Microsoft SQL Server 2012/2014 Databases

2788 : Designing High Availability Database Solutions Using Microsoft SQL Server 2005

Case study: Building bi-directional DR. Joep Piscaer, VMware vexpert, VCDX #101

Swiss IT Pro SQL Server 2005 High Availability Options Agenda: - Availability Options/Comparison - High Availability Demo 08 August :45-20:00

Oracle Zero Data Loss Recovery Appliance (ZDLRA)

Course Outline: Designing, Optimizing, and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008

Database Architectures

Transactional Replication New Features for AlwaysOn AG in SQL Henry Weng Premier Field Engineer SQL Server & AI

Designing, Optimizing, and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008

ADMINISTERING MICROSOFT SQL SERVER CERTIFICATION QUESTIONS AND STUDY GUIDE

SAP HANA Disaster Recovery with Asynchronous Storage Replication

Refresh a 1TB+ database in under 10 seconds

Veritas Storage Foundation for Windows by Symantec

Microsoft SQL Server Fix Pack 15. Reference IBM

5/24/ MVP SQL Server: Architecture since 2010 MCT since 2001 Consultant and trainer since 1992

Synergetics-Standard-SQL Server 2012-DBA-7 day Contents

Course 6231A: Maintaining a Microsoft SQL Server 2008 Database

Chapter 11. SnapProtect Technology

How to be a Great Production DBA

Disaster Recovery and Business Continuity

Number: Passing Score: 800 Time Limit: 120 min File Version:

High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc.

Maintaining a Microsoft SQL Server 2008 Database (Course 6231A)

Decrease IT Cost. Using SQL Server 2012 to. eguide. By Michael K. Campbell

Course 6231A: Maintaining a Microsoft SQL Server 2008 Database

Sql Server 2000 Manually Run Maintenance Plan

White Paper. How to select a cloud disaster recovery method that meets your requirements.

Veritas Storage Foundation for Windows by Symantec

How to license Oracle Database programs in DR environments

Database Architectures

Azure Webinar. Resilient Solutions March Sander van den Hoven Principal Technical Evangelist Microsoft

Microsoft SQL Server" 2008 ADMINISTRATION. for ORACLE9 DBAs

WHITE PAPER. Header Title. Side Bar Copy. Header Title 5 Reasons to Consider Disaster Recovery as a Service for IBM i WHITEPAPER

Disaster Recovery Solutions for Oracle Database Standard Edition RAC. A Dbvisit White Paper By Anton Els

microsoft.

Microsoft SQL Server on Stratus ftserver Systems

TSM Paper Replicating TSM

From Single File Recovery to Full Restore: Choosing the Right Backup and Recovery Solution for Your Cloud Data

Administering Microsoft SQL Server 2012 Databases

OL Connect Backup licenses

Using Computer Associates BrightStor ARCserve Backup with Microsoft Data Protection Manager

SQL Server DBA Course Details

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline

HP Designing and Implementing HP Enterprise Backup Solutions. Download Full Version :

SAP HANA. HA and DR Guide. Issue 03 Date HUAWEI TECHNOLOGIES CO., LTD.

Protecting Microsoft SQL Server databases using IBM Spectrum Protect Plus. Version 1.0

Arcserve Unified Data Protection Virtualization Solution Brief

Offloaded Data Transfers (ODX) Virtual Fibre Channel for Hyper-V. Application storage support through SMB 3.0. Storage Spaces

Module 4 STORAGE NETWORK BACKUP & RECOVERY

Administering Microsoft SQL Server 2012 Databases

Virtual protection gets real

20462: Administering Microsoft SQL Server 2014 Databases

Transcription:

High Availability- Disaster Recovery 101 DBA-100 Glenn Berry, Principal Consultant, SQLskills.com

Glenn Berry Consultant/Trainer/Speaker/Author Principal Consultant, SQLskills.com Email: Glenn@SQLskills.com Blog: http://www.sqlskills.com/blogs/glenn Twitter: @GlennAlanBerry Regular presenter at worldwide conferences on hardware, scalability, and DMV queries Author of SQL Server Hardware Chapter author of Pro SQL Server Practices Chapter author of Professional SQL Server 2012 Internals and Troubleshooting Chapter author of MVP Deep Dives Volumes 1 and 2 Instructor-led training: Immersion Events Online training: http://pluralsight.com/ Consulting: health checks, hardware, performance, upgrades Become a SQLskills Insider: http://www.sqlskills.com/insider

3 Agenda Causes of downtime and data loss Planning a high availability strategy SQL Server 2017 high availability technologies Planning a disaster recovery strategy SQL Server 2017 disaster recovery methods

Definition of High Availability Availability means that something is able to be used as expected Example: The backend database behind a web site is able to service transactions High availability means that the something is protected by various technologies to prevent it from becoming unavailable Example: The backend database is protected with database mirroring so that it continues to be available if disaster strikes Users/apps are always able to do what they need to be able to do But what is the something mentioned above? 4

What is the Something? The something will vary by situation, and so will the protecting technologies Example: a table Could be protected by replication, or a solution that protects the whole database Example: a group of databases Could be protected by an Availability Group in SQL Server 2012 or newer Example: a server Could be protected by failover clustering Example: a data center Could be protected using SAN replication 5

Causes of Downtime and Data Loss Planned downtime Unplanned downtime 6

Reasons for Planned Downtime Performing database maintenance 7 Creating or rebuilding a nonclustered index Creating, dropping, or rebuilding a clustered index Enterprise Edition has online index operations, that help reduce this issue Performing batch operations Performing batch operations can cause downtime through blocking locks Performing an upgrade Installing a SQL Server Service Pack or Cumulative Update Announcing Updates to the SQL Server Incremental Servicing Model (ISM) http://bit.ly/1rzyitz Installing Windows or Microsoft Updates, updating drivers or firmware Use rolling upgrades to minimize your planned downtime

Reasons for Unplanned Downtime Data center failure Natural disasters, fire, power loss, failed network connectivity Server failure Failed power supply, failed CPU, failed memory, operating system crashes I/O subsystem failure Drive failure, a RAID controller failure, I/O subsystem software bug causing corruption Human error Dropping a table, deleting or updating data in a table without specifying a predicate, setting a database offline, or shutting down a SQL Server instance 8

Planning a High Availability Strategy Requirements Recovery Point Objective (RPO) The maximum allowable data-loss when a failure occurs http://bit.ly/2ay1gow Recovery Time Objective (RTO) The maximum allowable downtime when a failure occurs http://bit.ly/2apkldn Context for SLA requirements When specifying that a database must be available 99.99% of the time, is that 99.99% of 24x7 or is there an allowable maintenance window? 9

Allowable Downtime Availability % Downtime per Year Downtime per Month Downtime per Week Downtime per Day 90% 36.5 days 72 hours 16.8 hours 2.4 hours 99% 3.65 days 7.2 hours 1.68 hours 14.4 minutes 99.9% 8.76 hours 43.8 minutes 10.1 minutes 1.44 minutes 99.99% 52.56 minutes 4.38 minutes 1.01 minutes 8.66 seconds 99.999% 5.26 minutes 25.9 seconds 6.05 seconds 864.3 milliseconds 10

SQL Server 2017 HA-Related Technologies Backup and Restore Methods Component Redundancy Windows Failover Clustering AlwaysOn Availability Groups Basic Availability Groups Database Mirroring Transactional Replication Peer-to-Peer Replication Log Shipping 11

Backup and Restore Methods Recovery models Full, Bulk-Logged, and Simple Backup strategy Full backups, differential backups, and log backups Differential backups are very useful, but often neglected Backup compression, backup checksums, mirrored backups Recovery strategy Actually test restoring your backups and have a plan for how you will do it This is often ignored! Instant file initialization and backup compression can reduce restore times Keeping VLF counts under control reduces recovery time portion of a restore 12

Using a Secondary Restore Server It is very common to not regularly restore database backups People take regular backups, but very rarely (or never) actually restore them Then, they find out in an emergency that their database backups are no good It is also quite common for people not to run DBCC CHECKDB They are concerned about the resource usage on their production server(s) Consider using a Restore Server to restore your database backups You can restore each database and then run DBCC CHECKDB on it This can easily be automated. You can use an older server or new desktop machine 13

Component Redundancy It is important to have redundant components for a database server This helps avoid ever having to use your HA/DR technology You want to eliminate single points of failure where possible Multiple power supplies plugged into separate circuits Multiple network ports, plugged into separate network switches Appropriate RAID protection for all of your logical drives Hot-swappable components can help avoid down time Having some cold spares available is also a good idea 14

Component Redundancy vs. HA/DR All Microsoft HA/DR technologies have some failover duration Traditional FCI must move cluster resources and start SQL Server on the new node Availability groups and DBM require database property changes Log shipping requires a manual failover (scripts can semi-automate) It is much better to avoid some unplanned failovers with redundancy Component redundancy can help avoid unplanned failovers from hardware failures This improves your overall uptime statistics Take advantage of every possibility to make your server more robust The extra hardware cost involved is usually relatively small Be ready for resistance for financial reasons Keep in mind that this is a database server, not a web server 15

Windows Failover Clustering SQL Server failover cluster implemented on a Windows Server failover cluster Multiple nodes, one or more instances Requires shared storage, which is a single point of failure You can use SMB 3.0 file shares for SQL Server storage instead of a SAN tempdb database can be located on each node with SQL Server 2012 or newer Provides instance-level high availability All databases, logins, Agent jobs are included Failover time is longer than most other technologies Depends on how long crash recovery takes for each database Keep your VLF counts under control! 16

Availability Groups Availability group contains one or more user databases that failover together 17 Requires Windows failover cluster instance, but not shared storage Enterprise Edition-only feature, until SQL Server 2016 Availability database is a database that belongs in an AG Primary database is the read-write copy (limit 1) Secondary database is the read-only or non-readable copy Up to four on SQL Server 2012 and eight on SQL Server 2014, 2016, and 2017 Databases must be in Full recovery model at all times Offers relatively fast, automatic failover Can offload read-only activity, but no schema changes are allowed Makes it harder to use as a replacement for replication for reporting purposes

Basic Availability Groups (BAG) New feature added in SQL Server 2016 Standard Edition Basic AG enables a primary database to maintain a single replica. This replica can use either synchronous or asynchronous commit mode Asynchronous commit mode is a big advantage/improvement over DBM! Basic Availability Group Limitations Limit of one replica No read access on secondary replica No backups on secondary replica Only one database can be in a basic availability group BAG cannot be upgraded to a regular AG Basic availability groups are only supported on Standard Edition 18

Database Mirroring Database-level high availability, deprecated in SQL Server 2012 Still works in SQL Server 2016/2017, still a good solution for many scenarios Principal database and mirror database, on separate instances Only user databases can be mirrored Databases must be in Full recovery model Synchronous and asynchronous modes Must use synchronous mode with a witness for automatic failover Asynchronous mode is only allowed in Enterprise Edition Database mirroring offers very fast, automatic failover Only a single database, only one mirror 19

Transactional Replication Replication is a broad set of technologies that enable data to be copied and distributed between servers and then synchronized to maintain consistency You can replicate the entire database or just a portion of it Source database is a Publisher, destination is a Subscriber Log reader agent picks up all write activity from Publisher database This adds some read I/O workload to the log file Replication changes are temporarily stored in a Distribution database You can have multiple subscribers in multiple locations You can add additional indexes to subscriber databases for reporting 20

Peer-to-Peer Replication Database-level protection A form of transactional replication that lets you have multiple, writeable copies of a database These copies are often in different data centers Changes are sent to each peer database, and they eventually synchronize Often used for scalability purposes. HA is a secondary bonus May require application or database schema changes Example: identity columns Requires Enterprise Edition 21

Log Shipping Provides database-level protection Can have multiple copies in multiple locations Databases must be in FULL recovery model at all times Requires a manual failover (although you can write code to partially automate) Some data loss is possible (since last log backup that was copied over) Log shipping is most commonly used for DR purposes Can be used to protect against user error when you have a delayed restore Can be combined with most other HA technologies Does not add any extra performance overhead to primary database 22

High Availability Features by Edition Feature Name Enterprise Edition Standard Edition Express Edition Partial database availability Yes No No Backup compression Yes Yes No Database snapshots Yes No No Online index operations Yes No No Log shipping Yes Yes No Transactional replication Yes Yes Subscriber only Database mirroring Yes Yes, synch only Witness only Failover clustering Yes Yes No Availability Groups Yes No, until 2016 No 23

Planning a Disaster Recovery Strategy Designing a disaster recovery strategy is integral to designing a highly-available system Even with the most sophisticated redundancy, recovery from total loss of all data centers can only be done using backups What restores you need to be able to do depends on: What needs to be brought online first Data loss SLA (RPO) Downtime SLA (RTO) 24

A Good Disaster Recovery Plan Should be written by the most senior staff members who have seen the most failures Should be tested by the most junior staff members who may be on duty late at night It won t be the most senior people on call on a holiday Consider writing it for a non-dba to be able to follow Should be comprehensive Restore database from backups isn t good enough What if something goes wrong? Should consider human factors in a widespread disaster Should be tested regularly and updated after each test 25

More DR Planning Considerations Consider possible problems at each step: What if the server is physically damaged? What if the SAN is physically damaged? What if there is no power at the data center? What if there is no data center? Where are the off-site backups stored? What if the backups are corrupt? What if key staff members are unavailable? 26

DR Planning: People Issues Who gets notified first of failures? Who is responsible at each phase of recovery? Who is the sponsor that can resolve disputes about progress? Who needs to be kept informed of progress? Who has to authorize a failover? Who is in overall command of the DR effort? Contact info for everyone who may become involved? Which other teams need to be involved for success? How do you confirm the application is working after DR is complete? 27

HA/DR Testing Test the solution before going into production with various failures Pull out a drive, unplug a server Drop a table, truncate a table Unplug a network cable Sometimes called chaos monkey testing Try doing a bare metal install or a full restore from backups What if you can t meet your SLA requirements? Push back or tweak the strategy as appropriate Make sure management knows what is possible BEFORE going into production Perform regular real-life disaster testing IN production No other way to test it for real but easier said than done 28

Summary HA/DR is much more than just using a technology or feature Understand your RPO and RTO SLA requirements Understand your budget and infrastructure limitations Make sure you have a good backup/restore strategy, regardless of your HA/DR choices! Make sure you have good database backups regardless of what HA/DR techniques you are using! Keep in mind that you can combine HA/DR features to have a more robust solution 29

Resources Whitepaper: High Availability with SQL Server 2008 http://bit.ly/1xl8yej Whitepaper: Proven SQL Server Architectures for High Availability and Disaster Recovery http://bit.ly/1hvibue Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery http://bit.ly/1jbom39 30

Additional Training SQL Server: Understanding, Configuring and Troubleshooting Database Mirroring http://bit.ly/2fgel5j SQL Server: Understanding, Configuring and Troubleshooting Log Shipping http://bit.ly/2bspubn SQL Server: Installing and Configuring SQL Server 2016 http://bit.ly/2vbwc76 31

Thank You