Architecting SQL Server Availability Groups Without Losing Your SANITY
Edwin Sarmiento Microsoft MVP/Microsoft Certified Master: SQL Server http://www.edwinmsarmiento.com edwin@edwinmsarmiento.com @EdwinMSarmiento http://ca.linkedin.com/in/edwinmsarmiento
no DEMOS
DESIGN at its worst https://youtu.be/j-zczjxsxnw?t=1m12s
DESIGN the starting point of any implementation
DESIGN should be driven by REQUIREMENTS NOT TECHNOLOGY
let's start with
RECOVERABILITY Recovery Point Objective (RPO) How much data can I afford to lose?
EXAMPLE RPO 12 hours June 10, 2017 10:00AM Maximum data loss: -12 hours
when did the last backup occur
AVAILABILITY Recovery Time Objective (RTO) When is my application coming back?
EXAMPLE RTO 12 hours June 10, 2017 10:00AM Maximum outage: +12 hours
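The RPO and RTO examples above are simple date arithmetic around the moment of failure. The sketch below works them through for the slide's values; the function names `recovery_window` and `meets_rpo` are illustrative and not part of any SQL Server tooling.

```python
from datetime import datetime, timedelta


def recovery_window(failure_time, rpo, rto):
    """Return (oldest acceptable recovery point, latest acceptable return to service)."""
    return failure_time - rpo, failure_time + rto


def meets_rpo(last_backup, failure_time, rpo):
    """True when everything written since the last backup fits inside the RPO."""
    return failure_time - last_backup <= rpo


# Values taken from the slides: failure on June 10, 2017 at 10:00AM, RPO and RTO of 12 hours.
failure = datetime(2017, 6, 10, 10, 0)
rpo = timedelta(hours=12)
rto = timedelta(hours=12)

oldest_point, back_online_by = recovery_window(failure, rpo, rto)
print(oldest_point)    # 2017-06-09 22:00:00
print(back_online_by)  # 2017-06-10 22:00:00

# Hypothetical backup time, used only to answer "when did the last backup occur".
print(meets_rpo(datetime(2017, 6, 10, 1, 0), failure, rpo))  # True
```

A last backup taken before June 9, 2017 10:00PM would fail the check, which is why the time of the last backup matters for the RPO.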
COMMITMENT Service Level Agreement (SLA) What have we agreed upon?
COMMITMENT Service Level Agreement (SLA) What was promised?
RPO/RTO/SLA versus Cost [chart: Cost on the vertical axis, Time on the horizontal axis]
High Availability: How do you increase uptime*? *typically in the SAME data center, defined by RTO
Disaster Recovery: How do you continue operation in case of catastrophic failure*? *defined by RPO, typically in a DIFFERENT data center, also considers SAME data center failure
HA != DR
SQL Server Availability Groups is NOT a new technology
SQL Server Availability Groups combination of DATABASE MIRRORING & FAILOVER CLUSTERING
how does it really WORK? FAILOVER CLUSTERING
stop service, wait service, start service
failover clustering: how it works [diagram: Application connecting to SQL Server on a traditional 2-node Windows Server Failover Cluster with log and data; a second log and data set is labeled *Availability Groups]
what IS NOT commonly mentioned
failover clustering: how it works [diagram: Application and AD/DNS alongside SQL Server on a traditional 2-node Windows Server Failover Cluster with log and data]
WHY do we even care
we need to speak to the AD and DNS administrators
SQL Server Availability Groups basic architecture
Maximum of 5 replicas in SQL Server 2012 (1 primary and 4 secondary) and 9 replicas in SQL Server 2014/2016 (1 primary and 8 secondary) [diagram: Primary Replica and two Secondary Replicas, each a SQL Server instance with its own log and data, on one Windows Server Failover Cluster]
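The replica limits above can be checked against a planned topology before anything is built. This is a minimal sketch: the `MAX_REPLICAS` table only restates the figures on the slide, and `fits_replica_limit` is a made-up helper name.

```python
# Replica limits as stated on the slide (primary included in the count).
MAX_REPLICAS = {"2012": 5, "2014": 9, "2016": 9}


def fits_replica_limit(sql_version, secondaries):
    """True when 1 primary plus the planned secondaries stays within the limit."""
    total_replicas = 1 + secondaries
    return total_replicas <= MAX_REPLICAS[sql_version]


print(fits_replica_limit("2012", 4))  # True: 1 primary + 4 secondary = 5
print(fits_replica_limit("2012", 5))  # False: 6 replicas exceed the 2012 limit
print(fits_replica_limit("2016", 8))  # True: 1 primary + 8 secondary = 9
```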
secondary replicas are always running REDO [diagram: Application Commit on the Primary Replica; Primary Replica and two Secondary Replicas, each a SQL Server instance with its own log and data, on one Windows Server Failover Cluster]
what about quorum: determines the number of failures that the cluster can sustain while still remaining online [diagram: AD/DNS, SQL Server, QUORUM, Windows Server Failover Cluster]
quorum: majority VOTE wins *all NODES have a vote
decision-making based on MAJORITY can I get a VOTE?
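Majority voting reduces to a small piece of arithmetic. The sketch below assumes a static count where every vote is equal and ignores any dynamic vote adjustment the cluster may perform; the function names are illustrative only.

```python
def votes_for_quorum(total_votes):
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def failures_tolerated(total_votes):
    """How many votes can be lost while a majority still remains."""
    return total_votes - votes_for_quorum(total_votes)


def has_quorum(online_votes, total_votes):
    """True when the online votes are a strict majority of all votes."""
    return online_votes >= votes_for_quorum(total_votes)


for total in (2, 3, 4, 5):
    print(total, votes_for_quorum(total), failures_tolerated(total))
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Going from 3 votes to 4 adds a vote without adding any tolerated failure, which is the arithmetic behind aiming for an odd number of votes.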
quorum types: Node Majority, recommended for clusters with an ODD number of nodes
quorum types: Node & Disk Majority, recommended for clusters with an EVEN number of local nodes
quorum types: Node & Disk Majority uses a SHARED DISK as a WITNESS
quorum types: Node & Disk Majority, this dates back to Windows NT 4.0 days
cluster + quorum != Node & Disk Majority
quorum types: Node & File Share Majority uses a FILE SHARE as a WITNESS instead of a disk
quorum types: No Majority: Disk Only, can sustain failures of all nodes except one if the disk is online *NOT RECOMMENDED
quorum types: Cloud Witness uses the concept of a file share witness on Microsoft Azure *Windows Server 2016 and higher
quorum: goal is to have an ODD/MAJORITY number of votes
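The quorum types above can be read as a rule for reaching an odd number of votes. The sketch below is a simplification of the slides, not a Microsoft decision rule: `suggest_quorum_type` is a hypothetical helper, and it leaves out Cloud Witness, which the slides describe as a file-share-style witness on Microsoft Azure for Windows Server 2016 and higher.

```python
def suggest_quorum_type(voting_nodes, shared_disk):
    """Return (quorum type, total votes) so that the vote count ends up odd.

    voting_nodes: number of cluster nodes that hold a vote.
    shared_disk:  True when the nodes share a disk that could act as witness.
    """
    if voting_nodes < 1:
        raise ValueError("a cluster needs at least one voting node")
    if voting_nodes % 2 == 1:
        # Already odd: the nodes alone can form a majority.
        return "Node Majority", voting_nodes
    if shared_disk:
        # Even count with shared storage: the disk witness adds the tie-breaking vote.
        return "Node & Disk Majority", voting_nodes + 1
    # Even count without shared storage: a file share plays the witness role.
    return "Node & File Share Majority", voting_nodes + 1


print(suggest_quorum_type(3, shared_disk=False))  # ('Node Majority', 3)
print(suggest_quorum_type(2, shared_disk=True))   # ('Node & Disk Majority', 3)
print(suggest_quorum_type(2, shared_disk=False))  # ('Node & File Share Majority', 3)
```

Stand-alone instances in an Availability Group usually have no shared disk, so in this simplified rule an even node count falls through to the file share witness.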
HOW do you choose the appropriate quorum model let your REQUIREMENTS guide your choice
multi-subnet clusters: WHERE do you place the quorum
WHERE do you place the quorum: ideally, it should be in a SEPARATE location
[diagram: one Windows Server Failover Cluster spanning PRODUCTION and DR, with a SQL Server instance, log, and data at each site]
WHERE do you place the quorum: it should be in the SAME location as what you are trying to protect
SQL Server Availability Groups COMMON DESIGN PATTERNS
What will be your QUORUM model?
What will be your REPLICATION MODE?
What will be your NETWORK CONFIGURATION?
What will be your LICENSING MODE?
2 Replicas, Stand-alone instances (HA only): PRODUCTION holds the Primary Replica and the Secondary Replica, each a SQL Server instance with its own log and data, on one Windows Server Failover Cluster
3 Replicas, Stand-alone instances (HA + DR): PRODUCTION holds the Primary Replica and a Secondary Replica; DR holds a Secondary Replica* (*NO VOTE on 2008/2012, LowerQuorumPriorityNodeID on 2012 R2, Site Awareness on 2016). *You need to discuss this with your network team
2 Replicas, FCI + Stand-alone instance (HA + DR): PRODUCTION holds the Primary Replica; DR holds the Secondary Replica* (*NO VOTE on 2008/2012, LowerQuorumPriorityNodeID on 2012 R2, Site Awareness on 2016). *You lose the ability to do automatic failover (on the Availability Group-level)
3 Replicas, FCI + Stand-alone instance (HA + DR + ): PRODUCTION holds the Primary Replica; DR and the Additional Data Center each hold a Secondary Replica* (*NO VOTE on 2008/2012, LowerQuorumPriorityNodeID on 2012 R2, Site Awareness on 2016). *You lose the ability to do automatic failover (on the Availability Group-level)
2 Replicas, FCI (HA + DR): PRODUCTION holds the Primary Replica; DR holds the Secondary Replica* (*NO VOTE on 2008/2012, LowerQuorumPriorityNodeID on 2012 R2, Site Awareness on 2016). *You lose the ability to do automatic failover (on the Availability Group-level)
3 Replicas, Stand-alone instances (HA + DR): PRODUCTION, DR, and Additional Data Center each hold a SQL Server instance with its own log and data. Does this even make sense?
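The NO VOTE notes in the patterns above can be explored with a static vote count. The sketch below models the 3-replica HA + DR pattern with the DR node's vote removed. The `Node` class, the node names, and `keeps_quorum` are hypothetical, and the model ignores dynamic quorum behaviour in newer Windows Server versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    site: str
    votes: int = 1


def keeps_quorum(nodes, online, witness=False, witness_online=False):
    """True when the online votes are a strict majority of all configured votes."""
    total = sum(n.votes for n in nodes) + (1 if witness else 0)
    live = sum(n.votes for n in nodes if n.name in online)
    if witness and witness_online:
        live += 1
    return live > total // 2


# Two voting nodes in PRODUCTION, one DR node with its vote removed.
nodes = [
    Node("prod1", "PRODUCTION"),
    Node("prod2", "PRODUCTION"),
    Node("dr1", "DR", votes=0),
]

# Losing the DR site does not affect quorum in PRODUCTION.
print(keeps_quorum(nodes, {"prod1", "prod2"}))  # True
# With only 2 votes, losing one PRODUCTION node loses the majority.
print(keeps_quorum(nodes, {"prod1", "dr1"}))  # False
# A witness in PRODUCTION restores an odd vote count of 3.
print(keeps_quorum(nodes, {"prod1", "dr1"}, witness=True, witness_online=True))  # True
# The DR node alone can never form a majority in this model.
print(keeps_quorum(nodes, {"dr1"}, witness=True, witness_online=False))  # False
```

Removing the DR vote leaves an even number of votes in PRODUCTION, so the quorum model question from the design patterns slide comes straight back: a witness placed with the nodes you are trying to protect makes the count odd again.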
in SUMMARY
DESIGN according to REQUIREMENTS
REFERENCES AlwaysOn Architecture Guide: Building a High Availability and Disaster Recovery Solution by Using AlwaysOn Availability Groups http://bit.ly/1itbjai
How many of you are interested in DIVING DEEPER into Windows Server Failover Clustering for SQL Server?
https://learnsqlserverhadr.com
Q & A