Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle. Copyright 2016, Oracle and/or its affiliates. All rights reserved.
Headache-free Split Brain Resolution. Ian Cookson, Product Manager for Oracle Clusterware
Program Agenda
1. Split Brain: What is it?
2. Clusterware Concepts, Part 1
3. Split Brain Resolution in Current Releases
4. Clusterware Concepts, Part 2
5. Split Brain Resolution in Oracle Clusterware 12c Rel 2
Split Brain: What Does It Mean for Oracle Clusterware? A condition in which Oracle Clusterware believes that there is a communication failure between nodes.
Split Brain: What's Happening? 1. Private interconnect failure. 2. Each side believes it is the cluster. 3. Now what? Integrity of the shared data is paramount!
Split Brain: How to Resolve It? Status of cluster nodes? Is a node dead or unresponsive? Is it just a network issue? Surviving cluster cohorts? (A two-node cluster is simple.) Priorities? Integrity of the shared data is paramount!
Clusterware Concepts, Part 1. Clusterware terms: Fencing; Rebootless Node Fencing; Node Eviction; misscount; disktimeout.
Clusterware Concepts: Fencing. Fencing means conceptually fencing the node off from shared cluster resources.
Clusterware Concepts: Fencing Actions. Two approaches to implement node fencing in Clusterware: Rebootless Node Fencing and Node Eviction (reboot).
Clusterware Concepts: When Does Fencing Occur? A node is fenced when misscount is exceeded (30 seconds in the diagram) or when disktimeout is exceeded (200 seconds in the diagram). [Diagram: two CSSD processes and the Voting File.] Integrity of the shared data is paramount!
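The two timeouts on this slide can be pictured as a simple check. The following is a minimal sketch, not Oracle's implementation: the names `NodeHealth` and `should_fence` are hypothetical, and the 30- and 200-second values are simply the figures shown on the slide.

```python
from dataclasses import dataclass

# Values shown on the slide; a real cluster may be configured differently.
MISSCOUNT_SECONDS = 30
DISKTIMEOUT_SECONDS = 200


@dataclass
class NodeHealth:
    """Hypothetical snapshot of how long a node has been silent."""
    seconds_since_network_heartbeat: float
    seconds_since_voting_file_io: float


def should_fence(health: NodeHealth) -> bool:
    """Return True when either timeout has been exceeded."""
    network_expired = health.seconds_since_network_heartbeat > MISSCOUNT_SECONDS
    disk_expired = health.seconds_since_voting_file_io > DISKTIMEOUT_SECONDS
    return network_expired or disk_expired
```

For example, `should_fence(NodeHealth(45, 0))` is true because the network heartbeat has been missing for longer than misscount, even though voting file I/O is healthy.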
Clusterware Concepts: What is the Voting File? The mechanism by which Clusterware tests that it can read from and write to shared storage, and by which it verifies cluster participation. [Diagram: two CSSD processes and the Voting File.] Integrity of the shared data is paramount!
Clusterware Concepts: Common Causes for Fencing? Resource starvation (memory/CPU); I/O path disruption; outage on the private interconnect. Integrity of the shared data is paramount!
Split Brain Resolution in Current Releases: Split Brain Condition. [Diagram: a cluster split into a 3-node cohort and Node D, with the Voting File.] Which Cohort Survives? The 3-node cohort survives (Node D is fenced).
Split Brain Resolution in Two-node Clusters: Which Cluster Cohort Survives? [Diagram: nodes 1 and 2.] I've Got a Headache.
Split Brain Resolution in Larger Clusters: Split Brain Condition. [Diagram: nodes numbered 3, 2, 4, and 1, with the Voting File.] Which Cohort Survives? The cohort with the lowest Node_Number survives.
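The two rules shown so far (the larger cohort survives; otherwise the cohort with the lowest node number survives) can be combined in a short sketch. This is an illustration only: the function name `surviving_cohort` and the modeling of a cohort as a set of node numbers are assumptions, and applying the node-number rule only as a tie-breaker between equally sized cohorts is an interpretation of the slides rather than something they state outright.

```python
from typing import FrozenSet, Iterable

# Assumption: a cohort is modeled as a set of cluster node numbers.
Cohort = FrozenSet[int]


def surviving_cohort(cohorts: Iterable[Cohort]) -> Cohort:
    """Pick the largest cohort; break ties by the lowest node number."""
    candidates = [c for c in cohorts if c]
    if not candidates:
        raise ValueError("at least one non-empty cohort is required")
    # Sort key: larger cohorts first (negative size), then lowest node number.
    return min(candidates, key=lambda c: (-len(c), min(c)))
```

With cohorts `{1, 2, 3}` and `{4}`, the three-node cohort is chosen. With two single-node cohorts `{1}` and `{2}`, node 1 is chosen regardless of what either node is running, which is the headache the earlier slide refers to.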
Split Brain Resolution in Oracle Clusterware 12c Rel 1: Which Cluster Cohort Survives? [Diagram: nodes 1 and 2.] I've Still Got a Headache.
Clusterware Concepts, Part 2. Clusterware terms: Cluster Resources; Singleton Resources; User-defined Resources; ora* resources. What is a Cluster Resource? A program, application, or script; an NFS mount; a VIP; a service.
Clusterware Concepts, Part 2 (continued). [Diagram: example resource labeled FINANCE.]
Clusterware Concepts: User-Defined vs Ora* Resources
[grid@~] crsctl status resource
NAME=MYVIP
TYPE=app.appviptypex2.type
TARGET=OFFLINE
STATE=OFFLINE
(User-defined resource: managed with crsctl)
NAME=ora.LISTENER_SCAN1.lsnr
TYPE=ora.scan_listener.type
TARGET=ONLINE
STATE=ONLINE on
(Ora* resource: managed with srvctl)
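The distinction on this slide can be illustrated by parsing the listing and sorting resources by name prefix. This is a rough sketch under stated assumptions: the helper names `parse_resources` and `management_tool` are hypothetical, the parser assumes one KEY=VALUE pair per line with each record starting at a NAME line, and the `ora.` prefix test is a simplification of the slide's user-defined versus ora* split.

```python
from typing import Dict, List

# Sample text copied from the slide. The node name after STATE=ONLINE
# is cut off on the slide, so it is omitted here.
SAMPLE_OUTPUT = """\
NAME=MYVIP
TYPE=app.appviptypex2.type
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.LISTENER_SCAN1.lsnr
TYPE=ora.scan_listener.type
TARGET=ONLINE
STATE=ONLINE
"""


def parse_resources(text: str) -> List[Dict[str, str]]:
    """Split KEY=VALUE lines into one dict per resource."""
    resources: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not KEY=VALUE
        key, value = line.split("=", 1)
        if key == "NAME":
            resources.append({})  # a NAME line starts a new record
        if resources:
            resources[-1][key] = value
    return resources


def management_tool(resource: Dict[str, str]) -> str:
    """Return the tool the slide associates with this kind of resource."""
    return "srvctl" if resource["NAME"].startswith("ora.") else "crsctl"
```

Running `parse_resources(SAMPLE_OUTPUT)` yields two records, and `management_tool` maps MYVIP to `crsctl` and `ora.LISTENER_SCAN1.lsnr` to `srvctl`.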
Split Brain Resolution in Oracle Clusterware 12c Rel 2: If Everything Else is Equal. Cohorts are of equal size. Cohorts are both viable for doing work: an ASM instance is accessible and the public network is available. Which Cohort Should be Fenced?
Split Brain Resolution: Access to an ASM Instance. The surviving cohort must have access to at least one ASM instance. Which Cohort Should Survive? [Diagram: nodes 1 and 2 with Oracle ASM and GI.] If Everything Else is Equal.
Split Brain Resolution: Viable Public Network Interface. The surviving cohort must have a viable public network interface. Which Cohort Should Survive? [Diagram: nodes 1 and 2 with Oracle ASM, GI, and the Public Network.] If Everything Else is Equal.
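The two viability requirements (ASM access and a public network interface) act as a filter before any tie-breaking. A minimal sketch, assuming each cohort's status can be summarized by two booleans; `CohortStatus` and `viable_cohorts` are hypothetical names, not Clusterware interfaces.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CohortStatus:
    """Hypothetical summary of one cohort after the split."""
    nodes: FrozenSet[int]
    has_asm_access: bool       # can reach at least one ASM instance
    has_public_network: bool   # has a viable public network interface


def viable_cohorts(cohorts: List[CohortStatus]) -> List[CohortStatus]:
    """Keep only cohorts that meet both requirements from the slides."""
    return [c for c in cohorts if c.has_asm_access and c.has_public_network]
```

If only one cohort passes the filter, the question of which cohort should survive is already answered; the remaining criteria matter only when more than one cohort is viable.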
Split Brain Resolution in Oracle Clusterware 12c Rel 2: If Everything Else is Equal. 1. The customer can designate which server(s) and resource(s) are critical. 2. Clusterware will evaluate cluster resources based on implied workload. 3. The cluster cohort containing the lowest cluster node number survives.
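The three criteria read naturally as an ordered tie-break. The sketch below is one way to picture that ordering; it is not how Clusterware computes its decision. The class `WeightedCohort`, the function `pick_survivor`, and the use of simple counts for "critical" and "workload" are all assumptions made for illustration, since the slides do not say how these factors are weighed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class WeightedCohort:
    """Hypothetical view of an equally sized, viable cohort."""
    nodes: FrozenSet[int]
    critical_count: int = 0   # assumption: items the customer marked critical
    workload_count: int = 0   # assumption: resources implying user workload


def pick_survivor(cohorts: List[WeightedCohort]) -> WeightedCohort:
    """Apply the slide's criteria in order: critical, workload, node number."""
    if not cohorts or any(not c.nodes for c in cohorts):
        raise ValueError("every cohort must contain at least one node")
    # Higher counts win; the lowest node number is the final fallback,
    # so it is negated to fit a max() comparison.
    return max(
        cohorts,
        key=lambda c: (c.critical_count, c.workload_count, -min(c.nodes)),
    )
```

Under this model, a cohort on node 2 that hosts a critical resource is preferred over an idle cohort on node 1, and node 1 is preferred only when neither of the first two criteria separates the cohorts.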
Split Brain Resolution: User Input of What is Critical. User Input for Resolving Split Brains: css_critical for nodes and resources.
srvctl modify service -css_critical {YES | NO}
crsctl set server css_critical {YES | NO}
[Diagram: nodes 1 and 2.]
Split Brain Resolution: Workload Generating Resources
[grid@~] crsctl modify resource MYVIP -attr "USER_WORKLOAD=yes"
[grid@~] crsctl stat res -w "USER_WORKLOAD == yes"
NAME=MYVIP
TYPE=app.appviptypex2.type
TARGET=OFFLINE
STATE=OFFLINE
[Diagram: nodes 1 and 2 with Oracle ASM, GI, and the MYVIP resource.] Which Cohort Survives?
Split Brain Resolution: More Singleton Database Instances. The cohort with the most singleton instances will survive. Which Cohort Survives? [Diagram: nodes 1 and 2 with Oracle ASM and GI; one cohort has 2 singleton instances, the other has 0.]
Split Brain Resolution: Default Behaviour (Node Number). The cohort with the lowest node number will survive. [Diagram: nodes 1 and 2.] I've Got a Headache, again.
Headache-free Split Brain Resolution. Split Brain: What is it? Clusterware Concepts. Split Brain Resolution: prior to Oracle Clusterware 12c; Oracle Clusterware 12c Rel 1; Oracle Clusterware 12c Rel 2.