External Reference Entity Method For Link Failure Detection in Highly Available Systems

IWODSS 2010, Montreal, 17-18 May, 2010

Vilho Raatikka and Antoni Wolski
IBM Finland, P.O. Box 265, 00101 Helsinki, Finland
{first_name.last_name}@fi.ibm.com

Abstract. One of the biggest challenges in a high-availability (HA) system is to prevent a situation called split brain, whereby there are two active nodes in the system, both of which allow the data to be updated. A split-brain situation may occur when a node failure cannot be distinguished from a communication link failure. The External Reference Entity Method can be used to determine whether the communication link between the active and standby components of an HA system has failed or not. An External Reference Entity (ERE) is a passive network component whose sole responsibility is to respond to simple connectivity check requests, such as those of the ping protocol. With the ERE Method implemented in the active and standby nodes of an HA system, no third (active) processing node is needed to detect active-standby link failures.

1 Introduction

In highly available systems, the most common configuration is active-standby, also called hot spare, hot standby [1] (HSB), or the 2N model [5]. In the HSB configuration, there are two peer computer nodes: active and standby. The active node is the one where the software service unit is running in the active HA state. At the standby node, the service unit is running in the standby HA state, whereby it is continuously updated with the latest service state information replicated from the active unit. Should the active unit or the active node fail, a failover is performed whereby the standby unit becomes the active one.

A distinction should be made between a unit (or node) failure and a failure of the HSB communication link between the two. The HSB link might be implemented in a way that makes it a single point of failure. If the HSB link failure is not recognized (and is interpreted as a node or unit failure), the result might be a split-brain situation whereby there are two active nodes in the system. That might endanger data integrity, especially if the service offered is a database service.

Traditionally, an HSB communication link failure in a two-node active-standby configuration is detected by using failure detection software residing in a third node. If a communication link loss occurs at either of the active/standby nodes, the third node may use its own heartbeat connections to the active/standby nodes to detect the failure. If both nodes respond to heartbeat messages affirmatively, then the possibility of either active or standby node failure is excluded, and the conclusion is reached that a link failure between the active and standby nodes has occurred.

A third-node approach can also be used as part of a distributed HA framework's functionality. A method requiring a third node for link failure detection is costly because of the added cost of the third node. Another alternative method [2] of detecting link failures assumes dual (or multiple) physical links between the active and standby nodes. In that solution, correlated failures (within a narrow time window) of both (or all) links are interpreted as a node failure. Here, an additional cost is borne in the form of parallel physical links.

The motivation behind this work was to find a cost-efficient approach to link failure detection, especially in small systems where no HA framework is used. The proposed ERE Method does not require any additional system component. It is assumed that a device suitable for the ERE function already exists in the system. Such a device can be, for example, a network switch that is capable of responding to ping requests (most are). Another possibility is to use any other computer node available in the system for that purpose. Regardless of the device used, the ERE does not require any additional software instance to be installed in the device. The ERE Method allows users to deploy fully functional HA software in a minimal configuration of just two nodes: active and standby. The method has been implemented and tested in real production environments.
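As an illustration of how little the ERE is required to do, the following sketch shows a minimal ERE reachability probe. It is not taken from the paper; the helper name, the addresses, and the use of the operating-system ping command are assumptions.

import subprocess

def ere_responds(ere_address: str, attempts: int = 3, timeout_s: int = 1) -> bool:
    """Illustrative sketch: return True if the ERE answers at least one ping."""
    for _ in range(attempts):
        # One ICMP echo request per attempt; -W sets the per-request
        # timeout in seconds (Linux ping syntax).
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), ere_address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
    return False

# Example (address assumed): the ERE could be the site's existing network switch.
# ere_responds("192.0.2.1")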

2 Description of the ERE Method

2.1 Architecture

The configuration of a system utilizing the ERE Method is shown in Fig. 1.

[Figure 1 shows the ERE node, the active and standby nodes with their Active Unit, Standby Unit, and HA Manager instances, a hub, the part-links, and the network access points a, b, c, and d.]

Fig. 1. Configuration of an HA system utilizing the ERE Method.

Solid-line boxes represent computer nodes or other network-addressable components. The circle element represents a physical connectivity component such as a network hub (repeater hub) or a network switch. Each of the components has a single physical network access point (a, b, c, d). The active and standby nodes are interconnected by way of two part-links (a-c and c-b) that constitute a logical HSB link (a-b), whose purpose is to perform real-time service state replication between the active and standby service units. The ERE node is connected to both the active and standby nodes by way of existing network components such as repeater hubs or network switches. The configuration in Fig. 1 is a generalization in the sense that the ERE node can be collapsed with the switch, making the link d-c nonexistent. The ERE can be implemented in such a way that it becomes a single point of failure. That, however, can be avoided by adding redundancy in the form of two ERE devices.

HA Manager is a software component responsible for both failure detection and failover mediation. The algorithms for failure detection and failover mediation are presented below.

2.2 Algorithms

HA Manager failure detection algorithm

Once the active and standby units lose their connections to their peer units, the corresponding HA Manager instances are made aware of that fact. Subsequently, the network failure resolution processing is enacted in both nodes. The algorithm is illustrated in Fig. 2. It is started by the HA Manager instances in both nodes by trying to reach the ERE (a-d, b-d) with ping requests.

[Figure 2 is a flowchart: loss of the HSB connection is detected, the ERE is pinged, and a reconnection to the peer unit is attempted; depending on the outcomes, the failure type is set to either a part-link failure or a peer unit failure, and the algorithm exits with no action or by entering failover mediation.]

Fig. 2. Flowchart of the link failure detection algorithm.

If the ERE is responsive, the peer unit is considered either failed or suffering from some other failure in its node. If the ERE is not responsive, then it is concluded that the part-link is broken. A broken part-link means that neither the HSB link can be established nor the service can be offered over the network from that node. However, at the peer node the part-link may be operational, making it possible to offer at least the service over the network. If both part-links are concluded to be broken, no service can be offered.

After the ping attempts, the HA Manager instances verify that their perception of the failure is correct by trying to re-establish the connection with the other node (a-b, b-a). If the connection can now be established, then whatever the cause of the failure was, it is assumed to have been intermittent, and no further actions are taken. Otherwise, the failover mediation algorithm is started.
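For illustration, the decision logic of Fig. 2 can be sketched roughly as follows. This is not the product implementation; the ping_ere and connect_to_peer callables and the FailureType names are assumptions.

from enum import Enum

class FailureType(Enum):
    NONE = "none"            # connection re-established; exit with no action
    PART_LINK = "part-link"  # this node's part-link is broken
    PEER_UNIT = "peer-unit"  # the peer unit (or its node) has failed

def resolve_hsb_link_loss(ping_ere, connect_to_peer) -> FailureType:
    """Run by the HA Manager on each node once the HSB connection is lost."""
    ere_reachable = ping_ere()      # a-d or b-d connectivity check
    # Whatever the ping outcome, verify the diagnosis by trying to
    # re-establish the HSB connection to the peer (a-b or b-a).
    if connect_to_peer():
        return FailureType.NONE     # failure was intermittent; no action
    if ere_reachable:
        return FailureType.PEER_UNIT   # enter failover mediation
    return FailureType.PART_LINK       # enter failover mediation

Both failure types lead to the failover mediation step described next.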

HA Manager failover mediation algorithm

After the failure type has been resolved, the failover mediation algorithm is enacted. If the node suffers from a part-link failure, this indicates that its unit cannot be reached from the network and thus cannot be used. Consequently, the unit's HA state is changed to standby. That is done to prevent a situation of two active units that are isolated from each other. The node for which the ERE is responsive (the part-link is functional) continues as the active node. If that node is a standby node, a failover is initiated by HA Manager, if necessary. The failover mediation algorithm is shown in Fig. 3.

[Figure 3 is a flowchart: on entering failover mediation, the failure type is examined; in the case of a part-link failure, an active node changes its unit mode to standby, and in the case of a peer unit failure, a standby node changes its unit mode to active (failover).]

Fig. 3. Flowchart of the failover mediation algorithm.
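Continuing the same sketch, the mediation step of Fig. 3 could look roughly as follows; again this is illustrative only, and the unit_is_active flag and set_unit_state callback are assumptions.

def mediate_failover(failure_type: FailureType, unit_is_active: bool,
                     set_unit_state) -> None:
    """Run by the HA Manager on each node after the failure type is resolved."""
    if failure_type is FailureType.PART_LINK:
        # This node cannot be reached from the network, so it must not
        # stay active; this prevents two isolated active units.
        if unit_is_active:
            set_unit_state("standby")
    elif failure_type is FailureType.PEER_UNIT:
        # The peer is considered failed; if this node is the standby,
        # it takes over (failover).
        if not unit_is_active:
            set_unit_state("active")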

2.3 Discussion

In the configuration discussed, only the hub is a single point of failure. A hub failure invalidates the whole HSB link, and the system cannot operate. If a hub fails in the configuration shown in Fig. 1, not only does the HSB link become broken, but the failure detection algorithm (Fig. 2) executed in both nodes declares both part-links failed and, consequently, the failover mediation algorithm fails to select an active unit. A remedy here would be to introduce redundant hubs, both connected to an ERE device, as shown in Fig. 4. Here, if one hub fails, one of the part-links can still be declared operational and the corresponding unit selected as the active one.

Fig. 4. Avoiding a single point of failure by using single physical part-links and redundant hubs.

Another possibility is to add a redundant hub and physical part-links, as shown in Fig. 5. Here, if one hub fails, the other one can operate. Additionally, this configuration is more failure-resilient than any of the previous ones because of the redundant HSB link. A single part-link failure will not even cause a failover, because at least one end-to-end HSB link will remain operational.

Fig. 5. Avoiding a single point of failure by using redundant physical part-links and redundant ERE devices.

A single ERE node is not a single point of failure because the ERE is not involved in the execution of the service. If the ERE node fails, that does not directly affect the execution of the service. The assumption is that the ERE node is restarted or replaced as fast as possible. If a concurrent failure happens, i.e. the active node fails during the time the ERE is inoperable, the HA Manager will not be able to execute the failover until the ERE is operational again. Thus, the system, as pictured above, can handle only single failures (be they in the ERE, in the part-links, or in the service-executing nodes). If higher failure resiliency is needed, more component redundancy can be applied.
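Where redundant ERE devices are configured, as mentioned above, the reachability probe generalizes in the obvious way: the part-link is considered operational if any reference device responds. A minimal sketch under that assumption (addresses hypothetical); the probe argument can be, for example, the ere_responds helper sketched in Section 1:

def any_ere_responds(ere_addresses, probe) -> bool:
    """Illustrative: True if at least one configured ERE device answers."""
    return any(probe(address) for address in ere_addresses)

# Example (addresses assumed): an existing switch plus a second ERE device.
# any_ere_responds(["192.0.2.1", "192.0.2.2"], ere_responds)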

2.4 Implementation and experience

The ERE Method has been implemented in the IBM soliddb HA database product [3, 4, 6], starting from version 6.1, released in June 2008. It has been well received in the field: most new soliddb HA applications utilize the ERE Method, although it is not enabled by default.

3 Conclusions

The External Reference Entity Method can be used to reduce the number of computer components in a high-availability system utilizing the hot-standby configuration. By using the method, hot-standby link failures can be detected without the need for a third active computing node. With the ERE Method, only a passive network component is needed to respond to ping-like inquiries. The corresponding failure detection algorithm is implemented in the HA Manager, a software component deployed at both the active and standby nodes. The HA Manager is also responsible for executing failovers.

References

[1] Gray, J. and Reuter, A.: Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, 1992, ISBN 1-55860-190-2.

[2] Carlson, W.C.: Failure Detection in a Symmetric System. Prior Art Database (ip.com), IP.com number IPCOM000114622D, March 29, 2005.

[3] Wolski, A. and Raatikka, V.: Performance Measurement and Tuning of Hot-Standby Databases. Proc. Third International Service Availability Symposium (ISAS 2006), May 15-16, 2006, Helsinki, Finland.

[4] IBM soliddb High Availability User Guide, Version 6.5, IBM Corporation, November 2009, available at http://www-01.ibm.com/software/data/soliddb/.

[5] SA Forum Application Interface Specification, Availability Management Framework (SAI-AIS-AMF-B.04.01), Release 6.1, February 19, 2010, available at http://www.saforum.org.

[6] Wolski, A. and Hofhauser, B.: A Self-Managing High-Availability Database: Industrial Case Study. Proc. Workshop on Self-Managing Database Systems (SMDB 2005), April 8-9, 2005, Tokyo, Japan.