AN EFFICIENT ALGORITHM IN FAULT TOLERANCE FOR ELECTING COORDINATOR IN DISTRIBUTED SYSTEMS

International Journal of Computer Engineering & Technology (IJCET)
Volume 6, Issue 11, Nov 2015, Article ID: IJCET_06_11_005
IAEME Publication

Manoj Niranjan
Rustamji Institute of Technology, BSF Academy, Tekanpur

Mahesh Motwani
Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal

Cite this Article: Manoj Niranjan and Mahesh Motwani. An Efficient Algorithm in Fault Tolerance for Electing Coordinator in Distributed Systems. International Journal of Computer Engineering and Technology, 6(11), 2015.

1. INTRODUCTION

A distributed system consists of a number of autonomous computers that communicate over a computer network to attain a common goal [15]. Distributed computing systems, like computer-based systems in general, undergo undesired changes in their internal structure or external environment during normal operation; such changes are referred to as faults [15]. A fault may be an operational fault or a design fault, and it may occur once or more than once. Fault-tolerance techniques are used to make a system tolerate such faults. Checkpointing is a fault-tolerance technique that periodically records the state of the system in stable storage. It provides fault tolerance without requiring extra effort from the programmer [1]. A process state that is saved periodically is called a checkpoint of that process [2,3]. A global state [4] [15] of a distributed system is a set of individual process states, one per process [2] [15]. Checkpointing may be either independent or coordinated. In independent checkpointing, each process takes its checkpoints independently, without any synchronization with the other processes [15] [5].
In coordinated checkpointing, the processes coordinate their checkpointing actions so that the set of local checkpoints taken is consistent [6,7,8,9]. This work proposes a new coordinated checkpointing algorithm that selects a new coordinator process whenever the existing coordinator stops working because of a failure. In this algorithm, the election of the new coordinator takes less time and fewer network messages than in existing algorithms.

2. EXISTING WORK

In existing work, the initiator communicates with the other processes to create a checkpoint. In these checkpointing protocols, the global checkpoint may become inconsistent if messages are exchanged after the initiator's checkpoint request. This is shown in Figure 1, in which m is a message sent by $P_1$ after it receives a checkpoint request from the initiator. The checkpoint becomes inconsistent if m reaches process $P_0$ before the checkpoint request does, because checkpoint $C_{0,x}$ records that message m was received from process $P_1$, while checkpoint $C_{1,x}$ states that it was not sent by $P_1$ [14] [15].

Figure 1 Message communication between $P_0$ and $P_1$ causing inconsistent checkpoint (diagram labels: Coordinator; Non-Coordinator $P_0$ with checkpoint $C_{0,x}$; Non-Coordinator $P_1$ with checkpoint $C_{1,x}$; Request for Checkpoint; message m)

In another protocol, message communication is permitted only within a fixed time interval, which reduces the communication overhead [10] [15]. The main drawback of this protocol is that it fixes a particular process as coordinator. A coordinator that stays fixed for the entire system execution increases the probability of failure [15]. In yet another protocol, the coordinator changes during system execution, which reduces the probability of coordinator failure. The disadvantage of this protocol is that message communication can happen at any time; hence the communication overhead and the output commit latency increase [11] [15].
The proposed protocol not only presents a new method for selecting a new coordinator when the coordinator fails, but also overcomes these shortfalls. It controls message communication by allowing messages only within a fixed time interval, called the smart interval. The smart interval minimizes the communication overhead [15]. If, before the process completes, a non-coordinator process has not received any system message (PREPARE CHECKPOINT or SAVE CHECKPOINT), it assumes that the coordinator is no longer working, i.e., that it has failed. In this situation, the proposed algorithm starts working to select the new initiator.

3. SYSTEM MODEL

Consider a system that consists of n processes, $P_0, P_1, \dots, P_{n-1}$. The number of processes n does not change during execution. Let the i-th checkpoint of the k-th process be denoted $CP_k^i$: the initial checkpoint is $CP_k^0$ (i=0), the first checkpoint is $CP_k^1$ (i=1), the second checkpoint is $CP_k^2$ (i=2), and so on [15]. The initial checkpoint is taken at system initialization. Each process maintains its own independent state, data structures, and computations. The processes share neither memory nor a global clock, and they communicate only by message passing. We assume that the underlying network guarantees reliable FIFO (First In First Out) delivery of messages between any pair of processes; this assumption guarantees message synchronization [15].

We use the concept of a smart interval, which is the only period in which message communication takes place. The smart interval is the time that elapses between the control message for checkpoint preparation and the control message for checkpoint taking. Any message sent within the smart interval is logged, and process execution continues. This enables the handling of lost messages [13]. The initiator process sends the control messages for checkpoint preparation and checkpoint taking to the other processes [15]. If the initiator fails at any moment, the selection of a new initiator starts. Each process has a priority, and the process with the highest priority acts as initiator. If the process with the highest priority fails, the process with the second-highest priority, i.e., (highest priority - 1), becomes the coordinator.

4. PROTOCOL DESCRIPTION

The checkpoint initiator sends a checkpoint-prepare-request-message to the other processes to initiate checkpointing. The other processes respond to the initiator by sending a reply. If the replies from all processes are received within the smart interval, the initiator sends a take-checkpoint-request-message to all processes; otherwise it sends an abort-checkpoint-request-message. The initiator prepares a global checkpoint, which is the set of the local checkpoints of all processes. The local i-th checkpoint of the k-th process is denoted $CP_k^i$. In a system of n processes, the i-th global checkpoint is the set $CP^i = \{CP_0^i, CP_1^i, \dots, CP_{n-1}^i\}$. The i-th global checkpoint $CP^i$ is said to be consistent if and only if

$$\forall j,k \in [0, n-1] : j \neq k \Rightarrow \neg\,(CP_j^i \rightarrow CP_k^i)$$

where $\rightarrow$ denotes the happened-before relation described by Lamport in [12] [15].

Here t is the maximum transmission delay for a message to reach its destination and T is the checkpointing interval. T > 3t, because the checkpoint interval T is necessarily greater than the smart interval, and the smart interval must be at least 3t long to cover the transmission delay of the three control messages (the checkpoint-prepare-request-message, the response to it, and the take-checkpoint-request-message, each of which may take up to t) and to allow computational messages to be logged [15].
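The consistency condition of Section 4 can be checked mechanically once each local checkpoint carries a causal timestamp. The sketch below is illustrative only: it assumes vector clocks as the timestamp, which the paper does not prescribe, and the names `happened_before` and `is_consistent` are hypothetical. The second test case models the Figure 1 scenario, in which $P_1$ checkpoints and then sends m, and $P_0$ receives m before taking its own checkpoint.

```python
from itertools import permutations
from typing import Sequence, Tuple

# Assumption: every local checkpoint is stamped with a vector clock,
# one integer per process. The paper itself does not use vector clocks.
VectorClock = Tuple[int, ...]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return a != b and all(x <= y for x, y in zip(a, b))


def is_consistent(global_checkpoint: Sequence[VectorClock]) -> bool:
    """A global checkpoint is consistent if no local checkpoint
    happened before another local checkpoint of the same set."""
    return not any(
        happened_before(cp_j, cp_k)
        for cp_j, cp_k in permutations(global_checkpoint, 2)
    )


# Two checkpoints taken with no message exchanged between them.
concurrent = [(1, 0), (0, 1)]

# Figure 1 scenario: P1 checkpoints at (0, 1) and sends m at (0, 2);
# P0 receives m at (1, 2) and only then checkpoints at (2, 2).
figure_1 = [(2, 2), (0, 1)]

print(is_consistent(concurrent))  # True
print(is_consistent(figure_1))    # False
```

Comparing every ordered pair costs O(n^2) clock comparisons per global checkpoint, which is acceptable for a sketch and for the small process counts considered here.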

Figure 2 Diagram showing message communication during smart interval

Now, let us define the following terms:

$t_{prep}$ = time stamp at which the initiator process sends the prepare request [15]
$t_{rec}$ = time stamp at which the prepare request is received by a process [15]
$T_{trns}$ = maximum transmission time for a message, including permissible delay (which is t) [15]
save_state($P_i$) = method that saves the current state of process $P_i$ [15]
send(), receive() = methods for sending and receiving messages, respectively [15]

5. CHECKPOINTING PROCESS

The checkpointing process starts with system initialization. The initiator starts the next checkpoint once a time interval T (chosen by the programmer) has elapsed since the previous checkpoint. The initiator process $P_i$ sends the checkpoint-prepare-request-message to all other processes at $t_{prep}$. On receiving the checkpoint-prepare-request-message, each process sends a response to the initiator and then writes a tentative checkpoint [15].

1. If responses from all processes are received within ($t_{prep} + 2 \cdot T_{trns}$), the initiator sends a take-checkpoint-request-message to all processes. Each process makes its tentative checkpoint permanent on receiving the take-checkpoint-request-message. This saves the states of all processes that make up the global checkpoint. The tentative checkpoint (prepared in response to the checkpoint-prepare-request-message) is used to recover a failed process if one or more processes fail after responding to the checkpoint-prepare-request-message [15].
2. If one or more processes do not respond to the checkpoint-prepare-request-message, the initiator sends an abort-checkpoint-request-message to all processes. Each process deletes its tentative checkpoint on receiving this message. A copy of the unacknowledged messages is logged in this case [15].
3. If a process does not receive any message (a checkpoint-prepare-request-message or a take-checkpoint-request-message) within the smart interval, it assumes that the initiator has failed. In this case, the leader election process starts.
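Cases 1 and 2 above amount to a two-phase decision at the initiator followed by a commit or discard at each participant. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function `initiator_decision`, the class `Participant`, and the representation of replies as a map from process id to arrival time are all hypothetical, and message transport, logging, and failure detection are left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Message names follow the paper; everything else here is illustrative.
TAKE = "take-checkpoint-request-message"
ABORT = "abort-checkpoint-request-message"


def initiator_decision(
    t_prep: float,
    t_trns: float,
    reply_times: Dict[int, float],
    participants: Iterable[int],
) -> str:
    """Return TAKE if every participant replied by t_prep + 2 * T_trns,
    otherwise ABORT. A missing reply counts as a late one."""
    deadline = t_prep + 2 * t_trns
    on_time = all(
        p in reply_times and reply_times[p] <= deadline for p in participants
    )
    return TAKE if on_time else ABORT


@dataclass
class Participant:
    state: str                       # stand-in for the real process state
    tentative: Optional[str] = None  # written on the prepare request
    permanent: Optional[str] = None  # last committed checkpoint

    def on_prepare(self) -> None:
        # Plays the role of save_state(P_i) for the tentative checkpoint.
        self.tentative = self.state

    def on_decision(self, message: str) -> None:
        # TAKE promotes the tentative checkpoint; ABORT just discards it.
        if message == TAKE and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None


decision = initiator_decision(0.0, 1.0, {1: 1.5, 2: 2.0}, [1, 2])
p = Participant(state="state-at-round-1")
p.on_prepare()
p.on_decision(decision)
print(decision, p.permanent)
```

Treating a missing reply the same as a late reply keeps the initiator's rule to a single comparison against the deadline, which matches case 2 of the description.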

6. LEADER-ELECTION

As soon as a process learns that the initiator has failed, it starts the election of a new initiator. Each process knows the priority numbers of all the other processes. When the initiator fails, the process with the next priority below the highest, i.e., (Highest Priority-1), becomes the initiator. The process that detected the failure sends a message about electing a new initiator to the process with the second-highest priority, i.e., the next initiator. On receiving this message, the new initiator sends a message to all the remaining processes announcing that it is the new initiator. Existing protocols, such as the Bully algorithm and the algorithm presented by Basu [15], take more time than the presented algorithm. The network overhead of the existing algorithms is also higher than that of the presented algorithm.

7. LEADER ELECTION ALGORITHM

Step-I (executed by any non-initiator process)
    Start smart interval
    If checkpoint-prepare-request-message received from initiator Then
        Prepare the checkpoint accordingly and Exit
    Else if no-message-received AND smart-interval-ended Then
        Send message NEW-LEADER to process with PRIORITY = (HIGHEST PRIORITY-1)
        Update the initiator priority, i.e., HIGHEST PRIORITY = HIGHEST PRIORITY-1

Step-II (executed at the process with PRIORITY = (HIGHEST PRIORITY-1))
    Receive message NEW-LEADER from any process
    Update initiator priority = myself
    Send message to all remaining processes with HIGHEST PRIORITY = HIGHEST PRIORITY-1

8. PERFORMANCE RESULTS

The proposed algorithm was simulated in a Microsoft Windows environment using the JPVM library. The results show that the leader election time of the proposed protocol is lower than that of the existing protocols. This time difference is shown in Table 1.

Table 1 Result for proposed algorithm (columns: Test Case, Existing Algorithm, New Algorithm, Difference)
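The two election steps of Section 7 can be sketched as a small message-counting simulation. This is an illustration under assumptions rather than the simulated JPVM code: the function name `elect_new_initiator` and the announcement label `I-AM-INITIATOR` are hypothetical, and the new initiator is chosen as the surviving process with the highest priority, which coincides with (HIGHEST PRIORITY-1) only when priorities are consecutive integers and the failed process held the highest one.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, label)


def elect_new_initiator(
    priorities: Dict[str, int], failed: str, detector: str
) -> Tuple[str, List[Message]]:
    """Return the new initiator and the messages exchanged to install it.

    `priorities` maps each process name to its priority, `failed` is the
    crashed initiator, and `detector` is the process whose smart interval
    ended without any control message.
    """
    alive = {p: prio for p, prio in priorities.items() if p != failed}
    new_initiator = max(alive, key=alive.get)

    messages: List[Message] = []
    # Step-I: the detector notifies the next-highest-priority process.
    if detector != new_initiator:
        messages.append((detector, new_initiator, "NEW-LEADER"))
    # Step-II: the new initiator announces itself to all remaining processes.
    for p in alive:
        if p != new_initiator:
            messages.append((new_initiator, p, "I-AM-INITIATOR"))
    return new_initiator, messages


priorities = {"P0": 1, "P1": 2, "P2": 3, "P3": 4}
leader, msgs = elect_new_initiator(priorities, failed="P3", detector="P0")
print(leader, len(msgs))  # P2 3
```

With n processes this sketch sends at most n-1 messages per election: one notification plus n-2 announcements.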


[Chart comparing the New Algorithm with the Existing Algorithm; axis scaled in thousands]

9. CONCLUSION

The results above show that the new algorithm takes less time than the existing algorithms to elect a new coordinator while tolerating faults. The smart interval reduces the message overhead because message communication is not allowed outside the smart interval.

REFERENCES

[1] Partha Sarathi Mandal, Checkpointing and Self-Stabilization for Fault-Tolerance in Distributed Systems, Ph.D. Thesis (2006)
[2] D. Manivannan, R.H.B. Netzer & M. Singhal, Finding Consistent Global Checkpoints in a Distributed Computation, IEEE Trans. on Parallel & Distributed Systems, Vol. 8, No. 6 (June 1997)
[3] D. Manivannan, Quasi-Synchronous Checkpointing: Models, Characterization, and Classification, IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 7 (July 1999)
[4] J. Tsai & S. Kuo, Theoretical Analysis for Communication-Induced Checkpointing Protocols with Rollback-Dependency Trackability, IEEE Trans. on Parallel & Distributed Systems, Vol. 9, No. 10 (October 1998)
[5] B. Bhargava and S.R. Lian, Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems - An Optimistic Approach, Proceedings of IEEE Symposium on Reliable Distributed Systems (1988)
[6] Jiannong Cao, Weijia Jia, Xiaohua Jia, and To-Yat Cheung, Design and Analysis of an Efficient Algorithm for Coordinated Checkpointing in Distributed Systems, Proc. of Advances in Parallel and Distributed Computing (March 1997)
[7] Guohong Cao and Mukesh Singhal, On Coordinated Checkpointing in Distributed Systems, IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 12 (Dec. 1998)
[8] D.D. Sharma and D.K. Pradhan, An Efficient Coordinated Checkpointing Scheme for Multicomputers, Proc. IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems (June 1994)
[9] E.N. Elnozahy, D.B. Johnson, and W. Zwaenepoel, The Performance of Consistent Checkpointing, Proc. 11th Symp. Reliable Distributed Systems (Oct. 1992)
[10] Ch. D.V. Subba Rao and M.M. Naidu, A New, Efficient Coordinated Checkpointing Protocol Combined with Selective Sender-Based Message Logging, IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2008 (2008)
[11] Sarmistha Neogy, Anupam Sinha, Pradip K Das, CCUML: A Checkpointing Protocol for Distributed System Processes, IEEE Transactions on TENCON 2004, IEEE Region 10 Conference, Volume B, Nov. 2004 (2004)
[12] K.M. Chandy & L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems, ACM Trans. on Computer Systems, Vol. 3, Feb 1985 (1985)
[13] Ch. D.V. Subba Rao and M.M. Naidu, A Survey of Error Recovery Techniques in Distributed Systems, Proc. 28th Annual Convention and Exhibition of IEEE India Council (December 2002)
[14] E.N. (Mootaz) Elnozahy, Lorenzo Alvisi, Yi-Min Wang and David B. Johnson, A Survey of Rollback-Recovery Protocols in Message-Passing Systems, ACM Computing Surveys (CSUR), Volume 34, Issue 3 (September 2002)
[15] Jagdish Makhijani, Manoj Kumar Niranjan, Mahesh Motwani, A K Sachan, Anil Rajput, An Efficient Protocol using Smart Interval using Coordinated Checkpointing, Communications in Computer and Information Science, 2011
[16] Partha Das and Sushabhan Biswas, Fault Tolerance and Power Quality Study of DFIG Based Wind Turbine System, International Journal of Electrical Engineering & Technology, Volume 5, Issue 5, 2014

MESSAGE INDUCED SOFT CHEKPOINTING FOR RECOVERY IN MOBILE ENVIRONMENTS

MESSAGE INDUCED SOFT CHEKPOINTING FOR RECOVERY IN MOBILE ENVIRONMENTS MESSAGE INDUCED SOFT CHEKPOINTING FOR RECOVERY IN MOBILE ENVIRONMENTS Ruchi Tuli 1 & Parveen Kumar 2 1 Research Scholar, Singhania University, Pacheri Bari (Rajasthan) India 2 Professor, Meerut Institute

More information

CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR

CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR CHECKPOINTING WITH MINIMAL RECOVERY IN ADHOCNET BASED TMR Sarmistha Neogy Department of Computer Science & Engineering, Jadavpur University, India Abstract: This paper describes two-fold approach towards

More information

A Token Ring Minimum Process Checkpointing Algorithm for Distributed Mobile Computing System

A Token Ring Minimum Process Checkpointing Algorithm for Distributed Mobile Computing System 162 A Token Ring Minimum Process Checkpointing Algorithm for Distributed Mobile Computing System P. Kanmani, Dr. R. Anitha, and R. Ganesan Research Scholar, Mother Teresa Women s University, kodaikanal,

More information

A Low-Overhead Minimum Process Coordinated Checkpointing Algorithm for Mobile Distributed System

A Low-Overhead Minimum Process Coordinated Checkpointing Algorithm for Mobile Distributed System A Low-Overhead Minimum Process Coordinated Checkpointing Algorithm for Mobile Distributed System Parveen Kumar 1, Poonam Gahlan 2 1 Department of Computer Science & Engineering Meerut Institute of Engineering

More information

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax:

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax: Consistent Logical Checkpointing Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 hone: 409-845-0512 Fax: 409-847-8578 E-mail: vaidya@cs.tamu.edu Technical

More information

Concurrent checkpoint initiation and recovery algorithms on asynchronous ring networks

Concurrent checkpoint initiation and recovery algorithms on asynchronous ring networks J. Parallel Distrib. Comput. 64 (4) 649 661 Concurrent checkpoint initiation and recovery algorithms on asynchronous ring networks Partha Sarathi Mandal and Krishnu Mukhopadhyaya* Advanced Computing and

More information

Checkpointing and Rollback Recovery in Distributed Systems: Existing Solutions, Open Issues and Proposed Solutions

Checkpointing and Rollback Recovery in Distributed Systems: Existing Solutions, Open Issues and Proposed Solutions Checkpointing and Rollback Recovery in Distributed Systems: Existing Solutions, Open Issues and Proposed Solutions D. Manivannan Department of Computer Science University of Kentucky Lexington, KY 40506

More information

Enhanced N+1 Parity Scheme combined with Message Logging

Enhanced N+1 Parity Scheme combined with Message Logging IMECS 008, 19-1 March, 008, Hong Kong Enhanced N+1 Parity Scheme combined with Message Logging Ch.D.V. Subba Rao and M.M. Naidu Abstract Checkpointing schemes facilitate fault recovery in distributed systems.

More information

An Efficient Approach of Election Algorithm in Distributed Systems

An Efficient Approach of Election Algorithm in Distributed Systems An Efficient Approach of Election Algorithm in Distributed Systems SANDIPAN BASU Post graduate Department of Computer Science, St. Xavier s College, 30 Park Street (30 Mother Teresa Sarani), Kolkata 700016,

More information

A SURVEY AND PERFORMANCE ANALYSIS OF CHECKPOINTING AND RECOVERY SCHEMES FOR MOBILE COMPUTING SYSTEMS

A SURVEY AND PERFORMANCE ANALYSIS OF CHECKPOINTING AND RECOVERY SCHEMES FOR MOBILE COMPUTING SYSTEMS International Journal of Computer Science and Communication Vol. 2, No. 1, January-June 2011, pp. 89-95 A SURVEY AND PERFORMANCE ANALYSIS OF CHECKPOINTING AND RECOVERY SCHEMES FOR MOBILE COMPUTING SYSTEMS

More information

A Review of Checkpointing Fault Tolerance Techniques in Distributed Mobile Systems

A Review of Checkpointing Fault Tolerance Techniques in Distributed Mobile Systems A Review of Checkpointing Fault Tolerance Techniques in Distributed Mobile Systems Rachit Garg 1, Praveen Kumar 2 1 Singhania University, Department of Computer Science & Engineering, Pacheri Bari (Rajasthan),

More information

A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed System

A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed System 2682 A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed System Sudha Department of Computer Science, Amity University Haryana, India Email: sudhayadav.91@gmail.com Nisha Department

More information

Kevin Skadron. 18 April Abstract. higher rate of failure requires eective fault-tolerance. Asynchronous consistent checkpointing oers a

Kevin Skadron. 18 April Abstract. higher rate of failure requires eective fault-tolerance. Asynchronous consistent checkpointing oers a Asynchronous Checkpointing for PVM Requires Message-Logging Kevin Skadron 18 April 1994 Abstract Distributed computing using networked workstations oers cost-ecient parallel computing, but the higher rate

More information

Novel low-overhead roll-forward recovery scheme for distributed systems

Novel low-overhead roll-forward recovery scheme for distributed systems Novel low-overhead roll-forward recovery scheme for distributed systems B. Gupta, S. Rahimi and Z. Liu Abstract: An efficient roll-forward checkpointing/recovery scheme for distributed systems has been

More information

Time Synchronous Adaptive Rollback Recovery Protocol for Mobile Distributed Systems

Time Synchronous Adaptive Rollback Recovery Protocol for Mobile Distributed Systems Time Synchronous Adaptive Rollback Recovery Protocol for Mobile Distributed Systems Monika Nagpal 1, Parveen Kumar 2, Surender Jangra 3 1 Research Scholar, Deptt. of CSE, Singhania University Pacheri Bari

More information

Election Administration Algorithm for Distributed Computing

Election Administration Algorithm for Distributed Computing I J E E E C International Journal of Electrical, Electronics and Computer Engineering 1(2): 1-6(2012) Election Administration Algorithm for Distributed Computing SK Gandhi* and Pawan Kumar Thakur* **Department

More information

A Survey of Rollback-Recovery Protocols in Message-Passing Systems

A Survey of Rollback-Recovery Protocols in Message-Passing Systems A Survey of Rollback-Recovery Protocols in Message-Passing Systems Mootaz Elnozahy * Lorenzo Alvisi Yi-Min Wang David B. Johnson June 1999 CMU-CS-99-148 (A revision of CMU-CS-96-181) School of Computer

More information

International Journal of Distributed and Parallel systems (IJDPS) Vol.1, No.1, September

International Journal of Distributed and Parallel systems (IJDPS) Vol.1, No.1, September DESIGN AND PERFORMANCE ANALYSIS OF COORDINATED CHECKPOINTING ALGORITHMS FOR DISTRIBUTED MOBILE SYSTEMS Surender Kumar 1,R.K. Chauhan 2 and Parveen Kumar 3 1 Deptt. of I.T, Haryana College of Tech. & Mgmt.

More information

Fault-Tolerant Computer Systems ECE 60872/CS Recovery

Fault-Tolerant Computer Systems ECE 60872/CS Recovery Fault-Tolerant Computer Systems ECE 60872/CS 59000 Recovery Saurabh Bagchi School of Electrical & Computer Engineering Purdue University Slides based on ECE442 at the University of Illinois taught by Profs.

More information

Design of High Performance Distributed Snapshot/Recovery Algorithms for Ring Networks

Design of High Performance Distributed Snapshot/Recovery Algorithms for Ring Networks Southern Illinois University Carbondale OpenSIUC Publications Department of Computer Science 2008 Design of High Performance Distributed Snapshot/Recovery Algorithms for Ring Networks Bidyut Gupta Southern

More information

Study of various Election algorithms on the basis of messagepassing

Study of various Election algorithms on the basis of messagepassing IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727Volume 8, Issue 1 (Nov. - Dec. 2012), PP 23-27 Study of various Election algorithms on the basis of messagepassing approach

More information

Some Thoughts on Distributed Recovery. (preliminary version) Nitin H. Vaidya. Texas A&M University. Phone:

Some Thoughts on Distributed Recovery. (preliminary version) Nitin H. Vaidya. Texas A&M University. Phone: Some Thoughts on Distributed Recovery (preliminary version) Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 Phone: 409-845-0512 Fax: 409-847-8578 E-mail:

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 9, September 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Backup Two

More information

Novel Log Management for Sender-based Message Logging

Novel Log Management for Sender-based Message Logging Novel Log Management for Sender-based Message Logging JINHO AHN College of Natural Sciences, Kyonggi University Department of Computer Science San 94-6 Yiuidong, Yeongtonggu, Suwonsi Gyeonggido 443-760

More information

On the Relevance of Communication Costs of Rollback-Recovery Protocols

On the Relevance of Communication Costs of Rollback-Recovery Protocols On the Relevance of Communication Costs of Rollback-Recovery Protocols E.N. Elnozahy June 1995 CMU-CS-95-167 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 To appear in the

More information

An Analysis and Improvement of Probe-Based Algorithm for Distributed Deadlock Detection

An Analysis and Improvement of Probe-Based Algorithm for Distributed Deadlock Detection An Analysis and Improvement of Probe-Based Algorithm for Distributed Deadlock Detection Kunal Chakma, Anupam Jamatia, and Tribid Debbarma Abstract In this paper we have performed an analysis of existing

More information

A NON-BLOCKING MINIMUM-PROCESS CHECKPOINTING PROTOCOL FOR DETERMINISTIC MOBILE COMPUTING SYSTEMS

A NON-BLOCKING MINIMUM-PROCESS CHECKPOINTING PROTOCOL FOR DETERMINISTIC MOBILE COMPUTING SYSTEMS A NON-BLOCKING MINIMUM-PROCESS CHECKPOINTING PROTOCOL FOR DETERMINISTIC MOBILE COMPUTING SYSTEMS 1 Ajay Khunteta, 2 Praveen Kumar 1,Singhania University, Pacheri, Rajasthan, India-313001 Email: ajay_khunteta@rediffmail.com

More information

Distributed Fault-Tolerant Channel Allocation for Cellular Networks

Distributed Fault-Tolerant Channel Allocation for Cellular Networks 1326 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 18, NO. 7, JULY 2000 Distributed Fault-Tolerant Channel Allocation for Cellular Networks Guohong Cao, Associate Member, IEEE, and Mukesh Singhal,

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

Analysis of Distributed Snapshot Algorithms

Analysis of Distributed Snapshot Algorithms Analysis of Distributed Snapshot Algorithms arxiv:1601.08039v1 [cs.dc] 29 Jan 2016 Sharath Srivatsa sharath.srivatsa@iiitb.org September 15, 2018 Abstract Many problems in distributed systems can be cast

More information

Page 1 FAULT TOLERANT SYSTEMS. Coordinated Checkpointing. Time-Based Synchronization. A Coordinated Checkpointing Algorithm

Page 1 FAULT TOLERANT SYSTEMS. Coordinated Checkpointing. Time-Based Synchronization. A Coordinated Checkpointing Algorithm FAULT TOLERANT SYSTEMS Coordinated http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Chapter 6 II Uncoordinated checkpointing may lead to domino effect or to livelock Example: l P wants to take a

More information

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme On Checkpoint Latency Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 E-mail: vaidya@cs.tamu.edu Web: http://www.cs.tamu.edu/faculty/vaidya/ Abstract

More information

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University Fault Tolerance Part II CS403/534 Distributed Systems Erkay Savas Sabanci University 1 Reliable Group Communication Reliable multicasting: A message that is sent to a process group should be delivered

More information

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers Clock Synchronization Synchronization Tanenbaum Chapter 6 plus additional papers Fig 6-1. In a distributed system, each machine has its own clock. When this is the case, an event that occurred after another

More information

Optimistic Message Logging for Independent Checkpointing. in Message-Passing Systems. Yi-Min Wang and W. Kent Fuchs. Coordinated Science Laboratory

Optimistic Message Logging for Independent Checkpointing. in Message-Passing Systems. Yi-Min Wang and W. Kent Fuchs. Coordinated Science Laboratory Optimistic Message Logging for Independent Checkpointing in Message-Passing Systems Yi-Min Wang and W. Kent Fuchs Coordinated Science Laboratory University of Illinois at Urbana-Champaign Abstract Message-passing

More information
