Deadlocks in Distributed Systems: Request Models and Definitions

Jerzy Brzezinski
Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, POLAND
brzezinski@pozn1v.tup.edu.pl
The work of this author was supported in part by an INRIA grant (when he was visiting IRISA), and by CRIT Project III/92 IC 1010.

Jean-Michel Helary, Michel Raynal
IRISA, Campus de Beaulieu, 35042 Rennes Cedex, FRANCE
helary, raynal@irisa.fr
The work of these authors has been partly supported by the Commission of the European Communities under ESPRIT Program Basic Research Project 6360 (BROADCAST).

Abstract

This paper deals with the problem of deadlock detection in asynchronous message communication systems. The considered system model covers unspecified receptions, non-FIFO channels, and general resource (message) requests including, among others, AND, OR,

1 Introduction

Asynchronous systems are characterized by the absence of a known bound on relative processor speeds or message transfer times. Asynchrony, a characteristic of most real systems, makes coordination between distributed processors difficult. Even classical problems such as mutual exclusion, termination detection, deadlock handling, determination of global state, election, consensus, transaction control, and query optimization have to be addressed again to develop new control mechanisms, as only distributed algorithms are acceptable in this context. These algorithms are composed of processes which are executed at system nodes and exchange information with each other by message passing. Efficient distributed control algorithms are necessary to realize the potential benefits of distributed systems.

This paper focuses on the deadlock problem in distributed systems. Informally, deadlock refers to a situation in which there exists a group of waiting processes such that no process in this group can send a message (release a resource) until it receives the required message (resource) from other processes in the group. When this occurs, all these processes wait permanently and the progress of their execution is halted. In this case, the execution of processes can turn out to be completely useless unless proper and careful control is exercised.

To handle deadlocks in distributed systems, one can try to adapt approaches known from centralized systems, i.e., prevention, avoidance, and detection with recovery ([9], [17]). Recall that deadlock prevention is based on violating one of the necessary conditions for deadlock occurrence (e.g., circular wait, no preemption, or hold and wait). In deadlock avoidance, a message is sent (i.e., a resource is granted) only when, in spite of this event, there is still at least one execution sequence that allows completion of all processes. In deadlock detection, messages are sent (i.e., resources are granted) without any constraints. However, the state of the system is checked periodically, or when a deadlock is suspected, to determine if a set of processes is deadlocked. This checking is performed by a deadlock detection algorithm. If a deadlock is discovered, recovery from it is done by aborting one or more deadlocked processes. The suitability of a deadlock handling approach greatly depends on the application and environment.

In distributed environments, deadlock handling is particularly complex, as distributed algorithms are desirable and no node has accurate knowledge of the system state [24]. Moreover, depending on the application, processes can make requests according to different request models (see e.g., [1], [12], [19]).

The simplest possible request model is one in which a process can require at most one message (resource) at a time. In the AND model (also known as the resource model), processes are permitted to request a set of messages simultaneously. A process cannot execute until it acquires all messages (resources) for which it is waiting ([7], [9], [13]). This model represents, for instance, possible requests of transactions to lock several data items ([1]).

Another model of requests is the OR model (also called the communication model), where a process requests messages from a set of processes and can proceed only after it has received a message from any one of the processes it is waiting for. Examples of this model include alternative control structures in programming languages like CSP and ADA, as well as replicated database systems, where a read request for a replicated data item can be satisfied by reading any copy of it ([1], [9]).

The OR-AND model is a generalization of the previous two models. It allows a process to specify message (resource) requests as a predicate expressed by any combination of the logical operators and and or. For example, a process may require messages from $i$ and from $j$ or $k$. If the process receives a message from $i$, it continues waiting for messages from $j$ or $k$. On the other hand, if the process first receives a message from $j$, it need not wait for $k$. Thus, the request for $k$ can be canceled. This model is well suited to many distributed systems (e.g., distributed operating systems, replicated database systems) where several sets of equivalent resources are available ([1]).

In the k out of n model, processes are permitted to request any $k$ messages from a set of $n$ processes ([3]). This model is also a generalization of the OR and AND models, since an OR request corresponds to 1 out of n, and an AND request corresponds to n out of n. An example of the k out of n model is replicated database systems, where a quorum-based replica control algorithm is used to preserve database consistency ([1], [12]). In such an algorithm, a transaction that wants to read a replicated data item must read $r$ copies out of $n$, and in order to write a replicated data item, a transaction must write $w$ copies out of $n$, where $r + w > n$ and $2w > n$.

The above request models classify distributed deadlock detection algorithms according to the complexity of the resource requests they permit.
Several algorithms have been proposed for the one-resource and AND models ([2], [7], [9], [13], [14], [18], [22]) as well as for the OR model ([9], [23]). A distributed deadlock detection algorithm for the OR-AND model has been developed by Hermann and Chandy (described by Knapp in [19]), and for the k out of n model by Bracha and Toueg ([3]). Many of the above-mentioned solutions have been presented, carefully analyzed and compared in two surveys by Knapp ([19]) and Singhal ([24]).

This paper first presents a hierarchy of the deadlock models considered so far, and then abstracts away their differences to define a single, general deadlock model. This general model has the same modeling power as the OR-AND model; however, it has much more concise expressive power. Further, the general deadlock model is used to introduce an abstract formulation of deadlock detection problems, which defines deadlocks in distributed systems independently of the underlying request model. Therefore, processes in an application can use different, even changing, request models. This formulation constitutes a basis for designing distributed algorithms which uniformly address deadlock problems in the context of various request models. To illustrate the idea, a simple generalized deadlock detection algorithm is specified.

The paper is divided into five main sections. Section 2 presents the model of underlying computations. Section 3 introduces the request models. Section 4 proposes deadlock definitions suited to the previous request models, and Section 5 defines a hierarchy of problems of distributed deadlock detection. Finally, Section 6 expands previous definitions to include termination detection problems.

2 Model of Parallel and Distributed Computations

2.1 The Underlying System Model

The underlying system, supporting distributed applications, consists of a set of nodes that are connected by communication channels. The nodes do not share a common memory and communicate only by exchanging messages through communication channels. These channels are assumed to be asynchronous (transfer delays are unpredictable) and reliable (no message is lost, corrupted or duplicated); they need not be FIFO and they have infinite buffer capacity. Nodes are assumed to be free of failures. Moreover, there is no global physical clock in the system that is instantaneously accessible to the nodes.

Each message sent by a process running on a node is handed over to the underlying system. This system delivers the message to the destination node and puts it in a local buffer of that node (the message is then said to have arrived). This message can then be extracted from the buffer when the application process requires it (the message is then said to have been consumed).

2.2 Application Programs

An application program is composed of a finite set of processes that communicate by asynchronous message passing. We assume that there is a one-to-one correspondence between nodes and processes, and that the assignment is static. As communication is asynchronous, the process sending a message does not wait for the delivery of the message to be complete.

At any time, a process is either active or passive. Only active processes can send and consume messages. An active process can spontaneously become passive, requiring some messages (resources) in order to become active again. Associated with a passive process is its dependent set, the set of processes from which it is expecting to receive messages; by definition, a process is not a member of its own dependent set. A passive process can only become active when its activation condition, defined over its dependent set, is fulfilled (see Section 3.6 for details on activation conditions). Moreover, when a process is activated, the messages whose arrivals fulfilled the associated activation condition are extracted from the input buffers and consumed. The formulation of the activation condition depends on the request model. The next section presents such models in order of increasing complexity.

Let us remark that the model also covers unspecified receptions, which occur when a message sent by a process P has arrived but will never be consumed by the destination process Q (this is the case when P never appears in the dependent set of Q).

A passive process that has terminated its computation (for example, by executing an end or stop statement) is said to have been individually terminated: its dependent set is empty and therefore it can never be activated. Initially, for simplicity of presentation, we assume that processes are never individually terminated; this assumption will be relaxed in Section 6.1.

3 Request Models

3.1 The AND Model

In the AND model, a passive process becomes active (i.e., its activation condition is fulfilled) only after a message from each process in its dependent set has arrived. (This models a receive statement that waits for all requested messages.)

3.2 The OR Model

In the OR model, a passive process becomes active only after a message from any process in its dependent set has arrived. (This models classical nondeterministic choices of receive statements, e.g., CSP's or ADA's alternative command.)
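
As a minimal illustration of the AND and OR activation rules, the following Python sketch checks whether a passive process may become active, given the set of processes whose messages are currently buffered. The function names and the set-based representation are assumptions made for this example; they are not notation from the paper.

```python
def and_activated(dependent_set, arrived_from):
    """AND model: a message from every process in the dependent set has arrived."""
    # A passive process is assumed to have a nonempty dependent set.
    return bool(dependent_set) and dependent_set <= arrived_from


def or_activated(dependent_set, arrived_from):
    """OR model: a message from at least one process in the dependent set has arrived."""
    return bool(dependent_set & arrived_from)


# Hypothetical state: the process waits on P1 and P2, while messages
# from P2 and P9 sit in its local buffer (it is not waiting for P9).
dependent_set = {"P1", "P2"}
arrived_from = {"P2", "P9"}

print(and_activated(dependent_set, arrived_from))  # False: nothing from P1 yet
print(or_activated(dependent_set, arrived_from))   # True: the message from P2 suffices
```

The buffered message from P9 plays no role in either check, since only arrivals from members of the dependent set can contribute to activation.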
3.3 The Basic k out of n Model

In the basic k out of n model, associated with a passive process are its dependent set $DS$ and an integer $k$ such that $1 \le k \le |DS|$. The process becomes active only after $k$ messages, each from a distinct process belonging to $DS$, have arrived. Obviously, this model includes the AND model (when $k = |DS|$) and the OR model (when $k = 1$), and this inclusion is strict.

3.4 The OR-AND Model

Let $P$ denote the set of processes. In the OR-AND model, the dependent set of a passive process is defined as $DS_1 \cup DS_2 \cup \ldots \cup DS_q$, where $DS_i \subseteq P$ for all $i$. The process becomes active only after a message from every process in $DS_1$, or a message from every process in $DS_2$, or ..., or a message from every process in $DS_q$ has arrived.

For example, suppose a process's activation condition is: $P_a$ or ($P_b$ and ($P_c$ or ($P_d$ and $P_e$))), whose disjunctive form is: $P_a$ or ($P_b$ and $P_c$) or ($P_b$ and $P_d$ and $P_e$). In this case $DS_1 = \{P_a\}$, $DS_2 = \{P_b, P_c\}$, and $DS_3 = \{P_b, P_d, P_e\}$. If messages from only $P_c$ and $P_d$ have arrived, the activation condition of the process is not fulfilled at this point. If a message from $P_b$ arrives next, the activation condition of the process is fulfilled and this activation will result in the consumption of the messages from $P_b$ and $P_c$.

The OR-AND model includes the basic k out of n model: this is the case when $DS_1, DS_2, \ldots, DS_q$ are all the subsets of $DS$ with cardinality $k$. The inclusion is strict.

3.5 The Disjunctive k out of n Model

In the disjunctive k out of n request model, an activation condition consists of several k out of n requests. Associated with a passive process are a dependent set $DS = DS_1 \cup DS_2 \cup \ldots \cup DS_q$ and $q$ integers $k_1, k_2, \ldots, k_q$, where $DS_i \subseteq P$ and $1 \le k_i \le |DS_i|$ for all $i$. The process becomes active only after $k_1$ messages, each from a distinct process in $DS_1$, or $k_2$ messages, each from a distinct process in $DS_2$, or ..., or $k_q$ messages, each from a distinct process in $DS_q$, have arrived.

Note that the disjunctive k out of n model includes the OR-AND model: this is the case when $k_i = |DS_i|$ for all $i$. Consequently, it includes all other models: the basic k out of n model is obtained when $q = 1$, the AND model when $q = 1$ and $k = |DS|$, and finally the OR model when $|DS_i| = 1$ for all $i$.

Conversely, the OR-AND model includes the disjunctive k out of n model. The disjunctive k out of n condition with $DS_1, DS_2, \ldots, DS_q$ and $k_1, k_2, \ldots, k_q$ can be converted into its OR-AND activation condition as follows:

(disjunct of all the subsets of $DS_1$ of cardinality $k_1$) or
(disjunct of all the subsets of $DS_2$ of cardinality $k_2$) or
...
(disjunct of all the subsets of $DS_q$ of cardinality $k_q$)
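
The conversion above can be sketched in a few lines of Python. This is an illustrative sketch only: the representation of a request as a list of `(DS_r, k_r)` pairs and all function names are assumptions made for the example. It enumerates the OR-AND disjuncts, counts them, and checks by brute force that both formulations agree on every possible set of arrivals.

```python
from itertools import chain, combinations
from math import comb


def to_or_and(requests):
    """Expand each (DS_r, k_r) request into all subsets of DS_r of cardinality k_r."""
    disjuncts = []
    for dependent_subset, k in requests:
        for members in combinations(sorted(dependent_subset), k):
            disjuncts.append(frozenset(members))
    return disjuncts


def k_out_of_n_holds(requests, arrived_from):
    """Disjunctive k out of n: some DS_r has at least k_r members that have sent."""
    return any(len(ds & arrived_from) >= k for ds, k in requests)


def or_and_holds(disjuncts, arrived_from):
    """OR-AND: every process of at least one disjunct has sent."""
    return any(conjunction <= arrived_from for conjunction in disjuncts)


# Hypothetical request: 2 out of {P1, P2, P3}, or 1 out of {P4, P5}.
requests = [({"P1", "P2", "P3"}, 2), ({"P4", "P5"}, 1)]
disjuncts = to_or_and(requests)

# Compare both formulations on every subset of the processes involved.
everyone = sorted(set().union(*(ds for ds, _ in requests)))
all_arrival_sets = chain.from_iterable(
    combinations(everyone, size) for size in range(len(everyone) + 1)
)
equivalent = all(
    k_out_of_n_holds(requests, set(a)) == or_and_holds(disjuncts, set(a))
    for a in all_arrival_sets
)

print(len(disjuncts), sum(comb(len(ds), k) for ds, k in requests))  # 5 5
print(equivalent)  # True
```

Even in this small case the OR-AND form needs five explicit subsets where the disjunctive k out of n form needs two sets and two integers; the gap grows combinatorially with the size of the dependent sets.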

Thus, the modeling powers of the OR-AND and disjunctive k out of n models are identical. However, the disjunctive k out of n model has much more concise expressive power: it is capable of expressing an activation condition in a much more compact form. The formulation in the disjunctive k out of n model requires $q$ subsets $DS_1, DS_2, \ldots, DS_q$ and $q$ integers $k_1, k_2, \ldots, k_q$, whereas the formulation of the same condition in the OR-AND model requires explicit enumeration of

$$\binom{|DS_1|}{k_1} + \binom{|DS_2|}{k_2} + \cdots + \binom{|DS_q|}{k_q}$$

subsets of processes, where $\binom{n}{j} = \frac{n!}{j!\,(n-j)!}$ denotes the number of subsets of cardinality $j$ in a set of cardinality $n$. Therefore, the disjunctive k out of n model is much more convenient and is amenable to a more efficient implementation. This is because a compact representation of an activation condition not only saves storage, but also considerably reduces the time taken to check whether the condition is satisfied.

3.6 Predicate fulfilled

In order to abstract the activation condition of a passive process P with dependent set $DS$, a predicate called $\mathit{fulfilled}(A)$ is introduced, where $A$ is a subset of the set of all processes. Predicate $\mathit{fulfilled}(A)$ is true if the messages arrived from all processes belonging to set $A$ are sufficient to activate process P (since P is passive, arrived messages are not consumed). Moreover, $\mathit{fulfilled}(\emptyset)$ is false, and for $DS \neq \emptyset$, $\mathit{fulfilled}(DS)$ is true. Of course, the following monotonicity property holds: if $X \subseteq Y$ and $\mathit{fulfilled}(X)$ is true, then $\mathit{fulfilled}(Y)$ is also true.

Consider the disjunctive k out of n model in which the requirements of a passive process P are defined in terms of sets $DS_1, DS_2, \ldots, DS_q$ and integers $k_1, k_2, \ldots, k_q$. The predicate fulfilled for P is expressed as follows:

$$\mathit{fulfilled}(A) \equiv (\exists j :: 1 \le j \le q :: |DS_j \cap A| \ge k_j)$$

Definitions of fulfilled for a process in other models can be obtained as special cases of the above in the following manner:

basic k out of n: $q = 1$ and thus $\mathit{fulfilled}(A) \equiv |DS \cap A| \ge k$
OR-AND: $(\forall j :: k_j = |DS_j|)$ and thus $\mathit{fulfilled}(A) \equiv (\exists j :: 1 \le j \le q :: DS_j \subseteq A)$
OR: $(\forall j :: |DS_j| = 1)$ and thus $\mathit{fulfilled}(A) \equiv DS \cap A \neq \emptyset$, where $DS = DS_1 \cup DS_2 \cup \ldots \cup DS_q$
AND: $q = 1$ and $k = |DS|$ and thus $\mathit{fulfilled}(A) \equiv DS \subseteq A$

It is easy to verify that in each case, $\mathit{fulfilled}(DS)$ is true.

4 Deadlock Definitions

We now give precise definitions of a deadlock for these request models. $P$ denotes the set of processes $\{P_1, P_2, \ldots, P_n\}$. For the sake of convenience, the quantification $(\forall P_i :: P_i \in B :: \ldots)$ means exactly $(\forall i :: (1 \le i \le n) \wedge (P_i \in B) :: \ldots)$.

Let $\mathit{deadlock}(B)$ be a predicate signifying that the nonempty set $B$ of processes is deadlocked. If $\mathit{deadlock}(B)$ is true at some instant during the execution of an application program, it remains true, as deadlock occurrence is persistent: $\mathit{deadlock}(B)$ is a stable property ([6], [8], [15]). Depending on the request model, the formal definition of this predicate can take several forms.

In order to give precise definitions of $\mathit{deadlock}(B)$ for different models, the following predicates are introduced:

$\mathit{passive}_i$: true iff $P_i$ is passive.
$\mathit{arr}_i(j)$: true iff a message from $P_j$ has arrived and has not yet been consumed by $P_i$.
$\mathit{empty}_i(j)$: true iff all messages sent by $P_j$ to $P_i$ have arrived.

Note that the condition $\mathit{empty}_i(j) \wedge \neg\mathit{arr}_i(j)$, used frequently in the following definitions, means that all messages sent by $P_j$ to $P_i$ have arrived and have been consumed.

4.1 Deadlock in the AND Model

Let $DS_i$ denote the dependent set of $P_i$.

$$(\forall P_i :: P_i \in B :: (\mathit{passive}_i \wedge (\exists P_j :: P_j \in DS_i \cap B :: (\mathit{empty}_i(j) \wedge \neg\mathit{arr}_i(j)))))$$
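
The AND-model condition of Section 4.1 can be evaluated directly on a recorded global state. The Python sketch below is only an illustration under assumed data structures: `arrived` holds pairs `(i, j)` for which arr_i(j) is true, and `in_transit` holds pairs `(i, j)` for which empty_i(j) is false. These names and this encoding are hypothetical, and the sketch says nothing about how such a state would be collected in a distributed system.

```python
def deadlock_and(B, passive, dependent, arrived, in_transit):
    """Check the AND-model deadlock condition for a candidate set B."""
    if not B:
        return False  # the predicate is defined for nonempty sets only

    def silent(i, j):
        # empty_i(j) and not arr_i(j): nothing from j is in transit to i
        # and nothing from j is waiting in the buffer of i.
        return (i, j) not in in_transit and (i, j) not in arrived

    return all(
        i in passive and any(silent(i, j) for j in dependent[i] & B)
        for i in B
    )


# Hypothetical state: P1 waits for P2 and P3, P2 waits for P1, P3 is active.
passive = {"P1", "P2"}
dependent = {"P1": {"P2", "P3"}, "P2": {"P1"}, "P3": set()}

# All channels empty: P1 and P2 block each other, whatever P3 does.
print(deadlock_and({"P1", "P2"}, passive, dependent, set(), set()))  # True

# A message from P1 to P2 is still in transit: P2 may yet be activated.
print(deadlock_and({"P1", "P2"}, passive, dependent, set(), {("P2", "P1")}))  # False
```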

4.2 Deadlock in the OR Model

$$(\forall P_i :: P_i \in B :: (\mathit{passive}_i \wedge (DS_i \subseteq B) \wedge (\forall P_j :: P_j \in DS_i :: (\mathit{empty}_i(j) \wedge \neg\mathit{arr}_i(j)))))$$

Example: Let us consider a system consisting of nodes $N_a$, $N_b$, $N_c$, $N_d$ and $N_e$, whose topology is shown in Figure 1a. The application program consists of processes $P_a$, $P_b$, $P_c$, $P_d$, and $P_e$. We analyze the system state at time $t$. At this moment processes $P_a$, $P_b$, $P_d$, and $P_e$ are passive and process $P_c$ is active. Dependent sets are as follows (see Figure 1b): $DS_a = \{P_c, P_d\}$, $DS_b = \{P_d\}$, $DS_c = \emptyset$, $DS_d = \{P_b, P_e\}$, and $DS_e = \{P_b\}$. The local buffers of nodes $N_a$, $N_c$, $N_d$, and $N_e$ are empty. In the local buffer of node $N_b$, there is a message that arrived from $P_e$ (but $P_b$ is not expecting a message from $P_e$ at this moment). Moreover, a message $m$ sent by $P_a$ to $P_b$ has not yet arrived and thus $\mathit{empty}_b(a)$ is false. Since $P_a \notin DS_b$, when message $m$ arrives at $N_b$ it will not be able to activate $P_b$ (unless $P_b$ becomes active and then passive again with a different dependent set). In this state, only process $P_a$ can be activated by the active process $P_c$. Consequently, in the OR request model, $\mathit{deadlock}(B)$ is true at time $t$ for $B = \{P_b, P_d, P_e\}$. On the other hand, if $P_a$ needs a message from both $P_c$ and $P_d$ for its activation (i.e., the AND request model), then at this time the set of deadlocked processes is equal to $\{P_a, P_b, P_d, P_e\}$.

[Figure 1: Example of a distributed computation. a) Topology of nodes $N_a$, $N_b$, $N_c$, $N_d$, $N_e$; b) dependent sets of processes $P_a$, $P_b$, $P_c$, $P_d$, $P_e$.]

4.3 Deadlock in the OR-AND Model

For each $i$, $1 \le i \le n$, let $DS_{i,r}$ ($1 \le r \le q_i$) denote the subsets of the dependent set of $P_i$.

$$(\forall P_i :: P_i \in B :: (\mathit{passive}_i \wedge (\forall r :: 1 \le r \le q_i :: (\exists P_j :: P_j \in DS_{i,r} \cap B :: (\mathit{empty}_i(j) \wedge \neg\mathit{arr}_i(j))))))$$

4.4 Deadlock in the Basic k out of n Model

Let $DS_i$ denote the dependent set of process $P_i$ and let $n_i = |DS_i|$. Clearly, $1 \le k_i \le n_i$. Note that the symbol $\setminus$ denotes the set difference operator.

$$(\forall P_i :: P_i \in B :: (\mathit{passive}_i \wedge (\exists D_i :: D_i \subseteq DS_i \cap B :: ((|DS_i \setminus D_i| < k_i) \wedge (\forall P_j :: P_j \in D_i :: (\mathit{empty}_i(j) \wedge \neg\mathit{arr}_i(j)))))))$$

That is, each process $P_i$ in $B$ can be associated with a set of processes $D_i$ such that arrivals of new messages from processes in $D_i$ are not possible, as $D_i \subseteq B$ and $(\forall P_j :: P_j \in D_i :: \mathit{empty}_i(j) \wedge \neg\mathit{arr}_i(j))$. Thus, $P_i$ can receive at most $|DS_i \setminus D_i|$ of the expected messages, which are not sufficient to activate $P_i$ because $|DS_i \setminus D_i| < k_i$ and $(\forall P_j :: P_j \in D_i :: \neg\mathit{arr}_i(j))$.

4.5 Deadlock in the Disjunctive k out of n Model

For each $i$, $1 \le i \le n$, let $k_{i,r}$ denote the number of messages that $P_i$ must receive from processes belonging to the dependent subset $DS_{i,r}$ ($1 \le k_{i,r} \le |DS_{i,r}|$).

$$(\forall P_i :: P_i \in B :: (\mathit{passive}_i \wedge (\forall r :: 1 \le r \le q_i :: (\exists D_{i,r} :: D_{i,r} \subseteq DS_{i,r} \cap B :: ((|DS_{i,r} \setminus D_{i,r}| < k_{i,r}) \wedge (\forall P_j :: P_j \in D_{i,r} :: (\mathit{empty}_i(j) \wedge \neg\mathit{arr}_i(j))))))))$$

4.6 An Abstract Definition of Deadlock

We now give an abstract definition of deadlock, which holds irrespective of the underlying request model. Let $ARR_i$ denote the set of all processes $P_j$ such that $\mathit{arr}_i(j)$ is true. Moreover, let $NE_i$ denote the set of all processes $P_j$ such that $\mathit{empty}_i(j)$ is false. By $\mathit{fulfilled}_i$ we denote the predicate fulfilled associated with $P_i$. An abstract definition for a set $B$ of processes to be deadlocked is given below:

$$(\forall P_i :: P_i \in B :: (\mathit{passive}_i \wedge \neg\mathit{fulfilled}_i(ARR_i \cup NE_i \cup (P \setminus B))))$$

That is, no $P_i \in B$ can be activated even after all messages from all processes in $ARR_i \cup NE_i$ and from all processes that are not deadlocked have arrived, because these messages are not sufficient to satisfy the activation condition of any process in $B$.

Since the abstract definition of deadlock is applicable to all deadlock models, it is amenable to a distributed deadlock detection algorithm that uniformly treats deadlocks in all deadlock models. Such algorithms will be very effective in environments where processes can make requests according to any request model.

Remark: Under the assumption that no process can individually terminate, we have: if there exists $B$ such that $\mathit{deadlock}(B)$, then there are at least two deadlocked processes. Indeed, assume that there exists $B = \{P_i\}$ such that $\mathit{deadlock}(B)$ holds and no other process is deadlocked. Since $P_i \notin ARR_i \cup NE_i$, we have $ARR_i \cup NE_i \cup (P \setminus B) = P \setminus B$. On the other hand, $DS_i \subseteq P \setminus B = P \setminus \{P_i\}$ and thus $\mathit{fulfilled}_i(P \setminus B) \Leftarrow \mathit{fulfilled}_i(DS_i)$, a contradiction since by assumption $DS_i \neq \emptyset$.

5 Deadlock Detection Problems

With the previous precise definitions of deadlocks, various formulations of the deadlock detection problem can be considered.

5.1 Detection of a Deadlock Occurrence

In this formulation, the problem is to determine if there exists a set $B$ such that $\mathit{deadlock}(B)$ is true. The result of a solution to this problem is a boolean $dd$ satisfying the following post-condition:

$$dd \equiv (\exists B :: \mathit{deadlock}(B))$$

5.2 Detection of a Deadlocked Process

In this formulation, the problem is to determine if a given process P is deadlocked, i.e., whether there exists a set $B$ such that $\mathit{deadlock}(B)$ is true and P $\in B$. The result of a solution to this problem is a boolean $dd$ satisfying the following post-condition:

$$dd \equiv (\exists B :: \mathit{deadlock}(B) \wedge P \in B)$$

5.3 Detection of a Deadlocked Set

In this formulation, the problem is to find a set of deadlocked processes.
The result of a solution to this problem is a set of processes $PD$ satisfying the following post-condition:

$$\mathit{deadlock}(PD) \vee ((PD = \emptyset) \wedge \neg(\exists B :: \mathit{deadlock}(B)))$$

5.4 Detection of the Maximum Deadlocked Set

In this formulation, the problem is to find a deadlocked set which contains all deadlocked sets. This set is unique since the property of being deadlocked is closed under the set union operation. The result of a solution to this problem is a set of processes $PD$ satisfying the following post-condition:

$$(\mathit{deadlock}(PD) \vee (PD = \emptyset)) \wedge \mathit{maxdead}(PD)$$

where

$$\mathit{maxdead}(PD) \equiv (\forall B :: \mathit{deadlock}(B) \Rightarrow B \subseteq PD)$$

Note that the deadlock detection problems have been stated in order of increasing complexity. In addition, this order is consistent with an increasing amount of information being available at the time the detection terminates. This additional information is important as it is useful for efficient recovery from the deadlock.

Solving any of the above problems is difficult in distributed systems because no process has accurate knowledge of the global system state, i.e., the global state is not visible to any process instantaneously. Thus, in practice, only states related to earlier observations can be obtained. However, during the collection of the local states of processes, these states are changing. Therefore, any deadlock detection algorithm in a distributed system can only ensure that:

1. a deadlock which has occurred before the initiation of the algorithm will be detected, and
2. the detected set of deadlocked processes was indeed deadlocked at the moment when the detection algorithm terminates.

5.5 Specification of an Algorithm for Detection of the Maximum Deadlocked Set

An algorithm for detecting the maximum deadlocked set can be specified as follows. Its result is a set of processes $PD$. Informally, when the detection algorithm terminates, the meaning of the result is the following: $PD = \emptyset$ means that there was no deadlock at the time when the algorithm was initiated; $PD \neq \emptyset$ means that $PD$ includes the maximum deadlocked set at the time when the algorithm was initiated.
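
To make the abstract definition of Section 4.6 and the notion of maximum deadlocked set concrete, here is a centralized Python sketch. It assumes that a consistent global state has already been recorded, which is precisely the hard part in a distributed setting, so it is not the distributed algorithm referred to in this paper. The data layout, the function names, and the choice of the disjunctive k out of n form for `fulfilled` are all assumptions made for the example. Starting from the set of passive processes, it repeatedly removes every process whose activation condition could still be fulfilled, until the remaining set satisfies the abstract definition.

```python
def fulfilled(requests, A):
    """Disjunctive k out of n predicate: requests is a list of (DS_r, k_r) pairs."""
    return any(len(ds & A) >= k for ds, k in requests)


def max_deadlocked_set(processes, passive, requests, arrived, not_empty):
    """Largest B with: every P_i in B is passive and
    not fulfilled_i(ARR_i | NE_i | (P - B))."""
    B = set(passive)
    changed = True
    while changed:
        changed = False
        for i in sorted(B):
            possible_senders = arrived[i] | not_empty[i] | (processes - B)
            if fulfilled(requests[i], possible_senders):
                B.discard(i)  # P_i may still be activated, so it is not deadlocked
                changed = True
    return B


# State of the example of Section 4.2 at time t (process names shortened).
processes = {"a", "b", "c", "d", "e"}
passive = {"a", "b", "d", "e"}
arrived = {"a": set(), "b": {"e"}, "d": set(), "e": set()}    # ARR_i
not_empty = {"a": set(), "b": {"a"}, "d": set(), "e": set()}  # NE_i

# OR requests are 1 out of |DS|.
or_requests = {
    "a": [({"c", "d"}, 1)],
    "b": [({"d"}, 1)],
    "d": [({"b", "e"}, 1)],
    "e": [({"b"}, 1)],
}
print(sorted(max_deadlocked_set(processes, passive, or_requests, arrived, not_empty)))
# ['b', 'd', 'e']

# Same state, but P_a now needs messages from both P_c and P_d (AND request).
and_for_a = dict(or_requests, a=[({"c", "d"}, 2)])
print(sorted(max_deadlocked_set(processes, passive, and_for_a, arrived, not_empty)))
# ['a', 'b', 'd', 'e']
```

Both results agree with the deadlocked sets stated in the example of Section 4.2. The iteration converges to the maximum set because removing a process from B can only enlarge the set of possible senders of the remaining processes, and fulfilled is monotonic.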

More formally, the specification of the detection algorithm can be stated using the classical decomposition into safety and liveness properties. Liveness (progress): Each execution of the detection algorithm terminates in finite time. Safety (consistency): Let t b and t e be the time instants of the detection initiation and termination, respectively. i) If the algorithm terminates with PD =, then for any set B of processes the predicate deadlock(b) was false at time t b. ii) If the algorithm terminates with PD, then the predicate deadlock(pd) was true at time t e ; moreover, the predicate maxdead(pd) (defined in Section 5.4) was true at time t b ; in other words, for any set B such that deadlock(b) was true at time t b, B PD. A distributed algorithm implementing these specifications is described, proven correct and analysed in [4]; it is based on a circulating token, but more parallel implementations, based on the general principles presented in [16], are also outlined. Due to space limitations, it is not described here. 6 Relation with Termination Problems 6.1 Individual Termination Individual termination of process P k, introduced in Section 2.2, is characterized by the following predicate: (state k = passive) (DS k = ) To take into account individual termination of processes, the definition of predicate deadlock(b), introduced in Section 4, has to be modified. Let T be the set of individually terminated processes. Since processes of T cannot send messages, they cannot contribute to the activation of any passive process. Thus, for the AND request model, predicate deadlock (B) becomes as shown below (for other models, a similar modification has to be done): (B P) (B ) ( P i :: P i B:: (passive i ( < P j :: P j DS i (B T) :: (empty i (j) arr i (j)) ) )) Note that when individual termination of processes is allowed, a set of processes can deadlock without getting involved in a circular wait. 
That is, a cycle in the classical wait-for-graph representation is no longer a necessary condition for a deadlock. Such deadlocks are generally handled by operating systems by timing out on a process that is waiting for a message from an individually terminated process.

6.2 Termination Detection

The termination detection problem (global termination) involves detecting a state from which there is no more activity in the program execution ([10], [11]). A large number of algorithms exist to solve this problem (see e.g., [21]). This problem is closely related to the one presented in Section 5.3, with the difference that the set $B$ is now predefined as the set $P$ of all processes. Thus, the termination detection problem can be expressed in terms of the deadlock problem as follows: does the predicate $deadlock(P)$ hold? In other words, are all processes globally terminated? Consequently, the answer is yes or no. A new termination detection algorithm based on these ideas has been presented by the authors [5]. It allows non-FIFO channels and unspecified receptions. From a practical point of view, these are advantageous characteristics when this solution is compared with other termination detection algorithms that, to our knowledge, do not allow unspecified receptions. Interested readers can find further developments in [5].

7 Conclusion

Deadlock is a very important problem in various applications, including computer networks, database systems, and massively parallel systems. In this paper, deadlock detection in asynchronous message passing systems has been considered. In a distributed asynchronous environment, deadlock detection is particularly subtle and complex, because distributed algorithms are required and no node has accurate knowledge of the whole system state [20]. These complexities have led to significant errors in a large number of published deadlock detection algorithms.
To enhance our understanding of the deadlock problem, this paper has introduced a hierarchy of formal deadlock models and deadlock detection problems. An abstract definition of deadlocks in distributed systems has been presented. Moreover, a general algorithm detecting sets of deadlocked processes has been specified, with possible extensions to cover the individual and the global termination detection problems. The main advantage of the deadlock characterization introduced here lies in its straightforward application in parallel and distributed systems, even with unspecified receptions, non-FIFO channels, and general request models permitting, among others, AND, OR, AND-OR, and k-out-of-n requests. Thus, the abstract definition of deadlocks, and the solutions [4] for detecting distributed deadlocks that are based on this definition, show great promise for detecting deadlocks in distributed systems.

Acknowledgments: The authors would like to thank M. Singhal for interesting discussions concerning the deadlock problem.

References

[1] Bernstein P.A., Hadzilacos V., Goodman N., Concurrency Control and Recovery in Database Systems, Addison Wesley, Reading, Mass, 1987, 370 pages.
[2] Blazewicz J., Brzezinski J., Gambosi G., Time-stamp approach to store-and-forward deadlock prevention, IEEE Trans. on Comm., vol. COM-35,5, 1987, pp. 490-495.
[3] Bracha G., Toueg S., Distributed deadlock detection, Distributed Computing, vol. 2,3, 1987, pp. 127-138.
[4] Brzezinski J., Helary J.M., Raynal M., Singhal M., Deadlock models and a general algorithm for distributed deadlock detection, to appear in Journal of Parallel and Distributed Computing.
[5] Brzezinski J., Helary J.M., Raynal M., Termination detection in a very general distributed computing model, Proc. of the 13th IEEE Int. Conf. on Dist. Comp. Systems, Pittsburgh, May 25-28 1993, pp. 374-382.
[6] Chandy K.M., Lamport L., Distributed snapshots: determining global state of distributed systems, ACM Trans. on Comp. Systems, vol. 3,1, 1985, pp. 63-75.
[7] Chandy K.M., Misra J., A distributed algorithm for detecting resource deadlock in distributed systems, Proc. of ACM Symposium on Principles of Distributed Computing, ACM, New York, 1982, pp. 157-164.
[8] Chandy K.M., Misra J., Parallel Program Design: a Foundation, Addison Wesley, 1988, 516 pages.
[9] Chandy K.M., Misra J., Haas L.M., Distributed deadlock detection, ACM Trans. on Comp. Systems, vol. 1,2, 1983, pp. 144-156.
[10] Dijkstra E.W.D., Scholten C.S., Termination detection for diffusing computation, Inf. Proc. Letters, vol. 13,1, 1980, pp. 1-4.
[11] Francez N., Distributed termination, ACM Trans. on Progr. Lang. and Systems, vol. 2,1, 1980, pp. 42-55.
[12] Gifford D.G., Weighted voting for replicated data, Proc. of the 7th ACM Symposium on Operating Systems Principles, ACM, New York, 1979, pp. 150-163.
[13] Gligor V., Shattuck S., On deadlock detection in distributed databases, IEEE Trans. Soft. Eng., vol. SE-6,5, 1980, pp. 435-440.
[14] Gunther K.D., Prevention of deadlock in packet-switched data transport system, IEEE Trans. on Comm., vol. COM-29,4, 1981, pp. 512-524.
[15] Helary J.M., Jard Cl., Plouzeau N., Raynal M., Detection of stable properties in distributed systems, Proc. 6th ACM Symposium on Principles of Distributed Computing, ACM Press, New York, 1987, pp. 289-285.
[16] Helary J.M., Raynal M., Towards the construction of distributed detection programs with an application to distributed termination, Distributed Computing, vol. 7,4, 1994, pp. 137-147.
[17] Holt R.C., Some deadlock properties of computer systems, ACM Computing Surveys, vol. 4,3, 1972, pp. 179-196.
[18] Jaffe J.M., Sidi M., Distributed deadlock resolution in store-and-forward networks, Algorithmica, vol. 4,3, 1989, pp. 417-436.
[19] Knapp E., Deadlock detection in distributed databases, ACM Computing Surveys, vol. 19,4, 1987, pp. 303-328.
[20] Kshemkalyani A.D., Singhal M., On the characterization and correctness of distributed deadlock detection, Journal of Parallel and Distributed Computing, vol. 22, 1994, pp. 44-59.
[21] Mattern F., Algorithms for distributed termination detection, Distributed Computing, vol. 2,3, 1987, pp. 161-175.
[22] Mitchell D., Merritt M.J., A distributed algorithm for deadlock detection and resolution, Proc. of the 3rd ACM Symposium on Principles of Distributed Computing, ACM, New York, 1984, pp. 282-284.
[23] Natarajan N., A distributed scheme for detecting communication deadlocks, IEEE Trans. Soft. Eng., vol. SE-12,4, 1986, pp. 531-537.
[24] Singhal M., Deadlock detection in distributed systems, IEEE Computer, vol. 22,11, 1989, pp. 37-48.