Nonblocking and Orphan-Free Message Logging Protocols

Lorenzo Alvisi†   Bruce Hoppe   Keith Marzullo‡
Cornell University, Department of Computer Science, Ithaca, NY, USA

Abstract

Existing message logging protocols demonstrate a classic pessimistic vs. optimistic tradeoff. We show that the optimistic-pessimistic tradeoff is not inherent to the problem of message logging. We construct a message-logging protocol that has the positive features of both optimistic and pessimistic protocols: our protocol prevents orphans and allows simple failure recovery, yet it requires no blocking in failure-free runs. Furthermore, this protocol does not introduce any additional message overhead as compared to one implemented for a system in which messages may be lost but processes do not crash.

This material is based upon work supported by the Defense Advanced Research Projects Agency (DoD) under NASA Ames grant number NAG 2-593, a National Science Foundation Graduate Research Fellowship, and by grants from the IBM Glendale Programming Laboratory, the IBM T. J. Watson Research Center, Siemens, and the Xerox Webster Research Center.
†Also supported by a research fellowship of the Italian National Research Council (Consiglio Nazionale delle Ricerche).
‡Current address: Department of Computer Science and Engineering, University of California at San Diego, 9500 Gilman Dr. 0114, La Jolla, CA 92093.

1 Introduction

Message logging protocols are a common method of building a system that can tolerate process crash failures. These protocols require that each process periodically record its local state and log the messages it received after that state. When a process crashes, a new process is created, given the appropriate recorded local state, and then sent the logged messages in the order they were originally received.

Message logging protocols are not the only techniques for making systems robust against process crashes. For example, active replication [12] and passive replication [2] are commonly used techniques for masking failures more severe than crash failures. However, message logging protocols are attractive in that they are considered an inexpensive technique: they introduce little or no process replication in failure-free runs, and the protocols are relatively simple. Another advantage of message logging protocols is that they are readily applied to any process communication structure, while both active and passive replication are typically applied in a client-server setting. Of course, one can use either active or passive replication to implement a message logging system.

The published message logging protocols exhibit a classic tradeoff in performance: there exist pessimistic protocols that introduce blocking in order to simplify recovery, and optimistic protocols that introduce no blocking but may require computation to be undone upon recovery. In this paper, we show that the optimistic-pessimistic tradeoff is not necessary for message logging protocols. We derive a message logging policy that is weaker than that used by pessimistic protocols yet is strong enough to guarantee that computation need not be rolled back. This policy is based on the notion of logging a message when the effects of its delivery are visible to some process other than the process to which it was delivered. This logging policy can be implemented without introducing additional blocking. We then give a protocol that can tolerate non-concurrent failures and discuss its performance.

The paper proceeds as follows. Section 2 describes the system model commonly assumed for message logging protocols. Section 3 discusses message logging protocols and explains the current optimistic-pessimistic tradeoff. Section 4 defines what it means for a message to be relevant, derives logging and recovery rules for relevant messages, and presents a message logging protocol that is nonblocking but creates no orphans. Section 5 discusses the performance of an implementation of our protocol, and Section 6 concludes the paper.

2 System Model

We assume a set of processes that communicate only by exchanging messages. (For simplicity, we also assume that a process never sends a message to itself.) The system is asynchronous: there are no bounds on the relative speeds of processes, there are no bounds on message transmission delays, and there is no global time source. Hence, the order in which a process receives messages is nondeterministic.

The execution of a system of n processes is represented by a run, which is an irreflexive partial ordering of the send events, receive events, and delivery events ordered by potential causality [9]. Delivery events are local to a process and represent the delivery of a received message to the application. For any message m from process p to process q, q delivers m only if it has received m, and q delivers m no more than once.
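The potential-causality order on events is Lamport's happened-before relation [9]. As a minimal illustration only (the representation of events and all names below are ours, not the paper's formalism), the following Python sketch decides whether one event causally precedes another, given each process's local event sequence and the send-to-delivery edges of the messages:

    # Illustrative sketch (not the paper's formalism): an event is a pair
    # (process, local_index); message_edges maps the send event of each
    # message to the corresponding delivery event at the destination.
    def happened_before(a, b, events_per_process, message_edges):
        """True iff event a potentially causes event b (Lamport [9])."""
        def successors(e):
            proc, i = e
            if i + 1 < len(events_per_process[proc]):
                yield (proc, i + 1)          # next event on the same process
            if e in message_edges:
                yield message_edges[e]       # send of m  ->  delivery of m
        stack, seen = [a], set()
        while stack:
            e = stack.pop()
            for s in successors(e):
                if s == b:
                    return True
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
        return False                         # irreflexive: an event never precedes itself

    # p's second event sends m; q delivers m and then sends m'.
    events = {"p": ["local step", "send m"],
              "q": ["local step", "deliver m", "send m'"]}
    edges = {("p", 1): ("q", 1)}
    print(happened_before(("p", 1), ("q", 2), events, edges))   # True

In the terminology of Section 4.1, the True result says that m' depends on m.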

We assume a fail-stop failure model for the processes [11] and an intermittent message loss failure model for the channels. Channels are FIFO, in that if process p sends two messages to process q and process q receives both of them, then it will receive them in the order that p sent them. Furthermore, we assume a transport protocol that provides a stronger FIFO property: if process p sends m1 and then m2 to process q, then q will not receive m2 before receiving m1.

At any point in an execution, the state of a process is determined by its initial state and the sequence of messages that it has delivered. For any message m delivered by process p, its receive sequence number, denoted m.rsn, represents the order in which m was delivered: m.rsn = ℓ if m is the ℓth message delivered by p [14]. (We avoid the term "delivery sequence number" simply to avoid new terminology for an old concept.) We denote with σ_p[ℓ] the state of process p after having delivered ℓ messages.

3 Issues in Message Logging

In message logging protocols, each process must record both the contents and receive sequence numbers of all the messages it has delivered in a location that will survive the failure of the process. This action is called logging. The process may also periodically record its local state (called checkpointing), thereby allowing its message log to be trimmed. For example, once process p knows that state σ_p[ℓ] is logged, then all messages m such that m.rsn ≤ ℓ can be removed from its message log. Note that the periodic checkpointing of a process's state is only needed to bound the length of its message log (and hence the recovery time). For simplicity we ignore checkpointing in this paper.

Logging a message may take time, and so there is a natural design decision of whether or not a process should wait for the logging to complete before delivering the message to the application. For example, suppose that having delivered message m, process p sends message m' to process q. If message m were not logged by the time p sent m', then the crash of p may cause information about m to be lost. Then, when a new process p is initialized and replayed the old logged messages, p may follow an execution in which m' is not sent to q. Hence, process q would no longer be consistent with p once it delivers m'. Such a message m is called a lost message, message m' an orphan message, and the state of process q an orphan state [14].

Protocols that can create orphans are called optimistic because the likelihood of creating an orphan is (hopefully) small. A pessimistic protocol is one in which each process p never sends a message m' until it knows that all messages delivered before sending m' are logged. Pessimistic protocols never create orphans, and so the reconstruction of the state of a crashed process is very straightforward as compared to optimistic protocols, in which orphans must be detected and rolled back. On the other hand, pessimistic protocols potentially block a process for each message it receives. Doing so can slow down the throughput of the process, and this price must be paid even in an execution in which no process crashes.

Another design decision in message logging protocols is where each message is to be logged. An obvious choice is to log at the receiving process, since it is the receiving process that assigns the receive sequence number to an incoming message [14, 8]. Unfortunately, such a receiver-based protocol requires an implementation of stable storage that will survive the crash of the receiving process. Another choice, called sender-based logging, is to log each message m in the sender's volatile storage [7]. Doing so requires the receiver to tell the sender what receive sequence number was assigned to m. Furthermore, the receiver does not know that m is logged until it receives an extra acknowledgement from the sender indicating that the receive sequence number has been recorded. The receive sequence number can be piggybacked on the acknowledgement of message m, but the extra acknowledgement from the sender may require an additional message; and in pessimistic sender-based logging, the receiver still must block until it receives the final acknowledgement [5]. A further problem with sender-based logging is that multiple concurrent failures may make a process unable to recover its previous state even when a pessimistic protocol is used. However, sender-based logging is, in some sense, optimal in the number of resources it uses: it requires no stable storage, and an additional process is needed only when recovery of a crashed process begins.

These two design decisions (pessimistic vs. optimistic, and receiver-based vs. sender-based logging) are independent. In particular, there exist pessimistic receiver-based logging protocols [1, 10], optimistic receiver-based logging protocols [14, 13], pessimistic sender-based logging protocols [7, 5], and optimistic sender-based logging protocols [7, 6].

All message logging protocols must address the problem of communication with the environment. For input, the data must be stored in a location that is always accessible for the purpose of replay. For output, the process must be in a recoverable state before sending any message to the environment. This means that, in general, even optimistic message logging protocols may block before sending a message to the environment. Such issues are outside the scope of this paper.
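To make the tradeoff concrete, the two send rules can be contrasted schematically as below. This is only an illustration of the design point, with invented names; it is not the code of any particular protocol from the literature.

    # Schematic contrast of the two design points (all names are illustrative).
    def pessimistic_send(proc, msg):
        # Block until every message delivered so far is known to be logged.
        # Recovery is then simple replay, but the wait is paid on every send,
        # even in runs where no process ever crashes.
        while not all(proc.is_logged(m) for m in proc.delivered):
            proc.wait_for_logging_progress()
        proc.transport_send(msg)

    def optimistic_send(proc, msg):
        # Send immediately; logging completes asynchronously.  If the sender
        # crashes before the log is written, msg may become an orphan and the
        # receiver's state may have to be rolled back during recovery.
        proc.transport_send(msg)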

4 Nonblocking, Orphan-Free Protocols

In this section, we first review how recovery can be done when there are no orphans. We then develop a nonblocking protocol in which orphans cannot occur by more carefully considering the design decision: by what point must a message be logged?

4.1 Recovery with no Orphans

Suppose that process q delivers a message m from process p. We define another message m' to depend on m if the delivery of m causally precedes the sending of m'.

Consider the execution of a distributed system and a set of local states σ_1[ℓ1], ..., σ_n[ℓn], one for each process 1, ..., n. We say that two states σ_p[ℓp] and σ_q[ℓq] are pairwise consistent if all messages from p that have been delivered by q in σ_q[ℓq] have been sent by p in σ_p[ℓp], and all messages from q that have been delivered by p in σ_p[ℓp] have been sent by q in σ_q[ℓq]. The collection of local states is a consistent global state if all pairs of states are pairwise consistent [3]. (This definition differs from that of [3] in that it is given in terms of delivery events rather than receive events. Our usage is consistent with the literature on message-logging protocols.) With message logging protocols, an inconsistent global state arises when a lost message occurs due to optimistic recovery.

A pessimistic message logging protocol is one in which a message m is always logged by the time any message m' that depends on m is sent. Recovery in pessimistic protocols is straightforward: the crashed process is reinitialized and replayed the old logged messages in increasing receive sequence number order. Since the process is deterministic with respect to message receipt order, it will follow the same path of execution as before. Thus, in the process of recovering, it will send the same sequence of messages as before. Any duplicate message m sent during recovery is acknowledged and discarded if the destination has already received m. (Another possibility is for acknowledgements to be logged in the same way as other messages. In this case, a process receiving a repeated message m would discard m without sending an acknowledgement, because the acknowledgement of m will itself be replayed. As is discussed in Section 4.2, the choice of implementation is not just a matter of efficiency: which implementation is correct depends on whether or not acknowledgements are visible to the application.) The following simple theorem shows that this recovery protocol is correct:

Theorem 1 Consider an execution of a pessimistic message logging protocol in which process p crashes. If process p restarts from its initial state and delivers all logged messages in receive sequence number order, then the resulting global state is consistent.

Proof: We will show that the global state is consistent by showing that the recovered local state of p is pairwise consistent with the local states of the other processes. Since a process is deterministic given a sequence of message deliveries, the state that process p recovers to is σ_p[ℓ], where ℓ is the highest logged receive sequence number. Since the protocol is pessimistic, by definition p has not sent any messages in a state σ_p[ℓ'] for ℓ' > ℓ. Hence, no process q ≠ p has received a message that follows σ_p[ℓ]. In addition, process p has not received any messages that were not sent, and so σ_p[ℓ] is pairwise consistent with each other local state. □

4.2 Abstract Message Logging Protocol

A pessimistic protocol guarantees that a message m is logged before any message that depends on m is sent. This fact makes the proof of Theorem 1 straightforward. However, it is not necessary that m be logged until a message that depends on m is delivered, because it is at this point that the effects of m become visible to another process. Hence, we define a message m to be relevant when a process has delivered a message that depends on m.

In the following protocol, we assume that message acknowledgements are never delivered to the application: that is, they are only seen by the underlying transport protocol. Thus, an application-level send is nonblocking, in that the application does not block waiting for an acknowledgement from the recipient. This assumption allows us to not log acknowledgements, since they carry no information as far as the application is concerned. Note that this assumption is not fundamental: if it does not hold, then acknowledgements can be logged and replayed in the same way as other messages.

A message is logged by including attributes about the message in a set L. A message m has the following five attributes: m.source is the sender of m; m.dest is the destination of m; m.data is the data that the application at m.source sends to m.dest; m.ssn is the send sequence number (m.ssn = ℓ denotes that m is the ℓth message sent by m.source); and m.rsn is the receive sequence number. Following [7], a message m is partially logged if the four attributes m.source, m.dest, m.data, m.ssn are defined in L, and m is fully logged if all five attributes are defined in L. Finally, L_p is the subset of messages m ∈ L such that m.dest = p.

Abstract message logging protocol: The protocol consists of a logging policy and a recovery procedure:

Logging policy: A message m is partially logged by the time it is sent and fully logged by the time it is relevant.

Recovery procedure: To recover a crashed process p: after any messages that p sent before crashing are either received or dropped due to transient channel faults, p is restarted from its initial state. It is then sequentially sent the messages in L_p in an order that is consistent with the receive sequence numbers of the fully logged messages and that is consistent with FIFO channels for the partially logged messages.

Theorem 2 Consider an execution of the abstract message logging protocol given above in which a process crashes and then recovers. The resulting global state is consistent.

Proof: We will show that the global state is consistent by showing that the recovered local state of the crashed process p is pairwise consistent with the local states of the other processes. Let RM be the sequence of messages in L_p in the order that p received them before crashing. Consider a message m ∈ RM that is fully logged. Any message m' before m in RM is also fully logged by the logging policy: if m is relevant, then so are all messages m' that were received before m. Hence, RM consists of a (possibly empty) sequence of fully logged messages ordered by receive sequence number, followed by a (possibly empty) sequence of partially logged messages ordered by the constraint that the channels are FIFO. Let the subsequence RM[1..ℓf] be the fully logged messages and the subsequence RM[ℓf+1..ℓ] be the partially logged messages.

Consider a message m that was originally sent by p before crashing but is not sent by p during recovery. Since p is deterministic with respect to message delivery order, the recovering process will follow the same execution as before through state σ_p[ℓf], and so m must have been sent in a state after σ_p[ℓf]; say, σ_p[ℓf + k] where 1 ≤ k ≤ ℓ − ℓf. However, m could not have been delivered by m.dest, since otherwise the messages RM[ℓf+1..ℓf+k] would be relevant and hence fully logged. Hence, there are no messages that were originally sent by p and delivered by m.dest yet not resent during the recovery of p. Finally, p has not received any messages that were not sent. Thus, p's recovered local state is pairwise consistent with each other local state. □

Theorem 2 is concerned with the crash and recovery of a single process and argues that the state of the application is reconstructed to a consistent global state. However, it does not say anything about the state of the logged messages. In particular, if the information about a logged message m can be lost due to a process crash and recovery, then the protocol may only be able to tolerate a single failure in any run. If the following property holds, however, then any sequence of (process crash, process recovery) pairs can be tolerated:

Stable log: If a message m is partially logged and will eventually be received, then m will be partially logged after any process p crashes and recovers. And, if m is fully logged and a process p crashes, then m will be fully logged after p recovers.
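A minimal sketch of the abstract protocol's bookkeeping follows, assuming the log L is represented as a dictionary keyed by (source, ssn) whose entries hold the five attributes (rsn is None while a message is only partially logged). The representation and the names are ours; how L survives crashes is left abstract, exactly as in the protocol above.

    # Sketch of the abstract log L: {(source, ssn): entry}, where an entry is a
    # dict with keys source, dest, data, ssn, rsn (rsn None if partially logged).
    def partially_logged(L, source, ssn):
        return (source, ssn) in L

    def fully_logged(L, source, ssn):
        e = L.get((source, ssn))
        return e is not None and e["rsn"] is not None

    def replay_order(L, p):
        """The order in which a recovering p is re-sent the messages of L_p:
        fully logged messages in rsn order, then partially logged messages in
        an order consistent with FIFO channels (here: per-sender ssn order)."""
        Lp = [e for e in L.values() if e["dest"] == p]
        full = sorted((e for e in Lp if e["rsn"] is not None),
                      key=lambda e: e["rsn"])
        partial = sorted((e for e in Lp if e["rsn"] is None),
                         key=lambda e: (e["source"], e["ssn"]))
        return full + partial

Theorem 2 says that replaying L_p in this order reproduces a local state that is pairwise consistent with every other process.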

Note that we allow partially logged messages that will never be received to become unlogged. Such a message, however, is never received by any process and so can be lost without any effect.

4.3 Family-Based Logging

According to the abstract message logging protocol described in Section 4.2, a message m must be fully logged when it becomes relevant. Which process determines when a message becomes relevant? In order to answer this question, we introduce the following definitions: We say a process p is a parent of a process q, and q is a child of p, if q delivers a message sent by p. Note that p can simultaneously be a parent and a child of q.

Consider a message m from process p to process q. The relevance of m is not determined by q, nor in general by q's parent p. Rather, m becomes relevant when some child of q delivers a message that depends on m. Thus, the children of q determine the moment when any message delivered by q must be fully logged, and so it is natural to assign the responsibility of logging m's receive sequence number to the children. To do so, after delivering m, q can piggyback m.rsn on every subsequent message, and q's children can log these piggybacked receive sequence numbers. If m becomes relevant, then some child r must have delivered a message that depends on m, and so m.rsn is logged at r. Of course, q need not piggyback m.rsn on every subsequent message: once q receives an acknowledgement from r for any message that depends on m, then q knows that m.rsn is logged at r. Finally, the other attributes of m are already located at its sender p, which is a parent of q, and so we assign to p the responsibility of partially logging m before m is sent. We call this logging strategy family-based logging.

Suppose that process q fails. The messages logged at q's parents and the receive sequence numbers logged at q's children must be recombined to recover q. Thus, q's children must log more than receive sequence numbers; there must be some means of matching receive sequence numbers to the corresponding messages. To do so, q can piggyback triplets of the form (m.source, m.ssn, m.rsn), where m is a message delivered by q. Each triplet (p, ssn, rsn) corresponds to a unique message partially logged at process p with send sequence number ssn; and so when q fails, the receive sequence numbers logged at q's children can be matched with the corresponding messages logged at q's parents.

[Figure 1: Family-Based Logging. A space-time diagram of processes p1, ..., p5 over times t1, t2, t3. Process p1 sends m2 = (d2, 1, ∅) and m5 = (d5, 2, ∅) to p3; process p2 sends m1 = (d1, 1, ∅) to p3; process p3 sends m3 = (d3, 1, {#m2}) and m6 = (d6, 3, {#m5}) to p4, sends m4 = (d4, 2, {#m1, #m2}) to p5, and receives ack(1) and ack(2); here #mi = (mi.source, mi.ssn, mi.rsn).]

Data Structures

The protocol requires each process p to maintain the following data structures. (We write r = (a, b, ..., n) for a record r with fields a, b, ..., n, and r.i for the value of field i of record r.)

Send sequence number: SSN_p is an integer, initially 0, used to uniquely identify and order each message sent by p.

Receive sequence number: RSN_p is an integer, initially 0, used to uniquely identify and order each message delivered by p.

Send log: SendLog_p is a set, initially empty, of elements of the form e = (data, ssn, rsn, dest). e ∈ SendLog_p if there exists a message m sent by p in state σ_p[e.rsn] such that m.data = e.data, m.ssn = e.ssn, and m.dest = e.dest.

RSN log: RsnLog_p is a set, initially empty, of elements of the form e = (parent, ssn, rsn, child). e ∈ RsnLog_p if there exists a message m delivered by p such that m.source = e.parent, m.ssn = e.ssn, and m.rsn = e.rsn. If e.child ≠ ⊥, then m.rsn is logged at e.child.

Receive log: ReceiveLog_p is a set, initially empty, of elements of the form e = (parent, grandparent, ssn, rsn). If e ∈ ReceiveLog_p, then there exists a message m delivered by process e.parent such that m.source = e.grandparent, m.ssn = e.ssn, and m.rsn = e.rsn; and furthermore, there exists a message m' with m'.source = e.parent delivered by p such that m' depends on m. (We avoid the name "delivery log" just as we avoided "delivery sequence number.")

SSN table: SsnTable_p is a vector of send sequence numbers whose entries are initialized to 0. SsnTable_p[q] records the highest send sequence number of any message from q delivered by p.

Piggyback sequence number: PSN_p is an integer, initially 0. PSN_p keeps the value of the highest receive sequence number such that the corresponding entry e ∈ RsnLog_p has e.child ≠ ⊥. Entries e ∈ RsnLog_p such that e.rsn > PSN_p might not be logged, and will therefore be piggybacked by p on the next outgoing message.

Example

As an illustration of how the family-based logging protocol works in the absence of link failures, consider Figure 1. (For the sake of clarity we have included only the acknowledgments for messages m3 and m4; we assume that the acknowledgments for the remaining messages have not yet been received.) Each message that carries data is a triple (data, ssn, piggyback), where data is the data of the message, ssn is the message's send sequence number, and piggyback is the information that is piggybacked on the message.

We consider the execution of the protocol from the perspective of process p3. Notice that p3 piggybacks on each outgoing message mi enough information to fully log all the partially logged messages on which mi depends. In particular, consider the situation at time t1: even though process p3 has already piggybacked the information concerning m2 on message m3, p3 has not yet received an acknowledgment for m3, and so cannot assume m2 is fully logged. Hence, m4 contains the information necessary to fully log m1 and m2. By time t2, however, p3 has received the acknowledgments for both m3 and m4, and therefore knows that m2 is logged at p4 and that m1 (as well as m2) is logged at p5. It then piggybacks on m6 only the information necessary to fully log m5. Figure 2 presents a snapshot of the system at time t3.

Figure 2: Snapshot of the execution of Figure 1 at time t3.
    p1: SSN = 2, RSN = 0, PSN = 0; SendLog = {(d2, 1, 0, p3), (d5, 2, 0, p3)}; RsnLog = ∅; ReceiveLog = ∅; SsnTable = (0, 0, 0, 0, 0)
    p2: SSN = 1, RSN = 0, PSN = 0; SendLog = {(d1, 1, 0, p3)}; RsnLog = ∅; ReceiveLog = ∅; SsnTable = (0, 0, 0, 0, 0)
    p3: SSN = 3, RSN = 3, PSN = 2; SendLog = {(d3, 1, 1, p4), (d4, 2, 2, p5), (d6, 3, 3, p4)}; RsnLog = {(p1, 1, 1, p4), (p2, 1, 2, p5), (p1, 2, 3, ⊥)}; ReceiveLog = ∅; SsnTable = (2, 1, 0, 0, 0)
    p4: SSN = 0, RSN = 2, PSN = 0; SendLog = ∅; RsnLog = {(p3, 1, 1, ⊥), (p3, 3, 2, ⊥)}; ReceiveLog = {(p3, p1, 1, 1), (p3, p1, 2, 3)}; SsnTable = (0, 0, 3, 0, 0)
    p5: SSN = 0, RSN = 1, PSN = 0; SendLog = ∅; RsnLog = {(p3, 2, 1, ⊥)}; ReceiveLog = {(p3, p1, 1, 1), (p3, p2, 1, 2)}; SsnTable = (0, 0, 2, 0, 0)

Protocol

Figures 3 through 7 give the code for the family-based logging protocol. Due to lack of space, we rely on the example given above instead of giving a detailed explanation of the code. Briefly, Figures 3 and 4 give the code for the send and deliver operations. The message attribute m.piggyback is the set of triples (m'.source, m'.ssn, m'.rsn) for messages m' received by p but not yet fully logged. Figure 5 gives the part of the protocol that advances the piggyback sequence number when an acknowledgement is received. Figure 6 gives the protocol a process runs when recovering, and Figure 7 gives the protocol a process executes when it is requested to send its logs to a recovering process. When p recovers, it uses this protocol to collect the send logs from its parents and the receive logs from its children in order to construct the logged messages in the data structure ReplayLog_p, and it uses the RSN logs of its parents to reconstruct its receive log. In Figure 7, process q sends a sequence of messages to the recovering process p bracketed by the messages "q begin replay" and "q exit replay". The receives of these bracketing messages are not explicitly shown in Figure 6, but instead are denoted implicitly by the predicates "m.source has begun replaying" and "all processes have exited replay to p". Reconstruction of the logs is discussed further in Theorem 4.

Process p sends data to process q
    SSN_p ← SSN_p + 1
    m.source ← p
    m.data ← data
    m.ssn ← SSN_p
    m.piggyback ← ∅
    for all e ∈ RsnLog_p such that e.rsn > PSN_p
        m.piggyback ← m.piggyback ∪ {(e.parent, e.ssn, e.rsn)}
    SendLog_p ← SendLog_p ∪ {(m.data, SSN_p, RSN_p, q)}
    send m to q

Figure 3: Logging Protocol: Message Send
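For concreteness, the sketch below transliterates the data structures above and the send operation of Figure 3 into Python; the class name, the dictionary representation of log entries, and the transport parameter are our own choices, not part of the protocol.

    class FBLProcess:
        """Per-process state of family-based logging (transliteration sketch)."""
        def __init__(self, pid, others):
            self.pid = pid
            self.SSN = 0                    # send sequence number
            self.RSN = 0                    # receive sequence number
            self.PSN = 0                    # piggyback sequence number
            self.SendLog = []               # entries: (data, ssn, rsn, dest)
            self.RsnLog = []                # entries: {parent, ssn, rsn, child}
            self.ReceiveLog = []            # entries: (parent, grandparent, ssn, rsn)
            self.SsnTable = {q: 0 for q in others}

        def send(self, data, dest, transport):
            # Figure 3: partially log the message in the sender's volatile store
            # and piggyback every RSN-log entry that is not yet known to be
            # logged at some child (e.rsn > PSN_p).
            self.SSN += 1
            piggyback = [(e["parent"], e["ssn"], e["rsn"])
                         for e in self.RsnLog if e["rsn"] > self.PSN]
            m = {"source": self.pid, "dest": dest, "data": data,
                 "ssn": self.SSN, "piggyback": piggyback}
            self.SendLog.append((data, self.SSN, self.RSN, dest))
            transport.send(m, dest)
            return m

The deliver and acknowledgement handlers of Figures 4 and 5 below would update RSN_p, SsnTable_p, RsnLog_p, ReceiveLog_p, and PSN_p in the same style.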

Process q delivers message m
    repeat
        receive message m from transport protocol
    until (m.ssn > SsnTable_q[m.source])
    RSN_q ← RSN_q + 1
    SsnTable_q[m.source] ← m.ssn
    RsnLog_q ← RsnLog_q ∪ {(m.source, m.ssn, RSN_q, ⊥)}
    for all e ∈ m.piggyback
        ReceiveLog_q ← ReceiveLog_q ∪ {(m.source, e)}
    deliver m.data

Figure 4: Logging Protocol: Message Deliver

Transport protocol informs process p of ack(ssn)
    let l ∈ SendLog_p such that l.ssn = ssn
    if (l.rsn > PSN_p)
        PSN_p ← l.rsn
    for all e ∈ RsnLog_p such that ((e.child = ⊥) ∧ (e.rsn ≤ PSN_p))
        e.child ← l.dest

Figure 5: Logging Protocol: Acknowledgement Receive

Proofs

Theorem 3 Family-based logging is an implementation of the abstract message logging protocol.

Proof: We will show this by giving a refinement mapping from the data structures of the family-based logging protocol to L. For each process p and each entry e = (data, ssn, rsn, dest) ∈ SendLog_p there exists a message m ∈ L with m.source = p, m.dest = e.dest, m.ssn = e.ssn, and m.data = e.data. If there exists a process q with an entry e' = (parent, grandparent, ssn, rsn) ∈ ReceiveLog_q where m.source = e'.grandparent and m.ssn = e'.ssn, then m.rsn = e'.rsn; otherwise, m.rsn = ⊥.

From Figure 3, by the time m is sent by p, the corresponding entry for m is in SendLog_p. A message m delivered by p first becomes relevant when some process q is the first to deliver a message m' that depends on m. Since m' is the first such delivered message, m' must have been sent by p, and so from Figure 4 the receive sequence number of m is in ReceiveLog_q by the time m' is delivered. Thus, family-based logging implements the logging policy.

Note that the value of L_p does not reference any of the data structures local to process p, and so L_p is defined while p has crashed and is recovering. Furthermore, Figure 6 references the same data structures and fields as the definition of L_p does when recovering p, and it resends the messages in an order consistent with the recovery procedure. Thus, family-based logging implements the recovery procedure. □

Failure recovery for the faulty process p
    reinitialize all message logging data structures
    ReplayLog_p ← ∅
    send "p crashed/recovering" to all other processes
    while ¬(all processes have exited replay to p)
        receive m
        if (m.source has begun replaying)
            if (m = (source, data, ssn))
                if ∃ e ∈ ReplayLog_p such that ((e.orig = m.source) ∧ (e.ssn = m.ssn))
                    e.data ← m.data
                else
                    ReplayLog_p ← ReplayLog_p ∪ {e} where
                        e.orig = m.source, e.data = m.data, e.ssn = m.ssn, e.rsn = ⊥
            else if (m = (source, parent, ssn, rsn))
                if ∃ e ∈ ReplayLog_p such that ((e.orig = m.parent) ∧ (e.ssn = m.ssn))
                    e.rsn ← m.rsn
                    e.child ← m.source
                else
                    ReplayLog_p ← ReplayLog_p ∪ {e} where
                        e.orig = m.parent, e.ssn = m.ssn, e.rsn = m.rsn, e.child = m.source
            else if (m = (source, grandparent, ssn, rsn))
                ReceiveLog_p ← ReceiveLog_p ∪ {(m.source, m.grandparent, m.ssn, m.rsn)}
        else
            discard m
    for all e ∈ ReplayLog_p such that (e.rsn ≠ ⊥), in ascending e.rsn order
        RSN_p ← RSN_p + 1
        SsnTable_p[e.orig] ← e.ssn
        RsnLog_p ← RsnLog_p ∪ {(e.orig, e.ssn, e.rsn, e.child)}
        deliver e.data
    for all remaining e ∈ ReplayLog_p, in ascending e.ssn order
        RSN_p ← RSN_p + 1
        SsnTable_p[e.orig] ← e.ssn
        RsnLog_p ← RsnLog_p ∪ {(e.orig, e.ssn, RSN_p, ⊥)}
        deliver e.data

Figure 6: Recovery Protocol: Recovering Process

Recovery of process p from the perspective of non-faulty process q
    q receives "p crashed/recovering"
    send "q begin replay" to p
    for all e ∈ SendLog_q such that e.dest = p
        m.source ← q;  m.data ← e.data;  m.ssn ← e.ssn
        send m to p
    for all e ∈ ReceiveLog_q such that e.parent = p
        m.source ← q;  m.parent ← e.grandparent;  m.ssn ← e.ssn;  m.rsn ← e.rsn
        send m to p
    for all e ∈ RsnLog_q such that ((e.child = p) ∨ (e.child = ⊥))
        m.source ← q;  m.grandparent ← e.parent;  m.ssn ← e.ssn;  m.rsn ← e.rsn
        send m to p
    send "q exit replay" to p

Figure 7: Recovery Protocol: Non-Faulty Process

Theorem 4 Family-based logging satisfies the stable log property.

Proof: Consider three processes: s, p that is a child of s, and q that is a child of p. If p crashes, then the values of L_s that are fully logged may no longer be fully logged, and the value of L_q becomes undefined. From Theorem 2, if p sent a message m that was delivered by some process, then p will send the same m (that is, with the same data and send sequence number) when recovering, since otherwise the recovered global state would not be consistent. By doing so, p will reenter m into SendLog_p with the same values as were in the log before p's crash. The data structure ReceiveLog_p is rebuilt from the entries in RsnLog_s, for all parents s of p, that have the child value set to either p or ⊥. Sending the latter entries (those with child = ⊥) is necessary because s may not know whether a message delivered by s was fully logged or not, and so it ensures that all messages s has delivered are fully logged. Finally, RsnLog_p is rebuilt from the information collected from the receive logs of p's children (contributing the child and rsn fields for the messages that were fully logged) and the send logs of p's parents (contributing the parent and ssn fields). □
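As a concrete check of the recovery procedure, the short script below replays the data of the Figure 2 snapshot under the assumption that p3 crashes just after time t3. Its parents' send logs supply the message contents, its children's receive logs supply the (grandparent, ssn, rsn) triples, and a join on the key (sender, ssn) yields the replay order. The data literals are copied from Figure 2; everything else is an illustrative sketch, not the implementation.

    # Recovery of p3 from the Figure 2 snapshot (illustrative sketch).
    # Parents' send-log entries with dest = p3, as (data, ssn) per sender:
    parent_send_logs = {"p1": [("d2", 1), ("d5", 2)], "p2": [("d1", 1)]}
    # Children's receive-log entries with parent = p3, as (grandparent, ssn, rsn):
    child_receive_logs = {"p4": [("p1", 1, 1), ("p1", 2, 3)],
                          "p5": [("p1", 1, 1), ("p2", 1, 2)]}

    # Match receive sequence numbers to messages on the key (sender, ssn).
    rsn_of = {}
    for entries in child_receive_logs.values():
        for grandparent, ssn, rsn in entries:
            rsn_of[(grandparent, ssn)] = rsn       # duplicate triples agree

    replay = []
    for sender, entries in parent_send_logs.items():
        for data, ssn in entries:
            replay.append((rsn_of.get((sender, ssn)), sender, ssn, data))

    # Fully logged messages first, in rsn order; here all three are fully logged.
    replay.sort(key=lambda e: (e[0] is None, e[0]))
    print(replay)   # [(1, 'p1', 1, 'd2'), (2, 'p2', 1, 'd1'), (3, 'p1', 2, 'd5')]

The recovered p3 therefore redelivers m2, m1, and m5 in their original order and, being deterministic, resends m3, m4, and m6.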

4.4 Optimizations

In this section we discuss a number of techniques for reducing the cost of family-based logging. Our goal is to reduce the quantity of information piggybacked on each message without adding too much complexity to the failure recovery protocol.

For example, consider the grandparent and ssn fields of the receive log. Together, these fields uniquely determine a single message recorded in the appropriate send log; but, since channels are FIFO, the grandparent and rsn fields can also be used to uniquely determine a message. Suppose that process p delivers message m and then later fails. Note that if m.rsn has been logged, then m is relevant, and so every message delivered by p before m must also be relevant, hence fully logged. Thus, by using the grandparent and rsn fields from the receive logs of its children, the recovering p can compute the order in which its parents sent it relevant messages. If p also constructs the replayed messages from each of its parents in separate sequences, each ordered by send sequence number, then the ordered sequence of parents can be merged with the ordered sequences of messages so that all relevant messages are matched with their original receive sequence numbers. Once the ssn field is eliminated from the receive log, the RSN log no longer needs an ssn field, and so ssn values need not be piggybacked (except for the single m.ssn associated with a message m itself).

Not only can we eliminate send sequence numbers from piggybacks, we can also eliminate most of the receive sequence numbers. Consider a single message m from p to q; m carries a sequence of entries from RsnLog_p. If we constrain p to piggyback in receive sequence number order, then only the lowest rsn value attached to m need be piggybacked, and q can compute the other receive sequence numbers itself. Thus, p need piggyback only a sequence of parents, together with a single receive sequence number.

Sender-Based Optimization

Suppose there is some process p with only a single child, q. In this special case, we can easily optimize the usual message logging protocol. Because of FIFO channels, p need never piggyback the same entry in RsnLog_p to q more than once. The message logging protocol for such a process p can thus be modified as follows: whenever p sends a message m to q in state σ_p[RSN_p], then p's piggyback sequence number, PSN_p, can immediately be set to RSN_p (without waiting for q to acknowledge m).

This optimization can be generalized to an arbitrary process. For any process p, no entry in RsnLog_p need be piggybacked to any single destination more than once. To implement this, p must keep track of a different piggyback sequence number for each child of p. We store these in a new data structure: the PSN table. For any pair of processes p and q, let Q_p be the set of entries of RsnLog_p that have been piggybacked from p to q. Then define PsnTable_p[q] = max{e.rsn : e ∈ Q_p}. If PSN_p is maintained as before (based on acknowledgements), then whenever p sends a message m to q, p need only piggyback all e ∈ RsnLog_p such that e.rsn > max{PSN_p, PsnTable_p[q]}. After m has been sent, p must update PsnTable_p[q] to reflect the information piggybacked on m.

Receiver-Based Optimization

Using the above optimizations, we have transformed the piggyback data structure from a sequence of the form {(p1, ssn1, rsn1), ..., (pk, ssnk, rsnk)} to a sequence of the form {rsn1, p1, p2, ..., pk}, such that no information is piggybacked to a single child more than once. We now consider one method of further compressing the piggybacked sequence of parent ids.

Suppose there is some process p with only a single parent, q. In this case, we can again easily optimize the usual message logging protocol. Note that the send sequence numbers assigned by q define a total ordering of the messages delivered by p. Thus, p need not keep any RSN log at all. Should p fail, q's send log contains sufficient information to recover p. (Likewise, p does not need RSN_p, PSN_p, and PsnTable_p; however, p does need SsnTable_p and ReceiveLog_p in case q fails.)

Process p logs receive sequence numbers in order to record the nondeterministic characteristics of a run. Since channels are FIFO, however, p need only log the order in which p interleaves messages from different parents. If a message logging protocol records the interleaving of messages from different parents for a process p, then this information, together with the send logs of p's parents, is sufficient to recover p from failure.

Family-based logging can easily accommodate this general optimization. Each process p can maintain an interleave sequence number, ISN_p. ISN_p is initially zero, and is incremented in state σ_p[ℓ] if ℓ = 1 or if the source of the ℓth message is different from the source of the (ℓ−1)th message. The RSN log can then be modified to record interleave sequence numbers. An entry e ∈ RsnLog_p contains two new fields: field e.isn equals the value of ISN_p in state σ_p[e.rsn], and field e.runlength equals the number of consecutive messages delivered by p from e.parent since state σ_p[e.rsn]. The message logging protocol can then be modified so that p adds a new entry to its RSN log only when p increments its interleave sequence number. Thus, e.isn serves to count RSN log entries just as e.rsn does in the unoptimized protocol. If p delivers a message but does not increment its interleave sequence number, then p has delivered a run of messages from the same parent, and p must increment the runlength field in the last entry of its RSN log. Together with e.rsn, e.runlength encodes all the receive sequence numbers corresponding to entry e ∈ RsnLog_p.

Given these modifications, consider a message logging protocol in which PSN_p and PsnTable_p contain interleave sequence numbers rather than receive sequence numbers, but otherwise function as before. Whenever p delivers consecutive messages from a single parent, RsnLog_p will stop growing, and p will piggyback nothing on its outgoing messages. Whenever p interleaves messages from different parents, p will record the interleaving in new entries of RsnLog_p, and then piggyback the information in these new entries on outgoing messages as before. (Note that the sequence of parent ids actually piggybacked by p can also be compressed with run-length encoding, but this optimization is independent of the changes to the message logging data structures described here.)

Finally, we must strengthen our failure recovery protocol in order to guarantee correctness. Suppose process p fails. Note that p may have delivered many consecutive relevant messages from one parent before crashing. Only the first receive sequence number of this sequence has necessarily been piggybacked and recorded in a child's receive log. However, if we require that p deliver replayed messages such that runs of consecutive messages from the same parent are maximized (subject to the receive sequence number ordering imposed by p's replay log), then failure recovery will return the system to a consistent state.

5 Performance

We have described in Section 4 an optimal message logging protocol, in the sense that it requires no extra messages and no extra processes in a failure-free run. However, this measure of optimality ignores the most interesting cost of family-based logging: the extra information that must be piggybacked on application messages. In this section we discuss theoretical bounds and empirical measurements of this cost. We also briefly discuss failure recovery performance. Our discussion concerns family-based logging as described in Section 4.3 with the additional optimization provided by the PSN table of Section 4.4.

5.1 Predicted Performance

For a message m from p to q, we measure the quantity of piggybacked information by the number of RSN log entries contained in m. Unfortunately, for an arbitrary m, we can only bound this number by the total number of messages delivered by p. However, let α denote the average piggyback size over all messages sent during a run R of a set of processes P, |P| = n, in which no process crashes. That is, α equals the total number of piggybacked RSN log entries divided by the total number of send events.

We can bound α as follows. If R contains s send events, r receive events, d delivery events, and f transient channel failures, then f = s − r. Note that the total number of RSN log entries recorded by all p ∈ P is at most d. Using the PSN table optimization, each entry can be piggybacked no more than n − 1 + f times; thus, the total number of piggybacked RSN log entries is at most d(n − 1 + f). Dividing by the total number of send events, we obtain α ≤ d(n − 1 + f)/(r + f). Since d ≤ r and f ≥ 0, this bound simplifies to α ≤ n − 1 + f. This bound is achieved if each p ∈ P runs the following application:

    do forever
        for all q ∈ P such that q ≠ p
            send message to q
        for i ← 1 to n − 1
            receive message

The second time the do-loop iterates, each process must piggyback n − 1 RSN log entries on every outgoing message. Assuming no channel failures, the average piggyback size will quickly approach n − 1 as the loop repeats. This example illustrates the worst possible environment for family-based logging.

We expect that many applications will not approach the n − 1 + f worst case. The practical behavior of family-based logging depends largely on two factors. First, the frequency of acknowledgements has an obvious effect: piggyback size will decrease if acknowledgements arrive promptly. Thus, a positive acknowledgement protocol is an ideal setting for family-based logging. Negative acknowledgement protocols may delay acknowledgements and increase piggyback size compared to positive acknowledgement schemes; however, the n − 1 + f bound applies regardless of the underlying transport protocol. Second, the application communication pattern strongly affects the performance of family-based logging. As our example shows, family-based logging suffers when each process has a large family of active parents and children. Note that in this case the sets of parents and children will tend to intersect: each process will tend to send messages to and receive messages from some common set of processes. In such an environment, a negative acknowledgement protocol should be very effective, since extra acknowledgement packets will rarely be sent. Thus, in worst-case applications, family-based logging should actually perform better using a negative acknowledgement protocol.
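A small calculation makes the bound concrete. The run parameters below are invented, chosen only to show how α behaves with and without channel failures; they are not measurements.

    # alpha <= d(n-1+f)/(r+f) <= n-1+f, using d <= r and f >= 0.
    def piggyback_bound(n, d, r, f):
        """Upper bound on the average piggyback size for a crash-free run with
        n processes, d delivery events, r receive events, f channel failures."""
        assert d <= r and f >= 0
        return d * (n - 1 + f) / (r + f)

    # Four processes, no lost transmissions: the bound is n - 1 = 3.
    print(piggyback_bound(n=4, d=1000, r=1000, f=0))    # 3.0
    # The same run with 10 lost transmissions: the bound grows toward n - 1 + f.
    print(piggyback_bound(n=4, d=990, r=1000, f=10))    # 12.74...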

5.2 Observed Performance

We have completed an initial implementation of the message logging and failure recovery protocols described in Section 4.3, with the addition of the PSN table described in Section 4.4. In addition, we have developed a special application to measure the performance of family-based logging under a wide variety of conditions. This section describes our current implementation and presents some empirical results.

Experimental Setup

We implemented family-based logging and have used it on a set of four Sun workstations (two SPARC2s and two IPXs) running SunOS. Communications were implemented using UDP datagram sockets, layered over IP on a 10 Mbit/sec Ethernet; we used Sun LWP to manage program concurrency. Because the family-based logging protocol needs to receive the acknowledgements from the underlying data link layer protocol, we implemented a simple data link layer using a positive acknowledgement sliding window protocol; the data link layer can send 1024-byte messages between any pair of processes at a sustained rate of 2.5 milliseconds per message.

Application

We designed a unique application in order to test our implementation of family-based logging. The application is controlled by a master process. Based on user input, the master determines the number of additional processes to create and the characteristics of the communication among these processes. The master then writes a script for each process and sends each script to the appropriate process. Once each process has received its script, it executes the instructions contained in the script (e.g., "send to p", "receive") while monitoring message logging statistics.

There are two important properties of this application. First, it reduces application computational overhead to near zero. Writing, sending, and receiving scripts is completed before any performance data is collected. Each application event requires only a single array access and a test of the resulting value; the rest of the measured time is devoted to interprocess communication. Thus, we measure the message logging overhead as compared to virtually pure communication cost. Second, our single application can model a wide range of application communication patterns in a controlled and repeatable manner. In addition to specifying the number of processes and the number and size of application messages, the user can control two parameters: the blast factor and the branch factor. The blast factor determines the relative frequency of sends and receives for each process. A process in a "blasty" run typically sends many messages before receiving any, and then receives many before sending again; in a non-blasty run it usually alternates between a single send and a single receive. The branch factor determines the relative number of parents and children for each process. A process in a "branchy" run typically sends messages to and receives messages from many different processes; in a non-branchy run it communicates with a small subset of processes.

Results

We measured message logging overhead for two distinct application scenarios. The worst-case example of Section 5.1 can be tested with a script that is maximally blasty and branchy; we call this scenario blast. To test family-based logging in a different setting, we set the blast factor to its minimum value and kept the branch factor at its maximum value; a process in such a run typically alternates between sending one message and receiving one message, and communicates with all other processes equally often. We call this scenario spray.

In Figure 8 we summarize our results. Both scenarios send and receive messages system-wide (approximately 1250 at each process). Using the Sun system clock, we measured the total run-time for three different logging settings: first with no message logging, then with piggybacking only (that is, the send log is not updated), and finally with full message logging. The piggybacking figures do not truly isolate the cost of piggybacking, but they do suggest the relative cost of piggybacking for different scenarios. For example, spray spends more time piggybacking less data than blast: spray almost always has to recompute a new piggyback for each send, while blast can re-use the same piggyback for each sequence of sends. Our message logging overhead of 25 percent is comparable to the overhead of optimized pessimistic sender-based logging [5]. (That implementation of sender-based logging does not block and sends no extra messages, and so avoids the theoretical cost of the protocol entirely.)

We also tested failure recovery by crashing a single process half-way through each scenario, and at the end of each scenario. Recovery in blast is significantly slower than in spray: blast causes complete redundancy in receive logs, giving a recovering process extra work to do, whereas spray tends to piggyback each RSN log entry to only one or two children.
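The two scenarios can be pictured as script generators. The sketch below is only our reading of the description above, with invented names; the actual test harness is driven by the master process and is not reproduced here.

    # Illustrative script generators for the two measured scenarios.
    def blast_script(others, rounds, burst):
        # Maximally blasty and branchy: long runs of sends to every other
        # process, then long runs of receives, as in the worst case of Section 5.1.
        script = []
        for _ in range(rounds):
            script += [("send", q) for q in others for _ in range(burst)]
            script += [("receive",)] * (burst * len(others))
        return script

    def spray_script(others, rounds):
        # Minimally blasty, maximally branchy: alternate a single send and a
        # single receive, cycling over all other processes equally often.
        script = []
        for i in range(rounds):
            script.append(("send", others[i % len(others)]))
            script.append(("receive",))
        return script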

[Figure 8: Family-Based Logging Performance (time in seconds). For each application scenario (spray and blast), the table reports run time with no logging, with piggybacking only, and with full logging; half-run and full-run recovery times; average piggyback size; piggyback overhead; and log overhead. The measured full-logging overhead is 25% for both scenarios.]

6 Conclusions and Further Directions

In this paper, we developed a message logging protocol that introduces no additional blocking to the application and does not create orphans. Furthermore, the protocol is very efficient in that it only sends the application messages (possibly resent due to link failures) and their acknowledgements. Thus, our protocol does not use any more messages in a failure-free run than a message delivery protocol for a system in which transient link failures can occur but processes do not crash. The protocol may make application messages arbitrarily larger, but from our observations the average amount of overhead is small.

The major limitation of this protocol is that it can only withstand a sequence of (process crash, process recovery) pairs. For example, if process p sends messages to process q and both p and q simultaneously crash, then orphans may be created, and q may find itself trying to reconstruct a message m ∈ L_q for which there exists only a receive sequence number. However, we are designing a protocol that can tolerate f ≥ 1 simultaneous crashes, implements the stable log property, and has the same efficiency as the protocol presented here. We are trying to prove this protocol correct by extending the abstract message logging protocol described in Section 4.2.

We are also examining how the message logging protocol can be further optimized by using the semantics of the application. For example, this research was first motivated by discussions with a group at IBM FSC in the AAS project [4]. In this system, processes are assumed to be usually functional and can recover by simply receiving new messages. This idea can be generalized to the existence of messages that a process p can receive in any order with respect to messages from other processes without changing the sequence of messages that p subsequently sends. If such optimizations are taken into account, then the amount of logged information can be further diminished.

Acknowledgements

This work originated through discussions with Fred B. Schneider and with Jon Dehn and Burnie Witt of IBM FSC. We would like to thank Robbert van Renesse for his help in understanding the intricacies of Sun LWP and socket-level communication.

References

[1] Anita Borg, J. Baumbach, and S. Glazer. A message system supporting fault tolerance. In Proceedings of the Symposium on Operating Systems Principles, pages 90-99. ACM SIGOPS, October.

[2] Navin Budhiraja, Keith Marzullo, Fred B. Schneider, and Sam Toueg. Primary-backup protocols: Lower bounds and optimal implementations. In Proceedings of the Third IFIP Conference on Dependable Computing for Critical Applications, September.

[3] K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63-75, February.

[4] F. Cristian, B. Dancey, and J. Dehn. Fault-tolerance in the Advanced Automation System. In Digest of Papers: 20th IEEE International Conference on Fault-Tolerant Computing. IEEE Computer Society, June.

[5] D. B. Johnson. Distributed System Fault Tolerance Using Message Logging and Checkpointing. PhD thesis, Rice University, December. Available as report COMP TR.

[6] D. B. Johnson. Personal communication, November.

[7] D. B. Johnson and W. Zwaenepoel. Sender-based message logging. In Digest of Papers: 17th Annual International Symposium on Fault-Tolerant Computing, pages 14-19. IEEE Computer Society, June.

[8] D. B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11:462-491.

[9] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July.

[10] M. L. Powell and D. L. Presotto. Publishing: A reliable broadcast communication mechanism. In Proceedings of the Ninth Symposium on Operating System Principles, pages 100-109. ACM SIGOPS, October.

[11] Fred B. Schneider. Byzantine generals in action: Implementing fail-stop processors. ACM Transactions on Computer Systems, 2(2):145-154, May.

[12] Fred B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(3):299-319, September.

[13] A. P. Sistla and J. L. Welch. Efficient distributed recovery using message logging. In Proceedings of the Eighth Symposium on Principles of Distributed Computing, pages 223-238. ACM SIGACT/SIGOPS, August.

[14] R. Strom and S. Yemini. Optimistic recovery in distributed systems. ACM Transactions on Computer Systems, 3(3):204-226, April.


More information

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University Fault Tolerance Part II CS403/534 Distributed Systems Erkay Savas Sabanci University 1 Reliable Group Communication Reliable multicasting: A message that is sent to a process group should be delivered

More information

A Case for Two-Level Distributed Recovery Schemes. Nitin H. Vaidya. reduce the average performance overhead.

A Case for Two-Level Distributed Recovery Schemes. Nitin H. Vaidya.   reduce the average performance overhead. A Case for Two-Level Distributed Recovery Schemes Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-31, U.S.A. E-mail: vaidya@cs.tamu.edu Abstract Most distributed

More information

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

Page 1 FAULT TOLERANT SYSTEMS. Coordinated Checkpointing. Time-Based Synchronization. A Coordinated Checkpointing Algorithm

Page 1 FAULT TOLERANT SYSTEMS. Coordinated Checkpointing. Time-Based Synchronization. A Coordinated Checkpointing Algorithm FAULT TOLERANT SYSTEMS Coordinated http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Chapter 6 II Uncoordinated checkpointing may lead to domino effect or to livelock Example: l P wants to take a

More information

Today CSCI Recovery techniques. Recovery. Recovery CAP Theorem. Instructor: Abhishek Chandra

Today CSCI Recovery techniques. Recovery. Recovery CAP Theorem. Instructor: Abhishek Chandra Today CSCI 5105 Recovery CAP Theorem Instructor: Abhishek Chandra 2 Recovery Operations to be performed to move from an erroneous state to an error-free state Backward recovery: Go back to a previous correct

More information

FAULT TOLERANT SYSTEMS

FAULT TOLERANT SYSTEMS FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 17 - Checkpointing II Chapter 6 - Checkpointing Part.17.1 Coordinated Checkpointing Uncoordinated checkpointing may lead

More information

Distributed Systems (ICE 601) Fault Tolerance

Distributed Systems (ICE 601) Fault Tolerance Distributed Systems (ICE 601) Fault Tolerance Dongman Lee ICU Introduction Failure Model Fault Tolerance Models state machine primary-backup Class Overview Introduction Dependability availability reliability

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Using the Holey Brick Tree for Spatial Data. in General Purpose DBMSs. Northeastern University

Using the Holey Brick Tree for Spatial Data. in General Purpose DBMSs. Northeastern University Using the Holey Brick Tree for Spatial Data in General Purpose DBMSs Georgios Evangelidis Betty Salzberg College of Computer Science Northeastern University Boston, MA 02115-5096 1 Introduction There is

More information

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA A taxonomy of race conditions. D. P. Helmbold, C. E. McDowell UCSC-CRL-94-34 September 28, 1994 Board of Studies in Computer and Information Sciences University of California, Santa Cruz Santa Cruz, CA

More information

A Survey of Rollback-Recovery Protocols in Message-Passing Systems

A Survey of Rollback-Recovery Protocols in Message-Passing Systems A Survey of Rollback-Recovery Protocols in Message-Passing Systems Mootaz Elnozahy * Lorenzo Alvisi Yi-Min Wang David B. Johnson June 1999 CMU-CS-99-148 (A revision of CMU-CS-96-181) School of Computer

More information

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi 1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.

More information

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs 1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds

More information

Lixia Zhang M. I. T. Laboratory for Computer Science December 1985

Lixia Zhang M. I. T. Laboratory for Computer Science December 1985 Network Working Group Request for Comments: 969 David D. Clark Mark L. Lambert Lixia Zhang M. I. T. Laboratory for Computer Science December 1985 1. STATUS OF THIS MEMO This RFC suggests a proposed protocol

More information

Distributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance

Distributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Fault Tolerance Version: December 2, 2010 2 / 65 Contents Chapter

More information

A SURVEY AND PERFORMANCE ANALYSIS OF CHECKPOINTING AND RECOVERY SCHEMES FOR MOBILE COMPUTING SYSTEMS

A SURVEY AND PERFORMANCE ANALYSIS OF CHECKPOINTING AND RECOVERY SCHEMES FOR MOBILE COMPUTING SYSTEMS International Journal of Computer Science and Communication Vol. 2, No. 1, January-June 2011, pp. 89-95 A SURVEY AND PERFORMANCE ANALYSIS OF CHECKPOINTING AND RECOVERY SCHEMES FOR MOBILE COMPUTING SYSTEMS

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance Fault Tolerance Basic Concepts Being fault tolerant is strongly related to what

More information

AN EFFICIENT ALGORITHM IN FAULT TOLERANCE FOR ELECTING COORDINATOR IN DISTRIBUTED SYSTEMS

AN EFFICIENT ALGORITHM IN FAULT TOLERANCE FOR ELECTING COORDINATOR IN DISTRIBUTED SYSTEMS International Journal of Computer Engineering & Technology (IJCET) Volume 6, Issue 11, Nov 2015, pp. 46-53, Article ID: IJCET_06_11_005 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=6&itype=11

More information

Problem 7. Problem 8. Problem 9

Problem 7. Problem 8. Problem 9 Problem 7 To best answer this question, consider why we needed sequence numbers in the first place. We saw that the sender needs sequence numbers so that the receiver can tell if a data packet is a duplicate

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

2. Time and Global States Page 1. University of Freiburg, Germany Department of Computer Science. Distributed Systems

2. Time and Global States Page 1. University of Freiburg, Germany Department of Computer Science. Distributed Systems 2. Time and Global States Page 1 University of Freiburg, Germany Department of Computer Science Distributed Systems Chapter 3 Time and Global States Christian Schindelhauer 12. May 2014 2. Time and Global

More information

Distributed Systems COMP 212. Revision 2 Othon Michail

Distributed Systems COMP 212. Revision 2 Othon Michail Distributed Systems COMP 212 Revision 2 Othon Michail Synchronisation 2/55 How would Lamport s algorithm synchronise the clocks in the following scenario? 3/55 How would Lamport s algorithm synchronise

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers Clock Synchronization Synchronization Tanenbaum Chapter 6 plus additional papers Fig 6-1. In a distributed system, each machine has its own clock. When this is the case, an event that occurred after another

More information

MESSAGE INDUCED SOFT CHEKPOINTING FOR RECOVERY IN MOBILE ENVIRONMENTS

MESSAGE INDUCED SOFT CHEKPOINTING FOR RECOVERY IN MOBILE ENVIRONMENTS MESSAGE INDUCED SOFT CHEKPOINTING FOR RECOVERY IN MOBILE ENVIRONMENTS Ruchi Tuli 1 & Parveen Kumar 2 1 Research Scholar, Singhania University, Pacheri Bari (Rajasthan) India 2 Professor, Meerut Institute

More information

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli Eect of fan-out on the Performance of a Single-message cancellation scheme Atul Prakash (Contact Author) Gwo-baw Wu Seema Jetli Department of Electrical Engineering and Computer Science University of Michigan,

More information

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme On Checkpoint Latency Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 E-mail: vaidya@cs.tamu.edu Web: http://www.cs.tamu.edu/faculty/vaidya/ Abstract

More information

Network. Department of Statistics. University of California, Berkeley. January, Abstract

Network. Department of Statistics. University of California, Berkeley. January, Abstract Parallelizing CART Using a Workstation Network Phil Spector Leo Breiman Department of Statistics University of California, Berkeley January, 1995 Abstract The CART (Classication and Regression Trees) program,

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Dep. Systems Requirements

Dep. Systems Requirements Dependable Systems Dep. Systems Requirements Availability the system is ready to be used immediately. A(t) = probability system is available for use at time t MTTF/(MTTF+MTTR) If MTTR can be kept small

More information

APPLICATION-TRANSPARENT ERROR-RECOVERY TECHNIQUES FOR MULTICOMPUTERS

APPLICATION-TRANSPARENT ERROR-RECOVERY TECHNIQUES FOR MULTICOMPUTERS Proceedings of the Fourth onference on Hypercubes, oncurrent omputers, and Applications Monterey, alifornia, pp. 103-108, March 1989. APPLIATION-TRANSPARENT ERROR-REOVERY TEHNIQUES FOR MULTIOMPUTERS Tiffany

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

21. Distributed Algorithms

21. Distributed Algorithms 21. Distributed Algorithms We dene a distributed system as a collection of individual computing devices that can communicate with each other [2]. This denition is very broad, it includes anything, from

More information

Three Models. 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1. DEPT. OF Comp Sc. and Engg., IIT Delhi

Three Models. 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1. DEPT. OF Comp Sc. and Engg., IIT Delhi DEPT. OF Comp Sc. and Engg., IIT Delhi Three Models 1. CSV888 - Distributed Systems 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1 Index - Models to study [2] 1. LAN based systems

More information

size, runs an existing induction algorithm on the rst subset to obtain a rst set of rules, and then processes each of the remaining data subsets at a

size, runs an existing induction algorithm on the rst subset to obtain a rst set of rules, and then processes each of the remaining data subsets at a Multi-Layer Incremental Induction Xindong Wu and William H.W. Lo School of Computer Science and Software Ebgineering Monash University 900 Dandenong Road Melbourne, VIC 3145, Australia Email: xindong@computer.org

More information

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc Dept. of Computer Eng. and

More information

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

A Synchronization Algorithm for Distributed Systems

A Synchronization Algorithm for Distributed Systems A Synchronization Algorithm for Distributed Systems Tai-Kuo Woo Department of Computer Science Jacksonville University Jacksonville, FL 32211 Kenneth Block Department of Computer and Information Science

More information

Process groups and message ordering

Process groups and message ordering Process groups and message ordering If processes belong to groups, certain algorithms can be used that depend on group properties membership create ( name ), kill ( name ) join ( name, process ), leave

More information

Fault Tolerance. Basic Concepts

Fault Tolerance. Basic Concepts COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time

More information

Novel low-overhead roll-forward recovery scheme for distributed systems

Novel low-overhead roll-forward recovery scheme for distributed systems Novel low-overhead roll-forward recovery scheme for distributed systems B. Gupta, S. Rahimi and Z. Liu Abstract: An efficient roll-forward checkpointing/recovery scheme for distributed systems has been

More information

1 Introduction A mobile computing system is a distributed system where some of nodes are mobile computers [3]. The location of mobile computers in the

1 Introduction A mobile computing system is a distributed system where some of nodes are mobile computers [3]. The location of mobile computers in the Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems Ravi Prakash and Mukesh Singhal Department of Computer and Information Science The Ohio State University Columbus, OH 43210. e-mail:

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Distributed algorithms

Distributed algorithms Distributed algorithms Prof R. Guerraoui lpdwww.epfl.ch Exam: Written Reference: Book - Springer Verlag http://lpd.epfl.ch/site/education/da - Introduction to Reliable (and Secure) Distributed Programming

More information

Distributed Systems Fault Tolerance

Distributed Systems Fault Tolerance Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable

More information

Local Stabilizer. Yehuda Afek y Shlomi Dolev z. Abstract. A local stabilizer protocol that takes any on-line or o-line distributed algorithm and

Local Stabilizer. Yehuda Afek y Shlomi Dolev z. Abstract. A local stabilizer protocol that takes any on-line or o-line distributed algorithm and Local Stabilizer Yehuda Afek y Shlomi Dolev z Abstract A local stabilizer protocol that takes any on-line or o-line distributed algorithm and converts it into a synchronous self-stabilizing algorithm with

More information

A Review of Checkpointing Fault Tolerance Techniques in Distributed Mobile Systems

A Review of Checkpointing Fault Tolerance Techniques in Distributed Mobile Systems A Review of Checkpointing Fault Tolerance Techniques in Distributed Mobile Systems Rachit Garg 1, Praveen Kumar 2 1 Singhania University, Department of Computer Science & Engineering, Pacheri Bari (Rajasthan),

More information

The Timed Asynchronous Distributed System Model By Flaviu Cristian and Christof Fetzer

The Timed Asynchronous Distributed System Model By Flaviu Cristian and Christof Fetzer The Timed Asynchronous Distributed System Model By Flaviu Cristian and Christof Fetzer - proposes a formal definition for the timed asynchronous distributed system model - presents measurements of process

More information

Optimistic Recovery in Distributed Systems

Optimistic Recovery in Distributed Systems Optimistic Recovery in Distributed Systems ROBERT E. STROM and SHAULA YEMINI IBM Thomas J. Watson Research Center Optimistic Recovery is a new technique supporting application-independent transparent recovery

More information

Enhancing Integrated Layer Processing using Common Case. Anticipation and Data Dependence Analysis. Extended Abstract

Enhancing Integrated Layer Processing using Common Case. Anticipation and Data Dependence Analysis. Extended Abstract Enhancing Integrated Layer Processing using Common Case Anticipation and Data Dependence Analysis Extended Abstract Philippe Oechslin Computer Networking Lab Swiss Federal Institute of Technology DI-LTI

More information

THE TRANSPORT LAYER UNIT IV

THE TRANSPORT LAYER UNIT IV THE TRANSPORT LAYER UNIT IV The Transport Layer: The Transport Service, Elements of Transport Protocols, Congestion Control,The internet transport protocols: UDP, TCP, Performance problems in computer

More information

The problem of minimizing the elimination tree height for general graphs is N P-hard. However, there exist classes of graphs for which the problem can

The problem of minimizing the elimination tree height for general graphs is N P-hard. However, there exist classes of graphs for which the problem can A Simple Cubic Algorithm for Computing Minimum Height Elimination Trees for Interval Graphs Bengt Aspvall, Pinar Heggernes, Jan Arne Telle Department of Informatics, University of Bergen N{5020 Bergen,

More information

Clock and Time. THOAI NAM Faculty of Information Technology HCMC University of Technology

Clock and Time. THOAI NAM Faculty of Information Technology HCMC University of Technology Clock and Time THOAI NAM Faculty of Information Technology HCMC University of Technology Using some slides of Prashant Shenoy, UMass Computer Science Chapter 3: Clock and Time Time ordering and clock synchronization

More information

Specifying and Proving Broadcast Properties with TLA

Specifying and Proving Broadcast Properties with TLA Specifying and Proving Broadcast Properties with TLA William Hipschman Department of Computer Science The University of North Carolina at Chapel Hill Abstract Although group communication is vitally important

More information

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint

More information

Ecient Redo Processing in. Jun-Lin Lin. Xi Li. Southern Methodist University

Ecient Redo Processing in. Jun-Lin Lin. Xi Li. Southern Methodist University Technical Report 96-CSE-13 Ecient Redo Processing in Main Memory Databases by Jun-Lin Lin Margaret H. Dunham Xi Li Department of Computer Science and Engineering Southern Methodist University Dallas, Texas

More information

Unit 2.

Unit 2. Unit 2 Unit 2 Topics Covered: 1. PROCESS-TO-PROCESS DELIVERY 1. Client-Server 2. Addressing 2. IANA Ranges 3. Socket Addresses 4. Multiplexing and Demultiplexing 5. Connectionless Versus Connection-Oriented

More information

to automatically generate parallel code for many applications that periodically update shared data structures using commuting operations and/or manipu

to automatically generate parallel code for many applications that periodically update shared data structures using commuting operations and/or manipu Semantic Foundations of Commutativity Analysis Martin C. Rinard y and Pedro C. Diniz z Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106 fmartin,pedrog@cs.ucsb.edu

More information

Consensus and related problems

Consensus and related problems Consensus and related problems Today l Consensus l Google s Chubby l Paxos for Chubby Consensus and failures How to make process agree on a value after one or more have proposed what the value should be?

More information

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 18 Transaction Processing and Database Manager In the previous

More information

2 The Service Provision Problem The formulation given here can also be found in Tomasgard et al. [6]. That paper also details the background of the mo

2 The Service Provision Problem The formulation given here can also be found in Tomasgard et al. [6]. That paper also details the background of the mo Two-Stage Service Provision by Branch and Bound Shane Dye Department ofmanagement University of Canterbury Christchurch, New Zealand s.dye@mang.canterbury.ac.nz Asgeir Tomasgard SINTEF, Trondheim, Norway

More information

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o Reconstructing a Binary Tree from its Traversals in Doubly-Logarithmic CREW Time Stephan Olariu Michael Overstreet Department of Computer Science, Old Dominion University, Norfolk, VA 23529 Zhaofang Wen

More information

Distributed Systems

Distributed Systems 15-440 Distributed Systems 11 - Fault Tolerance, Logging and Recovery Tuesday, Oct 2 nd, 2018 Logistics Updates P1 Part A checkpoint Part A due: Saturday 10/6 (6-week drop deadline 10/8) *Please WORK hard

More information

Breakpoints and Halting in Distributed Programs

Breakpoints and Halting in Distributed Programs 1 Breakpoints and Halting in Distributed Programs Barton P. Miller Jong-Deok Choi Computer Sciences Department University of Wisconsin-Madison 1210 W. Dayton Street Madison, Wisconsin 53706 Abstract Interactive

More information

Implementation of Process Networks in Java

Implementation of Process Networks in Java Implementation of Process Networks in Java Richard S, Stevens 1, Marlene Wan, Peggy Laramie, Thomas M. Parks, Edward A. Lee DRAFT: 10 July 1997 Abstract A process network, as described by G. Kahn, is a

More information

A Dag-Based Algorithm for Distributed Mutual Exclusion. Kansas State University. Manhattan, Kansas maintains [18]. algorithms [11].

A Dag-Based Algorithm for Distributed Mutual Exclusion. Kansas State University. Manhattan, Kansas maintains [18]. algorithms [11]. A Dag-Based Algorithm for Distributed Mutual Exclusion Mitchell L. Neilsen Masaaki Mizuno Department of Computing and Information Sciences Kansas State University Manhattan, Kansas 66506 Abstract The paper

More information

Optimistic Implementation of. Bulk Data Transfer Protocols. John B. Carter. Willy Zwaenepoel. Rice University. Houston, Texas.

Optimistic Implementation of. Bulk Data Transfer Protocols. John B. Carter. Willy Zwaenepoel. Rice University. Houston, Texas. Optimistic Implementation of Bulk Data Transfer Protocols John B. Carter Willy Zwaenepoel Department of Computer Science Rice University Houston, Texas Abstract During a bulk data transfer over a high

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 07 (version 16th May 2006) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.20. Tel:

More information

A Reliable Broadcast System

A Reliable Broadcast System A Reliable Broadcast System Yuchen Dai, Xiayi Huang, Diansan Zhou Department of Computer Sciences and Engineering Santa Clara University December 10 2013 Table of Contents 2 Introduction......3 2.1 Objective...3

More information

Ruminations on Domain-Based Reliable Broadcast

Ruminations on Domain-Based Reliable Broadcast Ruminations on Domain-Based Reliable Broadcast Svend Frølund Fernando Pedone Hewlett-Packard Laboratories Palo Alto, CA 94304, USA Abstract A distributed system is no longer confined to a single administrative

More information

MODELS OF DISTRIBUTED SYSTEMS

MODELS OF DISTRIBUTED SYSTEMS Distributed Systems Fö 2/3-1 Distributed Systems Fö 2/3-2 MODELS OF DISTRIBUTED SYSTEMS Basic Elements 1. Architectural Models 2. Interaction Models Resources in a distributed system are shared between

More information

Steering. Stream. User Interface. Stream. Manager. Interaction Managers. Snapshot. Stream

Steering. Stream. User Interface. Stream. Manager. Interaction Managers. Snapshot. Stream Agent Roles in Snapshot Assembly Delbert Hart Dept. of Computer Science Washington University in St. Louis St. Louis, MO 63130 hart@cs.wustl.edu Eileen Kraemer Dept. of Computer Science University of Georgia

More information

Lecture 11 Overview. Last Lecture. This Lecture. Next Lecture. Medium Access Control. Flow and error control Source: Sections , 23.

Lecture 11 Overview. Last Lecture. This Lecture. Next Lecture. Medium Access Control. Flow and error control Source: Sections , 23. Last Lecture Lecture 11 Overview Medium Access Control This Lecture Flow and error control Source: Sections 11.1-11.2, 23.2 Next Lecture Local Area Networks 1 Source: Sections 13 Data link layer Logical

More information

Implementation of Clocks and Sensors

Implementation of Clocks and Sensors Implementation of Clocks and Sensors Term Paper EE 382N Distributed Systems Dr. Garg November 30, 2000 Submitted by: Yousuf Ahmed Chandresh Jain Onur Mutlu Global Predicate Detection in Distributed Systems

More information

Unreliable NET. Application Layer. Transport Layer. packet flow. ack flow. Network Layer. Unreliable Net. Actual Loss Rate λ loss.

Unreliable NET. Application Layer. Transport Layer. packet flow. ack flow. Network Layer. Unreliable Net. Actual Loss Rate λ loss. Partially Reliable Transport Service Rahmi Marasli Paul D. Amer Phillip T. Conrad Computer and Information Sciences Department University of Delaware, Newark, DE 976 USA Email: fmarasli,amer,pconradg@cis.udel.edu

More information

Fault-Tolerance & Paxos

Fault-Tolerance & Paxos Chapter 15 Fault-Tolerance & Paxos How do you create a fault-tolerant distributed system? In this chapter we start out with simple questions, and, step by step, improve our solutions until we arrive at

More information

On the interconnection of message passing systems

On the interconnection of message passing systems Information Processing Letters 105 (2008) 249 254 www.elsevier.com/locate/ipl On the interconnection of message passing systems A. Álvarez a,s.arévalo b, V. Cholvi c,, A. Fernández b,e.jiménez a a Polytechnic

More information