Implementing a Regular Register in an Eventually Synchronous Distributed System prone to Continuous Churn


Roberto Baldoni, Silvia Bonomi, Michel Raynal

Abstract. Due to their capability to hide the complexity generated by the messages exchanged between processes, shared objects are one of the main abstractions provided to developers of distributed applications. Implementations of such objects, in modern distributed systems, have to take into account the fact that almost all services, implemented on top of distributed infrastructures, are no longer fully managed due to either their size or their maintenance cost. Therefore, these infrastructures exhibit several autonomic behaviors in order to, for example, tolerate failures and the continuous arrival and departure of nodes (the churn phenomenon). Among all the shared objects, the register object is a fundamental one. Several protocols have been proposed to build fault-resilient registers on top of message-passing systems, but, unfortunately, failures are not the only challenge in modern distributed systems and new issues arise from the presence of churn. This paper addresses the construction of a multi-writer/multi-reader regular register in an eventually synchronous distributed system affected by the continuous arrival/departure of participants. In particular, a general protocol implementing a regular register is proposed and feasibility conditions associated with the arrival and departure of the processes are given. The protocol is proved correct under the assumption that a constraint on the churn is satisfied.

Index Terms: Regular Register, Dynamic Distributed Systems, Churn, Distributed Algorithms.

1 INTRODUCTION

Context. Dealing with failures has been one of the main challenges in the construction of real reliable applications able to work in a distributed system.
These applications are inherently managed, in the sense that they implicitly assume the existence of a superior manager (i.e., the application/service provider) that controls the processes running the application. The manager does its best to guarantee that assumptions made on the underlying distributed system (e.g., a majority of correct processes) hold over time by activating appropriate reactive or proactive recovery procedures [26]. As an example, the manager can either add new processes when crashes occur or ensure the required degree of synchrony of the underlying distributed platform in terms of processes and communication links. Air traffic control, telecommunication, banking systems and e-government systems are just a few examples of such application domains. In this context, robust abstractions have been defined (shared memory, communication, agreement, etc.) that behave correctly despite asynchrony and failures and that simplify application design and development. When considering protocols implementing such abstractions, in nearly all the cases, the system is always well defined in the sense that the whole set of participating processes is finite and known (directly or transitively) by each process.

Università La Sapienza, via Ariosto 25, I Roma, Italy
Università La Sapienza, via Ariosto 25, I Roma, Italy
Senior Member, Institut Universitaire de France, IRISA, Université de Rennes, Campus de Beaulieu, F Rennes, France

The system composition is modified only when either a process crashes or a new process is added. Therefore, if a process does not crash, it lives for the entire duration of the computation.

Motivation. A new challenge is emerging due to the advent of new classes of applications and technologies such as smart environments, sensor networks, mobile systems, peer-to-peer systems, cloud computing, etc.
In these settings, the underlying distributed system cannot be fully managed but needs some degree of self-management that depends on the specific application domain. However, it is possible to delineate some common consequences of the presence of such self-management: first, there is no entity that can always ensure the validity of the system assumptions during the entire computation and, second, no one knows accurately who joins and who leaves the system at any time, introducing a kind of unpredictability in the system composition (this phenomenon of arrival and departure of processes in a system is also known as churn) [6]. As a consequence, distributed computing abstractions have to deal not only with asynchrony and failures, but also with this dynamic dimension where a process that does not crash can leave the system at any time, implying that the membership can fully change several times during the same computation. Moreover, this dynamic behavior means each process cannot have a precise knowledge of the number of processes composing the system at any given time. Thus, it becomes of primary importance to check under which churn assumption a protocol implementing a distributed

computing abstraction is correct. Hence, the abstractions and protocols implementing them have to be reconsidered to take into account this new adversary setting. This self-defined and continuously evolving distributed system, that we will name in the following a dynamic distributed system, makes abstractions more difficult to understand and master than in distributed systems where the set of processes is fixed and known by all participants. The churn notion thus becomes a system parameter whose aim is to make tractable systems having their composition evolving along the time (e.g., [15], [19], [23]).

Contribution and roadmap. In this paper, a general churn model that we defined in [5] is considered and used to characterize a dynamic distributed computation where the number of participants changes in a given range and the arrival and departure of processes is a non-quiescent phenomenon that depends on join and leave distributions. Such a model places constraints on process arrivals and departures. Specifically, the computation size is constrained in a range between n 0 − k 1 and n 0 + k 2, where n 0 is the number of processes participating in the computation at time t 0, while k 1 and k 2 are two integers greater than or equal to zero that depend on the join and leave distributions. In particular, this paper addresses the problem of deterministically building and maintaining a distributed computation implementing a multiple-writers/multiple-readers regular register. Processes participating in the computation are called active processes. We provide an implementation of a regular register based on a request/reply message pattern and we prove that: any operation issued on the regular register terminates if the number of reply messages needed to perform the operation is at most n 0 − k 1 (Lemma 1), and any operation issued on the regular register is valid if the number of reply messages needed to perform the operation is at least (n 0 + k 2)/2 (Lemma 2).
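As a rough illustration of how the two conditions interact, the following sketch (our own, not code from the paper; the function name is illustrative) enumerates the reply thresholds compatible with both the termination and the validity condition:

```python
import math

def feasible_thresholds(n0, k1, k2):
    """Reply counts compatible with both conditions: strictly more
    than (n0 + k2)/2 replies (validity) and at most n0 - k1 replies
    (termination). The range is empty when n0 <= 2*k1 + k2."""
    lo = math.floor((n0 + k2) / 2) + 1  # smallest strict majority
    hi = n0 - k1
    return list(range(lo, hi + 1))

print(feasible_thresholds(10, 2, 3))  # n0 > 2k1 + k2 holds: [7, 8]
print(feasible_thresholds(7, 2, 3))   # 7 = 2*2 + 3: empty list
```

The range being non-empty exactly when n 0 > 2k 1 + k 2 previews the corollary discussed next.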
From these two conditions it follows that n 0, k 1 and k 2 cannot be chosen arbitrarily. They are closely related by the condition n 0 > 2k 1 + k 2 (Corollary 1). Let us finally remark that the interest in addressing the regular register abstraction lies in the fact that it is a fundamental notion for building storage systems. Up to now, storage systems that cope with churn ensure regular register consistency criteria only in a probabilistic way [2]. The result of this paper thus gives a bound on the churn that a storage system can cope with while still providing deterministic regular consistency guarantees.

The paper is structured as follows: Section 2 defines the system model, and in particular Section 2.3 defines the churn model. Section 3 introduces the regular register specification for a dynamic distributed system. Section 4 presents a protocol implementing a regular register and its correctness proof in an eventually synchronous system. Finally, two sections on related work and concluding remarks conclude the paper.

2 SYSTEM MODEL

2.1 Dynamic Distributed System

In a dynamic distributed system, processes may join and leave the system at their will. In order to model processes continuously arriving to and departing from the system, we assume the infinite arrival model (as defined in [24]). The set of processes that can participate in the distributed system, i.e. the distributed system population, is composed of a potentially infinite set of processes Π = {..., p i, p j, p k, ...}, each one having a unique identifier (i.e. its index). However, the distributed system is composed, at each time, of a finite subset of the population. A process enters the distributed system by executing the join System() procedure. This operation aims at connecting the new process to the processes that already belong to the system. A process leaves the system by means of a leave System() operation.
Processes belonging to the distributed system may fail by crashing before leaving the system; if a process crashes, it stops performing any action. A process that never crashes is said to be correct. In the following we assume the existence of a protocol managing the arrivals and departures of processes from the distributed system; such a protocol is also responsible for maintaining connectivity among the processes that are part of the distributed system. Some examples of topologies and protocols keeping the system connected in a dynamic environment are [16], [17], [20], [27]. The system is eventually synchronous 1, that is, after an unknown but finite time the system behaves synchronously [7], [9]. The passage of time is measured by a fictional global clock, represented by integer values, not accessible to processes. Processes belonging to the distributed system communicate by exchanging messages through either point-to-point reliable channels or broadcast primitives. Both communication primitives can be characterized by the following property:

Eventual Time Delivery: there exist a bound δ, known by the processes, and a time t̄ such that any message sent (broadcast) at a time t ≥ t̄ is delivered by time t + δ by all the processes that are in the system during the whole interval [t, t + δ].

It is important to notice that processes only know that the time t̄ exists. They never know, nor can they deduce or predict, when the synchrony period starts.

2.2 Distributed Computation

Processes belonging to the distributed system may decide autonomously to join a distributed computation running on top of the system (e.g. a regular register computation). Hence, a distributed computation is composed, at each

1. Sometimes also called a partially synchronous system.

[Figure 1 here: the distributed-system layer, entered and left via join_system() and leave_system(), and the distributed-computation layer running on top of it (e.g., a regular register), entered and left via join_computation() and leave_computation().]

Fig. 1. Distributed System and Distributed Computation

instant of time, by a subset of processes of the distributed system. A process p i, belonging to the distributed system, that wants to join the distributed computation has to execute the join Computation() operation. Such an operation, invoked at some time t, is not instantaneous and takes time to be executed; how long it takes depends on the specific implementation provided for the join Computation() operation. However, from time t, the process p i can receive and process messages sent by any other process that participates in the computation. When a process p j, participating in the distributed computation, wishes to leave the computation, it executes the leave Computation() operation. Without loss of generality, we assume that if a process leaves the computation and later wishes to re-join, it executes the join Computation() operation again with a new identity. Figure 1 shows the distributed system and distributed computation layers. It is important to notice that (i) there may exist processes belonging to the distributed system that never join the distributed computation (i.e. they execute the join System() procedure but they never invoke the join Computation() operation) and (ii) there may exist processes that, after leaving the distributed computation, remain inside the distributed system (i.e. they are correct but they stop processing messages related to the computation). To this aim, it is important to identify the subset of processes that are actively participating in the distributed computation.

Definition 1: A process is active in the distributed computation from the time it returns from the join Computation() operation until the time it starts executing the leave Computation() operation. A(t) denotes the set of processes that are active at time t, while A([t, t′]) denotes the set of processes that are active during the whole interval [t, t′] (i.e. p i ∈ A([t, t′]) iff p i ∈ A(τ) for each τ ∈ [t, t′]).

2.3 Churn Model

Processes may join and leave the distributed computation at any time. To model this activity, we consider the churn model that we introduced in [5]. The model is based on the definition of two functions: (i) the join function λ(t) (defining the join of new processes to the distributed computation with respect to time) and (ii) the leave function µ(t) (defining the leave of processes from the distributed computation with respect to time). Such functions are discrete functions of time.

Definition 2: (Join function) The join function λ(t) is a discrete time function that returns the number of processes that invoke the join Computation() operation at time t.

Definition 3: (Leave function) The leave function µ(t) is a discrete time function that returns the number of processes that invoke the leave Computation() operation at time t.

Let t 0 be the starting time of the system. We assume that at time t 0 no process joins or leaves the distributed computation (i.e. λ(t 0 ) = 0 and µ(t 0 ) = 0) and therefore we can say that at t 0 the computation is composed of a set Π 0 of processes and the size of the distributed computation is n 0 (i.e., |Π 0 | = n 0 ). Moreover, for any time t < t 0 we have λ(t) = µ(t) = 0. The churn is continuous, meaning that processes never stop joining and leaving the computation, i.e. the following conditions hold: ∄ t : ∀ τ > t : λ(τ) = 0, and ∄ t : ∀ τ > t : µ(τ) = 0. As soon as churn starts, the size of the computation and the computation membership change. The number of participants of the computation can be calculated as follows.
Definition 4: (Node function) Let n 0 be the number of processes participating in the computation at start time t 0. N(t) is the number of processes in the computation at time t, for every t ≥ t 0 (i.e. N(t) = N(t−1) + λ(t) − µ(t), with N(t 0 ) = n 0 ).

Based on the previous definitions, let us derive the constraints that a join function and a leave function have to satisfy in order that the distributed computation size remains in a given interval. Note that such behavior is typical of real applications like peer-to-peer systems, VoIP-based applications, etc. [13], [14]. Let n 0 be the number of processes of the distributed computation at the start time t 0 and let k 1, k 2 be two positive integers; the following lemma (proved in [5]) states the constraints on the join function and the leave function such that the distributed computation size falls in the interval N = [n 0 − k 1, n 0 + k 2 ].

Lemma 1: Let k 1 and k 2 be two integers such that k 1, k 2 ≥ 0 and let n 0 be the number of processes in the distributed computation at starting time t 0. Given a join function λ(t) and a leave function µ(t), the node function N(t) falls in the interval N = [n 0 − k 1, n 0 + k 2 ] if and only if:

(c1) Σ τ=t 0..t µ(τ) ≤ Σ τ=t 0..t λ(τ) + k 1, for every t;
(c2) Σ τ=t 0..t µ(τ) ≥ Σ τ=t 0..t λ(τ) − k 2, for every t.

Fig. 2. Distributed System Size in the interval N = [n 0 − k 1, n 0 + k 2 ]

An example of the evolution of the size of a distributed computation along the time is shown in Figure 2. Note that constraints (c1) and (c2) have to be satisfied independently of the computation. In fact, they just follow from the requirement of having the computation size falling in the range defined by n 0, k 1 and k 2.

3 REGULAR REGISTER IN A DYNAMIC DISTRIBUTED SYSTEM

A register is a shared variable accessed by a set of processes through two operations, namely read() and write(). Informally, the write() operation updates the value stored in the shared variable while the read() obtains the value contained in the variable (i.e. the last written value). In case of concurrency while accessing the shared variable, the meaning of last written value becomes ambiguous. Depending on the semantics of the operations, three types of register have been defined by Lamport [18]: safe, regular and atomic.

3.1 Regular Register Computation

Processes participating in the distributed computation implement a regular register abstraction. As a specialization of the generic model of computation defined in Section 2, in the following we consider the existence of a join register() operation and of a leave register() operation. In particular, in the case of a regular register computation, the aim of the join register() operation is to transfer the current value of the register variable to the new process, in order to guarantee the persistence of the value of the register despite churn. The protocol implementing the join register() operation is presented in Section 4. We model the leave register() operation as an implicit operation; when a process p i leaves the computation it just stops sending and processing messages related to the register computation.
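The node function of Definition 4 and the cumulative constraints (c1) and (c2) of Lemma 1 can be sketched as follows (an illustrative Python model of ours, not code from the paper; the names are hypothetical):

```python
def node_function(n0, lam, mu, t0, t):
    """Definition 4: N(t) = N(t-1) + lambda(t) - mu(t), N(t0) = n0.
    lam and mu are callables giving joins/leaves per time unit."""
    n = n0
    for tau in range(t0 + 1, t + 1):
        n += lam(tau) - mu(tau)
    return n

def within_bounds(n0, k1, k2, lam, mu, t0, horizon):
    """Check (c1) and (c2) cumulatively up to `horizon`; together
    they keep N(t) inside [n0 - k1, n0 + k2] at every time t."""
    cum_lam = cum_mu = 0
    for t in range(t0 + 1, horizon + 1):
        cum_lam += lam(t)
        cum_mu += mu(t)
        # (c1): cum_mu <= cum_lam + k1   (c2): cum_mu >= cum_lam - k2
        if not (cum_lam - k2 <= cum_mu <= cum_lam + k1):
            return False
    return True

lam = lambda t: 2            # two joins per time unit
mu = lambda t: 2             # two leaves per time unit
print(node_function(10, lam, mu, 0, 5))        # balanced churn: 10
print(within_bounds(10, 1, 1, lam, mu, 0, 5))  # True
```

With balanced churn the cumulative sums stay equal, so both constraints hold even for k 1 = k 2 = 0; any sustained imbalance eventually violates one of them.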
In this way, it is possible to address in the same way (from the register computation point of view) both process failures and process leaves. Thus, in the following, we do not distinguish between voluntary leaves and failures but we refer to both of them as leaves. Moreover, to simplify the notation, whenever not strictly necessary, we refer to the join register() operation as the join() operation.

3.2 Operation executions

Every operation issued on a register is, generally, not instantaneous and it can be characterized by two events occurring at its boundary: an invocation event and a reply event. These events occur at two time instants (invocation time and reply time) according to the fictional global time. An operation op is complete if both the invocation event and the reply event occur (i.e. the process executing the operation does not crash between the invocation and the reply). Conversely, an operation op is said to be failed if it is invoked by a process that crashes before the reply event occurs. Given two operations op and op′, with invocation times t B (op) and t B (op′) and reply times t E (op) and t E (op′), we say that op precedes op′ (op ≺ op′) iff t E (op) < t B (op′). If op does not precede op′ and op′ does not precede op, then op and op′ are concurrent (op || op′). Given a write(v) operation, the value v is said to be written when the operation is complete. As a consequence, failed write() operations are incomplete operations. As in [12], we consider that if a process crashes during a write() operation, such a write() is concurrent with all the subsequent operations.

3.3 Multi-reader/Multi-writer Specification

The notion of a regular register, as specified in [18], is not directly applicable in a dynamic distributed system like the one presented in the previous section, because it does not consider failures, process joins and leaves.
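The precedence and concurrency relations of Section 3.2 can be sketched directly from the invocation/reply times (a small illustration of ours, not the paper's code):

```python
def precedes(op, op2):
    """op precedes op2 iff the reply of op occurs before the
    invocation of op2; an operation is a (t_B, t_E) pair of
    invocation/reply times."""
    return op[1] < op2[0]

def concurrent(op, op2):
    """op and op2 are concurrent iff neither precedes the other."""
    return not precedes(op, op2) and not precedes(op2, op)

w = (0, 4)  # a write spanning [0, 4]
r = (3, 6)  # a read spanning [3, 6]: it overlaps the write
print(precedes(w, r), concurrent(w, r))  # False True
```

Note that `concurrent` is exactly the complement of precedence in both directions, matching the definition in the text.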
To this aim, we focus on the multi-writer/multi-reader regular register abstraction as defined in [22] and in [25] and we adapt it to take into account arrivals and departures of processes. Before introducing the specification, let us introduce the notion of relevant write.

Definition 5: A write() operation w is relevant for a read() operation r if: (i) w || r, or (ii) w ≺ r and there is no write() operation w′ such that w ≺ w′ ≺ r.

We are now in the position to specify a regular register for a dynamic distributed system. A protocol implements a regular register in a dynamic distributed system if the following properties are satisfied.

Termination: If a correct process participating in the computation invokes a read or write operation and does not leave the system, it eventually returns from that operation.

Multi-Writer Regularity 1 (MWR1): A read operation op returns any of the values written by some write() that is relevant for op.

We assume that each process p i issues either a read() or a write() operation only after it has returned from its join register() operation [4].
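Definition 5 can be made concrete with a small sketch (ours, under the semantics just stated, not code from the paper): a write is relevant for a read if it overlaps the read, or if it precedes it with no other complete write strictly in between.

```python
def precedes(a, b):          # a precedes b: reply of a before invocation of b
    return a[1] < b[0]

def concurrent(a, b):        # neither precedes the other
    return not precedes(a, b) and not precedes(b, a)

def relevant_writes(writes, r):
    """Writes relevant for read r under Definition 5.
    writes: dict name -> (t_B, t_E); r: (t_B, t_E)."""
    rel = set()
    for name, w in writes.items():
        if concurrent(w, r):
            rel.add(name)
        elif precedes(w, r) and not any(
                precedes(w, w2) and precedes(w2, r)
                for w2 in writes.values()):
            rel.add(name)
    return rel

writes = {"w1": (0, 1), "w2": (2, 3), "w3": (5, 8)}
r = (6, 9)  # w3 overlaps r; w2 is the last write completed before r
print(sorted(relevant_writes(writes, r)))  # ['w2', 'w3']
```

MWR1 then allows the read to return the value of any write in this set, here those of w2 or w3 but not the overwritten w1.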

4 REGULAR REGISTER IN EVENTUALLY SYNCHRONOUS DISTRIBUTED SYSTEMS

In [5] we presented an implementation of the regular register for a synchronous distributed system. Such an implementation is based on the following considerations: (i) the join register() operation is executed once by each process and (ii) read() and write() operations are executed frequently. This led us to design a protocol having local read and fast write operations, by exploiting the synchrony of the communication. Moving to an eventually synchronous system, read() operations are no longer local. They indeed require gathering information from a certain number of active processes in the system in order to retrieve the last written value. Hence, the price to pay for not relying on synchrony is that read() operations cannot be local anymore. This section presents a protocol implementing a regular register in an eventually synchronous distributed system with continuous churn and where the number of processes participating in the distributed computation is always in the range [n 0 − k 1, n 0 + k 2 ]. To cope with the absence of synchrony assumptions holding at every time, the protocol implements join register(), read() and write() operations involving all the processes belonging to the computation. The basic idea behind the join register() and read() operations is to have two phases: (i) the process issuing the operation broadcasts an INQUIRY message, then waits until it receives enough replies to confirm that the operation has been processed by enough processes; (ii) the process helps other processes that join the computation concurrently to terminate their operation by sending them the updated value. Concerning the write() operation, the basic idea is that the writer broadcasts a WRITE message and then just waits until it receives enough acknowledgments for such operation. In the following section, we provide the details of the protocols implementing these operations.
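The common quorum phase of join register() and read() can be sketched as follows (a simplified, single-step model of ours, not the paper's pseudocode): the operation succeeds once strictly more than C replies have been gathered, adopting the value with the highest sequence number.

```python
def read_phase(replies, register, sn, C):
    """Quorum phase of join/read: given the (sender, value, seq)
    replies gathered so far, succeed once strictly more than C have
    arrived, adopting the most recent value; otherwise keep waiting."""
    if len(replies) <= C:
        return None                      # not enough replies yet
    _, val, max_sn = max(replies, key=lambda rep: rep[2])
    if max_sn > sn:
        return val, max_sn               # a newer value was observed
    return register, sn                  # local copy already current

replies = [(1, "a", 3), (2, "b", 5), (3, "c", 4)]
print(read_phase(replies, "old", 2, C=2))  # ('b', 5)
print(read_phase(replies, "old", 2, C=3))  # None: keep waiting
```

The correctness proofs fix the value of C so that any two such quorums intersect in at least one process holding the last written value.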
4.1 Protocol

Each process p i maintains the following local variables.
- Two variables denoted register i and sn i, such that register i is the local copy of the register, while sn i is the sequence number of the last write operation that updated register i.
- A boolean active i, initialized to false. It flips to true just after p i has joined the regular register computation.
- Two set variables, denoted replies i and reply to i. The first one is used both in the join register() operation and in the read() operation, while reply to i is used only during the join period. The local variable replies i contains the triples < id, value, sn > that p i has received from other processes, while reply to i contains the processes that are joining the regular register computation concurrently with p i.
- read sn i is a sequence number used by p i to timestamp its read requests. The value of read sn i equal to zero is used by the join operation.
- reading i is a boolean whose value is true when p i is reading.
- write ack i is a set used by p i (when it writes a new value) to store the identifiers of the processes that have acknowledged p i 's last write.
- dl prev i is a set where, while p i is joining the distributed computation, p i stores the identifiers of processes that have acknowledged p i 's inquiry message while these processes were not yet active (so, these processes were joining the computation too) or while they were reading. When it terminates its join operation, p i has to send them a reply to prevent them from being blocked forever.

The join register() operation. The protocol implementing this operation is described in Figure 3. After having initialized its local variables, p i broadcasts an INQUIRY(i, read sn i ) message to inform the other processes that it wants to obtain the value of the regular register (line 04; as indicated, read sn i is then equal to 0).
Then, after it has received a number C of replies (line 05) 2, p i updates its local pair (register i, sn i ) (lines 06-07), becomes active (line 08), and sends a reply to the processes in the set reply to i (lines 09-11). It sends such a reply message also to the processes in its set dl prev i, to prevent them from waiting forever (see the proof of Lemma 3). In addition to the triple < i, register i, sn i >, a reply message sent by a process p i to a process p j also carries the read sequence number r sn that identifies the corresponding request issued by p j. When a process p i delivers a message INQUIRY(j, r sn) and is active, it answers p j by sending back a REPLY(< i, register i, sn i >, r sn) message containing its local values (line 15). If p i is also reading (line 16), it additionally sends a DL PREV() message to p j (line 17); this is required so that p j sends to p i the value p j has obtained when it terminated its join operation. If p i is not yet active, it postpones its answer until it becomes active (line 19 and lines 09-11) and sends a DL PREV message (line 20). When p i delivers a REPLY(< j, value, sn >, r sn) message from a process p j, if the reply message is an answer to its INQUIRY(i, read sn i ) message (line 23), p i adds < j, value, sn > to the set of replies it has received so far and sends back an ACK(i, r sn) message to p j (lines 24-25). Finally, when p i delivers a message DL PREV(j, r sn), it adds its content to the set dl prev i (line 28), in order to remember that it has to send a reply to p j when it becomes active (lines 09-10).

The read() operation. A read is a simplified version of the join operation 3. Hence, the code of the read() operation,

2. In the correctness proofs section we will compute the value of C that allows any operation to terminate and be valid.
3. As indicated before, the read identified by (i, 0) is the join register() operation issued by p i.

operation join register(i):
(01) register i ← ⊥; sn i ← −1; active i ← false;
(02) reading i ← false; replies i ← ∅; reply to i ← ∅;
(03) write ack i ← ∅; dl prev i ← ∅; read sn i ← 0;
(04) broadcast INQUIRY(i, read sn i );
(05) wait until (|replies i | > C);
(06) let < id, val, sn > ∈ replies i such that (∀ < −, −, sn′ > ∈ replies i : sn′ ≤ sn);
(07) if (sn > sn i ) then sn i ← sn; register i ← val end if;
(08) active i ← true;
(09) for each < j, r sn > ∈ reply to i ∪ dl prev i do
(10)    send REPLY (< i, register i, sn i >, r sn) to p j
(11) end for;
(12) return(ok).

(13) when INQUIRY(j, r sn) is delivered:
(14) if (active i )
(15)    then send REPLY (< i, register i, sn i >, r sn) to p j ;
(16)         if (reading i ) then
(17)            send DL PREV (i, r sn) to p j
(18)         end if;
(19)    else reply to i ← reply to i ∪ {< j, r sn >};
(20)         send DL PREV (i, r sn) to p j
(21) end if.

(22) when REPLY(< j, value, sn >, r sn) is delivered:
(23) if (r sn = read sn i ) then
(24)    replies i ← replies i ∪ {< j, value, sn >};
(25)    send ACK (i, r sn) to p j
(26) end if.

(27) when DL PREV(j, r sn) is delivered:
(28) dl prev i ← dl prev i ∪ {< j, r sn >}.

Fig. 3. The join register() protocol (code for p i )

described in Figure 4, is a simplified version of the code of the join register() operation. Each read invocation is identified by a pair made up of the process index i and a sequence number read sn i (line 03). p i first broadcasts a read request READ(i, read sn i ). Then, after it has received C replies, p i selects the one with the greatest sequence number, updates (if needed) its local pair (register i, sn i ), and returns the value of register i. When p i delivers a message READ(j, r sn), it sends back a reply if it is active (lines 09-10). If it is joining the system, p i stores p j 's identifier to remember that it has to send back a reply to p j when it terminates the join operation (line 11).
operation read(i):
(01) read sn i ← read sn i + 1;
(02) replies i ← ∅; reading i ← true;
(03) broadcast READ(i, read sn i );
(04) wait until (|replies i | > C);
(05) let < id, val, sn > ∈ replies i such that (∀ < −, −, sn′ > ∈ replies i : sn′ ≤ sn);
(06) if (sn > sn i ) then sn i ← sn; register i ← val end if;
(07) reading i ← false; return(register i ).

(08) when READ(j, r sn) is delivered:
(09) if (active i )
(10)    then send REPLY (< i, register i, sn i >, r sn) to p j
(11)    else reply to i ← reply to i ∪ {< j, r sn >}
(12) end if.

Fig. 4. The read() protocol (code for p i )

The write() operation. The code of the write operation is described in Figure 5. Let us recall that it is assumed that a single process at a time issues a write. When a process p i wants to write, it first issues a read operation in order to obtain the sequence number associated with the last written value (line 01) 4. Then, after it has broadcast the WRITE(i, < v, sn i >) message to disseminate the new value and its sequence number to the other processes (line 04), p i waits until it has received C acknowledgments. When this happens, it terminates the write operation by returning the control value ok (line 05). When a message WRITE(j, < val, sn >) is delivered, p i takes the pair (val, sn) into account if it is more up-to-date than its current pair (line 08). In all cases, it sends back to the sender p j an ACK (i, sn) message so that p j can terminate its write operation (line 09). When an ACK (j, sn) message is delivered, p i adds it to its set write ack i if this message is an answer to its last write (line 11).

operation write(v):
(01) read(i);
(02) sn i ← sn i + 1; register i ← v;
(03) write ack i ← ∅;
(04) broadcast WRITE(i, < v, sn i >);
(05) wait until (|write ack i | > C);
(06) return(ok).

(07) when WRITE(j, < val, sn >) is delivered:
(08) if (sn > sn i ) then register i ← val; sn i ← sn end if;
(09) send ACK (i, sn) to p j .

(10) when ACK(j, sn) is delivered:
(11) if (sn = sn i ) then write ack i ← write ack i ∪ {j} end if.

Fig. 5.
The write() protocol (code for p i )

Due to lack of space, we omit the correctness proofs, which can be found in the supplementary material.

5 RELATED WORK

Several works have recently addressed the implementation of concurrent data structures on wired message-passing dynamic systems (e.g., [1], [4], [8], [11], [21]). In [21], a Reconfigurable Atomic Memory for Basic Objects (RAMBO) is presented. RAMBO works in a distributed system where processes can join and fail during the execution of the algorithm. To guarantee the reliability of data in spite of network changes, RAMBO replicates data at several network locations and defines configurations to manage small and transient changes. Each configuration is composed of a set of members, a set of read-quorums and a set of write-quorums. In order to manage large changes to the set of participating processes, RAMBO defines a reconfiguration procedure whose aim is to move from an existing configuration to a new one where the set of members, read-quorums or write-quorums are modified. In order to ensure atomicity, the reconfiguration procedure is implemented by a distributed consensus algorithm that makes all the processes agree on the same successive configurations. Therefore, RAMBO cannot be implemented in a fully asynchronous system. It

4. In the absence of concurrent write operations, this read obtains the greatest sequence number. The same strategy is used in protocols implementing atomic registers (e.g., [3], [10]).

is important to note that in RAMBO the notion of churn is abstracted by defining a sequence of configurations. Note that RAMBO poses some constraints on the removal of old configurations; in particular, a configuration S cannot be removed until each operation executed by processes belonging to S has ended; as a consequence, many old configurations may take a long time to be removed. [11] and [8] present some improvements to the original RAMBO protocol and in particular to its reconfiguration mechanism. In [11] the reconfiguration protocol has been changed by parallelizing new configuration installations and the removal of an arbitrary number of old configurations. In [8], the authors present a mechanism that combines the features of RAMBO and the underlying consensus algorithm to speed up the reconfiguration and reduce the time during which old configurations are accessible. In [1] Aguilera et al. show that an atomic register can be realized without consensus and, thus, in a fully asynchronous distributed system, provided that the number of reconfiguration operations is finite and thus the churn is quiescent (i.e., there exists a finite time after which there are no more joins or failures). Configurations are managed by taking into account all the changes (i.e. joins and failures of processes) suggested by the participants, and the quorums are represented by any majority of processes. To ensure liveness of read and write operations, the authors assume that the number of reconfigurations is finite and that there is a majority of correct processes in each reconfiguration. In [4], we presented an implementation of a regular register in an eventually synchronous distributed system prone to continuous churn. In contrast to what has been presented in this paper, [4] assumes that the size of the distributed system is constant (i.e., at any instant of time the same number of processes join and leave the distributed system).
In particular, we have shown that if the distributed system size n does not change, a regular register can be implemented if, at any time, more than n/2 active processes participate in the regular register implementation, with no constraint placed on the value of n. The same paper shows that no regular register can be implemented in a fully asynchronous system in the presence of continuous churn. Let us finally remark that the result presented in [4] can be seen as a particular case of the result presented in the previous section, obtained by considering k1 = k2 = 0 and assuming that n0 c processes (where c is a percentage of nodes) invoke the join operation and n0 c processes leave the system at every time unit (i.e., λ(t) = µ(t) = n0 c). Figure 6 summarizes the system model assumptions and the constraints on processes employed by different algorithms (e.g., [9], [12], [23]). Note that churn-quiescent implementations (e.g., [1], [8], [11], [21]) do not explicitly use the notion of active processes; they instead use the notion of correct process. It is, however, possible to consider an active process as being a correct one. The converse is not true, because a correct process does not pass through a join operation. This is a consequence of the fact that churn-quiescent implementations do not separate the distributed system from the distributed computation.

Fig. 6. Register in Dynamic Systems prone to Churn

6 CONCLUSION

In modern distributed systems, the continuous departure and arrival of processes (churn) is actually part of the system model and creates additional unpredictability that must be mastered by distributed applications. As an example, churn creates conditions for consistency violations in large-scale storage systems, and the probability of such violations usually increases with the churn. This is why these storage systems do not provide deterministic consistency guarantees (e.g., regular or atomic registers).
Hence, there is a need to capture the churn of a dynamic system through tractable, realistic models, in order to pave the way to distributed applications whose correctness can be formally proved. This paper, based on a generic churn model defined in [5], has presented the implementation of a single-writer/multi-reader regular register in such a model. It has been formally proved that a regular register can be implemented in an eventually synchronous distributed system if, at any time, the number of active processes is greater than (n0 + k2)/2, the number of processes in the distributed system remains between n0 - k1 and n0 + k2 (where n0 is the number of processes in the system at time t0), and n0 is greater than 2k1 + k2. Interestingly, this implementation has shown in a precise way that, when one wants to implement a shared register in a dynamic system, there is a tradeoff between the acceptable degree of churn and the synchrony of the underlying system (namely, the churn has to decrease when one moves from a synchronous system to an eventually synchronous system).

ACKNOWLEDGEMENTS

The authors want to thank the anonymous reviewers for their comments, which greatly improved the content and presentation of the paper. The work has been partially supported by the STREP EU project SM4ALL and the IP EU project SOFIA.
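The feasibility conditions stated in the conclusion can be restated operationally; the following minimal sketch (hypothetical helper names, not part of the protocol of this paper) checks the structural constraint on the churn model and the run-time conditions under which the register implementation is correct.

```python
def churn_model_feasible(n0, k1, k2):
    """Structural constraint of the churn model: the initial system
    size n0 must exceed 2*k1 + k2."""
    return n0 > 2 * k1 + k2

def register_conditions_hold(active, size, n0, k1, k2):
    """Run-time conditions for correctness at a given instant:
    - the number of active processes exceeds (n0 + k2) / 2, and
    - the system size stays within [n0 - k1, n0 + k2]."""
    return active > (n0 + k2) / 2 and n0 - k1 <= size <= n0 + k2

# Example: n0 = 10 processes at t0, size may drop by at most
# k1 = 2 and grow by at most k2 = 3.
print(churn_model_feasible(10, 2, 3))             # True: 10 > 7
print(register_conditions_hold(7, 11, 10, 2, 3))  # True: 7 > 6.5 and 8 <= 11 <= 13
```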

REFERENCES

[1] Aguilera M. K., Keidar I., Malkhi D. and Shraer A., Dynamic Atomic Storage Without Consensus. In Proc. 28th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2009.
[2] Anderson E., Li X., Shah M. A., Tucek J. and Wylie J. J., What Consistency Does Your Key-Value Store Actually Provide? (To appear) In Proc. 6th Workshop on Hot Topics in System Dependability (HotDep).
[3] Attiya H., Bar-Noy A. and Dolev D., Sharing Memory Robustly in Message-Passing Systems. JACM, 42(1).
[4] Baldoni R., Bonomi S., Kermarrec A.-M. and Raynal M., Implementing a Register in a Dynamic Distributed System. In Proc. 29th IEEE Int'l Conference on Distributed Computing Systems (ICDCS'09), IEEE Computer Society Press, Montreal, Canada, June.
[5] Baldoni R., Bonomi S. and Raynal M., Regular Register: An Implementation in a Churn Prone Environment. In Proc. 16th International Colloquium on Structural Information and Communication Complexity (SIROCCO), Springer-Verlag LNCS #5869.
[6] Baldoni R. and Shvartsman A. A., Theoretical Aspects of Dynamic Distributed Systems: Report on the Workshop. SIGACT News, 40(4):87-89.
[7] Chandra T. and Toueg S., Unreliable Failure Detectors for Reliable Distributed Systems. JACM, 43(2).
[8] Chockler G., Gilbert S., Gramoli V., Musial P. M. and Shvartsman A., Reconfigurable Distributed Storage for Dynamic Networks. Journal of Parallel and Distributed Computing, 69(1), 2009.
[9] Dwork C., Lynch N. and Stockmeyer L., Consensus in the Presence of Partial Synchrony. JACM, 35(2).
[10] Friedman R., Raynal M. and Travers C., Abstractions for Implementing Atomic Objects in Distributed Systems. In Proc. 9th Int'l Conference on Principles of Distributed Systems (OPODIS'05), LNCS #3974.
[11] Gilbert S., Lynch N. and Shvartsman A., RAMBO II: Rapidly Reconfigurable Atomic Memory for Dynamic Networks. In Proc. International Conference on Dependable Systems and Networks (DSN 2003).
[12] Guerraoui R., Levy R. R., Pochon B.
and Pugh J., The Collective Memory of Amnesic Processes. ACM Transactions on Algorithms, 4(1), 2008.
[13] Godfrey B., Shenker S. and Stoica I., Minimizing Churn in Distributed Systems. In Proc. ACM SIGCOMM, 2006.
[14] Guha S., Daswani N. and Jain R., An Experimental Study of the Skype Peer-to-Peer VoIP System. In Proc. 5th International Workshop on Peer-to-Peer Systems (IPTPS), 2006.
[15] Ko S., Hoque I. and Gupta I., Using Tractable and Realistic Churn Models to Analyze Quiescence Behavior of Distributed Protocols. In Proc. 27th IEEE Int'l Symposium on Reliable Distributed Systems (SRDS'08).
[16] Kuhn F., Schmid S. and Wattenhofer R., A Self-Repairing Peer-to-Peer System Resilient to Dynamic Adversarial Churn. In Proc. 4th International Workshop on Peer-to-Peer Systems (IPTPS).
[17] Kuhn F., Schmid S., Smit J. and Wattenhofer R., A Blueprint for Constructing Peer-to-Peer Systems Robust to Dynamic Worst-Case Joins and Leaves. In Proc. 14th IEEE International Workshop on Quality of Service (IWQoS), 2006.
[18] Lamport L., On Interprocess Communication, Part 1: Models, Part 2: Algorithms. Distributed Computing, 1(2):77-101.
[19] Liben-Nowell D., Balakrishnan H. and Karger D. R., Analysis of the Evolution of Peer-to-Peer Systems. In Proc. 21st ACM Symposium on Principles of Distributed Computing (PODC), ACM Press.
[20] Liben-Nowell D., Karger D. R., Kaashoek M. F., Dabek F., Balakrishnan H., Stoica I. and Morris R., Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications. IEEE/ACM Transactions on Networking, 11(1), 2003.
[21] Lynch N. and Shvartsman A., RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks. In Proc. 16th Int'l Symposium on Distributed Computing (DISC'02), Springer-Verlag LNCS #2508.
[22] Malkhi D. and Reiter M. K.,
Byzantine Quorum Systems. Distributed Computing, 11(4), 1998.
[23] Mostefaoui A., Raynal M., Travers C., Peterson S., El Abbadi A. and Agrawal D., From Static Distributed Systems to Dynamic Systems. In Proc. 24th IEEE Symposium on Reliable Distributed Systems (SRDS'05), IEEE Computer Society Press.
[24] Merritt M. and Taubenfeld G., Computing with Infinitely Many Processes. In Proc. 14th Int'l Symposium on Distributed Computing (DISC'00), LNCS #1914.
[25] Shao C., Pierce E. and Welch J., Multi-Writer Consistency Conditions for Shared Memory Objects. In Proc. 17th Int'l Symposium on Distributed Computing (DISC'03), Springer-Verlag, LNCS #2848.
[26] Sousa P., Bessani A. N., Correia M., Ferreira Neves N. and Veríssimo P., Highly Available Intrusion-Tolerant Services with Proactive-Reactive Recovery. IEEE Transactions on Parallel and Distributed Systems, 21(4), 2010.
[27] Voulgaris S., Gavidia D. and van Steen M., CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays. Journal of Network and Systems Management, 13(2), 2005.

Roberto Baldoni is Professor at the University of Rome La Sapienza, where he leads the Distributed Systems group and the MIDLAB laboratory. His research interests include distributed computing, dependable and secure distributed systems, distributed information systems and distributed event-based processing. Roberto's research at the University of Rome has been funded over the years by the European Commission, the Italian Ministry of Research, IBM, Microsoft, Finmeccanica and Telecom Italia. In 2010, he received the Science2business Award and the IBM Faculty Award. Roberto is the author of around 150 research papers spanning the theory and practice of distributed systems. He belongs to the Steering Committee of ACM DEBS, which he chaired in 2008, and he is a member of ACM, IEEE and the IFIP WG.

Silvia Bonomi is a PhD in Computer Science at the University of Rome La Sapienza. She is doing research in various computer science fields, including dynamic distributed systems and event-based systems, and has published several papers in peer-reviewed scientific forums. As part of the MIDLAB research group, she is currently involved in an EU-funded project dealing with energy saving in private and public buildings (GreenerBuildings project); she has also worked on dependable distributed systems (ReSIST network of excellence) and on the definition of new semantic tools for e-government (SemanticGov).

Michel Raynal is a professor of computer science at the University of Rennes, France. His main research interests are the basic principles of distributed computing systems, and he is a world-leading researcher in the domain of distributed computing. He is the author of numerous papers on distributed computing (more than 120 in journals and 250 in international conferences) and is well known for his distributed algorithms and his nine books on distributed computing. He has chaired the program committees of the major conferences on the topic (e.g., ICDCS, DISC, SIROCCO and OPODIS), has served on the program committees of many international conferences, and is the recipient of several Best Paper awards (ICDCS 1999, 2000 and 2001, SSS 2009, Europar 2010). He has been invited by many universities all over the world to give lectures on distributed computing. His h-index is 45. He has recently written two books published by Morgan & Claypool: Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems (June 2010) and Fault-Tolerant Agreement in Synchronous Distributed Systems (September 2010). Since 2010, Michel Raynal has been a senior member of the prestigious Institut Universitaire de France.


More information

Emulating Shared-Memory Do-All Algorithms in Asynchronous Message-Passing Systems

Emulating Shared-Memory Do-All Algorithms in Asynchronous Message-Passing Systems Emulating Shared-Memory Do-All Algorithms in Asynchronous Message-Passing Systems Dariusz R. Kowalski 2,3, Mariam Momenzadeh 4, and Alexander A. Shvartsman 1,5 1 Department of Computer Science and Engineering,

More information

Dfinity Consensus, Explored

Dfinity Consensus, Explored Dfinity Consensus, Explored Ittai Abraham, Dahlia Malkhi, Kartik Nayak, and Ling Ren VMware Research {iabraham,dmalkhi,nkartik,lingren}@vmware.com Abstract. We explore a Byzantine Consensus protocol called

More information

Sigma: A Fault-Tolerant Mutual Exclusion Algorithm in Dynamic Distributed Systems Subject to Process Crashes and Memory Losses

Sigma: A Fault-Tolerant Mutual Exclusion Algorithm in Dynamic Distributed Systems Subject to Process Crashes and Memory Losses Sigma: A Fault-Tolerant Mutual Exclusion Algorithm in Dynamic Distributed Systems Subject to Process Crashes and Memory Losses Wei Chen Shi-Ding Lin Qiao Lian Zheng Zhang Microsoft Research Asia {weic,

More information

Coded Emulation of Shared Atomic Memory for Message Passing Architectures

Coded Emulation of Shared Atomic Memory for Message Passing Architectures Coded Emulation of Shared Atomic Memory for Message Passing Architectures Viveck R. Cadambe, ancy Lynch, Muriel Médard, Peter Musial Abstract. This paper considers the communication and storage costs of

More information

Degree Optimal Deterministic Routing for P2P Systems

Degree Optimal Deterministic Routing for P2P Systems Degree Optimal Deterministic Routing for P2P Systems Gennaro Cordasco Luisa Gargano Mikael Hammar Vittorio Scarano Abstract We propose routing schemes that optimize the average number of hops for lookup

More information

State-Optimal Snap-Stabilizing PIF In Tree Networks

State-Optimal Snap-Stabilizing PIF In Tree Networks State-Optimal Snap-Stabilizing PIF In Tree Networks (Extended Abstract) Alain Bui, 1 Ajoy K. Datta, 2 Franck Petit, 1 Vincent Villain 1 1 LaRIA, Université de Picardie Jules Verne, France 2 Department

More information

Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems

Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems Luís Rodrigues Michel Raynal DI FCUL TR 99 7 Departamento de Informática Faculdade de Ciências da Universidade de Lisboa Campo Grande,

More information

Providing File Services using a Distributed Hash Table

Providing File Services using a Distributed Hash Table Providing File Services using a Distributed Hash Table Lars Seipel, Alois Schuette University of Applied Sciences Darmstadt, Department of Computer Science, Schoefferstr. 8a, 64295 Darmstadt, Germany lars.seipel@stud.h-da.de

More information

Secure Multi-Party Computation Without Agreement

Secure Multi-Party Computation Without Agreement Secure Multi-Party Computation Without Agreement Shafi Goldwasser Department of Computer Science The Weizmann Institute of Science Rehovot 76100, Israel. shafi@wisdom.weizmann.ac.il Yehuda Lindell IBM

More information

Distributed Algorithms 6.046J, Spring, Nancy Lynch

Distributed Algorithms 6.046J, Spring, Nancy Lynch Distributed Algorithms 6.046J, Spring, 205 Nancy Lynch What are Distributed Algorithms? Algorithms that run on networked processors, or on multiprocessors that share memory. They solve many kinds of problems:

More information

6.852: Distributed Algorithms Fall, Class 21

6.852: Distributed Algorithms Fall, Class 21 6.852: Distributed Algorithms Fall, 2009 Class 21 Today s plan Wait-free synchronization. The wait-free consensus hierarchy Universality of consensus Reading: [Herlihy, Wait-free synchronization] (Another

More information

Optimal Resilience for Erasure-Coded Byzantine Distributed Storage

Optimal Resilience for Erasure-Coded Byzantine Distributed Storage Optimal Resilience for Erasure-Coded Byzantine Distributed Storage Christian Cachin IBM Research Zurich Research Laboratory CH-8803 Rüschlikon, Switzerland cca@zurich.ibm.com Stefano Tessaro ETH Zurich

More information

A Suite of Formal Denitions for Consistency Criteria. in Distributed Shared Memories Rennes Cedex (France) 1015 Lausanne (Switzerland)

A Suite of Formal Denitions for Consistency Criteria. in Distributed Shared Memories Rennes Cedex (France) 1015 Lausanne (Switzerland) A Suite of Formal Denitions for Consistency Criteria in Distributed Shared Memories Michel Raynal Andre Schiper IRISA, Campus de Beaulieu EPFL, Dept d'informatique 35042 Rennes Cedex (France) 1015 Lausanne

More information

Data Distribution in Large-Scale Distributed Systems

Data Distribution in Large-Scale Distributed Systems Università di Roma La Sapienza Dipartimento di Informatica e Sistemistica Data Distribution in Large-Scale Distributed Systems Roberto Baldoni MIDLAB Laboratory Università degli Studi di Roma La Sapienza

More information

Distributed Algorithms Models

Distributed Algorithms Models Distributed Algorithms Models Alberto Montresor University of Trento, Italy 2016/04/26 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Contents 1 Taxonomy

More information

Help when needed, but no more: Efficient Read/Write Partial Snapshot

Help when needed, but no more: Efficient Read/Write Partial Snapshot Help when needed, but no more: Efficient Read/Write Partial Snapshot Damien Imbs, Michel Raynal To cite this version: Damien Imbs, Michel Raynal. Help when needed, but no more: Efficient Read/Write Partial

More information

Time-Free Authenticated Byzantine Consensus

Time-Free Authenticated Byzantine Consensus Time-Free Authenticated Byzantine Consensus Hamouma Moumen, Achour Mostefaoui To cite this version: Hamouma Moumen, Achour Mostefaoui. Time-Free Authenticated Byzantine Consensus. Franck Capello and Hans-Peter

More information

A Dual Digraph Approach for Leaderless Atomic Broadcast

A Dual Digraph Approach for Leaderless Atomic Broadcast A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version) Marius Poke Faculty of Mechanical Engineering Helmut Schmidt University marius.poke@hsu-hh.de Colin W. Glass Faculty of Mechanical

More information

Distributed Algorithms Benoît Garbinato

Distributed Algorithms Benoît Garbinato Distributed Algorithms Benoît Garbinato 1 Distributed systems networks distributed As long as there were no machines, programming was no problem networks distributed at all; when we had a few weak computers,

More information

Distributed Algorithms Reliable Broadcast

Distributed Algorithms Reliable Broadcast Distributed Algorithms Reliable Broadcast Alberto Montresor University of Trento, Italy 2016/04/26 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Contents

More information

From a Store-collect Object and Ω to Efficient Asynchronous Consensus

From a Store-collect Object and Ω to Efficient Asynchronous Consensus From a Store-collect Object and Ω to Efficient Asynchronous Consensus Michel Raynal, Julien Stainer To cite this version: Michel Raynal, Julien Stainer. From a Store-collect Object and Ω to Efficient Asynchronous

More information

Quiescent Consensus in Mobile Ad-hoc Networks using Eventually Storage-Free Broadcasts

Quiescent Consensus in Mobile Ad-hoc Networks using Eventually Storage-Free Broadcasts Quiescent Consensus in Mobile Ad-hoc Networks using Eventually Storage-Free Broadcasts François Bonnet Département Info & Télécom, École Normale Supérieure de Cachan, France Paul Ezhilchelvan School of

More information

Initial Assumptions. Modern Distributed Computing. Network Topology. Initial Input

Initial Assumptions. Modern Distributed Computing. Network Topology. Initial Input Initial Assumptions Modern Distributed Computing Theory and Applications Ioannis Chatzigiannakis Sapienza University of Rome Lecture 4 Tuesday, March 6, 03 Exercises correspond to problems studied during

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

The Alpha of Indulgent Consensus

The Alpha of Indulgent Consensus The Computer Journal Advance Access published August 3, 2006 Ó The Author 2006. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please

More information

Exclusion-Freeness in Multi-party Exchange Protocols

Exclusion-Freeness in Multi-party Exchange Protocols Exclusion-Freeness in Multi-party Exchange Protocols Nicolás González-Deleito and Olivier Markowitch Université Libre de Bruxelles Bd. du Triomphe CP212 1050 Bruxelles Belgium {ngonzale,omarkow}@ulb.ac.be

More information

GeoQuorums: implementing atomic memory

GeoQuorums: implementing atomic memory Distrib. Comput. (2005) 18(2): 125 155 DOI 10.1007/s00446-005-0140-9 SPEC ISSUE DISC 03 Shlomi Dolev Seth Gilbert Nancy A. Lynch Alexander A. Shvartsman JenniferL.Welch GeoQuorums: implementing atomic

More information

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov Practical Byzantine Fault Tolerance Miguel Castro and Barbara Liskov Outline 1. Introduction to Byzantine Fault Tolerance Problem 2. PBFT Algorithm a. Models and overview b. Three-phase protocol c. View-change

More information

Etna: a Fault-tolerant Algorithm for Atomic Mutable DHT Data

Etna: a Fault-tolerant Algorithm for Atomic Mutable DHT Data Etna: a Fault-tolerant Algorithm for Atomic Mutable DHT Data Athicha Muthitacharoen Seth Gilbert Robert Morris athicha@lcs.mit.edu sethg@mit.edu rtm@lcs.mit.edu MIT Computer Science and Artificial Intelligence

More information

MODELS OF DISTRIBUTED SYSTEMS

MODELS OF DISTRIBUTED SYSTEMS Distributed Systems Fö 2/3-1 Distributed Systems Fö 2/3-2 MODELS OF DISTRIBUTED SYSTEMS Basic Elements 1. Architectural Models 2. Interaction Models Resources in a distributed system are shared between

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Arvind Krishnamurthy Fall Collection of individual computing devices/processes that can communicate with each other

Arvind Krishnamurthy Fall Collection of individual computing devices/processes that can communicate with each other Distributed Systems Arvind Krishnamurthy Fall 2003 Concurrent Systems Collection of individual computing devices/processes that can communicate with each other General definition encompasses a wide range

More information

Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services PODC 2004 The PODC Steering Committee is pleased to announce that PODC 2004 will be held in St. John's, Newfoundland. This will be the thirteenth PODC to be held in Canada but the first to be held there

More information

Byzantine Fault-Tolerance with Commutative Commands

Byzantine Fault-Tolerance with Commutative Commands Byzantine Fault-Tolerance with Commutative Commands Pavel Raykov 1, Nicolas Schiper 2, and Fernando Pedone 2 1 Swiss Federal Institute of Technology (ETH) Zurich, Switzerland 2 University of Lugano (USI)

More information

Synchronization is coming back, but is it the same?

Synchronization is coming back, but is it the same? Synchronization is coming back, but is it the same? Michel Raynal To cite this version: Michel Raynal. Synchronization is coming back, but is it the same?. [Research Report] PI 1875, 2007, pp.16.

More information