Ordering events in distributed systems: A review


Yannic Bonenberger
TU Kaiserslautern, Germany

The concept of time is fundamental to our way of thinking. However, our intuitive concept of a total ordering of events is not well suited to managing temporal order in large, distributed systems. In this paper, we introduce two methods for designing distributed systems that handle the ordering of events gracefully. In the second part of the paper, we present concrete use-cases for these methods and describe how they were implemented in these applications.

1 Introduction

The concept of time is a very important part of human thinking. We use it every day to describe the order in which events occur, or to describe the duration of a single event. For example, we say that an event occurred at 3:15 if the clock we look at shows 3:15 and has not yet shown 3:16. We might also say that a given event took 15 minutes if it started at 3:15 and ended at 3:30. However, this intuitive concept of time is not very accurate and comes with some inherent problems when we try to use it in distributed systems. The limited accuracy of this way of thinking about time has two rather obvious causes. Firstly, by the time the clock shows 3:15, the exact moment at which it was 3:15 has already passed; this means that we always use the past to describe time. Secondly, we cannot know whether the clock we use to describe when an event happened actually shows the correct time. We may notice when the time on a given clock is completely wrong, but we cannot detect a small difference between two arbitrary clocks. While this does not seem like an issue in everyday life, it can be important in some situations. Let us assume we sell tickets for a festival, and all sales have to be done by telephone.
As usual, the tickets are sold First come, First served, which means we have to order the requests by time. Let us also say that it is a big festival, and a lot of people are calling at the same time. Since we do not want our visitors to wait for a long time, we hire a big call center with multiple offices to handle the requests. Every time someone calls to order a ticket, we write down the current time and hand the request to a central station, which orders the requests by time and then assigns tickets to the earliest callers. As you can imagine, this total ordering of events is not very accurate, because there might be a small, but noticeable, difference between the clocks used to record when the tickets were ordered. If one clock already shows 3:16 while another one still shows 3:15, we may violate our First come, First served premise. These limitations of accuracy and scalability become more severe when we talk about distributed systems.

Submitted to: ES Seminar 2018

A distributed system is a collection of distinct processes which can communicate with each other by sending messages. An example of such a system is the internet, which consists of independent hosts which communicate by exchanging messages. A single computer can also be viewed as a distributed system: the central control unit, the memory units, and the input-output channels are separate processes which communicate by sending messages over a central bus. Modern computers also usually have more than one processor core, and can execute more than one hardware thread concurrently. These independent threads can communicate with other threads through shared memory, or by sending messages. However, transmission time within a single computer is usually negligible compared to the time between events in a process. Therefore, we will concern ourselves primarily with systems of spatially separated computers, although most of the concepts apply more generally. Let us go back to the example we used to demonstrate the limitations of accuracy and scalability of the intuitive concept of time. If we replace the manual process of ordering tickets through a call center with an automated system to handle these requests, we still have the same issues to solve. Due to scalability constraints, we cannot use a single process to handle a potentially very large number of concurrent requests. This means that we still have to order all events by time to fulfill the requests on a First come, First served basis. However, it is not strictly necessary to have a total ordering across all requests, as long as we can guarantee that, once the request for the last ticket has been fulfilled, no request that happened after it is fulfilled.
In this paper, we will formally define physical time, introduce the concept of logical time, and present some reasoning why true physical time is not required in many applications. After we formally define the relationship of events, namely what it means when we say that an event e_i happened before another event e_j (e_i → e_j), or what it means when we say two events e_i, e_j happened concurrently, we introduce two methods to handle time in distributed systems, Lamport Time [4] and Vector Time [2], and compare them. In the end, we present two distributed applications that require an ordering of events, and show how they use the methods presented earlier to solve their problem [3, 6].

2 Physical Clocks

Let us introduce physical clocks into our model. Let C_i(t) denote the reading of clock C_i at time t. For mathematical consistency, we assume a clock running continuously rather than a clock with discrete ticks (a clock with discrete ticks can be modeled by a clock running continuously, plus a reading error of up to 1/2 tick). More precisely, we assume that C_i(t) is a continuous, differentiable function of t except for isolated jump discontinuities where the clock is reset. Then dC_i(t)/dt represents the rate at which the clock is running at time t. For C_i to be a true physical clock, it is crucial that dC_i(t)/dt ≈ 1 for all t. More precisely, the following condition must always be satisfied for some x ≪ 1:

∀i: |dC_i(t)/dt − 1| < x. (1)

Typical crystal-controlled clocks achieve very small values of x. However, to have true physical clocks, all clocks must not only individually run at approximately the correct rate, they must also be synchronized so that C_i(t) ≈ C_j(t) for all i, j. To be more precise, for some sufficiently small ε:

∀i, j: |C_i(t) − C_j(t)| < ε. (2)

Assuming we have a system as presented in Figure 1, we can consider the vertical distance between events to represent physical time. In this model, (2) states that the variation of the tick lines is less than ε if the clocks are sufficiently synchronized. However, two different clocks tend to drift further and further apart, because it is almost impossible for them to run at exactly the same rate. Therefore, we must employ an algorithm to ensure that condition (2) always holds. As we stated earlier in this section, as well as in Section 1, using true physical time to describe the ordering of a set of events presents several challenges. Fortunately, most systems do not need true physical time to order events. Instead, it is often enough to assign a strictly increasing number to each event to create a sufficient ordering.
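To make conditions (1) and (2) concrete, the following is a minimal sketch (our own illustration, not from the paper) of two imperfect clocks modeled as linear functions of true time. The rates and bounds are invented for the example; it shows that clocks satisfying (1) still drift apart, so (2) eventually fails without resynchronization.

```python
# Hypothetical illustration: two physical clocks with slightly wrong rates.

def clock_reading(rate, offset, t):
    """Reading C_i(t) of a clock running at a constant (slightly wrong) rate."""
    return offset + rate * t

# Condition (1): each clock's rate stays within x of the true rate 1.
x = 1e-4
rates = [1.0 + 5e-5, 1.0 - 5e-5]
assert all(abs(r - 1) < x for r in rates)

# Condition (2): without resynchronization, the readings drift apart, so
# |C_1(t) - C_2(t)| < eps fails for any fixed eps after long enough.
eps = 0.01
t = 0.0
while abs(clock_reading(rates[0], 0, t) - clock_reading(rates[1], 0, t)) < eps:
    t += 1.0
print(f"clocks are more than eps apart after t = {t}")
```

The while-loop terminates precisely because condition (2) cannot hold forever for free-running clocks, which is why a synchronization algorithm is needed.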
3 Logical Clocks

Now that we know physical clocks, we can introduce the concept of logical clocks into our system. Let us begin with an abstract point of view where a clock simply assigns numbers to events. We can think of these numbers as the time at which the event occurred. More precisely, we define an arbitrary clock C_k for each process P_k to be a simple function which assigns a number C_k(e_i) to every event e_i in that process. The entire system is represented by the function C which assigns to any event e_j the number C(e_j), where C(e_j) = C_l(e_j) if e_j is an event in process P_l. For now, we make no assumption about the relationship of the numbers C_k(e_i) to physical time,

Figure 1: Three independent processes P, Q, R processing events p_i, q_i, r_i, and sending messages to each other.

so we can think of the clocks C_i as logical rather than physical clocks. They may be implemented by counters with no actual timing mechanism. Now that we have formally introduced the concept of logical clocks, we must define what it means for such a system to be correct. Since we cannot introduce true physical clocks keeping real physical time into our system, and therefore also cannot base our definition of correctness on physical time, we must base our definition purely on the order in which events occur. The strongest reasonable condition for the correctness of the proposed timing mechanism is that if an event e_i occurs before another event e_j, then e_i happens at an earlier time than e_j. If we consider the vertical distance of events in Figure 1 to represent logical time, then an event e_i happened before e_j if and only if the time i at which e_i occurs is strictly smaller than the time j at which e_j occurs (e_i → e_j ⟺ i < j).
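A logical clock implemented as a counter with no timing mechanism, as described above, can be sketched in a few lines (our own illustrative naming, not from the paper):

```python
# Minimal sketch: a logical clock is just a per-process counter that
# assigns an increasing number C_k(e) to every local event e.

class LogicalClock:
    def __init__(self):
        self.counter = 0

    def tick(self, event):
        """Assign the next number to an event and return it."""
        self.counter += 1
        return self.counter

clock_p = LogicalClock()
stamps = [clock_p.tick(e) for e in ["p1", "p2", "p3"]]
print(stamps)  # successive events in one process receive increasing numbers
assert stamps == sorted(stamps)
```

Note that this only orders events within one process; Sections 5 and 6 show how to extend it across processes.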

4 Ordering Events

To compare a set of events, we have to define a relation → between two events e_i, e_j so that e_i → e_j means "e_i happened before e_j". In this section, we will formally introduce the concepts of total order and partial order, as well as some general concepts which apply when ordering events. Order theory is a branch of mathematics which investigates the notion of order using binary relations. It provides a formal foundation to describe statements such as "this is less than that", or "this happened before that". We can distinguish between two different, but related, kinds of orders: partial order and total order. Let E be a set of events and ≤ be a relation on E. Then ≤ is called a partial order if and only if it is reflexive, transitive, and antisymmetric.

Figure 2: Hasse diagram of the set of all divisors of 60, partially ordered by divisibility [1].
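The divisibility order of Figure 2 can be checked programmatically. The following sketch (our own illustration) verifies that divisibility on the divisors of 60 is reflexive, transitive, and antisymmetric, but not total:

```python
# Divisibility as a partial order on the divisors of 60 (cf. Figure 2).

divisors = [d for d in range(1, 61) if 60 % d == 0]
leq = lambda a, b: b % a == 0  # a "divides" b

assert all(leq(a, a) for a in divisors)                      # reflexivity
assert all(leq(a, c)
           for a in divisors for b in divisors for c in divisors
           if leq(a, b) and leq(b, c))                       # transitivity
assert all(a == b
           for a in divisors for b in divisors
           if leq(a, b) and leq(b, a))                       # antisymmetry

# Not a total order: 10 and 15 are incomparable under divisibility.
assert not (leq(10, 15) or leq(15, 10))
```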

This means that for all e_i, e_j, e_k ∈ E we have

e_i ≤ e_i (reflexivity), (3)
(e_i ≤ e_j) ∧ (e_j ≤ e_k) ⟹ e_i ≤ e_k (transitivity), and (4)
(e_i ≤ e_j) ∧ (e_j ≤ e_i) ⟹ e_i = e_j (antisymmetry). (5)

A relation which fulfills these three properties is called a partial order. By checking these properties, one immediately sees that the well-known orders on the natural numbers, integers, rational numbers, and reals are all orders in the above sense. However, they have the additional property of being total orders. This means that for all e_i, e_j ∈ E we have

(e_i ≤ e_j) ∨ (e_j ≤ e_i) (totality). (6)

In Figure 2, we present the divisors of 60 ordered by divisibility, which is only a partial order and not a total order. Please note that only nodes which are directly or indirectly connected by edges are comparable. For example, we know that 1 | 2 (read "1 divides 2") or 10 | 20. However, we do not know whether 10 | 15 or 15 | 10, as these two nodes are not comparable.

5 Lamport Time

In this section, we will present the algorithm described by Leslie Lamport in the paper Time, Clocks, and the Ordering of Events in a Distributed System in 1978 [4]. The algorithm proposed by Lamport uses logical clocks, which we have already formally defined in Section 3. As a reminder: logical clocks can be implemented by counters with no actual timing mechanism, and every event is assigned a strictly increasing number. Because we cannot rely on physical time in such a system, the definition of correctness must be based on the order in which events occur. The strongest reasonable condition is that if an event e_i occurs before another event e_j, then e_i should happen at an earlier time than e_j. We formally define this Clock Condition as follows:

∀e_i, e_j ∈ E: (e_i → e_j) ⟹ (C(e_i) < C(e_j)). (7)

Looking at Figure 1, we can see that the events p_2 and p_3 are concurrent with q_3. Assuming that the converse of condition (7) also holds, both events p_2 and p_3 would have to occur at the same time as q_3.
Since this would contradict the Clock Condition (p_2 → p_3 requires C(p_2) < C(p_3)), we cannot expect this converse condition to be true. Given our definition of →, it is easily derivable that the Clock Condition defined in (7) is satisfied

if these two conditions hold:

If e_i and e_j are events in process P_k, and e_i → e_j, then C_k(e_i) < C_k(e_j), and (8)

If e_i is the sending of a message by process P_k and e_j is the receipt of that message by another process P_l, then C_k(e_i) < C_l(e_j). (9)

Let us consider the clocks in terms of a space-time diagram: it is easily imaginable that the clock of a process ticks through every number, incrementing between every event. For example, if e_i and e_j are successive events in process P_k with C_k(e_i) = 4 and C_k(e_j) = 7, then the ticks 5, 6, and 7 of C_k occur between these two events. If we draw a dashed tick line through all the ticks of the different processes, the space-time diagram presented in Figure 3 below yields a picture similar to the one we used to illustrate the ticks of a physical clock in Figure 1 in Section 2 above. From condition (8), we can derive that there must be a tick line between any two successive events of any process, and condition (9) requires that every message line must cross at least one tick line. Looking at the meaning of → in the space-time diagram in Figure 3, it is easily imaginable that the tick lines represent the coordinate lines of a Cartesian coordinate system on space-time, implying that the two necessary conditions of our Clock Condition are indeed true. If we redraw Figure 1 to straighten these dashed coordinate lines, we obtain a valid alternative way of representing the same system of events. However, without introducing physical time into our system, it is not decidable which of these two possible representations is better. Readers may find it helpful to use a two-dimensional spatial network of processes, yielding a three-dimensional space-time diagram, for visualization. Similarly to our representation in this paper, the alternative representation models processes and messages as lines. However, tick lines are now represented by two-dimensional surfaces.
Now that we have formally defined our requirements, let us assume that the processes represent algorithms, and the events are certain actions during their execution. We will now show how to introduce clocks which satisfy the Clock Condition into the processes. If we use a register C_k to represent the clock of process P_k, and C_k(e_i) is the value of C_k during the event e_i, the value of the register changes only between two events in the same process P_k. For obvious reasons, this change must not constitute an event itself. We can show that this implementation satisfies the Clock Condition by ensuring that it satisfies (8) and (9). Showing that the proposed approach satisfies condition (8) is simple: each process only needs to obey the following implementation rule:

Each process P_k increments C_k between any two successive events. (10)

Figure 3: Three independent processes P, Q, R processing events p_i, q_i, r_i, and sending messages to each other.

Meeting the second condition (9) is slightly more complicated: we must ensure that each message m contains a timestamp T_m equal to the time at which the message was sent. Every time a process P_l receives a message m with the timestamp T_in, the process must advance its clock to be later than T_in. More precisely, we define the following two rules:

If event e_i is the sending of a message m by process P_k, then m contains a timestamp T_m = C_k(e_i), and (11)

Upon receiving a message m with timestamp T_in, process P_l sets C_l to be greater than or equal to its present value and greater than T_in. (12)
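The three rules (10), (11) and (12) can be sketched as follows. This is our own minimal illustration with invented naming; the paper specifies the rules but not an implementation:

```python
# Sketch of Lamport's implementation rules: a per-process counter register
# that is incremented between events and advanced past received timestamps.

class LamportProcess:
    def __init__(self):
        self.clock = 0  # the register C_k

    def local_event(self):
        self.clock += 1           # rule (10): increment between events
        return self.clock

    def send(self):
        self.clock += 1           # rule (10)
        return self.clock         # rule (11): message carries T_m = C_k(e_i)

    def receive(self, t_in):
        # rule (12): set the clock >= its present value and > T_in
        self.clock = max(self.clock, t_in) + 1
        return self.clock

p, q = LamportProcess(), LamportProcess()
t_m = p.send()                    # P sends with timestamp 1
q.local_event(); q.local_event()  # Q's clock is already at 2
t_recv = q.receive(t_m)           # receipt is stamped 3 > T_m
assert t_recv > t_m
```

The receive rule guarantees condition (9): the receipt of a message is always stamped later than its sending, regardless of how far the receiver's clock lags behind.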

Rule (12) requires that the event representing the receipt of the message m occurs after the setting of the receiver's clock. However, we want to note that this is only a small nuisance in the notation, and not relevant in any actual implementation. It is also trivial to show that (11) and (12) satisfy condition (9). Hence, the three simple rules (10), (11) and (12) imply that the Clock Condition is satisfied, and guarantee a correct system of logical clocks.

6 Vector Time

Although the relation → introduced by Lamport [4] is always consistent with the observable behaviour of distributed systems, it only defines one of the possibly many valid event orderings for a given distributed computation, and all other, equally valid, event orderings are lost. Even the partial ordering resulting from the fact that subsets of the events can have the same timestamp does not preserve all potential and valid orderings. In this section, we will introduce a second approach, which retains all possible and valid orderings. While this is the exact opposite of the approach introduced earlier, it is best suited for problems concerned with the global state of a program.

Figure 4: Use of timestamp vectors for asynchronous communication.

In this model, rather than having only a single integer value shared by all processes, timestamps are represented as vectors

(c_1 c_2 ... c_n)^T (13)

with a dedicated integer value for every process in the distributed system. Formally, we define that e_p represents an event e executed by a process p, and that T_ep is the timestamp vector permanently attached to the record of the execution of this event. For example, assume that a timestamp vector with T_x2[2] = 7 and T_x2[4] = 12 is attached to an arbitrary event x in process 2. We can see that the clock value of process 2 was 7 when x was executed, and that the last known clock value of process 4 was 12. It is important to note that the local timestamp of process 4 may have advanced well beyond this value by the time x is executed, but 12 is the most recent value available to process 2.

6.1 Asynchronous Communication

Figure 4 shows an example of asynchronous communication. In this case, the timestamp vectors are managed by the following algorithm:

Initially, all values of the timestamp vector are set to zero. (14)

The local clock value is incremented at least once before each atomic event. (15)

Every outgoing message is augmented with the entire timestamp vector. (16)

Upon receiving a message, a process sets the value of each entry in the timestamp vector to be the maximum of the two corresponding values in the local vector and in the piggybacked vector received. The value corresponding to the sender, however, is a special case and is set to be one greater than the value received (to allow for transit time), but only if the local value is not already greater than that received (to allow for message "overtaking"). (17)

Values in the timestamp vectors are never decremented. (18)

To compare timestamps attached to the stored records of events, we proceed as follows:

e_p → f_q ⟺ T_ep[p] < T_fq[p]. (19)

Under this condition, an event e_p is a predecessor of another event f_q if and only if p has sent a message to q either during or after the execution of e_p.
To achieve transitivity, and therefore make it possible to determine the causal relation between events executed by processes which may never communicate directly, we additionally allow the indirect propagation of timestamp vectors.
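Rules (14)-(19) can be sketched as follows. This is our own illustration with invented naming, following the rules above under the assumption of a fixed number of processes:

```python
# Sketch of the asynchronous vector-clock rules (14)-(19).

def new_vector(n):
    return [0] * n                      # rule (14): all entries start at zero

def before_event(vec, i):
    vec[i] += 1                         # rule (15): tick before each atomic event

def on_send(vec):
    return list(vec)                    # rule (16): piggyback the whole vector

def on_receive(vec, msg_vec, sender):
    for k in range(len(vec)):
        vec[k] = max(vec[k], msg_vec[k])     # rule (17): component-wise maximum
    if vec[sender] <= msg_vec[sender]:       # sender's entry: one greater than
        vec[sender] = msg_vec[sender] + 1    # received, unless already overtaken
    # rule (18): entries are never decremented

def happened_before(t_e, t_f, p):
    return t_e[p] < t_f[p]              # rule (19), e executed by process p

vp, vq = new_vector(3), new_vector(3)   # processes 0 and 1 of a 3-process system
before_event(vp, 0)
t_send = on_send(vp)                    # event e_p: send, vp = [1, 0, 0]
before_event(vq, 1)                     # a local event in q, vq = [0, 1, 0]
on_receive(vq, t_send, sender=0)        # vq becomes [2, 1, 0]
assert happened_before(t_send, vq, p=0) # e_p is a predecessor of the receipt
```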

6.2 Synchronous Communication

Now that we have presented our solution for asynchronous communication, we can show that the synchronous case can be solved easily if we require the exchange of timestamps from both sender and receiver with every message, and that both processes set their local clocks to the maximum of the exchanged timestamps. This is necessary because synchronous communication is always symmetric. To achieve synchronous communication, we define that every time a process receives a message, a dummy message with the local clock value of the receiver is returned to the sender, with both processes adjusting their local clocks according to the received timestamps. As long as all processes adhere to this simple protocol, it is impossible to introduce deadlocks into the system. For brevity, we will not present a proof in this paper; interested readers can find it in the cited paper by Fidge [2].

Figure 5: Algorithm proposed by Lamport, adapted for synchronous communication: a) clock in sender running fast, b) clock in sender running slow.

It is important to note that, due to the symmetry of synchronous communication and how we modeled the exchange of such messages, the direction of information transfer is not important. Therefore, we will omit the directional arrows for synchronous messages in future visualizations. Now that we have formally defined synchronous communication and how we plan to handle it, we provide a modified version of the algorithm presented in Section 6.1 to manage timestamp vectors in the synchronous case:

Initially, all values of the timestamp vector are set to zero. (20)

The local clock value is incremented at least once before each atomic event. (21)

During a communication event, the two processes involved exchange timestamp vectors, and each element in the local vector is set to be the maximum of its old value and the corresponding value in the received vector. (22)

Values in the timestamp vectors are never decremented. (23)

Figure 6: Timestamp vectors for synchronous communication. Each message causes an individual event in both processes involved in the communication, to compensate for the fact that the execution of events is recorded in each process separately.

Similarly, we modify the procedure to compare timestamps of stored records as described below:

e_p → f_q ⟺ (T_ep[p] ≤ T_fq[p]) ∧ (T_ep[q] < T_fq[q]). (24)

The first half of this comparatively complex conjunction ensures that process q has received a clock value from its communication partner p which is at least as recent as the execution of event e_p. If we know that this precondition is satisfied, we also know that e_p must have been executed before f_q. Careful

readers may notice that we use ≤ as comparator rather than < to allow for the possibility that e_p is a communication event itself. In this special case, process q may already have up-to-date information about the other process p when event f_q was executed. The second half of conjunction (24) states that p cannot have up-to-date information about q, i.e. that e_p and f_q are not the same event. This second part is necessary to avoid reflexivity. Since, in our model, processes generate histories of timestamp traces for post-mortem analysis independently, we do not attempt to test for equality of events directly. As an alternative to the comparison presented in (24) above, it is equivalent to compare the entire timestamp vectors, because they are exactly the same if and only if e_p and f_q are the same event. However, since this computation is O(n), with n being the number of processes in the system, it becomes inefficient if the number of processes is large, while the proposed approach is O(1), since it only needs to compare two integer values.

7 Comparison

In this section, we will compare the approach proposed by Lamport [4] to the approach proposed by Fidge [2]. Both papers present an algorithm to handle time in distributed systems, and both avoid using true physical time to order events. The approach introduced by Lamport uses only a single integer value as a timestamp, shared by all processes, while the approach proposed by Fidge uses a timestamp vector with a dedicated integer value for every process. Both approaches have certain advantages over the other. Two very important properties of the algorithm presented in Section 5 are that it is rather easy to implement, and that the memory required to manage and store the timestamp is constant. While it is easy to see why the approach proposed by Lamport is very easy to implement, we do not think that this is a noteworthy advantage: the second algorithm, presented in Section 6, is only slightly more complex.
However, the approach proposed by Fidge requires a timestamp vector with a dedicated integer value for every process, which leaves us with a space complexity of O(n), while the other approach requires only one timestamp value, giving a space complexity of O(1). This can be especially problematic when we have a large number of processes, because it not only requires more memory, it also increases the total size of every message because of the piggybacked timestamp vector. However, while the algorithms differ in space complexity, both have a time complexity of O(1). Another advantage of the approach presented by Lamport is that it can handle dynamic processes. In this report, we have always assumed that there is a fixed number of processes. However, this assumption is problematic because most applications must be able to scale up or down, depending on the current workload. It is easy to see why the single integer timestamp presented in Section 5 has no problem with this, while a timestamp vector with a fixed size, as we assumed in Section 6, cannot be used. To mitigate this fundamental limitation, Fidge proposed to replace the fixed-size timestamp vector with an extendable timestamp list, and to add a slot for every new process. However, we can see at least two

issues with this solution. Firstly, it will severely increase the memory footprint of the timestamp vector, especially for systems which frequently scale their number of processes, because we only ever add new slots to the vector and never remove old and stale ones. Please note that it is not possible to remove the slots of processes which are no longer running, because the absence of messages from a process is not a good indicator of whether it is still alive. That would require a central directory, which brings us to our second concern: allocating new slots requires a central directory, because every process must have a unique slot in the timestamp vector. If two new processes are added at the same time, they may decide to use the same slot if they do not know of each other. In many cases, this requirement of a central directory is not desirable, especially if the distributed system has availability requirements. However, there are also very important advantages to the approach proposed by Fidge: firstly, Lamport time cannot handle synchronous communication, and secondly, the relation defined by Lamport only defines one of many possible valid event orderings for a given distributed computation, and knowledge of any other, equally valid, orderings is lost. As a final note, it is crucial that applications weigh which properties of a distributed event ordering algorithm are important for their use-case, and then decide which approach to choose.

8 Providing high availability using lazy replication

In this section, we will present an example of a distributed system which uses the presented approaches to order events. As already noted earlier, high availability is a requirement for services such as mail or bulletin boards. More precisely, they should be accessible with high probability despite site crashes and network failures.
To achieve this, data must be replicated to multiple independent nodes, and data consistency must be guaranteed. One way to guarantee the required consistency is to force all operations to occur in the exact same order at all instances, which is expensive. Fortunately, not every application requires this strong operation order to preserve the required level of consistency; a weaker, causal order often suffices, yielding improved performance. To achieve this weak causal order, the authors of the paper introduce the concept of lazy replication, which is intended for environments in which individual computers are connected by a communication network. With this architecture, both the nodes and the network can fail without bringing down the system as a whole. Nodes are modeled as fail-stop processors [5], network partitions can happen, messages between nodes may be lost, delayed, duplicated, and delivered out of order, and instances can leave or join the application at any time. Replicated systems are designed as services consisting of multiple computers acting as replicas. Such systems are usually located in a dedicated network, although large systems spanning multiple physical locations are possible. For brevity, we will only look at systems directly connected by a single network. Replicas communicate new information among themselves by lazy exchange of gossip messages. We will also only look at two kinds of operations: update operations, which modify the state of the system but cannot observe it, and query operations, which observe the state but do not modify it. To be able to execute these operations in a valid order, we augment them

with a label indicating the state of the system, and specify which previous label is required to perform the operation. To achieve the best possible efficiency, we need compact representations of labels, and a fast and efficient way to decide whether an operation is ready to be executed. Additionally, labels must be generated by individual instances independently. To achieve these properties, we use multipart timestamps. A multipart timestamp is a vector

(t_1 t_2 ... t_n)^T

where n is the number of replicas in the service. Every entry in this vector is a non-negative integer, which is initially zero. These timestamps are ordered in the intuitive way:

t ≤ s ⟺ (t_1 ≤ s_1 ∧ t_2 ≤ s_2 ∧ ... ∧ t_n ≤ s_n). (25)

Merging two timestamps t and s into a new timestamp u is done by taking their component-wise maximum (u[i] = max(t[i], s[i])). Replicas receive operations (call messages), as well as gossip messages from other nodes. When a replica receives a call message for an update which it has not performed before, the update is assigned a timestamp, and the replica adds information about it to its local log record. Periodically, this information is propagated to other replicas in the network as gossip messages, and then also reflected in the log of the receiving instance. Every node maintains a local timestamp, rep_ts, identifying the set of records contained in the local log, and thus expressing which updates are known by a particular instance. The replica's own part of rep_ts is incremented every time an update call is processed; therefore, its value directly reflects the number of processed updates. All other parts of rep_ts are only incremented when the node receives gossip messages from other replicas. Therefore, every part i of rep_ts counts the number of updates which were processed at replica i and are known by the replica maintaining this timestamp vector.
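The comparison rule (25) and the merge operation can be sketched in a few lines (our own illustration, not code from the paper). Note that, like the vector timestamps of Section 6, multipart timestamps form only a partial order:

```python
# Sketch of multipart-timestamp comparison (25) and component-wise merging.

def leq(t, s):
    """t <= s iff every component of t is <= the matching component of s."""
    return all(ti <= si for ti, si in zip(t, s))

def merge(t, s):
    """Component-wise maximum, used when a replica incorporates gossip."""
    return [max(ti, si) for ti, si in zip(t, s)]

t, s = [1, 4, 2], [2, 3, 2]
assert not leq(t, s) and not leq(s, t)   # incomparable: only a partial order
assert merge(t, s) == [2, 4, 2]          # the merge dominates both inputs
```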
When an update record is known by all replicas in the network, we know that it is reflected in the local state of all instances, and it can be discarded from all logs. To be more precise, if an update is known to be known by all nodes, it can be executed, meaning that it will also be reflected in the local state. The reasoning behind this more general claim is the following: assuming a node knows about some arbitrary update record u, the node knows that this record is known by all instances if and only if it has received gossip messages containing u from all other replicas. Since all gossip messages contain the accumulated log recorded by the sender, it is guaranteed that instances receiving the log entry containing u from replica i have also, either in the current gossip message or in an earlier one, received all operations processed by i before u was executed. Therefore, if a replica has heard about u from all other replicas, it has also heard about all updates u depends on. If that were not true, the gossip could not have contained u, because u's dependencies must have been executed before u. Therefore, u is ready to execute.

8.1 Processing an update message

If an update is late, or if it has already been processed by the replica receiving the call, the message is discarded. Otherwise, the following actions are performed to process the update:

Advances its local timestamp by incrementing its ith part by one while leaving all other parts unchanged. (26)

Computes the timestamp t_s for the update by replacing the ith part of the update's input timestamp with the ith part of the local timestamp. (27)

Constructs the update record r associated with this execution of the update, r = makeUpdateRecord(u, i, t_s), and adds it to the local log. (28)

Executes the update operation u if all the updates that u depends on have already been incorporated into the local state. (29)

Returns the update's timestamp in a reply message. (30)

Since updates can depend on other updates, the local timestamp and the timestamp assigned to the update call u may not be comparable. For example, if replica i receives an update u depending on another update v, and v has been executed by another node j, i may not know about v yet and has to delay the execution of u until v is propagated to i by gossip.

8.2 Processing a query message

When replica i receives a query message q, it compares the query's input timestamp with its own local timestamp, which identifies all locally reflected updates. If the query's input timestamp is smaller than or equal to the timestamp representing the local state, the replica executes the query and returns the result together with the timestamp representing its local state. If the query's input timestamp is not smaller than the local timestamp, required information is missing and the replica waits. In this state, there are two ways to resolve the situation and continue with the query: the replica can either wait for gossip messages containing the missing data, or it can explicitly request the required information from another instance.

8.3 Processing a gossip message

As mentioned earlier, gossip messages are used to propagate update messages to all nodes in the system.
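Before turning to gossip handling, the update and query paths just described (8.1 and 8.2) can be sketched as follows. This is a minimal, single-threaded sketch; the names (`Replica`, the record dictionary, the dependency check) are illustrative assumptions, and a real replica would handle the waiting case of 8.2 asynchronously rather than returning None.

```python
# Illustrative sketch of update processing (26)-(30) and query processing
# (8.2) at replica i. Data structures are assumptions for this sketch.

def leq(t, s):
    return all(a <= b for a, b in zip(t, s))

class Replica:
    def __init__(self, i, n):
        self.i = i                 # index of this replica
        self.rep_ts = [0] * n      # local multipart timestamp (known updates)
        self.val_ts = [0] * n      # timestamp of the locally applied state
        self.log = []              # update records
        self.value = []            # local state: applied update operations

    def process_update(self, u, prev):
        """prev is the update's input timestamp (its dependencies)."""
        self.rep_ts[self.i] += 1                        # (26)
        ts = list(prev)
        ts[self.i] = self.rep_ts[self.i]                # (27)
        r = {"op": u, "node": self.i, "ts": ts, "prev": prev}
        self.log.append(r)                              # (28)
        if leq(prev, self.val_ts):                      # (29) dependencies met
            self.value.append(u)
            self.val_ts = [max(a, b) for a, b in zip(self.val_ts, ts)]
        return ts                                       # (30)

    def process_query(self, q, prev):
        """Execute q only once all updates in prev are reflected locally."""
        if leq(prev, self.val_ts):
            return list(self.value), list(self.val_ts)
        return None  # caller must wait for gossip or fetch the missing data
```

A client would pass the timestamp returned in step (30) as the input timestamp of its next operation; this is how causal dependencies between operations are expressed as label comparisons.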
Therefore, these messages contain the log of the sender, as well as the sender's local timestamp. Processing an individual gossip message consists of the following three steps:

Merging the log in the message with the local log. (31)

Computing the local view of the service state based on the new information. (32)

Discarding records from the log and from the set of records which participated in the update. (33)

For obvious reasons, an incoming gossip message is only processed if the local state does not already reflect its updates. Receiving redundant information in gossip messages can happen for two reasons: messages are delivered out of order, or another gossip message from a different node already contained the updates of this gossip message. If the gossip message m is not discarded, the replica performs the following actions:

Adds the new information in the message to the replica's log: log = log ∪ (m.log \ log). (34)

Merges the replica's timestamp with the timestamp in the message, so that the local timestamp reflects the information now known at the replica. (35)

Finds all the update records that are ready to be added to the local value. (36)

Computes the new local value. (37)

Updates its local timestamp table. (38)

Discards update records from the log if they have been received by all replicas. (39)

Discards records from the list of records which participated in an update if an ack for this update is in the log and there is no update record for that update in the log. (40)

Discards ack records from the log if they are known everywhere and sufficient time has passed. (41)

The decision to delete records from the log only if they are known everywhere can be problematic in the case of a network partition, since it relies on information from all other replicas, which may not be available in this case. Supposing that a partition divides the network into two sides A and

B, and that a record r is known by all nodes in both A and B: if no replica in partition A knows that r is known by all replicas in B, r will not be discarded from the logs of the nodes in A. Once the two network parts A and B reconnect, this problem is resolved automatically.

8.4 Analysis

For brevity, we will not present an analysis of the correctness of the proposed system in this review. Interested readers can find it in the corresponding section of the paper by Ladin, Liskov, Shrira, and Ghemawat [3]. One very important aspect of the performance of the proposed system is that it highly depends on the type and frequency of the executed operations, so we only present a very brief overview of the measurements here.

Figure 7: Capacity of a single replica.

In Figures 7 and 8, we present the response times of a single instance in a system consisting of three replicas for a given mix of operations, and the response times of an unreplicated system, respectively. By comparing the capacity of the unreplicated system to the capacity of the replica, it is possible to derive the savings due to gossip. However, it is worth mentioning that the performance of the system as a whole likely depends on the relative priorities of gossip and operations. The system used to measure the response times visualized in Figures 7 and 8 was configured to prioritize gossip, meaning that gossip is processed whenever there is gossip to send or receive. Configuring the system to prioritize update

or query operations over gossip will likely yield better response times. However, gossip cannot be allowed to lag too far behind, since this would slow down the propagation of information about updates. No experiments with changes in the relative priority were performed during this analysis; real implementations should perform such an analysis to find the optimal configuration.

Figure 8: Capacity of the unreplicated system.

References

[1] (2018): Available at the_divisibility_of_60.svg.
[2] Colin J. Fidge (1987): Timestamps in message-passing systems that preserve the partial ordering.
[3] Rivka Ladin, Barbara Liskov, Liuba Shrira & Sanjay Ghemawat (1992): Providing high availability using lazy replication. ACM Transactions on Computer Systems (TOCS) 10(4).
[4] Leslie Lamport (1978): Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 21(7).
[5] Richard D. Schlichting & Fred B. Schneider (1983): Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems (TOCS) 1(3).
[6] Reinhard Schwarz & Friedemann Mattern (1994): Detecting causal relationships in distributed computations: In search of the holy grail. Distributed Computing 7(3).


More information

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology Mobile and Heterogeneous databases Distributed Database System Transaction Management A.R. Hurson Computer Science Missouri Science & Technology 1 Distributed Database System Note, this unit will be covered

More information

Notes on Bloom filters

Notes on Bloom filters Computer Science B63 Winter 2017 Scarborough Campus University of Toronto Notes on Bloom filters Vassos Hadzilacos A Bloom filter is an approximate or probabilistic dictionary. Let S be a dynamic set of

More information

Synchronization. Chapter 5

Synchronization. Chapter 5 Synchronization Chapter 5 Clock Synchronization In a centralized system time is unambiguous. (each computer has its own clock) In a distributed system achieving agreement on time is not trivial. (it is

More information

OCL Support in MOF Repositories

OCL Support in MOF Repositories OCL Support in MOF Repositories Joachim Hoessler, Michael Soden Department of Computer Science Technical University Berlin hoessler@cs.tu-berlin.de, soden@cs.tu-berlin.de Abstract From metamodels that

More information

Consul: A Communication Substrate for Fault-Tolerant Distributed Programs

Consul: A Communication Substrate for Fault-Tolerant Distributed Programs Consul: A Communication Substrate for Fault-Tolerant Distributed Programs Shivakant Mishra, Larry L. Peterson, and Richard D. Schlichting Department of Computer Science The University of Arizona Tucson,

More information

Advanced Databases Lecture 17- Distributed Databases (continued)

Advanced Databases Lecture 17- Distributed Databases (continued) Advanced Databases Lecture 17- Distributed Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Alternative Models of Transaction Processing Notion of a single

More information

Availability versus consistency. Eventual Consistency: Bayou. Eventual consistency. Bayou: A Weakly Connected Replicated Storage System

Availability versus consistency. Eventual Consistency: Bayou. Eventual consistency. Bayou: A Weakly Connected Replicated Storage System Eventual Consistency: Bayou Availability versus consistency Totally-Ordered Multicast kept replicas consistent but had single points of failure Not available under failures COS 418: Distributed Systems

More information

Spanning Trees and IEEE 802.3ah EPONs

Spanning Trees and IEEE 802.3ah EPONs Rev. 1 Norman Finn, Cisco Systems 1.0 Introduction The purpose of this document is to explain the issues that arise when IEEE 802.1 bridges, running the Spanning Tree Protocol, are connected to an IEEE

More information

NOTES ON OBJECT-ORIENTED MODELING AND DESIGN

NOTES ON OBJECT-ORIENTED MODELING AND DESIGN NOTES ON OBJECT-ORIENTED MODELING AND DESIGN Stephen W. Clyde Brigham Young University Provo, UT 86402 Abstract: A review of the Object Modeling Technique (OMT) is presented. OMT is an object-oriented

More information

Reliable Distributed System Approaches

Reliable Distributed System Approaches Reliable Distributed System Approaches Manuel Graber Seminar of Distributed Computing WS 03/04 The Papers The Process Group Approach to Reliable Distributed Computing K. Birman; Communications of the ACM,

More information

CS Amazon Dynamo

CS Amazon Dynamo CS 5450 Amazon Dynamo Amazon s Architecture Dynamo The platform for Amazon's e-commerce services: shopping chart, best seller list, produce catalog, promotional items etc. A highly available, distributed

More information

Symmetric Product Graphs

Symmetric Product Graphs Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 5-20-2015 Symmetric Product Graphs Evan Witz Follow this and additional works at: http://scholarworks.rit.edu/theses

More information

6.852: Distributed Algorithms Fall, Class 21

6.852: Distributed Algorithms Fall, Class 21 6.852: Distributed Algorithms Fall, 2009 Class 21 Today s plan Wait-free synchronization. The wait-free consensus hierarchy Universality of consensus Reading: [Herlihy, Wait-free synchronization] (Another

More information

The Timed Asynchronous Distributed System Model By Flaviu Cristian and Christof Fetzer

The Timed Asynchronous Distributed System Model By Flaviu Cristian and Christof Fetzer The Timed Asynchronous Distributed System Model By Flaviu Cristian and Christof Fetzer - proposes a formal definition for the timed asynchronous distributed system model - presents measurements of process

More information

Replication and Consistency

Replication and Consistency Replication and Consistency Today l Replication l Consistency models l Consistency protocols The value of replication For reliability and availability Avoid problems with disconnection, data corruption,

More information

15 212: Principles of Programming. Some Notes on Induction

15 212: Principles of Programming. Some Notes on Induction 5 22: Principles of Programming Some Notes on Induction Michael Erdmann Spring 20 These notes provide a brief introduction to induction for proving properties of ML programs. We assume that the reader

More information

Technische Universität München Zentrum Mathematik

Technische Universität München Zentrum Mathematik Technische Universität München Zentrum Mathematik Prof. Dr. Dr. Jürgen Richter-Gebert, Bernhard Werner Projective Geometry SS 208 https://www-m0.ma.tum.de/bin/view/lehre/ss8/pgss8/webhome Solutions for

More information

Behavioural Equivalences and Abstraction Techniques. Natalia Sidorova

Behavioural Equivalences and Abstraction Techniques. Natalia Sidorova Behavioural Equivalences and Abstraction Techniques Natalia Sidorova Part 1: Behavioural Equivalences p. p. The elevator example once more How to compare this elevator model with some other? The cabin

More information

Eventual Consistency Today: Limitations, Extensions and Beyond

Eventual Consistency Today: Limitations, Extensions and Beyond Eventual Consistency Today: Limitations, Extensions and Beyond Peter Bailis and Ali Ghodsi, UC Berkeley Presenter: Yifei Teng Part of slides are cited from Nomchin Banga Road Map Eventual Consistency:

More information

Distributed Algorithms 6.046J, Spring, Nancy Lynch

Distributed Algorithms 6.046J, Spring, Nancy Lynch Distributed Algorithms 6.046J, Spring, 205 Nancy Lynch What are Distributed Algorithms? Algorithms that run on networked processors, or on multiprocessors that share memory. They solve many kinds of problems:

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson Distributed systems Lecture 6: Elections, distributed transactions, and replication DrRobert N. M. Watson 1 Last time Saw how we can build ordered multicast Messages between processes in a group Need to

More information