Midterm Examination ECE 419S 2015: Distributed Systems
Date: March 13th, 2015, 6-8 p.m.


Instructor: Cristiana Amza
Department of Electrical and Computer Engineering, University of Toronto

[Scoring table: Problem number / Maximum Score / Your Score for each problem, with bonus points and a total row.]

This exam is closed textbook and closed lecture notes. You have two hours to complete the exam. Use of computing and/or communicating devices is NOT permitted. You do not need to obtain more than 100 points for this exam. 100 points will give you the full midterm exam credit. However, additional points are provided, which may help if you run out of time. Moreover, for problems with a higher degree of difficulty some bonus points are provided as guidance for you on how to budget your time.

Write your name and student number in the space below. Do the same on the top of each sheet of this exam book.

Your Student Number
Your First Name
Your Last Name

Problem 1. Basic Distributed System Concepts, Architectures and Algorithms (12 points)

(a) (5 points) Name three differences between a multiprocessor and a distributed system (DS) that cause problems for the DS, and at least two concrete problems that these differences create for distributed system algorithms and their implementation.

a.1 Three differences:
- lack of a unique physical clock;
- network latency;
- lack of a physically shared state in a DS.

a.2 Two problems for DS algorithms that the above differences cause: they make agreement impossible and make algorithms for fault tolerance, synchronization and data consistency difficult. It is hard to distinguish between a process/network failure and a slow processor or slow network.

b) (3 points) Lamport describes an algorithm to logically order events in a distributed system. In this algorithm, events a and b, in processes i and j, have logical timestamps C_i(a) and C_j(b). If C_i(a) < C_j(b), we do not know if a → b or if a and b are unordered. (Note that → is the right-arrow Lamport happened-before operator.) How would you extend the logical time information sent with each message to include enough information that would enable you to decide for sure which of the two cases described above applies (ordered by happened-before or unordered)?

The correct answer is a description of the vector timestamp algorithm, as shown on p. 447, CDK, and in the lecture notes.

c) (4 points) Mention 5 uses of Replication by briefly naming the scenarios it is used in for each.

Scaling through load balancing, low-latency local replicas for use in the Wide Area, mobility, data availability, and fault tolerance.

Grading Scheme: 5/4 if the student gave 5 distinct uses, 4/4 for 4 distinct uses.
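To illustrate the part b) answer, a minimal Python sketch of how two vector timestamps are compared to decide between happened-before and unordered; the helper names and the three-process example values are assumptions of the sketch, not part of the exam.

# Sketch: comparing vector timestamps to decide happened-before vs. concurrent.
def happened_before(va, vb):
    # a -> b iff va <= vb component-wise and va != vb
    return all(x <= y for x, y in zip(va, vb)) and va != vb

def concurrent(va, vb):
    # unordered iff neither event happened-before the other
    return not happened_before(va, vb) and not happened_before(vb, va)

# With plain Lamport clocks, C_i(a) = 2 < C_j(b) = 3 cannot distinguish the
# two cases; the vectors can.
a = [2, 0, 0]   # event a on process i = 0
b = [2, 1, 0]   # event b on process j = 1, after receiving a message from i
c = [0, 3, 0]   # event c on process j = 1, with no message from i
print(happened_before(a, b))   # True:  a -> b
print(concurrent(a, c))        # True:  a and c are unordered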

Problem 2. Physical Clocks (14 Points)

In the lectures, we sketched the implementation of at-most-once message delivery semantics (e.g., in RPC) using physically synchronized clocks. The goal of the algorithm is that at-most-once semantics should always be guaranteed, even if some new messages may be incorrectly rejected as duplicates. The algorithm is partially described below.

Every client RPC request message carries a client or connection identifier and a physical clock timestamp. For each client connection, the server records in a table the most recent timestamp it has seen. The client always timestamps an RPC message retransmission with the same timestamp as the original message. If an incoming message for a connection carries a timestamp lower than or equal to the timestamp stored for that connection, then the server rejects the message as a duplicate.

To protect against crashes, in which the above table would be lost, the server periodically writes its current time to disk. Let p be the period between successive disk writes. When the server crashes and then reboots, it reloads the latest stored time value from disk, tlatest. The idea for guaranteeing at-most-once semantics is to reject all messages that might have been accepted before the crash (and to accept only new messages that could not have been accepted before the server crash). Some new messages may be incorrectly rejected, but at-most-once semantics should always be guaranteed.

(a) (5 points) Modify the receiver algorithm/implementation in order to minimize the probability that valid messages are rejected due to potential message reordering in the network in the case above (a message may seem old and get discarded just because a later message from the same client arrived first at the server). You can assume that there are no node failures for this part without losing points.

The receiver maintains a sliding window of 50 ms for each sender. It buffers all messages with a timestamp up to 50 ms less than the message with the largest timestamp received from the same sender. It then rejects duplicates by precise match against the message timestamps stored within this window, and it rejects older messages that fall outside of the window. When the largest timestamp is updated, the receiver moves the window according to the new timestamp, and it delivers the older messages it still stores for that sender to the application.

(b) (5 points) Assume that we are the sole engineers, in charge of all implementation aspects of the end-to-end at-most-once delivery algorithm, built from scratch with no use of libraries or other packages besides TCP/IP (a reliable, in-order network, so the problem in part a) cannot occur), at all levels of our DS. So, we assume reliable network communication, but, other than that, we can never blame anything on anybody else. The algorithm as stated makes it sound like, with the right choice of parameter p, most messages will be delivered to the application only once (or at least we are led to believe that we can control how many messages do not get delivered by a good choice of p). Also assuming that, when nodes fail, they come back up almost immediately, describe characteristics under which the fraction of messages that we cannot deliver may end up being high, in practice, regardless of any parameter settings and our best efforts.
You should provide messaging patterns, and bottlenecks or limitations of end points and the network, which, in the context of incompletely specified assumptions or specifications in our algorithm as stated, lead to a large fraction of messages getting dropped.

I expected that you would find a stress test that brings this system to its knees, or makes it impractical - the more specific to this algorithm and the less contrived the conditions, the better.

The at-most-once message delivery paper from MIT was published in the early 1990s. Since then, the Internet's scale has increased significantly, and the number of clients that any kind of server on the Internet is expected to service has increased from hundreds of thousands to millions or more. The only idea worth anything in this algorithm is for the case of server crashes - minimizing the amount of information we keep, by writing to disk only the server timestamp. So, only for covering the case of crashes is the algorithm worth anything, and, if we show that the algorithm is impractical in this case, we don't need to search further. Hence, the vulnerabilities of this algorithm are related to crashes, the way it rejects messages upon crashes, and its only building block used - synchronized physical clocks.

Specifically, if the server crashes, upon recovery it will need to clock-sync with ALL its clients as soon as possible; otherwise the algorithm is not applicable. This may take a while if there are millions of clients, which we cannot assume organize themselves into an NTP hierarchy just to accommodate the server. So, the worst pattern is periodic server crashes with the need to resync the server's clock with millions of client clocks with different drifts, and the worst client pattern related to this is: 1. bursts of messages while the server has not resynced, which will need to be buffered by the server, possibly exceeding its capacity, and 2. periodic bursts of messages within p, before the server gets to write its timestamp to disk - messages which will later have to be rejected, when the server recovers, because they are within the uncertainty window, but which will be retransmitted by the client(s), thus increasing the server's processing load during the resync period after the crash.

(c) (4 points) If the bound on the clock skew is large enough, briefly describe a scenario where a message can be accepted twice, violating the at-most-once delivery guarantee (in error).

The scenario may happen under the following assumptions: i) the clock skew, epsilon, between the server and client clocks is comparable to p (or, to make it very easy to understand, greater than p), and ii) the server accepts client messages which have client timestamps greater than its own clock at the time of the receipt of the message. In this case, say epsilon = p + 10 ms. At time T at the server (read from the server's clock), and before the server crash, a message comes from the client with client timestamp T + epsilon. The message timestamp is greater than the previous client timestamp stored in the table for this client, so the server accepts it. Let's say that the server writes its time (T) to disk right then (tlatest = T) and afterwards the server crashes and comes back up immediately at time T + 1 ms. Subsequently, it will also accept the duplicate (retransmission) of the previous client message with timestamp T + epsilon, because the rule is that it will accept all messages after tlatest + p (and we assumed that tlatest = T and epsilon > p).
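A minimal Python sketch of the receiver side of the scheme discussed in this problem - the per-connection timestamp table, the periodic disk write of the server's clock, and the post-crash rule of accepting only timestamps greater than tlatest + p. The class and method names and the in-memory structures are assumptions of the sketch, and the part (a) sliding-window buffering is omitted.

import time

class AtMostOnceReceiver:
    # Sketch only: the part (a) sliding-window buffering is not included.
    def __init__(self, period_p, clock=time.time):
        self.clock = clock
        self.period_p = period_p             # p: period between disk writes of the clock
        self.last_ts = {}                    # connection id -> highest timestamp accepted
        self.reject_before = float("-inf")   # set to tlatest + p after a reboot

    def checkpoint(self):
        # Called every p seconds: persist only the current server time.
        self._write_to_disk(self.clock())

    def recover(self, tlatest):
        # After a crash/reboot the table is gone; reject anything that *might*
        # have been accepted before the crash, i.e. timestamps <= tlatest + p.
        self.last_ts.clear()
        self.reject_before = tlatest + self.period_p

    def accept(self, conn_id, msg_ts):
        # Returns True iff the message should be delivered to the application.
        if msg_ts <= self.reject_before:
            return False                     # possibly accepted before the crash
        if msg_ts <= self.last_ts.get(conn_id, float("-inf")):
            return False                     # duplicate (or stale) for this connection
        self.last_ts[conn_id] = msg_ts
        return True

    def _write_to_disk(self, ts):
        pass                                 # placeholder for a durable write of ts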

Problem 3. Logical Clocks, Causal and Total Order Multicast (39 points + 4 bonus points)

In a bulletin board application, each post is multicast to all members of a chat room. We would like to avoid anomalies, such as the one in the figure below, where a chat room participant observes a reply to a previous post (which the reply logically depends on) before the original post. Assume that we don't know the type of application messages (post or reply) in the multicast messaging layer that does the ordering. Furthermore, for the purposes of defining causality in terms of happens-before on our bulletin board application, we define that post 1 happens-before post 2 iff post 1 is delivered to the display on the node issuing post 2 before post 2 is sent by mcast. The relation is transitive, i.e., a post can be based on seeing the full history of all the related messages, ordered by happens-before, posted by other nodes, not just the immediately preceding post seen.

For this whole problem, we will assume that there are no failures of nodes, that network channels are reliable and FIFO, and that network parallelism exists, but maybe not full network parallelism.

a) (12 points) For the types of logical timestamp used below, please provide a brief general description of how the overall solution for avoiding the temporal anomaly works for N nodes (not only 3 nodes or the special case shown). For all parts below, please make sure to think about and to describe how a participating node decides when to deliver/display a post to the application running on that node in each algorithm, and also specify the total number of messages needed in the algorithm on behalf of each BB message post. You need to account for all messages in the system as a whole, from the initial multicast of a post until delivery for display of that post on all nodes, in Big-Oh notation, as a function of a generic N (not 3 as in the Figure). Assume that a multicast generates N separate messages.

a.1) (4 points) We use the same rules for computing Lamport clocks as in the lectures and then use these standard Lamport clocks to timestamp each message mcasted - what is the rule for delivering a message to display on each node? How does it work (briefly), and what is the total number of messages system-wide on behalf of each post?

If using Lamport clocks to timestamp each message mcasted: keep all messages received in a queue on each node, mcast an ACK for each message received, and deliver a message when it is at the head of the queue and all N - 2 ACKs for it have been received. Total messages system-wide on behalf of each post: the initial multicast of N messages plus an ACK multicast of N messages from each receiver, i.e., O(N^2).

a.2) (4 points) We use VTS-1 Vector timestamps, with the VTS counting/incrementing the local processor's position at both send message events and receive message events, for any message. We do not increment on display events. We defer updating non-local positions in the local VTS to the time of processing post display events (with the same rules for the update of these other positions in the VTS as in the lecture notes and textbook). What is the rule for delivering a message to display on each node? How does it work (briefly), and what is the total number of messages system-wide on behalf of each post?

If using standard Vector clocks to timestamp each message mcasted: keep all messages received in a queue on each node, mcast an ACK for each message received, and deliver a message when it is at the head of the queue and all N - 2 ACKs for it have been received. Total messages system-wide on behalf of each post: O(N^2), as above.

a.3) (4 points) If using VTS-2 Vector timestamps, with the VTS counting/incrementing the local processor's position only on send events - when do we deliver a message to display on each node? How does it work (briefly), and what is the total number of messages system-wide on behalf of each post?

When there is no gap between the V_j[j] of this message and the V_i[j] of this node, and V_j[k] <= V_i[k] for k != j, the node can deliver the message to the application/display, i.e., for the human to see. It means all previous events of j have been delivered. Total messages on behalf of each post: N (no ACKs necessary).
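A minimal Python sketch of the a.3 delivery test (VTS-2, increments on sends only), with a hold-back queue; the function names and the queue handling are assumptions of the sketch.

def deliverable(v_msg, sender_j, v_local):
    # Deliver iff this is the next undelivered send from sender_j (no gap) and we
    # have already displayed everything the post causally depends on.
    if v_msg[sender_j] != v_local[sender_j] + 1:
        return False
    return all(v_msg[k] <= v_local[k]
               for k in range(len(v_local)) if k != sender_j)

def on_receive(v_msg, sender_j, v_local, hold_back, display):
    # Queue the post, then flush every queued post that has become deliverable.
    hold_back.append((v_msg, sender_j))
    progress = True
    while progress:
        progress = False
        for entry in list(hold_back):
            vm, j = entry
            if deliverable(vm, j, v_local):
                display(vm, j)
                v_local[j] = vm[j]          # advance our view of j's sends
                hold_back.remove(entry)
                progress = True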

b) (5 points) For the purposes of this part, assume a correctly working solution for avoiding the anomaly based on your favourite one out of the three timestamp choices above (pick one of Lamport, VTS-1 or VTS-2). For the purposes of this part b, and for part c below, we assume that the Figure is correct in terms of the timing of all message sends and receives (this remains as shown in the Figure), but that the Figure does not specify when messages are displayed on each participant's screen (for any post and for any participant). Also assume that we are down in the machine, i.e., that we can't understand and/or that we completely ignore message content (that is for humans, not for us). Alternatively, if we still want to maintain human status, assume we can replace the content of any message to be posted on this BB by anyone in this chat room with our content of interest today, such as "The TAs are now on strike" or "I hope the TA strike will be over soon."

With the above assumptions, let's reformulate the problem to answer the following question: with the timestamping solution of your choice (pick one and fill it in below), and the assumptions above, what is the earliest possible time when message m'' of Student 2 could have been delivered on the Prof's screen? This earliest time needs to be given as a logical clock on the local (Prof's) node. Please state the sequence of operations relevant to the local clock on the Prof's node up until the time of display, including the local logical clock just before the display of m'' on the Prof's node, and the local clock just after. Be as specific and precise as possible.

Easiest to compute for VTS-2: the time just before and after display of message m'' on the Professor's node is [1,1,0]/[1,1,1].

Grading Scheme: for the schemes that required ACKs, not counting the ACKs in the clock was not penalized, as long as the student explicitly specified whether ACKs were included in the counts.

c) (6 points) Given the reformulation of the problem statement in part b) above, and assuming that the ordering solutions above are working correctly, describe the order among the three post events shown in the Figure (Sm - the send of m by the Prof, Sm' - the send of m' by Student 1, and Sm'' - the send of m'' by Student 2). Which of these events are causally ordered by our definition of happens-before, and which of them are concurrent with each other? All answers are required for each of the three possible solutions. This may be the hardest question on the exam.

If we consider the definition of happens-before according to Lamport, and that Sm represents the send of the multicast of message m as a single event, then Sm → Sm' → Sm'' no matter what timestamp ordering scheme we use. This is because, for the classic definition of happens-before given by Lamport, the exchange of a message is sufficient to determine causality. It is plain from the figure that one message is sent by P1 and received by P2, ordering Sm → Sm'. Similarly, another message is sent by P2 and received by P3, ordering Sm' → Sm''. However, it is also plain to see that, if we consider the implementation of TO-Mcast based on Lamport clocks, also given by Lamport, the receipt of message m at P3, for example, does not imply that m is displayed at P3 at the time of receipt. Therefore, if we modify the happens-before relationship to include message display, instead of mere message receipt, then it is clear that the send of m and the send of m' are concurrent events. Based on this revised definition of happens-before, we have:

c1.) Lamport clocks (2 points). Sm, Sm' and Sm'' are concurrent events.

c2.) VTS-1 (2 points). With the solution given in a2), Sm, Sm' and Sm'' are concurrent events.

c3.) VTS-2 (2 points). With the solution given in a3), and the new definition of happens-before, Sm → Sm', but Sm and Sm'' are concurrent events, and Sm' and Sm'' are concurrent events.

d) (6 points) Totally ordered BB with Lamport clocks, improved with an additional RTT assumption. Assume we want to implement a totally ordered BB using Lamport clocks. In totally ordered multicast, all messages need to be ordered in the same order on all nodes. If we add the assumption that the network round-trip latency is bounded by an RTT which we know up-front, design a solution for the totally ordered BB based on Lamport clocks which reduces the total number of messages and the total bandwidth consumption as much as possible. Specify for which kind of message patterns in the BB application and for which network characteristics the intended optimizations will matter the most.

Assume that RTT is the worst-case round-trip delay between any two end-points in the network. If I assign a time-out of 2*RTT to the message at the head of the queue, and I wait for 2*RTT before delivering it, then I must have gotten all messages that were sent to me before that same message got to them; hence I am not missing anything, and the message at the head of my queue must have been received everywhere. So, I do not need any ACKs. At any node, I form the queue sorted by Lamport clocks, and then deliver the head of the queue if I received nothing with a lower Lamport clock, from anyone, for a 2*RTT duration, instead of waiting for ACKs. There is no need for ACKs. If we have low bandwidth and a contended chat room, saving lots of ACKs pays off.

e) (4 points) Mention the worst kind of disadvantage that your specialized RTT-based solution for the totally ordered BB in part d) can have compared to your best performing causally ordered BB solution so far. Describe the specific message patterns of the BB application and relevant network characteristics in terms of latency, bandwidth, parallelism, etc., under which the disadvantage is expected to be the worst possible. Be as explicit and complete as possible in your answer.

We will introduce a lot of unnecessary latency if there is high latency variability, with some messages getting long delays - hence if the RTT representing the worst case is much larger than the average latency. The scheme is also at a disadvantage under any conditions that favor delivery of many ACKs in parallel, fast, such as high network parallelism and high network bandwidth.
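A minimal Python sketch of the part d) rule - hold the head of the Lamport-clock-sorted queue and deliver it only after a 2*RTT wait during which nothing smaller has arrived (anything smaller that did arrive would have displaced it as the head). The class name, the use of a local monotonic timer, and the periodic try_deliver call are assumptions of the sketch.

import heapq
import time

class RTTOrderedBB:
    def __init__(self, rtt_bound):
        self.rtt = rtt_bound
        self.queue = []               # entries: (lamport_ts, sender_id, arrival, post)

    def on_multicast_received(self, lamport_ts, sender_id, post):
        heapq.heappush(self.queue, (lamport_ts, sender_id, time.monotonic(), post))

    def try_deliver(self, display):
        # Called periodically: deliver head entries whose 2*RTT wait has expired.
        now = time.monotonic()
        while self.queue and now - self.queue[0][2] >= 2 * self.rtt:
            ts, sender, _, post = heapq.heappop(self.queue)
            display(sender, post)     # (ts, sender) gives the same total order everywhere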

f.) (6 points + 4 bonus points) VTS-1 improvement without any additional assumption. Design an improved version of the VTS-1 solution from part (a2) which significantly reduces the total number of messages that need to be sent for a reliable, in-order network, without using any additional assumptions about bounded network delay or RTT. No change to the VTS-1 timestamps is allowed, so you will still increment the local position in the timestamp on both sends and receives. Your solution needs to remain decentralized. There are no other restrictions. A reduction of the total number of messages by (at least) a factor of 2 is acceptable for full points. A maximum of 4 bonus points is allocated for solutions that achieve Big-Oh reductions in the number of messages.

Answers here can vary. Acceptable reductions are: 1. batching, i.e., waiting to send ACKs until we can multicast two or more ACKs in the same multicast message, resulting in a reduction of the overall number of messages, and 2. using a different topology for connecting the nodes in order to send the multicasts, e.g., a token ring with piggy-backing and forwarding of multicast messages, or a tree. Both of these are focused on how to group existing messages into fewer message batches. The best answer is a justification of the fact that, for this particular problem, with the assumptions we make here, there is actually no need for ACKs even for VTS-1. But we need to change the way we process messages for delivery, and we need to understand why, in this case, even with gaps in the VTS, we are fine.
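To make the first acceptable reduction concrete, a small Python sketch of ACK batching; the batch size of 2 and the message format are assumptions, chosen only to show how the number of ACK multicasts per post can be (at least) halved. A real implementation would also flush on a short timer so the last pending ACK is not delayed indefinitely.

class AckBatcher:
    def __init__(self, mcast, batch_size=2):
        self.mcast = mcast            # function that multicasts one message to all nodes
        self.batch_size = batch_size
        self.pending = []             # posts we still owe an ACK for

    def ack(self, post_id):
        self.pending.append(post_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.mcast({"type": "ACK", "post_ids": list(self.pending)})
            self.pending.clear()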

Problem 4. Replication, Performance, Fault Tolerance and Availability (8 points + 2 bonus points)

Describe a solution to avoid the bulletin board anomaly from the previous question that provides the "best of both worlds": the solution needs to combine performance, in terms of competitive latency and scaling with the best algorithms designed for the BB so far, on one hand, and robustness on the other hand. Robustness is defined by high availability of data and service and fault tolerance to single node failures. The network is still assumed to be reliable and in order, but without an RTT bound. Your solution needs to be (fully) decentralized. Your own adaptation of Quorum Consensus (QC) to this problem is recommended, with 2 bonus points earned for the correct and appropriate use of QC. However, you can use an algorithm of your own choice instead of QC if you can meet the above criteria.

Hint: Quorum consensus has become widely used because it is good at maintaining small amounts of state in a replicated, consistent and fault-tolerant manner. Think about maintaining small amounts of replicated state with QC. Then think about how to incorporate the replicated state in your BB solution, in order to maintain the replicated BB in a consistent, fault-tolerant manner as well. Argue that, under reasonably common BB application patterns and network characteristics, with a non-negligible probability of single node failures, your solution provides the best of both worlds. As a stress test for your solution, please explain what happens if a post was communicated to some of the nodes but not to others, e.g., due to a node failure in the middle of a multicast.

We use Quorum Consensus in order to implement a bulletin board with totally ordered posting of all messages on all nodes and also with fault tolerance in the case of participant crashes. For this, the general idea is that the whole BB is an object we maintain using QC, which has a certain version number based on how many messages have been included (not necessarily displayed) in the BB. Whenever a node wants to post a new message, it multicasts its request and forms a Read Quorum. From the Read Quorum, it extracts the most up-to-date (highest) version number that the BB has at that moment. The node then writes a new version of the BB, with an incremented version number and its new message included, and waits for acknowledgments from a Write Quorum.

The performance penalty is incurred on read operations. A read operation on the BB - e.g., in order to figure out what messages we can send to the display now - needs to involve a Read Quorum. If the version number returned by the Read Quorum is the same as the one the local node has, then the read operation is complete. Otherwise, the local node selects the highest version number and the associated BB state, and performs a write (involving waiting for responses from a Write Quorum) to all other replicas.

The last step above is necessary if we want to be serious about fault tolerance of the BB messages. It is possible that a node fails in the middle of a write - and, in this case, a subsequent read by any node effectively finishes what that node started.

Grading scheme: lenient, based on the student's understanding of what Quorum Consensus is, or, in general, based on the student's understanding of how to extend any previous scheme with Fault Tolerance features.
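A minimal Python sketch of the quorum-based post and read described above; the Replica interface (read_version_and_state, write), the choice of which replicas to contact, and the simplified failure handling are assumptions of the sketch.

class QuorumBB:
    def __init__(self, replicas, read_q, write_q):
        assert read_q + write_q > len(replicas)   # read and write quorums must intersect
        self.replicas = replicas
        self.read_q = read_q
        self.write_q = write_q

    def _read_quorum(self):
        # Ask a read quorum for (version, bb_state); keep the most up-to-date copy.
        replies = [r.read_version_and_state() for r in self.replicas[:self.read_q]]
        return max(replies, key=lambda vs: vs[0])

    def post(self, message):
        version, bb_state = self._read_quorum()
        new_state = bb_state + [message]
        acks = 0
        for r in self.replicas:                   # keep going until a write quorum acks
            if r.write(version + 1, new_state):   # assumed to return True on an ack
                acks += 1
            if acks >= self.write_q:
                return True
        return False

    def read_for_display(self):
        # The write-back step from the answer above (pushing a newer version it finds
        # to a write quorum) is omitted in this sketch.
        version, bb_state = self._read_quorum()
        return version, bb_state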

Problem 5. Mutual Exclusion and Lamport Clocks (26 points)

For all parts of this question, assume reliable delivery, no failures of nodes or links, and limited message latency, although we may not know the bound unless otherwise specified. A node only needs the resource for a limited amount of time; the critical section is of limited size and the number of lock reacquires in a loop is limited on each node.

Part A. Mutual Exclusion Algorithms with Elements of Centralization (14 points).

Assume that, in a mutual exclusion algorithm of our own design, a centralized node, called the manager (one manager per lock), maintains the location of the current lock owner. Also assume that the lock does not change its owner, and remains in the same location, if it is released by its owner but not requested by another node. Also assume that each lock is initially placed at the manager's location (but not held). The algorithm is as follows.

For the lock acquire: i) a node which wants the lock sends an acquire message to the manager; ii) the manager returns the location of the lock, i.e., its current owner id; iii) the requesting node contacts the owner directly; iv) the owner creates a local queue of process ids with the acquire requests it has received. The queue is maintained in FIFO order.

Upon releasing a lock, the lock owner: i) reads the process id at the head of the queue, which is next in line to become owner; ii) sends that process id of the new owner to the manager; iii) dequeues the head of the queue; iv) passes the lock and the remaining queue to the process which is next in line to become owner.

a) (4 points) Is this algorithm correct? Argue that the algorithm is correct or give a counter-example.

The algorithm suffers from the following race condition: a node gets the current owner from the manager, but its request to the current owner arrives after the lock and the queue have already been passed to a new owner. The algorithm does not specify what to do in this case, so the assumption is that the late requesters will hang waiting for the lock forever. Note that the old owner cannot simply forward such delayed requests, because the lock may have changed owners yet again afterwards.

b) (4 points) Now assume that we change the algorithm in part a) to add tagging each acquire message with the Lamport clock at the time of the send on the sending node (as usual when using Lamport clock timestamps). If the owner maintains its queue in order of Lamport clocks, instead of FIFO, is the algorithm correct? Argue that the algorithm is correct or give a counter-example.

The algorithm suffers from the same race condition; ordering the queue differently did not resolve it.
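For reference, a compact Python sketch of the Part A algorithm as stated in steps i)-iv), with message passing reduced to direct method calls; the class and method names are assumptions of the sketch, and the race discussed in a) and b) is marked in a comment.

class Manager:
    # One manager per lock; it only records who currently owns the lock.
    def __init__(self, initial_owner):
        self.owner = initial_owner           # the lock starts at the manager's location

    def acquire(self, requester):
        return self.owner                    # step ii): return the current owner's id

    def set_owner(self, new_owner):
        self.owner = new_owner               # release step ii): record the new owner

class LockHolder:
    def __init__(self, node_id, manager):
        self.node_id = node_id
        self.manager = manager
        self.queue = []                      # FIFO queue of waiting requesters (step iv)
        self.holds_lock = False

    def remote_request(self, requester):
        # Acquire steps iii)/iv): a requester that learned we are the owner contacts
        # us directly. RACE: if the lock and queue were already passed on, this
        # request is queued at a node that will never serve it.
        self.queue.append(requester)

    def release(self, nodes):
        if not self.queue:
            return                           # no requests: the lock stays here
        next_owner = self.queue[0]           # release step i): head of the queue
        self.manager.set_owner(next_owner)   # release step ii)
        remaining = self.queue[1:]           # release step iii): dequeue the head
        self.holds_lock = False
        nodes[next_owner].receive_lock(remaining)   # release step iv)

    def receive_lock(self, queue):
        self.holds_lock = True
        self.queue = queue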

c) (6 points) Describe an improvement to either one of the previous two algorithms (without Lamport clocks as in part a, or with Lamport clocks as in part b) if we add the assumption of a known RTT (round-trip) bound in the system. You are not allowed to change the state maintained by the manager - it will still maintain the current lock owner. The improvement can be either in terms of correctness or performance of the original algorithm; however, if an algorithm is incorrect, it does not matter what its performance is for the purposes of grading.

Assume that the current lock owner O waits for 2*RTT just before passing the queue and the lock. Anyone who requested the lock owner's identity from the manager and obtained O must have sent their request, and the request must have been received by O, by the end of this wait. Some requests will accumulate on the future lock owner before it receives the lock and the old queue, and the algorithm needs to be modified to allow this. The algorithm also needs to be modified to merge the old queue from the old owner with the new queue on the new owner upon passing the queue and the lock (the new queue contains the latest requests).

Part B. Decentralized Mutual Exclusion Algorithms Based on Token Ring (12 points).

a) (4 points) Explain why the mutual exclusion algorithm based on the Token Ring node topology is not fair (by giving a fairness counter-example).

For example, a node I which needs/asks for the lock is located right behind the current owner on the ring. I must wait for the token to be passed all around the ring before it has a chance to get it. Other nodes may get the token while it is on its way, even if many messages were exchanged around the ring since the time that I placed its request.

b) (8 points) Describe your own variation on the standard implementation of mutual exclusion based on the Token Ring algorithm that allows for fairness in the algorithm. Your solution should still use the Token Ring to transmit messages between nodes (messages can be sent only around the Ring) and you need to try to minimize the total bandwidth consumption of your algorithm. You can use one or more of the standard assumptions for bounded and/or reliable network, clocks, etc., if it helps, without losing any points.

One possibility, assuming synchronized physical clocks: each node registers the physical timestamp of its lock request locally. The token circulates as usual. When receiving the token, a node which needs the lock adds its local request timestamp to the token, and passes on the token. A node will enter the critical section when 1. it receives the token, 2. it sees its own request carried on the token, and 3. its request is the lowest-timestamp request among those carried on the token. At release, the node will delete its own request from the token and pass it on as before.
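A minimal Python sketch of the fair token-ring variation above, assuming synchronized physical clocks; the token representation (a dict from node id to request timestamp) and the method names are assumptions of the sketch.

import time

class FairRingNode:
    def __init__(self, node_id, send_to_next, clock=time.time):
        self.node_id = node_id
        self.send_to_next = send_to_next       # sends the token to the ring successor
        self.clock = clock
        self.my_request_ts = None              # set when this node wants the lock

    def request_lock(self):
        self.my_request_ts = self.clock()      # register the request locally

    def on_token(self, token):
        # token: dict mapping node_id -> request timestamp carried on the token.
        if self.my_request_ts is not None and self.node_id not in token:
            token[self.node_id] = self.my_request_ts   # advertise our request
        mine = token.get(self.node_id)
        if mine is not None and mine == min(token.values()):
            self.critical_section()            # oldest request on the token: enter
            del token[self.node_id]            # release: remove our request
            self.my_request_ts = None
        self.send_to_next(token)               # pass the token around the ring

    def critical_section(self):
        pass                                   # placeholder for the protected work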

Problem 6. Global State (8 points)

Given the following code segments, how many and which results are not possible under sequential consistency (SC)? List all results of the three prints, e.g., print x = 0, print y = 0, print z = 0, etc., that are not possible under SC. Assume that all variables, i.e., A, x, y and z, are initialized to 0 before this code is reached. Show your thinking.

1.
   P1: A = 1; x = A; y = A; z = y
   P2: print y; print x; print z

All combinations of print x = 0/1 and print y = 0/1 are valid under SC except for print y = 1 together with print x = 0 (for either value of print z); print z can be 0 or 1. The key to thinking about this is that print z can happen either before z = y or after this statement.

2.
   P1: A = 1; x = A; y = x
   P2: print y; z = y; print x
   P3: print z

All combinations of print x = 0/1 and print y = 0/1 are valid under SC except for print y = 1 together with print x = 0 (for either value of print z); print z can be 0 or 1.
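As a check of the answer for the first code segment, a small brute-force Python sketch that enumerates every sequentially consistent interleaving of P1 and P2 and collects the possible (print y, print x, print z) outcomes; the encoding of the statements as (destination, source) pairs is an assumption of the sketch.

from itertools import combinations

# Code segment 1 from above: P1 performs the assignments, P2 performs the prints.
P1 = [("A", "1"), ("x", "A"), ("y", "A"), ("z", "y")]   # assignments dst = src
P2 = ["y", "x", "z"]                                    # variables printed, in order

def run(order):
    # order: a sequence of 'P1'/'P2' labels, i.e. one total (SC) order of all steps.
    state = {"A": 0, "x": 0, "y": 0, "z": 0}
    i1 = i2 = 0
    printed = []
    for who in order:
        if who == "P1":
            dst, src = P1[i1]
            state[dst] = 1 if src == "1" else state[src]
            i1 += 1
        else:
            printed.append(state[P2[i2]])
            i2 += 1
    return tuple(printed)                               # (print y, print x, print z)

outcomes = set()
slots = range(len(P1) + len(P2))
for p2_slots in combinations(slots, len(P2)):           # positions of P2's steps
    order = ["P2" if s in p2_slots else "P1" for s in slots]
    outcomes.add(run(order))

print(sorted(outcomes))
# Prints six outcomes: every (y, x, z) in {0,1}^3 except (1, 0, 0) and (1, 0, 1),
# i.e. exactly the "print y = 1 with print x = 0" combinations ruled out above.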



More information

Lecture 12: Time Distributed Systems

Lecture 12: Time Distributed Systems Lecture 12: Time Distributed Systems Behzad Bordbar School of Computer Science, University of Birmingham, UK Lecture 12 1 Overview Time service requirements and problems sources of time Clock synchronisation

More information

Intuitive distributed algorithms. with F#

Intuitive distributed algorithms. with F# Intuitive distributed algorithms with F# Natallia Dzenisenka Alena Hall @nata_dzen @lenadroid A tour of a variety of intuitivedistributed algorithms used in practical distributed systems. and how to prototype

More information

Distributed Systems. 05. Clock Synchronization. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 05. Clock Synchronization. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 05. Clock Synchronization Paul Krzyzanowski Rutgers University Fall 2017 2014-2017 Paul Krzyzanowski 1 Synchronization Synchronization covers interactions among distributed processes

More information

CPS 512 midterm exam #1, 10/7/2016

CPS 512 midterm exam #1, 10/7/2016 CPS 512 midterm exam #1, 10/7/2016 Your name please: NetID: Answer all questions. Please attempt to confine your answers to the boxes provided. If you don t know the answer to a question, then just say

More information

Fault Tolerance. Basic Concepts

Fault Tolerance. Basic Concepts COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time

More information

Distributed Systems. Fault Tolerance. Paul Krzyzanowski

Distributed Systems. Fault Tolerance. Paul Krzyzanowski Distributed Systems Fault Tolerance Paul Krzyzanowski Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License. Faults Deviation from expected

More information

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita Proseminar Distributed Systems Summer Semester 2016 Paxos algorithm stefan.resmerita@cs.uni-salzburg.at The Paxos algorithm Family of protocols for reaching consensus among distributed agents Agents may

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

Chapter 4: Distributed Systems: Replication and Consistency. Fall 2013 Jussi Kangasharju

Chapter 4: Distributed Systems: Replication and Consistency. Fall 2013 Jussi Kangasharju Chapter 4: Distributed Systems: Replication and Consistency Fall 2013 Jussi Kangasharju Chapter Outline n Replication n Consistency models n Distribution protocols n Consistency protocols 2 Data Replication

More information

CS244 Advanced Topics in Computer Networks Midterm Exam Monday, May 2, 2016 OPEN BOOK, OPEN NOTES, INTERNET OFF

CS244 Advanced Topics in Computer Networks Midterm Exam Monday, May 2, 2016 OPEN BOOK, OPEN NOTES, INTERNET OFF CS244 Advanced Topics in Computer Networks Midterm Exam Monday, May 2, 2016 OPEN BOOK, OPEN NOTES, INTERNET OFF Your Name: Answers SUNet ID: root @stanford.edu In accordance with both the letter and the

More information

Time. COS 418: Distributed Systems Lecture 3. Wyatt Lloyd

Time. COS 418: Distributed Systems Lecture 3. Wyatt Lloyd Time COS 418: Distributed Systems Lecture 3 Wyatt Lloyd Today 1. The need for time synchronization 2. Wall clock time synchronization 3. Logical Time: Lamport Clocks 2 A distributed edit-compile workflow

More information

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 20 Concurrency Control Part -1 Foundations for concurrency

More information

CS244a: An Introduction to Computer Networks

CS244a: An Introduction to Computer Networks Do not write in this box MCQ 13: /10 14: /10 15: /0 16: /0 17: /10 18: /10 19: /0 0: /10 Total: Name: Student ID #: Campus/SITN-Local/SITN-Remote? CS44a Winter 004 Professor McKeown CS44a: An Introduction

More information

TIME ATTRIBUTION 11/4/2018. George Porter Nov 6 and 8, 2018

TIME ATTRIBUTION 11/4/2018. George Porter Nov 6 and 8, 2018 TIME George Porter Nov 6 and 8, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license These slides incorporate

More information

Final Exam Solutions May 11, 2012 CS162 Operating Systems

Final Exam Solutions May 11, 2012 CS162 Operating Systems University of California, Berkeley College of Engineering Computer Science Division EECS Spring 2012 Anthony D. Joseph and Ion Stoica Final Exam May 11, 2012 CS162 Operating Systems Your Name: SID AND

More information

Quality of Service (QoS)

Quality of Service (QoS) Quality of Service (QoS) The Internet was originally designed for best-effort service without guarantee of predictable performance. Best-effort service is often sufficient for a traffic that is not sensitive

More information

Assignment 10: TCP and Congestion Control Due the week of November 14/15, 2012

Assignment 10: TCP and Congestion Control Due the week of November 14/15, 2012 Assignment 10: TCP and Congestion Control Due the week of November 14/15, 2012 I d like to complete our exploration of TCP by taking a close look at the topic of congestion control in TCP. To prepare for

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Consistency and Replication Jia Rao http://ranger.uta.edu/~jrao/ 1 Reasons for Replication Data is replicated for the reliability of the system Servers are replicated for performance

More information

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout Exploiting Commutativity For Practical Fast Replication Seo Jin Park and John Ousterhout Overview Problem: consistent replication adds latency and throughput overheads Why? Replication happens after ordering

More information