CSE 124: LAMPORT AND VECTOR CLOCKS. George Porter October 30, 2017

Similar documents
TIME ATTRIBUTION 11/4/2018. George Porter Nov 6 and 8, 2018

Time Synchronization and Logical Clocks

Time Synchronization and Logical Clocks

Time. COS 418: Distributed Systems Lecture 3. Wyatt Lloyd

Time in Distributed Systems

PRIMARY-BACKUP REPLICATION

CSE 124: TIME SYNCHRONIZATION, CRISTIAN S ALGORITHM, BERKELEY ALGORITHM, NTP. George Porter October 27, 2017

CSE 5306 Distributed Systems

Last Class: Naming. Today: Classical Problems in Distributed Systems. Naming. Time ordering and clock synchronization (today)

TWO-PHASE COMMIT ATTRIBUTION 5/11/2018. George Porter May 9 and 11, 2018

Causal Consistency and Two-Phase Commit

Advanced Topics in Distributed Systems. Dr. Ayman A. Abdel-Hamid. Computer Science Department Virginia Tech

Chapter 6 Synchronization

SYNCHRONIZATION. DISTRIBUTED SYSTEMS Principles and Paradigms. Second Edition. Chapter 6 ANDREW S. TANENBAUM MAARTEN VAN STEEN

CSE 124: CONTENT-DISTRIBUTION NETWORKS. George Porter December 4, 2017

ICT 6544 Distributed Systems Lecture 7

CONTENT-DISTRIBUTION NETWORKS

Distributed Coordination 1/39

REPLICATED STATE MACHINES

DISTRIBUTED MUTEX. EE324 Lecture 11

Primary/Backup. CS6450: Distributed Systems Lecture 3/4. Ryan Stutsman

CSE 124: REPLICATED STATE MACHINES. George Porter November 8 and 10, 2017

CSE 5306 Distributed Systems. Synchronization

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Lecture 7: Logical Time

Synchronization Part 1. REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5

Chapter 6 Synchronization (1)

Coordination 2. Today. How can processes agree on an action or a value? l Group communication l Basic, reliable and l ordered multicast

Clock and Time. THOAI NAM Faculty of Information Technology HCMC University of Technology

Synchronization. Clock Synchronization

Arranging lunch value of preserving the causal order. a: how about lunch? meet at 12? a: <receives b then c>: which is ok?

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Process groups and message ordering

殷亚凤. Synchronization. Distributed Systems [6]

Causal Consistency. CS 240: Computing Systems and Concurrency Lecture 16. Marco Canini

Multi-threaded programming in Java

CS6450: Distributed Systems Lecture 11. Ryan Stutsman

CSE 124: QUANTIFYING PERFORMANCE AT SCALE AND COURSE REVIEW. George Porter December 6, 2017

Distributed Systems COMP 212. Lecture 17 Othon Michail

CS6450: Distributed Systems Lecture 13. Ryan Stutsman

CSE 124: TAIL LATENCY AND PERFORMANCE AT SCALE. George Porter November 27, 2017

THE DATACENTER AS A COMPUTER AND COURSE REVIEW

PEER-TO-PEER NETWORKS, DHTS, AND CHORD

Event Ordering. Greg Bilodeau CS 5204 November 3, 2009

Primary-Backup Replication

11/13/2018 CACHING, CONTENT-DISTRIBUTION NETWORKS, AND OVERLAY NETWORKS ATTRIBUTION

SWITCHING, FORWARDING, AND ROUTING

Lecture 6: Logical Time

TAIL LATENCY AND PERFORMANCE AT SCALE

OUTLINE. Introduction Clock synchronization Logical clocks Global state Mutual exclusion Election algorithms Deadlocks in distributed systems

Availability versus consistency. Eventual Consistency: Bayou. Eventual consistency. Bayou: A Weakly Connected Replicated Storage System

CSE 124: OPTIONS, SIGNALS, TIMEOUTS, AND CONCURRENCY. George Porter Oct 13, 2017

Frequently asked questions from the previous class survey

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

wait with priority An enhanced version of the wait operation accepts an optional priority argument:

Important Lessons. A Distributed Algorithm (2) Today's Lecture - Replication

Recall use of logical clocks

Synchronization. Chapter 5

Synchronisation in. Distributed Systems. Co-operation and Co-ordination in. Distributed Systems. Kinds of Synchronisation. Clock Synchronization

Synchronization. Distributed Systems IT332

Distributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen

Byzantine Fault Tolerance

CS 425 / ECE 428 Distributed Systems Fall 2017

INTRODUCTION TO PROTOCOLS, LAYERING, FRAMING AND PARSING

Consistency and Replication. Some slides are from Prof. Jalal Y. Kawash at Univ. of Calgary

CSE 486/586 Distributed Systems

CSE 5306 Distributed Systems. Consistency and Replication

CONCURRENCY CONTROL, TRANSACTIONS, LOCKING, AND RECOVERY

Day06 A. Young W. Lim Mon. Young W. Lim Day06 A Mon 1 / 16

Fault Tolerance. Distributed Systems. September 2002

Three Models. 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1. DEPT. OF Comp Sc. and Engg., IIT Delhi

2017 Paul Krzyzanowski 1

Distributed Algorithms Reliable Broadcast

Distributed Transactions

Consistency in Distributed Systems

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.

X X C 1. Recap. CSE 486/586 Distributed Systems Gossiping. Eager vs. Lazy Replication. Recall: Passive Replication. Fault-Tolerance and Scalability

Scaling KVS. CS6450: Distributed Systems Lecture 14. Ryan Stutsman

Replication and Consistency

Synchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17

Data Replication CS 188 Distributed Systems February 3, 2015

Lecture 10: Clocks and Time

Distributed Systems - I

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski

Clock-Synchronisation

CS6450: Distributed Systems Lecture 15. Ryan Stutsman

Distributed Systems Principles and Paradigms

CSE 5306 Distributed Systems

Distributed Systems Fault Tolerance

Verteilte Systeme (Distributed Systems)

Distributed Systems Exam 1 Review Paul Krzyzanowski. Rutgers University. Fall 2016

CSE 5306 Distributed Systems

IP ADDRESSES, NAMING, AND DNS

Last Class: Clock Synchronization. Today: More Canonical Problems

CSE 5306 Distributed Systems. Fault Tolerance

Fault Tolerance via the State Machine Replication Approach. Favian Contreras

GFS: The Google File System. Dr. Yingwu Zhu

Distributed Systems. replication Johan Montelius ID2201. Distributed Systems ID2201

CSE 486/586 Distributed Systems

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

Transcription:

CSE 124: LAMPORT AND VECTOR CLOCKS George Porter October 30, 2017

ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license These slides incorporate material from: Kyle Jamieson, Princeton University (also under a CC BY-NC-SA 3.0 Creative Commons license)

ANNOUNCEMENTS

Outline 1. Lamport clocks 2. Vector clocks

MOTIVATION: MULTI-SITE DATABASE REPLICATION A New York-based bank wants to make its transaction ledger database resilient to whole-site failures Replicate the database, keep one copy in sf, one in nyc San Francisco New York

MOTIVATION: MULTI-SITE DATABASE REPLICATION Replicate the database, keep one copy in sf, one in nyc Client sends query to the nearest copy Client sends update to both copies Deposit $100 $1,000 $1,100 $1,111 Inconsistent replicas! Updates should have been performed in the same order at each copy Pay 1% interest $1,000 $1,010 $1,110

IDEA: LOGICAL CLOCKS Landmark 1978 paper by Leslie Lamport Insight: only the events themselves matter Idea: Disregard the precise clock time Instead, capture just a happens before relationship between a pair of events

DEFINING HAPPENS-BEFORE Consider three processes: P1, P2, and P3 Notation: Event a happens before event b (a b) P1 P2 P3 Physical time

DEFINING HAPPENS-BEFORE 1. Can observe event order at a single process a P1 P2 P3 b Physical time

DEFINING HAPPENS-BEFORE 1. If same process and a occurs before b, then a b a P1 P2 P3 b Physical time

DEFINING HAPPENS-BEFORE 1. If same process and a occurs before b, then a b 2. Can observe ordering when processes communicate a P1 P2 P3 b c Physical time

DEFINING HAPPENS-BEFORE 1. If same process and a occurs before b, then a b 2. If c is a message receipt of b, then b c a P1 P2 P3 b c Physical time

DEFINING HAPPENS-BEFORE 1. If same process and a occurs before b, then a b 2. If c is a message receipt of b, then b c 3. Can observe ordering transitively a P1 P2 P3 b c Physical time

TRANSITIVE HAPPENS-BEFORE 1. If same process and a occurs before b, then a b 2. If c is a message receipt of b, then b c 3. If a b and b c, then a c a P1 P2 P3 b c Physical time

CONCURRENT EVENTS We seek a clock time C(a) for every event a Plan: Tag events with clock times; use clock times to make distributed system correct Clock condition: If a b, then C(a) < C(b)

THE LAMPORT CLOCK ALGORITHM Each process P i maintains a local clock C i 1. Before executing an event, C i C i + 1 a P1 C 1 =0 P2 C 2 =0 P3 C 3 =0 b c Physical time

THE LAMPORT CLOCK ALGORITHM 1. Before executing an event a, C i C i + 1: Set event time C(a) C i a P1 C 1 =1 C(a) = 1 P2 C 2 =1 P3 C 3 =1 b c Physical time

THE LAMPORT CLOCK ALGORITHM 1. Before executing an event b, C i C i + 1: Set event time C(b) C i a b P1 C 1 =2 C(a) = 1 C(b) = 2 P2 C 2 =1 c P3 C 3 =1 Physical time

THE LAMPORT CLOCK ALGORITHM 1. Before executing an event b, C i C i + 1 2. Send the local clock in the message m a b P1 C 1 =2 C(a) = 1 C(b) = 2 C(m) = 2 P2 C 2 =1 c P3 C 3 =1 Physical time

THE LAMPORT CLOCK ALGORITHM 3. On process P j receiving a message m: Set C j and receive event time C(c) 1 + max{ C j, C(m) } a b P1 C 1 =2 C(a) = 1 C(b) = 2 C(m) = 2 P2 C 2 =3 C(c) = 3 c P3 C 3 =1 Physical time

ORDERING ALL EVENTS Break ties by appending the process number to each event: 1. Process P i timestamps event e with C i (e).i 2. C(a).i < C(b).j when: C(a) < C(b), or C(a) = C(b) and i < j Now, for any two events a and b, C(a) < C(b) or C(b) < C(a) This is called a total ordering of events

MAKING CONCURRENT UPDATES CONSISTENT Recall multi-site database replication: P1 San Francisco (P1) deposited $100: New York (P2) paid 1% interest: % $ P2 We reached an inconsistent state Could we design a system that uses Lamport Clock total order to make multi-site updates consistent?

TOTALLY-ORDERED MULTICAST Client sends update to one replica Lamport timestamp C(x) Key idea: Place events into a local queue Sorted by increasing C(x) P1 s local queue: $ 1.1 % 1.2 P2 s local queue: % 1.2 P1 P2 Goal: All sites apply the updates in (the same) Lamport clock order

TOTALLY-ORDERED MULTICAST (ALMOST CORRECT) 1. On receiving an event from client, broadcast to others (including yourself) 1. On receiving an event from replica: a) Add it to your local queue b) Broadcast an acknowledgement message to every process (including yourself) 2. On receiving an acknowledgement: Mark corresponding event acknowledged in your queue 3. Remove and process events everyone has ack ed from head of queue

TOTALLY-ORDERED MULTICAST (ALMOST CORRECT) P1 queues $, P2 queues % $ 1.1 % 1.2 % $ 1.1 1.2 P1 queues and ack s % P1 marks %fully ack ed P1 $ 1.1 % 1.2 P2 P2 marks % fully ack ed ack % % P2 processes % (Ack s to self not shown here)

TOTALLY-ORDERED MULTICAST (CORRECT VERSION) 1. On receiving an event from client, broadcast to others (including yourself) 2. On receiving or processing an event: a) Add it to your local queue b) Broadcast an acknowledgement message to every process (including yourself) only from head of queue 3. When you receive an acknowledgement: Mark corresponding event acknowledged in your queue 4. Remove and process events everyone has ack ed from head of queue

TOTALLY-ORDERED MULTICAST (CORRECT VERSION) $ 1.1 % 1.2 % $ 1.1 1.2 P1 $ 1.1 % 1.2 P2 $ % ack $ ack % (Ack s to self not shown here) $ %

SO, ARE WE DONE? Does totally-ordered multicast solve the problem of multisite replication in general? Not by a long shot! 1. Our protocol assumed: No node failures No message loss No message corruption 2. All to all communication does not scale 3. Waits forever for message delays (performance?)

TAKE-AWAY POINTS: LAMPORT CLOCKS Can totally-order events in a distributed system: that s useful! But: while by construction, a b implies C(a) < C(b), The converse is not necessarily true: C(a) < C(b) does not imply a b (possibly, a b) Can t use Lamport clock timestamps to infer causal relationships between events

Outline 1. Lamport clocks 2. Vector clocks

VECTOR CLOCK (VC) Label each event e with a vector V(e) = [c 1, c 2, c n ] c i is a count of events in process ithat causally precede e Initially, all vectors are [0, 0,, 0] Two update rules: 1. For each local event on process i, increment local entry c i 2. If process j receives message with vector [d 1, d 2,, d n ]: Set each local entry c k = max{c k, d k } Increment local entry c j

VECTOR CLOCK: EXAMPLE All counters start at [0, 0, 0] Applying local update rule P1 P2 P3 Applying message rule Local vector clock piggybacks on inter-process messages a b [1,0,0] [2,0,0] c d [2,1,0] [2,2,0] e f [0,0,1] [2,2,2] [2,2,2]: Remember we have event e at P3 with timestamp [0,0,1]. D s message gets timestamp [2,2,0], we take max to get [2,2,1] then increment the local entry to get [2,2,2]. Physical time

VECTOR CLOCKS CAN ESTABLISH CAUSALITY Rule for comparing vector clocks: V(a) = V(b) when a k = b k for all k V(a) < V(b) when a k b k for all k and V(a) V(b) a b [1,0,0] [2,0,0] c z [2,1,0] [2,2,0] Concurrency: a b if a i < b i and a j > b j, some i, j V(a) < V(z) when there is a chain of events linked by between a and z

Two events a, z Lamport clocks: C(a) < C(z) Conclusion: None Vector clocks: V(a) < V(z) Conclusion: a z Vector clock timestamps tell us about causal event relationships

VC APPLICATION: CAUSALLY-ORDERED BULLETIN BOARD SYSTEM Distributed bulletin board application Each post multicast of the post to all other users Want: No user to see a reply before the corresponding original message post Deliver message only after all messages that causally precede it have been delivered Otherwise, the user would see a reply to a message they could not find

VC APPLICATION: CAUSALLY-ORDERED BULLETIN BOARD SYSTEM Original post 1 s reply Physical time User 0 posts, user 1 replies to 0 s post; user 2 observes