CSE 486/586 Distributed Systems
Reliable Multicast (Part 2)

Slides by Steve Ko
Computer Science and Engineering, University at Buffalo

Last Time
- Multicast: one-to-many communication within a group of processes
- Issues to be handled: processes fail, messages get delayed
- B-multicast and R-multicast
- Properties: Integrity, Agreement, Validity
- Ordering: why do we care?

Recap: Ordering
- Totally ordered messages T1 and T2; FIFO-related messages F1 and F2; causally related messages C1 and C3
- Total ordering does not imply causal ordering
- Causal ordering implies FIFO ordering
- Causal ordering does not imply total ordering
- Hybrid modes: causal-total ordering, FIFO-total ordering
[Figure: a timeline of processes P1-P3 illustrating totally ordered messages T1 and T2, FIFO-related messages F1-F3, and causally related messages C1-C3.]

Example: FIFO Multicast
Each process keeps a sequence vector (not to be confused with vector timestamps!) with one sequence number per sender. For a message with sequence number S from a given sender, where the receiver's current entry for that sender is r:
- deliver if S = r + 1, then set r := S
- buffer if S > r + 1 (an earlier message is missing), and deliver it once the gap is filled
- reject if S <= r (already delivered)
[Figure: three processes P1-P3 with sequence vectors starting at (0,0,0); the timeline shows a delivery (1 = 0 + 1), a buffered message (2 > 0 + 1), a rejected duplicate (1 < 1 + 1), and the buffered message delivered once 2 = 1 + 1.]
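A minimal sketch of this delivery rule in Python, assuming the transport hands the receiver (sender, sequence number, message) triples and a deliver callback; the class name and structure are illustrative, not from the slides:

    from collections import defaultdict

    class FifoDelivery:
        """Per-sender FIFO delivery: deliver when S = r + 1,
        buffer when S > r + 1, reject (duplicate) when S <= r."""

        def __init__(self):
            self.expected = defaultdict(lambda: 1)   # next sequence number per sender
            self.holdback = defaultdict(dict)        # sender -> {seq: buffered msg}

        def on_receive(self, sender, seq, msg, deliver):
            exp = self.expected[sender]
            if seq < exp:
                return                               # already delivered: reject
            if seq > exp:
                self.holdback[sender][seq] = msg     # gap: buffer until it fills
                return
            deliver(sender, msg)                     # seq == expected: deliver now
            self.expected[sender] = seq + 1
            # flush any buffered successors that are now in order
            while self.expected[sender] in self.holdback[sender]:
                nxt = self.expected[sender]
                deliver(sender, self.holdback[sender].pop(nxt))
                self.expected[sender] = nxt + 1

For example, receiving sequence number 2 from a sender before sequence number 1 (as in the timeline above) buffers message 2 and then delivers both, in order, once message 1 arrives.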

Totally Ordered Multicast
- Using a sequencer: one dedicated sequencer orders all messages; everyone else follows
- ISIS system: similar in result to a sequencer, but the responsibility for sequencing is distributed to each sender

Total Ordering Using a Sequencer

1. Algorithm for group member p
   On initialization: r_g := 0
   To TO-multicast message m to group g:
     B-multicast(g ∪ {sequencer(g)}, <m, i>)
   On B-deliver(<m, i>) with g = group(m):
     Place <m, i> in the hold-back queue
   On B-deliver(m_order = <"order", i, S>) with g = group(m_order):
     Wait until <m, i> is in the hold-back queue and S = r_g
     TO-deliver m
     r_g := S + 1

2. Algorithm for sequencer of g
   On initialization: s_g := 0
   On B-deliver(<m, i>) with g = group(m):
     B-multicast(g, <"order", i, s_g>)
     s_g := s_g + 1
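A rough Python transliteration of this pseudocode, assuming an underlying b_multicast(group, payload) primitive and treating delivery as a callback; the class and field names are made up for illustration:

    import itertools

    class Sequencer:
        """Sequencer of group g: assigns consecutive sequence numbers to message ids."""
        def __init__(self, b_multicast):
            self.s_g = 0
            self.b_multicast = b_multicast

        def on_b_deliver(self, group, msg_id):
            self.b_multicast(group, ("order", msg_id, self.s_g))
            self.s_g += 1

    class Member:
        """Group member p: holds messages back until their order message arrives in turn."""
        def __init__(self, member_id, group, sequencer_id, b_multicast):
            self.r_g = 0                  # sequence number of the next message to TO-deliver
            self.holdback = {}            # msg_id -> message body
            self.orders = {}              # sequence number S -> msg_id
            self.member_id, self.group = member_id, set(group)
            self.sequencer_id, self.b_multicast = sequencer_id, b_multicast
            self._ids = itertools.count()

        def to_multicast(self, m):
            msg_id = (self.member_id, next(self._ids))       # unique message id i
            self.b_multicast(self.group | {self.sequencer_id}, ("msg", msg_id, m))

        def on_b_deliver(self, payload, to_deliver):
            if payload[0] == "msg":
                _, msg_id, m = payload
                self.holdback[msg_id] = m                    # hold-back queue
            else:                                            # ("order", msg_id, S)
                _, msg_id, S = payload
                self.orders[S] = msg_id
            # TO-deliver once both the message and its order have arrived, in sequence
            while self.r_g in self.orders and self.orders[self.r_g] in self.holdback:
                mid = self.orders.pop(self.r_g)
                to_deliver(self.holdback.pop(mid))
                self.r_g += 1

Note that the sequencer is a single point of failure and a potential bottleneck, which is one motivation for the ISIS approach that follows.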

ISIS algorithm for total ordering
[Figure: a sender multicasts a message to processes P1-P4, each process replies with its proposed sequence number, and the sender then multicasts the agreed sequence number back to the group.]

ISIS algorithm for total ordering
1) The sender multicasts the message to everyone.
2) Each process replies with its proposed priority (sequence number), which is:
   - larger than all observed agreed priorities
   - larger than any priority previously proposed by this process
3) Each process stores the message in a priority queue:
   - ordered by priority (proposed or agreed)
   - marked as undeliverable
4) The sender chooses the agreed priority and multicasts it:
   - agreed priority = maximum of all proposed priorities
5) Upon receiving the agreed priority, each process:
   - marks the message as deliverable
   - delivers any deliverable messages at the front of the priority queue
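The receiver-side bookkeeping might look like the following Python sketch (names such as IsisReceiver and choose_agreed are illustrative, and the messaging between sender and receivers is abstracted away). Priorities here are (number, process id) pairs, anticipating the tie-break discussed on the next two slides:

    import heapq

    def choose_agreed(proposals):
        """Sender side (step 4): the agreed priority is the maximum of all proposals."""
        return max(proposals)

    class IsisReceiver:
        def __init__(self, pid):
            self.pid = pid
            self.max_agreed = 0          # largest agreed priority number seen
            self.max_proposed = 0        # largest priority number this process proposed
            self.queue = []              # heap of [priority, msg_id, deliverable?, msg]
            self.index = {}              # msg_id -> queue entry

        def on_message(self, msg_id, msg):
            # Steps 2-3: propose a priority larger than anything observed or proposed,
            # and hold the message in the queue as undeliverable.
            self.max_proposed = max(self.max_agreed, self.max_proposed) + 1
            proposal = (self.max_proposed, self.pid)
            entry = [proposal, msg_id, False, msg]
            self.index[msg_id] = entry
            heapq.heappush(self.queue, entry)
            return proposal                      # reply to the sender with the proposal

        def on_agreed(self, msg_id, agreed, deliver):
            # Step 5: adopt the agreed priority, mark deliverable, deliver from the front.
            self.max_agreed = max(self.max_agreed, agreed[0])
            entry = self.index[msg_id]
            entry[0], entry[2] = agreed, True
            heapq.heapify(self.queue)            # priorities changed; restore heap order
            while self.queue and self.queue[0][2]:
                _, mid, _, msg = heapq.heappop(self.queue)
                del self.index[mid]
                deliver(msg)

The sender collects one proposal from every group member, computes choose_agreed(proposals), and multicasts the result back; each receiver then runs on_agreed.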

Problematic Scenario
- Two processes P1 & P2 are at their initial state. P1 sends m1 & P2 sends m2.
- P1 receives m1 (its own) and proposes priority 1. P2 does the same for m2.
- P2 receives m1 (P1's message) and proposes priority 2. P1 does the same for m2.
- P1 picks 2 as the agreed priority for m1, & P2 also picks 2 for m2.
- Same sequence number for two different messages! How do you want to solve this?

Example: ISIS algorithm [1]
Ties between proposed priorities are broken by appending the proposer's process id (e.g., 3.1, 3.2, 3.3); the process id is shown only when necessary.
[Figure: three processes P1-P3 multicast messages A, C, and B respectively; proposals and agreements such as A:1, A:2, B:1, B:3.1, C:3.2, C:3.3 flow between them. Legend: Message, Updated Message, Ordered Message, Delivered Message, Proposal, Agreement.]

Proof of Total Order
- For a message m1, consider the first process p that delivers m1. At p, let m1 be deliverable and at the head of the queue, and let m2 be another message that has not yet been delivered. Then:
  finalpriority(m2) >= proposedpriority(m2) (due to the max operation at the sender)
  proposedpriority(m2) >= finalpriority(m1) (since the queue is ordered by increasing priority)
- Now suppose some other process p' delivers m2 before m1. By the same argument, at p':
  finalpriority(m1) >= proposedpriority(m1) >= finalpriority(m2)
- Together these force finalpriority(m1) = finalpriority(m2), which is impossible because the process-id tie-break makes priorities unique: a contradiction!

Causally Ordered Multicast
- Each process keeps a vector clock.
- Each entry i in the vector clock counts the number of messages delivered from process Pi.
- Each process's own entry in its vector represents the number of messages it has sent.
- When sending a multicast message m, process Pi increments its own entry, then sends its vector with m.
- When receiving a message m from process Pj, process Pi waits to deliver m until it can preserve causal ordering, i.e., until:
  - it has delivered all previous messages from Pj, and
  - it has delivered all messages that Pj had delivered before m.
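One possible Python rendering of these rules, assuming n processes and an externally supplied b_multicast send primitive; the names are illustrative:

    class CausalMulticast:
        """Causal-order delivery using per-process vector clocks (sketch)."""

        def __init__(self, my_id, n, b_multicast):
            self.i = my_id
            self.V = [0] * n               # V[j] = messages delivered from process j
            self.b_multicast = b_multicast
            self.holdback = []             # received but not yet deliverable messages

        def multicast(self, msg):
            self.V[self.i] += 1            # own entry = number of messages sent
            self.b_multicast((self.i, list(self.V), msg))

        def on_receive(self, payload, deliver):
            sender, _, _ = payload
            if sender == self.i:
                return                     # own copy: already counted when sent
            self.holdback.append(payload)
            self._try_deliver(deliver)

        def _ready(self, sender, Vm):
            # (1) next message in FIFO order from the sender, and (2) everything the
            # sender had delivered before sending has been delivered here as well
            return Vm[sender] == self.V[sender] + 1 and all(
                Vm[k] <= self.V[k] for k in range(len(self.V)) if k != sender)

        def _try_deliver(self, deliver):
            progress = True
            while progress:
                progress = False
                for payload in list(self.holdback):
                    sender, Vm, msg = payload
                    if self._ready(sender, Vm):
                        self.holdback.remove(payload)
                        self.V[sender] += 1
                        deliver(sender, msg)
                        progress = True

This matches the example two slides below: a message carrying timestamp (1,1,0) is held back until the (1,0,0) message from P1 has been delivered.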

Causal Ordering
Entry j of process i's vector: the number of group-g messages from process j that have been seen (delivered) at process i so far.

Example: Causal Ordering Multicast
[Figure: three processes P1-P3, each starting with vector (0,0,0). P1 multicasts with timestamp (1,0,0); P2 delivers it and multicasts with timestamp (1,1,0). The timeline marks deliveries, one rejection, and one buffered message (missing P1's message 1) that is delivered once P1's (1,0,0) message has been delivered.]

Summary
- Total ordering multicast: sequencer-based, ISIS
- Causal ordering multicast: uses vector timestamps

References
[1] Kenneth Birman, Andre Schiper, and Pat Stephenson. Fast Causal Multicast. Technical Report. http://www.dtic.mil/dtic/tr/fulltext/u2/a220910.pdf
[2] Textbook, Section 15.4 (required reading).

Acknowledgements
These slides are by Steve Ko, used and lightly modified with permission by Ethan Blanton. They contain material developed and copyrighted by Indranil Gupta (UIUC).