The Wait-Free Hierarchy

Jennifer L. Welch

References 1
M. Herlihy, "Wait-Free Synchronization," ACM TOPLAS 13(1):124-149 (1991).
M. Fischer, N. Lynch, and M. Paterson, "Impossibility of Distributed Consensus with One Faulty Process," JACM 32(2):374-382 (1985).

Implementing Shared Objects 2
Consider a concurrent (parallel or distributed) system that
  is asynchronous (no timing guarantees),
  is failure-prone (processes can crash unannounced),
  provides some kind of shared memory building blocks.
What kinds of additional shared memory objects can we build?

Preview of the Answer 3
The answer depends on the semantics of the shared objects.
It is related to the ability of the objects to solve the consensus problem.
Data types can be organized into a hierarchy based on the number of processes for which they can solve consensus.
Data types at one level of the hierarchy cannot implement data types at a higher level of the hierarchy (roughly speaking).

The Consensus Problem 4
Each process has an input; for simplicity, assume it is 0 or 1.
Each (non-crashed) process should terminate and decide on an output such that:
  Agreement: All decisions are the same.
  Validity: The (common) decision is one of the inputs.
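As a small illustration, the two safety conditions above can be written as a checker over one finished run. This is only a sketch: the function name check_consensus and the dictionary encoding of inputs and decisions are assumptions made for the example, and termination is not checked because it is a property of all executions, not of a single outcome.

```python
from typing import Dict, Optional


def check_consensus(inputs: Dict[int, int],
                    decisions: Dict[int, Optional[int]]) -> bool:
    """Return True iff the decisions satisfy agreement and validity.

    inputs[p] is the input of process p. decisions[p] is the value p
    decided, or None if p crashed before deciding (hypothetical encoding;
    crashed processes are exempt from both conditions).
    """
    decided = [v for v in decisions.values() if v is not None]
    agreement = len(set(decided)) <= 1                      # all decisions equal
    validity = all(v in inputs.values() for v in decided)   # decision is some input
    return agreement and validity
```

For example, with inputs {0: 0, 1: 1}, the outcome where both processes decide 1 passes, while the outcome where each process decides its own input violates agreement.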

Wait-Free Algorithms 5 An algorithm for n processors is wait-free if it can tolerate n - 1 failures. Intuition is that a nonfaulty processor does not wait for other processors to do something: it cannot, because it might be the only processor left alive.

Negative Result About Shared Read-Write Registers 6
Theorem: There is no wait-free asynchronous algorithm for consensus using shared r/w registers.
Proof: By contradiction. Assume there is such an algorithm.
  Show there exists an initial system state in which the decision cannot be pre-determined.
  Show inductively how to go from an undetermined state to another undetermined state.
  Thus we can construct an infinitely long execution in which no decision is ever made.

Notion of Valency 7
For any system state, consider all decision values that are reachable from that state in all the different futures: just 0, just 1, or both 0 and 1.
Note: because of the asynchrony, there are many possible executions starting at any point, depending on the order in which processes take steps and when processes crash.
If both 0 and 1 are reachable, the state is called bivalent; otherwise it is univalent (0-valent or 1-valent).
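The definition can be made concrete on a finite tree of executions. The sketch below is purely illustrative: it assumes a hypothetical encoding in which a leaf is a decision value (0 or 1) and an internal state is a list of its possible successor states. Real execution trees are infinite in general, so this shows the definition, not a decision procedure.

```python
def valency(state):
    """Return the set of decision values reachable from `state`.

    Hypothetical encoding: an int leaf is a decision; a list is a state
    whose elements are the states reachable in one step.
    """
    if isinstance(state, int):
        return {state}
    reachable = set()
    for successor in state:
        reachable |= valency(successor)
    return reachable


def classify(state):
    """Name the valency of `state`: 'bivalent', '0-valent' or '1-valent'."""
    values = valency(state)
    if values == {0, 1}:
        return "bivalent"
    return f"{next(iter(values))}-valent"
```

With tree = [[0, 0], [[0, 1], [1, 1]]], the root is bivalent, tree[0] is 0-valent, and tree[1][1] is 1-valent.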

Valency of a System State 8
[Figure: execution tree rooted at state C, with descendant states D, E, F, G and the decisions at the leaves. Legend: 0/1 = bivalent, 1 = 1-valent, 0 = 0-valent.]

Univalent Similarity 9
Lemma 1: If C_1 and C_2 are both univalent and they are similar w.r.t. process p (shared memory state is the same, p's local state is the same), then they have the same valency.
Proof: Consider the executions from C_1 and C_2 in which only p takes steps.
  From C_1 (v-valent), p eventually decides v.
  From C_2 (w-valent), p takes the same number of steps, behaves the same, and decides v. Hence w = v.

Bivalent Initial System State 10
Lemma 2: There exists a bivalent initial system state.
Proof: By contradiction. Suppose all initial system states are univalent.
  The one with all 0's as input (call it C_0) is 0-valent, by validity.
  The one with all 1's as input (call it C_1) is 1-valent, by validity.
  The one with half 0's and half 1's as input (call it D):
    should be 0-valent by Lemma 1, comparing D and C_0 (w.r.t. a process whose input is 0);
    should be 1-valent by Lemma 1, comparing D and C_1 (w.r.t. a process whose input is 1).
  Contradiction.

Critical Processors 11
Def: If C is bivalent and p(C) (the result of p taking one step from C) is univalent, then p is critical in C.
Lemma 3: If C is bivalent, then at least one processor is not critical in C, i.e., there is a bivalent extension.
Proof: Suppose in contradiction that all processors are critical.
[Figure: C is bivalent; a step by p leads to p(C), which is 0-valent; a step by q leads to q(C), which is 1-valent.]
The rest of the proof is a case analysis of what p and q do in their two steps.

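Critical Processors 12
Case 1: p and q access different registers.
[Figure: from bivalent C, p's step leads to p(C) (0-valent) and q's step leads to q(C) (1-valent); q then takes its step from p(C), and p takes its step from q(C).]
Case 2: p and q read the same register. Same proof.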

Critical Processors 13
Case 3: p writes to a register R and q reads from R.
[Figure: from bivalent C, p writes to R, giving p(C) (0-valent); q reads from R, giving q(C) (1-valent); then p writes to R, giving p(q(C)) (1-valent). The states p(C) and p(q(C)) are similar w.r.t. p.]

Critical Processors 14 Case 4: What if p and q both write to the same shared variable? Can "assume away" the problem by assuming we only have single-writer shared variables. Or, can do a similar proof for this case.

Finishing the Impossibility Proof 15
Create an execution C_0, p_1, C_1, p_2, C_2, ... in which all system states are bivalent. This contradicts the termination requirement.
Start with a bivalent initial system state C_0 (from Lemma 2).
Suppose we have bivalent C_k. To get bivalent C_{k+1}:
  Let p_{k+1} be a processor that is not critical in C_k (exists by Lemma 3).
  Let C_{k+1} be p_{k+1}(C_k).

Data Types Beyond Registers 16
Registers support the operations read and write.
What about (wait-free) implementing a significantly different kind of data type out of registers?
More generally, what about (wait-free) implementing an object of type X out of objects of type Y?

Key Insight 17
The ability of objects of type Y to be used to implement an object of type X is related to the ability of those data types to solve consensus!
We are focusing on systems that are asynchronous, shared-memory, and wait-free.

FIFO Queue Example 18
Sequential specification of a FIFO queue:
  operation with invocation enq(x) and response ack
  operation with invocation deq and response return(x)
  a sequence of operations is legal iff each deq returns the oldest enqueued value that has not yet been dequeued (returns ⊥ if the queue is empty)
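The sequential specification can be turned into a small legality checker. This is a sketch under assumptions: the history encoding (pairs of operation name and value) and the use of None for the empty-queue response are choices made for the example, not part of the slides.

```python
from collections import deque

EMPTY = None  # assumed stand-in for the "queue is empty" response


def is_legal(history):
    """Check a sequential history against the FIFO queue specification.

    `history` is a list of ("enq", x) or ("deq", returned_value) pairs
    (hypothetical encoding). Each deq must return the oldest value that
    was enqueued and not yet dequeued, or EMPTY if there is none.
    """
    pending = deque()
    for op, value in history:
        if op == "enq":
            pending.append(value)
        elif op == "deq":
            expected = pending.popleft() if pending else EMPTY
            if value != expected:
                return False
        else:
            raise ValueError(f"unknown operation: {op}")
    return True
```

For instance, enq(1), enq(2), deq returning 1 is legal, while the same history with deq returning 2 is not.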

Consensus Algorithm for 2 Processes (p_0 and p_1) Using FIFO Queue 19
Initially Q = [0] and Prefer[i] = ⊥ (one shared FIFO queue, two shared registers)
Code for p_i:
  Prefer[i] := p_i's input            // write my input into my register
  val := deq(Q)                       // use shared queue to arbitrate between the 2 procs
  if val = 0 then
    decide on p_i's input             // first one to dequeue the initial 0 wins; decision value is its input
  else
    temp := Prefer[1 - i]             // loser obtains decision value from other proc's register
    decide temp
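A minimal Python sketch of the two-process algorithm, for illustration only. The names AtomicQueue and run_two_process_consensus are hypothetical, and the atomic deq is simulated with a lock, so the code shows the logic of the protocol rather than a genuinely wait-free queue.

```python
import threading
from collections import deque


class AtomicQueue:
    """FIFO queue whose deq is made atomic with a lock (a simulation only)."""

    def __init__(self, items):
        self._items = deque(items)
        self._lock = threading.Lock()

    def deq(self):
        with self._lock:
            return self._items.popleft() if self._items else None


def run_two_process_consensus(inputs):
    """Run the protocol once with two threads; return both decisions."""
    q = AtomicQueue([0])          # initially Q = [0]
    prefer = [None, None]         # Prefer[0], Prefer[1]; None plays the role of "unset"
    decisions = [None, None]

    def proc(i):
        prefer[i] = inputs[i]             # write my input into my register
        val = q.deq()                     # the shared queue arbitrates
        if val == 0:
            decisions[i] = inputs[i]      # winner: dequeued the initial 0
        else:
            decisions[i] = prefer[1 - i]  # loser: adopt the winner's input

    threads = [threading.Thread(target=proc, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

The loser can safely read the other register because the winner wrote its input before dequeuing, and the loser's dequeue came after the winner's.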

Implications of Consensus Algorithm Using FIFO Queue 20
Suppose we want to wait-free implement a FIFO queue using read/write registers. Is this possible?
No! If it were possible, we could solve consensus:
  implement a FIFO queue using registers;
  use the implemented queue and the previous algorithm to solve consensus.

Extend Algorithm to More Procs? 21
Can we use FIFO queues to solve consensus with more than 2 processes?
The ability to atomically dequeue a value was key to the 2-process alg:
  one process learns it is the winner;
  the other learns it is the loser, therefore the id of the winner is obvious.
Not clear how to handle 3 processes.
Suppose we have a different data type:

Compare & Swap Specification 22
compare&swap(X : shared memory address, old : value, new : value)
  previous := X                   // previous is a local var.
  if previous = old then X := new
  return previous

Consensus Algorithm for n Processes Using Compare-and-Swap 23
Initially First = ⊥ (one shared C&S object)
  val := compare&swap(First, ⊥, my input)   // if First = ⊥ then replace with my input
  if val = ⊥ then decide on my input
  else decide val
The compare&swap simultaneously indicates the winner and the value to be decided by all the losers.
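The n-process algorithm can be sketched in Python as follows. This is an illustration under assumptions: the class CompareAndSwap simulates the atomic primitive with a lock, and BOTTOM and run_consensus are hypothetical names standing in for the initial "undecided" value and a test harness.

```python
import threading

BOTTOM = object()  # assumed sentinel for the initial "undecided" value


class CompareAndSwap:
    """Shared object with an atomic compare&swap (simulated with a lock)."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, old, new):
        with self._lock:
            previous = self._value
            if previous == old:
                self._value = new
            return previous


def run_consensus(inputs):
    """Run one thread per input; return the list of decisions."""
    first = CompareAndSwap(BOTTOM)
    decisions = [None] * len(inputs)

    def proc(i):
        val = first.compare_and_swap(BOTTOM, inputs[i])
        # The winner sees BOTTOM and decides its own input;
        # every loser sees the winner's input and decides that.
        decisions[i] = inputs[i] if val is BOTTOM else val

    threads = [threading.Thread(target=proc, args=(i,))
               for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Nothing in the code depends on the number of threads, which matches the slide's point that one compare&swap object suffices for any n.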

Impossibility of 3-Process Consensus with FIFO Queue 24
Theorem: Wait-free consensus is impossible using FIFO queues and registers if n > 2.
Proof: Same structure as for registers. The key difference is in the case analysis for the situation where C is bivalent, p(C) is 0-valent, and q(C) is 1-valent.

Impossibility of 3-Process Consensus with FIFO Queues 25
p and q must be accessing the same FIFO queue.
Case 1: Both steps are deq's.
[Figure: from bivalent C, p deq's then q deq's leads to a 0-valent state; q deq's then p deq's leads to a 1-valent state. The two states look the same to r.]

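Impossibility Proof 26
Case 2: p deq's and q enq's.
Case 2.1: The queue is not empty in C.
[Figure: from bivalent C, p deq's then q enq's leads to a 0-valent state; q enq's then p deq's leads to a 1-valent state. How do the two resulting states compare?]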

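Impossibility Proof 27
Case 2: p deq's and q enq's.
Case 2.2: The queue is empty in C.
[Figure: from bivalent C, p deq's (queue is still empty), giving a 0-valent state; q enq's, giving a 1-valent state, and then p deq's (queue is empty again), still 1-valent. The 0-valent state and the final 1-valent state look the same to r.]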

Impossibility Proof 28
Case 3: Both p and q enq (on the same queue).
[Figure: from bivalent C.
  Left branch (0-valent): p enq's A, then q enq's B; then σ: p takes steps until deq'ing A; then τ: q takes steps until deq'ing B.
  Right branch (1-valent): q enq's B, then p enq's A; then σ: p takes steps until deq'ing B; then τ: q takes steps until deq'ing A.
  The two final states look the same to r.]
Why do σ and τ exist?

Impossibility Proof 29
Case 3 cont'd: Suppose σ does not exist.
[Figure: from bivalent C.
  Left branch (0-valent): p enq's A, then q enq's B; p takes steps until deciding but never deq's A; decides 0.
  Right branch (1-valent): q enq's B, then p enq's A; p takes the same number of steps as on the left; never deq's B; also decides 0.]

Impossibility Proof 30 Case 3 cont'd: Prove existence of τ similarly. Thus there is no wait-free algorithm for consensus with 3 processes using FIFO queues and registers.

Implications 31
Suppose we want to wait-free implement a compare&swap object using FIFO queues (and registers). Is this possible?
Not if n > 2! If it were possible, we could solve consensus using FIFO queues (and registers):
  implement a compare&swap object using FIFO queues (and registers);
  use the implemented compare&swap object and the c&s algorithm to solve consensus.

Generalize these Arguments 32 Previous results concerning FIFO queues and compare&swap suggest a criterion for determining if wait-free implementations exist: based on ability of the data types to solve consensus for a certain number of processes.

Consensus Number 33
Data type X has consensus number n if n is the largest number of processes for which consensus can be solved using only objects of type X and read/write registers.

data type              consensus number
read/write register    1
FIFO queue             2
compare&swap           ∞

Using Consensus Numbers 34
Theorem: If data type X has consensus number m and data type Y has consensus number n with n > m, then there is no wait-free implementation of an object of type Y using objects of type X and read/write registers in a system with more than m procs.
[Figure: an object of type Y built from objects of type X and registers.]

Using Consensus Numbers 35
Proof: Suppose in contradiction there is a wait-free implementation S of Y using X and registers in a system with k processes, where m < k ≤ n.
Construct a consensus algorithm for k > m processes using objects of type X (and registers):
  Use S to implement some objects of type Y using objects of type X (and registers).
  Use the (implemented) type Y objects (and registers) in the k-process consensus algorithm that exists since CN(Y) = n.
This contradicts CN(X) = m.

Corollaries 36 There is no wait-free implementation of any object with consensus number > 1 using just read/write registers. There is no wait-free implementation of any object with consensus number > 2 using just FIFO queues and read/write registers.
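The theorem and its corollaries can be read as a simple test on consensus numbers. The sketch below is illustrative: the table values follow the consensus-number slide, with compare&swap taken to be unbounded (an assumption here, based on the n-process algorithm working for every n), and the function name ruled_out is hypothetical. A result of False only means this theorem does not forbid the implementation.

```python
import math

# Consensus numbers as on the "Consensus Number" slide; infinity for
# compare&swap is an assumption made explicit in the lead-in.
CONSENSUS_NUMBER = {
    "read/write register": 1,
    "FIFO queue": 2,
    "compare&swap": math.inf,
}


def ruled_out(x, y, procs):
    """True if the theorem forbids a wait-free implementation of type `y`
    from objects of type `x` (plus registers) among `procs` processes."""
    m = CONSENSUS_NUMBER[x]
    n = CONSENSUS_NUMBER[y]
    return n > m and procs > m
```

For example, ruled_out("FIFO queue", "compare&swap", 3) is True, while the same query with 2 processes is False.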

Universality 37 Let's now consider positive results relating to consensus number. A data type is universal if objects of that type (together with read/write registers) can wait-free implement any data type. Theorem: If data type X has consensus number n, then it is universal in a system with at most n processes.

Proving Universality Result 38
1. Describe an algorithm that implements any data type:
   uses compare&swap (instead of any object with consensus number n);
   implementation is only non-blocking, weaker than wait-free.
2. Modify to use any object with consensus number n.
3. Modify to be wait-free.
4. Modify to bound shared memory used.

Non-Blocking 39
Non-blocking vs. wait-free is analogous to no-deadlock vs. no-lockout for mutual exclusion.
Non-blocking implementation: at any point in an execution, if at least one operation is pending (its response has not yet occurred), then there is a finite sequence of steps by a single proc that completes one of the pending operations.
This does not ensure that every pending operation is eventually completed.

Universal Construction 40
Keep the history of operations that have been applied to the implemented object as a shared linked list.
To apply an operation on the implemented object, the invoking proc. must insert an appropriate "node" into the linked list:
  it is convenient to put the newest node at the head of the list.
A compare&swap object is used to keep track of the head of the list.

Details on Linked List 41
Each linked list node has:
  operation invocation
  new state of the implemented object
  operation response
  pointer to previous node (previous op)
[Figure: Head points to the newest node; each node has fields invocation, state, response, before; the before pointers chain back to the anchor node, which holds the initial state.]

Implementation 42
Initially Head points to the anchor node, which represents the initial state of the implemented object.
When inv is invoked:
  allocate a new linked list node in shared memory, pointed to by local var point
  point.inv := inv
  repeat
    h := Head                                            // h is a local var
    point.state, point.response := apply(inv, h.state)   // depends on implemented data type
    point.before := h
  until compare&swap(Head, h, point) = h                 // if Head still points to same node h points to, then make Head point to new node
  do the output indicated by point.response
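The non-blocking construction can be sketched in Python as below. This is a hedged illustration, not the slides' code: compare&swap is simulated with a lock and compares node identity, the read method is an added helper for "h := Head", and counter_apply is a hypothetical sequential specification (a counter whose inc returns the old value) used only to exercise the construction.

```python
import threading


class CompareAndSwap:
    """Head pointer with atomic compare&swap on node identity (lock-simulated)."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, old, new):
        with self._lock:
            previous = self._value
            if previous is old:
                self._value = new
            return previous


class Node:
    def __init__(self, inv=None, state=None, response=None, before=None):
        self.inv, self.state = inv, state
        self.response, self.before = response, before


class Universal:
    """Non-blocking universal object built from one compare&swap object."""

    def __init__(self, initial_state, apply):
        self._apply = apply                                   # sequential spec
        self._head = CompareAndSwap(Node(state=initial_state))  # anchor node

    def invoke(self, inv):
        point = Node(inv=inv)
        while True:
            h = self._head.read()
            point.state, point.response = self._apply(inv, h.state)
            point.before = h
            if self._head.compare_and_swap(h, point) is h:
                return point.response     # inserted at the head; done
            # Head has moved on: recompute against the new head and retry.


def counter_apply(inv, state):
    """Hypothetical sequential spec: 'inc' returns the old value."""
    if inv == "inc":
        return state + 1, state
    raise ValueError(inv)
```

With obj = Universal(0, counter_apply), concurrent calls to obj.invoke("inc") each return a distinct value, because every successful compare&swap extends the list by exactly one node.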

Implementation Figure 43
[Figure: process p_i with local variables point and h, the shared variable Head, and two list nodes, each with fields invocation, state, response, before.]
If compare&swap indicates that Head has moved on, then try again to insert the new node, at the new location.

Strengthenings of Algorithm 44
To replace the compare&swap object with any object with consensus number n (the number of procs):
  define a consensus object (data type version of the consensus problem);
  get around the difficulty that a consensus object can only be used once by adding a consensus object to each linked list node that points to the next node in the list.
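One way to picture a per-node consensus object is the sketch below. It rests on several assumptions: the one-shot ConsensusObject is simulated with a lock, peek is an added helper for reading the list back, and the append strategy (propose at a node, and on losing move to the node that won) is one plausible reading of the slide rather than the construction from the lecture.

```python
import threading


class ConsensusObject:
    """One-shot consensus object: the first proposal wins for good."""

    def __init__(self):
        self._lock = threading.Lock()   # simulates atomicity (assumption)
        self._decided = False
        self._value = None

    def decide(self, proposal):
        with self._lock:
            if not self._decided:
                self._decided, self._value = True, proposal
            return self._value

    def peek(self):
        """Hypothetical helper: decided value, or None if still undecided."""
        with self._lock:
            return self._value if self._decided else None


class Node:
    def __init__(self, inv):
        self.inv = inv
        self.next = ConsensusObject()   # decides which node follows this one


def append(start, new_node):
    """Link new_node into the list, starting the search at `start`."""
    node = start
    while True:
        winner = node.next.decide(new_node)
        if winner is new_node:
            return
        node = winner                   # lost here; try at the winning node


def contents(anchor):
    """Invocations in list order, excluding the anchor."""
    out, node = [], anchor.next.peek()
    while node is not None:
        out.append(node.inv)
        node = node.next.peek()
    return out
```

Each consensus object is used for a single decision, which is how the construction sidesteps the one-shot restriction mentioned on the slide.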

Strengthenings of Algorithm 45 To get a wait-free implementation, use idea of helping: procs help each other to finish pending operations (not just their own) To reduce the size of the linked list (so it doesn't grow without bound), need to keep track of which list nodes can be recycled.

Effect of Randomization 46
Suppose we relax the liveness condition for linearizable shared memory: operations must terminate with high probability.
Now a randomized consensus algorithm can be used to implement any data type out of any other data type, including read/write registers.
I.e., the hierarchy collapses.