Jennifer L. Welch

References
- M. Herlihy, Wait-Free Synchronization, ACM TOPLAS 13(1):124-149 (1991)
- M. Fischer, N. Lynch, and M. Paterson, Impossibility of Distributed Consensus with One Faulty Process, JACM 32(2):374-382 (1985)

Implementing Shared Objects
Consider a concurrent (parallel or distributed) system that
- is asynchronous (no timing guarantees),
- is failure-prone (processes can crash unannounced), and
- provides some kind of shared-memory building blocks.
What kinds of additional shared-memory objects can we build?

Preview of the Answer
- The answer depends on the semantics of the shared objects.
- It is related to the ability of the objects to solve the consensus problem.
- Data types can be organized into a hierarchy based on the number of processes for which they can solve consensus.
- Roughly speaking, data types at one level of the hierarchy cannot implement data types at a higher level.

The Consensus Problem
Each process has an input; for simplicity, assume it is 0 or 1. Each non-crashed process should terminate and decide on an output such that
- Agreement: all decisions are the same.
- Validity: the (common) decision is one of the inputs.
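
The two conditions can be checked mechanically on a finished execution. The following is a minimal sketch, not part of the original slides; the function name `check_consensus` and the dictionary representation of inputs and decisions are assumptions made for illustration.

```python
def check_consensus(inputs, decisions):
    """Check agreement and validity for one finished execution.

    inputs:    dict mapping process id -> input value (0 or 1)
    decisions: dict mapping process id -> decided value, containing
               only the processes that did not crash (hypothetical
               representation chosen for this sketch)
    Returns a pair (agreement_holds, validity_holds).
    """
    decided = set(decisions.values())
    # Agreement: at most one distinct decision value.
    agreement = len(decided) <= 1
    # Validity: every decided value is some process's input.
    validity = decided <= set(inputs.values())
    return agreement, validity
```

For example, with inputs `{0: 0, 1: 1}`, two processes deciding different values violate agreement, while both deciding a value that nobody proposed would violate validity.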

Wait-Free Algorithms
An algorithm for n processors is wait-free if it can tolerate n - 1 failures. The intuition is that a nonfaulty processor does not wait for other processors to do something: it cannot, because it might be the only processor left alive.

Negative Result About Shared Read-Write Registers
Theorem: There is no wait-free asynchronous algorithm for consensus using shared r/w registers.
Proof: By contradiction. Assume there is such an algorithm.
1. Show there exists an initial system state in which the decision cannot be pre-determined.
2. Show inductively how to go from an undetermined state to another undetermined state.
Thus we can construct an infinitely long execution in which a decision cannot be made.

Notion of Valency
For any system state, consider all decision values that are reachable from that state in all the different futures: just 0, just 1, or both 0 and 1.
Note: because of the asynchrony, there are many possible executions starting at any point, depending on the order in which processes take steps and when processes crash.
If both 0 and 1 are reachable, the state is called bivalent; otherwise it is univalent (0-valent or 1-valent).
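
As an illustration of the definition, valency can be computed on a small, finite execution tree. This is a toy sketch under stated assumptions: the state names in `TREE`, the leaf values in `DECISIONS`, and the function names are all hypothetical, and real executions form a far larger (possibly infinite) tree.

```python
# Toy execution tree: each non-leaf state maps to its possible next states.
TREE = {
    "C": ["D", "E"],
    "D": ["D0", "D1"],
    "E": ["E0", "E1"],
}
# Leaves are states in which a decision has been made.
DECISIONS = {"D0": 0, "D1": 0, "E0": 0, "E1": 1}


def reachable_decisions(state):
    """Return the set of decision values reachable from `state`."""
    if state in DECISIONS:
        return {DECISIONS[state]}
    result = set()
    for child in TREE[state]:
        result |= reachable_decisions(child)
    return result


def classify(state):
    """Label a state as bivalent, 0-valent, or 1-valent."""
    values = reachable_decisions(state)
    if values == {0, 1}:
        return "bivalent"
    return f"{min(values)}-valent"
```

In this toy tree, `C` and `E` are bivalent because both 0 and 1 are reachable from them, while `D` is 0-valent.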

Valency of a System State
[Figure: tree of executions starting at state C and passing through states D, E, F, and G down to the decisions. Each state is labeled 0/1 (bivalent), 1 (1-valent), or 0 (0-valent).]

Univalent Similarity
Lemma 1: If C_1 and C_2 are both univalent and they are similar w.r.t. process p (the shared memory state is the same and p's local state is the same), then they have the same valency.
Proof: Suppose C_1 is v-valent and C_2 is w-valent. Starting from C_1, let only p take steps; p eventually decides v. Starting from C_2, let p take the same number of steps; p behaves the same and decides v. Hence w = v.

Bivalent Initial System State
Lemma 2: There exists a bivalent initial system state.
Proof: By contradiction. Suppose all initial system states are univalent.
- The one with all 0's as input (call it C_0) is 0-valent, by validity.
- The one with all 1's as input (call it C_1) is 1-valent, by validity.
- The one with half 0's and half 1's as input (call it D) should be 0-valent by Lemma 1, comparing D and C_0 w.r.t. a process with input 0, and should be 1-valent by Lemma 1, comparing D and C_1 w.r.t. a process with input 1. This is a contradiction.

Critical Processors
Def: If C is bivalent and p(C) (the result of p taking one step from C) is univalent, then p is critical in C.
Lemma 3: If C is bivalent, then at least one processor is not critical in C, i.e., C has a bivalent extension.
Proof: Suppose, for contradiction, that all processors are critical in C. Since C is bivalent, there must be processors p and q such that p(C) is 0-valent and q(C) is 1-valent; otherwise C itself would be univalent. The rest of the proof is a case analysis of what p and q do in these two steps.
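
Critical Processors
Case 1: p and q access different registers. Then the two steps commute, so q(p(C)) and p(q(C)) are the same state. But q(p(C)) extends the 0-valent state p(C), and p(q(C)) extends the 1-valent state q(C), a contradiction.
Case 2: p and q read the same register. Same proof.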
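
Critical Processors
Case 3: p writes to a register R and q reads from R. Compare p(C), which is 0-valent, with p(q(C)), which is 1-valent because it extends q(C). In both states R holds the value written by p, and q's read changed only q's local state, so the two states are similar w.r.t. p. By Lemma 1 they must have the same valency, a contradiction.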

Critical Processors
Case 4: What if p and q both write to the same shared variable? We can "assume away" the problem by assuming we only have single-writer shared variables. Or, we can do a similar proof for this case.

Finishing the Impossibility Proof
Create an execution C_0, p_1, C_1, p_2, C_2, ... in which all system states are bivalent. This contradicts the termination requirement.
- Start with a bivalent initial system state C_0 (from Lemma 2).
- Suppose we have bivalent C_k. To get bivalent C_{k+1}: let p_{k+1} be a processor that is not critical in C_k (it exists by Lemma 3), and let C_{k+1} be p_{k+1}(C_k).

Data Types Beyond Registers
Registers support the operations read and write. What about (wait-free) implementing a significantly different kind of data type out of registers? More generally, what about (wait-free) implementing an object of type X out of objects of type Y?

Key Insight
The ability of objects of type Y to be used to implement an object of type X is related to the ability of those data types to solve consensus!
We are focusing on systems that are
- asynchronous,
- shared memory, and
- wait-free.

FIFO Queue Example
Sequential specification of a FIFO queue:
- an operation with invocation enq(x) and response ack
- an operation with invocation deq and response return(x)
- a sequence of operations is legal iff each deq returns the oldest enqueued value that has not yet been dequeued (it returns ⊥ if the queue is empty)

Consensus Algorithm for 2 Processes (p_0 and p_1) Using FIFO Queue
Shared objects: one FIFO queue Q, initially [0], and two registers Prefer[0] and Prefer[1], initially ⊥.
Code for p_i:
    Prefer[i] := p_i's input        // write my input into my register
    val := deq(Q)
    if val = 0 then decide on p_i's input
    else temp := Prefer[1 - i]
         decide temp
The shared queue arbitrates between the 2 procs: the first one to dequeue the initial 0 wins, and the decision value is its input. The loser obtains the decision value from the other proc's register.
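
The algorithm can be tried out with two threads. This is a minimal sketch under stated assumptions: the class `FifoQueue`, the function `run_two_process_consensus`, and the use of `None` for ⊥ are choices made for illustration. The lock inside `FifoQueue` only models the atomicity of each queue operation; it does not make the queue itself a wait-free object.

```python
import threading
from collections import deque

BOTTOM = None  # stands in for the "empty / not yet written" value


class FifoQueue:
    """Linearizable FIFO queue; the lock models atomic enq/deq steps."""

    def __init__(self, items=()):
        self._items = deque(items)
        self._lock = threading.Lock()

    def enq(self, x):
        with self._lock:
            self._items.append(x)

    def deq(self):
        with self._lock:
            return self._items.popleft() if self._items else BOTTOM


def run_two_process_consensus(inputs):
    """Run the 2-process algorithm once; return both decisions."""
    q = FifoQueue([0])           # queue initially holds the single value 0
    prefer = [BOTTOM, BOTTOM]    # one register per process
    decisions = [None, None]

    def process(i):
        prefer[i] = inputs[i]    # write my input into my register
        val = q.deq()
        if val == 0:             # I dequeued the initial 0: I win
            decisions[i] = inputs[i]
        else:                    # I lost: adopt the winner's input
            decisions[i] = prefer[1 - i]

    threads = [threading.Thread(target=process, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

The loser always finds the winner's input in `prefer[1 - i]`, because the winner wrote its register before it dequeued, and the loser dequeued after the winner.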

Implications of Consensus Algorithm Using FIFO Queue
Suppose we want to wait-free implement a FIFO queue using read/write registers. Is this possible? No! If it were possible, we could solve consensus:
- implement a FIFO queue using registers
- use the implemented queue and the previous algorithm to solve consensus

Extend Algorithm to More Procs?
Can we use FIFO queues to solve consensus with more than 2 processes? The ability to atomically dequeue a value was key to the 2-process algorithm:
- one process learns it is the winner
- the other learns it is the loser, and therefore the id of the winner is obvious
It is not clear how to handle 3 processes. Suppose we have a different data type:

Compare & Swap Specification
compare&swap(X : shared memory address, old : value, new : value)
    previous := X        // previous is a local var.
    if previous = old then X := new
    return previous
The whole operation is executed atomically on X.
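
The specification can be mirrored by a small class. This is a sketch, assuming a lock as a stand-in for the hardware atomicity of the operation; the class name `CompareAndSwap` and the extra `read` method are hypothetical additions for illustration.

```python
import threading


class CompareAndSwap:
    """Shared object X supporting an atomic compare&swap operation."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()  # models atomicity of one operation

    def compare_and_swap(self, old, new):
        """If X equals old, set X to new. Always return X's prior value."""
        with self._lock:
            previous = self._value
            if previous == old:
                self._value = new
            return previous

    def read(self):
        """Return the current value of X (added for inspection)."""
        with self._lock:
            return self._value
```

A caller learns whether its swap took effect by comparing the returned value with the `old` argument it passed in.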

Consensus Algorithm for n Processes Using Compare-and-Swap
Shared object: one compare&swap object First, initially ⊥.
Code for each process:
    val := compare&swap(First, ⊥, my input)    // if First = ⊥ then replace ⊥ with my input
    if val = ⊥ then decide on my input
    else decide val
The compare&swap simultaneously indicates the winner and the value to be decided by all the losers.
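
A thread-based sketch of this algorithm follows. The names `CompareAndSwap`, `run_consensus`, and the sentinel `BOTTOM` are assumptions for illustration, and the lock only models the atomicity of a single compare&swap.

```python
import threading

BOTTOM = object()  # unique sentinel standing in for the initial value of First


class CompareAndSwap:
    """Shared compare&swap object; the lock models atomicity."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, old, new):
        with self._lock:
            previous = self._value
            if previous is old or previous == old:
                self._value = new
            return previous


def run_consensus(inputs):
    """Run the n-process algorithm once; return the list of decisions."""
    first = CompareAndSwap(BOTTOM)
    decisions = [None] * len(inputs)

    def process(i):
        val = first.compare_and_swap(BOTTOM, inputs[i])
        if val is BOTTOM:
            decisions[i] = inputs[i]  # I installed my input: I am the winner
        else:
            decisions[i] = val        # a winner already exists: adopt its value

    threads = [threading.Thread(target=process, args=(i,))
               for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Each process performs exactly one shared-memory operation, so the number of participating processes does not matter, which is why the same code works for any n.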

Impossibility of 3-Process Consensus with FIFO Queue
Theorem: Wait-free consensus is impossible using FIFO queues and registers if n > 2.
Proof: Same structure as for registers. The key difference arises when considering the situation in which C is bivalent, p(C) is 0-valent, and q(C) is 1-valent.
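
Impossibility of 3-Process Consensus with FIFO Queues
p and q must be accessing the same FIFO queue.
Case 1: Both steps are deq's. From the bivalent state C, let p deq and then q deq, reaching a 0-valent state; alternatively, let q deq and then p deq, reaching a 1-valent state. In both resulting states the same two elements have been removed from the queue and only the local states of p and q differ, so the two states look the same to a third process r. If r runs alone from either state, it decides the same value in both, a contradiction.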
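
Impossibility Proof
Case 2: p deq's and q enq's.
Case 2.1: The queue is not empty in C. Then p's deq removes the head and q's enq adds at the tail, so the two steps commute: q(p(C)) and p(q(C)) are the same state. That state extends both the 0-valent p(C) and the 1-valent q(C), a contradiction.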
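
Impossibility Proof
Case 2: p deq's and q enq's.
Case 2.2: The queue is empty in C. If p deq's first, the queue is still empty and the resulting state p(C) is 0-valent. If q enq's first, q(C) is 1-valent; if p then deq's, the queue is empty again and p(q(C)) is 1-valent. In both p(C) and p(q(C)) the queue is empty, so the two states look the same to r, a contradiction.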
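
Impossibility Proof
Case 3: Both p and q enq (on the same queue); p enq's A and q enq's B.
- Left branch: from C, p enq's A (reaching a 0-valent state) and then q enq's B. Next, σ: p takes steps until deq'ing A; then τ: q takes steps until deq'ing B.
- Right branch: from C, q enq's B (reaching a 1-valent state) and then p enq's A. Next, p takes steps until deq'ing B; then q takes steps until deq'ing A.
The resulting states, one 0-valent and one 1-valent, look the same to r, a contradiction. Why do σ and τ exist?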
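
Impossibility Proof
Case 3 cont'd: Suppose σ does not exist.
- Left branch (p enq's A, then q enq's B; 0-valent): p takes steps until deciding but never deq's A; it decides 0.
- Right branch (q enq's B, then p enq's A; 1-valent): p takes the same number of steps as on the left and never deq's B; it also decides 0.
Deciding 0 in the right branch contradicts that branch being 1-valent.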

Impossibility Proof
Case 3 cont'd: Prove the existence of τ similarly.
Thus there is no wait-free algorithm for consensus with 3 processes using FIFO queues and registers.

Implications
Suppose we want to wait-free implement a compare&swap object using FIFO queues (and registers). Is this possible? Not if n > 2! If it were possible, we could solve consensus using FIFO queues (and registers):
- implement a compare&swap object using FIFO queues (and registers)
- use the implemented compare&swap object and the c&s algorithm to solve consensus

Generalize these Arguments
The previous results concerning FIFO queues and compare&swap suggest a criterion for determining whether wait-free implementations exist: one based on the ability of the data types to solve consensus for a certain number of processes.

Consensus Number
Data type X has consensus number n if n is the largest number of processes for which consensus can be solved using only objects of type X and read/write registers.

data type              consensus number
read/write register    1
FIFO queue             2
compare&swap           ∞

Using Consensus Numbers
Theorem: If data type X has consensus number m and data type Y has consensus number n with n > m, then there is no wait-free implementation of an object of type Y using objects of type X and read/write registers in a system with more than m procs.
[Figure: an object of type Y built on top of several objects of type X and several registers.]

Using Consensus Numbers
Proof: Suppose, for contradiction, that there is a wait-free implementation S of Y using X and registers in a system with k processes, where m < k ≤ n. Construct a consensus algorithm for k > m processes using objects of type X (and registers):
- Use S to implement some objects of type Y using objects of type X (and registers).
- Use the (implemented) type Y objects (and registers) in the k-process consensus algorithm that exists since CN(Y) = n.
This contradicts the assumption that X has consensus number m.

Corollaries
- There is no wait-free implementation of any object with consensus number > 1 using just read/write registers.
- There is no wait-free implementation of any object with consensus number > 2 using just FIFO queues and read/write registers.
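
The theorem and its corollaries amount to a simple comparison of consensus numbers. The sketch below is illustrative only: the dictionary `CONSENSUS_NUMBER`, the function `implementation_ruled_out`, and the encoding of an unbounded consensus number as `math.inf` are assumptions. The infinite value for compare&swap reflects the fact that the compare&swap consensus algorithm works for any number of processes.

```python
import math

# Consensus numbers of the data types discussed in these slides.
CONSENSUS_NUMBER = {
    "read/write register": 1,
    "FIFO queue": 2,
    "compare&swap": math.inf,
}


def implementation_ruled_out(base, target, num_procs):
    """Return True if the theorem rules out a wait-free implementation.

    base:      type X of the available objects (registers are also allowed)
    target:    type Y of the object to implement
    num_procs: number of processes in the system
    A result of False only means this theorem does not apply.
    """
    m = CONSENSUS_NUMBER[base]
    n = CONSENSUS_NUMBER[target]
    return n > m and num_procs > m
```

For example, `implementation_ruled_out("FIFO queue", "compare&swap", 3)` is true, while the same question for 2 processes is not settled by this theorem.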

Universality
Let's now consider positive results relating to consensus number. A data type is universal if objects of that type (together with read/write registers) can wait-free implement any data type.
Theorem: If data type X has consensus number n, then it is universal in a system with at most n processes.

Proving Universality Result
1. Describe an algorithm that implements any data type. It uses compare&swap (instead of any object with consensus number n), and the implementation is only non-blocking, which is weaker than wait-free.
2. Modify it to use any object with consensus number n.
3. Modify it to be wait-free.
4. Modify it to bound the shared memory used.

Non-Blocking
Non-blocking vs. wait-free is analogous to no-deadlock vs. no-lockout for mutual exclusion.
Non-blocking implementation: at any point in an execution, if at least one operation is pending (its response is not yet ready to be done), then there is a finite sequence of steps by a single proc that completes one of the pending operations. This does not ensure that every pending operation is eventually completed.

Universal Construction
Keep the history of operations that have been applied to the implemented object as a shared linked list. To apply an operation on the implemented object, the invoking proc must insert an appropriate "node" into the linked list; it is convenient to put the newest node at the head of the list. A compare&swap object is used to keep track of the head of the list.

Details on Linked List
Each linked list node has
- the operation invocation
- the new state of the implemented object
- the operation response
- a pointer to the previous node (previous op)
[Figure: Head points to the newest node; each node holds invocation, state, response, and before fields, and the before pointers lead back to the anchor node, which holds the initial state.]

Implementation
Initially, Head points to the anchor node, which represents the initial state of the implemented object.
When inv is invoked:
    allocate a new linked list node in shared memory, pointed to by local var point
    point.inv := inv
    repeat
        h := Head        // h is a local var
        point.state, point.response := apply(inv, h.state)    // depends on implemented data type
        point.before := h
    until compare&swap(Head, h, point) = h    // if Head still points to the same node h points to, then make Head point to the new node
    do the output indicated by point.response
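
The loop above can be sketched in runnable form. This is an illustration under stated assumptions: the class names `CompareAndSwap`, `Node`, and `Universal`, the `read` method, and the example `queue_apply` function are all hypothetical, and the lock inside `CompareAndSwap` only models the atomicity of one operation. As in the slides, the construction is non-blocking, not wait-free.

```python
import threading


class CompareAndSwap:
    """Shared compare&swap object holding a node reference."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()  # models atomicity of one operation

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, old, new):
        with self._lock:
            previous = self._value
            if previous is old:        # nodes are compared by identity
                self._value = new
            return previous


class Node:
    def __init__(self, inv=None, state=None, response=None, before=None):
        self.inv, self.state = inv, state
        self.response, self.before = response, before


class Universal:
    """Non-blocking implementation of any type given by `apply`."""

    def __init__(self, initial_state, apply):
        self.apply = apply             # apply(inv, state) -> (state, response)
        self.head = CompareAndSwap(Node(state=initial_state))  # anchor node

    def invoke(self, inv):
        point = Node(inv=inv)
        while True:
            h = self.head.read()
            point.state, point.response = self.apply(inv, h.state)
            point.before = h
            if self.head.compare_and_swap(h, point) is h:
                return point.response  # Head had not moved: node inserted


def queue_apply(inv, state):
    """Sequential FIFO queue over an immutable tuple (example type)."""
    op, arg = inv
    if op == "enq":
        return state + (arg,), "ack"
    if not state:
        return state, None             # deq on an empty queue
    return state[1:], state[0]
```

For instance, `Universal((), queue_apply)` yields a shared FIFO queue: `invoke(("enq", 7))` returns `"ack"` and a later `invoke(("deq", None))` returns the oldest enqueued value. Keeping states immutable matters here, because a losing process recomputes its node from a newer head state without disturbing nodes that are already linked.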

Implementation Figure
[Figure: p_i's local variables point and h; point refers to the new node (invocation, state, response, before), and h refers to the node that Head pointed to when it was read.]
If compare&swap indicates that Head has moved on, then try again to insert the new node, at the new location.

Strengthenings of Algorithm
To replace the compare&swap object with any object with consensus number n (the number of procs):
- define a consensus object (the data type version of the consensus problem)
- get around the difficulty that a consensus object can only be used once by adding a consensus object to each linked list node that points to the next node in the list
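
The idea of one consensus object per node can be sketched as follows. This is a simplified illustration, not the full construction: the names `ConsensusObject`, `Node`, and `LinkedUniversal` are hypothetical, the lock inside `ConsensusObject` merely stands in for any object with consensus number n, and every operation walks the list from the anchor, which a real construction would avoid by tracking a recent end of the list. The result is still only non-blocking.

```python
import threading

UNDECIDED = object()  # sentinel: no proposal has been decided yet


class ConsensusObject:
    """One-shot consensus: the first proposal wins, everyone learns it."""

    def __init__(self):
        self._decided = UNDECIDED
        self._lock = threading.Lock()  # stand-in for a real consensus object

    def decide(self, proposal):
        with self._lock:
            if self._decided is UNDECIDED:
                self._decided = proposal
            return self._decided


class Node:
    def __init__(self, inv=None, state=None):
        self.inv = inv
        self.state = state
        self.response = None
        self.next = ConsensusObject()  # decides which node follows this one


class LinkedUniversal:
    def __init__(self, initial_state, apply):
        self.apply = apply             # apply(inv, state) -> (state, response)
        self.anchor = Node(state=initial_state)

    def invoke(self, inv):
        node = Node(inv=inv)
        cur = self.anchor              # simplification: always start here
        while True:
            # Compute my result as if my node directly followed cur.
            node.state, node.response = self.apply(inv, cur.state)
            winner = cur.next.decide(node)
            if winner is node:
                return node.response   # my node is now linked after cur
            cur = winner               # someone else follows cur: move on


def counter_apply(inv, state):
    """Fetch-and-increment counter (example type): returns the old value."""
    return state + 1, state
```

Because each consensus object is used only to choose the successor of a single node, no object is ever asked to decide twice, which is how the one-shot restriction is sidestepped.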

Strengthenings of Algorithm
To get a wait-free implementation, use the idea of helping: procs help each other to finish pending operations (not just their own).
To reduce the size of the linked list (so it doesn't grow without bound), we need to keep track of which list nodes can be recycled.

Effect of Randomization
Suppose we relax the liveness condition for linearizable shared memory: operations must terminate with high probability. Now a randomized consensus algorithm can be used to implement any data type out of any other data type, including read/write registers. I.e., the hierarchy collapses.