Blocking Non-blocking Caveat:

Overview of Lecture 5

Reference: The Art of Multiprocessor Programming. Maurice Herlihy and Nir Shavit. Morgan Kaufmann, 2008.

  Lock-Based Data Structures
  The Problem with Locking
  Scalability & Measuring It
  Wait-Free Data Structures
  Lock-Free Data Structures

TAMP: Chapter 8, Section 1.5, Chapter 10

Progress Properties

Blocking
  Deadlock-free: some thread trying to get the lock eventually succeeds.
  Starvation-free: every thread trying to get the lock eventually succeeds.

Non-blocking
  Lock-free: some thread calling a method eventually returns.
  Wait-free: every thread calling a method eventually returns.

Lock- and wait-freeness disallow blocking methods like locks. They guarantee that the system can cope with crash-failures. Picking a progress property for a given application again depends on its needs.

Wait-Free FIFO Queue (1)

TAMP Fig. 3.3 (p.48), without exceptions

    class WaitFreeQueue<T> {
        volatile int head = 0, tail = 0;
        T[] items;

        public WaitFreeQueue(int capacity) {
            items = (T[]) new Object[capacity];
        }

        public void enq(T x) { ... }
        public T deq() { ... }
    }

Identical to Lock-Based Queue, but without locks or synchronized!
Caveat: Single Enqueuer/Single Dequeuer!

Wait-Free FIFO Queue (2)

TAMP Fig. 3.3 (p.48), without exceptions. Single Enqueuer/Single Dequeuer.

    public void enq(T x) {
        while (tail - head == items.length) { /* spin */ }
        items[tail % items.length] = x;
        tail++;
    }

    public T deq() {
        while (tail == head) { /* spin */ }
        T x = items[head % items.length];
        head++;
        return x;
    }

Why is spinning here not fatal?
Why does it work without locks?
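To make the single-enqueuer/single-dequeuer caveat concrete, here is a minimal runnable sketch of the queue above with exactly one producer thread and one consumer thread. The driver class SpscDemo, its transfer() method, and the capacity of 4 are assumptions made for illustration; they are not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the SE/SD queue from the slides (TAMP Fig. 3.3, without exceptions).
class WaitFreeQueue<T> {
    private volatile int head = 0, tail = 0;
    private final T[] items;

    @SuppressWarnings("unchecked")
    WaitFreeQueue(int capacity) {
        items = (T[]) new Object[capacity];
    }

    // Must be called by the single enqueuer thread only.
    void enq(T x) {
        while (tail - head == items.length) { /* spin while full */ }
        items[tail % items.length] = x;
        tail++; // only the enqueuer writes tail, so the non-atomic ++ is safe here
    }

    // Must be called by the single dequeuer thread only.
    T deq() {
        while (tail == head) { /* spin while empty */ }
        T x = items[head % items.length];
        head++; // only the dequeuer writes head
        return x;
    }
}

// Hypothetical driver: one producer, one consumer.
class SpscDemo {
    // Sends 0..n-1 through the queue and returns the values in the order received.
    static List<Integer> transfer(int n) {
        WaitFreeQueue<Integer> q = new WaitFreeQueue<>(4);
        List<Integer> received = new ArrayList<>();
        Thread producer = new Thread(() -> { for (int i = 0; i < n; i++) q.enq(i); });
        Thread consumer = new Thread(() -> { for (int i = 0; i < n; i++) received.add(q.deq()); });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(10));
    }
}
```

With a second producer or consumer thread the unsynchronized tail++ and head++ would race, which is exactly why the slides restrict the queue to one thread per end.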

SE/SD Wait-Free FIFO Queue

Intuitively, this algorithm is correct for the following reasons:

  Only the enqueuer writes to tail and items[ ], and only the dequeuer writes to head.

  The condition while (tail - head == capacity) stops the enqueuer from overwriting an element in the queue before the dequeuer has read it. This relies on head being volatile and on the dequeuer increasing head only after it has read items[head % capacity].

  The condition while (tail == head) stops the dequeuer from reading an element in the queue before the enqueuer has placed it in the queue. This relies on tail being volatile and on the enqueuer increasing tail only after it has written to items[tail % capacity].

Linearizability

Principle (TAMP): Each method call should appear to take effect instantaneously at some moment between its invocation and return. Real-time behavior of method calls must be preserved.

An object is linearizable if all its possible executions are linearizable.

SE/SD WF FIFO is Linearizable

    public void enq(T x) {
        while (tail - head == items.length);
        items[tail % items.length] = x;
        tail++;
    }

    public T deq() {
        while (tail == head);
        T x = items[head % items.length];
        head++;
        return x;
    }

As the linearization point of the enqueuer, for any execution, we can take tail++. As the linearization point of the dequeuer, for any execution, we can take head++. Then any execution of this queue is linearizable.

Java AtomicReference

In a simplified hardware model, the bus (between processors and memory) must be locked before a read or write operation is performed. A read-modify-write operation consists of a read followed by a write, where the lock on the bus is maintained in between. The written value may be determined using the value returned by the read.

In Java, some standard read-modify-write registers are:

  getAndSet(v): assign v, and return the prior value.
  compareAndSet(e,u): if the prior value is e, then replace it by u, else leave it unchanged; return a boolean to indicate whether the value was changed.
  (get(): returns current value)
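The three operations can be tried out on java.util.concurrent.atomic.AtomicReference directly. The following is a small single-threaded sketch; the class name AtomicReferenceDemo and the string values are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

class AtomicReferenceDemo {
    // Returns a short trace of the results of getAndSet, compareAndSet and get.
    static String run() {
        AtomicReference<String> ref = new AtomicReference<>("a");

        String prior = ref.getAndSet("b");          // assigns "b", returns the prior value "a"
        boolean first = ref.compareAndSet("a", "c"); // current value is "b", so nothing changes
        boolean second = ref.compareAndSet("b", "c"); // current value is "b", so it becomes "c"

        return prior + " " + first + " " + second + " " + ref.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "a false true c"
    }
}
```

Note that compareAndSet compares references with ==, not with equals(). The sketch works with string literals only because identical literals are interned to the same object.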

Lock-Free Stack

Treiber's Lock-Free Stack

    public class LockFreeStack<T> {
        AtomicReference<Node<T>> head = new AtomicReference<Node<T>>();

        public void push(T item) { ... }
        public T pop() { ... }

        static class Node<T> {
            final T item;
            Node<T> next;
            public Node(T item) { this.item = item; }
        }
    }

    public void push(T item) {
        Node<T> newHead = new Node<T>(item);
        Node<T> oldHead;
        do {
            oldHead = head.get();
            newHead.next = oldHead;
        } while (!head.compareAndSet(oldHead, newHead));
    }

    public T pop() {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            if (oldHead == null) return null;
            newHead = oldHead.next;
        } while (!head.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }

Why does it work? Linearization Points?

Best-known low-load method, but scales poorly due to contention and an inherent sequential bottleneck.
Better: Elimination Backoff Stack [TAMP Sec. 11.4, p.249]

Free List: An Optimization

Observation: Lots of Nodes are created and thrown away (allocation overhead).
Solution: Reuse them, e.g., with a free list:

  When popped, put the Node on the free list.
  On push, take a Node from the free list.
  Only allocate a fresh Node when the free list is empty.

Invoking the ABA Problem

Suppose the stack prefix is A -> B -> S.

  Thread 1 is about to CAS head from Node A to B in pop().
  Thread 2 pops Nodes A and B, and puts them on the free list. The stack is now S.
  Thread 2 pushes A: the stack is now A -> S. head has changed back to A! (That can't be good...)
  Thread 1 wakes up and head.CAS(A,B) succeeds: the stack is now B -> S, but B is already on the free list!

General Problem of CAS; needs care when reusing values!
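The ABA interleaving above can be replayed deterministically in a single thread by performing Thread 1's first half of pop() by hand, then running Thread 2's operations, then issuing Thread 1's CAS. This is a sketch under assumptions: the class AbaDemo, the FIFO free list built on ArrayDeque, and the run() method are illustrative and not taken from the slides, and the free list itself is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicReference;

class AbaDemo {
    static final class Node {
        String item;
        Node next;
    }

    final AtomicReference<Node> head = new AtomicReference<>();
    // Hypothetical free list (FIFO), used only to illustrate node reuse.
    final ArrayDeque<Node> freeList = new ArrayDeque<>();

    void push(String item) {
        // Reuse a node from the free list if possible, else allocate a fresh one.
        Node n = freeList.isEmpty() ? new Node() : freeList.pollFirst();
        n.item = item;
        Node old;
        do {
            old = head.get();
            n.next = old;
        } while (!head.compareAndSet(old, n));
    }

    String pop() {
        Node old, next;
        do {
            old = head.get();
            if (old == null) return null;
            next = old.next;
        } while (!head.compareAndSet(old, next));
        freeList.addLast(old); // popped node becomes available for reuse
        return old.item;
    }

    static String run() {
        AbaDemo s = new AbaDemo();
        s.push("S"); s.push("B"); s.push("A");   // stack: A -> B -> S

        // Thread 1: first half of pop(), then it is suspended.
        Node oldHead = s.head.get();             // node A
        Node newHead = oldHead.next;             // node B

        // Thread 2: pops A and B (free list: A, B), then pushes A again.
        s.pop();
        s.pop();
        s.push("A");                             // reuses node A, stack: A -> S

        // Thread 1 resumes: head is node A again, so the CAS succeeds.
        boolean cas = s.head.compareAndSet(oldHead, newHead);
        boolean headOnFreeList = s.freeList.contains(s.head.get());
        return "cas=" + cas + " headOnFreeList=" + headOnFreeList;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "cas=true headOnFreeList=true"
    }
}
```

One common remedy is to pair the reference with a version stamp that changes on every update, for example with java.util.concurrent.atomic.AtomicStampedReference, so that a recycled node no longer looks unchanged to a stale CAS.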

Overview of Lecture 5

Concurrent and Distributed Programming
http://fmt.cs.utwente.nl/courses/cdp/

Reference: The Art of Multiprocessor Programming. Maurice Herlihy and Nir Shavit. Morgan Kaufmann, 2008.

  Lock-Based Data Structures
  The Problem with Locking
  Scalability & Measuring It
  Wait-Free Data Structures
  Lock-Free Data Structures

TAMP: Chapter 8, Section 1.5, Chapter 10

Concurrent Programming
HC 6 - Wednesday 21 December 2011
CDP #6
http://fmt.cs.utwente.nl/~michaelw/

Overview of Lecture 6

  Patterns in Lock-Free Data Structures (Chapter 10)
  Alternatives to Lock-Based/Lock-Free Concurrency
    Channels

Unbounded Locked Queue (1)

TAMP Sec. 10.4 (p.229)

    class UnboundedQueue<T> {
        final Lock enqLock = new ReentrantLock();
        final Lock deqLock = new ReentrantLock();
        final Condition notEmpty = deqLock.newCondition();
        Node<T> head, tail;

        public UnboundedQueue() {
            head = tail = new Node<T>(null);   // sentinel
        }

        public void enq(T x) { ... }
        public T deq() { ... }
    }

Slide annotations: Sentinel, Linearization Points.

Unbounded Locked Queue (2)

TAMP Fig & 10.8 (p.229), with modifications

    public void enq(T x) {
        enqLock.lock();
        try {
            Node<T> e = new Node<T>(x);
            tail.next = e;
            tail = e;
        } finally {
            enqLock.unlock();
        }
        // notEmpty belongs to deqLock, so deqLock must be held while signalling;
        // calling signal() while holding only enqLock throws IllegalMonitorStateException.
        deqLock.lock();
        try {
            notEmpty.signal();
        } finally {
            deqLock.unlock();
        }
    }

    public T deq() throws InterruptedException {
        deqLock.lock();
        try {
            while (head.next == null)
                notEmpty.await();
            T result = head.next.value;
            head = head.next;
            return result;
        } finally {
            deqLock.unlock();
        }
    }

Independent locks: the enqueuer works on tail, the dequeuer works on head.
R/W Data Race, Solutions?

Unbounded Lock-Free Queue (1)

TAMP Fig (p.230), with modifications

    class Node<T> {
        public final T value;
        public AtomicReference<Node<T>> next;

        public Node(T value) {
            this.value = value;
            next = new AtomicReference<Node<T>>();
        }
    }

    class LockFreeQueue<T> {
        AtomicReference<Node<T>> head, tail;

        public LockFreeQueue() {
            Node<T> n = new Node<T>(null);   // sentinel
            // head and tail must be two separate AtomicReference objects
            // that initially point to the same sentinel node.
            head = new AtomicReference<Node<T>>(n);
            tail = new AtomicReference<Node<T>>(n);
        }

        public void enq(T x) { ... }
        public T deq() { ... }
    }

Unbounded LF Queue: Example

  enq(): (1) append node, (2) update tail
  deq(): (1) read value of head.next, (2) update head
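Because enq() appends the node and updates tail in two separate steps, tail can lag one node behind the real end of the list. The sketch below freezes a hypothetical enqueue between steps (1) and (2) and shows that the next enq() first helps advance tail. The enq() and deq() bodies follow the helping scheme of the TAMP lock-free queue; the class LaggingTailDemo and its run() method are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

class LockFreeQueue<T> {
    static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T value) { this.value = value; }
    }

    final AtomicReference<Node<T>> head, tail;

    LockFreeQueue() {
        Node<T> sentinel = new Node<>(null);
        head = new AtomicReference<>(sentinel); // separate references,
        tail = new AtomicReference<>(sentinel); // same initial sentinel node
    }

    void enq(T x) {
        Node<T> node = new Node<>(x);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last == tail.get()) {
                if (next == null) {
                    if (last.next.compareAndSet(null, node)) { // step (1): append
                        tail.compareAndSet(last, node);        // step (2): update tail
                        return;
                    }
                } else {
                    tail.compareAndSet(last, next);            // help a lagging tail
                }
            }
        }
    }

    T deq() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (first == head.get()) {
                if (first == last) {
                    if (next == null) return null;             // queue is empty
                    tail.compareAndSet(last, next);            // help a lagging tail
                } else {
                    T value = next.value;
                    if (head.compareAndSet(first, next)) return value;
                }
            }
        }
    }
}

class LaggingTailDemo {
    static String run() {
        LockFreeQueue<String> q = new LockFreeQueue<>();
        q.enq("a");

        // Simulate a thread that stalls after step (1) of enq("b").
        LockFreeQueue.Node<String> last = q.tail.get();
        LockFreeQueue.Node<String> b = new LockFreeQueue.Node<>("b");
        last.next.compareAndSet(null, b);
        boolean lagging = q.tail.get() != b;              // tail still points to "a"

        q.enq("c");                                       // helps: moves tail to b, then appends c
        boolean helped = "c".equals(q.tail.get().value);

        return lagging + " " + helped + " " + q.deq() + q.deq() + q.deq() + " " + q.deq();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true true abc null"
    }
}
```

The stalled thread never needs to wake up for the queue to stay usable, which is the point of helping in a lock-free design.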

Unbounded Lock-Free Queue (2)

TAMP Fig. 10.10 (p.230)

    public void enq(T x) {
        Node<T> node = new Node<T>(x);
        while (true) {
            Node<T> last = tail.get();                        // locate last node
            Node<T> next = last.next.get();
            if (last == tail.get()) {                         // validate value of next
                if (next == null) {                           // check if really last
                    if (last.next.compareAndSet(next, node)) { // append new node (linearization point)
                        tail.compareAndSet(last, node);       // update tail
                        return;
                    }
                } else {
                    tail.compareAndSet(last, next);           // help advance lagging tail
                }
            }
        }
    }

Locate last / append / tail update are not atomic: other threads must help to complete half-finished enq() calls.

Unbounded Lock-Free Queue (3)

TAMP Fig (p.231)

    public T deq() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (first == head.get()) {                        // consistency check, as before
                if (first == last) {                          // check if queue not empty
                    if (next == null) return null;
                    tail.compareAndSet(last, next);           // advance lagging tail
                } else {
                    T value = next.value;
                    if (head.compareAndSet(first, next))      // try again if another thread snatched value
                        return value;
                }
            }
        }
    }

As before: deq() must also help with a half-finished enq() call.

Channels

Channels (1)

So far, concurrent programming constructs used shared memory. We now look at communications, in which processes send and receive messages to and from each other.

Models for communications:

  synchronous: exchange of a message is an atomic action
  asynchronous: messages are buffered through a (finite) buffer, usually implemented in software
  addressing: (a)symmetric
  data flow: one way, duplex

Channels were introduced by C.A.R. Hoare in CSP [1985].
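The difference between the asynchronous and the synchronous model can be illustrated in Java with two standard library queues used as stand-ins for channels: ArrayBlockingQueue (a finite buffer) and SynchronousQueue (no buffer at all). This is only an analogy, not how the lecture defines channels; the class ChannelModelsDemo and the capacity of 2 are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

class ChannelModelsDemo {
    static String run() {
        // Asynchronous model: messages are buffered through a finite buffer.
        BlockingQueue<Integer> buffered = new ArrayBlockingQueue<>(2);
        boolean s1 = buffered.offer(1);   // accepted, buffer holds 1 message
        boolean s2 = buffered.offer(2);   // accepted, buffer is now full
        boolean s3 = buffered.offer(3);   // rejected: no receiver has made room yet

        // Synchronous model: a message can only be handed over to a waiting receiver.
        BlockingQueue<Integer> rendezvous = new SynchronousQueue<>();
        boolean s4 = rendezvous.offer(1); // rejected: no receiver is waiting

        // The buffered message is still available to a receiver that arrives later.
        Integer firstReceived = buffered.poll();

        return s1 + " " + s2 + " " + s3 + " " + s4 + " " + firstReceived;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true true false false 1"
    }
}
```

The non-blocking offer() is used here only to keep the sketch single-threaded; the blocking put() and take() methods are closer to how a sending or receiving process waits on a channel.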

Channels (2)

A channel
  connects a sending and a receiving process
  is typed: the type of the messages that are sent has to be declared

Alg. 6.8 (p. 119)

    finite queue of datatype buffer ← empty queue
    semaphore notEmpty ← (0, ∅)
    semaphore notFull ← (N, ∅)

    producer
        datatype d
        loop forever
    p1:   d ← produce
    p2:   wait(notFull)
    p3:   append(d, buffer)
    p4:   signal(notEmpty)

    consumer
        datatype d
        loop forever
    q1:   wait(notEmpty)
    q2:   d ← take(buffer)
    q3:   signal(notFull)
    q4:   consume(d)

With channels this can be much shorter.

Alg. 8.1 (p. 182)

    channel of integer ch        (only integers can be sent over channel ch)

    producer
        integer x
        loop forever
    p1:   x ← produce
    p2:   ch ⇐ x

    consumer
        integer y
        loop forever
    q1:   ch ⇒ y
    q2:   consume(y)

  ch ⇐ x: sending a message to ch
  ch ⇒ y: receiving a message from ch

Channels in Promela (1)

Communication between Promela processes is via channels:
  asynchronous (buffered)
  synchronous (handshake / rendez-vous)

Both are defined as chan variables:

    chan name = [cap] of { t1, t2, ..., tn };

  name: name of the channel
  cap: capacity of the channel, cap ≥ 0; cap = 0 is a special case: rendez-vous
  t1, t2, ..., tn: type of the elements that will be transmitted over the channel (Promela channels are typed!)

Examples:

    chan ch = [1] of { bit };
    chan tor = [2] of { mtype, bit };
    chan line[2] = [1] of { mtype, Msg };    (array of channels)

Channels in Promela (2): cap > 0

    chan name = [cap] of { t1, t2, ..., tn };

! sending: putting a message in a channel

    ch ! expr1, expr2, ..., exprn;

  A send statement is executable if the channel is not full.
  The types of the expri should correspond with the types ti of the channel declaration.

? receiving: getting a message out of a channel

  A receive statement is executable if the channel is not empty.

  message passing:

    ch ? var1, var2, ..., varn;

  If the channel is not empty, the message is fetched from the channel and the individual elements of the message are stored in the vari's.

  message testing:

    ch ? const1, const2, ..., constn;

  If the channel is not empty and the message at the front of the channel evaluates to the individual consti values, the statement is executable and the message is removed from the channel.

  vari and consti can be mixed, of course.

Promela supports some exotic variations: sorted send and random receive.

Channels in Promela (3)

Rendez-vous communication: capacity = 0

    chan name = [0] of { t1, t2, ..., tn };

The number of elements in the buffer is now zero.

  If the sender is ready to send, but the receiver is not ready to receive, the sender is blocked.
  Similarly, if the receiver is ready to receive before the sender is ready to send, the receiver is blocked.

A synchronous rendez-vous communication is executable if
  i) a sending ch! is enabled, and
  ii) there is a corresponding receiving ch? in some other process that is ready to be executed, and
  iii) the sending expressions match the receiving constants.

If a rendez-vous is executable, both statements will handshake and execute the rendez-vous together in a single transition.

Beware of atomic constructs: if the send operation is in an atomic sequence, its atomicity will be lost (but may be transferred to the receiver).

The rendez-vous communication is the only construct in Promela where two processes perform statements at the same time (more or less).
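The blocking behavior of a rendez-vous can be observed in Java with java.util.concurrent.SynchronousQueue, whose put() returns only after another thread has taken the element. This is an analogy to a capacity-0 Promela channel, not a Promela semantics; the class RendezvousDemo and the 100 ms pause are assumptions for illustration.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class RendezvousDemo {
    static String run() {
        SynchronousQueue<String> ch = new SynchronousQueue<>();
        AtomicBoolean sendCompleted = new AtomicBoolean(false);

        // Sender: ready to send before any receiver exists, so it blocks in put().
        Thread sender = new Thread(() -> {
            try {
                ch.put("hello");
                sendCompleted.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();

        try {
            Thread.sleep(100); // give the sender time to reach put()
            // put() cannot return before a take(), so this is false regardless of timing.
            boolean completedBeforeReceive = sendCompleted.get();

            String msg = ch.take(); // receiver arrives: both sides complete the handshake
            sender.join();
            return completedBeforeReceive + " " + msg + " " + sendCompleted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "false hello true"
    }
}
```

Swapping the roles (calling take() before any put()) blocks the receiver in the same way.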

Alternating Bit Protocol

Alternating Bit Protocol:
  To every message, the sender adds a bit.
  The receiver acknowledges each message by sending the received bit back.
  The receiver only accepts messages with a bit that it expected to receive.
  If the sender is sure that the receiver has correctly received the previous message, it sends a new message and it alternates the accompanying bit.

    mtype = { MSG, ACK };

    chan s2r = [2] of { mtype, bit };
    chan r2s = [2] of { mtype, bit };

    proctype Sender(chan in, out)
    {
        bit sendbit, recvbit;
        do
        :: out ! MSG, sendbit ->
             in ? ACK, recvbit;
             if
             :: recvbit == sendbit -> sendbit = 1 - sendbit
             :: else
             fi
        od
    }

    proctype Receiver(chan in, out)
    {
        bit recvbit;
        do
        :: in ? MSG(recvbit) -> out ! ACK(recvbit);
        od
    }

    init
    {
        run Sender(r2s, s2r);
        run Receiver(s2r, r2s);
    }

Notation: ch ! MSG, par1, ... can also be written as ch ! MSG(par1, ...), and ch ? MSG, par1, ... as ch ? MSG(par1, ...).

In this simple model we assume perfect lines: no messages are lost.

Dining Philosophers

    byte n;
    chan forks[4] = [0] of { bool };

    proctype Phil(byte id; chan left; chan right)
    {
        do
        :: left ? _; right ? _;
           n++;
           printf("phil %d eating, total = %d\n", id, n);
           n--;
           right ! true; left ! true;
        od
    }

    proctype Fork(chan ch)
    {
        do
        :: ch ! true; ch ? _;
        od
    }

    init
    {
        atomic {
            run Fork(forks[0]); run Fork(forks[1]);
            run Fork(forks[2]); run Fork(forks[3]);
            run Phil(0, forks[0], forks[1]);
            run Phil(1, forks[1], forks[2]);
            run Phil(2, forks[2], forks[3]);
            run Phil(3, forks[3], forks[0]);
        }
    }

_ is a write-only variable: we are not interested in the actual value, only in the synchronization.

Lecture 6: Concurrent Programming

Overview of Lecture 6

  Patterns in Lock-Free Data Structures (Chapter 10)
  Alternatives to Lock-Based/Lock-Free Concurrency
    Channels
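For readers more at home in Java than in Promela, the alternating bit protocol model from this lecture can be approximated with two bounded queues standing in for the channels s2r and r2s. This is a sketch under assumptions: the class AlternatingBitDemo, the Frame type, and the rounds parameter (the Promela processes loop forever) are made up for illustration, and the lines are perfect as in the Promela model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class AlternatingBitDemo {
    // Hypothetical message type: a tag (MSG or ACK) plus the accompanying bit.
    static final class Frame {
        final String type;
        final int bit;
        Frame(String type, int bit) { this.type = type; this.bit = bit; }
    }

    // Runs the protocol for a fixed number of rounds and returns the bits the sender used.
    static List<Integer> run(int rounds) {
        BlockingQueue<Frame> s2r = new ArrayBlockingQueue<>(2); // sender -> receiver
        BlockingQueue<Frame> r2s = new ArrayBlockingQueue<>(2); // receiver -> sender
        List<Integer> sentBits = new ArrayList<>();

        // Receiver: acknowledges each message by sending the received bit back.
        Thread receiver = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    Frame msg = s2r.take();
                    r2s.put(new Frame("ACK", msg.bit));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        // Sender: alternates the bit once the matching acknowledgement arrives.
        try {
            int sendBit = 0;
            for (int i = 0; i < rounds; i++) {
                s2r.put(new Frame("MSG", sendBit));
                sentBits.add(sendBit);
                Frame ack = r2s.take();
                if (ack.bit == sendBit) {
                    sendBit = 1 - sendBit;
                }
            }
            receiver.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sentBits;
    }

    public static void main(String[] args) {
        System.out.println(run(4)); // prints "[0, 1, 0, 1]"
    }
}
```

Because no messages are lost, every acknowledgement matches and the bit alternates on every round; modelling lossy lines would require retransmission, which neither this sketch nor the Promela model above covers.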
