Non-blocking Array-based Algorithms for Stacks and Queues Niloufar Shafiei
Outline
  - Introduction
  - Concurrent stacks and queues
  - Contributions
  - New algorithms
  - New algorithms using bounded counter values
  - Correctness
  - Time analysis
  - Model checking
  - Implementations and comparisons
  - Conclusion and future work
7/15/13
Asynchronous distributed shared memory
  - Processes communicate through shared memory.
  - Each process has its own independent clock.
  - Mutual exclusion (using locks)
    - Disadvantages
      - Neither reliable nor fault tolerant
      - Priority inversion
      - Deadlock
Non-blocking and wait-free algorithms
  - Non-blocking (lock-free): some operation always completes in a finite number of steps.
  - Wait-free: every operation completes in a finite number of steps.
  - Advantages
    - Immune to deadlock
    - Robust performance
  - Disadvantage
    - Complex and subtle
Linearizability
  - Correctness condition for shared objects
  [Figure: timeline of concurrent push(v1), push(v2) and pop operations on a stack. Each operation is assigned a linearization point (X) within its interval, and the value returned by pop depends on where those points fall.]
Compare and Swap
  - It is impossible to construct some shared objects using only atomic read/write registers.
  - Use universal synchronization primitives
    - Compare and Swap (C&S)

  C&S(X, old, new)   (executed atomically)
      if (X = old)
          X := new
          return true
      else
          return false
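The C&S semantics above map directly onto compareAndSet in java.util.concurrent.atomic, the package the slides name for the implementations. The following is a minimal sketch, not code from the talk; the class name CasDemo and the helper incrementWithCas are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: CasDemo and incrementWithCas are hypothetical names.
class CasDemo {
    // Typical retry loop: read X, compute the new value, and try to install
    // it with C&S; if another thread changed X in between, try again.
    static int incrementWithCas(AtomicInteger x) {
        while (true) {
            int old = x.get();
            int updated = old + 1;
            if (x.compareAndSet(old, updated)) {
                return updated;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger x = new AtomicInteger(5);
        // X holds the expected value 5, so this C&S succeeds and X becomes 7.
        System.out.println(x.compareAndSet(5, 7));
        // X no longer holds 5, so this C&S fails and X is left unchanged.
        System.out.println(x.compareAndSet(5, 9));
        System.out.println(incrementWithCas(x));
    }
}
```

The retry loop is the basic shape that the stack and queue operations later in the deck follow: read a shared variable, do some work, then attempt a C&S and repeat on failure.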
ABA problem
  - Initially X = A, and a process reads old = X.
  - Other processes change X to B and later back to A.
  - C&S(X, old, new) then succeeds, as if X had not been changed.
  - Solution: counter values
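The ABA scenario and the counter-based fix can be reproduced in a single thread. This is a sketch under assumptions: it uses Java's AtomicStampedReference as a stand-in for a value packed together with a counter in one word, which is not necessarily how the talk's algorithms store counters. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

// Illustrative only: AbaDemo and its methods are hypothetical names.
class AbaDemo {
    // Plain C&S: the change A -> B -> A goes unnoticed, so the C&S succeeds.
    static boolean plainCasAfterAba() {
        AtomicReference<String> x = new AtomicReference<>("A");
        String old = x.get();   // old = X
        x.set("B");             // another process changes X ...
        x.set("A");             // ... and later changes it back
        return x.compareAndSet(old, "C");
    }

    // Value plus counter: every change bumps the counter (stamp), so the
    // stale reader's C&S fails even though the value is A again.
    static boolean stampedCasAfterAba() {
        AtomicStampedReference<String> x = new AtomicStampedReference<>("A", 0);
        int[] stampHolder = new int[1];
        String old = x.get(stampHolder);
        int oldStamp = stampHolder[0];
        x.set("B", oldStamp + 1);
        x.set("A", oldStamp + 2);
        return x.compareAndSet(old, "C", oldStamp, oldStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println("plain C&S succeeds after ABA: " + plainCasAfterAba());
        System.out.println("stamped C&S succeeds after ABA: " + stampedCasAfterAba());
    }
}
```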
Concurrent stacks and queues
  - Fundamental data structures in distributed systems
  - Applications: parallel programs such as garbage collectors and operating systems
  - Two main categories
    - Link-based
    - Array-based
Link-based versus array-based
  - Link-based
    - Extra space required for pointers
    - Potential memory fragmentation and memory management overhead
  - Array-based
    - Compact data structure
    - Leaves enough space in a word for counters
    - Fixed size
    - Good locality of reference
Contributions
  - Two non-blocking array-based algorithms for stacks and two non-blocking array-based algorithms for queues
    - A shared array implements the shared stack or queue
    - Shared variables (Top, Rear, Front) store the index of the top or bottom element of the stack or queue
    - C&S primitive
    - Counter values
New algorithms
  - Linearization points of successful operations: the successful C&S on Top, Rear or Front
  - Linearization points of operations that return Empty or Full: the last read of Top, Rear or Front
  - Structure of operations

  Operation push/pop/enqueue/dequeue:
      Loop:
          Read Top/Rear/Front
          :
          Update array to help previous operation
          :
          C&S on Top/Rear/Front
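The read / help / C&S structure can be sketched for a stack as follows. This is a hedged illustration, not the algorithm from the talk: it assumes Top carries an index, a value and an unbounded counter, and it uses immutable Java objects with reference C&S instead of packing those fields into one machine word. All class, field and method names are hypothetical, and null is reserved to signal an empty stack.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustrative sketch only; names and layout are assumptions.
class ArrayStackSketch<T> {
    // Snapshot held in Top: index of the top slot, the value that belongs
    // there, and the counter that slot carries once the value is written.
    private static final class Top<V> {
        final int index; final V value; final long counter;
        Top(int index, V value, long counter) {
            this.index = index; this.value = value; this.counter = counter;
        }
    }

    private static final class Slot<V> {
        final V value; final long counter;
        Slot(V value, long counter) { this.value = value; this.counter = counter; }
    }

    private final AtomicReference<Top<T>> top;
    private final AtomicReferenceArray<Slot<T>> slots;

    ArrayStackSketch(int capacity) {
        // Slot 0 is a sentinel below the first real element.
        slots = new AtomicReferenceArray<Slot<T>>(capacity + 1);
        for (int i = 0; i <= capacity; i++) {
            slots.set(i, new Slot<T>(null, 0));
        }
        top = new AtomicReference<Top<T>>(new Top<T>(0, null, 0));
    }

    // Help the operation that last changed Top: copy its value into the
    // array unless some process has already done so (counter check).
    private void finish(Top<T> t) {
        Slot<T> s = slots.get(t.index);
        if (s.counter == t.counter - 1) {
            slots.compareAndSet(t.index, s, new Slot<T>(t.value, t.counter));
        }
    }

    // Returns false if the stack is full.
    boolean push(T v) {
        while (true) {
            Top<T> t = top.get();                      // read Top
            finish(t);                                 // update array
            if (t.index == slots.length() - 1) return false;
            long above = slots.get(t.index + 1).counter;
            if (top.compareAndSet(t, new Top<T>(t.index + 1, v, above + 1))) {
                return true;                           // successful C&S
            }
        }
    }

    // Returns null if the stack is empty.
    T pop() {
        while (true) {
            Top<T> t = top.get();                      // read Top
            finish(t);                                 // update array
            if (t.index == 0) return null;
            Slot<T> below = slots.get(t.index - 1);
            Top<T> next = new Top<T>(t.index - 1, below.value, below.counter + 1);
            if (top.compareAndSet(t, next)) {
                return t.value;                        // successful C&S
            }
        }
    }
}
```

In this sketch a push does not write its value into the array itself; the value travels in Top and is copied into the array by whichever operation runs next, which is one way to read "update array to help previous operation".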
New algorithms
  - Execution: an interleaving of steps of processes
  [Figure: timeline in which each successful change of Top/Rear (X) is followed by an update of the array.]
  - The array is updated based on the information stored in Top/Rear:
    - Index
    - Value
    - Counter value(s)
New algorithms
  1. Non-blocking stack and queue algorithms using unbounded counter values
  2. Non-blocking stack and queue algorithms using bounded counter values
     - Reuse counter values
     - Employ a collect object
Collect object
  - Store: store a process-value pair
  - Collect: collect the set of process-value pairs of all processes that have stored a pair
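One simple way to realize the Store and Collect interface is to give each process its own register and have Collect read them one by one. The sketch below assumes a fixed, known number of processes identified by 0..n-1 and uses null to play the role of "nothing stored"; the class name SimpleCollect is hypothetical, and the slides do not say which collect implementation the algorithms use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustrative only: a register-per-process collect object.
class SimpleCollect<V> {
    private final AtomicReferenceArray<V> registers;

    SimpleCollect(int processes) {
        registers = new AtomicReferenceArray<V>(processes);
    }

    // Store: process pid publishes its value (null clears its entry).
    void store(int pid, V value) {
        registers.set(pid, value);
    }

    // Collect: read every register in turn and return the pairs found.
    // This is not an atomic snapshot; values may change during the scan.
    Map<Integer, V> collect() {
        Map<Integer, V> result = new HashMap<Integer, V>();
        for (int pid = 0; pid < registers.length(); pid++) {
            V v = registers.get(pid);
            if (v != null) {
                result.put(pid, v);
            }
        }
        return result;
    }
}
```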
New algorithms using bounded counter values
  - Structure of operations

  Operation push/pop/enqueue/dequeue:
      Loop:
          Inner loop:
              Read Top/Rear/Front
              :
              Store counter values into the collect object
              :
              If Top/Rear/Front has not been changed, exit inner loop
          :
          Update array
          :
          Collect the collect object to learn which counter values are in use
          Choose a new counter value
          :
          C&S on Top/Rear/Front
      Store Ø into the collect object
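The "choose new counter value" step can be pictured as picking any value from a bounded range that no process currently announces as in use. The slides do not say how the value is chosen, so the smallest-free-value rule below is purely an assumption, as are the names CounterChooser, chooseFreeCounter and the bound parameter.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: one possible way to reuse bounded counter values.
class CounterChooser {
    // collected: the counter sets announced by the processes, as returned
    // by a collect. bound: counters are drawn from 0 .. bound-1.
    static int chooseFreeCounter(Collection<Set<Integer>> collected, int bound) {
        Set<Integer> inUse = new HashSet<Integer>();
        for (Set<Integer> announced : collected) {
            inUse.addAll(announced);
        }
        // Assumption: take the smallest value nobody has announced.
        for (int c = 0; c < bound; c++) {
            if (!inUse.contains(c)) {
                return c;
            }
        }
        throw new IllegalStateException("no free counter value below bound");
    }
}
```

For a free value to always exist, the bound has to exceed the number of counter values that can be announced at the same time.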
New algorithms using bounded counter values
  - A loop iteration of an operation, in time order:
    1. Store counters into the collect object
    2. Last read of Top/Rear/Front (X)
    3. Try to update the array
    4. Collect the collect object and choose a new counter
    5. Change Top/Rear/Front (X)
Correctness
  - The shared variables (Top, Rear, Front) have not been changed between the last read and the C&S that changes them
  - How exactly the shared array is changed
  - What happens in the data structure exactly matches the abstract stack/queue
  - Operations return results consistent with their linearization order
Time analysis
  - An operation can take arbitrarily many steps as long as some other operation is making progress
  - Amortized analysis evaluates the system as a whole
  - Blame each unsuccessful loop iteration on another operation that successfully changed the shared variables
  - The worst-case amortized cost of our algorithms depends only on point contention
  - Point contention: maximum number of processes running concurrently at a given point in time
Time analysis
  [Figure: timeline of four concurrent operations op_1 to op_4. op_4 changes the shared variable at T_1, and op_1, op_2 and op_3 blame op_4. op_2 succeeds at T_2, and op_1 and op_3 blame op_2. op_3 succeeds at T_3, and op_1 blames op_3. op_1 succeeds at T_4.]
  - Number of unsuccessful loop iterations: (Point contention(t_i) - 1)
Model checking
  - Spin model checker
  - Define abstract stack/queue variables
  - Atomically change the abstract stack/queue at the linearization points of successful operations
  - At linearization points, assert that the contents of the shared data structures are the same as the state of the abstract stack/queue
  - Define end-state labels where operations return, to make sure all operations terminate
Model checking
  - Verified our algorithms for four operations and an array of size three
  - Exhaustive search
  - Partial-order reduction
Implementations
  - Compare our stack algorithm using unbounded counter values with
    - Treiber's stack algorithm
  - Compare our queue algorithm using unbounded counter values with
    - Queue algorithm of Michael and Scott
    - Array-based queue algorithm of Colvin and Groves
  - Implementations
    - Java (java.util.concurrent.atomic)
    - System with two quad-core processors
Comparison
  - Compare under both low and high contention
  - Total number of operations is constant (1441440) in each execution
  - Increase the number of threads
  - 50 runs

  Possible point contention   Threads                  Operations per thread
  2                           Thread 1 to Thread 2     720720
  4                           Thread 1 to Thread 4     360360
  :                           :                        :
  32                          Thread 1 to Thread 32    45045
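The workload split above can be sketched as a small harness that divides the fixed total evenly among the threads. This is an assumption-laden illustration: the slides do not give the operation mix or the harness, so the alternating push/pop mix, the use of ConcurrentLinkedDeque as a stand-in stack, and all names here are hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Illustrative only: not the benchmark code from the talk.
class SplitWorkload {
    static final int TOTAL_OPS = 1_441_440;

    // The total divides evenly for the thread counts in the table above.
    static int opsPerThread(int threads) {
        if (threads <= 0 || TOTAL_OPS % threads != 0) {
            throw new IllegalArgumentException("thread count must divide " + TOTAL_OPS);
        }
        return TOTAL_OPS / threads;
    }

    // Runs one execution and returns the elapsed time in nanoseconds.
    static long runOnce(int threads) throws InterruptedException {
        // Stand-in concurrent stack; swap in the implementation under test.
        final ConcurrentLinkedDeque<Integer> stack = new ConcurrentLinkedDeque<Integer>();
        final int perThread = opsPerThread(threads);
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    // Assumed mix: alternate pushes and pops.
                    if ((k & 1) == 0) {
                        stack.push(k);
                    } else {
                        stack.pollFirst();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int threads : new int[] {2, 4, 32}) {
            System.out.println(threads + " threads: " + runOnce(threads) + " ns");
        }
    }
}
```

Keeping the total fixed while raising the thread count means each run does the same amount of work, so differences in elapsed time reflect contention rather than workload size.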
Comparison of concurrent stack algorithms
Comparison of concurrent queue algorithms
Conclusions
  - Proposed new array-based algorithms for stacks and queues
  - Proved their correctness
  - Verified them with the Spin model checker
  - Amortized time complexity of an operation depends on point contention
  - Implementation and comparison
    - Compared to Treiber's stack implementation, our stack algorithm is more scalable
    - Our queue implementation outperforms the Michael and Scott queue algorithm
  - Our stack implementation is the first practical array-based stack implementation
  - This is the first time that bounded counter values have been used to implement a shared stack and queue
Future work
  - Improve the memory reclamation techniques of link-based algorithms
    - Optimal in general
    - Do not increase the contention of algorithms
Thank you!