Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Similar documents
Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Spin Locks and Contention

Spin Locks and Contention. Companion slides for Chapter 7 The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Locks Chapter 4. Overview. Introduction Spin Locks. Queue locks. Roger Wattenhofer. Test-and-Set & Test-and-Test-and-Set Backoff lock

Locking. Part 2, Chapter 11. Roger Wattenhofer. ETH Zurich Distributed Computing

Agenda. Lecture. Next discussion papers. Bottom-up motivation Shared memory primitives Shared memory synchronization Barriers and locks

Modern High-Performance Locking

Introduction to Multiprocessor Synchronization

9/28/2014. CS341: Operating System. Synchronization. High Level Construct: Monitor Classical Problems of Synchronizations FAQ: Mid Semester

9/23/2014. Concurrent Programming. Book. Process. Exchange of data between threads/processes. Between Process. Thread.

Spin Locks and Contention Management

Solution: a lock (a/k/a mutex) public: virtual void unlock() =0;

Lecture 7: Mutual Exclusion 2/16/12. slides adapted from The Art of Multiprocessor Programming, Herlihy and Shavit

Computer Engineering II Solution to Exercise Sheet Chapter 10

Models of concurrency & synchronization algorithms

Introduction. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Locking Granularity. CS 475, Spring 2019 Concurrent & Distributed Systems. With material from Herlihy & Shavit, Art of Multiprocessor Programming

Programming Paradigms for Concurrency Lecture 3 Concurrent Objects

Module 7: Synchronization Lecture 13: Introduction to Atomic Primitives. The Lecture Contains: Synchronization. Waiting Algorithms.

Concurrent Skip Lists. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Distributed Computing Group

Locking. Part 2, Chapter 7. Roger Wattenhofer. ETH Zurich Distributed Computing

Distributed Computing

Chapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST

Multiprocessor Synchronization

Linked Lists: Locking, Lock-Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Multiprocessors and Locking

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism

Linked Lists: Locking, Lock- Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown

Concurrent Queues, Monitors, and the ABA problem. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Lecture 19: Synchronization. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems

Spinlocks. Spinlocks. Message Systems, Inc. April 8, 2011

Shared-Memory Computability

Chapter 5. Thread-Level Parallelism

Synchronization. Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data.

Practice: Small Systems Part 2, Chapter 3

Suggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!

Universality of Consensus. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Chapter 5 Thread-Level Parallelism. Abdullah Muzahid

Peterson s Algorithm

Lecture 24: Virtual Memory, Multiprocessors

Cache Coherence and Atomic Operations in Hardware

Concurrent Preliminaries

Coarse-grained and fine-grained locking Niklas Fors

Handout 3 Multiprocessor and thread level parallelism

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

Mutual Exclusion. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Advance Operating Systems (CS202) Locks Discussion

Chapter 9 Multiprocessors

Introduction. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit. Art of Multiprocessor Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

Other consistency models

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Introduction. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Lecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections

Lecture 25: Multiprocessors

Page 1. Cache Coherence

The New Java Technology Memory Model

Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization

The need for atomicity This code sequence illustrates the need for atomicity. Explain.

Multiprocessor Cache Coherency. What is Cache Coherence?

Chapter-4 Multiprocessors and Thread-Level Parallelism

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.

M4 Parallelism. Implementation of Locks Cache Coherence

Implementing Locks. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

Synchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Scalable Locking. Adam Belay

6.852: Distributed Algorithms Fall, Class 15

Synchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Multiprocessor Systems. Chapter 8, 8.1

Concurrent Counting using Combining Tree

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

Role of Synchronization. CS 258 Parallel Computer Architecture Lecture 23. Hardware-Software Trade-offs in Synchronization and Data Layout

Computer Architecture

Programming Paradigms for Concurrency Lecture 2 - Mutual Exclusion

SHARED-MEMORY COMMUNICATION

Chap. 4 Multiprocessors and Thread-Level Parallelism

GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs

UNIT I (Two Marks Questions & Answers)

Synchronization. Erik Hagersten Uppsala University Sweden. Components of a Synchronization Even. Need to introduce synchronization.

Adaptive Lock. Madhav Iyengar < >, Nathaniel Jeffries < >

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

Concurrent Queues and Stacks. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location.

Flynn s Classification

Concurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis

Multiprocessors & Thread Level Parallelism

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Transcription:

Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Focus so far: Correctness and Progress Models Accurate (we never lied to you) But idealized (so we forgot to mention a few things) Protocols Elegant Important But naïve Art of Multiprocessor Programming 2

New Focus: Performance Models More complicated (not the same as complex!) Still focus on principles (not soon obsolete) Protocols Elegant (in their fashion) Important (why else would we pay attention) And realistic (your mileage may vary) Art of Multiprocessor Programming 3

Kinds of Architectures SISD (Uniprocessor) Single instruction stream Single data stream SIMD (Vector) Single instruction Multiple data MIMD (Multiprocessors) Multiple instruction Multiple data. Art of Multiprocessor Programming 4

Kinds of Architectures SISD (Uniprocessor) Single instruction stream Single data stream SIMD (Vector) Single instruction Multiple data MIMD (Multiprocessors) Multiple instruction Multiple data. Our space (1) Art of Multiprocessor Programming 5

MIMD Architectures memory Shared Bus Memory Contention Communication Contention Communication Latency Distributed Art of Multiprocessor Programming 6

Today: Revisit Mutual Exclusion Think of performance, not just correctness and progress Begin to understand how performance depends on our software properly utilizing the multiprocessor machine s hardware And get to know a collection of locking algorithms Art of Multiprocessor Programming 7(1)

What Should you do if you can t get a lock? Keep trying spin or busy-wait Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor Art of Multiprocessor Programming 8(1)

What Should you do if you can t get a lock? Keep trying spin or busy-wait Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor our focus Art of Multiprocessor Programming 9

Basic Spin-Lock CS. spin lock critical section Resets lock upon exit Art of Multiprocessor Programming 10

Basic Spin-Lock lock introduces sequential bottleneck CS. spin lock critical section Resets lock upon exit Art of Multiprocessor Programming 11

Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Art of Multiprocessor Programming 12

Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Notice: these are distinct phenomena Art of Multiprocessor Programming 13

Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Seq Bottleneck no parallelism Art of Multiprocessor Programming 14

Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Contention??? Art of Multiprocessor Programming 15

Review: Test-and-Set Boolean value Test-and-set (TAS) Swap true with current value Return value tells if prior value was true or false Can reset just by writing false TAS aka getandset Art of Multiprocessor Programming 16

Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset(boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Art of Multiprocessor Programming 17(5)

Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset(boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Package java.util.concurrent.atomic Art of Multiprocessor Programming 18

Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset(boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Swap old and new values Art of Multiprocessor Programming 19

Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getandset(true) Art of Multiprocessor Programming 20

Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getandset(true) Swapping in true is called test-and-set or TAS Art of Multiprocessor Programming 21(5)

Test-and-Set Locks Locking Lock is free: value is false Lock is taken: value is true Acquire lock by calling TAS If result is false, you win If result is true, you lose Release lock by writing false Art of Multiprocessor Programming 22

Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Art of Multiprocessor Programming 23

Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean Art of Multiprocessor Programming 24

Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired Art of Multiprocessor Programming 25

Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); Release lock by resetting state to false void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Art of Multiprocessor Programming 26

Space Complexity TAS spin-lock has small footprint N thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the (n) lower bound? We used a RMW operation Art of Multiprocessor Programming 27

Performance Experiment n threads Increment shared counter 1 million times How long should it take? How long does it take? Art of Multiprocessor Programming 28

time Graph no speedup because of sequential bottleneck ideal threads Art of Multiprocessor Programming 29

Mystery #1 TAS lock time threads Ideal What is going on? Art of Multiprocessor Programming 30(1)

Test-and-Test-and-Set Locks Lurking stage Wait until lock looks free Spin while read returns true (lock taken) Pouncing state As soon as lock looks available Read returns false (lock free) Call TAS to acquire lock If TAS loses, back to lurking Art of Multiprocessor Programming 31

Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Art of Multiprocessor Programming 32

Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Wait until lock looks free Art of Multiprocessor Programming 33

Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Then try to acquire it Art of Multiprocessor Programming 34

Mystery #2 TAS lock time TTAS lock Ideal threads Art of Multiprocessor Programming 35

Mystery Both TAS and TTAS Do the same thing (in our model) Except that TTAS performs much better than TAS Neither approaches ideal Art of Multiprocessor Programming 36

Opinion Our memory abstraction is broken TAS & TTAS methods Are provably the same (in our model) Except they aren t (in field tests) Need a more detailed model Art of Multiprocessor Programming 37

Bus-Based Architectures cache cache Bus cache memory Art of Multiprocessor Programming 38

Bus-Based Architectures Random access memory (10s of cycles) cache cache Bus cache memory Art of Multiprocessor Programming 39

Bus-Based Architectures Shared Bus Broadcast medium One broadcaster at a time Processors and memory all snoop cache cache Bus cache memory Art of Multiprocessor Programming 40

Per-Processor Caches Small Fast: 1 or 2 cycles Address & state information Bus-Based Architectures cache cache Bus cache memory Art of Multiprocessor Programming 41

Jargon Watch Cache hit I found what I wanted in my cache Good Thing Art of Multiprocessor Programming 42

Jargon Watch Cache hit I found what I wanted in my cache Good Thing Cache miss I had to shlep all the way to memory for that data Bad Thing Art of Multiprocessor Programming 43

Cave Canem This model is still a simplification But not in any essential way Illustrates basic principles Will discuss complexities later Art of Multiprocessor Programming 44

Processor Issues Load Request cache cache Bus cache memory data Art of Multiprocessor Programming 45

Processor Issues Load Gimme data Request cache cache Bus cache memory data Art of Multiprocessor Programming 46

Memory Responds cache cache cache Got your data right here Bus memory data Bus Art of Multiprocessor Programming 47

Processor Issues Load Gimme data Request data cache Bus cache memory data Art of Multiprocessor Programming 48

Processor Issues Load Gimme data Request data cache Bus cache memory data Art of Multiprocessor Programming 49

Processor Issues Load I got data Request data cache Bus cache memory data Art of Multiprocessor Programming 50

Other Processor Responds I got data data cache cache Bus Bus memory data Art of Multiprocessor Programming 51

Other Processor Responds data cache cache Bus Bus memory data Art of Multiprocessor Programming 52

Modify Cached Data data data Bus cache memory data Art of Multiprocessor Programming 53(1)

Modify Cached Data data data Bus cache memory data Art of Multiprocessor Programming 54(1)

Modify Cached Data data data Bus cache memory data Art of Multiprocessor Programming 55

Modify Cached Data data data Bus cache What s up with the other copies? memory data Art of Multiprocessor Programming 56

Cache Coherence We have lots of copies of data Original copy in memory Cached copies at processors Some processor modifies its own copy What do we do with the others? How to avoid confusion? Art of Multiprocessor Programming 57

Write-Back Caches Accumulate changes in cache Write back when needed Need the cache for something else Another processor wants it On first modification Invalidate other entries Requires non-trivial protocol Art of Multiprocessor Programming 58

Write-Back Caches Cache entry has three states Invalid: contains raw seething bits Valid: I can read but I can t write Dirty: Data has been modified Intercept other load requests Write back to memory before using cache Art of Multiprocessor Programming 59

Invalidate data data Bus cache memory data Art of Multiprocessor Programming 60

Invalidate Mine, all mine! data data Bus cache memory data Art of Multiprocessor Programming 61

Invalidate Uh,oh cache data data Bus cache memory data Art of Multiprocessor Programming 62

Invalidate Other caches lose read permission cache data Bus cache memory data Art of Multiprocessor Programming 63

Invalidate Other caches lose read permission cache data Bus cache This cache acquires write permission memory data Art of Multiprocessor Programming 64

Invalidate Memory provides data only if not present in any cache, so no need to change it now (expensive) cache data Bus cache memory data Art of Multiprocessor Programming 65(2)

Another Processor Asks for Data cache data Bus cache memory data Art of Multiprocessor Programming 66(2)

Owner Responds Here it is! cache data Bus cache memory data Art of Multiprocessor Programming 67(2)

End of the Day data data Bus cache memory data Reading OK, no writing Art of Multiprocessor Programming 68(1)

Mutual Exclusion What do we want to optimize? Bus bandwidth used by spinning threads Release/Acquire latency Acquire latency for idle lock Art of Multiprocessor Programming 69

Simple TASLock TAS invalidates cache lines Spinners Miss in cache Go to bus Thread wants to release lock delayed behind spinners Art of Multiprocessor Programming 70

Test-and-test-and-set Wait until lock looks free Spin on local cache No bus use while lock busy Problem: when lock is released Invalidation storm Art of Multiprocessor Programming 71

Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Art of Multiprocessor Programming 72

Local Spinning while Lock is Busy busy busy Bus busy memory busy Art of Multiprocessor Programming 73

On Release invalid invalid Bus free memory free Art of Multiprocessor Programming 74

Everyone misses, rereads On Release miss invalid miss invalid Bus free memory free Art of Multiprocessor Programming 75(1)

Everyone tries TAS On Release TAS( ) invalid TAS( ) invalid Bus free memory free Art of Multiprocessor Programming 76(1)

Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Art of Multiprocessor Programming 77

Problems Everyone misses Reads satisfied sequentially Everyone does TAS Invalidates others caches Eventually quiesces after lock acquired How long does this take? Art of Multiprocessor Programming 78

Measuring Quiescence Time X = time of ops that don t use the bus Y = time of ops that cause intensive bus traffic P 1 P 2 P n In critical section, run ops X then ops Y. As long as Quiescence time is less than X, no drop in performance. By gradually varying X, can determine the exact time to quiesce. Art of Multiprocessor Programming 79

Quiescence Time time Increses linearly with the number of processors for bus architecture threads Art of Multiprocessor Programming 80

Mystery Explained TAS lock time threads TTAS lock Ideal Better than TAS but still not as good as ideal Art of Multiprocessor Programming 81

Solution: Introduce Delay If the lock looks free But I fail to get it There must be lots of contention Better to back off than to collide again time r 2 d r 1 d d spin lock Art of Multiprocessor Programming 82

Dynamic Example: Exponential Backoff time 4d 2d d spin lock If I fail to get lock wait random duration before retry Each subsequent failure doubles expected wait Art of Multiprocessor Programming 83

Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Art of Multiprocessor Programming 84

Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay Art of Multiprocessor Programming 85

Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free Art of Multiprocessor Programming 86

Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return Art of Multiprocessor Programming 87

Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Back off for random duration Art of Multiprocessor Programming 88

Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Double max delay, within reason Art of Multiprocessor Programming 89

Spin-Waiting Overhead TTAS Lock time Backoff lock threads Art of Multiprocessor Programming 90

Backoff: Other Issues Good Easy to implement Beats TTAS lock Bad Must choose parameters carefully Not portable across platforms Art of Multiprocessor Programming 91

Idea Avoid useless invalidations By keeping a queue of threads Each thread Notifies next in line Without bothering the others Art of Multiprocessor Programming 92

Anderson Queue Lock next idle flags T F F F F F F F Art of Multiprocessor Programming 93

Anderson Queue Lock next acquiring getandincrement flags T F F F F F F F Art of Multiprocessor Programming 94

Anderson Queue Lock next acquiring getandincrement flags T F F F F F F F Art of Multiprocessor Programming 95

Anderson Queue Lock next acquired Mine! flags T F F F F F F F Art of Multiprocessor Programming 96

Anderson Queue Lock next acquired acquiring flags T F F F F F F F Art of Multiprocessor Programming 97

Anderson Queue Lock next acquired acquiring flags getandincrement T F F F F F F F Art of Multiprocessor Programming 98

Anderson Queue Lock next acquired acquiring flags getandincrement T F F F F F F F Art of Multiprocessor Programming 99

Anderson Queue Lock next acquired acquiring flags T F F F F F F F Art of Multiprocessor Programming 100

Anderson Queue Lock next released acquired flags T T F F F F F F Art of Multiprocessor Programming 101

Anderson Queue Lock next released acquired flags Yow! T T F F F F F F Art of Multiprocessor Programming 102

Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; Art of Multiprocessor Programming 103

Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; One flag per thread Art of Multiprocessor Programming 104

Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; Next flag to use Art of Multiprocessor Programming 105

Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> myslot; Thread-local variable Art of Multiprocessor Programming 106

Anderson Queue Lock public lock() { myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Art of Multiprocessor Programming 107

Anderson Queue Lock public lock() { myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Take next slot Art of Multiprocessor Programming 108

Anderson Queue Lock public lock() { myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Spin until told to go Art of Multiprocessor Programming 109

Anderson Queue Lock public lock() { myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Prepare slot for re-use Art of Multiprocessor Programming 110

Anderson Queue Lock public lock() { Tell next thread to go myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Art of Multiprocessor Programming 111

Performance TTAS queue Shorter handover than backoff Curve is practically flat Scalable performance FIFO fairness Art of Multiprocessor Programming 112

Anderson Queue Lock Good First truly scalable lock Simple, easy to implement Bad Space hog One bit per thread Unknown number of threads? Small number of actual contenders? Art of Multiprocessor Programming 113

CLH Lock FIFO order Small, constant-size overhead per thread See part 2 Art of Multiprocessor Programming 114

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. You are free: to Share to copy, distribute and transmit the work to Remix to adapt the work Under the following conditions: Attribution. You must attribute the work to The Art of Multiprocessor Programming (but not in any way that suggests that the authors endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights. Art of Multiprocessor Programming 115