Spin Locks and Contention. Companion slides for Chapter 7 The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Size: px

Start display at page:

Download "Spin Locks and Contention. Companion slides for Chapter 7 The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit"

Martina Garrett
5 years ago
Views:

1 Spin Locks and Contention Companion slides for Chapter 7 The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

2 Focus so far: Correctness and Progress Models Accurate (we never lied to you) But idealized (so we forgot to mention a few things) Protocols Elegant Important But naïve Art of Multiprocessor Programming 2

3 New Focus: Performance Models More complicated (not the same as complex!) Still focus on principles (not soon obsolete) Protocols Elegant (in their fashion) Important (why else would we pay attention) And realistic (your mileage may vary) Art of Multiprocessor Programming 3

4 Kinds of Architectures SISD (Uniprocessor) Single instruction stream Single stream SIMD (Vector) Single instruction Multiple MIMD (Multiprocessors) Multiple instruction Multiple. Art of Multiprocessor Programming 4

5 Kinds of Architectures SISD (Uniprocessor) Single instruction stream Single stream SIMD (Vector) Single instruction Multiple MIMD (Multiprocessors) Multiple instruction Multiple. Our space (1) Art of Multiprocessor Programming 5

6 MIMD Architectures memory Shared Bus Memory Contention Communication Contention Communication Latency Distributed Art of Multiprocessor Programming 6

7 Today: Revisit Mutual Exclusion Think of performance, not just correctness and progress Begin to understand how performance depends on our software properly utilizing the multiprocessor machine s hardware And get to know a collection of locking algorithms Art of Multiprocessor Programming 7 (1)

8 What Should you do if you can t get a lock? Keep trying spin or busy-wait Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor Art of Multiprocessor Programming 8 (1)

9 What Should you do if you can t get a lock? Keep trying spin or busy-wait Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor our focus Art of Multiprocessor Programming 9

10 Basic Spin-Lock CS. spin lock critical section Resets lock upon exit Art of Multiprocessor Programming 10

11 Basic Spin-Lock lock introduces sequential bottleneck CS. spin lock critical section Resets lock upon exit Art of Multiprocessor Programming 11

12 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Art of Multiprocessor Programming 12

13 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Notice: these are distinct phenomena Art of Multiprocessor Programming 13

14 Basic Spin-Lock lock suffers from contention. spin lock CS critical section Resets lock upon exit Seq Bottleneck à no parallelism Art of Multiprocessor Programming 14

15 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Contention à??? Art of Multiprocessor Programming 15

16 Review: Test-and-Set Boolean value Test-and-set (TAS) Swap true with current value Return value tells if prior value was true or false Can reset just by writing false TAS aka getandset Art of Multiprocessor Programming 16

17 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset(boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Art of Multiprocessor Programming 17(5)

18 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset(boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Package java.util.concurrent.atomic Art of Multiprocessor Programming 18

19 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset(boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Swap old and new values Art of Multiprocessor Programming 19

20 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getandset(true) Art of Multiprocessor Programming 20

21 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getandset(true) Swapping in true is called test-and-set or TAS Art of Multiprocessor Programming 21(5)

22 Test-and-Set Locks Locking Lock is free: value is false Lock is taken: value is true Acquire lock by calling TAS If result is false, you win If result is true, you lose Release lock by writing false Art of Multiprocessor Programming 22

23 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Art of Multiprocessor Programming 23

24 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean Art of Multiprocessor Programming 24

25 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired Art of Multiprocessor Programming 25

26 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); Release lock by resetting state to false void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Art of Multiprocessor Programming 26

27 Space Complexity TAS spin-lock has small footprint N thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the Ω(n) lower bound? We used a RMW operation Art of Multiprocessor Programming 27

28 Performance Experiment n threads Increment shared counter 1 million times How long should it take? How long does it take? Art of Multiprocessor Programming 28

29 time Graph no speedup because of sequential bottleneck ideal threads Art of Multiprocessor Programming 29

30 Mystery #1 TAS lock time threads Ideal What is going on? Art of Multiprocessor Programming 30 (1)

31 Test-and-Test-and-Set Locks Lurking stage Wait until lock looks free Spin while read returns true (lock taken) Pouncing state As soon as lock looks available Read returns false (lock free) Call TAS to acquire lock If TAS loses, back to lurking Art of Multiprocessor Programming 31

32 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Art of Multiprocessor Programming 32

33 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Wait until lock looks free Art of Multiprocessor Programming 33

34 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Then try to acquire it Art of Multiprocessor Programming 34

35 Mystery #2 TAS lock time TTAS lock Ideal threads Art of Multiprocessor Programming 35

36 Mystery Both TAS and TTAS Do the same thing (in our model) Except that TTAS performs much better than TAS Neither approaches ideal Art of Multiprocessor Programming 36

37 Opinion Our memory abstraction is broken TAS & TTAS methods Are provably the same (in our model) Except they aren t (in field tests) Need a more detailed model Art of Multiprocessor Programming 37

38 Bus-Based Architectures Bus memory Art of Multiprocessor Programming 38

39 Bus-Based Architectures Random access memory (10s of cycles) Bus memory Art of Multiprocessor Programming 39

40 Bus-Based Architectures Shared Bus Broadcast medium One broadcaster at a time Processors and memory all snoop Bus memory Art of Multiprocessor Programming 40

41 Per-Processor Caches Small Fast: 1 or 2 cycles Address & state information Bus-Based Architectures Bus memory Art of Multiprocessor Programming 41

42 Jargon Watch Cache hit I found what I wanted in my Good Thing Art of Multiprocessor Programming 42

43 Jargon Watch Cache hit I found what I wanted in my Good Thing Cache miss I had to shlep all the way to memory for that Bad Thing Art of Multiprocessor Programming 43

44 Cave Canem This model is still a simplification But not in any essential way Illustrates basic principles Will discuss complexities later Art of Multiprocessor Programming 44

45 Processor Issues Load Request Bus memory Art of Multiprocessor Programming 45

46 Processor Issues Load Request Gimme Bus memory Art of Multiprocessor Programming 46

47 Memory Responds Got your right here Bus memory Bus Art of Multiprocessor Programming 47

48 Processor Issues Load Request Gimme Bus memory Art of Multiprocessor Programming 48

49 Processor Issues Load Request Gimme Bus memory Art of Multiprocessor Programming 49

50 Processor Issues Load Request I got Bus memory Art of Multiprocessor Programming 50

51 Other Processor Responds I got Bus Bus memory Art of Multiprocessor Programming 51

52 Other Processor Responds Bus Bus memory Art of Multiprocessor Programming 52

53 Modify Cached Data Bus memory Art of Multiprocessor Programming 53 (1)

54 Modify Cached Data Bus memory Art of Multiprocessor Programming 54 (1)

55 Modify Cached Data Bus memory Art of Multiprocessor Programming 55

56 Modify Cached Data Bus What s up with the other copies? memory Art of Multiprocessor Programming 56

57 Cache Coherence We have lots of copies of Original copy in memory Cached copies at processors Some processor modifies its own copy What do we do with the others? How to avoid confusion? Art of Multiprocessor Programming 57

58 Write-Back Caches Accumulate changes in Write back when needed Need the for something else Another processor wants it On first modification Invalidate other entries Requires non-trivial protocol Art of Multiprocessor Programming 58

59 Write-Back Caches Cache entry has three states Invalid: contains raw seething bits Valid: I can read but I can t write Dirty: Data has been modified Intercept other load requests Write back to memory before using Art of Multiprocessor Programming 59

60 Invalidate Bus memory Art of Multiprocessor Programming 60

61 Invalidate Mine, all mine! Bus memory Art of Multiprocessor Programming 61

62 Invalidate Uh,oh Bus memory Art of Multiprocessor Programming 62

63 Invalidate Other s lose read permission Bus memory Art of Multiprocessor Programming 63

64 Invalidate Other s lose read permission Bus This acquires write permission memory Art of Multiprocessor Programming 64

65 Invalidate Memory provides only if not present in any, so no need to change it now (expensive) Bus memory Art of Multiprocessor Programming 65(2)

66 Another Processor Asks for Data Bus memory Art of Multiprocessor Programming 66(2)

67 Owner Responds Here it is! Bus memory Art of Multiprocessor Programming 67(2)

68 End of the Day Bus memory Reading OK, no writing Art of Multiprocessor Programming 68 (1)

69 Mutual Exclusion What do we want to optimize? Bus bandwidth used by spinning threads Release/Acquire latency Acquire latency for idle lock Art of Multiprocessor Programming 69

70 Simple TASLock TAS invalidates lines Spinners Miss in Go to bus Thread wants to release lock delayed behind spinners Art of Multiprocessor Programming 70

71 Test-and-test-and-set Wait until lock looks free Spin on local No bus use while lock busy Problem: when lock is released Invalidation storm Art of Multiprocessor Programming 71

72 Local Spinning while Lock is Busy busy busy Bus busy memory busy Art of Multiprocessor Programming 72

73 On Release invalid invalid Bus free memory free Art of Multiprocessor Programming 73

74 Everyone misses, rereads On Release miss invalid miss invalid Bus free memory free Art of Multiprocessor Programming 74 (1)

75 Everyone tries TAS On Release TAS( ) invalid TAS( ) invalid Bus free memory free Art of Multiprocessor Programming 75 (1)

76 Problems Everyone misses Reads satisfied sequentially Everyone does TAS Invalidates others s Eventually quiesces after lock acquired How long does this take? Art of Multiprocessor Programming 76

77 Mystery Explained TAS lock time TTAS lock Ideal threads Better than TAS but still not as good as ideal Art of Multiprocessor Programming 77

78 Solution: Introduce Delay If the lock looks free But I fail to get it There must be lots of contention Better to back off than to collide again time r 2 d r 1 d d spin lock Art of Multiprocessor Programming 78

79 Dynamic Example: Exponential Backoff time 4d 2d d spin lock If I fail to get lock wait random duration before retry Each subsequent failure doubles expected wait Art of Multiprocessor Programming 79

80 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Art of Multiprocessor Programming 80

81 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay Art of Multiprocessor Programming 81

82 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free Art of Multiprocessor Programming 82

83 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return Art of Multiprocessor Programming 83

84 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Back off for random duration Art of Multiprocessor Programming 84

85 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Double max delay, within reason Art of Multiprocessor Programming 85

86 Spin-Waiting Overhead TTAS Lock time Backoff lock threads Art of Multiprocessor Programming 86

87 Backoff: Other Issues Good Easy to implement Beats TTAS lock Bad Must choose parameters carefully Not portable across platforms Art of Multiprocessor Programming 87

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Focus so far: Correctness and Progress Models Accurate (we never lied to you) But idealized