Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Size: px

Start display at page:

Download "Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit"

Prosper Rice
5 years ago
Views:

1 Spin Locks and Contention Companion slides for The Programming by Maurice Herlihy & Nir Shavit

2 Focus so far: Correctness Models Accurate (we never lied to you) But idealized (so we forgot to mention a few things) Protocols Elegant Important But naïve 2

3 New Focus: Performance Models More complicated (not the same as complex!) Still focus on principles (not soon obsolete) Protocols Elegant (in their fashion) Important (why else would we pay attention) And realistic (your mileage may vary) 3

4 What Should you do if you can t get a lock? Keep trying spin or busy-wait Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor 4 (1)

5 What Should you do if you can t get a lock? Keep trying spin or busy-wait Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor our focus 5

6 Basic Spin-Lock CS. spin lock critical section Resets lock upon exit 6

7 Basic Spin-Lock lock introduces sequential bottleneck CS. spin lock critical section Resets lock upon exit 7

8 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit 8

9 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Notice: these are distinct phenomena 9

10 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Seq Bottleneck no parallelism 10

11 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Contention??? 11

12 Review: Test-and-Set Boolean value Test-and-set (TAS) Swap true with current value Return value tells if prior value was true or false Can reset just by writing false TAS aka getandset 12

13 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset (boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } 13 (5)

14 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset (boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Package java.util.concurrent.atomic 14

15 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset (boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Swap old and new values 15

16 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getandset(true) 16

17 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getandset(true) Swapping in true is called test-and-set or TAS 17 (5)

18 Test-and-Set Locks Locking Lock is free: value is false Lock is taken: value is true Acquire lock by calling TAS If result is false, you win If result is true, you lose Release lock by writing false 18

19 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} 19

20 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean 20

21 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired 21

22 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); Release lock by resetting void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} state to false 22

23 Space Complexity TAS spin-lock has small footprint N thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the Ω(n) lower bound? We used a RMW operation 23

24 Performance Experiment n threads Increment shared counter 1 million times How long should it take? How long does it take? 24

25 Graph time no speedup because of sequential bottleneck ideal threads 25

26 Mystery #1 TAS lock time Ideal What is threads going on? 26 (1)

27 Test-and-Test-and-Set Locks Lurking stage Wait until lock looks free Spin while read returns true (lock taken) Pouncing state As soon as lock looks available Read returns false (lock free) Call TAS to acquire lock If TAS loses, back to lurking 27

28 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } 28

29 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Wait until lock looks free 29

30 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); } } void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; Then try to acquire it 30

31 Mystery #2 TAS lock time TTAS lock Ideal threads 31

32 Mystery Both TAS and TTAS Do the same thing (in our model) Except that TTAS performs much better than TAS Neither approaches ideal 32

33 Opinion Our memory abstraction is broken TAS & TTAS methods Are provably the same (in our model) Except they aren t (in field tests) Need a more detailed model 33

34 Bus-Based Architectures cache cache Bus cache memory 34

35 Bus-Based Architectures Random access memory (100s of cycles) cache cache Bus cache memory 35

36 Bus-Based Architectures Shared Bus broadcast medium One broadcaster at a time Processors and memory all snoop cache cache cache Bus memory 36

37 Per-Processor Caches Bus-Based Architectures Small Fast: 1 or 2 cycles Address & state information cache cache Bus cache memory 37

38 Jargon Watch Cache hit I found what I wanted in my cache Good Thing 38

39 Jargon Watch Cache hit I found what I wanted in my cache Good Thing Cache miss I had to shlep all the way to memory for that data Bad Thing 39

40 Cave Canem This model is still a simplification But not in any essential way Illustrates basic principles Will discuss complexities later 40

41 Processor Issues Load Request cache cache Bus cache memory data 41

42 Processor Issues Load Request Gimme data cache cache Bus cache memory data 42

43 Memory Responds cache cache cache Got your data right here Bus memory data Bus 43

44 Processor Issues Load Request Gimme data data cache Bus cache memory data 44

45 Processor Issues Load Request Gimme data data cache Bus cache memory data 45

46 Processor Issues Load Request I got data data cache Bus cache memory data 46

47 Other Processor Responds I got data data cache cache Bus Bus memory data 47

48 Other Processor Responds data cache cache Bus Bus memory data 48

49 Modify Cached Data data data Bus cache memory data 49 (1)

50 Modify Cached Data data data Bus cache memory data 50 (1)

51 Modify Cached Data data data Bus cache memory data 51

52 Modify Cached Data data data Bus cache What s up with the other copies? memory data 52

53 Cache Coherence We have lots of copies of data Original copy in memory Cached copies at processors Some processor modifies its own copy What do we do with the others? How to avoid confusion? 53

54 Write-Back Caches Accumulate changes in cache Write back when needed Need the cache for something else Another processor wants it On first modification Invalidate other entries Requires non-trivial protocol 54

55 Write-Back Caches Cache entry has three states Invalid: contains raw seething bits Valid: I can read but I can t write Dirty: Data has been modified Intercept other load requests Write back to memory before using cache 55

56 Invalidate data data Bus cache memory data 56

57 Invalidate Mine, all mine! data data Bus cache memory data 57

58 Invalidate Uh,oh cache data data Bus cache memory data 58

59 Invalidate Other caches lose read permission cache data Bus cache memory data 59

60 Invalidate Other caches lose read permission cache data Bus cache This cache acquires write permission memory data 60

61 Invalidate Memory provides data only if not present in any cache, so no need to cache data Bus cache change it now (expensive) memory data 61 (2)

62 Another Processor Asks for Data cache data Bus cache memory data 62 (2)

63 Owner Responds Here it is! cache data Bus cache memory data 63 (2)

64 End of the Day data data Bus cache memory data Reading OK, no writing 64 (1)

65 Mutual Exclusion What do we want to optimize? Bus bandwidth used by spinning threads Release/Acquire latency Acquire latency for idle lock 65

66 Simple TASLock TAS invalidates cache lines Spinners Miss in cache Go to bus Thread wants to release lock delayed behind spinners 66

67 Test-and-test-and-set Wait until lock looks free Spin on local cache No bus use while lock busy Problem: when lock is released Invalidation storm 67

68 Local Spinning while Lock is Busy busy busy Bus busy memory busy 68

69 On Release invalid invalid Bus free memory free 69

70 Everyone misses, rereads On Release invalid miss invalid miss Bus free memory free 70 (1)

71 Everyone tries TAS On Release invalid TAS( ) invalid TAS( ) Bus free memory free 71 (1)

72 Problems Everyone misses Reads satisfied sequentially Everyone does TAS Invalidates others caches Eventually quiesces after lock acquired How long does this take? 72

73 Measuring Quiescence Time X = time of ops that don t use the bus Y = time of ops that cause intensive bus traffic P 1 P 2 P n In critical section, run ops X then ops Y. As long as Quiescence time is less than X, no drop in performance. By gradually varying X, can determine the exact time to quiesce. 73

74 Quiescence Time time Increses linearly with the number of processors for bus architecture threads 74

75 Mystery Explained TAS lock time TTAS lock Ideal Better than threads TAS but still not as good as ideal 75

76 Solution: Introduce Delay If the lock looks free But I fail to get it There must be lots of contention Better to back off than to collide again time r 2 d r 1 d d spin lock 76

77 Dynamic Example: Exponential Backoff time 4d 2d d spin lock If I fail to get lock wait random duration before retry Each subsequent failure doubles expected wait 77

78 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} 78

79 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay 79

80 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free 80

81 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return 81

82 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Back off for random duration 82

83 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Double max delay, within reason 83

84 Spin-Waiting Overhead TTAS Lock time Backoff lock threads 84

85 Backoff: Other Issues Good Easy to implement Beats TTAS lock Bad Must choose parameters carefully Not portable across platforms 85

86 Idea Avoid useless invalidations By keeping a queue of threads Each thread Notifies next in line Without bothering the others 86

87 Anderson Queue Lock next idle flags T F F F F F F F 87

88 Anderson Queue Lock next acquiring getandincrement flags T F F F F F F F 88

89 Anderson Queue Lock next acquiring getandincrement flags T F F F F F F F 89

90 Anderson Queue Lock next acquired Mine! flags T F F F F F F F 90

91 Anderson Queue Lock next acquired acquiring flags T F F F F F F F 91

92 Anderson Queue Lock next acquired acquiring flags getandincrement T F F F F F F F 92

93 Anderson Queue Lock next acquired acquiring flags getandincrement T F F F F F F F 93

94 Anderson Queue Lock next acquired acquiring flags T F F F F F F F 94

95 Anderson Queue Lock next released acquired flags T T F F F F F F 95

96 Anderson Queue Lock next released acquired flags Yow! T T F F F F F F 96

97 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; 97

98 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; One flag per thread 98

99 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; Next flag to use 99

100 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> myslot; Thread-local variable 100

101 Anderson Queue Lock public lock() { int myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } 101

102 Anderson Queue Lock public lock() { int myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Take next slot 102

103 Anderson Queue Lock public lock() { int myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Spin until told to go 103

104 Anderson Queue Lock public lock() { int slot[i]=next.getandincrement(); while (!flags[slot[i] % n]) {}; flags[slot[i] % n] = false; } public unlock() { flags[slot[i]+1 % n] = true; } Prepare slot for re-use 104

105 Anderson Queue Lock public lock() { int myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } Tell next thread to go public unlock() { flags[(myslot+1) % n] = true; } 105

106 Performance TTAS queue Shorter handover than backoff Curve is practically flat Scalable performance FIFO fairness 106

107 Anderson Queue Lock Good First truly scalable lock Simple, easy to implement Bad Space hog One bit per thread Unknown number of threads? Small number of actual contenders? 107

108 CLH Lock FIFO order Small, constant-size overhead per thread 108

109 Initially idle tail false 109

110 Initially idle tail false Queue tail 110

111 Initially idle Lock is free tail false 111

112 Initially idle tail false 112

113 Purple Wants the Lock acquiring tail false 113

114 Purple Wants the Lock acquiring tail false true 114

115 Purple Wants the Lock acquiring Swap tail false true 115

116 Purple Has the Lock acquired tail false true 116

117 Red Wants the Lock acquired acquiring tail false true true 117

118 Red Wants the Lock acquired acquiring tail false Swap true true 118

119 Red Wants the Lock acquired acquiring tail false true true 119

120 Red Wants the Lock acquired acquiring tail false true true 120

121 Red Wants the Lock acquired acquiring Implicitely Linked list tail false true true 121

122 Red Wants the Lock acquired acquiring tail false true true 122

123 Red Wants the Lock acquired acquiring tail false true true true Actually, it spins on cached copy 123

124 Purple Releases release acquiring false Bingo! tail false false true 124

125 Purple Releases released acquired tail true 125

126 Space Usage Let L = number of locks N = number of threads ALock O(LN) CLH lock O(L+N) 126

127 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean(true); } 127

128 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean(true); } Not released yet 128

129 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = tail.getandset(mynode); while (pred.locked) {} }} 129 (3)

130 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = tail.getandset(mynode); while (pred.locked) {} }} Tail of the queue 130 (3)

131 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = tail.getandset(mynode); while (pred.locked) {} }} Thread-local Qnode 131 (3)

132 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = queue.getandset(mynode); while (pred.locked) {} }} Swap in my node 132 (3)

133 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = queue.getandset(mynode); while (pred.locked) {} }} Spin until predecessor releases lock 133 (3)

134 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { mynode.locked.set(false); mynode = pred; } } 134 (3)

135 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { mynode.locked.set(false); mynode = pred; } } Notify successor 135 (3)

136 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { mynode.locked.set(false); mynode = pred; } } Recycle predecessor s node 136 (3)

137 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { mynode.locked.set(false); mynode = pred; } } (notice that we actually don t reuse mynode. Code in Handout shows how its done.) 137 (3)

138 CLH Lock Good Lock release affects predecessor only Small, constant-sized space Bad Doesn t work for uncached NUMA architectures 138

139 NUMA Architecturs Acronym: Non-Uniform Memory Architecture Illusion: Flat shared memory Truth: No caches (sometimes) Some memory regions faster than others 139

140 NUMA Machines Spinning on local memory is fast 140

141 NUMA Machines Spinning on remote memory is slow 141

142 CLH Lock Each thread spin s on predecessor s memory Could be far away 142

143 MCS Lock FIFO order Spin on local memory only Small, Constant-size overhead 143

144 Initially idle tail false 144

145 Acquiring acquiring (allocate Qnode) true queue false 145

146 Acquiring acquired swap true tail false 146

147 Acquiring acquired true tail false 147

148 Acquired acquired true tail false 148

149 Acquiring acquired acquiring true tail swap true 149

150 Acquiring acquired acquiring true tail true 150

151 Acquiring acquired acquiring true tail true 151

152 Acquiring acquired acquiring true tail true 152

153 Acquiring acquired acquiring true tail false true 153

154 Acquiring acquired acquiring true Yes! tail false true 154

155 MCS Queue Lock class Qnode { boolean locked = false; qnode next = null; } 155

156 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} 156 (3)

157 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Make a QNode 157 (3)

158 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} add my Node to the tail of queue 158 (3)

159 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Fix if queue was non-empty 159 (3)

160 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; Wait until public void lock() { Qnode qnode = new Qnode(); unlocked Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} 160 (3)

161 MCS Queue Unlock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} 161 (3)

162 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} Missing successor? 162 (3)

163 MCS Queue Lock class MCSLock implements Lock { AtomicReference If really no successor, tail; public void return unlock() { if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} 163 (3)

164 MCS Queue Lock class MCSLock implements Lock { AtomicReference Otherwise wait tail; for public successor void unlock() to catch { up if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} 164 (3)

165 MCS Queue Lock class MCSLock implements Lock { AtomicReference queue; public void unlock() { if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} Pass lock to successor 165 (3)

166 Purple Release releasing swap false false 166 (2)

167 Purple Release releasing By looking at the queue, I see another thread is activeswap false false 167 (2)

168 Purple Release releasing By looking at the queue, I see another thread is activeswap false false I have to wait for that thread to finish 168 (2)

169 Purple Release releasing prepare to spin true false 169

170 Purple Release releasing spinning true false 170

171 Purple Release releasing spinning false true false 171

172 Purple Release releasing Acquired lock false true false 172

173 Abortable Locks What if you want to give up waiting for a lock? For example Timeout Database transaction aborted by user 173

174 Back-off Lock Aborting is trivial Just return from lock() call Extra benefit: No cleaning up Wait-free Immediate return 174

175 Queue Locks Can t just quit Thread in line behind will starve Need a graceful way out 175

176 Queue Locks spinning spinning spinning true true true 176

177 Queue Locks locked spinning spinning false true true 177

178 Queue Locks locked spinning false true 178

179 Queue Locks locked false 179

180 Queue Locks spinning spinning spinning true true true 180

181 Queue Locks spinning spinning true true true 181

182 Queue Locks locked spinning false true true 182

183 Queue Locks spinning false true 183

184 Queue Locks pwned false true 184

185 Abortable CLH Lock When a thread gives up Removing node in a wait-free way is hard Idea: let successor deal with it. 185

186 Initially idle Pointer to predecessor (or null) tail A 186

187 Initially idle Distinguished available node means lock is free tail A 187

188 Acquiring acquiring tail A 188

189 Acquiring Null predecessor means lock not acquiring released or aborted A 189

190 Acquiring acquiring Swap A 190

191 Acquiring acquiring A 191

192 Acquired locked Pointer to AVAILABLE means lock is free. A 192

193 Normal Case locked spinning spinning Null means lock is not free & request not aborted 193

194 One Thread Aborts locked Timed out spinning 194

195 Successor Notices locked Timed out spinning Non-Null means predecessor aborted 195

196 Recycle Predecessor s Node locked spinning 196

197 Spin on Earlier Node locked spinning 197

198 Spin on Earlier Node released spinning A The lock is now mine 198

199 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode; 199

200 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode; Distinguished node to signify free lock 200

201 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode; Tail of the queue 201

202 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode; Remember my node 202

203 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); mynode.set(qnode); qnode.prev = null; Qnode mypred = tail.getandset(qnode); if (mypred== null mypred.prev == AVAILABLE) { return true; } 203

204 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); mynode.set(qnode); qnode.prev = null; Qnode mypred = tail.getandset(qnode); if (mypred == null mypred.prev == AVAILABLE) { return true; } Create & initialize node 204

205 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); mynode.set(qnode); qnode.prev = null; Qnode mypred = tail.getandset(qnode); if (mypred == null mypred.prev == AVAILABLE) { return true; } Swap with tail 205

206 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); mynode.set(qnode); qnode.prev = null; Qnode mypred = tail.getandset(qnode); if (mypred == null mypred.prev == AVAILABLE) { return true; }... If predecessor absent or released, we are done 206

207 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } 207

208 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } Keep trying for a while 208

209 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } Spin on predecessor s prev field 209

210 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } Predecessor released lock 210

211 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } Predecessor aborted, advance one 211

212 Time-out Lock if (!tail.compareandset(qnode, mypred)) qnode.prev = mypred; return false; } } 212

213 Time-out Locks if (!tail.compareandset(qnode, mypred)) qnode.prev = mypred; return false; } } Do I have a successor? If CAS fails: I do have a successor, tell it about mypred 213

214 Time-out Locks if (!tail.compareandset(qnode, mypred)) qnode.prev = mypred; return false; } } If CAS succeeds: no successor, simply return false 214

215 Time-Out Unlock public void unlock() { Qnode qnode = mynode.get(); if (!tail.compareandset(qnode, null)) qnode.prev = AVAILABLE; } 215

216 Time-out Unlock public void unlock() { Qnode qnode = mynode.get(); if (!tail.compareandset(qnode, null)) qnode.prev = AVAILABLE; } If CAS failed: exists successor, notify successor it can enter 216

217 Timing-out Lock public void unlock() { Qnode qnode = mynode.get(); if (!tail.compareandset(qnode, null)) qnode.prev = AVAILABLE; } CAS successful: set tail to null, no clean up since no successor waiting 217

218 One Lock To Rule Them All? TTAS+Backoff, CLH, MCS, ToLock Each better than others in some way There is no one solution Lock we pick really depends on: the application the hardware which properties are important 218

219 ! This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License. You are free: to Share to copy, distribute and transmit the work to Remix to adapt the work Under the following conditions: Attribution. You must attribute the work to The Programming (but not in any way that suggests that the authors endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights. 219

220 Lock Review & Performance Test-and-set Test-and-test-and-set With or without exponential backoff Ticket lock Array-based queue lock Anderson List-based queue lock MCS 220

221 Lock Performance test & test /(/ & set * anderson 0! test k set, exp. backoff e mcs 18 Time (p.s) :~ O Processors Fig, 17. Performance of spm locks on the Symmetry (empty critical section) From Mellor-Crummey & Scott, ACM TOCS,

222 Other Lock Concerns Supporting bool trylock(timeout) Attempt to get the lock, give up after time Easy for simple TTS locks; harder for ticket and queue-based locks Reader/Writer locks lock_for_read(lock_ptr) lock_for_write(lock_ptr) Many readers can all hold lock at the same time Only one writer (with no readers) Reader/Writer locks sound great! Two problems: Acquiring a read lock requires read-modify-write of shared data Acquiring a read/write lock is slower than a simple TTS lock Other options: topology/hierarchy aware locks, adaptive hybrid TTS/queue locks, biased locks, etc. 222

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Focus so far: Correctness and Progress Models Accurate (we never lied to you) But idealized