Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
|
|
- Prosper Rice
- 5 years ago
- Views:
Transcription
1 Spin Locks and Contention Companion slides for The Programming by Maurice Herlihy & Nir Shavit
2 Focus so far: Correctness Models Accurate (we never lied to you) But idealized (so we forgot to mention a few things) Protocols Elegant Important But naïve 2
3 New Focus: Performance Models More complicated (not the same as complex!) Still focus on principles (not soon obsolete) Protocols Elegant (in their fashion) Important (why else would we pay attention) And realistic (your mileage may vary) 3
4 What Should you do if you can t get a lock? Keep trying spin or busy-wait Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor 4 (1)
5 What Should you do if you can t get a lock? Keep trying spin or busy-wait Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor our focus 5
6 Basic Spin-Lock CS. spin lock critical section Resets lock upon exit 6
7 Basic Spin-Lock lock introduces sequential bottleneck CS. spin lock critical section Resets lock upon exit 7
8 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit 8
9 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Notice: these are distinct phenomena 9
10 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Seq Bottleneck no parallelism 10
11 Basic Spin-Lock lock suffers from contention CS. spin lock critical section Resets lock upon exit Contention??? 11
12 Review: Test-and-Set Boolean value Test-and-set (TAS) Swap true with current value Return value tells if prior value was true or false Can reset just by writing false TAS aka getandset 12
13 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset (boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } 13 (5)
14 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset (boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Package java.util.concurrent.atomic 14
15 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getandset (boolean newvalue) { boolean prior = value; value = newvalue; return prior; } } Swap old and new values 15
16 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getandset(true) 16
17 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getandset(true) Swapping in true is called test-and-set or TAS 17 (5)
18 Test-and-Set Locks Locking Lock is free: value is false Lock is taken: value is true Acquire lock by calling TAS If result is false, you win If result is true, you lose Release lock by writing false 18
19 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} 19
20 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean 20
21 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired 21
22 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); Release lock by resetting void lock() { while (state.getandset(true)) {} } void unlock() { state.set(false); }} state to false 22
23 Space Complexity TAS spin-lock has small footprint N thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the Ω(n) lower bound? We used a RMW operation 23
24 Performance Experiment n threads Increment shared counter 1 million times How long should it take? How long does it take? 24
25 Graph time no speedup because of sequential bottleneck ideal threads 25
26 Mystery #1 TAS lock time Ideal What is threads going on? 26 (1)
27 Test-and-Test-and-Set Locks Lurking stage Wait until lock looks free Spin while read returns true (lock taken) Pouncing state As soon as lock looks available Read returns false (lock free) Call TAS to acquire lock If TAS loses, back to lurking 27
28 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } 28
29 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; } } Wait until lock looks free 29
30 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); } } void lock() { while (true) { while (state.get()) {} if (!state.getandset(true)) return; Then try to acquire it 30
31 Mystery #2 TAS lock time TTAS lock Ideal threads 31
32 Mystery Both TAS and TTAS Do the same thing (in our model) Except that TTAS performs much better than TAS Neither approaches ideal 32
33 Opinion Our memory abstraction is broken TAS & TTAS methods Are provably the same (in our model) Except they aren t (in field tests) Need a more detailed model 33
34 Bus-Based Architectures cache cache Bus cache memory 34
35 Bus-Based Architectures Random access memory (100s of cycles) cache cache Bus cache memory 35
36 Bus-Based Architectures Shared Bus broadcast medium One broadcaster at a time Processors and memory all snoop cache cache cache Bus memory 36
37 Per-Processor Caches Bus-Based Architectures Small Fast: 1 or 2 cycles Address & state information cache cache Bus cache memory 37
38 Jargon Watch Cache hit I found what I wanted in my cache Good Thing 38
39 Jargon Watch Cache hit I found what I wanted in my cache Good Thing Cache miss I had to shlep all the way to memory for that data Bad Thing 39
40 Cave Canem This model is still a simplification But not in any essential way Illustrates basic principles Will discuss complexities later 40
41 Processor Issues Load Request cache cache Bus cache memory data 41
42 Processor Issues Load Request Gimme data cache cache Bus cache memory data 42
43 Memory Responds cache cache cache Got your data right here Bus memory data Bus 43
44 Processor Issues Load Request Gimme data data cache Bus cache memory data 44
45 Processor Issues Load Request Gimme data data cache Bus cache memory data 45
46 Processor Issues Load Request I got data data cache Bus cache memory data 46
47 Other Processor Responds I got data data cache cache Bus Bus memory data 47
48 Other Processor Responds data cache cache Bus Bus memory data 48
49 Modify Cached Data data data Bus cache memory data 49 (1)
50 Modify Cached Data data data Bus cache memory data 50 (1)
51 Modify Cached Data data data Bus cache memory data 51
52 Modify Cached Data data data Bus cache What s up with the other copies? memory data 52
53 Cache Coherence We have lots of copies of data Original copy in memory Cached copies at processors Some processor modifies its own copy What do we do with the others? How to avoid confusion? 53
54 Write-Back Caches Accumulate changes in cache Write back when needed Need the cache for something else Another processor wants it On first modification Invalidate other entries Requires non-trivial protocol 54
55 Write-Back Caches Cache entry has three states Invalid: contains raw seething bits Valid: I can read but I can t write Dirty: Data has been modified Intercept other load requests Write back to memory before using cache 55
56 Invalidate data data Bus cache memory data 56
57 Invalidate Mine, all mine! data data Bus cache memory data 57
58 Invalidate Uh,oh cache data data Bus cache memory data 58
59 Invalidate Other caches lose read permission cache data Bus cache memory data 59
60 Invalidate Other caches lose read permission cache data Bus cache This cache acquires write permission memory data 60
61 Invalidate Memory provides data only if not present in any cache, so no need to cache data Bus cache change it now (expensive) memory data 61 (2)
62 Another Processor Asks for Data cache data Bus cache memory data 62 (2)
63 Owner Responds Here it is! cache data Bus cache memory data 63 (2)
64 End of the Day data data Bus cache memory data Reading OK, no writing 64 (1)
65 Mutual Exclusion What do we want to optimize? Bus bandwidth used by spinning threads Release/Acquire latency Acquire latency for idle lock 65
66 Simple TASLock TAS invalidates cache lines Spinners Miss in cache Go to bus Thread wants to release lock delayed behind spinners 66
67 Test-and-test-and-set Wait until lock looks free Spin on local cache No bus use while lock busy Problem: when lock is released Invalidation storm 67
68 Local Spinning while Lock is Busy busy busy Bus busy memory busy 68
69 On Release invalid invalid Bus free memory free 69
70 Everyone misses, rereads On Release invalid miss invalid miss Bus free memory free 70 (1)
71 Everyone tries TAS On Release invalid TAS( ) invalid TAS( ) Bus free memory free 71 (1)
72 Problems Everyone misses Reads satisfied sequentially Everyone does TAS Invalidates others caches Eventually quiesces after lock acquired How long does this take? 72
73 Measuring Quiescence Time X = time of ops that don t use the bus Y = time of ops that cause intensive bus traffic P 1 P 2 P n In critical section, run ops X then ops Y. As long as Quiescence time is less than X, no drop in performance. By gradually varying X, can determine the exact time to quiesce. 73
74 Quiescence Time time Increses linearly with the number of processors for bus architecture threads 74
75 Mystery Explained TAS lock time TTAS lock Ideal Better than threads TAS but still not as good as ideal 75
76 Solution: Introduce Delay If the lock looks free But I fail to get it There must be lots of contention Better to back off than to collide again time r 2 d r 1 d d spin lock 76
77 Dynamic Example: Exponential Backoff time 4d 2d d spin lock If I fail to get lock wait random duration before retry Each subsequent failure doubles expected wait 77
78 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} 78
79 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay 79
80 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free 80
81 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return 81
82 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Back off for random duration 82
83 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getandset(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Double max delay, within reason 83
84 Spin-Waiting Overhead TTAS Lock time Backoff lock threads 84
85 Backoff: Other Issues Good Easy to implement Beats TTAS lock Bad Must choose parameters carefully Not portable across platforms 85
86 Idea Avoid useless invalidations By keeping a queue of threads Each thread Notifies next in line Without bothering the others 86
87 Anderson Queue Lock next idle flags T F F F F F F F 87
88 Anderson Queue Lock next acquiring getandincrement flags T F F F F F F F 88
89 Anderson Queue Lock next acquiring getandincrement flags T F F F F F F F 89
90 Anderson Queue Lock next acquired Mine! flags T F F F F F F F 90
91 Anderson Queue Lock next acquired acquiring flags T F F F F F F F 91
92 Anderson Queue Lock next acquired acquiring flags getandincrement T F F F F F F F 92
93 Anderson Queue Lock next acquired acquiring flags getandincrement T F F F F F F F 93
94 Anderson Queue Lock next acquired acquiring flags T F F F F F F F 94
95 Anderson Queue Lock next released acquired flags T T F F F F F F 95
96 Anderson Queue Lock next released acquired flags Yow! T T F F F F F F 96
97 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; 97
98 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; One flag per thread 98
99 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n]; Next flag to use 99
100 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> myslot; Thread-local variable 100
101 Anderson Queue Lock public lock() { int myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } 101
102 Anderson Queue Lock public lock() { int myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Take next slot 102
103 Anderson Queue Lock public lock() { int myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } public unlock() { flags[(myslot+1) % n] = true; } Spin until told to go 103
104 Anderson Queue Lock public lock() { int slot[i]=next.getandincrement(); while (!flags[slot[i] % n]) {}; flags[slot[i] % n] = false; } public unlock() { flags[slot[i]+1 % n] = true; } Prepare slot for re-use 104
105 Anderson Queue Lock public lock() { int myslot = next.getandincrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; } Tell next thread to go public unlock() { flags[(myslot+1) % n] = true; } 105
106 Performance TTAS queue Shorter handover than backoff Curve is practically flat Scalable performance FIFO fairness 106
107 Anderson Queue Lock Good First truly scalable lock Simple, easy to implement Bad Space hog One bit per thread Unknown number of threads? Small number of actual contenders? 107
108 CLH Lock FIFO order Small, constant-size overhead per thread 108
109 Initially idle tail false 109
110 Initially idle tail false Queue tail 110
111 Initially idle Lock is free tail false 111
112 Initially idle tail false 112
113 Purple Wants the Lock acquiring tail false 113
114 Purple Wants the Lock acquiring tail false true 114
115 Purple Wants the Lock acquiring Swap tail false true 115
116 Purple Has the Lock acquired tail false true 116
117 Red Wants the Lock acquired acquiring tail false true true 117
118 Red Wants the Lock acquired acquiring tail false Swap true true 118
119 Red Wants the Lock acquired acquiring tail false true true 119
120 Red Wants the Lock acquired acquiring tail false true true 120
121 Red Wants the Lock acquired acquiring Implicitely Linked list tail false true true 121
122 Red Wants the Lock acquired acquiring tail false true true 122
123 Red Wants the Lock acquired acquiring tail false true true true Actually, it spins on cached copy 123
124 Purple Releases release acquiring false Bingo! tail false false true 124
125 Purple Releases released acquired tail true 125
126 Space Usage Let L = number of locks N = number of threads ALock O(LN) CLH lock O(L+N) 126
127 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean(true); } 127
128 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean(true); } Not released yet 128
129 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = tail.getandset(mynode); while (pred.locked) {} }} 129 (3)
130 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = tail.getandset(mynode); while (pred.locked) {} }} Tail of the queue 130 (3)
131 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = tail.getandset(mynode); while (pred.locked) {} }} Thread-local Qnode 131 (3)
132 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = queue.getandset(mynode); while (pred.locked) {} }} Swap in my node 132 (3)
133 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode = new Qnode(); public void lock() { Qnode pred = queue.getandset(mynode); while (pred.locked) {} }} Spin until predecessor releases lock 133 (3)
134 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { mynode.locked.set(false); mynode = pred; } } 134 (3)
135 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { mynode.locked.set(false); mynode = pred; } } Notify successor 135 (3)
136 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { mynode.locked.set(false); mynode = pred; } } Recycle predecessor s node 136 (3)
137 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { mynode.locked.set(false); mynode = pred; } } (notice that we actually don t reuse mynode. Code in Handout shows how its done.) 137 (3)
138 CLH Lock Good Lock release affects predecessor only Small, constant-sized space Bad Doesn t work for uncached NUMA architectures 138
139 NUMA Architecturs Acronym: Non-Uniform Memory Architecture Illusion: Flat shared memory Truth: No caches (sometimes) Some memory regions faster than others 139
140 NUMA Machines Spinning on local memory is fast 140
141 NUMA Machines Spinning on remote memory is slow 141
142 CLH Lock Each thread spin s on predecessor s memory Could be far away 142
143 MCS Lock FIFO order Spin on local memory only Small, Constant-size overhead 143
144 Initially idle tail false 144
145 Acquiring acquiring (allocate Qnode) true queue false 145
146 Acquiring acquired swap true tail false 146
147 Acquiring acquired true tail false 147
148 Acquired acquired true tail false 148
149 Acquiring acquired acquiring true tail swap true 149
150 Acquiring acquired acquiring true tail true 150
151 Acquiring acquired acquiring true tail true 151
152 Acquiring acquired acquiring true tail true 152
153 Acquiring acquired acquiring true tail false true 153
154 Acquiring acquired acquiring true Yes! tail false true 154
155 MCS Queue Lock class Qnode { boolean locked = false; qnode next = null; } 155
156 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} 156 (3)
157 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Make a QNode 157 (3)
158 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} add my Node to the tail of queue 158 (3)
159 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Fix if queue was non-empty 159 (3)
160 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; Wait until public void lock() { Qnode qnode = new Qnode(); unlocked Qnode pred = tail.getandset(qnode); if (pred!= null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} 160 (3)
161 MCS Queue Unlock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} 161 (3)
162 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} Missing successor? 162 (3)
163 MCS Queue Lock class MCSLock implements Lock { AtomicReference If really no successor, tail; public void return unlock() { if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} 163 (3)
164 MCS Queue Lock class MCSLock implements Lock { AtomicReference Otherwise wait tail; for public successor void unlock() to catch { up if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} 164 (3)
165 MCS Queue Lock class MCSLock implements Lock { AtomicReference queue; public void unlock() { if (qnode.next == null) { if (tail.cas(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} Pass lock to successor 165 (3)
166 Purple Release releasing swap false false 166 (2)
167 Purple Release releasing By looking at the queue, I see another thread is activeswap false false 167 (2)
168 Purple Release releasing By looking at the queue, I see another thread is activeswap false false I have to wait for that thread to finish 168 (2)
169 Purple Release releasing prepare to spin true false 169
170 Purple Release releasing spinning true false 170
171 Purple Release releasing spinning false true false 171
172 Purple Release releasing Acquired lock false true false 172
173 Abortable Locks What if you want to give up waiting for a lock? For example Timeout Database transaction aborted by user 173
174 Back-off Lock Aborting is trivial Just return from lock() call Extra benefit: No cleaning up Wait-free Immediate return 174
175 Queue Locks Can t just quit Thread in line behind will starve Need a graceful way out 175
176 Queue Locks spinning spinning spinning true true true 176
177 Queue Locks locked spinning spinning false true true 177
178 Queue Locks locked spinning false true 178
179 Queue Locks locked false 179
180 Queue Locks spinning spinning spinning true true true 180
181 Queue Locks spinning spinning true true true 181
182 Queue Locks locked spinning false true true 182
183 Queue Locks spinning false true 183
184 Queue Locks pwned false true 184
185 Abortable CLH Lock When a thread gives up Removing node in a wait-free way is hard Idea: let successor deal with it. 185
186 Initially idle Pointer to predecessor (or null) tail A 186
187 Initially idle Distinguished available node means lock is free tail A 187
188 Acquiring acquiring tail A 188
189 Acquiring Null predecessor means lock not acquiring released or aborted A 189
190 Acquiring acquiring Swap A 190
191 Acquiring acquiring A 191
192 Acquired locked Pointer to AVAILABLE means lock is free. A 192
193 Normal Case locked spinning spinning Null means lock is not free & request not aborted 193
194 One Thread Aborts locked Timed out spinning 194
195 Successor Notices locked Timed out spinning Non-Null means predecessor aborted 195
196 Recycle Predecessor s Node locked spinning 196
197 Spin on Earlier Node locked spinning 197
198 Spin on Earlier Node released spinning A The lock is now mine 198
199 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode; 199
200 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode; Distinguished node to signify free lock 200
201 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode; Tail of the queue 201
202 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> mynode; Remember my node 202
203 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); mynode.set(qnode); qnode.prev = null; Qnode mypred = tail.getandset(qnode); if (mypred== null mypred.prev == AVAILABLE) { return true; } 203
204 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); mynode.set(qnode); qnode.prev = null; Qnode mypred = tail.getandset(qnode); if (mypred == null mypred.prev == AVAILABLE) { return true; } Create & initialize node 204
205 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); mynode.set(qnode); qnode.prev = null; Qnode mypred = tail.getandset(qnode); if (mypred == null mypred.prev == AVAILABLE) { return true; } Swap with tail 205
206 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); mynode.set(qnode); qnode.prev = null; Qnode mypred = tail.getandset(qnode); if (mypred == null mypred.prev == AVAILABLE) { return true; }... If predecessor absent or released, we are done 206
207 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } 207
208 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } Keep trying for a while 208
209 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } Spin on predecessor s prev field 209
210 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } Predecessor released lock 210
211 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predpred = mypred.prev; if (predpred == AVAILABLE) { return true; } else if (predpred!= null) { mypred = predpred; } } Predecessor aborted, advance one 211
212 Time-out Lock if (!tail.compareandset(qnode, mypred)) qnode.prev = mypred; return false; } } 212
213 Time-out Locks if (!tail.compareandset(qnode, mypred)) qnode.prev = mypred; return false; } } Do I have a successor? If CAS fails: I do have a successor, tell it about mypred 213
214 Time-out Locks if (!tail.compareandset(qnode, mypred)) qnode.prev = mypred; return false; } } If CAS succeeds: no successor, simply return false 214
215 Time-Out Unlock public void unlock() { Qnode qnode = mynode.get(); if (!tail.compareandset(qnode, null)) qnode.prev = AVAILABLE; } 215
216 Time-out Unlock public void unlock() { Qnode qnode = mynode.get(); if (!tail.compareandset(qnode, null)) qnode.prev = AVAILABLE; } If CAS failed: exists successor, notify successor it can enter 216
217 Timing-out Lock public void unlock() { Qnode qnode = mynode.get(); if (!tail.compareandset(qnode, null)) qnode.prev = AVAILABLE; } CAS successful: set tail to null, no clean up since no successor waiting 217
218 One Lock To Rule Them All? TTAS+Backoff, CLH, MCS, ToLock Each better than others in some way There is no one solution Lock we pick really depends on: the application the hardware which properties are important 218
219 ! This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License. You are free: to Share to copy, distribute and transmit the work to Remix to adapt the work Under the following conditions: Attribution. You must attribute the work to The Programming (but not in any way that suggests that the authors endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights. 219
220 Lock Review & Performance Test-and-set Test-and-test-and-set With or without exponential backoff Ticket lock Array-based queue lock Anderson List-based queue lock MCS 220
221 Lock Performance test & test /(/ & set * anderson 0! test k set, exp. backoff e mcs 18 Time (p.s) :~ O Processors Fig, 17. Performance of spm locks on the Symmetry (empty critical section) From Mellor-Crummey & Scott, ACM TOCS,
222 Other Lock Concerns Supporting bool trylock(timeout) Attempt to get the lock, give up after time Easy for simple TTS locks; harder for ticket and queue-based locks Reader/Writer locks lock_for_read(lock_ptr) lock_for_write(lock_ptr) Many readers can all hold lock at the same time Only one writer (with no readers) Reader/Writer locks sound great! Two problems: Acquiring a read lock requires read-modify-write of shared data Acquiring a read/write lock is slower than a simple TTS lock Other options: topology/hierarchy aware locks, adaptive hybrid TTS/queue locks, biased locks, etc. 222
Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Focus so far: Correctness and Progress Models Accurate (we never lied to you) But idealized
More informationSpin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Real world concurrency Understanding hardware architecture What is contention How to
More informationSpin Locks and Contention
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified for Software1 students by Lior Wolf and Mati Shomrat Kinds of Architectures
More informationSpin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit MIMD Architectures memory Shared Bus Memory Contention Communication Contention Communication
More informationSpin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Focus so far: Correctness and Progress Models Accurate (we never lied to you) But idealized
More informationSpin Locks and Contention. Companion slides for Chapter 7 The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Spin Locks and Contention Companion slides for Chapter 7 The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Focus so far: Correctness and Progress Models Accurate (we never lied to you)
More informationModern High-Performance Locking
Modern High-Performance Locking Nir Shavit Slides based in part on The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Locks (Mutual Exclusion) public interface Lock { public void lock();
More informationLocks Chapter 4. Overview. Introduction Spin Locks. Queue locks. Roger Wattenhofer. Test-and-Set & Test-and-Test-and-Set Backoff lock
Locks Chapter 4 Roger Wattenhofer ETH Zurich Distributed Computing www.disco.ethz.ch Overview Introduction Spin Locks Test-and-Set & Test-and-Test-and-Set Backoff lock Queue locks 4/2 1 Introduction: From
More informationLocking. Part 2, Chapter 11. Roger Wattenhofer. ETH Zurich Distributed Computing
Locking Part 2, Chapter 11 Roger Wattenhofer ETH Zurich Distributed Computing www.disco.ethz.ch Overview Introduction Spin Locks Test-and-Set & Test-and-Test-and-Set Backoff lock Queue locks 11/2 Introduction:
More informationAgenda. Lecture. Next discussion papers. Bottom-up motivation Shared memory primitives Shared memory synchronization Barriers and locks
Agenda Lecture Bottom-up motivation Shared memory primitives Shared memory synchronization Barriers and locks Next discussion papers Selecting Locking Primitives for Parallel Programming Selecting Locking
More informationSpin Locks and Contention Management
Chapter 7 Spin Locks and Contention Management 7.1 Introduction We now turn our attention to the performance of mutual exclusion protocols on realistic architectures. Any mutual exclusion protocol poses
More informationComputer Engineering II Solution to Exercise Sheet Chapter 10
Distributed Computing FS 2017 Prof. R. Wattenhofer Computer Engineering II Solution to Exercise Sheet Chapter 10 Quiz 1 Quiz a) The AtomicBoolean utilizes an atomic version of getandset() implemented in
More informationIntroduction to Multiprocessor Synchronization
Introduction to Multiprocessor Synchronization Maurice Herlihy http://cs.brown.edu/courses/cs176/lectures.shtml Moore's Law Transistor count still rising Clock speed flattening sharply Art of Multiprocessor
More informationPractice: Small Systems Part 2, Chapter 3
Practice: Small Systems Part 2, Chapter 3 Roger Wattenhofer Overview Introduction Spin Locks Test and Set & Test and Test and Set Backoff lock Queue locks Concurrent Linked List Fine grained synchronization
More informationLocking. Part 2, Chapter 7. Roger Wattenhofer. ETH Zurich Distributed Computing
Locking Part 2, Chapter 7 Roger Wattenhofer ETH Zurich Distributed Computing www.disco.ethz.ch Overview Introduction Spin Locks Test-and-Set & Test-and-Test-and-Set Backoff lock Queue locks Concurrent
More information9/28/2014. CS341: Operating System. Synchronization. High Level Construct: Monitor Classical Problems of Synchronizations FAQ: Mid Semester
CS341: Operating System Lect22: 18 th Sept 2014 Dr. A. Sahu Dept of Comp. Sc. & Engg. Indian Institute of Technology Guwahati Synchronization Hardware Support TAS, CAS, TTAS, LL LC, XCHG, Compare, FAI
More information9/23/2014. Concurrent Programming. Book. Process. Exchange of data between threads/processes. Between Process. Thread.
Dr A Sahu Dept of Computer Science & Engineering IIT Guwahati Course Structure & Book Basic of thread and process Coordination and synchronization Example of Parallel Programming Shared memory : C/C++
More informationDistributed Computing Group
Distributed Computing Group HS 2009 Prof. Dr. Roger Wattenhofer, Thomas Locher, Remo Meier, Benjamin Sigg Assigned: December 11, 2009 Discussion: none Distributed Systems Theory exercise 6 1 ALock2 Have
More informationLecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown
Lecture 9: Multiprocessor OSs & Synchronization CSC 469H1F Fall 2006 Angela Demke Brown The Problem Coordinated management of shared resources Resources may be accessed by multiple threads Need to control
More informationLocking Granularity. CS 475, Spring 2019 Concurrent & Distributed Systems. With material from Herlihy & Shavit, Art of Multiprocessor Programming
Locking Granularity CS 475, Spring 2019 Concurrent & Distributed Systems With material from Herlihy & Shavit, Art of Multiprocessor Programming Discussion: HW1 Part 4 addtolist(key1, newvalue) Thread 1
More informationDistributed Computing
HELLENIC REPUBLIC UNIVERSITY OF CRETE Distributed Computing Graduate Course Section 3: Spin Locks and Contention Panagiota Fatourou Department of Computer Science Spin Locks and Contention In contrast
More information6.852: Distributed Algorithms Fall, Class 15
6.852: Distributed Algorithms Fall, 2009 Class 15 Today s plan z z z z z Pragmatic issues for shared-memory multiprocessors Practical mutual exclusion algorithms Test-and-set locks Ticket locks Queue locks
More informationAdvance Operating Systems (CS202) Locks Discussion
Advance Operating Systems (CS202) Locks Discussion Threads Locks Spin Locks Array-based Locks MCS Locks Sequential Locks Road Map Threads Global variables and static objects are shared Stored in the static
More informationAdaptive Lock. Madhav Iyengar < >, Nathaniel Jeffries < >
Adaptive Lock Madhav Iyengar < miyengar@andrew.cmu.edu >, Nathaniel Jeffries < njeffrie@andrew.cmu.edu > ABSTRACT Busy wait synchronization, the spinlock, is the primitive at the core of all other synchronization
More informationSolution: a lock (a/k/a mutex) public: virtual void unlock() =0;
1 Solution: a lock (a/k/a mutex) class BasicLock { public: virtual void lock() =0; virtual void unlock() =0; ; 2 Using a lock class Counter { public: int get_and_inc() { lock_.lock(); int old = count_;
More informationLinked Lists: Locking, Lock-Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Linked Lists: Locking, Lock-Free, and Beyond Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Objects Adding threads should not lower throughput Contention
More informationProgramming Paradigms for Concurrency Lecture 3 Concurrent Objects
Programming Paradigms for Concurrency Lecture 3 Concurrent Objects Based on companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified by Thomas Wies New York University
More informationScalable Locking. Adam Belay
Scalable Locking Adam Belay Problem: Locks can ruin performance 12 finds/sec 9 6 Locking overhead dominates 3 0 0 6 12 18 24 30 36 42 48 Cores Problem: Locks can ruin performance the locks
More information250P: Computer Systems Architecture. Lecture 14: Synchronization. Anton Burtsev March, 2019
250P: Computer Systems Architecture Lecture 14: Synchronization Anton Burtsev March, 2019 Coherence and Synchronization Topics: synchronization primitives (Sections 5.4-5.5) 2 Constructing Locks Applications
More informationModule 7: Synchronization Lecture 13: Introduction to Atomic Primitives. The Lecture Contains: Synchronization. Waiting Algorithms.
The Lecture Contains: Synchronization Waiting Algorithms Implementation Hardwired Locks Software Locks Hardware Support Atomic Exchange Test & Set Fetch & op Compare & Swap Traffic of Test & Set Backoff
More informationConcurrent Skip Lists. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Concurrent Skip Lists Companion slides for The by Maurice Herlihy & Nir Shavit Set Object Interface Collection of elements No duplicates Methods add() a new element remove() an element contains() if element
More informationMultiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems
Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing
More informationLinked Lists: Locking, Lock- Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Linked Lists: Locking, Lock- Free, and Beyond Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Coarse-Grained Synchronization Each method locks the object Avoid
More informationLecture 7: Mutual Exclusion 2/16/12. slides adapted from The Art of Multiprocessor Programming, Herlihy and Shavit
Principles of Concurrency and Parallelism Lecture 7: Mutual Exclusion 2/16/12 slides adapted from The Art of Multiprocessor Programming, Herlihy and Shavit Time Absolute, true and mathematical time, of
More informationAlgorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott
Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott Presentation by Joe Izraelevitz Tim Kopp Synchronization Primitives Spin Locks Used for
More informationCoarse-grained and fine-grained locking Niklas Fors
Coarse-grained and fine-grained locking Niklas Fors 2013-12-05 Slides borrowed from: http://cs.brown.edu/courses/cs176course_information.shtml Art of Multiprocessor Programming 1 Topics discussed Coarse-grained
More informationSynchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Synchronization Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Types of Synchronization Mutual Exclusion Locks Event Synchronization Global or group-based
More informationSynchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Synchronization Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) Types of Synchronization
More informationAdvanced Multiprocessor Programming: Locks
Advanced Multiprocessor Programming: Locks Martin Wimmer, Jesper Larsson Träff TU Wien 20th April, 2015 Wimmer, Träff AMP SS15 1 / 31 Locks we have seen so far Peterson Lock ilter Lock Bakery Lock These
More informationModels of concurrency & synchronization algorithms
Models of concurrency & synchronization algorithms Lecture 3 of TDA383/DIT390 (Concurrent Programming) Carlo A. Furia Chalmers University of Technology University of Gothenburg SP3 2016/2017 Today s menu
More informationCache Coherence and Atomic Operations in Hardware
Cache Coherence and Atomic Operations in Hardware Previously, we introduced multi-core parallelism. Today we ll look at 2 things: 1. Cache coherence 2. Instruction support for synchronization. And some
More informationCSC2/458 Parallel and Distributed Systems Scalable Synchronization
CSC2/458 Parallel and Distributed Systems Scalable Synchronization Sreepathi Pai February 20, 2018 URCS Outline Scalable Locking Barriers Outline Scalable Locking Barriers An alternative lock ticket lock
More informationNON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 3 Nov 2017
NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 3 Nov 2017 Lecture 1/3 Introduction Basic spin-locks Queue-based locks Hierarchical locks Reader-writer locks Reading without locking Flat
More informationMultiprocessor Systems. Chapter 8, 8.1
Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor
More informationAdvanced Topic: Efficient Synchronization
Advanced Topic: Efficient Synchronization Multi-Object Programs What happens when we try to synchronize across multiple objects in a large program? Each object with its own lock, condition variables Is
More informationOther consistency models
Last time: Symmetric multiprocessing (SMP) Lecture 25: Synchronization primitives Computer Architecture and Systems Programming (252-0061-00) CPU 0 CPU 1 CPU 2 CPU 3 Timothy Roscoe Herbstsemester 2012
More informationCS758 Synchronization. CS758 Multicore Programming (Wood) Synchronization
CS758 Synchronization 1 Shared Memory Primitives CS758 Multicore Programming (Wood) Performance Tuning 2 Shared Memory Primitives Create thread Ask operating system to create a new thread Threads starts
More informationA simple correctness proof of the MCS contention-free lock. Theodore Johnson. Krishna Harathi. University of Florida. Abstract
A simple correctness proof of the MCS contention-free lock Theodore Johnson Krishna Harathi Computer and Information Sciences Department University of Florida Abstract Mellor-Crummey and Scott present
More informationConcurrent Queues, Monitors, and the ABA problem. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Concurrent Queues, Monitors, and the ABA problem Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Queues Often used as buffers between producers and consumers
More informationIntroduction. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Introduction Companion slides for The by Maurice Herlihy & Nir Shavit Moore s Law Transistor count still rising Clock speed flattening sharply 2 Moore s Law (in practice) 3 Nearly Extinct: the Uniprocesor
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationAlgorithms for Scalable Lock Synchronization on Shared-memory Multiprocessors
Algorithms for Scalable Lock Synchronization on Shared-memory Multiprocessors John Mellor-Crummey Department of Computer Science Rice University johnmc@cs.rice.edu COMP 422 Lecture 18 17 March 2009 Summary
More informationMutual Exclusion Algorithms with Constant RMR Complexity and Wait-Free Exit Code
Mutual Exclusion Algorithms with Constant RMR Complexity and Wait-Free Exit Code Rotem Dvir 1 and Gadi Taubenfeld 2 1 The Interdisciplinary Center, P.O.Box 167, Herzliya 46150, Israel rotem.dvir@gmail.com
More informationSuggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!
1! 2! Suggested Readings! Readings!! H&P: Chapter 5.8! Could also look at material on CD referenced on p. 538 of your text! Lecture 27" Cache Coherency! 3! Processor components! Multicore processors and
More informationSpinlocks. Spinlocks. Message Systems, Inc. April 8, 2011
Spinlocks Samy Al Bahra Devon H. O Dell Message Systems, Inc. April 8, 2011 Introduction Mutexes A mutex is an object which implements acquire and relinquish operations such that the execution following
More informationMultiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More informationNON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 14 November 2014
NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 14 November 2014 Lecture 6 Introduction Amdahl s law Basic spin-locks Queue-based locks Hierarchical locks Reader-writer locks Reading
More informationSynchronization. Erik Hagersten Uppsala University Sweden. Components of a Synchronization Even. Need to introduce synchronization.
Synchronization sum := thread_create Execution on a sequentially consistent shared-memory machine: Erik Hagersten Uppsala University Sweden while (sum < threshold) sum := sum while + (sum < threshold)
More informationConcurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis
oncurrent programming: From theory to practice oncurrent Algorithms 2015 Vasileios Trigonakis From theory to practice Theoretical (design) Practical (design) Practical (implementation) 2 From theory to
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationUniversality of Consensus. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Universality of Consensus Companion slides for The by Maurice Herlihy & Nir Shavit Turing Computability 1 0 1 1 0 1 0 A mathematical model of computation Computable = Computable on a T-Machine 2 Shared-Memory
More informationThe complete license text can be found at
SMP & Locking These slides are made distributed under the Creative Commons Attribution 3.0 License, unless otherwise noted on individual slides. You are free: to Share to copy, distribute and transmit
More informationMULTIPROCESSORS AND THREAD LEVEL PARALLELISM
UNIT III MULTIPROCESSORS AND THREAD LEVEL PARALLELISM 1. Symmetric Shared Memory Architectures: The Symmetric Shared Memory Architecture consists of several processors with a single physical memory shared
More informationLecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform
More informationToday. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )
Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Systems Group Department of Computer Science ETH Zürich SMP architecture
More informationIntroduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization
Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency
More informationMultiprocessors and Locking
Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access
More informationMultiprocessor Synchronization
Multiprocessor Synchronization Material in this lecture in Henessey and Patterson, Chapter 8 pgs. 694-708 Some material from David Patterson s slides for CS 252 at Berkeley 1 Multiprogramming and Multiprocessing
More informationMultiprocessor Cache Coherency. What is Cache Coherence?
Multiprocessor Cache Coherency CS448 1 What is Cache Coherence? Two processors can have two different values for the same memory location 2 1 Terminology Coherence Defines what values can be returned by
More informationAlgorithms for Scalable Lock Synchronization on Shared-memory Multiprocessors
Algorithms for Scalable Lock Synchronization on Shared-memory Multiprocessors John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 21 30 March 2017 Summary
More informationDistributed Operating Systems
Distributed Operating Systems Synchronization in Parallel Systems Marcus Völp 2009 1 Topics Synchronization Locking Analysis / Comparison Distributed Operating Systems 2009 Marcus Völp 2 Overview Introduction
More informationHardware: BBN Butterfly
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors John M. Mellor-Crummey and Michael L. Scott Presented by Charles Lehner and Matt Graichen Hardware: BBN Butterfly shared-memory
More informationConcurrent Preliminaries
Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures
More informationGLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs
GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs Authors: Jos e L. Abell an, Juan Fern andez and Manuel E. Acacio Presenter: Guoliang Liu Outline Introduction Motivation Background
More informationChapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST
Chapter 8. Multiprocessors In-Cheol Park Dept. of EE, KAIST Can the rapid rate of uniprocessor performance growth be sustained indefinitely? If the pace does slow down, multiprocessor architectures will
More informationRole of Synchronization. CS 258 Parallel Computer Architecture Lecture 23. Hardware-Software Trade-offs in Synchronization and Data Layout
CS 28 Parallel Computer Architecture Lecture 23 Hardware-Software Trade-offs in Synchronization and Data Layout April 21, 2008 Prof John D. Kubiatowicz http://www.cs.berkeley.edu/~kubitron/cs28 Role of
More informationLecture 19: Synchronization. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Lecture 19: Synchronization CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Assignment 4 due tonight at 11:59 PM Synchronization primitives (that we have or will
More informationCS Advanced Operating Systems Structures and Implementation Lecture 8. Synchronization Continued. Goals for Today. Synchronization Scheduling
Goals for Today CS194-24 Advanced Operating Systems Structures and Implementation Lecture 8 Synchronization Continued Synchronization Scheduling Interactive is important! Ask Questions! February 25 th,
More informationReaders-Writers Problem. Implementing shared locks. shared locks (continued) Review: Test-and-set spinlock. Review: Test-and-set on alpha
Readers-Writers Problem Implementing shared locks Multiple threads may access data - Readers will only observe, not modify data - Writers will change the data Goal: allow multiple readers or one single
More informationConcurrent Counting using Combining Tree
Final Project Report by Shang Wang, Taolun Chai and Xiaoming Jia Concurrent Counting using Combining Tree 1. Introduction Counting is one of the very basic and natural activities that computers do. However,
More informationPeterson s Algorithm
Peterson s Algorithm public void lock() { flag[i] = true; victim = i; while (flag[j] && victim == i) {}; } public void unlock() { flag[i] = false; } 24/03/10 Art of Multiprocessor Programming 1 Mutual
More informationCS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019
CS 31: Introduction to Computer Systems 22-23: Threads & Synchronization April 16-18, 2019 Making Programs Run Faster We all like how fast computers are In the old days (1980 s - 2005): Algorithm too slow?
More informationCS377P Programming for Performance Multicore Performance Cache Coherence
CS377P Programming for Performance Multicore Performance Cache Coherence Sreepathi Pai UTCS October 26, 2015 Outline 1 Cache Coherence 2 Cache Coherence Awareness 3 Scalable Lock Design 4 Transactional
More informationCS 241 Honors Concurrent Data Structures
CS 241 Honors Concurrent Data Structures Bhuvan Venkatesh University of Illinois Urbana Champaign March 27, 2018 CS 241 Course Staff (UIUC) Lock Free Data Structures March 27, 2018 1 / 43 What to go over
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationDistributed Operating Systems
Distributed Operating Systems Synchronization in Parallel Systems Marcus Völp 203 Topics Synchronization Locks Performance Distributed Operating Systems 203 Marcus Völp 2 Overview Introduction Hardware
More informationLinked Lists: The Role of Locking. Erez Petrank Technion
Linked Lists: The Role of Locking Erez Petrank Technion Why Data Structures? Concurrent Data Structures are building blocks Used as libraries Construction principles apply broadly This Lecture Designing
More informationSynchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011
Synchronization CS61, Lecture 18 Prof. Stephen Chong November 3, 2011 Announcements Assignment 5 Tell us your group by Sunday Nov 6 Due Thursday Nov 17 Talks of interest in next two days Towards Predictable,
More information10/17/ Gribble, Lazowska, Levy, Zahorjan 2. 10/17/ Gribble, Lazowska, Levy, Zahorjan 4
Temporal relations CSE 451: Operating Systems Autumn 2010 Module 7 Synchronization Instructions executed by a single thread are totally ordered A < B < C < Absent synchronization, instructions executed
More informationIncoherent each cache copy behaves as an individual copy, instead of as the same memory location.
Cache Coherence This lesson discusses the problems and solutions for coherence. Different coherence protocols are discussed, including: MSI, MOSI, MOESI, and Directory. Each has advantages and disadvantages
More informationLecture 10: Multi-Object Synchronization
CS 422/522 Design & Implementation of Operating Systems Lecture 10: Multi-Object Synchronization Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous
More informationMultiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More informationFinal Exam Solutions May 11, 2012 CS162 Operating Systems
University of California, Berkeley College of Engineering Computer Science Division EECS Spring 2012 Anthony D. Joseph and Ion Stoica Final Exam May 11, 2012 CS162 Operating Systems Your Name: SID AND
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Shared-Memory Multi-Processors Shared-Memory Multiprocessors Multiple threads use shared memory (address space) SysV Shared Memory or Threads in software Communication implicit
More informationThe need for atomicity This code sequence illustrates the need for atomicity. Explain.
Lock Implementations [ 8.1] Recall the three kinds of synchronization from Lecture 6: Point-to-point Lock Performance metrics for lock implementations Uncontended latency Traffic o Time to acquire a lock
More informationDistributed Systems Synchronization. Marcus Völp 2007
Distributed Systems Synchronization Marcus Völp 2007 1 Purpose of this Lecture Synchronization Locking Analysis / Comparison Distributed Operating Systems 2007 Marcus Völp 2 Overview Introduction Hardware
More informationl12 handout.txt l12 handout.txt Printed by Michael Walfish Feb 24, 11 12:57 Page 1/5 Feb 24, 11 12:57 Page 2/5
Feb 24, 11 12:57 Page 1/5 1 Handout for CS 372H 2 Class 12 3 24 February 2011 4 5 1. CAS / CMPXCHG 6 7 Useful operation: compare and swap, known as CAS. Says: "atomically 8 check whether a given memory
More informationCache Coherence Tutorial
Cache Coherence Tutorial The cache coherence protocol described in the book is not really all that difficult and yet a lot of people seem to have troubles when it comes to using it or answering an assignment
More informationPage 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence
SMP Review Multiprocessors Today s topics: SMP cache coherence general cache coherence issues snooping protocols Improved interaction lots of questions warning I m going to wait for answers granted it
More informationCS377P Programming for Performance Multicore Performance Synchronization
CS377P Programming for Performance Multicore Performance Synchronization Sreepathi Pai UTCS October 21, 2015 Outline 1 Synchronization Primitives 2 Blocking, Lock-free and Wait-free Algorithms 3 Transactional
More information