CS 347: Parallel and Distributed Data Processing
Spring 2016
Notes 5: Concurrency Control

Topics
  Data
    Database design
  Queries
    Decomposition
    Localization
    Optimization
  Transactions
    Concurrency control
    Reliability
  Replication

Transactions
  Programs with database accesses
  Always end in commit or abort
  Properties
    Atomicity
    Consistency
    Isolation
    Durability

Transactions (continued)
  The properties above are provided by concurrency control and reliability mechanisms

Concurrency Control
  Synchronization primitives
    Mutually exclusive access
    Execution ordering
  Pessimistic
  Optimistic

Concurrency Control (outline)
  Schedules and serializability
  Locking
  Timestamp ordering

Schedule
  A schedule represents how a set of transactions is executed
  Just like in a centralized system
  Schedules may be good (i.e., preserve constraints) or bad

Schedule: example
  Constraint: X = Y

  T1                      T2
  (1) a ← X               (5) c ← X
  (2) X ← a + 100         (6) X ← 2c
  (3) b ← Y               (7) d ← Y
  (4) Y ← b + 100         (8) Y ← 2d

  Precedence relation
    Intra-transaction
    Inter-transaction

  Schedule S (execution order)
    (1) T1: a ← X
    (2) T1: X ← a + 100
    (5) T2: c ← X
    (3) T1: b ← Y
    (6) T2: X ← 2c
    (4) T1: Y ← b + 100
    (7) T2: d ← Y
    (8) T2: Y ← 2d

  If X = Y = 0 initially, X = Y = 200 at the end
  Are there other schedules with different outcomes?

Schedule: formal definition
  Let $T = \{T_1, T_2, \ldots, T_n\}$ be a set of transactions
  A schedule $S$ over $T$ is a partial order with ordering relation $<_S$ where:
    1) $S = \bigcup_{i=1}^{n} T_i$
    2) $<_S \;\supseteq\; \bigcup_{i=1}^{n} <_{T_i}$
    3) For any two conflicting operations $p, q \in S$, either $p <_S q$ or $q <_S p$
  Two operations conflict if they are on the same data and at least one is a write
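
To explore the question of other schedules with different outcomes, the eight steps of the example can be replayed in any order. The sketch below is only an illustration: it assumes steps 1 to 4 belong to T1 and steps 5 to 8 to T2, and the names `STEPS` and `run_schedule` are hypothetical helpers, not part of the notes.

```python
# Each step reads or writes the shared database `db`, using transaction-local
# variables `v` (a, b for T1; c, d for T2). Step numbers follow the example.
STEPS = {
    1: lambda db, v: v.update(a=db["X"]),         # T1: a <- X
    2: lambda db, v: db.update(X=v["a"] + 100),   # T1: X <- a + 100
    3: lambda db, v: v.update(b=db["Y"]),         # T1: b <- Y
    4: lambda db, v: db.update(Y=v["b"] + 100),   # T1: Y <- b + 100
    5: lambda db, v: v.update(c=db["X"]),         # T2: c <- X
    6: lambda db, v: db.update(X=2 * v["c"]),     # T2: X <- 2c
    7: lambda db, v: v.update(d=db["Y"]),         # T2: d <- Y
    8: lambda db, v: db.update(Y=2 * v["d"]),     # T2: Y <- 2d
}


def run_schedule(order, x=0, y=0):
    """Execute the numbered steps in the given order and return (X, Y)."""
    db, local_vars = {"X": x, "Y": y}, {}
    for step in order:
        STEPS[step](db, local_vars)
    return db["X"], db["Y"]


print(run_schedule([1, 2, 5, 3, 6, 4, 7, 8]))  # schedule S: (200, 200)
print(run_schedule([5, 6, 7, 8, 1, 2, 3, 4]))  # serial T2, T1: (100, 100)
print(run_schedule([1, 2, 5, 6, 7, 8, 3, 4]))  # breaks X = Y: (200, 100)
```

The last order still respects the intra-transaction precedence of both transactions, yet it ends with X different from Y, so it is one of the "bad" schedules.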

Schedule: example
  T1: r1[X], w1[X]
  T2: r2[X], w2[Y], w2[X]
  T3: r3[X], w3[X], w3[Y], w3[Z]
  [Figure: schedule S, a partial order over the operations of T1, T2, and T3]

Schedule: precedence graph
  The precedence graph $P(S)$ for a schedule $S$ is a directed graph
    Nodes: the transactions in $S$
    Edges: $T_i \to T_j$ is an edge iff $\exists\, p \in T_i,\ q \in T_j$ such that $p, q$ conflict and $p <_S q$

Serializability
  Theorem: a schedule $S$ is (conflict-)serializable iff $P(S)$ is acyclic

  Example
    S is a schedule over the operations r1[X], w1[X], w1[Y], r2[X], w2[Y], r3[X], w3[X]
    P(S) has the edges
      T2 → T1 because r2[X] < w1[X] and w2[Y] < w1[Y]
      T1 → T3 because w1[X] < r3[X] and w1[X] < w3[X]
      T2 → T3 because r2[X] < w3[X]
    P(S) is acyclic, so S is serializable (equivalent to the serial order T2, T1, T3)
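
The serializability test above can be sketched in a few lines. This is a minimal illustration, assuming the schedule is given as a totally ordered list of (transaction, operation, item) triples; the linearization used for S is one order consistent with the edges listed in the example, not necessarily the one drawn on the slide, and `precedence_graph` and `is_acyclic` are hypothetical names.

```python
from itertools import combinations


def precedence_graph(schedule):
    """Return the edge set of P(S) for a totally ordered schedule.

    Two operations conflict if they come from different transactions,
    touch the same item, and at least one of them is a write.
    """
    edges = set()
    # combinations() keeps input order, so the first element is always earlier.
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))
    return edges


def is_acyclic(edges):
    """Repeatedly strip nodes without incoming edges (Kahn's algorithm)."""
    remaining = {n for edge in edges for n in edge}
    while remaining:
        sources = {n for n in remaining
                   if not any(v == n and u in remaining for u, v in edges)}
        if not sources:
            return False  # every remaining node has a predecessor: a cycle
        remaining -= sources
    return True


# One linearization consistent with the example (an assumption).
S = [(2, "r", "X"), (2, "w", "Y"), (1, "r", "X"), (1, "w", "X"),
     (1, "w", "Y"), (3, "r", "X"), (3, "w", "X")]
print(sorted(precedence_graph(S)))     # [(1, 3), (2, 1), (2, 3)]
print(is_acyclic(precedence_graph(S))) # True, so S is conflict-serializable
```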

Serializability: enforcement
  Locks
  Timestamps

Locking
  Just like in a centralized system
  But with multiple lock managers
  [Figure: node 1 runs scheduler 1 with the lock table for data D1, node 2 runs scheduler 2 with the lock table for data D2; a transaction T accesses and locks data at the nodes and releases all locks at the end]

Locking: reminder
  Using locks alone does not guarantee serializability
  Example (L = lock, R = release)

  T1                      T2
  L(X)
  (1) a ← X
  (2) X ← a + 100
  R(X)
                          L(X)
                          (3) c ← X
                          (4) X ← 2c
                          R(X)
                          L(Y)
                          (5) d ← Y
                          (6) Y ← 2d
                          R(Y)
  L(Y)
  (7) b ← Y
  (8) Y ← b + 100
  R(Y)

  If X = Y = 0 initially, X = 200 and Y = 100 at the end: not serializable

Two-Phase Locking
  One solution: a transaction acquires no new lock after it has released a lock
  [Figure: number of locks held over time, growing and then shrinking]
  May lead to cascading aborts
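
As a small illustration of the two-phase rule, the check below scans one transaction's lock actions and reports whether any lock is acquired after a release. It is a sketch under the assumption that L means lock and R means release in the example; `is_two_phase` is a hypothetical helper name.

```python
def is_two_phase(actions):
    """Return True if no lock ("L") is requested after a release ("R").

    `actions` is the sequence of (kind, item) pairs of one transaction.
    """
    released = False
    for kind, _item in actions:
        if kind == "R":
            released = True          # the shrinking phase has started
        elif kind == "L" and released:
            return False             # growing again after shrinking
    return True


# T1 from the example releases X before it locks Y, so it is not two-phase.
t1_example = [("L", "X"), ("R", "X"), ("L", "Y"), ("R", "Y")]
# The same transaction rewritten to acquire both locks first.
t1_two_phase = [("L", "X"), ("L", "Y"), ("R", "X"), ("R", "Y")]

print(is_two_phase(t1_example))    # False
print(is_two_phase(t1_two_phase))  # True
```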

Strict Two-Phase Locking
  [Figure: number of locks held over time]
  Only release locks on commit or abort, to avoid cascading aborts

Locking with Shared Memory
  [Figure: processors P sharing a memory M]
  Where does the lock table live?
  How do we avoid race conditions?

Locking with Shared Disk
  [Figure: processors P, each with its own memory M]
  Where does the lock table live?
  How do we avoid race conditions?
  Fixed partition
  Dynamic partition
  For each data item X, need to have a lock table entry L(X)
  For each L(X), need to know the current owner processor P(X)
  Replicate P(X) at all processors
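
The strict two-phase locking rule can be illustrated with a toy lock table. This is only a sketch under simplifying assumptions that are not in the notes: exclusive locks only, a single lock table, and no waiting queue (a refused request simply returns False). The class name `StrictLockManager` and its methods are hypothetical.

```python
class StrictLockManager:
    """Toy lock table: locks are released only when a transaction finishes."""

    def __init__(self):
        self.owner = {}  # item -> transaction currently holding its lock
        self.held = {}   # transaction -> set of items it has locked

    def lock(self, txn, item):
        """Grant the lock if it is free or already held by txn."""
        holder = self.owner.get(item)
        if holder is not None and holder != txn:
            return False  # the caller would have to wait
        self.owner[item] = txn
        self.held.setdefault(txn, set()).add(item)
        return True

    def finish(self, txn):
        """Called on commit or abort: release all locks of txn at once."""
        for item in self.held.pop(txn, set()):
            del self.owner[item]


lm = StrictLockManager()
print(lm.lock("T1", "X"))  # True
print(lm.lock("T2", "X"))  # False: T1 holds X until it commits or aborts
lm.finish("T1")
print(lm.lock("T2", "X"))  # True
```

There is deliberately no method to release a single lock, which is what makes the sketch strict.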

Deadlocks
  Remember, 2PL may lead to deadlocks
    T1: L(X), r(X), L(Y)
    T2: L(Y), r(Y), L(X)
  Need to avoid cycles in the wait-for graph (WFG) between transactions
  Many deadlock solutions
    Detection vs. prevention
    Timeouts
    Wait-die
    Wound-wait

Deadlocks (continued)
  Even if nodes check the WFG locally, global deadlocks are possible
  [Figure: two nodes, each with a local WFG that has no cycles]
  Need to combine the local WFGs to discover global deadlocks
    E.g., at a central detection node

Timestamp Ordering
  Assign a timestamp when a transaction begins
  If ts(T1) < ts(T2) < ... < ts(Tn), then the scheduler produces a history equivalent to T1, T2, ..., Tn
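
The global deadlock check described above can be sketched as a cycle search over the union of the local wait-for graphs. The example assumes that X is stored at node 1 and Y at node 2 (the notes do not say where the items live), and represents each WFG as a set of (waiter, holder) edges; `has_cycle` is a hypothetical helper.

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a wait-for graph.

    `edges` is a set of (waiter, holder) pairs.
    """
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())
    state = {}  # node -> "visiting" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True          # back edge: a cycle
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


# T1: L(X), r(X), L(Y) and T2: L(Y), r(Y), L(X)
wfg_node1 = {("T2", "T1")}  # at node 1, T2 waits for T1's lock on X
wfg_node2 = {("T1", "T2")}  # at node 2, T1 waits for T2's lock on Y

print(has_cycle(wfg_node1))              # False
print(has_cycle(wfg_node2))              # False
print(has_cycle(wfg_node1 | wfg_node2))  # True: a global deadlock
```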

Timestamp Ordering: rule
  If $p_i[X]$ and $q_j[X]$ are conflicting operations, then $p_i[X] <_S q_j[X]$ iff $ts(T_i) < ts(T_j)$

Timestamp Ordering: example
  Non-serializable schedule S, with ts(T1) < ts(T2)

  Operations on X              Operations on Y
  (T1) a ← X                   (T2) d ← Y
  (T1) X ← a + 100             (T2) Y ← 2d
  (T2) c ← X                   (T1) b ← Y   reject!
  (T2) X ← 2c                  (T1) Y ← b + 100

  The read of Y by T1 is rejected: abort T1
  T2 has read the value of X written by T1: abort T2 as well

Strict TO
  Lock written items until it is certain that the writing transaction has been successful
  Avoids cascading aborts

Strict TO: example
  Non-serializable schedule S, with ts(T1) < ts(T2)

  Operations on X              Operations on Y
  (T1) a ← X                   (T2) d ← Y
  (T1) X ← a + 100             (T2) Y ← 2d
  (T2) c ← X   delay           (T1) b ← Y   reject!

  abort T1
  The delayed operations of T2 then proceed:
  (T2) c ← X
  (T2) X ← 2c

TO Scheduler
  For each data item X:
    mtsr[X]: maximum timestamp of a transaction that read X
    mtsw[X]: maximum timestamp of a transaction that wrote X
    nr[X]: number of transactions currently reading X (= 0, 1, 2, ...)
    nw[X]: number of transactions currently writing X (= 0 or 1)

TO Scheduler: part 1
  ri[X] arrives:
    if ts(Ti) < mtsw[X] then abort Ti
    else
      if ts(Ti) > mtsr[X] then mtsr[X] ← ts(Ti)
      if queue is empty and nw[X] = 0 then
        nr[X] ← nr[X] + 1
        initiate read X
      else add ri[X] to queue

TO Scheduler: part 2
  wi[X] arrives:
    if ts(Ti) < mtsw[X] or ts(Ti) < mtsr[X] then abort Ti
    else
      mtsw[X] ← ts(Ti)
      if queue is empty and nw[X] = 0 and nr[X] = 0 then
        nw[X] ← 1
        write X and wait for Ti to finish
      else add wi[X] to queue

TO Scheduler: part 3
  When some operation o (read or write) on X finishes:
    no[X] ← no[X] - 1        (nr[X] if o is a read, nw[X] if o is a write)
    not_done ← true
    while not_done
      oj[X] ← head of queue (with smallest timestamp)
      if oj is a write and nr[X] = 0 and nw[X] = 0 then
        remove oj[X] from queue
        nw[X] ← 1
        write X and wait for Tj to finish
      else if oj is a read and nw[X] = 0 then
        remove oj[X] from queue
        nr[X] ← nr[X] + 1
        initiate read X
      else not_done ← false
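
The accept/abort tests of parts 1 and 2 can be tried out with a stripped-down scheduler. The sketch below is a simplification, not the scheduler of the notes: it keeps only mtsr and mtsw, has no queue and no nr/nw counters, and treats every accepted operation as executed immediately. The class name `BasicTO` and the timestamp values 1 and 2 are assumptions made for the illustration.

```python
from collections import defaultdict


class BasicTO:
    """Timestamp checks only; timestamps are assumed to be positive integers."""

    def __init__(self):
        self.mtsr = defaultdict(int)  # item -> max timestamp that read it
        self.mtsw = defaultdict(int)  # item -> max timestamp that wrote it

    def read(self, ts, item):
        """Return True if the read is accepted, False if Ti must abort."""
        if ts < self.mtsw[item]:
            return False
        self.mtsr[item] = max(self.mtsr[item], ts)
        return True

    def write(self, ts, item):
        """Return True if the write is accepted, False if Ti must abort."""
        if ts < self.mtsw[item] or ts < self.mtsr[item]:
            return False
        self.mtsw[item] = ts
        return True


# Replay of the non-serializable example with ts(T1) = 1 < ts(T2) = 2.
sched = BasicTO()
print(sched.read(1, "X"), sched.write(1, "X"))  # T1 on X: True True
print(sched.read(2, "Y"), sched.write(2, "Y"))  # T2 on Y: True True
print(sched.read(2, "X"))                       # T2 reads X: True
print(sched.read(1, "Y"))                       # T1 reads Y: False (abort T1)
```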

TO Scheduler: notes
  For reads: nr[X] ← nr[X] + 1; initiate read X
  For writes: nw[X] ← 1; write X and wait for Ti to finish
  In part 3, the end of a write is only processed when all other writes for the transaction have completed

  If a transaction is aborted, it must be retried with a new, larger timestamp
    Example: mtsr[X] = 10, mtsw[X] = 9, and T with ts(T) = 8 wants to read X
    Retrying with ts(T) = 8 causes starvation (T keeps being aborted); T should use ts(T) = 11

TO Scheduler: correctness
  Theorem: if S is a schedule representing an execution by a TO scheduler, then S is serializable
  Proof
    Assume $T_i \to T_j$ in $P(S)$
    $\Rightarrow \exists$ conflicting $p_i[X], q_j[X]$ in $S$ with $p_i[X] <_S q_j[X]$
    $\Rightarrow ts(T_i) < ts(T_j)$ by the TO rule

TO Scheduler: correctness (continued)
    Assume there is a cycle $T_1 \to T_2 \to \cdots \to T_n \to T_1$ in $P(S)$
    $\Rightarrow ts(T_1) < ts(T_2) < \cdots < ts(T_n) < ts(T_1)$, which is a contradiction
    Hence $P(S)$ is acyclic $\Rightarrow S$ is serializable

Thomas Write Rule
  Case: mtsr[X] < ts(Ti) < mtsw[X], and Ti wants to write X
  wi[X] arrives:
    if ts(Ti) < mtsr[X] then abort Ti
    else if ts(Ti) < mtsw[X] then ignore write
    else ... // as before
  Why can't we let Ti go ahead?
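
The three outcomes of the Thomas write rule can be written as a small decision function. This is a sketch only: `thomas_write` is a hypothetical name, and the timestamp values in the calls are made up for illustration.

```python
def thomas_write(ts, mtsr, mtsw):
    """Decide what to do with a write to X by a transaction with timestamp ts."""
    if ts < mtsr:
        return "abort"    # a transaction with a larger timestamp already read X
    if ts < mtsw:
        return "ignore"   # a write with a larger timestamp supersedes this one
    return "write"        # proceed as in the basic TO scheduler


print(thomas_write(ts=8, mtsr=7, mtsw=9))    # ignore
print(thomas_write(ts=8, mtsr=10, mtsw=9))   # abort
print(thomas_write(ts=11, mtsr=10, mtsw=9))  # write
```

The order of the two tests matters: the read check comes first, so a write that is older than both the latest read and the latest write is aborted rather than ignored.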

Thomas Write Rule (continued)
  [Figure: timeline with a read of X by Tj, mtsr[X], ts(Ti) where Ti wants to write X, and mtsw[X]]
  Why can't we let Ti go ahead?
    mtsr[X] is only the latest read; there could be others before it

TO Optimizations
  Update mtsr and mtsw when the action is executed, not when it is added to the queue
    Example: X has an active write with ts = 7 and queued writes with ts = 8 and ts = 9. Is mtsw[X] = 9 or 7?
  Use multiple versions of data
    Example: X has a value written at ts = 9 and a value written at ts = 7, and ri[X] arrives with ts(Ti) = 8

Timestamp Management
  A table with mtsr and mtsw for every data item X1, X2, ..., Xn
  Tons of space and extra IO

Timestamp Management: timestamp cache
  Cache with one (item, mtsr, mtsw) entry per recently accessed item (e.g., X, Y, Z), plus a value min
  If a transaction reads/writes X, add/update the entry for X in the cache
  Periodically purge items X with mtsr[X] < min and mtsw[X] < min
  Track min (e.g., choose min ← current time - d)

  Enforcing the TO rule for pi[X]:
    if X has an entry in the cache then use the mtsr, mtsw values in the cache
    else assume mtsr[X] = mtsw[X] = min
  Use hashing (on items) for the cache

2PL and TO
  T1: w1[Y]
  T2: r2[X], r2[Y], w2[Z]
  T3: w3[X]
  ts(T1) < ts(T2) < ts(T3)
  S: r2[X] w3[X] w1[Y] r2[Y] w2[Z]
  S could be produced with TO but not with 2PL
  Are all 2PL schedules TO schedules?
  [Figure: Venn diagram of 2PL schedules and TO schedules, with the previous example marked and the question "any examples here?"]
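
The timestamp cache described above can be sketched with a Python dict, which is itself a hash table keyed on items. The class name `TimestampCache`, its method names, and the concrete timestamps are assumptions made for this illustration.

```python
class TimestampCache:
    """Keeps (mtsr, mtsw) only for recently accessed items."""

    def __init__(self, min_ts=0):
        self.min_ts = min_ts
        self.entries = {}  # item -> (mtsr, mtsw)

    def lookup(self, item):
        """Cached values, or (min, min) for items without an entry."""
        return self.entries.get(item, (self.min_ts, self.min_ts))

    def record_read(self, item, ts):
        mtsr, mtsw = self.lookup(item)
        self.entries[item] = (max(mtsr, ts), mtsw)

    def record_write(self, item, ts):
        mtsr, mtsw = self.lookup(item)
        self.entries[item] = (mtsr, max(mtsw, ts))

    def purge(self, new_min):
        """Raise min and drop entries whose timestamps are both below it."""
        self.min_ts = new_min
        self.entries = {item: (r, w) for item, (r, w) in self.entries.items()
                        if r >= new_min or w >= new_min}


cache = TimestampCache()
cache.record_read("X", 5)
cache.record_write("Y", 12)
cache.purge(new_min=10)
print(cache.lookup("X"))  # (10, 10): entry purged, min is assumed
print(cache.lookup("Y"))  # (0, 12): entry kept
```

Assuming min for a purged item never underestimates its true timestamps, so the TO rule stays safe; the cost is that some old transactions may be aborted unnecessarily.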

Distributed TO Scheduler
  [Figure: node 1 runs scheduler 1 with the timestamp cache for data D1, node 2 runs scheduler 2 with the timestamp cache for data D2; a transaction T accesses data at both nodes]
  Each scheduler is independent
  Signal all schedulers involved at the end of the transaction

Pessimistic vs. Optimistic Control
  Pessimistic: validate, read, (compute), write
  Optimistic: read, (compute), validate, write
  Optimistic control enables more parallelism
  [Figure: timelines of concurrent transactions going through their R (read), V (validate), and W (write) phases]

Summary
  2PL
    Useful in a distributed system
    Most popular
    Deadlocks still possible
  TO
    Useful in a distributed system
    Aborts more likely
    No deadlocks