Part III Transactions
Transactions
  Example transaction: transfer amount X from A to B
    debit(account A; amount X): A = A - X;
    credit(account B; amount X): B = B + X;
  Either do the whole thing or nothing
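The all-or-nothing behaviour of the transfer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the accounts live in an in-memory dict, and the names transfer and InsufficientFunds are hypothetical, not part of any real system.

```python
class InsufficientFunds(Exception):
    """Hypothetical error raised when the debit cannot be performed."""


def transfer(accounts, src, dst, amount):
    """Move `amount` from src to dst, or leave `accounts` untouched."""
    snapshot = dict(accounts)            # remember the initial state
    try:
        # debit(account A; amount X): A = A - X
        if accounts[src] < amount:
            raise InsufficientFunds(src)
        accounts[src] -= amount
        # credit(account B; amount X): B = B + X
        accounts[dst] += amount          # KeyError if dst does not exist
    except Exception:
        accounts.clear()
        accounts.update(snapshot)        # roll back to the initial state
        raise


accounts = {"A": 100, "B": 50}
transfer(accounts, "A", "B", 30)         # both steps happen
```

If the credit step fails after the debit has already been applied, the snapshot is restored, so an observer sees either both changes or neither.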
ACID Properties: A and C
  Atomicity: no intermediate results are observable
    To an external observer, T jumps from the initial state to the final state, or never leaves the initial state
  Consistency: T produces consistent results only; otherwise it aborts
    T fulfils the consistency constraints of the application
ACID Properties: I and D
  Isolation: T looks like it is the only program running
    Notice the "looks like"!
  Durability: T produces results that the system cannot forget
    T's results become part of reality
Some Transaction Primitives
  Primitive            Description
  BEGIN_TRANSACTION    Mark the start of a transaction
  END_TRANSACTION      Terminate the transaction and try to commit
  ABORT_TRANSACTION    Kill the transaction and restore the old values
  READ                 Read data from a file, a table, or otherwise
  WRITE                Write data to a file, a table, or otherwise
Transaction Types
  Flat Transactions
  Flat Transactions with Savepoints
  Chained Transactions
  Nested Transactions
  Distributed Transactions
  Multi-level Transactions
  Open Nested Transactions
  Long-lived Transactions
Flat Transactions: All or Nothing
  Simplest type
  Basic building block for organizing an application into atomic actions
  Encloses a program with brackets: BEGIN_TRANSACTION ... END_TRANSACTION
  Everything between the brackets is performed, or nothing at all
  Strictly satisfies ACID
Flat Transactions: Limitations
  All or nothing is not always appropriate
  It can be beneficial to commit partial results (violating atomicity)
    Trip planning: commit part of a transaction
    Bulk updates: undoing all updates can be expensive
Trip Planning Example
  (a) Transaction to reserve three flights commits:
    BEGIN_TRANSACTION
      reserve LNK -> MSP;
      reserve MSP -> SEA;
      reserve SEA -> YVR;
    END_TRANSACTION
  (b) Transaction aborts when the third flight is unavailable:
    BEGIN_TRANSACTION
      reserve LNK -> MSP;
      reserve MSP -> SEA;
      reserve SEA -> YVR full => ABORT_TRANSACTION
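Flat Transactions with SavePoints
  T: BEGIN_TRANSACTION S_1 ... S_m SAVEPOINT S_m+1 ... END_TRANSACTION
  [Figure: on a failure after the savepoint, a flat T without a savepoint rolls back to BEGIN_TRANSACTION; with a savepoint, T rolls back only to the SAVEPOINT]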
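Savepoints are available in many SQL databases. The sketch below uses Python's built-in sqlite3 module with an in-memory database to show a partial rollback on the trip-planning example. The table name reservation and the savepoint name after_first_leg are made up for this illustration.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / COMMIT statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE reservation (leg TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO reservation VALUES ('LNK->MSP')")
conn.execute("SAVEPOINT after_first_leg")
conn.execute("INSERT INTO reservation VALUES ('MSP->SEA')")

# Suppose the next leg turns out to be unavailable ("failure here"):
# undo only the work done since the savepoint, not the whole transaction.
conn.execute("ROLLBACK TO SAVEPOINT after_first_leg")
conn.execute("COMMIT")

legs = [row[0] for row in conn.execute("SELECT leg FROM reservation")]
conn.close()
```

After the commit only the first leg remains reserved: the work before the savepoint survives, while the work after it is undone.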
Nested Transactions
  Generalize SavePoints
  They organize T's actions into a hierarchy:
    1. A nested T is a tree of Ts; subtrees are flat or nested
    2. Leaf Ts are flat
    3. Root T = top-level T; all others are subTs
    4. A subT can commit or roll back; its commit does not take effect unless the root commits
    5. When a subT rolls back, all its children are rolled back
  Satisfy ACI, not the D (except for the top level)
Nested Transactions: Trip Planning
  BEGIN_TRANSACTION
    BEGIN_SUB
      reserve LNK -> MSP;      (can commit)
    END_SUB
    BEGIN_SUB
      reserve MSP -> SEA;      (can commit)
    END_SUB
    BEGIN_SUB
      reserve SEA -> YVR;      (failure here)
    END_SUB
  END_TRANSACTION
Distributed Transactions
  A flat transaction that runs in a distributed environment: it visits several nodes in the system
  Distributed T: structure is determined by the distribution of data in the DS
  Nested T: structure is determined by the functional decomposition of the application
Nested versus Distributed Ts
  [Figure: (a) a nested transaction; (b) a distributed transaction]
How to Implement Transactions?
Achieving Atomicity (I)
  Private workspace: change a copy of the data, keeping the original intact
    COMMIT: copy the changed data to the original
    ROLLBACK: discard the copy
Private Workspace
  [Figure: (a) the file index and disk blocks for a three-block file; (b) the situation after a transaction has modified block 0 and appended block 3; (c) after committing]
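A private workspace can be sketched as a thin layer of private copies over the shared data. The class below is a simplified illustration that works on dictionary keys rather than disk blocks; the name PrivateWorkspace and its methods are hypothetical.

```python
class PrivateWorkspace:
    """Buffers a transaction's writes; the shared store is untouched until commit."""

    def __init__(self, store):
        self._store = store      # the original data, kept intact
        self._copy = {}          # private copies of modified or appended items only

    def read(self, key):
        # Prefer the private copy, fall back to the original.
        if key in self._copy:
            return self._copy[key]
        return self._store[key]

    def write(self, key, value):
        self._copy[key] = value  # never touches the original

    def commit(self):
        self._store.update(self._copy)   # COMMIT: copy changed data to the original
        self._copy.clear()

    def rollback(self):
        self._copy.clear()               # ROLLBACK: discard the copy


blocks = {0: "a", 1: "b", 2: "c"}        # a three-block file
ws = PrivateWorkspace(blocks)
ws.write(0, "a'")                        # modify block 0
ws.write(3, "d")                         # append block 3
before_commit = dict(blocks)             # original is still unchanged here
ws.commit()
```

As in the figure, only the modified and appended blocks are copied; unmodified blocks are read straight from the original.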
Achieving Atomicity (II)
  Write-ahead log: change the original data, logging every change before making it
    COMMIT: leave the changes
    ROLLBACK: restore from the log
Write-ahead Log
  (a) A transaction:
    x = 0;
    y = 0;
    BEGIN_TRANSACTION;
      x = x + 1;
      y = y + 2;
      x = y * y;
    END_TRANSACTION;
  (b)-(d) The log before each statement is executed, with entries written as [variable = old value / new value]:
    (b) [x = 0 / 1]
    (c) [x = 0 / 1] [y = 0 / 2]
    (d) [x = 0 / 1] [y = 0 / 2] [x = 1 / 4]
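The write-ahead log example can be replayed with a small in-memory sketch. The class name LoggedStore and the tuple layout (name, old value, new value) are assumptions made for this illustration; a real log would be forced to stable storage before the data is changed.

```python
class LoggedStore:
    """Changes the data in place, logging every change before making it."""

    def __init__(self, data):
        self.data = data
        self.log = []                            # entries: (name, old value, new value)

    def write(self, name, new):
        old = self.data[name]
        self.log.append((name, old, new))        # log first ...
        self.data[name] = new                    # ... then change the original

    def commit(self):
        self.log.clear()                         # COMMIT: leave the changes

    def rollback(self):
        # ROLLBACK: restore old values, newest entry first.
        for name, old, _new in reversed(self.log):
            self.data[name] = old
        self.log.clear()


store = LoggedStore({"x": 0, "y": 0})
store.write("x", store.data["x"] + 1)                # log: [x = 0 / 1]
store.write("y", store.data["y"] + 2)                # log: [y = 0 / 2]
store.write("x", store.data["y"] * store.data["y"])  # log: [x = 1 / 4]
log_before_rollback = list(store.log)
store.rollback()                                     # back to x = 0, y = 0
```

Undoing the entries in reverse order matters here: x is written twice, and only the reverse scan restores it to its value from before the transaction.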
Achieving Consistency & Isolation
  Concurrency control: controlling the execution of concurrent Ts operating on shared data
  Consistency & isolation: all Ts must appear as if they executed in some sequential order, one after another (serializability)
Concurrency Control
  [Figure: general organization of managers for handling transactions]
Concurrency Control for Distributed Ts
  [Figure: general organization of managers for handling distributed transactions]
Serializability
  (a)-(c) Three transactions:
    (a) BEGIN_TRANSACTION x = 0; x = x + 1; END_TRANSACTION
    (b) BEGIN_TRANSACTION x = 0; x = x + 2; END_TRANSACTION
    (c) BEGIN_TRANSACTION x = 0; x = x + 3; END_TRANSACTION
  (d) Possible schedules:
    Schedule 1   x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;   Legal
    Schedule 2   x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal
    Schedule 3   x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal
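The legality of the three schedules can be checked mechanically by comparing each schedule's final value of x with the values that the serial orders produce. This final-state comparison is a simplification chosen for this example, not a general serializability test; the ("set", v) and ("add", v) operation encoding is an assumption of the sketch.

```python
from itertools import permutations

# Each transaction: x = 0; x = x + k
T = {
    "a": [("set", 0), ("add", 1)],
    "b": [("set", 0), ("add", 2)],
    "c": [("set", 0), ("add", 3)],
}


def run(ops):
    """Execute a sequence of operations on x and return its final value."""
    x = None
    for kind, v in ops:
        x = v if kind == "set" else x + v
    return x


# Final values of x over all six serial orders of a, b, c.
serial_results = {
    run([op for name in order for op in T[name]]) for order in permutations(T)
}


def is_legal(schedule):
    return run(schedule) in serial_results


schedule1 = [("set", 0), ("add", 1), ("set", 0), ("add", 2), ("set", 0), ("add", 3)]
schedule2 = [("set", 0), ("set", 0), ("add", 1), ("add", 2), ("set", 0), ("add", 3)]
schedule3 = [("set", 0), ("set", 0), ("add", 1), ("set", 0), ("add", 2), ("add", 3)]
```

Schedules 1 and 2 both end with x = 3, which the serial order a, b, c also produces. Schedule 3 ends with x = 5, which no serial order can produce.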
Ts as Sequences of Reads and Writes
  Transaction = sequence of read and write operations
    Read of x returning 1 by T: r_T(x)1
    Write to x of value 2 by T: w_T(x)2
  Example, T_a:
    BEGIN_TRANSACTION x = 0; x = x + 1; END_TRANSACTION
    BT  w_a(x)0  r_a(x)0  w_a(x)1  ET
Conflicting Operations
  Two operations o_1 and o_2 conflict if:
    both o_1 and o_2 are on the same data item, and
    at least one of o_1 and o_2 is a write
  Example (the X in the figure marks a conflict between operations of T_a and T_b):
    T_a: w_a(x)0  r_a(x)0  w_a(x)1
    T_b: w_b(x)0  r_b(x)0  w_b(x)2
    T_c: w_c(x)0  r_c(x)0  w_c(x)3
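The two-part conflict rule translates directly into a predicate. The Op record and its field names are hypothetical, chosen only to mirror the r_T(x) and w_T(x) notation.

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str   # "r" for read, "w" for write
    txn: str    # transaction name, e.g. "a"
    item: str   # data item, e.g. "x"


def conflict(o1, o2):
    """True if both operations touch the same item and at least one is a write."""
    same_item = o1.item == o2.item
    one_is_write = "w" in (o1.kind, o2.kind)
    return same_item and one_is_write


w_a = Op("w", "a", "x")
r_b = Op("r", "b", "x")
r_c = Op("r", "c", "x")
w_y = Op("w", "b", "y")
```

Here w_a and r_b conflict (same item, one write), r_b and r_c do not (two reads), and w_a and w_y do not (different items).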
Concurrency Control Algorithms (CCA)
  CCA: order read and write operations
    By using locks (critical sections)
    By using timestamps
  Pessimistic CCA: act conservatively so that nothing can go wrong
  Optimistic CCA: act aggressively; if something goes wrong, abort
Using Locks
  For T to access (read or write) a data item x:
    T requests a lock on x from the scheduler
  When T finishes accessing x:
    T releases the lock
  The scheduler grants and releases locks in such a way that serializability is guaranteed
  Locks:
    Shared: can only read x
    Exclusive: can read and write x
Using Locks (cont.)
  T_a: x = 0; x = x + 1
    Acquire x.lock; w(x)0; Release x.lock
    Acquire x.lock; r(x)0; Release x.lock
    Acquire x.lock; w(x)1; Release x.lock
  T_b: x = 0; x = x + 2
    Acquire x.lock; w(x)0; Release x.lock
    Acquire x.lock; r(x)1; Release x.lock
    Acquire x.lock; w(x)3; Release x.lock
  Outcome: x = 3, but serializability allows x = 1 or x = 2 only
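Two-Phase Locking (2PL): Pessimistic
  A transaction executes in two phases: a growing phase, in which it only acquires locks, and a shrinking phase, in which it only releases them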
2PL Scheduler Rules
  1. When the scheduler receives operation o_T(x)v:
    a. If some operation o' holds a lock on x and o and o' conflict, delay o
    b. Else grant a lock to o and pass o to the data manager
  2. The scheduler never releases a lock on x granted for operation o until the data manager acknowledges the completion of o
  3. Once the scheduler releases a lock for T, T cannot be granted another lock; if T tries to acquire one, abort it
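Rules 1 and 3 can be sketched as a small lock table. This is a simplified, single-threaded illustration: TwoPhaseLockManager and its methods are hypothetical names, a delayed operation is signalled by returning False instead of being queued, and rule 2 (waiting for the data manager's acknowledgement) is not modelled. Lock modes "S" (shared) and "X" (exclusive) follow the Using Locks slide.

```python
class TwoPhaseLockManager:
    def __init__(self):
        self.holders = {}        # item -> {txn: "S" or "X"}
        self.shrinking = set()   # transactions that have already released a lock

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if the operation must be delayed."""
        if txn in self.shrinking:
            # Rule 3: no new locks after the first release; the caller should abort txn.
            raise RuntimeError(f"{txn} requested a lock after releasing one")
        others = {t: m for t, m in self.holders.get(item, {}).items() if t != txn}
        # Rule 1a: a lock held by another transaction conflicts unless both are shared.
        if any(mode == "X" or held == "X" for held in others.values()):
            return False
        # Rule 1b: grant the lock (an exclusive lock already held is kept).
        mine = self.holders.setdefault(item, {})
        if mine.get(txn) != "X":
            mine[txn] = mode
        return True

    def release(self, txn, item):
        self.holders.get(item, {}).pop(txn, None)
        self.shrinking.add(txn)  # txn has entered its second phase


lm = TwoPhaseLockManager()
granted_a = lm.acquire("Ta", "x", "X")   # True: nobody holds x
granted_b = lm.acquire("Tb", "x", "S")   # False: conflicts with Ta's exclusive lock
lm.release("Ta", "x")
granted_b_retry = lm.acquire("Tb", "x", "S")   # True: x is free again
```

After its release, any further acquire by Ta raises an error, which corresponds to aborting a transaction that violates the two-phase rule.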
2PL Analysis
  2PL guarantees serializability
  (a)-(c) Three transactions:
    (a) BEGIN_TRANSACTION x = 0; x = x + 1; END_TRANSACTION
    (b) BEGIN_TRANSACTION x = 0; x = x + 2; END_TRANSACTION
    (c) BEGIN_TRANSACTION x = 0; x = x + 3; END_TRANSACTION
  Schedules:
    Schedule 1   x = 0; x = x + 3; x = 0; x = x + 1; x = 0; x = x + 2;   Legal, allowed
    Schedule 2   x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal, not allowed
    Schedule 3   x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal, not allowed
  BUT not every serializable schedule can be generated by 2PL
  2PL can cause a deadlock, and concurrency is reduced
Strict 2PL Do not release locks until a T is finished
2PL Discussion
  Strict 2PL can also lead to a deadlock
    Use timeouts to preempt locks
  Strict 2PL advantages:
    T always reads committed data, which avoids cascaded aborts
    Acquisitions and releases can be done without T's knowledge
  Every 2PL schedule (both versions) is serializable
  BUT not every serializable schedule can be generated by 2PL (both versions)
Pessimistic Timestamp Ordering (PTO)
  Each T has a timestamp, denoted T.ts
  Each operation in T has the same timestamp (T.ts)
  Using Lamport's or vector timestamps, all Ts have unique timestamp values
  If T.ts < T'.ts then T must appear before T' in the schedule
  With each data item x, associate:
    x.wts: largest timestamp of any T that executed w_T(x)v
    x.rts: largest timestamp of any T that executed r_T(x)v
  Update x.wts (resp. x.rts) whenever w_T(x)v (resp. r_T(x)v) occurs
PTO: General Concept
  If T.ts < T'.ts then T must appear before T' in the schedule
  Process transactions in a serial order
  Transactions can use the same file, but must do so in order
  Therefore atomicity is preserved
PTO Read Operation
  When the scheduler receives an r_T(x)v operation:
    if T.ts < x.wts            // T tries to read the past
      reject r_T(x)v
      roll T back (assign T a new timestamp and restart)
    if T.ts >= x.wts
      execute r_T(x)v
      x.rts = max(x.rts, T.ts)
PTO Write Operation
  When the scheduler receives a w_T(x)v operation:
    if (T.ts < x.rts) or (T.ts < x.wts)      // T tries to write in the past
      reject w_T(x)v
      roll T back (assign T a new timestamp and restart)
    else
      execute w_T(x)v
      x.wts = T.ts
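The PTO read and write rules can be written as two short functions. The sketch assumes integer timestamps starting above 0, and it signals a rejected operation with a hypothetical Rollback exception; assigning the new timestamp and restarting T is left to the caller.

```python
class Rollback(Exception):
    """Raised when an operation arrives too late; T must restart with a new timestamp."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.rts = 0   # largest timestamp of any T that read this item
        self.wts = 0   # largest timestamp of any T that wrote this item


def pto_read(ts, item):
    if ts < item.wts:                      # T tries to read the past
        raise Rollback("read rejected")
    item.rts = max(item.rts, ts)
    return item.value


def pto_write(ts, item, value):
    if ts < item.rts or ts < item.wts:     # T tries to write in the past
        raise Rollback("write rejected")
    item.value = value
    item.wts = ts


x = Item(0)
pto_write(2, x, 10)        # T with ts = 2 writes x; x.wts becomes 2
value = pto_read(3, x)     # T with ts = 3 reads x; x.rts becomes 3
```

A later write with timestamp 2 is now rejected, because a transaction with timestamp 3 has already read x. No operation ever waits for another, which is why PTO cannot deadlock.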
PTO Analysis
  Will PTO cause deadlock? No: PTO is deadlock free
  Each PTO schedule is serializable
  Not every serializable schedule is possible in PTO
  2PL can produce schedules not possible under PTO, and vice versa
  [Figure: within the set of all serializable schedules, the set of all PTO schedules and the set of all 2PL schedules overlap, and neither contains the other]
Optimistic Timestamp Ordering (OTO)
  Each T makes its changes to data in its private workspace
  When done, T either commits or restarts
  Each data item x has x.wts and x.rts, updated as before
  When T needs to commit:
    Check x's timestamps
    If there is a conflict, then restart
    Else commit
OTO (cont.)
  Parallelism is maximized
  No waiting on locks
  Inefficient when an abort is needed
  Pessimistic vs. optimistic CCA: for instance, PTO vs. OTO?
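One possible reading of the OTO commit check is sketched below. It is a simplification under stated assumptions: each T remembers the x.wts value it saw when it first read x, and a conflict is declared if that value has changed by commit time. Only x.wts is tracked (x.rts is omitted), and the names Store and OptimisticTxn are hypothetical.

```python
class Store:
    def __init__(self, data):
        self.value = dict(data)
        self.wts = {key: 0 for key in data}   # write timestamp per item


class OptimisticTxn:
    def __init__(self, store, ts):
        self.store = store
        self.ts = ts
        self.seen = {}     # item -> wts observed at first read
        self.writes = {}   # private workspace

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.seen.setdefault(key, self.store.wts[key])
        return self.store.value[key]

    def write(self, key, value):
        self.writes[key] = value              # no lock, no waiting

    def commit(self):
        """Return True on commit, False if T must restart."""
        if any(self.store.wts[k] != w for k, w in self.seen.items()):
            self.seen.clear()
            self.writes.clear()               # discard the private workspace
            return False
        for key, value in self.writes.items():
            self.store.value[key] = value
            self.store.wts[key] = self.ts
        return True


s = Store({"x": 0})
t1, t2 = OptimisticTxn(s, 1), OptimisticTxn(s, 2)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 2)   # runs concurrently, without waiting
ok1 = t1.commit()                 # True: nothing changed since t1 read x
ok2 = t2.commit()                 # False: x was overwritten after t2 read it
```

Neither transaction waits, but t2's work is thrown away at commit time, which is the cost that the slide notes when an abort is needed.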
TO: Cascaded Aborts Problem
  For example, consider the following run with transactions T1 and T2:
    W_1(x) R_2(x) W_2(y) C_2 R_1(z) C_1
  This run could be produced by a TO scheduler
  T2 commits even though it has read from an uncommitted transaction (R_2(x))
  Answer: a scheduler can keep a list of the other transactions each transaction has read from, and not let a transaction commit until this list consists of only committed transactions
  To avoid cascaded aborts, the scheduler could tag data written by uncommitted transactions as dirty, and never let a read operation start on such a data item before it is untagged
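The first remedy, keeping a list of the transactions each transaction has read from, can be sketched as follows. The class CommitTracker and its method names are hypothetical, and aborts are not modelled; the sketch only shows how the commit of T2 is held back in the run above.

```python
class CommitTracker:
    def __init__(self):
        self.writer = {}       # item -> transaction that last wrote it
        self.read_from = {}    # txn -> set of transactions it has read from
        self.committed = set()

    def write(self, txn, item):
        self.writer[item] = txn

    def read(self, txn, item):
        w = self.writer.get(item)
        # Record a dependency only on another, still uncommitted writer.
        if w is not None and w != txn and w not in self.committed:
            self.read_from.setdefault(txn, set()).add(w)

    def commit(self, txn):
        """Commit txn only if everything it read from has committed."""
        if not self.read_from.get(txn, set()) <= self.committed:
            return False       # the commit must wait
        self.committed.add(txn)
        return True


tracker = CommitTracker()
tracker.write("T1", "x")            # W_1(x)
tracker.read("T2", "x")             # R_2(x): T2 now depends on T1
tracker.write("T2", "y")            # W_2(y)
first_try = tracker.commit("T2")    # C_2 is refused: T1 has not committed
tracker.read("T1", "z")             # R_1(z)
t1_done = tracker.commit("T1")      # C_1
second_try = tracker.commit("T2")   # C_2 succeeds now
```

The dirty-tag alternative is stricter: it would delay R_2(x) itself until T1 commits, so T2 could never observe uncommitted data in the first place.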