CMPUT 391 Database Management Systems Implementing Isolation Textbook: 20 & 21.1 (first edition: 23 & 24.1) University of Alberta 1
Isolation
Serial execution:
  Since each transaction is consistent and isolated from all others, the schedule is guaranteed to be correct for all applications
  Inadequate performance, since the system has multiple asynchronous resources and a transaction uses only one at a time
Concurrent execution:
  Improved performance (multiprogramming)
  Some interleavings produce incorrect results
We are interested in concurrent schedules that are equivalent to serial schedules. These are referred to as serializable schedules.
Transaction Schedule
T1: begin_transaction(); ... p1,1; ... p1,2; ... p1,3; ... commit();   (uses local variables)
Transaction schedule sent to the db server: p1,1 p1,2 p1,3
T1 is consistent: it performs correctly when executed in isolation starting in a consistent database state
  Preserves database consistency
  Moves database to a new state that corresponds to new real-world state
Schedule
Figure: the transaction schedules of T1, T2 and T3 merge into the arriving schedule at the database server. Concurrency control turns the arriving schedule into the schedule in which requests are serviced, which goes to the database.
Schedule
Representation 1 (one row per transaction, operations placed along a time axis):
  T1: p1 p2 p3 p4
  T2: p1 p2
Representation 2 (a single sequence in time order):
  p1,1 p1,2 p2,1 p1,3 p2,2 p1,4
Concurrency Control Transforms arriving schedule into a correct interleaved schedule to be submitted to the DBMS Delays servicing a request (reordering) - causes a transaction to wait Refuses to service a request - causes transaction to abort Actions taken by concurrency control have performance costs Goal is to avoid delaying servicing a request University of Alberta 6
Problems with Interleaved Transactions: Example
Occurs, e.g., when a transaction reads several values from a database while a second transaction updates some of them.
  T1: sum=0; R(A); sum=sum+a; R(B); sum=sum+b; R(C); sum=sum+c
  T2: R(A); A=A-10; W(A); R(C); C=C+10; W(C)
Successive states during the interleaved execution:
  A      B     C     sum
  $100   $50   $25   0
  $100   $50   $25   0
  $100   $50   $25   100
  $90    $50   $25   100
  $90    $50   $25   150
  $90    $50   $25   150
  $90    $50   $35   150
  $90    $50   $35   150
  $90    $50   $35   185
The final sum is 185; it should be 175.
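The anomaly on this slide can be replayed in a few lines. The sketch below is only an illustration: the function names and the exact position of each step in the interleaving are assumptions chosen to match the state table (T2 updates A after T1 has read it, and updates C before T1 reads it).

```python
def audit_interleaved():
    """T1 sums A, B, C while T2 moves $10 from A to C in between T1's reads."""
    db = {"A": 100, "B": 50, "C": 25}
    total = 0
    total += db["A"]        # T1: R(A), sum = 100
    a = db["A"]             # T2: R(A)
    db["A"] = a - 10        # T2: W(A), A = 90
    total += db["B"]        # T1: R(B), sum = 150
    c = db["C"]             # T2: R(C)
    db["C"] = c + 10        # T2: W(C), C = 35
    total += db["C"]        # T1: R(C), sum = 185
    return total, db


def audit_serial():
    """Serial schedule T1, T2: the audit finishes before the transfer starts."""
    db = {"A": 100, "B": 50, "C": 25}
    total = db["A"] + db["B"] + db["C"]   # T1 runs alone, sum = 175
    db["A"] -= 10                         # T2 runs alone afterwards
    db["C"] += 10
    return total, db


print(audit_interleaved()[0])  # 185
print(audit_serial()[0])       # 175
```

Both runs leave the database in the same final state; only the value computed by T1 differs, which is why the interleaving is not equivalent to any serial schedule.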
Correct Schedules For interleaved schedules to be correct, they should be equivalent to serial schedules in their effect on the database for all applications. A strong notion of Equivalence (also called Conflict Equivalence) is based on the commutativity of operations Definition: Database operations p 1 and p 2 commute if, for all initial database states, they return the same results and leave the database in the same final state when executed in either order. University of Alberta 8
Commutativity of Operations Read r(x, X) - copy the value of database variable x to local variable X Write w(x, X) - copy the value of local variable X to database variable x We use r 1 (x) and w 1 (x) to mean a read or write of x by transaction T 1 University of Alberta 9
Commutativity of Read and Write Operations
p1 commutes with p2 if
  They operate on different data items: w1(x) commutes with w2(y) and r2(y)
  Both are reads: r1(x) commutes with r2(x)
Operations that do not commute conflict
  w1(x) conflicts with w2(x)
  w1(x) conflicts with r2(x)
Conflict table (does the T1 operation conflict with the T2 operation?):
  T1 \ T2     Read(x)   Write(x)
  Read(x)     No        Yes
  Write(x)    Yes       Yes
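The conflict rule above fits in a one-line predicate. This is a minimal sketch; the `Op` tuple and the function name `conflicts` are illustrative, and treating two operations of the same transaction as non-conflicting is a convention assumed here (their order is fixed by the transaction anyway).

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str   # "r" for read, "w" for write
    txn: int    # transaction number
    item: str   # database item


def conflicts(p: Op, q: Op) -> bool:
    """True if p and q do not commute: different transactions, same item,
    and at least one of the two is a write."""
    return p.txn != q.txn and p.item == q.item and "w" in (p.kind, q.kind)


print(conflicts(Op("w", 1, "x"), Op("r", 2, "x")))  # True
print(conflicts(Op("r", 1, "x"), Op("r", 2, "x")))  # False
```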
Equivalence of Schedules
An interchange of adjacent operations of different transactions in a schedule creates an equivalent schedule if the operations commute
  S1: S1,1, pi,j, pk,l, S1,2   where i ≠ k
  S2: S1,1, pk,l, pi,j, S1,2
Equivalence is transitive: if S1 is equivalent to S2 (by a series of such interchanges), and S2 is equivalent to S3, then S1 is equivalent to S3
Example of Equivalence
  S1: r1(x) r2(x) w2(x) r1(y) w1(y)
  S2: r1(x) r2(x) r1(y) w2(x) w1(y)
  S3: r1(x) r1(y) r2(x) w2(x) w1(y)
  S4: r1(x) r1(y) r2(x) w1(y) w2(x)
  S5: r1(x) r1(y) w1(y) r2(x) w2(x)
Each schedule is obtained from the previous one by interchanging two adjacent commuting operations.
The conflicting operations r1(x) and w2(x) are ordered in the same way in all five schedules.
S1 is equivalent to S5
S5 is the serial schedule T1, T2
S1 is serializable
S1 is not equivalent to the serial schedule T2, T1, in which the conflicting operations are ordered the other way
Example of Equivalence
  T1: begin transaction; read(x, X); X = X+4; write(x, X); commit;
  T2: begin transaction; read(x, Y); write(y, Y); commit;
Initial state in both cases: x=1, y=3
Interchange commuting operations (final state unchanged):
  r1(x) r2(x) w2(y) w1(x)                 final state x=5, y=1
  r2(x) w2(y) r1(x) w1(x)   (serial T2 T1)   final state x=5, y=1
Interchange conflicting operations (final state changes):
  r1(x) r2(x) w2(y) w1(x)                 final state x=5, y=1
  r1(x) w1(x) r2(x) w2(y)   (serial T1 T2)   final state x=5, y=5
Equivalence of Schedules Theorem - Schedule S 1 can be derived from S 2 by a sequence of commutative interchanges if and only if conflicting operations in S 1 and S 2 are ordered in the same way Only if : Commutative interchanges do not reorder conflicting operations If : A sequence of commutative interchanges can be determined that takes S 1 to S 2 since conflicting operations do not have to be reordered (see text) University of Alberta 14
Conflict Equivalence Definition- Two schedules, S 1 and S 2, of the same set of operations are conflict equivalent if conflicting operations are ordered in the same way in both Or (using theorem) if one can be obtained from the other by a series of commutative interchanges University of Alberta 15
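The definition can be checked mechanically by comparing the order of every conflicting pair in the two schedules. The sketch below assumes schedules are written as strings such as `r1(x) w2(x)`; the helper names are illustrative.

```python
import re
from itertools import combinations


def parse(schedule):
    """Turn 'r1(x) w2(x)' into a list of (kind, txn, item, n) tuples,
    where n numbers repeated identical operations of one transaction."""
    ops, seen = [], {}
    for kind, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule):
        key = (kind, int(txn), item)
        seen[key] = seen.get(key, 0) + 1
        ops.append(key + (seen[key],))
    return ops


def conflict_order(schedule):
    """Set of ordered pairs (p, q) of conflicting operations, p before q."""
    return {
        (p, q)
        for p, q in combinations(parse(schedule), 2)  # pairs keep schedule order
        if p[1] != q[1] and p[2] == q[2] and "w" in (p[0], q[0])
    }


def conflict_equivalent(s1, s2):
    """Same operations, and conflicting operations ordered the same way."""
    same_ops = sorted(parse(s1)) == sorted(parse(s2))
    return same_ops and conflict_order(s1) == conflict_order(s2)


s1 = "r1(x) r2(x) w2(x) r1(y) w1(y)"
s5 = "r1(x) r1(y) w1(y) r2(x) w2(x)"      # serial T1, T2
t2_t1 = "r2(x) w2(x) r1(x) r1(y) w1(y)"   # serial T2, T1
print(conflict_equivalent(s1, s5))     # True
print(conflict_equivalent(s1, t2_t1))  # False
```

The two test schedules are S1 and S5 from the earlier equivalence example, so the output agrees with the slide: S1 is equivalent to T1, T2 but not to T2, T1.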
Conflict Serializable
Definition: A schedule is conflict serializable iff it is conflict equivalent to a serial schedule
  r1(x) w2(x) w1(y) r2(y)
  r1(x) w1(y) w2(x) r2(y)
  (conflicts: r1(x) with w2(x), and w1(y) with r2(y))
If in S transactions T1 and T2 have several pairs of conflicting operations (p1,1 conflicts with p2,1 and p1,2 conflicts with p2,2), then p1,1 must precede p2,1 and p1,2 must precede p2,2 (or vice versa) in order for S to be serializable.
Serializable Schedules By default, serializable means conflict serializable Transactions are totally isolated in a serializable schedule A schedule is correct for any application if it is a serializable schedule of consistent transactions The schedule : r 1 (x) r 2 (y) w 2 (x) w 1 (y) is not serializable University of Alberta 17
View Equivalence Two schedules of the same set of operations are view equivalent if: Corresponding read operations in each return the same values (hence computations are the same) Both schedules yield the same final database state Conflict equivalence implies view equivalence. View equivalence does not imply conflict equivalence. Schedules that are view equivalent to a serial schedule are acceptable University of Alberta 18
View Equivalence T 1 : w(y) w(x) T 2 : r(y) w(x) T 3 : w(x) Schedule is not conflict equivalent to a serial schedule Schedule has same effect as serial schedule T 2 T 1 T 3. It is view equivalent to a serial schedule and hence would be acceptable University of Alberta 19
Conflict vs View Equivalence set of schedules that are view equivalent to serial schedules set of schedules that are conflict equivalent to serial schedules A concurrency control based on view equivalence should provide better performance than one based on conflict equivalence since less reordering is done but It is difficult to implement a view equivalence concurrency control University of Alberta 20
Serialization Graph of a Schedule, S Nodes represent transactions There is a directed edge from node T i to node T j if T i has an operation p i,k that conflicts with an operation p j,r of T j and p i,k precedes p j,r in S Theorem - A schedule is conflict serializable if and only if its serialization graph has no cycles University of Alberta 21
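The theorem suggests a direct test: build the graph, then look for a cycle. The following is a small sketch under the assumption that schedules are given as strings like `r1(x) w2(x)`; the function names are illustrative, and the acyclicity check simply peels off nodes with no incoming edge.

```python
import re


def serialization_graph(schedule):
    """Return (nodes, edges); edge (i, j) means an operation of Ti precedes
    and conflicts with an operation of Tj."""
    ops = [(k, int(t), x) for k, t, x in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return {t for _, t, _ in ops}, edges


def is_conflict_serializable(schedule):
    """True iff the serialization graph has no cycle."""
    nodes, edges = serialization_graph(schedule)
    remaining = set(nodes)
    while remaining:
        # Nodes with no incoming edge from a remaining node can go first.
        sources = {n for n in remaining
                   if not any(v == n and u in remaining for u, v in edges)}
        if not sources:
            return False          # every remaining node lies on or behind a cycle
        remaining -= sources
    return True


print(is_conflict_serializable("r1(x) w2(x) w1(y) r2(y)"))  # True
print(is_conflict_serializable("r1(x) r2(y) w2(x) w1(y)"))  # False
```

The order in which nodes are removed is one valid equivalent serial order, which is how the "serializable in order T1 T2 ..." statement on the example slide can be read.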
Example
Conflict (*): S: ... p1,i ... p2,j ... produces an edge from T1 to T2 in the serialization graph
Figure: serialization graph over T1 ... T7 without a cycle. S is serializable in the order T1 T2 T3 T4 T5 T6 T7
Figure: serialization graph over T1 ... T7 with a cycle. S is not serializable due to the cycle T2 T6 T7 T2
Intuition: Serializability and Nonserializability
Consider the nonserializable schedule r1(x) w2(x) r2(y) w1(y)
Two ways to think about it:
  Because of the read and write conflicts, the operations of T1 and T2 cannot be interchanged to make an equivalent serial schedule
  Because T1 read x before T2 wrote it, T1 must precede T2 in any ordering, and because T1 wrote y after T2 read it, T1 must follow T2 in any ordering: clearly an impossibility
Recoverability: Schedules with Aborted Transactions
  T1: r(x) w(y) commit
  T2: w(x) abort
  (T1 reads x after T2 writes it and commits before T2 aborts)
T2 has aborted but has had an indirect effect on the database: the schedule is unrecoverable
Problem: T1 read uncommitted data (dirty read)
Solution: A concurrency control is recoverable if it does not allow T1 to commit until all other transactions that wrote values T1 read have committed
  T1: r(x) w(y) req_commit abort
  T2: w(x) abort
Cascaded Abort
Recoverable schedules solve the abort problem but allow cascaded abort: the abort of one transaction forces the abort of another
  T1: r(y) w(z) abort
  T2: r(x) w(y) abort
  T3: w(x) abort
  (T2 read the x written by T3, and T1 read the y written by T2, so T3's abort forces both to abort)
Better solution: prohibit dirty reads
Dirty Write
Dirty write: A transaction writes a data item written by an active transaction
Dirty write complicates rollback
  T1: w(x) abort
  T2: w(x) abort
  Figure annotations on the aborts: "no rollback necessary" and "what value of x should be restored?"
Strict Schedules Strict schedule: Dirty writes and dirty reads are prohibited Strict and serializable are two different properties Strict, non-serializable schedule: r 1 (x) w 2 (x) r 2 (y) w 1 (y) c 1 c 2 Serializable, non-strict schedule: w 2 (x) r 1 (x) w 2 (y) r 1 (y) c 1 c 2 University of Alberta 27
Concurrency Control Arriving schedule (from transactions) Concurrency Control Serializable schedule (to processing engine) Concurrency control cannot see entire schedule: It sees one request at a time and must decide whether to allow it to be serviced Strategy: Do not service a request if: It violates strictness or serializability, or There is a possibility that a subsequent arrival might cause a violation of serializability University of Alberta 28
Models of Concurrency Controls
Immediate Update
  A write updates a database item
  A read copies value from a database item
  Commit makes updates durable
  Abort undoes updates
Deferred Update (we will likely not discuss this)
  A write stores new value in the transaction's intentions list (does not update database)
  A read copies value from database or transaction's intentions list
  Commit uses intentions list to durably update database
  Abort discards intentions list
Immediate vs. Deferred Update
Figure, Immediate Update: Transaction T reads and writes the database directly.
Figure, Deferred Update: Transaction T reads from the database and reads and writes T's intentions list; at commit the intentions list is applied to the database.
Models of Concurrency Controls Pessimistic A transaction requests permission for each database (read/write) operation Concurrency control can: Grant the operation (submit it for execution) Delay it until a subsequent event occurs (commit or abort of another transaction), or Abort the transaction Decisions are made conservatively so that a commit request can always be granted Takes precautions even if conflicts do not occur University of Alberta 31
Models of Concurrency Controls
Optimistic
  Requests for database operations (read/write) are always granted
  Request to commit might be denied
    Transaction is aborted if it performed a non-serializable operation
  Assumes that conflicts are not likely
  The earlier a transaction can be aborted, the better
Locking Implementation of an Immediate-Update Pessimistic Control A transaction can read a database item if it holds a read (shared) lock on the item It can read or update the item if it holds a write (exclusive) lock If the transaction does not already hold the required lock, a lock request is automatically made as part of the access University of Alberta 33
Locking
Request for read lock granted if no transaction currently holds write lock on item
  Cannot read an item written by an active transaction
Request for write lock granted if no transaction holds any lock on item
  Cannot write an item read/written by an active transaction
Compatibility of locks:
  Requested mode    Granted mode: read    Granted mode: write
  read              yes                   no
  write             no                    no
All locks held by a transaction are released when the transaction completes (commits or aborts)
Locking Result: A lock is not granted if the requested access conflicts with a prior access of an active transaction; instead the transaction waits. This enforces the rule: Do not grant a request that imposes an ordering among active transactions (delay the requesting transaction) Resulting schedules are serializable and strict University of Alberta 35
Locking Implementation
Associate a lock set, L(x), and a wait set, W(x), with each active database item, x
  L(x) contains an entry for each granted lock
  W(x) contains an entry for each pending request
  When an entry is removed from L(x) (due to transaction termination), promote (non-conflicting) entries from W(x) using some scheduling policy (e.g., FCFS)
Associate a lock list, Li, with each transaction, Ti
  Li links Ti's elements in all lock and wait sets
  Used to release locks on termination
Locking Implementation
Figure: lock sets (L) and wait sets (W) for items x and y, with the lock list Li linking Ti's entries. Ti holds an r lock on x and waits for a w lock on y.
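A lock set and a wait set per item can be sketched as two dictionaries. This is a simplified illustration, not a production lock manager: the class and method names are assumptions, promotion is strict FCFS, and instead of a per-transaction lock list the release step simply scans every item.

```python
from collections import deque


class LockManager:
    def __init__(self):
        self.lock_set = {}   # item -> list of (txn, mode): granted locks, L(x)
        self.wait_set = {}   # item -> deque of (txn, mode): pending requests, W(x)

    @staticmethod
    def compatible(requested, held):
        """Only read/read is compatible (see the lock compatibility table)."""
        return requested == "r" and held == "r"

    def _grantable(self, txn, item, mode):
        return all(self.compatible(mode, held)
                   for holder, held in self.lock_set[item] if holder != txn)

    def request(self, txn, item, mode):
        """Grant the lock and return True, or queue the request and return False."""
        holders = self.lock_set.setdefault(item, [])
        queue = self.wait_set.setdefault(item, deque())
        if not queue and self._grantable(txn, item, mode):
            holders.append((txn, mode))
            return True
        queue.append((txn, mode))     # the transaction must wait
        return False

    def release_all(self, txn):
        """Called on commit or abort: drop txn's entries, then promote waiters."""
        for item, holders in self.lock_set.items():
            holders[:] = [(t, m) for t, m in holders if t != txn]
            queue = self.wait_set[item]
            kept = [entry for entry in queue if entry[0] != txn]
            queue.clear()
            queue.extend(kept)
            while queue and self._grantable(queue[0][0], item, queue[0][1]):
                holders.append(queue.popleft())


lm = LockManager()
lm.request(1, "x", "r")   # granted
lm.request(2, "x", "r")   # granted: read locks are compatible
lm.request(3, "x", "w")   # queued behind the two readers
lm.release_all(1)
lm.release_all(2)         # T3's write lock is promoted from W(x) to L(x)
```

Refusing new requests while the wait set is non-empty keeps the policy first-come-first-served and stops readers from starving a waiting writer; other scheduling policies are equally possible.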
Deadlock
Problem: Controls that cause transactions to wait can cause deadlocks
  w1(x) w2(y) request_r1(y) request_r2(x)
Solution: Abort a transaction in the cycle
  Prevent Deadlock: based on timestamp priorities
  Detect Deadlock: by detecting a cycle in the waits-for graph when a request is delayed
  Time-Out: assume a deadlock when a transaction waits longer than some time-out period
Deadlock Prevention based on Timestamp Priorities
Assign priorities based on timestamps (i.e., the older a transaction, the higher its priority).
Assume Ti wants a lock that conflicts with a lock that Tj holds. Two policies are possible:
  Wait-Die: If Ti has higher priority, Ti is allowed to wait for Tj; otherwise (i.e., Ti is younger), Ti aborts
  Wound-Wait: If Ti has higher priority, Tj aborts; otherwise (i.e., Ti is younger), Ti waits
If a transaction restarts, make sure it keeps its original timestamp
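The two policies differ only in who is aborted. A minimal sketch, assuming a smaller timestamp means an older transaction and therefore a higher priority; the function names and return strings are illustrative.

```python
def wait_die(ts_requester, ts_holder):
    """Older requester waits; younger requester aborts (dies)."""
    return "requester waits" if ts_requester < ts_holder else "requester aborts"


def wound_wait(ts_requester, ts_holder):
    """Older requester aborts (wounds) the holder; younger requester waits."""
    return "holder aborts" if ts_requester < ts_holder else "requester waits"


# Ti (timestamp 5) requests a lock held by Tj (timestamp 9): Ti is older.
print(wait_die(5, 9))    # requester waits
print(wound_wait(5, 9))  # holder aborts
# Ti (timestamp 9) requests a lock held by Tj (timestamp 5): Ti is younger.
print(wait_die(9, 5))    # requester aborts
print(wound_wait(9, 5))  # requester waits
```

In both policies waiting only ever goes in one direction of age (old waits for young in Wait-Die, young waits for old in Wound-Wait), so a cycle of waiting transactions cannot form.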
Deadlock Prevention based on Timeouts A simple approach to deadlock resolution (pseudo prevention/detection) is based on lock request timeouts After requesting a lock on a locked data object, a transaction waits, but if the lock is not granted within a certain period, a deadlock is assumed and the waiting transaction is aborted and re-started. Very simple practical solution adopted by many DBMSs. University of Alberta 40
Deadlock Detection
Create a waits-for graph:
  Nodes are transactions
  There is an edge from Ti to Tj if Ti is waiting for Tj to release a lock.
  Deadlock exists if there is a cycle in the graph.
Periodically check for cycles in the waits-for graph.
Example (S = shared lock request, X = exclusive lock request, R = read, W = write):
  T1: S(A), R(A), S(B)
  T2: X(B), W(B), X(C)
  T3: S(C), R(C), X(A)
  T4: X(B)
Figure: waits-for graphs over T1, T2, T3, T4 at two points in this schedule.
Two-Phase Locking
Transaction does not release a lock until it has all the locks it will ever require.
Transaction T has a locking phase (Phase 1) followed by an unlocking phase (Phase 2); the first unlock marks the boundary between them.
In strict 2PL, all locks are released at once, when the transaction completes (commits or aborts).
Figure: number of locks held over time for two-phase locking (2PL) and strict two-phase locking (strict 2PL), with the first unlock and the period in which objects are used marked.
Two-Phase Locking
Theorem: A concurrency control that uses two-phase locking produces only serializable schedules.
Proof: Consider two transactions T1 and T2 in schedule S produced by a two-phase locking control and assume T1's first unlock precedes T2's first unlock.
  If they do not access common data items, then all operations commute and S is serializable.
  Suppose they do. For each common item x, all of T1's accesses to x precede all of T2's.
  If this were not the case, T2's first unlock must precede a lock request of T1. Since both transactions are two-phase, this implies that T2's first unlock precedes T1's first unlock, contradicting the assumption. Thus S is serializable.
Two-Phase Locking
A schedule produced by a two-phase locking control is:
  Equivalent to a serial schedule in which transactions are ordered by the time of their first unlock operation
  Not necessarily recoverable (dirty reads and writes are possible)
Example (l = lock, u = unlock):
  T1: l(x) r(x) l(y) w(y) u(y) abort
  T2: l(y) r(y) l(z) w(z) u(z) u(y) commit
  (T2 locks and reads y after T1 unlocks it, then commits before T1 aborts)
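Cycle detection on the waits-for graph is an ordinary depth-first search. In the sketch below the graph is a dictionary from a waiting transaction to the transactions it waits for; the edges are one reading of the example above (T1 waits for T2 on B, T2 waits for T3 on C, T4 waits for T2 on B, and finally T3 waits for T1 on A), which is an assumption about the figure rather than something stated on the slide.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in waits_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:                 # back edge: u is on the current path
                return path[path.index(u):]
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


before = {"T1": ["T2"], "T2": ["T3"], "T4": ["T2"]}
after = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T2"]}
print(find_cycle(before))  # None
print(find_cycle(after))   # ['T1', 'T2', 'T3']
```

Once a cycle is found, the control aborts one transaction in it; T4 is blocked but not part of the cycle, so aborting T4 would not resolve the deadlock.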
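Whether a single transaction obeys the two-phase rule can be checked by scanning its lock and unlock actions. A minimal sketch, assuming actions are written as on the slide (`l(x)` to lock, `u(x)` to unlock); the function name is illustrative.

```python
def is_two_phase(actions):
    """True if no lock request follows the transaction's first unlock."""
    unlocked = False
    for action in actions:
        if action.startswith("u("):
            unlocked = True               # the unlocking phase has begun
        elif action.startswith("l(") and unlocked:
            return False                  # lock after an unlock violates 2PL
    return True


t1 = "l(x) r(x) l(y) w(y) u(y) abort".split()
t2 = "l(y) r(y) l(z) w(z) u(z) u(y) commit".split()
bad = "l(x) r(x) u(x) l(y) w(y) u(y) commit".split()
print(is_two_phase(t1), is_two_phase(t2), is_two_phase(bad))  # True True False
```

Both transactions in the slide's example pass the check, which illustrates the point being made: being two-phase guarantees serializability but does not rule out the dirty read of y.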
Two-Phase Locking A two-phase locking control that holds write locks until commit produces strict serializable schedules A strict two-phase locking control holds all locks until commit and produces strict serializable schedules Equivalent to a serial schedule in which transactions are ordered by their commit time Strict is used in two different ways: a control that releases read locks early guarantees strictness, but is not strict two-phase locking control University of Alberta 45
Lock Granularity Data item: variable, record, row, table, file When an item is accessed, the DBMS locks an entity that contains the item. The size of that entity determines the granularity of the lock Coarse granularity (large entities locked) Advantage: If transactions tend to access multiple items in the same entity, fewer lock requests need to be processed and less lock storage space required Disadvantage: Concurrency is reduced since some items are unnecessarily locked Fine granularity (small entities locked) Advantages and disadvantages are reversed University of Alberta 46
Lock Granularity Table locking (coarse) Lock entire table when a row is accessed. Row (tuple) locking (fine) Lock only the row that is accessed. Page locking (compromise) When a row is accessed, lock the containing page University of Alberta 47
Timestamp-Ordered Concurrency Control Each transaction given a (unique) timestamp (current clock value) when initiated Uses the immediate update model Guarantees equivalent serial order based on timestamps (initiation order) Control is static (as opposed to dynamic, in which the equivalent serial order is determined as the schedule progresses) University of Alberta 48
Timestamp-Ordered Concurrency Control Associated with each database item, x, are two timestamps: wt(x), the largest timestamp of any transaction that has written x, rt(x), the largest timestamp of any transaction that has read x, and an indication function f(x) of whether or not the last write to that item is from a committed transaction University of Alberta 49
Timestamp-Ordered Concurrency Control
If T requests to read x:
  R1: If TS(T) < wt(x), then T is too old; abort T
  R2: If TS(T) ≥ wt(x), then
    if the value of x is committed, grant T's read, and if TS(T) > rt(x) assign TS(T) to rt(x)
    if the value of x is not committed,
      if TS(T) > wt(x), T waits (to avoid a dirty read)
      if TS(T) = wt(x), then T was the last one to write x; in this case T is allowed to read x, and if TS(T) > rt(x), rt(x) is set to TS(T)
Timestamp-Ordered Concurrency Control
If T requests to write x:
  W1: If TS(T) < rt(x), then T is too old; abort T
  W2: If rt(x) < TS(T) < wt(x), then no transaction that read x should have read the value T is attempting to write, and no transaction will read that value (R1)
    If x is committed, grant the request but do not do the write
    If x is not committed, T waits to see if the newer value will commit. If it does, discard T's write, else perform it
  W3: If wt(x), rt(x) ≤ TS(T), then if x is committed (or wt(x) = TS(T)), grant the request, assign TS(T) to wt(x), and set x to uncommitted; else T waits.
Example
Assume TS(T1) < TS(T2); at t0, x and y are committed, and x's and y's read and write timestamps are less than TS(T1)
  T1: r(y) at t1, w(x) at t4, commit
  T2: w(y) at t2, w(x) at t3, commit
  t1: (R2) TS(T1) > wt(y); assign TS(T1) to rt(y)
  t2: (W3) TS(T2) > rt(y), wt(y); assign TS(T2) to wt(y)
  t3: (W3) TS(T2) > rt(x), wt(x); assign TS(T2) to wt(x)
  t4: (W2) rt(x) < TS(T1) < wt(x); grant request, but don't do the write
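The read and write rules can be expressed as two small decision functions and replayed on this example. The sketch below is a simplification under stated assumptions: it only returns the decision ("grant", "skip", "wait" or "abort") without implementing the waiting itself, it treats rt(x) = TS(T) like the W2/W3 cases, it assumes T2 commits between t3 and t4 (which the W2 outcome at t4 implies), and all names are illustrative.

```python
class Item:
    def __init__(self):
        self.rt = 0            # largest timestamp of any reader
        self.wt = 0            # largest timestamp of any writer
        self.committed = True  # is the last write from a committed transaction?


def read(ts, x):
    if ts < x.wt:
        return "abort"                       # R1: T is too old
    if not x.committed and ts > x.wt:
        return "wait"                        # R2: avoid a dirty read
    x.rt = max(x.rt, ts)                     # R2: grant the read
    return "grant"


def write(ts, x):
    if ts < x.rt:
        return "abort"                       # W1: T is too old
    if ts < x.wt:                            # W2: a newer value already exists
        return "skip" if x.committed else "wait"
    if x.committed or x.wt == ts:            # W3
        x.wt, x.committed = ts, False
        return "grant"
    return "wait"


def commit(ts, items):
    """Mark every item whose last writer has timestamp ts as committed."""
    for x in items:
        if x.wt == ts:
            x.committed = True


x, y = Item(), Item()
TS1, TS2 = 1, 2
decisions = [
    read(TS1, y),    # t1: R2
    write(TS2, y),   # t2: W3
    write(TS2, x),   # t3: W3
]
commit(TS2, [x, y])  # assumed position of T2's commit
decisions.append(write(TS1, x))   # t4: W2, granted without doing the write
print(decisions)  # ['grant', 'grant', 'grant', 'skip']
```

Here "skip" stands for "grant the request but do not do the write": x keeps the value written by T2, exactly as it would after the serial order T1, T2.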
Timestamp-Ordered Concurrency Control Control accepts schedules that are not conflict equivalent to any serial schedule and would not be accepted by a two-phase locking control Previous example equivalent to T 1, T 2 But additional space required in database for storing timestamps and time for managing timestamps Reading a data item now implies writing back a new value of its timestamp University of Alberta 53
Optimistic Concurrency Control
No locking (and hence no waiting) means deadlocks are not possible
Rollback is a problem if the optimistic assumption is not valid: the work of the entire transaction is lost
  With two-phase locking, rollback occurs only with deadlock
  With optimistic concurrency control, the need for rollback is detected only when the transaction requests to commit, after it has done all of its work
Locking in Relational Databases In the simple databases we have been studying, accesses are made to a named item, x, (for example r(x)), which can be locked. In relational databases, accesses are made to items that satisfy a predicate (for example, the set of rows returned by a SELECT statement) What should we lock? What is a conflict? University of Alberta 55
Conflicts in Relational Databases
Audit:
  SELECT SUM (balance) FROM Accounts WHERE name = 'Mary';
  SELECT totbal FROM Depositors WHERE name = 'Mary'
NewAccount:
  INSERT INTO Accounts VALUES ('123', 'Mary', 100);
  UPDATE Depositors SET totbal = totbal + 100 WHERE name = 'Mary'
Operations on Accounts and Depositors conflict
Interleaved execution is not serializable
What to Lock? Lock tables: Execution is serializable but... Performance suffers because lock granularity is coarse Lock rows: Performance improves because lock granularity is fine but... Execution is not serializable University of Alberta 57
Problem with Row Locking
In time order:
  (1) Audit: locks and reads Mary's rows in Accounts
  (2) NewAccount: inserts and locks new row, t, in Accounts
  (3) NewAccount: locks and updates Mary's row in Depositors
  (4) NewAccount: commits and releases all locks
  (5) Audit: locks and reads Mary's row in Depositors
Row Locking
The two SELECTs executed by Audit see inconsistent data
  The second sees the effect of NewAccount; the first does not
Problem: Audit's SELECT and NewAccount's INSERT on Accounts do not commute, but the row locks held by Audit did not prevent NewAccount from INSERTing t (which satisfies the WHERE condition).
  t is referred to as a phantom
Phantoms
Phantoms occur when row locking is used and
  T1 SELECTs, UPDATEs, or DELETEs using a predicate, P
  T2 creates a row (using INSERT or UPDATE) satisfying P
Example:
  T1: UPDATE Table SET Attr = ... WHERE P
  T2: INSERT INTO Table VALUES (satisfies P)
Preventing Phantoms Table locking does it; row locking does not Predicate locking does it A predicate describes a set of rows, some are in a table and some are not; e.g. name = Mary Every SQL statement has an associated predicate When executing a statement, acquire a (read or write) lock on the associated predicate Two predicate locks conflict if one is a write and there might be a row (not necessarily in the table) that is contained in both sets of tuples described by the predicates University of Alberta 61
Phantoms
Figure: nested sets of rows. Within all rows that can possibly be in R lie the rows in table R and the rows satisfying predicate P. Their overlap is the rows returned by a read (rows that can be locked); the rest of P is the rows satisfying P that don't exist in R.
Preventing Phantoms With Predicate Locks
Audit:
  SELECT SUM (balance) FROM Accounts WHERE name = 'Mary'
NewAccount:
  INSERT INTO Accounts VALUES ('123', 'Mary', 100)
Audit gets read lock on predicate name = 'Mary'.
NewAccount requests a write lock on predicate (acctnum = '123' ∧ name = 'Mary' ∧ bal = 100)
Request denied since predicates overlap
Preventing Conflicts With Predicate Locks
Example 1
  SELECT SUM (balance) FROM Accounts WHERE name = 'Mary'
  DELETE FROM Accounts WHERE bal < 1
Statements conflict since predicates overlap
  There might be an account with bal < 1 and name = 'Mary'
  Locking is conservative: there might be no rows in Accounts satisfying both predicates
Preventing Conflicts With Predicate Locks
Example 2
  SELECT SUM (balance) FROM Accounts WHERE name = 'Mary'
  DELETE FROM Accounts WHERE name = 'John'
Statements commute since predicates are disjoint.
  There can be no rows (in or not in Accounts) that satisfy both predicates
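For simple predicates the overlap test in these examples can be automated. The sketch below assumes a deliberately restricted predicate form: a conjunction of per-attribute closed ranges, written as a dictionary from attribute name to `(low, high)` with `None` for an open end. The strict condition bal < 1 is approximated by bal ≤ 1, which can only make the test more conservative. All names are illustrative.

```python
def overlaps(p, q):
    """True if some row (in the table or not) could satisfy both predicates."""
    for attr in p.keys() & q.keys():
        (lo1, hi1), (lo2, hi2) = p[attr], q[attr]
        # Intersect the two ranges; None means unbounded on that side.
        lo = lo1 if lo2 is None else lo2 if lo1 is None else max(lo1, lo2)
        hi = hi1 if hi2 is None else hi2 if hi1 is None else min(hi1, hi2)
        if lo is not None and hi is not None and lo > hi:
            return False        # empty intersection on this attribute
    return True                 # attributes not shared are unconstrained


def predicate_locks_conflict(mode1, p, mode2, q):
    """Two predicate locks conflict if one is a write and the predicates overlap."""
    return "w" in (mode1, mode2) and overlaps(p, q)


mary = {"name": ("Mary", "Mary")}
john = {"name": ("John", "John")}
low_balance = {"bal": (None, 1)}          # approximates bal < 1
new_row = {"acctnum": ("123", "123"), "name": ("Mary", "Mary"), "bal": (100, 100)}

print(predicate_locks_conflict("r", mary, "w", new_row))      # True
print(predicate_locks_conflict("r", mary, "w", low_balance))  # True
print(predicate_locks_conflict("r", mary, "w", john))         # False
```

General SQL predicates are far harder to compare than these ranges, which is one reason predicate locking is considered complex to implement.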
Serializability in Relational Databases Predicate locking prevents phantoms and produces serializable schedules, but is very complex Table locking prevents phantoms and produces serializable schedules, but seriously affects performance Row locking does not prevent phantoms and can produce nonserializable schedules SQL defines several Isolation Levels weaker than SERIALIZABLE that allow non-serializable schedules and hence allow more concurrency University of Alberta 66
Isolation Levels Non-serializable schedules are not always correct, but might be sufficient for some applications SQL standard defines isolation levels in terms of certain anomalies they do or do not allow University of Alberta 67
Transaction Support in SQL-92
Each transaction has
  an access mode (read only, read write)
  an isolation level
Anomalies allowed at each isolation level:
  Isolation Level     Dirty Read   Unrepeatable Read   Phantom Problem
  Read Uncommitted    Maybe        Maybe               Maybe
  Read Committed      No           Maybe               Maybe
  Repeatable Reads    No           No                  Maybe
  Serializable        No           No                  No
SQL Isolation Levels and Anomalies
Dirty Read
  Transaction T1 modifies a data item. Another transaction T2 then reads that data item before T1 performs a COMMIT or ROLLBACK. If T1 then performs a ROLLBACK, T2 has read a data item that was never committed and so never really existed.
Unrepeatable Read
  Transaction T1 reads a data item. Another transaction T2 then modifies or deletes that data item and commits. If T1 then attempts to re-read the data item, it receives a modified value or discovers that the data item has been deleted.
Phantom
  Transaction T1 reads a set of data items satisfying some search condition. Transaction T2 then creates data items that satisfy T1's search condition and commits. If T1 then repeats its read with the same search condition, it gets a set of data items different from the first read.