Concurrency Control

Inherently Concurrent Systems
These are systems that respond to and manage simultaneous activities in their external environment. Such systems are inherently concurrent and may be broadly classified as:
- Real-time systems
- DBMSs (transaction processing systems)
- Operating systems

The requirements
1. There is a need to support separate activities.
2. There is a need to ensure that these activities access and update common data without interference.
3. There is a need to ensure that the results of transactions are recorded permanently and securely before the user is told that the operation has been done.

Database Management Systems
Concurrency control is the control of transactions that operate on the same db simultaneously. Running several transactions in parallel gives shorter response times. We have seen that a transaction is defined as a sequence of operations (read, write, update) that transforms the db from one consistent state to another consistent state. But this does not always happen on its own; rules are needed:
1. A transaction must be protected against inconsistencies caused by other transactions.
2. If a transaction terminates abnormally or runs into unforeseen problems, its updates must be cancelled so that the db is left in a consistent state.

Examples
- Two or more users accessing the same db as a repository
- Two or more people reading the same book
- Two or more people accessing the same bank account
- Two or more TV remote controls
- Two or more garage door openers
- Playing chess one against two or more opponents
- Cooking a meal of two or more courses simultaneously
- Two or more users using the same compiler to compile their programs
Concurrent Atomic Transactions: Serializability and Recoverability

To have control over concurrency we need to schedule transactions so as to avoid any interference between them when they compete during execution. A simple single-user approach is to allow only one transaction to execute at a time: T1 is committed before T2 begins its execution. A multi-user DBMS, however, aims to maximize the degree of concurrency or parallelism in the system, so transactions should be able to run concurrently without interfering with each other. (Real-time systems and operating systems share these objectives.) The goal is then to use serializability as a means of identifying those executions of transactions that are guaranteed to preserve the consistency of the db.

Schedule
A schedule is a sequence of the operations of a set of concurrent transactions that preserves the order of the operations within each individual transaction. Alternatively, a schedule is an execution sequence that represents the chronological order in which instructions are executed in the system.

Serializability
An execution is serial when a new transaction is not started until the previous transaction has finished. A serial schedule is one in which the operations of each transaction are executed consecutively, without any interleaved operations from other transactions. A non-serial schedule is one in which the operations of a set of concurrent transactions are interleaved. When transactions execute concurrently, the result is correct only if it is the same as that of some serial execution; such a schedule is called serializable. To ensure integrity and consistency of the db, schedules should be serializable.

The ordering of the read and write operations is important:
1. If two transactions only read a data item (DI), they do not conflict and their order is not important.
2. If two transactions read or write completely separate DIs, they do not conflict and their order is not important.
3. If one transaction writes a DI and another reads or writes the same DI, the order of execution is important.
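The three rules above reduce to a single test: two operations conflict exactly when they come from different transactions, touch the same DI, and at least one of them is a write. A minimal sketch (the function name and tuple layout are illustrative, not from any particular DBMS):

```python
# Sketch of rules 1-3: two operations conflict iff they come from different
# transactions, touch the same data item, and at least one is a write.
def conflicts(op1, op2):
    """op = (transaction_id, action, data_item); action is 'read' or 'write'."""
    t1, a1, d1 = op1
    t2, a2, d2 = op2
    if t1 == t2 or d1 != d2:   # same transaction, or disjoint DIs (rule 2)
        return False
    return a1 == "write" or a2 == "write"   # rules 1 and 3

# Rule 1: two reads of the same DI do not conflict.
assert not conflicts(("T1", "read", "X"), ("T2", "read", "X"))
# Rule 2: operations on separate DIs do not conflict.
assert not conflicts(("T1", "write", "X"), ("T2", "write", "Y"))
# Rule 3: a write and any other operation on the same DI conflict.
assert conflicts(("T1", "write", "X"), ("T2", "read", "X"))
```

A schedule is serializable when its conflicting operations appear in the same relative order as in some serial schedule; this test is what the locking and timestamp protocols below enforce.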
Concurrency Control Techniques

These techniques include the following protocols:
- Lock-based protocols
  - Locking
  - Two-phase locking (2PL)
  - Graph-based protocols (index or tree structures)
- Timestamp-based protocols
  - Basic timestamping
  - Thomas' write rule
  - Multiversion timestamping
- Granularity of data items
- Multiversion read consistency (Oracle9i, 2004)

Lock-Based Protocols
Access to a data item (DI) is done in a mutually exclusive (ME) manner: while one transaction accesses a DI, no other transaction can modify that DI. The most common way to achieve this is to hold a lock on the DI. Systems differ in the granularity of the lock they hold: the entire db (very bad), a file or a relation (questionable), a record or a tuple (OK), or a field (good). Mutually exclusive means that if a transaction holds a lock on a DI, no other transaction can interfere. This safeguards against incorrect results, which could otherwise compromise the consistency of the db.
Locks

There are two basic lock modes:
- Shared: if a transaction Ti has obtained a shared-mode lock (SL) on data item DI, then Ti can read but not write DI. READ only.
- Exclusive: if Ti has obtained an exclusive-mode lock (XL) on DI, then Ti can both READ and WRITE DI.

Lock compatibility:

        SL      XL
  SL    TRUE    FALSE
  XL    FALSE   FALSE

NOTE: SL is compatible with SL but not with XL. Therefore, at any time several SLs can be held simultaneously by different transactions on a particular DI, tuple, or file. A subsequent XL request has to wait until all currently held SLs or XLs are released.

Wait
A transaction Ti must first request a lock on a DI or file in order to access it. If the DI or file is already locked by another transaction Tj in an incompatible mode, then Ti must wait until all incompatible locks held by other transactions have been explicitly released. Tj will eventually release (unlock) the DI that it locked earlier, and Ti will then be granted its request.

Example
Consider two accounts A and B accessed by two transactions T1 and T2. T1 transfers $50 from B to A. T2 displays the sum A + B. Suppose A = $100 and B = $200. Consider the following four examples:
Example L1

  Time  T1                      T2
   1    Lock_XL(B)
   2    B = B - 50
   3    Write(B)
   4    Unlock(B)
   5    Lock_XL(A)
   6    A = A + 50
   7    Write(A)
   8    Unlock(A)
   9                            Lock_SL(A)
  10                            Lock_SL(B)
  11                            Print(A + B)
  12                            Unlock(A), Unlock(B)

This is not good because, although it is serializable, there is no concurrency. If T1 and T2 are executed serially (either T1 T2 or T2 T1) the result A + B = $300 is the same. If they are executed concurrently, the following results are possible.

Example L2

  Time  T1                      T2
   1    Lock_XL(B)
   2    B = B - 50
   3    Write(B)
   4    Unlock(B)
   5                            Lock_SL(A)
   6                            Lock_SL(B)
   7                            Print(A + B)
   8                            Unlock(A), Unlock(B)
   9    Lock_XL(A)
  10    A = A + 50
  11    Write(A)
  12    Unlock(A)

The result is A + B = $250 (WRONG). T1 unlocked B too early and as a result exposed T2 to an inconsistent state.
Example L3
Now delay unlocking until the end of each transaction.

  Time  T1                      T2
   1    Lock_XL(B)
   2    B = B - 50
   3    Write(B)
   4    Lock_XL(A)
   5    A = A + 50
   6    Write(A)
   7                            Lock_SL(A)   -- waits for T1
   8    Unlock(B), Unlock(A)
   9                            Lock_SL(B)
  10                            Print(A + B)
  11                            Unlock(A), Unlock(B)

Here T2 has to wait for T1 to release its locks before it can read A and B. Notice the value of A: T2 reads it only after T1's update, so it prints the correct sum A + B = $300.

Example L4
Furthermore, consider the following partial schedule (a variation of L3):

  Time  T1                      T2
   1    Lock_XL(B)
   2    B = B - 50
   3    Write(B)
   4                            Lock_SL(A)
   5                            Lock_SL(B)   -- waits for T1
   6    Lock_XL(A)              -- waits for T2

Note: since T1 holds an XL on B and T2 is requesting an SL on B, T2 is waiting for T1 to unlock B. Since T2 holds an SL on A and T1 is requesting an XL on A, T1 is waiting for T2 to unlock A. This is a DEADLOCK: neither transaction can proceed normally. When deadlock occurs, the system must roll back one of the two transactions. Once a transaction has been rolled back, the DIs that it had locked are unlocked and become available to the other transactions, which in turn may continue with normal execution. Starvation must be avoided.
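The grant/wait decisions in the four examples all come from the SL/XL compatibility matrix: a request is granted only if its mode is compatible with every lock currently held on the DI. A minimal sketch (the names `COMPATIBLE` and `can_grant` are illustrative):

```python
# The SL/XL compatibility matrix from the Locks section, as a lookup table.
COMPATIBLE = {
    ("SL", "SL"): True,
    ("SL", "XL"): False,
    ("XL", "SL"): False,
    ("XL", "XL"): False,
}

def can_grant(requested_mode, held_modes):
    """A lock is granted only if it is compatible with every held lock."""
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

assert can_grant("SL", ["SL", "SL"])   # many SLs may coexist on one DI
assert not can_grant("XL", ["SL"])     # an XL must wait for held SLs
assert can_grant("XL", [])             # free DI: XL granted immediately
```

In Example L4, T2's `can_grant("SL", ["XL"])` on B and T1's `can_grant("XL", ["SL"])` on A both return False at the same time, which is exactly the deadlock.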
Deadlock
A standstill, an impasse, or a stalemate of two or more transactions. It results when one transaction holds a lock on a DI and needs another DI that is held by a second transaction, which in turn needs the DI held by the first.

Deadlock Detection and Prevention

a) Deadlock prevention
A simple, widely used approach to deadlock prevention is the TIMEOUT. A transaction that requests a lock waits only for a period of time defined by the system, called a quantum. If the lock has not been granted within this period, the lock request has timed out. When this occurs, the DBMS:
i. assumes that the transaction is in a deadlock (though in reality it may not be);
ii. aborts the transaction, i.e. rolls it back;
iii. automatically restarts the transaction as a new one.
This is a very simple and practical solution to the deadlock problem and is employed by most commercial DBMSs, but unfortunately it can be extremely time consuming.
N.B. Operating systems, such as Unix, employ a similar scheme, in which the OS chooses a victim (one of the deadlocked processes) and applies steps (i)-(iii) above. See below for recovery from deadlock.

b) Deadlock detection
One way to detect deadlocks is to build a wait-for graph. A related locking protocol uses prior knowledge of which DIs will be accessed in order to avoid deadlocks altogether; see the next topic, Graph-Based Protocols.

c) Recovery from deadlock
The DBMS has to select one or more transactions as victims. This selection should depend on several issues:
i. Choice of a victim: how long the transaction has been running (less is better), how many DIs it has updated (fewer is better), and how many DIs it still has to update (better if many).
ii. How far the transaction has to roll back (less is better).
iii. Avoid starvation: if the same transaction is repeatedly chosen as the victim, it can never complete. Solution: keep a counter of the number of times each transaction has been chosen as a victim.
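Detection via the wait-for graph reduces to cycle finding: nodes are transactions, an edge Ti -> Tj means Ti is waiting for a lock held by Tj, and a deadlock exists if and only if the graph contains a cycle. A minimal sketch using depth-first search (the function name is illustrative):

```python
# Wait-for graph deadlock detection: deadlock exists iff the graph has a cycle.
def has_cycle(edges):
    """edges is a list of (waiter, holder) pairs, e.g. ('T1', 'T2')."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on DFS stack / done
    state = {n: WHITE for n in graph}

    def visit(n):
        state[n] = GREY
        for m in graph[n]:
            # A GREY neighbour is an ancestor on the DFS stack: a cycle.
            if state[m] == GREY or (state[m] == WHITE and visit(m)):
                return True
        state[n] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)

# Example L4 above: T1 waits for T2 (on A) and T2 waits for T1 (on B).
assert has_cycle([("T1", "T2"), ("T2", "T1")])      # deadlock
assert not has_cycle([("T1", "T2"), ("T3", "T2")])  # waiting, but no deadlock
```

A DBMS would run such a check periodically, or whenever a lock request blocks, and pick a victim from the transactions on the cycle using the criteria in (c) above.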
Two-Phase Locking (2PL)

Another way to ensure serializability is with two-phase locking. Transactions issue lock and unlock requests in two phases:
1. Growing phase: a transaction may obtain locks but may not release any lock.
2. Shrinking phase: a transaction may release locks but may not obtain any new locks.
Notes:
i. Examples L1 and L2 above are not two-phase, but L3 and L4 are.
ii. Although L4 uses two-phase locking, it still deadlocks.

Graph-Based Locking Protocols

As mentioned above, one way to solve the deadlock problem and ensure serializability is to build a simple model, from the requirements specification (prior knowledge), of which DIs will be accessed and how. Two graphs are involved here.

A wait-for graph, used for deadlock detection, is a directed graph G = (N, E) consisting of a set of nodes N and a set of directed edges E. Construction:
i. Create a node for each transaction.
ii. Create a directed edge Ti -> Tj if Ti is waiting to lock an item that is currently locked by Tj.
Deadlock exists if and only if the wait-for graph contains a cycle.

A database graph, used by the graph-based (tree) protocol, is a directed acyclic graph whose nodes are the DIs, rooted like a tree. Rules:
i. The first lock by Ti may be on any DI.
ii. Thereafter, a DI can be locked by Ti only if the parent of that DI is currently locked by Ti.
iii. DIs may be unlocked at any time.
iv. A DI that has been locked and unlocked by Ti cannot subsequently be relocked by Ti.

Example
Consider the following lock and unlock instructions of four transactions T1, T2, T3, and T4 that lock and unlock the DIs A, B, C, D, E, F, G, H, I, J arranged as in the graph.
Interleaved schedule (operations listed in time order; each belongs to one of T1-T4):

  Lock_XL(D); Lock_XL(D); Lock_XL(E); Unlock(E); Lock_XL(D); Lock_XL(G);
  Lock_XL(H); Lock_XL(E); Lock_XL(H); Unlock(D); Unlock(E); Unlock(D);
  Lock_XL(J); Unlock(H); Unlock(H); Unlock(J); Unlock(D); Unlock(G)

This can be serialized so that there is a deadlock-free solution. It can be shown:

  Lock_XL(E); Unlock(E); Lock_XL(D); Lock_XL(G); Unlock(D); Lock_XL(D);
  Lock_XL(H); Unlock(D); Unlock(H); Lock_XL(J); Unlock(J); Lock_XL(E);
  Unlock(E); Lock_XL(D); Lock_XL(H); Unlock(D); Unlock(H); Unlock(G)

Advantages: unlocking may occur earlier, which means less waiting time and increased concurrency. Also, since the protocol is deadlock free, there are no rollbacks, saving time.
Disadvantages: in some cases a transaction must lock DIs that it does not access (this is BAD). For example, a transaction that needs to access DIs A and J in the graph must lock not only A and J but also B, D, and H. This increases locking overhead (waiting time), which may decrease concurrency.
Note: there are schedules that are possible under 2PL but not under the graph-based protocol, and vice versa.
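The two-phase property defined in the previous section can be checked mechanically: a transaction's operation sequence is two-phase iff no lock is acquired after its first unlock. A minimal sketch (the function name and pair format are illustrative):

```python
# 2PL check: a growing phase (locks only) must precede a shrinking phase
# (unlocks only); any lock issued after the first unlock violates 2PL.
def is_two_phase(ops):
    """ops is one transaction's sequence of ('lock', item) / ('unlock', item)."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif shrinking:               # a lock after an unlock: not 2PL
            return False
    return True

# Like T1 in Example L3/L4: all locks before any unlock.
assert is_two_phase([("lock", "B"), ("lock", "A"), ("unlock", "B"), ("unlock", "A")])
# Like T1 in Example L2: B is unlocked before A is locked.
assert not is_two_phase([("lock", "B"), ("unlock", "B"), ("lock", "A")])
```

Note that the check says nothing about deadlock: L4's transactions both pass it and still deadlock, which is why 2PL is combined with detection or prevention.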
Timestamp-Based Protocols

In the last two protocols, the order between every pair of conflicting transactions is determined at execution time, by the first lock that both transactions request in incompatible modes. The problem is that deadlock may still occur.

Timestamp: a unique identifier, created by the DBMS, that indicates the relative starting time of a transaction.
Timestamping: a concurrency control protocol that orders transactions in such a way that older transactions, that is, transactions with smaller timestamps, get priority in the event of conflict.

Thus, in the timestamp protocol, the serializability order is determined by selecting an ordering among transactions in advance, i.e. before execution. A fixed unique timestamp TS(Ti) is associated with each transaction Ti, assigned by the system before Ti executes. If a new transaction Tj enters the system after Ti has been assigned a timestamp, then TS(Ti) < TS(Tj).

Implementation
There are two methods of implementing timestamps:
i. The system clock is used as the timestamp: TS(Ti) = the value of the system clock when Ti enters the system.
ii. A logical counter is used, incremented each time a new timestamp is assigned: TS(Ti) = the value of the counter when Ti enters the system.

Serializability order
The timestamps determine the serializability order: if TS(Ti) < TS(Tj), the system must ensure that the produced schedule is equivalent to a serial schedule in which Ti appears before Tj. To implement this, we associate with each DI two timestamp values:
1) W-timestamp(DI): the largest TS of any transaction that has successfully executed Write(DI).
2) R-timestamp(DI): the largest TS of any transaction that has successfully executed Read(DI).

Notes
i. These are updated whenever a Read(DI) or Write(DI) is executed.
ii. If a transaction Ti is rolled back, it is restarted and assigned a new timestamp.
iii. A timestamp is usually assigned to a transaction immediately before its first instruction.
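Implementation method (ii) above, the logical counter, can be sketched in a few lines (the allocator name is illustrative; a real DBMS would also make this thread-safe):

```python
# Logical-counter timestamps: each new transaction gets the next counter
# value, so a transaction entering later always has a larger timestamp.
import itertools

_counter = itertools.count(1)

def assign_timestamp():
    """Called once, immediately before a transaction's first instruction."""
    return next(_counter)

ts_t1 = assign_timestamp()   # Ti enters the system first
ts_t2 = assign_timestamp()   # Tj enters later
assert ts_t1 < ts_t2         # hence TS(Ti) < TS(Tj), as required
```

The clock-based method (i) has the same interface but returns the system clock reading; the counter avoids the problem of two transactions entering within one clock tick.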
Timestamp Ordering: The Scheduler

From above, each DI I has two timestamps, W-TS(I) and R-TS(I): the highest timestamps of the transactions that have written and read I, respectively. The scheduler receives requests for access to a DI I as read(I, ts) and write(I, ts), where ts is the timestamp of the requesting transaction.

The protocol
The scheduler accepts or rejects each request as follows:
Read(I, ts): if ts < W-TS(I), the request is rejected and the transaction is killed (aborted and rolled back); otherwise the request is accepted and R-TS(I) is set to max(R-TS(I), ts).
Write(I, ts): if ts < W-TS(I) or ts < R-TS(I), the request is rejected and the transaction is killed (aborted and rolled back); otherwise the request is accepted and W-TS(I) is set to ts.
N.B. In effect, no transaction can read or write a DI already written by a transaction with a greater timestamp, and no transaction can write a DI already read by a transaction with a greater timestamp.

Example T2
Suppose initially R-TS(I) = 7 and W-TS(I) = 5; that is, I was last read by the transaction with timestamp 7 and last written by the transaction with timestamp 5.

  Request to scheduler   Reply   New values / comments
  read(I, 6)             OK      R-TS(I) still 7 (no change)
  read(I, 8)             OK      R-TS(I) = 8
  read(I, 9)             OK      R-TS(I) = 9
  write(I, 8)            NO      conflict (8 < R-TS(I) = 9): killed, rollback
  write(I, 11)           OK      W-TS(I) = 11
  read(I, 10)            NO      conflict (10 < W-TS(I) = 11): killed, rollback

Example T3
T1 adds and prints the contents of A and B: read(B), read(A), print(A+B).
T2 transfers $50 from B to A: read(B), B = B - 50, write(B), read(A), A = A + 50, write(A), print(A+B).

  Request to scheduler   Reply   New values
  read(T1, B, 1)         OK      R-TS(B) = 1
  read(T2, B, 2)         OK      R-TS(B) = 2
  write(T2, B, 3)        OK      W-TS(B) = 3
  read(T1, A, 4)         OK      R-TS(A) = 4
  write(T2, A, 5)        OK      W-TS(A) = 5
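The scheduler's accept/reject rules can be sketched as a small class holding one DI's pair of timestamps (the class and method names are illustrative; "rollback" stands for killing and restarting the transaction):

```python
# Timestamp-ordering scheduler for a single data item, following the
# Read(I, ts) / Write(I, ts) rules above.
class TOScheduler:
    def __init__(self, r_ts=0, w_ts=0):
        self.r_ts = r_ts   # largest TS of a successful Read(I)
        self.w_ts = w_ts   # largest TS of a successful Write(I)

    def read(self, ts):
        if ts < self.w_ts:                   # I already written by a younger T
            return "rollback"
        self.r_ts = max(self.r_ts, ts)
        return "ok"

    def write(self, ts):
        if ts < self.w_ts or ts < self.r_ts: # conflicts with a younger T
            return "rollback"
        self.w_ts = ts
        return "ok"

# Replaying Example T2 (initially R-TS(I) = 7, W-TS(I) = 5):
i = TOScheduler(r_ts=7, w_ts=5)
assert i.read(6) == "ok" and i.r_ts == 7    # no change
assert i.read(8) == "ok" and i.r_ts == 8
assert i.read(9) == "ok" and i.r_ts == 9
assert i.write(8) == "rollback"             # 8 < R-TS(I) = 9
assert i.write(11) == "ok" and i.w_ts == 11
assert i.read(10) == "rollback"             # 10 < W-TS(I) = 11
```

Note that the scheduler never blocks a transaction, so no deadlock is possible; the cost is that conflicting transactions are rolled back and restarted instead.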