NOTES W2006 CPS610 DBMS II Prof. Anastase Mastoras Ryerson University
Recovery

Transaction: a logical unit of work (text). It is a collection of operations that performs a single logical function in a db application.

The Recovery Procedure usually involves making copies of the db files (tables) at regular intervals and keeping the transaction files, which contain the updated information. If a file is damaged or destroyed, it is recovered by means of the copy ("backup") and the transaction files. This approach has the following disadvantages:

- Certain types of real-time updates cannot be re-run directly, because re-running requires a dialogue with a terminal user.
- Recovery by means of re-run operations can be very time consuming and costly if the online terminal users cannot use the db while the re-runs are executing.
- It only works for individual, isolated files; if the files are interconnected (cascaded), as in integrated dbs, it is often necessary to re-run all the programs that have updated the db since the error occurred.

There is a need for recovery lines!

LOG FILES

A log file is a file that records all db changes automatically. When a record is changed, the log file records its contents before and after the change:

- Before: the record is called the before image.
- After: the record is called the after image.

The log file also contains an identifier for the program run that made the change. The log file and the backup copy are maintained by the DBMS.

TRANSACTIONS

A Transaction is atomic, or indivisible: either all updates of the transaction are executed, or all updates of the unfinished transaction are removed by a back out. In a back out the log file is read backwards and the before images of the individual log records are written back onto disk storage.

A db Checkpoint records the state of the database at a particular point in time.
If needed, the db must be restored to the state it was in when the checkpoint was taken. A checkpoint is connected with a program run that has been running an atomic transaction. When more than one program is running simultaneously, we use a Global Checkpoint.

A Global Checkpoint is the set of checkpoints belonging to all those programs that are active at a given moment in time.

Example - Suppose we have a system that has two threads (2 programs) running simultaneously.

Time  Thread 1          Thread 2          Registration in Logfile
a     run1 begins       -                 run1 CHECKPOINT
b     -                 run2 begins       run2 CHECKPOINT
c     -                 record A updated  A logged
d     record B updated  -                 B logged
e     end of run1       -                 end_of_run1
f     run3 begins       -                 run3 CHECKPOINT
g     -                 record B updated  B logged
h     -                 end of run2       end_of_run2
i     -                 run4 begins       run4 CHECKPOINT
j     record A updated  -                 A logged
k     end of run3       -                 end_of_run3
l     run5 begins       -                 run5 CHECKPOINT
m     -                 record C updated  C logged

NOTE: If run4 terminates abnormally at time m, the checkpoint for run4 can be used to recover run4. If the entire Transaction Management System terminates abnormally at time m, then the checkpoints for run4 and run5 together form the global checkpoint, because run4 and run5 are the only runs still active at that time.

Roll Forward and Roll Backward Recovery

Roll Forward

Suppose that one file of the db has been damaged so that parts of it cannot be read. This file can be recovered by transferring an old copy of the file to disk storage. Then all log files produced since the copy was made are read forwards, and each time a record is found that belongs to the damaged file, its after image is applied to the copy.

Roll Backward

Suppose that a program aborts (e.g. because of a power failure). Recovery is then achieved by backing out the updates of the application program. This back out is done by reading the log file backwards until the point is reached where the application program started. While the log file is read, the records updated by the program are restored to their original contents (the before images), and the application program is then started up again.

NOTE: It is assumed that no other programs are updating the file in real time.
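
The roll forward and roll backward procedures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log is modelled as a list of dictionaries, and the field names (run, file, record, before, after, event) and the file names "accounts" and "orders" are hypothetical, not taken from any particular DBMS.

```python
def roll_forward(backup, log, file_name):
    """Rebuild a damaged file from its backup copy and the after images."""
    restored = dict(backup)                      # start from the old copy
    for rec in log:                              # read the log forwards
        if rec.get("file") == file_name:         # only records of the damaged file
            restored[rec["record"]] = rec["after"]
    return restored


def roll_backward(current, log, file_name, run_id):
    """Back out one run's updates by restoring the before images."""
    restored = dict(current)
    for rec in reversed(log):                    # read the log backwards
        if rec.get("event") == "begin":
            if rec["run"] == run_id:
                break                            # reached the start of the run
            continue
        if rec["run"] == run_id and rec["file"] == file_name:
            restored[rec["record"]] = rec["before"]
    return restored


# Hypothetical data: the backup was taken before the log below was written.
backup = {"A": 10, "B": 20}
log = [
    {"event": "begin", "run": "run1"},
    {"run": "run1", "file": "accounts", "record": "A", "before": 10, "after": 15},
    {"run": "run1", "file": "orders", "record": "X", "before": 1, "after": 2},
    {"event": "begin", "run": "run2"},
    {"run": "run2", "file": "accounts", "record": "B", "before": 20, "after": 5},
    {"run": "run2", "file": "accounts", "record": "A", "before": 15, "after": 30},
]

forward = roll_forward(backup, log, "accounts")
backward = roll_backward(forward, log, "accounts", "run2")
print(forward)    # {'A': 30, 'B': 5}
print(backward)   # {'A': 15, 'B': 20}
```

Roll forward uses only the after images, roll backward only the before images. As in the NOTE above, the sketch assumes that no other run updates the file while run2 is being backed out.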

Storage Types

Volatile Storage: Information does not survive system crashes (e.g. main memory, cache, register contents).

Non-Volatile Storage: Information does survive system crashes (e.g. disks, tapes).

Stable Storage: Information is never lost. To approximate this, we replicate the information on several non-volatile storage media (usually disks) with independent failure modes and update the information in a controlled manner.

Failure Types

Logical Errors: The transaction can no longer continue with its normal execution (e.g. bad input, data not found, overflow, resource limit exceeded).

System Errors: The system has entered a bad state (e.g. deadlock), so the transaction cannot continue, but it can be executed again at a later time.

System Crashes: A hardware malfunction causes the loss of the volatile storage contents, but the non-volatile data usually remains intact.

Disk Failure: A disk block loses its contents as a result of a head crash or a failure during a data transfer operation.
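
As an illustration of what "update the information in a controlled manner" can mean for Stable Storage, here is a minimal sketch that keeps two copies of a block, writes them one after the other, and repairs a damaged copy from the intact one after a crash. The two-copy layout, the use of a CRC checksum to detect a partially written copy, and the function names are all assumptions made for this example; the notes do not prescribe a particular scheme.

```python
import zlib


def make_block(data):
    """Store the data together with a checksum of it."""
    return {"data": data, "crc": zlib.crc32(data)}


def is_valid(block):
    """A copy is valid if its checksum matches its data."""
    return zlib.crc32(block["data"]) == block["crc"]


def stable_write(copies, data):
    """Write the first copy completely, and only then the second."""
    copies[0] = make_block(data)
    copies[1] = make_block(data)


def stable_recover(copies):
    """After a crash, make both copies agree (assumes at most one is damaged)."""
    first, second = copies
    if not is_valid(first):
        copies[0] = dict(second)      # first write was interrupted: keep old value
    elif not is_valid(second) or first["data"] != second["data"]:
        copies[1] = dict(first)       # first write finished: complete the update
    return copies[0]["data"]


copies = [None, None]
stable_write(copies, b"balance=22")

# Simulate a crash in the middle of the second write of a new value.
copies[0] = make_block(b"balance=12")
copies[1] = {"data": b"balan", "crc": zlib.crc32(b"balance=12")}

recovered = stable_recover(copies)
print(recovered)
```

Because the copies are never written at the same time, a single failure can damage at most one of them, and the other copy is always available for repair.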

The Transactional Model

A Transaction is a program unit that accesses and possibly updates various data items. Transactions interact with the db by transferring data from the db to program variables and from program variables to the db, using two operations: read and write.

The Transaction operations:

READ(X, xi): Assigns the value of data item X to the local variable xi. This operation is executed as follows:
  i. If the block on which X resides is not in main memory, issue an input(X).
  ii. Assign to xi the value of X from the buffer block.

WRITE(X, xi): Assigns the value of local variable xi to the data item X. This operation is executed as follows:
  i. If the block on which X resides is not in main memory, issue an input(X).
  ii. Assign the value of the local variable xi to X in the buffer block.

N.B.: We require that Transactions do not violate any of the db consistency constraints, i.e. if the db was consistent before the transaction started, it must remain consistent after the transaction terminates.
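
The two steps of READ and WRITE can be sketched as follows. The block layout (one block named "block0" holding X and Y), the dictionaries standing in for disk and main memory, and the output_block helper are assumptions for illustration only; the notes themselves mention only input(X).

```python
disk = {"block0": {"X": 22, "Y": 33}}        # non-volatile storage (hypothetical layout)
buffer = {}                                  # buffer blocks in main memory
block_of = {"X": "block0", "Y": "block0"}    # which block each data item resides on


def input_block(block_id):
    """Bring a block from disk into main memory."""
    buffer[block_id] = dict(disk[block_id])


def output_block(block_id):
    """Copy a buffer block back to disk (an assumed counterpart of input)."""
    disk[block_id] = dict(buffer[block_id])


def read(item):
    block_id = block_of[item]
    if block_id not in buffer:               # step i: block not in main memory
        input_block(block_id)
    return buffer[block_id][item]            # step ii: value from the buffer block


def write(item, value):
    block_id = block_of[item]
    if block_id not in buffer:               # step i: block not in main memory
        input_block(block_id)
    buffer[block_id][item] = value           # step ii: update the buffer block only


xi = read("X")
xi = xi - 10
write("X", xi)
disk_before_output = disk["block0"]["X"]     # still the old value on disk
output_block("block0")
print(disk_before_output, disk["block0"]["X"])   # 22 12
```

Note that WRITE changes only the buffer block. The new value reaches the disk when the block is output, which is why a crash between the two can lose the update.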

Transaction States

A transaction that completes successfully is said to be committed. (In SQL the command is commit, which updates the file on disk and makes the operation (insert, delete, or update) permanent.)

A Transaction must be in one of the following states:

Active: the initial state.
Partially Committed: after the last statement has been executed.
Committed: after successful completion.
Failed: after the discovery that normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the db has been restored to its initial state (prior to the start of the transaction).

NOTE:
1. A transaction enters the committed state if it has partially committed and it is guaranteed that it will never be aborted.
2. A transaction enters the failed state after it determines that it can no longer proceed with normal execution, possibly because of H/W or logical errors. Such a Transaction must be rolled back. Once it has been rolled back, the Transaction enters the aborted state. At this point the system has two options:
  i. Restart the Transaction: only if it was aborted as a result of a H/W or S/W error that was not created through the internal logic of the transaction. A restarted Transaction is considered a new Transaction.
  ii. Kill the Transaction: usually because of an internal logical error that can only be fixed in the application program, because of bad input, or because the desired data was not found in the db.
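
The states above can be modelled as a small state machine. The sketch below is an assumption-laden illustration: the class and table names are made up, and the transition from Partially Committed to Failed is taken from the usual textbook state diagram rather than from the text of these notes.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# Allowed moves; COMMITTED and ABORTED are final states.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},  # assumed edge to FAILED
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),
    State.ABORTED: set(),
}


class Transaction:
    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE            # every transaction starts as active

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state


t0 = Transaction("T0")
t0.move_to(State.PARTIALLY_COMMITTED)
t0.move_to(State.COMMITTED)

t1 = Transaction("T1")
t1.move_to(State.FAILED)
t1.move_to(State.ABORTED)

try:
    t1.move_to(State.ACTIVE)                 # an aborted transaction cannot resume
    rejected = False
except ValueError:
    rejected = True
print(t0.state.value, t1.state.value, rejected)
```

A restart after an abort is modelled by creating a new Transaction object, which matches the rule that a restarted Transaction is a new Transaction.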

The DB Log

As we have seen, the most widely used structure for recording db modifications is the log. The fields of a typical log record include:

Transaction Name (TN): a unique name of the Transaction that performs the write operation.
Data Item Name (DIN): a unique name of the data item that is written.
Old Value (OV): the value of the data item before the write operation (the before image).
New Value (NV): the value of the data item after the write operation (the after image).

Special Log Records & Format

<Ti start>           Transaction Ti has started.
<Ti, Xj, SW1, SW2>   Transaction Ti has performed a write on data item Xj. Xj has value SW1 before the write and will have value SW2 after the write.
<Ti commit>          Transaction Ti has committed.

Example

Suppose we have three accounts X, Y, and Z. Let T0 be a transfer of $10 from X to Y. Let T1 be a withdrawal of $5 from Z. Initially X, Y, and Z have $22, $33, and $16 respectively.

Transaction          Log                  Database
(TN, Operation)      (TN, DIN, OV, NV)    (Contents)
T0  read(X, xi)      T0 starts            -
    xi = xi - 10     T0, X, 22, 12        -
    write(X, xi)     -                    -
    read(Y, yi)      -                    -
    yi = yi + 10     T0, Y, 33, 43        -
    write(Y, yi)     T0 commits           X = 12, Y = 43
T1  read(Z, zi)      T1 starts            -
    zi = zi - 5      T1, Z, 16, 11        -
    write(Z, zi)     T1 commits           Z = 11
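
The log for the T0 and T1 example can be reproduced with a short sketch. It assumes that each transaction writes exactly one start record and one commit record, and that the database contents change only when the transaction commits (as the Database column of the example suggests). The tuple layout and the function names are illustrative only.

```python
db = {"X": 22, "Y": 33, "Z": 16}    # initial account balances from the example
log = []


def start(tn):
    log.append((tn, "start"))                 # <Ti start>


def write(tn, din, nv):
    log.append((tn, din, db[din], nv))        # <Ti, Xj, old value, new value>


def commit(tn):
    log.append((tn, "commit"))                # <Ti commit>
    for rec in log:                           # apply this transaction's new values
        if len(rec) == 4 and rec[0] == tn:
            db[rec[1]] = rec[3]


# T0: transfer $10 from X to Y
start("T0")
xi = db["X"]
write("T0", "X", xi - 10)
yi = db["Y"]
write("T0", "Y", yi + 10)
commit("T0")

# T1: withdraw $5 from Z
start("T1")
zi = db["Z"]
write("T1", "Z", zi - 5)
commit("T1")

for rec in log:
    print(rec)
print(db)    # {'X': 12, 'Y': 43, 'Z': 11}
```

The total X + Y is 55 both before and after T0, which is the kind of consistency constraint a transfer is expected to preserve.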

The Redo & Undo Commands

Using the log, the system can recover from failures that result in a loss of data, provided the log itself survives on stable storage.

redo command

redo(Ti) sets the value of all data items updated by Transaction Ti to the new values.

NOTE:
1. Whether you execute this command once or many times, the result is the same.
2. The system redoes the Transaction only if the log file contains both the record <Ti start> and the record <Ti commit>.
3. Otherwise the system rolls the Transaction back.

undo command

undo(Ti) restores the value of all data items updated by Transaction Ti to the old values.

NOTE:
1. Whether you execute this command once or many times, the result is the same.
2. The system undoes the Transaction if the log file contains the record <Ti start> but does not contain the record <Ti commit>.
3. Otherwise (both records are present) the Transaction is redone.

CHECKPOINTS

After a system failure, one must in principle search the entire log to determine which Ts are to be undone and which are to be redone. This has some disadvantages:

- The searching process is time consuming.
- Most of the Ts that would be redone have already written their updates to the db. Although there is no harm in redoing these committed transactions, it makes recovery take longer.

A checkpoint is introduced to allow the system to streamline its recovery procedure, as we have seen in the previous example. The system periodically performs checkpoints as follows:
  i. Output all log records currently residing in main memory (volatile) onto stable storage (disk).
  ii. Output all modified data residing in volatile storage to stable storage.
  iii. Output a log record <checkpoint> onto stable storage.
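
The redo and undo rules, together with the <checkpoint> record, can be combined into a small recovery sketch. It is a simplified illustration, not a definitive algorithm: it assumes transactions run one at a time, it skips every transaction whose commit record precedes the last checkpoint, and the transaction T2 and the post-crash database contents are hypothetical additions to the T0 and T1 example.

```python
def redo(db, log, tn):
    """Set every item written by tn to its new value (log read forwards)."""
    for rec in log:
        if len(rec) == 4 and rec[0] == tn:
            db[rec[1]] = rec[3]


def undo(db, log, tn):
    """Restore every item written by tn to its old value (log read backwards)."""
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] == tn:
            db[rec[1]] = rec[2]


def recover(db, log):
    """Decide per transaction whether to redo or undo, then do it."""
    last_cp = max((i for i, rec in enumerate(log) if rec == ("checkpoint",)), default=-1)
    started, committed, safe = [], set(), set()
    for i, rec in enumerate(log):
        if len(rec) == 2 and rec[1] == "start":
            started.append(rec[0])
        elif len(rec) == 2 and rec[1] == "commit":
            committed.add(rec[0])
            if i < last_cp:
                safe.add(rec[0])          # already on stable storage at the checkpoint
    to_redo = [t for t in started if t in committed and t not in safe]
    to_undo = [t for t in started if t not in committed]
    for t in reversed(to_undo):
        undo(db, log, t)
    for t in to_redo:
        redo(db, log, t)
    return to_redo, to_undo


log = [
    ("T0", "start"), ("T0", "X", 22, 12), ("T0", "Y", 33, 43), ("T0", "commit"),
    ("checkpoint",),
    ("T1", "start"), ("T1", "Z", 16, 11), ("T1", "commit"),
    ("T2", "start"), ("T2", "X", 12, 99),          # hypothetical T2, no commit: crash here
]
# Assumed disk contents after the crash: T2's write reached disk, T1's did not.
db = {"X": 99, "Y": 43, "Z": 16}

to_redo, to_undo = recover(db, log)
first_result = dict(db)
recover(db, log)                                   # running recovery again changes nothing
print(to_redo, to_undo, db)
```

T0 committed before the checkpoint and is not touched at all, which is exactly the saving the checkpoint is meant to provide. Running recover a second time leaves the db unchanged, which illustrates NOTE 1 for both commands.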