Log-Based Recovery Schemes If you are going to be in the logging business, one of the things that you have to do is to learn about heavy equipment. Robert VanNatta, Logging History of Columbia County CS3223 Crash Recovery 1
Integrity or consistency constraints Predicates data must satisfy, e.g. x is key of relation R x y holds in R Domain(x) = {Red, Blue, Green} no employee should make more than twice the average salary Definitions Consistent state: satisfies all constraints Consistent DB: DB in consistent state CS3223 Crash Recovery 2
DB cannot always be consistent! Example: Transfer 100 from a2 to a10 a2 a2-100 a10 a10 + 100 Observation: a2. 500. 400. 400 a10. 1000. 1000. 1100 CS3223 Crash Recovery 3
Transaction: collection of actions that preserve consistency Consistent DB T Consistent DB If T starts with consistent state + T executes in isolation (and absence of errors) T leaves consistent state CS3223 Crash Recovery 4
Reasons for failures Transaction failures Logical errors, deadlocks System crash Power failures, operating system bugs etc Memory data lost Disk failure Disk Read-Write Head crashes STABLE STORAGE: Data is never lost. Can approximate by using RAID and maintaining geographically distant copies of the data CS3223 Crash Recovery 5
Key problem: Unfinished transaction Example Transfer fund from A to B T1: A A - 100 B B + 100 CS3223 Crash Recovery 6
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 B: 800 memory A: 800 B: 800 disk CS3223 Crash Recovery 7
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 B: 800 700 A: 800 B: 800 memory disk CS3223 Crash Recovery 8
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); Updated A value is written to disk. This may be triggered ANYTIME by explicit command or DBMS or OS. A: 800 B: 800 700 A: 800 B: 800 700 memory disk CS3223 Crash Recovery 9
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 B: 800 700 900 A: 800 B: 800 700 memory disk CS3223 Crash Recovery 10
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 B: 800 700 900 memory Need atomicity: execute all actions of a transaction or none at all Failure before commit (memory content lost before disk updated)! A: 800 B: 800 disk 700 CS3223 Crash Recovery 11
Recovery Manager guarantees atomicity and durability properties of Xacts Undo: remove effects of aborted Xact to preserve atomicity Redo: re-installing effects of committed Xact for durability Processes three operations: Commit(T) - install T s updated pages into disk Abort(T) - restore all data that T updated to their prior values Restart - recover database to a consistent state from system failure abort all active Xacts at the time of system failure installs updates of all committed Xacts that were not installed in the stable database before the failure Desirable properties: Recovery Manager Add little overhead to the normal processing of Xacts Recover quickly from a failure CS3223 Crash Recovery 12
Interaction Between Recover and Buffer Managers: Dirty pages in buffer pool Can a dirty page updated by Xact T be written to disk before T commits? Yes steal policy No no steal policy Must all dirty pages that are updated by Xact T be written to disk when T commits? Yes force policy No no force policy CS3223 Crash Recovery 13
Recovery schemes: Design options Four possible design options Force No-force Steal Undo & no redo Undo & redo No steal No undo & no redo No undo & redo No-steal policy No undo Force policy No redo Which is the best solution?? CS3223 Crash Recovery 14
Log-based Recovery Log (aka trail/journal): history of actions executed by DBMS Contains a log record for each write, commit, & abort Each log record has a unique identifier called Log Sequence Number (LSN) Log is stored as a sequential file of records in stable storage LSN of log record = address of log record CS3223 Crash Recovery 15
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); One Solution: Undo logging (Immediate modification/steal-force) <T1, start> A:800 B:800 A:800 B:800 memory disk Log (Stable) CS3223 Crash Recovery 16
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); One Solution: Undo logging (Immediate modification) <T1, start> A:800 B:800 700 900 A:800 B:800 memory disk log CS3223 Crash Recovery 17
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); One Solution: Undo logging (Immediate modification) A:800 B:800 700 900 A:800 B:800 <T1, start> <T1, A, 800> memory disk log CS3223 Crash Recovery 18
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); One Solution: Undo logging (Immediate modification) A:800 B:800 700 900 A:800 B:800 700 <T1, start> <T1, A, 800> memory disk log CS3223 Crash Recovery 19
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); One Solution: Undo logging (Immediate modification) A:800 B:800 700 900 A:800 B:800 <T1, start> <T1, A, 800> 700 <T1, B, 800> memory disk log CS3223 Crash Recovery 20
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); One Solution: Undo logging (Immediate modification) A:800 B:800 700 900 A:800 B:800 <T1, start> <T1, A, 800> 700 <T1, B, 800> 900 memory disk log CS3223 Crash Recovery 21
T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); One Solution: Undo logging (Immediate modification) A:800 B:800 700 900 A:800 B:800 <T1, start> <T1, A, 800> 700 <T1, B, 800> <T1, commit> 900 memory disk log CS3223 Crash Recovery 22
Log is first written in memory Complication memory A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800 B: 800 DB Log CS3223 Crash Recovery 23
Log is first written in memory Complication memory A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800700 B: 800 DB Log BAD STATE # 1 Failure occurs after partial updates on disk but before log is written to disk CS3223 Crash Recovery 24
Log is first written in memory Complication memory A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800700 B: 800 CS3223 Crash Recovery 25 DB Log BAD STATE # 1 Failure occurs after partial updates on disk but before log is written to disk This means log record for A must be on log disk before A can be updated on data disk (DB)
Complication Log is first written in memory Updates are not written to disk on every action memory A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> <T1, commit> A: 800 B: 800 <T1, B, 800> <T1, commit> DB Log CS3223 Crash Recovery 26
Complication Log is first written in memory Updates are not written to disk on every action memory A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800700 B: 800 <T1, B, 800> <T1, commit> DB Log BAD STATE # 2 All logs are on disk (including commit log) but only partial updates on disk. CS3223 Crash Recovery 27
Complication Log is first written in memory Updates are not written to disk on every action memory A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800700 B: 800 <T1, B, 800> <T1, commit> CS3223 Crash Recovery 28 DB Log BAD STATE # 2 All logs are on disk (including commit log) but only partial updates on disk. Before COMMIT log is written to Log, all updates must be on disk (DB)
Undo logging rules (1) For every action generate undo log record (containing old value) (2) Before x is modified on disk, log records pertaining to x must be on disk (write ahead logging: WAL) (3) Before commit is flushed to log, all writes of transaction must be reflected on disk CS3223 Crash Recovery 29
Undo Logging T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800 B: 800 log CS3223 Crash Recovery 30
Undo Logging T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800 B: 800 <T1, Start> <T1, A, 800> <T1, B, 800> log CS3223 Crash Recovery 31
Undo Logging T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800 B: 800 700 <T1, Start> <T1, A, 800> <T1, B, 800> log CS3223 Crash Recovery 32
Undo Logging T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> A: 800 B: 800 700 900 <T1, Start> <T1, A, 800> <T1, B, 800> log CS3223 Crash Recovery 33
Undo Logging T1: Read (A); A A-100 Write (A); Read (B); B B+100 Write (B); A: 800 700 B: 800 900 Log: <T1,start> <T1, A, 800> <T1, B, 800> <T 1, commit> A: 800 B: 800 700 900 <T1, Start> <T1, A, 800> <T1, B, 800> <T1, Commit> log CS3223 Crash Recovery 34
Recovery rules: Undo logging (1)Let S = set of transactions with <Ti, start> in log, but no <Ti, commit> (or <Ti, abort>) record in log What about those with Commit/Abort? (2) For each <Ti, X, v> in log, in reverse order (latest earliest) do: -if Ti S then - X v - Update disk (3) For each Ti S do - write <Ti, abort> to log CS3223 Crash Recovery 35
What if failure during recovery? No problem! Undo is idempotent CS3223 Crash Recovery 36
Redo logging (deferred modification/no-steal-no-force) In UNDO logging, we remember only the old value How about remembering only the new (updated) values instead? What does this mean? NO updates must be written to disk until a transaction commits! All updates have to be buffered in memory! Can also store dirty pages in temporary disk storage but very inefficient CS3223 Crash Recovery 37
Redo logging (deferred modification) T1: Read(A); A A-100; write (A); Read(B); B B+100; write (B); A: 800 B: 800 A: 800 B: 800 memory DB LOG CS3223 Crash Recovery 38
Redo logging (deferred modification) T1: Read(A); A A-100; write (A); Read(B); B B+100; write (B); A: 800 700 B: 800900 A: 800 B: 800 memory DB LOG CS3223 Crash Recovery 39
Redo logging (deferred modification) T1: Read(A); A A-100; write (A); Read(B); B B+100; write (B); A: 800 700 B: 800900 A: 800 B: 800 <T1, start> <T1, A, 700> <T1, B, 900> <T1, commit> memory DB LOG CS3223 Crash Recovery 40
Redo logging (deferred modification) T1: Read(A); A A-100; write (A); Read(B); B B+100; write (B); A: 800 700 B: 800900 output A: 800700 B: 800 <T1, start> <T1, A, 700> <T1, B, 900> <T1, commit> memory DB LOG CS3223 Crash Recovery 41
Redo logging rules (1) For every action, generate redo log record (containing new value) (2) Before X is modified on disk (DB), ALL log records for transaction that modified X (including commit) must be on disk CS3223 Crash Recovery 42
Recovery rules: Redo logging (1) Let S = set of transactions with <Ti, commit> in log (2) For each <Ti, X, v> in log, in forward order (earliest latest) do: -if Ti S then X v Update X on disk CS3223 Crash Recovery 43
Redo log: Recovery can be very, very SLOW!......... First T1 wrote A,B Last Record Committed a year ago Record (1 year ago) --> STILL, Need to redo after crash!! Crash CS3223 Crash Recovery 44
Solution: Checkpoint (simple version) Periodically: (1) Do not accept new transactions (2) Wait until all (active) transactions finish (3) Flush all log records to disk (log) (4) Flush all buffers to disk (DB) (5) Write checkpoint record on disk (log) (6) Resume transaction processing CS3223 Crash Recovery 45
Example: what to do at recovery? Redo log (disk): <T1,A,16> <T1,commit> Checkpoint <T2,B,17> <T2,commit>.................. <T3,C,21> Crash No need to examine log records before the most recent Checkpoint What about UNDO scheme? CS3223 Crash Recovery 46
Key drawbacks: Undo logging (steal/force) increase the number of disk I/Os Redo logging (no-steal/no-force) need to keep all modified blocks in memory until commit CS3223 Crash Recovery 47
Another Solution: undo/redo logging! Update <Ti, Xid, New X val, Old X val> page X Rules: 1) Page X can be flushed before or after Ti commit 2) Log record flushed before corresponding updated page (WAL) 3) All log records flushed at commit This is adopted in IBM DB2 known as the Aries Recovery Manager CS3223 Crash Recovery 48
Non-quiesce checkpoint (for Undo/Redo logging) L O G... Start-ckpt active TR: Ti,T2,...... end ckpt...... for undo dirty buffer pool pages flushed with disrupting runtime operations CS3223 Crash Recovery 49
Examples: what to do at recovery time? L O G... T1,- a... Ckpt T1... Ckpt end... no T1 commit T1- b Undo T1 (undo a,b) CS3223 Crash Recovery 50
Example L O G... T1 a ckpt-s T1...... b... ckptend... T1 c T1... T1 cmt... Redo T1: (redo b,c) CS3223 Crash Recovery 51
Recovery process: Backwards pass (end of log latest checkpoint start) construct set S of committed transactions undo actions of transactions not in S Undo transactions that are in checkpoint active list Forward pass (latest checkpoint start end of log) redo actions of S transactions start checkpoint backward pass forward pass CS3223 Crash Recovery 52
Undo/Redo Log (Example) <START S> <S, A, 60, 61> <COMMIT S> <START T> <T, A, 61, 62> <START U> <U, B, 20, 21> <T, C, 30, 31> <START V> <U, D, 40, 41> <V, F, 70, 71> <COMMIT U> <T, E, 50, 51> <COMMIT T> <V, B, 21, 22> <COMMIT V> Suppose we begin a nonquiescent checkpoint immediately after <T, A, 61, 62>, where could the <END CKPT> record be potentially written in the log? CS3223 Crash Recovery 53
Undo/Redo Log (Example) <START S> <S, A, 60, 61> <COMMIT S> <START T> <T, A, 61, 62> <START U> <U, B, 20, 21> <T, C, 30, 31> <START V> <U, D, 40, 41> <V, F, 70, 71> <COMMIT U> <T, E, 50, 51> <COMMIT T> <V, B, 21, 22> <COMMIT V> Suppose we begin a nonquiescent checkpoint immediately after <T, A, 61, 62>, where could the <END CKPT> record be potentially written in the log? ANYWHERE after <T, A, 61, 62> since the writing of dirty blocks can be performed independent of whatever actions the transactions are performing in the interim CS3223 Crash Recovery 54
Undo/Redo Log (Example) <START S> <S, A, 60, 61> <COMMIT S> <START T> <T, A, 61, 62> <START U> <U, B, 20, 21> <T, C, 30, 31> <START V> <U, D, 40, 41> <V, F, 70, 71> <COMMIT U> <T, E, 50, 51> <COMMIT T> <V, B, 21, 22> <COMMIT V> Suppose we begin a nonquiescent checkpoint immediately after <T, A, 61, 62>. Suppose further there is a crash at any possible point after <T, A, 61, 62>, how far back in the log we must look to find all possible incomplete transactions if <END CKPT> was written prior to the crash <END CKPT> was not written CS3223 Crash Recovery 55
Undo/Redo Log (Example) <START S> <S, A, 60, 61> <COMMIT S> <START T> <T, A, 61, 62> <START U> <U, B, 20, 21> <T, C, 30, 31> <START V> <U, D, 40, 41> <V, F, 70, 71> <COMMIT U> <T, E, 50, 51> <COMMIT T> <V, B, 21, 22> <COMMIT V> <END CKPT> was written prior to the crash We need only go back as far as the start of T (since T is the only active transaction) Start CKPT End CKPT Crash CS3223 Crash Recovery 56
Undo/Redo Log (Example) <START S> <S, A, 60, 61> <COMMIT S> <START T> <T, A, 61, 62> <START U> <U, B, 20, 21> <T, C, 30, 31> <START V> <U, D, 40, 41> <V, F, 70, 71> <COMMIT U> <T, E, 50, 51> <COMMIT T> <V, B, 21, 22> <COMMIT V> <END CKPT> was not written Any transaction that was active when the previous checkpoint ended may have written some but not all of its data to disk. So, go to the previous checkpoint In this case, the only other transaction that could qualify is S, so we must look back to the beginning of S, i.e., to the beginning of the log in this simple example. CS3223 Crash Recovery 57
Media failure (Loss of non-volatile storage) A: 16 Solution: Make copies of data! CS3223 Crash Recovery 58
Example 1 Triple modular redundancy Keep 3 copies on separate disks Output(X) --> three outputs Input(X) --> three inputs + vote X1 X2 X3 CS3223 Crash Recovery 59
Example 2 Redundant writes, Single reads Keep N copies on separate disks Output(X) --> N outputs Input(X) --> Input one copy - if ok, done - else try another one Assumes bad data can be detected CS3223 Crash Recovery 60
Example 3: DB Dump + Log backup database log active database If active database is lost, restore active database from backup bring up-to-date using redo entries in log CS3223 Crash Recovery 61
When can log be discarded? log db dump last needed undo checkpoint not needed for media recovery time not needed for undo after system failure not needed for redo after system failure CS3223 Crash Recovery 62
Summary Consistency of data Two sources of problems: Failures Logging Redundancy Data sharing Concurrency Control Log-based recovery mechanisms Undo, Redo, Undo/Redo What about No Undo/No Redo??? CS3223 Crash Recovery 63