Recovery

Review: The ACID properties
Atomicity: All actions in the Xaction happen, or none happen.
Consistency: If each Xaction is consistent, and the DB starts consistent, it ends up consistent.
Isolation: Execution of one Xaction is isolated from that of other Xacts.
Durability: If a Xaction commits, its effects persist.
Concurrency control (CC) guarantees Isolation and Consistency.
The Recovery Manager guarantees Atomicity & Durability.

Why is a recovery system necessary?
Transaction failure:
  Logical errors: application errors (e.g. div by 0, segmentation fault)
  System errors: deadlocks
System crash: hardware/software failure causes the system to crash.
Disk failure: head crash or similar disk failure destroys all or part of disk storage.
Lost data can be in main memory or on disk.

Storage Media
Volatile storage: does not survive system crashes.
  Examples: main memory, cache memory
Nonvolatile storage: survives system crashes.
  Examples: disk, tape, flash memory, non-volatile (battery backed up) RAM
Stable storage: a mythical form of storage that survives all failures.
  Approximated by maintaining multiple copies on distinct nonvolatile media

Recovery and Durability
To achieve Durability: put data on stable storage.
To approximate stable storage, make two copies of the data.
Problem: data transfer failure

Recovery and Atomicity
Durability is achieved by making 2 copies of data.
What about atomicity?
A crash may cause inconsistencies.

Recovery and Atomicity
Example: transfer $50 from account A to account B.
The goal is either to perform all database modifications made by Ti or none at all.
Requires several inputs (reads) and outputs (writes).
Failure after output to account A and before output to B: the DB is corrupted!

Recovery Algorithms
Recovery algorithms are techniques to ensure database consistency and transaction atomicity and durability despite failures.
Recovery algorithms have two parts:
1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures
2. Actions taken after a failure to recover the database contents to a state that ensures atomicity and durability

Background: Data Access
Physical blocks: blocks on disk.
Buffer blocks: blocks in main memory.
Data transfer:
  input(B) transfers the physical block B to main memory.
  output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.
Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept.
  Ti's local copy of a data item X is called xi.
Assumption: each data item fits in, and is stored inside, a single block.

Data Access (Cont.)
A transaction transfers data items between system buffer blocks and its private work-area using the following operations:
  read(X) assigns the value of data item X to the local variable xi.
  write(X) assigns the value of local variable xi to data item X in the buffer block.
  Both these commands may necessitate the issue of an input(BX) instruction before the assignment, if the block BX in which X resides is not already in memory.
Transactions:
  Perform read(X) while accessing X for the first time; all subsequent accesses are to the local copy.
  After the last access, the transaction executes write(X).
output(BX) need not immediately follow write(X). The system can perform the output operation when it deems fit.
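
The four operations above can be sketched with plain dictionaries. This is only an illustration: the names `disk`, `buffer`, `work_area`, `input_block` and `output_block` are assumptions made for the example, and a block is modeled as a small dictionary of items.

```python
# Hypothetical in-memory model of input/output/read/write (not a real DBMS API).
disk = {"BX": {"X": 1000}}   # physical blocks on disk
buffer = {}                  # buffer blocks in main memory
work_area = {}               # private work area of one transaction


def input_block(block_id):
    """input(B): copy physical block B from disk into the buffer."""
    buffer[block_id] = dict(disk[block_id])


def output_block(block_id):
    """output(B): write buffer block B over the physical block on disk."""
    disk[block_id] = dict(buffer[block_id])


def read(item, block_id):
    """read(X): load X from its buffer block into the local copy."""
    if block_id not in buffer:
        input_block(block_id)
    work_area[item] = buffer[block_id][item]


def write(item, block_id):
    """write(X): store the local copy of X into its buffer block."""
    if block_id not in buffer:
        input_block(block_id)
    buffer[block_id][item] = work_area[item]


read("X", "BX")
work_area["X"] -= 50                          # update the local copy only
write("X", "BX")
buffer_before_output = buffer["BX"]["X"]      # buffer already holds the new value
disk_before_output = disk["BX"]["X"]          # disk still holds the old value
output_block("BX")                            # the system outputs when it sees fit
```

The gap between `write` and `output_block` is exactly the window in which a crash loses the update, which is what the recovery schemes that follow have to cover.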
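
[Figure: data access example. Labels: Buffer Block A and Buffer Block B in the memory buffer; input(A) and output(B) between disk and buffer; read(X) and write(Y) between the buffer and the work area of T1 (x1, y1) and the work area of T2 (x2).]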

Recovery and Atomicity (Cont.)
To ensure atomicity, first output information about modifications to stable storage without modifying the database itself.
We study two approaches: log-based recovery and shadow-paging.

Log-Based Recovery
Simplifying assumptions:
  Transactions run serially
  Logs are written directly on the stable storage
Log: a sequence of log records; maintains a record of update activities on the database. (Write Ahead Log, W.A.L.)
Log records for transaction Ti:
  <Ti start>
  <Ti, X, V1, V2>   (V1 is the old value of X, V2 the new value)
  <Ti commit>
Two approaches using logs:
  Deferred database modification
  Immediate database modification

Log example
Transaction T1        Log
                      <T1, start>
Read(A)
A = A - 50
Write(A)              <T1, A, 1000, 950>
Read(B)
B = B + 50
Write(B)              <T1, B, 2000, 2050>
                      <T1, commit>
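
As a rough sketch, the log in this example could be produced by appending a record before each write takes effect. The tuple shapes `("start", T)`, `("update", T, X, old, new)` and `("commit", T)` and the helper `logged_write` are assumptions chosen for illustration, not a prescribed format.

```python
# Hypothetical record shapes: ("start", T), ("update", T, X, old, new), ("commit", T).
log = []
db = {"A": 1000, "B": 2000}   # stand-in for the buffered database state


def logged_write(tid, item, new_value):
    """Append <Ti, X, V1, V2> to the log before changing the item."""
    log.append(("update", tid, item, db[item], new_value))
    db[item] = new_value


log.append(("start", "T1"))
logged_write("T1", "A", db["A"] - 50)   # Write(A) after A = A - 50
logged_write("T1", "B", db["B"] + 50)   # Write(B) after B = B + 50
log.append(("commit", "T1"))
```

Appending to the log first and only then changing the value mirrors the write-ahead rule: the record describing a change exists before the change itself.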

Deferred Database Modification
Ti starts: write a <Ti start> record to the log.
Ti write(X): write <Ti, X, V> to the log, where V is the new value for X.
  The write is deferred.
  Note: the old value is not needed for this scheme.
Ti partially commits: write <Ti commit> to the log.
DB updates are then performed by reading and executing the log records from <Ti start> to <Ti commit>.

Deferred Database Modification
How to use the log for recovery after a crash?
Redo Ti if both <Ti start> and <Ti commit> are in the log.
Crashes can occur while the transaction is executing the original updates, or while recovery action is being taken.
Example transactions T0 and T1 (T0 executes before T1):
  T0: read(A)            T1: read(C)
      A := A - 50            C := C - 100
      write(A)               write(C)
      read(B)
      B := B + 50
      write(B)

Deferred Database Modification (Cont.)
Below we show the log as it appears at three instances of time.
  (a)                  (b)                  (c)
  <T0, start>          <T0, start>          <T0, start>
  <T0, A, 950>         <T0, A, 950>         <T0, A, 950>
  <T0, B, 2050>        <T0, B, 2050>        <T0, B, 2050>
                       <T0, commit>         <T0, commit>
                       <T1, start>          <T1, start>
                       <T1, C, 600>         <T1, C, 600>
                                            <T1, commit>
What is the correct recovery action in each case?
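
Applying the redo rule to the three logs gives: (a) no action, since T0 has no commit record and the deferred scheme never touched the database; (b) redo(T0) only; (c) redo(T0) and redo(T1). A minimal sketch of that rule follows. The tuple record shapes and the function name `recover_deferred` are assumptions for the example, and the starting values A = 1000, B = 2000, C = 700 are inferred from the arithmetic in the example transactions.

```python
def recover_deferred(log, db):
    """Redo every transaction whose commit record is in the log; ignore the rest."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                         # forward scan, in log order
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new_value = rec
            db[item] = new_value            # assignment, so repeating it is harmless
    return db


t0 = [("start", "T0"), ("write", "T0", "A", 950), ("write", "T0", "B", 2050)]
t1 = [("start", "T1"), ("write", "T1", "C", 600)]
log_a = t0
log_b = t0 + [("commit", "T0")] + t1
log_c = log_b + [("commit", "T1")]

initial = {"A": 1000, "B": 2000, "C": 700}
result_a = recover_deferred(log_a, dict(initial))
result_b = recover_deferred(log_b, dict(initial))
result_c = recover_deferred(log_c, dict(initial))
```

Because redo assigns the logged value rather than recomputing it, a crash during recovery is handled by simply running the same procedure again.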

Immediate Database Modification
Database updates of an uncommitted transaction are allowed.
Tighter logging rules are needed to ensure transactions are undoable.
Log records must be of the form <Ti, X, Vold, Vnew>.
The log record must be written before the database item is written.
Output of DB blocks can occur:
  Before or after commit
  In any order

Immediate Database Modification (Cont.)
Recovery procedure:
Undo(Ti) if <Ti start> is in the log but <Ti commit> is not.
  Undo restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti.
Redo(Ti) if <Ti start> and <Ti commit> are both in the log.
  Redo sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti.
Both operations must be idempotent: even if the operation is executed multiple times, the effect is the same as if it is executed once.

Immediate Database Modification Example
Log                     Write
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                        A = 950
                        B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                        C = 600
<T1 commit>
Output: BB, BC, BA
Note: BX denotes the block containing X.

Immediate Modification Recovery Example
  (a)                    (b)                    (c)
  <T0, start>            <T0, start>            <T0, start>
  <T0, A, 1000, 950>     <T0, A, 1000, 950>     <T0, A, 1000, 950>
  <T0, B, 2000, 2050>    <T0, B, 2000, 2050>    <T0, B, 2000, 2050>
                         <T0, commit>           <T0, commit>
                         <T1, start>            <T1, start>
                         <T1, C, 700, 600>      <T1, C, 700, 600>
                                                <T1, commit>
Recovery actions in each case above are:
(a) undo(T0): B is restored to 2000 and A to 1000.
(b) undo(T1) and redo(T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively.
(c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively. Then C is set to 600.
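
The undo and redo rules for immediate modification can be sketched as a backward pass followed by a forward pass. The record shapes `("update", T, X, old, new)` and the name `recover_immediate` are illustrative assumptions, and the "at crash" database states are just one possibility, since blocks may be output in any order.

```python
def recover_immediate(log, db):
    """Undo transactions without a commit record, then redo the committed ones."""
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    incomplete = started - committed

    for rec in reversed(log):               # undo: backwards, restore old values
        if rec[0] == "update" and rec[1] in incomplete:
            _, _, item, old_value, _ = rec
            db[item] = old_value

    for rec in log:                         # redo: forwards, install new values
        if rec[0] == "update" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value
    return db


t0 = [("start", "T0"), ("update", "T0", "A", 1000, 950),
      ("update", "T0", "B", 2000, 2050)]
t1 = [("start", "T1"), ("update", "T1", "C", 700, 600)]
log_a = t0
log_b = t0 + [("commit", "T0")] + t1
log_c = log_b + [("commit", "T1")]

# Assumed on-disk states at crash time (only some blocks were output).
result_a = recover_immediate(log_a, {"A": 950, "B": 2000, "C": 700})
result_b = recover_immediate(log_b, {"A": 950, "B": 2050, "C": 600})
result_c = recover_immediate(log_c, {"A": 1000, "B": 2050, "C": 700})
```

Both passes assign absolute values taken from the log, which is what makes them idempotent: running the procedure a second time leaves the database unchanged.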

Checkpoints
Problems in the recovery procedure as discussed earlier:
1. Searching the entire log is time-consuming.
2. We might unnecessarily redo transactions which have already output their updates to the database.
How to avoid redundant redoes?
Put marks in the log indicating that at that point the DB and log are consistent: a checkpoint!

Checkpoints
At a checkpoint:
  Quiesce system operation.
  Output all log records currently residing in main memory onto stable storage.
  Output all modified buffer blocks to the disk.
  Write a log record <checkpoint> onto stable storage.

Checkpoints (Cont.)
Recovering from a log with checkpoints:
1. Scan backwards from the end of the log to find the most recent <checkpoint> record.
2. Continue scanning backwards till a record <Ti start> is found.
3. Need only consider the part of the log following the above start record. Why?
4. After that, recover from the log with the rules that we had before.
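
One way to see why step 3 holds: with serial execution, every transaction that started before the <Ti start> record found in step 2 had already finished before the checkpoint, and the checkpoint forced its updates to disk. The backward scan itself might look like the sketch below, where the tuple record shapes and the function name `log_suffix_for_recovery` are assumptions made for illustration.

```python
def log_suffix_for_recovery(log):
    """Return the part of the log that recovery must examine (serial execution)."""
    checkpoint_at = None
    for i in range(len(log) - 1, -1, -1):        # step 1: scan backwards
        if log[i][0] == "checkpoint":
            checkpoint_at = i
            break
    if checkpoint_at is None:
        return log                               # no checkpoint: whole log is needed
    for i in range(checkpoint_at - 1, -1, -1):   # step 2: keep scanning backwards
        if log[i][0] == "start":
            return log[i:]                       # step 3: from this start record on
    return log[checkpoint_at:]                   # no transaction started before it


log = [
    ("start", "T1"), ("update", "T1", "A", 1000, 950), ("commit", "T1"),
    ("start", "T2"), ("update", "T2", "B", 2000, 2050),
    ("checkpoint",),
    ("update", "T2", "C", 700, 600),
]
suffix = log_suffix_for_recovery(log)   # T1's records are skipped entirely
```

The returned suffix is then fed to the usual undo/redo rules (step 4), so the cost of recovery depends on the distance to the last checkpoint rather than on the length of the whole log.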

Example of Checkpoints
[Figure: timeline with a checkpoint at time Tc and a system failure at time Tf, showing transactions T1, T2, T3 and T4.]
T1 can be ignored (updates already output to disk due to checkpoint).
T2 and T3 redone.
T4 undone.

Shadow Paging
Shadow paging: alternative to log-based recovery; works mainly for serial execution of transactions.
Keeps clean data (the shadow pages) untouched during the transaction (in stable storage).
Writes to a copy of the data.
Replaces the shadow page only when the transaction is committed and output to the disk.

Shadow Paging
Maintain two page tables during the lifetime of a transaction: the current page table and the shadow page table.
Store the shadow page table in nonvolatile storage.
  The shadow page table is never modified during execution.
To start with, both page tables are identical. Only the current page table is used for data item accesses during execution of the transaction.
Whenever any page is about to be written for the first time:
  A copy of this page is made onto an unused page.
  The current page table is then made to point to the copy.
  The update is performed on the copy.
Sample Page Table

Example of Shadow Paging
[Figure: shadow and current page tables after write to page 4.]

Shadow Paging
To commit a transaction:
1. Flush all modified pages in main memory to disk.
2. Output the current page table to disk.
3. Make the current page table the new shadow page table, as follows:
   Keep a pointer to the shadow page table at a fixed (known) location on disk.
   To make the current page table the new shadow page table, simply update the pointer to point to the current page table on disk.
Once the pointer to the shadow page table has been written, the transaction is committed.
No recovery is needed after a crash! New transactions can start right away, using the shadow page table.
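
A toy model of the copy-on-first-write rule and the pointer-swap commit is sketched below. The class `ShadowPagedStore` and its methods are hypothetical names, a page holds a single value, and replacing `shadow_table` in `commit` stands in for the single pointer update on disk.

```python
class ShadowPagedStore:
    """Toy shadow-paging model; the page tables map logical to physical pages."""

    def __init__(self, pages):
        self.pages = dict(enumerate(pages))          # physical page -> contents
        self.shadow_table = list(range(len(pages)))  # never modified during a transaction
        self.current_table = list(self.shadow_table)
        self.next_free = len(pages)                  # next unused physical page

    def read(self, logical):
        return self.pages[self.current_table[logical]]

    def write(self, logical, value):
        if self.current_table[logical] == self.shadow_table[logical]:
            # First write to this page: copy it onto an unused page and
            # point the current page table at the copy.
            copy = self.next_free
            self.next_free += 1
            self.pages[copy] = self.pages[self.shadow_table[logical]]
            self.current_table[logical] = copy
        self.pages[self.current_table[logical]] = value   # update the copy only

    def commit(self):
        # Stand-in for updating the on-disk pointer to the new page table.
        self.shadow_table = list(self.current_table)

    def crash(self):
        # After a crash only the shadow page table is trusted.
        self.current_table = list(self.shadow_table)


store = ShadowPagedStore(["a", "b", "c"])
store.write(1, "B")
seen_before_crash = store.read(1)     # the transaction sees its own update
store.crash()
seen_after_crash = store.read(1)      # uncommitted update is gone, no recovery step
store.write(1, "B2")
store.commit()
store.crash()
seen_after_commit = store.read(1)     # committed update survives the crash
```

The sketch never reclaims the orphaned copies, which hints at the garbage and fragmentation costs of the scheme.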

Shadow Paging
Advantages:
  No overhead of writing log records
  Recovery is trivial
Disadvantages:
  Copying the entire page table is very expensive
  Data gets fragmented
  Hard to extend for concurrent transactions

Recovery With Concurrent Transactions
To permit concurrency:
  All transactions share a single disk buffer and a single log.
  Concurrency control: Strict 2PL, i.e. release exclusive locks only after commit.
Logging is done as described earlier.
The checkpointing technique and actions taken on recovery have to be changed (based on ARIES), since several transactions may be active when a checkpoint is performed.

Recovery With Concurrent Transactions (Cont.)
Checkpoints for concurrent transactions: <checkpoint L>
  L: the list of transactions active at the time of the checkpoint
  We assume no updates are in progress while the checkpoint is carried out.
Recovery for concurrent transactions, 3 phases:
ANALYSIS
1. Initialize undo-list and redo-list to empty.
2. Scan the log backwards from the end, stopping when the first <checkpoint L> record is found. For each record found during the backward scan:
   if the record is <Ti commit>, add Ti to redo-list
   if the record is <Ti start>, then if Ti is not in redo-list, add Ti to undo-list
3. For every Ti in L, if Ti is not in redo-list, add Ti to undo-list.

Recovery With Concurrent Transactions
UNDO
  Scan the log backwards from the end.
  Perform undo(T) for every transaction T in undo-list.
  Stop when you have seen <T start> for every T in undo-list.
REDO
  Locate the most recent <checkpoint L> record.
  Scan the log forwards from the <checkpoint L> record till the end of the log.
  Perform redo for each log record that belongs to a transaction on redo-list.

Example of Recovery
Log:
  <T0 start>
  <T0, A, 0, 10>
  <T0 commit>
  <T1 start>
  <T1, B, 0, 10>
  <T2 start>
  <T2, C, 0, 10>
  <T2, C, 10, 20>
  <checkpoint {T1, T2}>
  <T3 start>
  <T3, A, 10, 20>
  <T3, D, 0, 10>
  <T3 commit>
Redo-list {T3}
Undo-list {T1, T2}
Undo: set C to 10, set C to 0, set B to 0
Redo: set A to 20, set D to 10

DB            A    B    C    D
Initial       0    0    0    0
At crash      20   10   20   10
After rec.    20   0    0    10
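
The three phases (analysis, undo, redo) can be checked against this example with a short sketch. The tuple record shapes and the function name `recover_concurrent` are assumptions made for illustration; a checkpoint record is modeled as `("checkpoint", [active transactions])`.

```python
def recover_concurrent(log, db):
    """Analysis, undo and redo passes over a log with <checkpoint L> records."""
    redo_list, undo_list = set(), set()
    checkpoint_at, active_at_checkpoint = 0, []

    # ANALYSIS: scan backwards until the most recent checkpoint record.
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] == "commit":
            redo_list.add(rec[1])
        elif rec[0] == "start" and rec[1] not in redo_list:
            undo_list.add(rec[1])
        elif rec[0] == "checkpoint":
            checkpoint_at, active_at_checkpoint = i, rec[1]
            break
    for tid in active_at_checkpoint:
        if tid not in redo_list:
            undo_list.add(tid)

    # UNDO: scan backwards until <T start> has been seen for every T in undo-list.
    pending = set(undo_list)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] == "update" and rec[1] in undo_list:
            db[rec[2]] = rec[3]            # restore the old value
        elif rec[0] == "start":
            pending.discard(rec[1])

    # REDO: scan forwards from the checkpoint record to the end of the log.
    for rec in log[checkpoint_at:]:
        if rec[0] == "update" and rec[1] in redo_list:
            db[rec[2]] = rec[4]            # install the new value
    return db, redo_list, undo_list


log = [
    ("start", "T0"), ("update", "T0", "A", 0, 10), ("commit", "T0"),
    ("start", "T1"), ("update", "T1", "B", 0, 10),
    ("start", "T2"), ("update", "T2", "C", 0, 10), ("update", "T2", "C", 10, 20),
    ("checkpoint", ["T1", "T2"]),
    ("start", "T3"), ("update", "T3", "A", 10, 20), ("update", "T3", "D", 0, 10),
    ("commit", "T3"),
]
at_crash = {"A": 20, "B": 10, "C": 20, "D": 10}
recovered, redo_list, undo_list = recover_concurrent(log, dict(at_crash))
```

T0 appears in neither list: it committed before the checkpoint, so its update to A was already forced to disk and the analysis pass never reaches its records.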

Remote Backup Systems
Remote backup systems provide high availability by allowing transaction processing to continue even if the primary site is destroyed.

Remote Backup Systems (Cont.)
Detection of failure: the backup site must detect when the primary site has failed.
  To distinguish primary site failure from link failure, maintain several communication links between the primary and the remote backup.
  Heart-beat messages
Transfer of control:
  To take over control, the backup site first performs recovery using its copy of the database and all the log records it has received from the primary. Thus, completed transactions are redone and incomplete transactions are rolled back.
  When the backup site takes over processing, it becomes the new primary.
  To transfer control back to the old primary when it recovers, the old primary must receive redo logs from the old backup and apply all updates locally.
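
The slide does not spell out a detection rule, so the following is only one plausible reading of how several links and heart-beat messages combine: the backup suspects the primary only when every link has been silent for longer than a timeout. The function name, the link names and the timeout value are all assumptions.

```python
def primary_has_failed(last_heartbeat, now, timeout):
    """Suspect the primary only if every link has been silent longer than timeout.

    last_heartbeat maps a link name to the time its latest heart-beat arrived.
    """
    if not last_heartbeat:
        return False          # no links configured: nothing can be concluded
    return all(now - seen > timeout for seen in last_heartbeat.values())


# One link is down but the other still delivers heart-beats: a link failure.
link_failure = primary_has_failed({"link1": 100.0, "link2": 91.0}, now=101.0, timeout=5.0)
# Every link has gone silent: treat it as a primary site failure.
site_failure = primary_has_failed({"link1": 90.0, "link2": 91.0}, now=101.0, timeout=5.0)
```

A single silent link therefore never triggers takeover on its own, which is the point of maintaining more than one link.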

Remote Backup Systems (Cont.)
Time to recover: to reduce delay in takeover, the backup site periodically processes the redo log records (in effect, performing recovery from the previous database state), performs a checkpoint, and can then delete earlier parts of the log.
Hot-spare configuration permits very fast takeover:
  The backup continually processes redo log records as they arrive, applying the updates locally.
  When failure of the primary is detected, the backup rolls back incomplete transactions and is ready to process new transactions.
Alternative to remote backup: distributed database with replicated data.
  Remote backup is faster and cheaper, but less tolerant to failure (more on this later).