CS122 Lecture 18 Winter Term, All hail the WAL-Rule!

CS122 Lecture 18 Winter Term, 2014-2015 All hail the WAL-Rule!

2 Write-Ahead Logging
- Last time, introduced write-ahead logging (WAL) as a mechanism to provide atomic, durable transactions
- Log records: <T_i: start>, <T_i: commit>, <T_i: abort>
- Updates: <T_i, X_j, V_1, V_2>
  - Records that transaction T_i changed data item X_j from V_1 to V_2
- Redo-only updates: <T_i, X_j, V>
  - Records that data item X_j was rolled back to value V
- Must always follow the write-ahead logging rule:
  - All database state changes must be recorded to the log on disk before any table files are changed

3 Write-Ahead Logging (2)
- Recovery processing involves two phases
- Redo phase:
  - Replay all state changes recorded in the write-ahead log
  - Find the set of all incomplete transactions
- Undo phase:
  - Roll back all incomplete transactions, updating the log as with a normal transaction rollback
- Once recovery is completed:
  - All table files are in sync with the write-ahead log
  - All transactions are in a completed state
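The two phases above can be sketched in a few lines of Python. This is only an illustration, not the lecture's implementation: the tuple layout of the log records, the in-memory `db` dictionary, and the function name `recover` are all assumptions made for the example.

```python
def recover(log, db):
    """Redo every logged change, then roll back incomplete transactions.

    Assumed record shapes (hypothetical, for illustration only):
      ("start", txn), ("commit", txn), ("abort", txn),
      ("update", txn, item, old_value, new_value),
      ("redo_only", txn, item, value)
    """
    original = list(log)          # snapshot: undo appends new records to log
    incomplete = set()

    # Redo phase: replay all state changes and track incomplete transactions.
    for rec in original:
        kind, txn = rec[0], rec[1]
        if kind == "start":
            incomplete.add(txn)
        elif kind in ("commit", "abort"):
            incomplete.discard(txn)
        elif kind == "update":
            db[rec[2]] = rec[4]
        elif kind == "redo_only":
            db[rec[2]] = rec[3]

    # Undo phase: scan backward, restoring old values of incomplete txns.
    for rec in reversed(original):
        if not incomplete:
            break
        kind, txn = rec[0], rec[1]
        if txn not in incomplete:
            continue
        if kind == "update":
            db[rec[2]] = rec[3]
            log.append(("redo_only", txn, rec[2], rec[3]))
        elif kind == "start":
            log.append(("abort", txn))
            incomplete.discard(txn)
    return db


log = [
    ("start", "T1"), ("update", "T1", "A", 100, 50),
    ("update", "T1", "B", 40, 90), ("commit", "T1"),
    ("start", "T2"), ("update", "T2", "C", 350, 250),
]
db = recover(log, {"A": 100, "B": 40, "C": 350})
```

Because the sketch logs whole old and new values, re-undoing an update that was already compensated before the crash is redundant but harmless.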

4 Recovery Processing
- The write-ahead log records every transaction, and every state change performed against the database
- Recovery processing scans this entire log file
  - Unless the database is crashing all the time (unlikely), most transactions in the log file won't actually require recovery!
- Introduce a checkpoint mechanism that allows us to pinpoint where to start recovery processing from
- Goal:
  - As with recovery, the checkpoint procedure should bring all table files into sync with the write-ahead log
  - Extent of write-ahead log required for recovery processing will be constrained by the most recent checkpoint

5 Checkpoint Procedure
- Constraint: while a checkpoint is being performed, no other update operations are allowed
  - Imposes a small but definite performance impact
- Checkpoint procedure (each step performed in order):
  1. Output all log records currently in memory to disk, and sync the log file
  2. Output all modified table pages from memory to disk, and sync the table files as well
  3. Output a log record of the form <checkpoint L>, where L is the set of all active transactions at the time of the checkpoint

6 Checkpoint Records
- When we see a <checkpoint L> record in the log:
  - The table state on disk reflects all changes recorded in the write-ahead log on disk, up to that checkpoint record
- Once the checkpoint is performed, can the database ignore all log records before the <checkpoint L> record?
- NO: If one of the transactions in L was not completed before recovery, it must be aborted during recovery
- Database must keep all logs up to the earliest <T_i: start> record, over all transactions T_i that appear in the set L
  - Logs before this earliest <T_i: start> record may be discarded
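As a small illustration of the retention rule above, the oldest log position that must be kept can be computed from the checkpoint's set L. The function name `truncation_lsn` and the `start_lsn` lookup table are hypothetical; they assume each record is identified by a numeric log position.

```python
def truncation_lsn(checkpoint_lsn, active_txns, start_lsn):
    """Return the oldest log position that must be retained.

    active_txns: the set L from <checkpoint L>.
    start_lsn:   assumed map from transaction ID to the position of its
                 <T_i: start> record.
    Records strictly before the returned position may be discarded.
    """
    candidates = [start_lsn[t] for t in active_txns]
    candidates.append(checkpoint_lsn)   # with L empty, keep from the checkpoint
    return min(candidates)


keep_from = truncation_lsn(500, {"T40", "T41"},
                           {"T37": 257, "T40": 262, "T41": 410})
```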

7 Checkpoint Recovery
- Recovery processing changes slightly with the new checkpoint mechanism
- Database must initially scan backward through logs, to find the last checkpoint record
- Perform redo phase first, as before
  - This time, the set of incomplete transactions is initialized to contain all txns in L specified in the <checkpoint L> record
- Can ignore all redo operations before this checkpoint:
  - All logs were flushed, then all modified table pages were flushed, before the <checkpoint L> record was output
  - (May still need to undo operations before the checkpoint...)

8 Checkpoint Recovery (2)
- Second, perform the undo phase:
  - At end of redo phase, the set of incomplete transactions will specify which transactions must be rolled back
- Undo processing doesn't change at all
  - Traverse the transaction log backwards, undoing state changes of transactions in the incomplete set
  - For a transaction T_i in the incomplete set, remove it from the set when a <T_i: start> record is reached
- Don't forget:
  - Txns in set L from the <checkpoint L> record may also need to be rolled back; their records will be before the checkpoint record
  - Just scan backward in logs until the incomplete set is empty

9 Checkpoint Performance
- During a checkpoint, no other updates are allowed
- Example: A transaction is updating every record in a very large table
  - Lots of dirty buffer pages in memory, logs being generated!
  - Changes aren't required to be flushed to disk until commit...
  - ...and then a checkpoint occurs.
  - All in-memory write-ahead log records must be output
  - All dirty table pages must be output
- During this time, all other write operations will be blocked until the checkpoint completes
  - Even small transactions that only need to update one value!
- If a lot of data must be output, this will take some time

10 Checkpoint Performance (2)
- Can perform checkpoints based on how many logs (or dirty table pages) have accumulated in memory
  - Impose an upper bound on the time a checkpoint takes
- Can also implement fuzzy checkpoints
  - Allows transactions to perform updates even during the fuzzy checkpoint procedure
- Modify the checkpoint procedure slightly:
  - Still require that updates are blocked while WAL records are output to disk, and while the <checkpoint> record is written
  - Allow modified table pages to be written to disk after the <checkpoint> record has been written
  - After <checkpoint> is written, allow some updates to proceed

11 Fuzzy Checkpoints
- Fuzzy checkpoint procedure (performed in order):
  1. Output all write-ahead log records currently in memory to disk, and sync the log file
  2. Output a log record of the form <checkpoint L>, where L is the set of all active transactions at the time of the checkpoint
     (This used to be done after writing all table files to disk...)
  3. Output all modified table pages from memory to disk, and sync the table files as well
- Checkpoint record isn't quite as strong now
  - The <checkpoint L> record no longer guarantees that table files are in sync with log records up to that point

12 Fuzzy Checkpoints (2)
- When a fuzzy checkpoint is performed:
  - The <checkpoint L> record no longer guarantees that table files are in sync with log records up to that point
  - Checkpoint is still incomplete at the time the record is written to the log!
- Database maintains an additional value on the disk:
  - A last-checkpoint value, recording the record location of the last successfully completed checkpoint in the log file
  - (Also want to avoid having to scan backward through the log just to find the most recent checkpoint record...)

13 Fuzzy Checkpoints (3)
- Database maintains a last-checkpoint value on disk
- When the <checkpoint L> record is written, DB must collect all dirty table pages at the time of the checkpoint
  - Iterate through this collection, writing each page to disk
- When all table pages have been successfully written, update last-checkpoint to point to the most recent <checkpoint> record
- Database can choose the order in which it writes dirty pages out to disk
  - e.g. write them in sequential order for very fast output

14 Fuzzy Checkpoints (4)
- The database must still follow the write-ahead logging rule
- Example:
  - During a fuzzy checkpoint, a dirty buffer page B is collected to be output to disk...
  - ...then, a transaction wants to write to the data in page B
  - Cannot allow this write until after page B is written to disk
  - Otherwise, will violate semantics of the checkpoint procedure
- With fuzzy checkpoints, txns will still block on individual pages being output during the fuzzy checkpoint process
  - Ideally this will be infrequent, and the delay will be short

15 Long-Running Transactions
- Long-running transactions still cause issues:
- A checkpoint marks a point in time when table files and write-ahead log are in sync with each other, on disk
  - Don't need to redo anything before the checkpoint
  - May still need to undo operations before the checkpoint, if a transaction extended well before that point
- As described, the write-ahead logging mechanism doesn't handle this situation very well
  - May still need to traverse the log far into the past, just to undo operations before the checkpoint
  - (In fact, undos may already be reflected in the table files!)

16 ARIES Logging/Recovery
- ARIES: Algorithms for Recovery and Isolation Exploiting Semantics
  - A no-force, steal logging/recovery system
  - Designed at IBM in 1992
  - Used in IBM DB2, MS SQL Server, NTFS, and many others
- Has many transaction-processing features, such as:
  - Row-level locking instead of page-level locking
  - Fuzzy checkpoints that allow updates during checkpoint
  - Nested transactions and savepoints
  - Parallel recovery processing
- Will do a very basic overview of ARIES mechanisms
  - (See the research papers for all the details!)

17 Log Sequence Numbers
- Many features in ARIES depend on assigning each log record a unique log sequence number (LSN)
  - Can simply use the record's offset in the log file
  - Can refer to individual log records using their LSNs
- Write-ahead logging can generate very large logs
  - Place an upper bound on log file sizes
  - When the current log file hits the upper bound, begin another file
  - Assign each log file its own number
- Fully specified LSN would be logfile_number.offset
  - Within a single log file, just use a record's offset for the LSN
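A minimal sketch of the logfile_number.offset scheme, assuming a fixed maximum log file size. The names `LSN`, `allocate`, and `max_file_size` are illustrative choices for this example, not part of ARIES or of any particular database.

```python
from typing import NamedTuple


class LSN(NamedTuple):
    """Fully specified LSN: (log file number, byte offset within that file)."""
    file_number: int
    offset: int


def allocate(tail, record_size, max_file_size):
    """Pick the LSN for a new record and return (record_lsn, new_tail).

    tail is the position just past the last record written. If the record
    would push the current file past max_file_size, start the next file.
    """
    if tail.offset + record_size > max_file_size:
        tail = LSN(tail.file_number + 1, 0)
    return tail, LSN(tail.file_number, tail.offset + record_size)


first, tail = allocate(LSN(3, 1000), 18, 1024)
second, tail = allocate(tail, 18, 1024)
```

Because tuples compare element by element, LSNs built this way order correctly across log files as well as within one.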

18 ARIES Log Records
- Example: multiple concurrent transactions
  - T37: Transfer $50 from account A to B (committed)
  - T40: Transfer $100 from account C to D (active)
- Each record has an (implicit) log sequence number
  - Not stored; can infer from the record's position in the file
- All log records contain a PrevLSN field, specifying the LSN for the previous record for that transaction
  - Records store the LSN of the previous log record in the same transaction

  LSN  Record             PrevLSN
  257  T37: start
  262  T40: start
  267  T40: C, 350, 250   262
  285  T37: A, 100, 50    257
  303  T37: B, 40, 90     285
  321  T40: D, 100, 200   267
  339  T37: commit        303

19 ARIES Log Records (2)
- PrevLSN field makes it very easy to trace through logs for a specific transaction
- Continuing example: Abort T40, transfer from C to D
  - Need to roll back all state changes in T40
  - Start with the last log record recorded for T40: LSN 321
  - DB also tracks the most recent LSN for every active transaction in a table
- As before, ARIES uses compensation log records (CLRs) during rollback
  - These records have an extra field UndoNextLSN, specifying the next operation to undo in the transaction

  LSN  Record             PrevLSN  UndoNextLSN
  257  T37: start
  262  T40: start
  267  T40: C, 350, 250   262
  285  T37: A, 100, 50    257
  303  T37: B, 40, 90     285
  321  T40: D, 100, 200   267
  339  T37: commit        303
  344  T40: D, 100        321      267

20 ARIES Log Records (3)
- Continuing example: Abort T40, transfer from C to D
  - After performing the compensating action for LSN 321, roll back LSN 267
  - As before, continue the process until reaching T40's start log record
- Using PrevLSN and UndoNextLSN entries, ARIES can provide very fast rollback, even during periods of heavy concurrent usage

  LSN  Record             PrevLSN  UndoNextLSN
  257  T37: start
  262  T40: start
  267  T40: C, 350, 250   262
  285  T37: A, 100, 50    257
  303  T37: B, 40, 90     285
  321  T40: D, 100, 200   267
  339  T37: commit        303
  344  T40: D, 100        321      267
  362  T40: C, 350        344      262
  380  T40: abort         362
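The rollback of T40 can be traced with a short sketch that follows the PrevLSN chain and writes one CLR per undone update. The dictionary-based record layout, the `rollback` function, and the `new_lsns` iterator (which simply hands out the LSNs 344, 362 and 380 from the example) are assumptions made for illustration.

```python
def rollback(txn, log, last_lsn, db, new_lsns):
    """Undo txn by walking PrevLSN / UndoNextLSN links, logging CLRs."""
    undo_lsn = last_lsn[txn]
    while undo_lsn is not None:
        rec = log[undo_lsn]
        if rec["kind"] == "update":
            db[rec["item"]] = rec["old"]              # compensating action
            clr_lsn = next(new_lsns)
            log[clr_lsn] = {
                "kind": "clr", "txn": txn,
                "item": rec["item"], "value": rec["old"],
                "prev": last_lsn[txn],                # PrevLSN
                "undo_next": rec["prev"],             # UndoNextLSN
            }
            last_lsn[txn] = clr_lsn
            undo_lsn = rec["prev"]
        elif rec["kind"] == "clr":
            undo_lsn = rec["undo_next"]               # skip work already undone
        else:                                         # start record reached
            undo_lsn = None
    abort_lsn = next(new_lsns)
    log[abort_lsn] = {"kind": "abort", "txn": txn, "prev": last_lsn[txn]}
    last_lsn[txn] = abort_lsn


log = {
    262: {"kind": "start", "txn": "T40", "prev": None},
    267: {"kind": "update", "txn": "T40", "item": "C",
          "old": 350, "new": 250, "prev": 262},
    321: {"kind": "update", "txn": "T40", "item": "D",
          "old": 100, "new": 200, "prev": 267},
}
db = {"C": 250, "D": 200}
last_lsn = {"T40": 321}
rollback("T40", log, last_lsn, db, iter([344, 362, 380]))
```

If a rollback is interrupted and restarted, the `clr` branch lets it jump straight to the next operation still to be undone instead of compensating the same update twice.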

21 ARIES Data Pages
- Every data page is also given a PageLSN field
  - Specifies the most recent write-ahead log record that has been applied to the page
- When pages are flushed to disk, PageLSN on disk is also updated (obvious)
- DB buffers data pages in memory
  - PageLSN value in the page on disk may differ from the value in memory
  - Simply indicates that the data page on disk doesn't yet have all updates applied
  - Similarly, if PageLSN values match, we know the disk page is up to date

  Volatile memory:     Page 1: PageLSN 303, A = 50, B = 90     Page 2: PageLSN 362, C = 350, D = 100
  Nonvolatile memory:  Page 1: PageLSN 303, A = 50, B = 90     Page 2: PageLSN 321, C = 250, D = 200

22 ARIES Data Pages (2)
- PageLSN field specifies the most recent write-ahead log record that has been applied to the page
- During recovery processing:
  - For a given page, only apply records with an LSN larger than the page's current PageLSN
  - Unnecessary to apply earlier records; data page on disk already reflects the earlier updates
- ARIES also has some update records that can only be applied once!
  - Applying to a page multiple times produces incorrect results
  - PageLSN value is essential to maintain idempotence of recovery processing

  Volatile memory:     Page 1: PageLSN 303, A = 50, B = 90     Page 2: PageLSN 362, C = 350, D = 100
  Nonvolatile memory:  Page 1: PageLSN 303, A = 50, B = 90     Page 2: PageLSN 321, C = 250, D = 200

23 Dirty Page Table
- ARIES also keeps a dirty page table, recording extra details about all dirty pages in the buffer manager
- Dirty page table entries hold three values:
  - The dirty page's ID (e.g. filename and block number)
  - The current in-memory PageLSN value for the dirty page
  - Also, an additional RecLSN field: set to the current LSN at the time the page became dirty
- When a data page is flushed from the buffer manager back to disk, its entry in the dirty page table is removed

24 Dirty Page Table (2)
- Continuing previous example
- Dirty page table contains one entry
  - Data page 2 is dirty
- PageLSN value is easy. RecLSN is more challenging
  - Page 2 in memory reflects state after aborting T40
  - Page 2 on disk has the two initial writes from T40
  - Page 2 became dirty again from the update recorded in LSN 344

  Dirty Page Table:
  PgID  PageLSN  RecLSN
  2     362      344

  (Figure: the log for LSNs 257 to 380 and the in-memory and on-disk copies of pages 1 and 2, as on the preceding slides.)

25 Dirty Page Table (3)
- What can we discern from the RecLSN value recorded in the dirty page table?
- RecLSN tells us where to start in the log, to bring the disk version of the data page into synchronization with the in-memory version of the page
- Very useful information to have during recovery processing

  Dirty Page Table:
  PgID  PageLSN  RecLSN
  2     362      344

  (Figure: the log for LSNs 257 to 380 and the in-memory and on-disk copies of pages 1 and 2, as on the preceding slides.)

26 ARIES Checkpoints
- ARIES also supports fuzzy checkpoints
- An ARIES checkpoint record includes:
  - A table of all active transactions, including the ID of each transaction, and its LastLSN value: the log sequence number of the last WAL record generated for that transaction
  - The dirty page table, specifying all dirty pages at the time of the checkpoint, along with their PageLSN and RecLSN values
- The fuzzy checkpoint procedure is similar to before:
  - Write and flush all in-memory log records to disk
  - Write and flush the checkpoint record to disk

27 ARIES Checkpoints (2)
- ARIES doesn't require flushing all dirty pages to disk as a part of the fuzzy checkpoint!
  - Makes checkpointing very unintrusive to other transaction-processing operations
- Rather, buffer manager writes out dirty pages on a regular basis, during normal execution
  - Can schedule dirty page writes to minimize disk seeks
- When a dirty page is flushed to disk:
  - Easy to follow the WAL rule, using the PageLSN value stored in each page
  - e.g. when flushing page 2, simply ensure that the write-ahead log is flushed to the record with LSN 362

  Volatile memory:     Page 1: PageLSN 303, A = 50, B = 90     Page 2: PageLSN 362, C = 350, D = 100
  Nonvolatile memory:  Page 1: PageLSN 303, A = 50, B = 90     Page 2: PageLSN 321, C = 250, D = 200
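The page-flush rule can be sketched as follows. The `Log` class, its `flush_to` method, and the dictionary-based buffer pool and disk are hypothetical stand-ins for a real log manager and buffer manager; the point is only the ordering of the two steps.

```python
class Log:
    """Toy log manager that only tracks how far the log is durable."""

    def __init__(self, flushed_lsn=0):
        self.flushed_lsn = flushed_lsn

    def flush_to(self, lsn):
        # Stand-in for forcing log records up to and including lsn to disk.
        if lsn > self.flushed_lsn:
            self.flushed_lsn = lsn


def flush_page(page_id, buffer_pool, disk, log, dirty_pages):
    """Write one dirty page to disk without violating the WAL rule."""
    page = buffer_pool[page_id]
    log.flush_to(page["page_lsn"])        # log first ...
    disk[page_id] = {                     # ... then the data page
        "page_lsn": page["page_lsn"],
        "data": dict(page["data"]),
    }
    dirty_pages.pop(page_id, None)        # page is clean again


log = Log(flushed_lsn=339)
buffer_pool = {2: {"page_lsn": 362, "data": {"C": 350, "D": 100}}}
disk = {2: {"page_lsn": 321, "data": {"C": 250, "D": 200}}}
dirty_pages = {2: {"page_lsn": 362, "rec_lsn": 344}}
flush_page(2, buffer_pool, disk, log, dirty_pages)
```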

28 ARIES Recovery
- ARIES recovery processing involves three phases:
  - Analysis phase, redo phase, undo phase
- Analysis phase performs three important tasks:
  - Determine the starting point (LSN) in the write-ahead log where redo processing must commence
  - Determine the set of data pages that must be brought into sync with the write-ahead log state
  - Determine the set of incomplete transactions that must be rolled back
    (In the ARIES paper, these are called loser transactions)

29 Recovery: Analysis Phase
- Checkpoint record contains a copy of the dirty page table
- Use this table to determine the initial value of RedoLSN, the LSN where the redo phase should start
- e.g. given this dirty page table, RedoLSN should be 345:

  Dirty Page Table:
  PgID  PageLSN  RecLSN
  14    403      389
  3     521      484
  9     505      456
  11    398      345
  6     584      577
  7     522      490

- General rule:
  - Set RedoLSN to the smallest value of RecLSN that appears in the dirty page table
  - If the checkpoint's dirty page table is empty, set RedoLSN to the LSN of the checkpoint record
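The general rule is a one-line computation. In this sketch the dirty page table is assumed to be a dictionary mapping page ID to a (PageLSN, RecLSN) pair, and the checkpoint LSN of 600 is a made-up value; only the table contents come from the example above.

```python
def initial_redo_lsn(dirty_page_table, checkpoint_lsn):
    """Smallest RecLSN in the table, or the checkpoint's LSN if it is empty."""
    if not dirty_page_table:
        return checkpoint_lsn
    return min(rec_lsn for _page_lsn, rec_lsn in dirty_page_table.values())


dirty_page_table = {
    14: (403, 389),
    3: (521, 484),
    9: (505, 456),
    11: (398, 345),
    6: (584, 577),
    7: (522, 490),
}
redo_lsn = initial_redo_lsn(dirty_page_table, checkpoint_lsn=600)
```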

30 Recovery: Analysis Phase (2)
- Next, the analysis phase determines the set of dirty pages, and the set of incomplete transactions
- Initial dirty page table, and initial set of incomplete transactions, are both taken from the checkpoint state
- Start scanning forward through the write-ahead log, starting with the checkpoint record (not RedoLSN!)

31 Recovery: Analysis Phase (3)
- Scan forward from the checkpoint record:
- If a <T_i: start> record is seen, add T_i to the set of incomplete transactions, with LastLSN set to the LSN of the record
  - Allows us to know where to start undoing T_i, if we have to
- If a <T_i: commit> or <T_i: abort> record is seen, remove T_i from the set of incomplete transactions
- For T_i update records: <T_i: X_j, V_1, V_2> or <T_i: X_j, V>
  - Update transaction T_i's LastLSN to the LSN of the update record
  - If X_j's data page is not in the dirty page table, add it to the table with RecLSN set to the LSN of the update record, and PageLSN taken from the data page on disk
- When this scan is complete, the analysis phase is complete
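The forward scan can be sketched as below. The record tuples `(lsn, kind, txn, page_id)`, the `disk_page_lsn` lookup, and the sample values are assumptions for illustration; the sketch follows the rules on this slide and leaves out everything else a real ARIES analysis pass handles.

```python
def analysis(records, ckpt_txns, ckpt_dirty_pages, disk_page_lsn):
    """Scan forward from the checkpoint; return (incomplete txns, dirty pages).

    records:          (lsn, kind, txn, page_id) tuples after the checkpoint;
                      page_id is None for start/commit/abort records.
    ckpt_txns:        txn -> LastLSN, copied from the checkpoint record.
    ckpt_dirty_pages: page_id -> {"page_lsn", "rec_lsn"} from the checkpoint.
    disk_page_lsn:    assumed lookup of the PageLSN stored on disk per page.
    """
    txns = dict(ckpt_txns)
    dirty = {pid: dict(entry) for pid, entry in ckpt_dirty_pages.items()}
    for lsn, kind, txn, page_id in records:
        if kind == "start":
            txns[txn] = lsn
        elif kind in ("commit", "abort"):
            txns.pop(txn, None)
        else:                                   # update or redo-only record
            txns[txn] = lsn
            if page_id not in dirty:
                dirty[page_id] = {"page_lsn": disk_page_lsn[page_id],
                                  "rec_lsn": lsn}
    return txns, dirty


records = [
    (400, "start", "T50", None),
    (405, "update", "T50", 7),
    (423, "update", "T40", 2),
    (441, "commit", "T50", None),
]
txns, dirty = analysis(
    records,
    ckpt_txns={"T40": 267},
    ckpt_dirty_pages={2: {"page_lsn": 267, "rec_lsn": 267}},
    disk_page_lsn={7: 310},
)
```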

32 Recovery: Redo Phase
- Database now has an up-to-date dirty page table
- From earlier: ARIES uses each page's PageLSN value to ensure that each update is only applied to a page once
  - Required to make recovery processing idempotent, since some ARIES redo operations may only be applied once
- Redo phase: repeat history!
- For each update record <T_i: X_j, V_1, V_2> or <T_i: X_j, V>:
  - If X_j's page is in the dirty page table then apply the update
  - If not, we know the update already hit the data page on disk; no need to apply the update twice!
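A sketch of the redo scan, combining the dirty-page-table test with the PageLSN comparison described earlier. The record tuples, the `load_page` callback, and the page dictionaries are illustrative assumptions; the sample data reuses the running example, where only page 2 is dirty and its RecLSN is 344.

```python
def redo_phase(records, redo_lsn, dirty_pages, load_page):
    """Repeat history from redo_lsn; return the pages that were loaded.

    records:     (lsn, page_id, item, value) tuples in log order.
    dirty_pages: page IDs present in the dirty page table after analysis.
    load_page:   assumed callback that reads a page from disk on demand.
    """
    loaded = {}
    for lsn, page_id, item, value in records:
        if lsn < redo_lsn or page_id not in dirty_pages:
            continue                      # update is already on disk
        page = loaded.get(page_id)
        if page is None:
            page = loaded[page_id] = load_page(page_id)
        if lsn > page["page_lsn"]:        # apply each update at most once
            page["data"][item] = value
            page["page_lsn"] = lsn
    return loaded


disk = {
    1: {"page_lsn": 303, "data": {"A": 50, "B": 90}},
    2: {"page_lsn": 321, "data": {"C": 250, "D": 200}},
}
records = [
    (267, 2, "C", 250), (285, 1, "A", 50), (303, 1, "B", 90),
    (321, 2, "D", 200), (344, 2, "D", 100), (362, 2, "C", 350),
]
loaded = redo_phase(records, 344, {2}, lambda page_id: disk[page_id])
```

Page 1 is never read at all, which is where the time saving over a full log replay comes from.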

33 Recovery: Redo Phase (2)
- At end of redo phase, database will be in exactly the same state as when the system crash occurred
- The redo phase only loads data pages that need to be updated. No updates are applied multiple times!
  - Makes for a much faster recovery process than before
- After redo phase, there will be incomplete transactions
  - Roll these back in the undo phase

34 Recovery: Undo Phase
- Undo phase must roll back all incomplete transactions
  - At end of analysis phase, had a table of all incomplete transactions, along with the LastLSN of each transaction
- Can roll back these transactions in any order
  - Could roll each transaction back separately, or could roll all transactions back in a single pass
- Each log record specifies the LSN of the previous operation in that txn, so rolling back these operations is easy
- As before, must write Compensation Log Records to the WAL when undoing an update of the form <T_i: X_j, V_1, V_2>!

  LSN  Record             PrevLSN
  257  T37: start
  262  T40: start
  267  T40: C, 350, 250   262
  285  T37: A, 100, 50    257
  303  T37: B, 40, 90     285
  321  T40: D, 100, 200   267
  339  T37: commit        303
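One way to roll back all incomplete transactions in a single backward pass is to always undo the record with the largest outstanding LSN, using a max-heap seeded with each transaction's LastLSN. The sketch below is an assumption-laden illustration: the record layout, the second transaction T41, and its LSNs are invented, and the writing of CLRs is omitted to keep the ordering logic visible (a real implementation must log a CLR for every undone update).

```python
import heapq


def undo_phase(log, losers, db):
    """Undo all loser transactions in one pass; return LSNs in undo order.

    log:    LSN -> record dict with "kind", "prev" and, for updates,
            "item" and "old"; CLRs carry "undo_next".
    losers: txn -> LastLSN, as produced by the analysis phase.
    """
    heap = [(-lsn, txn) for txn, lsn in losers.items()]   # max-heap on LSN
    heapq.heapify(heap)
    undone = []
    while heap:
        neg_lsn, txn = heapq.heappop(heap)
        rec = log[-neg_lsn]
        if rec["kind"] == "update":
            db[rec["item"]] = rec["old"]      # CLR would be written here
            undone.append(-neg_lsn)
            next_lsn = rec["prev"]
        elif rec["kind"] == "clr":
            next_lsn = rec["undo_next"]
        else:                                 # start record: txn fully undone
            next_lsn = None
        if next_lsn is not None:
            heapq.heappush(heap, (-next_lsn, txn))
    return undone


log = {
    262: {"kind": "start", "prev": None},
    267: {"kind": "update", "item": "C", "old": 350, "prev": 262},
    270: {"kind": "start", "prev": None},
    275: {"kind": "update", "item": "E", "old": 10, "prev": 270},
    321: {"kind": "update", "item": "D", "old": 100, "prev": 267},
}
db = {"C": 250, "D": 200, "E": 20}
undone = undo_phase(log, {"T40": 321, "T41": 275}, db)
```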

35 ARIES Logging/Recovery
- Checkpoints are often performed during recovery to guard against further crashes during recovery
  - Next recovery attempt will begin further down the line
- ARIES is a very complex logging/recovery system
  - Provides many txn-processing features and benefits
  - Checkpoints and recovery processing are both very fast
- Definitely worth the added complexity!
  - Extensive use of ARIES demonstrates this very clearly