Database Management Systems CSEP 544. Lecture 9: Transactions and Recovery

CSEP 544 - Fall 2017

Announcements
  HW8 released
  OH tomorrow
  Always check the class schedule page for up-to-date info
  Last lecture today
  Finals on 12/9-10
    Covers everything (lectures, HWs, readings)

Homework 8
  A flight reservation transactional application in Java
    Based on HW3 and Azure
  Two-week assignment
  Use your Azure credits to run and test

Homework 8: throughput contest (completely optional)
  We will generate a random number of transactions and measure the time taken to execute them
  Fastest implementation wins
    1st place: 2% extra credit on HW
    2nd place: 1% extra credit on HW
    3rd place: 0.5% extra credit on HW
  You can create any extra tables, indexes, classes, etc. in your implementation
  You need to pass all grading test cases to be eligible for prizes

Class overview
  Data models
    Relational: SQL, RA, and Datalog
    NoSQL: SQL++
  Query processing
    RDBMS internals: query processing and optimization, physical design
    Parallel query processing: Spark and Hadoop
  Using a DBMS
    Conceptual design: E/R diagrams, schema normalization
    Transactions: locking and schedules
    Writing DB applications

Class Recap
  Data models
    Elements of a data model
    Relational data model: SQL, RA, and Datalog
    Non-relational data model: SQL++
  RDBMS internals
    Relational algebra and basics of query processing
    Algorithms for relational operators
    Physical design and indexes
    Query optimization
  Parallel query processing
    Different algorithms for relational operators
    MapReduce and Spark programming models
  Conceptual design
    E/R diagrams
    Normal forms and schema normalization
  Transactions and recovery
    Schedules and locking-based scheduler
    Recovery from failures

Data Management Pipeline
  [Figure: pipeline diagram with the labels application programmer, schema designer, conceptual schema (a product with name and price), database administrator, physical schema]

Transactions
  We use database transactions every day
    Bank $$$ transfers
    Online shopping
    Signing up for classes
  For this class, a transaction is a series of DB queries
    Read / Write / Update / Delete / Insert
    Unit of work issued by a user that is independent from others

What's the big deal?

Challenges
  Want to execute many apps concurrently
    All these apps read and write data to the same DB
  Simple solution: only serve one app at a time
    What's the problem?
  Want: multiple operations to be executed atomically over the same DBMS

What can go wrong?
  Manager: balance budgets among projects
    Remove $10k from project A
    Add $7k to project B
    Add $3k to project C
  CEO: check company's total balance
    SELECT SUM(money) FROM budget;
  This is called a dirty / inconsistent read, aka a WRITE-READ conflict

What can go wrong?
  App 1: SELECT inventory FROM products WHERE pid = 1
  App 2: UPDATE products SET inventory = 0 WHERE pid = 1
  App 1: SELECT inventory * price FROM products WHERE pid = 1
  This is known as an unrepeatable read, aka a READ-WRITE conflict

What can go wrong?
  Initially: Account 1 = $100, Account 2 = $100, Total = $200
  App 1: Set Account 1 = $200; Set Account 2 = $0
  App 2: Set Account 2 = $200; Set Account 1 = $0
  If one app runs after the other:
    At the end: Total = $200
  If their steps interleave:
    App 1: Set Account 1 = $200
    App 2: Set Account 2 = $200
    App 1: Set Account 2 = $0
    App 2: Set Account 1 = $0
    At the end: Total = $0
  This is called the lost update, aka a WRITE-WRITE conflict

What can go wrong?
  Buying tickets to the next Bieber / Swift concert:
    Fill up form with your mailing address
    Put in debit card number
    Click submit
    Screen shows money deducted from your account
    [Your browser crashes]
  Lesson: Changes to the database should be ALL or NOTHING

Transactions
  Collection of statements that are executed atomically (logically speaking)

    BEGIN TRANSACTION
      [SQL statements]
    COMMIT or ROLLBACK (=ABORT)

  If BEGIN is missing, then the TXN consists of a single instruction:

    [single SQL statement]

Know your chemistry transactions: ACID
  Atomic
    State shows either all the effects of a txn, or none of them
  Consistent
    Txn moves from a DBMS state where integrity holds to another where integrity holds
    Remember integrity constraints?
  Isolated
    Effect of txns is the same as txns running one after another (i.e., looks like batch mode)
  Durable
    Once a txn has committed, its effects remain in the database

Atomic
  Definition: A transaction is ATOMIC if either all of its updates happen or none of them do.
  Example: move $100 from A to B

    UPDATE accounts SET bal = bal - 100 WHERE acct = 'A';
    UPDATE accounts SET bal = bal + 100 WHERE acct = 'B';

  The same two updates, executed as one transaction:

    BEGIN TRANSACTION;
    UPDATE accounts SET bal = bal - 100 WHERE acct = 'A';
    UPDATE accounts SET bal = bal + 100 WHERE acct = 'B';
    COMMIT;
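The transfer above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the in-memory database, the starting balances, the helper names transfer and balances, and the "insufficient funds" check that triggers the rollback are all assumptions made for illustration, not part of the lecture.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET bal = bal - ? WHERE acct = ?", (amount, src))
            conn.execute("UPDATE accounts SET bal = bal + ? WHERE acct = ?", (amount, dst))
            (bal,) = conn.execute(
                "SELECT bal FROM accounts WHERE acct = ?", (src,)
            ).fetchone()
            if bal < 0:
                # Hypothetical application rule: abort instead of overdrawing.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {acct: bal} for every account."""
    return dict(conn.execute("SELECT acct, bal FROM accounts ORDER BY acct"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct TEXT PRIMARY KEY, bal INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 150), ("B", 50)])
conn.commit()
```

A first transfer(conn, "A", "B", 100) commits both updates. A second one would overdraw A, so the whole transaction is rolled back and neither balance changes.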

Isolated
  Definition: An execution ensures that txns are isolated if the effect of each txn is as if it were the only txn running on the system.

Consistent
  Recall: integrity constraints govern how values in tables are related to each other
    Can be enforced by the DBMS, or ensured by the app
  How consistency is achieved by the app:
    App programmer ensures that txns only take a consistent DB state to another consistent state
    DB makes sure that txns are executed atomically
  Can defer checking the validity of constraints until the end of a transaction

Durable
  A transaction is durable if its effects continue to exist after the transaction and even after the program has terminated
  How? By writing to disk! (more later)

Rollback transactions
  If the app gets to a state where it cannot complete the transaction successfully, execute ROLLBACK
  The DB returns to the state prior to the transaction
  What are examples of such program states?

ACID
  Atomic
  Consistent
  Isolated
  Durable
  Enjoy this in HW8!
  Again: by default each statement is its own txn
    Unless auto-commit is off; then each statement starts a new txn

Transaction Schedules

Schedules
  A schedule is a sequence of interleaved actions from all transactions

Serial Schedule
  A serial schedule is one in which transactions are executed one after the other, in some sequential order
  Fact: nothing can go wrong if the system executes transactions serially (up to what we have learned so far)
    But DBMSs don't do that because we want better overall system performance

Example
  A and B are elements in the database
  t and s are variables in txn source code

    T1             T2
    READ(A, t)     READ(A, s)
    t := t+100     s := s*2
    WRITE(A, t)    WRITE(A, s)
    READ(B, t)     READ(B, s)
    t := t+100     s := s*2
    WRITE(B, t)    WRITE(B, s)

Example of a (Serial) Schedule
  Time runs downward

    T1             T2
    READ(A, t)
    t := t+100
    WRITE(A, t)
    READ(B, t)
    t := t+100
    WRITE(B, t)
                   READ(A, s)
                   s := s*2
                   WRITE(A, s)
                   READ(B, s)
                   s := s*2
                   WRITE(B, s)

Another Serial Schedule

    T1             T2
                   READ(A, s)
                   s := s*2
                   WRITE(A, s)
                   READ(B, s)
                   s := s*2
                   WRITE(B, s)
    READ(A, t)
    t := t+100
    WRITE(A, t)
    READ(B, t)
    t := t+100
    WRITE(B, t)

Serializable Schedule
  A schedule is serializable if it is equivalent to a serial schedule

A Serializable Schedule

    T1             T2
    READ(A, t)
    t := t+100
    WRITE(A, t)
                   READ(A, s)
                   s := s*2
                   WRITE(A, s)
    READ(B, t)
    t := t+100
    WRITE(B, t)
                   READ(B, s)
                   s := s*2
                   WRITE(B, s)

  This is a serializable schedule. This is NOT a serial schedule.

A Non-Serializable Schedule

    T1             T2
    READ(A, t)
    t := t+100
    WRITE(A, t)
                   READ(A, s)
                   s := s*2
                   WRITE(A, s)
                   READ(B, s)
                   s := s*2
                   WRITE(B, s)
    READ(B, t)
    t := t+100
    WRITE(B, t)
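A small Python sketch can make the difference between these schedules concrete by executing them and comparing final database states. The tuple encoding of actions, the starting values, and the two interleavings (T1 on A, T2 on A, T1 on B, T2 on B; and T1 on A, all of T2, T1 on B) are assumptions chosen to match the T1/T2 example. Agreeing with a serial schedule on one starting state is a necessary check only, not a proof of serializability.

```python
# Each action is (transaction, "R" or "W", element).
T1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B")]
T2 = [("T2", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T2", "W", "B")]

# What each transaction computes between its READ and its WRITE.
COMPUTE = {"T1": lambda v: v + 100, "T2": lambda v: v * 2}


def run(schedule, db):
    """Execute the schedule on a copy of db and return the final state."""
    db = dict(db)
    local = {}  # one local variable per transaction (t for T1, s for T2)
    for txn, op, elem in schedule:
        if op == "R":
            local[txn] = db[elem]
        else:
            db[elem] = COMPUTE[txn](local[txn])
    return db


def matches_some_serial(schedule, db):
    """True if the final state equals that of T1;T2 or of T2;T1 for this db."""
    return run(schedule, db) in (run(T1 + T2, db), run(T2 + T1, db))


interleaved_ok = T1[:2] + T2[:2] + T1[2:] + T2[2:]
interleaved_bad = T1[:2] + T2 + T1[2:]
```

With A = B = 10, the first interleaving ends in the same state as the serial order T1;T2, while the second ends with A = 220 and B = 120, which neither serial order produces.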

How do We Know if a Schedule is Serializable?
  Notation:
    T1: r1(A); w1(A); r1(B); w1(B)
    T2: r2(A); w2(A); r2(B); w2(B)
  Key idea: focus on conflicting operations

Conflicts
  Write-Read (WR)
  Read-Write (RW)
  Write-Write (WW)
  Read-Read?

Conflict Serializability
  Conflicts (i.e., swapping will change program behavior):
    Two actions by the same transaction Ti:       ri(X); wi(Y)
    Two writes by Ti, Tj to the same element:     wi(X); wj(X)
    Read/write by Ti, Tj to the same element:     wi(X); rj(X)
                                                  ri(X); wj(X)

Conflict Serializability
  A schedule is conflict serializable if it can be transformed into a serial schedule by a series of swappings of adjacent non-conflicting actions
  Every conflict-serializable schedule is serializable

Example: Conflict Serializability
  Start from the schedule:
    r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B)
  Swap adjacent non-conflicting actions, one pair at a time:
    r1(A); w1(A); r2(A); r1(B); w2(A); w1(B); r2(B); w2(B)
    r1(A); w1(A); r1(B); r2(A); w2(A); w1(B); r2(B); w2(B)
    ...
  Until we reach the serial schedule:
    r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B)

Testing for Conflict-Serializability
  Precedence graph:
    A node for each transaction Ti
    An edge from Ti to Tj whenever an action in Ti conflicts with, and comes before, an action in Tj
  The schedule is conflict-serializable iff the precedence graph is acyclic

Example 1
  r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B)
  Precedence graph: 1 --B--> 2 --A--> 3
  This schedule is conflict-serializable

Example 2
  r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B)
  Precedence graph: 1 --B--> 2, 2 --B--> 1, 2 --A--> 3
  This schedule is NOT conflict-serializable
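The precedence-graph test can be sketched in a few lines of Python. The function names and the string format accepted by parse are assumptions for illustration; the logic follows the definition above: add an edge Ti -> Tj for every pair of conflicting actions where Ti's action comes first, then look for a cycle.

```python
import re
from collections import defaultdict


def parse(schedule):
    """Turn 'r2(A); w1(B)' into [('r', 2, 'A'), ('w', 1, 'B')]."""
    found = re.findall(r"([rw])\s*(\d+)\s*\(\s*(\w+)\s*\)", schedule)
    return [(op, int(txn), elem) for op, txn, elem in found]


def precedence_graph(actions):
    """Edges (Ti, Tj): an action of Ti conflicts with a later action of Tj."""
    edges = set()
    for i, (op1, t1, x1) in enumerate(actions):
        for op2, t2, x2 in actions[i + 1:]:
            # Conflict: different txns, same element, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    state = {}  # node -> "visiting" or "done"

    def visit(n):
        if state.get(n) == "visiting":
            return True
        if state.get(n) == "done":
            return False
        state[n] = "visiting"
        if any(visit(m) for m in list(graph[n])):
            return True
        state[n] = "done"
        return False

    return any(visit(n) for n in list(graph))


def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(parse(schedule)))
```

Applied to the schedules of Example 1 and Example 2, it reports the first as conflict-serializable and the second as not.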

Course Eval
  http://bit.do/544eval

Implementing Transactions

Scheduler
  Scheduler = the module that schedules the transactions' actions, ensuring serializability
  Also called Concurrency Control Manager
  We discuss next how a scheduler may be implemented

Implementing a Scheduler
  Major differences between database vendors
  Locking scheduler
    Aka pessimistic concurrency control
    SQLite, SQL Server, DB2
  Multiversion Concurrency Control (MVCC)
    Aka optimistic concurrency control
    Postgres, Oracle
  We discuss only locking schedulers in this class

Locking Scheduler
  Simple idea:
    Each element has a unique lock
    Each transaction must first acquire the lock before reading/writing that element
    If the lock is taken by another transaction, then wait
    The transaction must release the lock(s)
  By using locks, the scheduler ensures conflict-serializability

What Data Elements are Locked?
  Major differences between vendors:
    Lock on the entire database: SQLite
    Lock on individual records: SQL Server, DB2, etc.

More Notations
  Li(A) = transaction Ti acquires lock for element A
  Ui(A) = transaction Ti releases lock for element A

A Non-Serializable Schedule

    T1             T2
    READ(A)
    A := A+100
    WRITE(A)
                   READ(A)
                   A := A*2
                   WRITE(A)
                   READ(B)
                   B := B*2
                   WRITE(B)
    READ(B)
    B := B+100
    WRITE(B)

A Serializable Schedule

    T1             T2
    READ(A)
    A := A+100
    WRITE(A)
                   READ(A)
                   A := A*2
                   WRITE(A)
    READ(B)
    B := B+100
    WRITE(B)
                   READ(B)
                   B := B*2
                   WRITE(B)

Enforcing Conflict-Serializability with Locks

    T1                          T2
    L1(A); READ(A)
    A := A+100
    WRITE(A); U1(A); L1(B)
                                L2(A); READ(A)
                                A := A*2
                                WRITE(A); U2(A);
                                L2(B); BLOCKED...
    READ(B)
    B := B+100
    WRITE(B); U1(B);
                                ...GRANTED; READ(B)
                                B := B*2
                                WRITE(B); U2(B);

  Scheduler has ensured a conflict-serializable schedule

But...

    T1                          T2
    L1(A); READ(A)
    A := A+100
    WRITE(A); U1(A);
                                L2(A); READ(A)
                                A := A*2
                                WRITE(A); U2(A);
                                L2(B); READ(B)
                                B := B*2
                                WRITE(B); U2(B);
    L1(B); READ(B)
    B := B+100
    WRITE(B); U1(B);

  Locks did not enforce conflict-serializability!!! What's wrong?

Two Phase Locking (2PL)
  The 2PL rule: In every transaction, all lock requests must precede all unlock requests
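The 2PL rule is easy to check mechanically for a single transaction. The sketch below assumes a hypothetical encoding in which each step is a string such as "L(A)" or "U(A)"; any other step (reads, writes, computation) is ignored.

```python
def obeys_2pl(steps):
    """True if no lock request follows an unlock request in this transaction."""
    unlocked = False
    for step in steps:
        if step.startswith("U("):
            unlocked = True          # shrinking phase has started
        elif step.startswith("L(") and unlocked:
            return False             # lock request after an unlock: not 2PL
    return True
```

A transaction that releases A before it locks B, as in the schedule where locks alone did not enforce conflict-serializability, fails the check; one that takes both locks up front passes.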

Example: 2PL transactions

    T1                          T2
    L1(A); L1(B); READ(A)
    A := A+100
    WRITE(A); U1(A)
                                L2(A); READ(A)
                                A := A*2
                                WRITE(A);
                                L2(B); BLOCKED...
    READ(B)
    B := B+100
    WRITE(B); U1(B);
                                ...GRANTED; READ(B)
                                B := B*2
                                WRITE(B); U2(A); U2(B);

  Now it is conflict-serializable

A New Problem: Non-recoverable Schedule

    T1                          T2
    L1(A); L1(B); READ(A)
    A := A+100
    WRITE(A); U1(A)
                                L2(A); READ(A)
                                A := A*2
                                WRITE(A);
                                L2(B); BLOCKED...
    READ(B)
    B := B+100
    WRITE(B); U1(B);
                                ...GRANTED; READ(B)
                                B := B*2
                                WRITE(B); U2(A); U2(B);
                                Commit
    Rollback

Strict 2PL
  The Strict 2PL rule: All locks are held until the transaction commits or aborts.
  With strict 2PL, we will get schedules that are both conflict-serializable and recoverable

Strict 2PL

    T1                          T2
    L1(A); READ(A)
    A := A+100
    WRITE(A);
                                L2(A); BLOCKED...
    L1(B); READ(B)
    B := B+100
    WRITE(B);
    Rollback
    U1(A); U1(B);
                                ...GRANTED; READ(A)
                                A := A*2
                                WRITE(A);
                                L2(B); READ(B)
                                B := B*2
                                WRITE(B);
                                Commit
                                U2(A); U2(B);

Another problem: Deadlocks
  T1 waits for a lock held by T2;
  T2 waits for a lock held by T3;
  T3 waits for ...
  ...
  Tn waits for a lock held by T1
  SQLite: there is only one exclusive lock; thus, it never deadlocks
  SQL Server: checks periodically for deadlocks and aborts one TXN
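A periodic deadlock check amounts to looking for a cycle in the "waits for" relation. The sketch below is an illustration only and is not how any particular DBMS implements it; it assumes each blocked transaction waits for exactly one other transaction, recorded in a hypothetical dict waits_for.

```python
def find_deadlock(waits_for):
    """Return the transactions on a waits-for cycle, or None if there is none.

    waits_for maps a blocked transaction to the transaction holding the lock
    it needs; transactions that are not blocked do not appear as keys.
    """
    for start in waits_for:
        seen = []
        txn = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while txn in waits_for and txn not in seen:
            seen.append(txn)
            txn = waits_for[txn]
        if txn in seen:
            return seen[seen.index(txn):]
    return None
```

Once a cycle is found, the system can abort any one transaction on it to break the deadlock.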

Lock Modes
  S = shared lock (for READ)
  X = exclusive lock (for WRITE)
  Lock compatibility matrix (row = lock already held, column = lock requested):

              None      S         X
    None      OK        OK        OK
    S         OK        OK        conflict
    X         OK        conflict  conflict
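The usual shared/exclusive compatibility rule (shared locks coexist with each other, an exclusive lock coexists with nothing) can be written as a small lookup. The names COMPATIBLE and can_grant are assumptions for this sketch.

```python
# (mode already held, mode requested) -> may both be held at the same time?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock other txns hold."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)
```

An element with no locks on it corresponds to an empty held_by_others list, so any request is granted.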

Lock Granularity
  Fine granularity locking (e.g., tuples)
    High concurrency
    High overhead in managing locks
    E.g., SQL Server
  Coarse grain locking (e.g., tables, entire database)
    Many false conflicts
    Less overhead in managing locks
    E.g., SQLite
  Solution: lock escalation changes granularity as needed

Lock Performance
  [Figure: throughput (TPS) versus # active transactions, with a region labeled "thrashing"]
  TPS = transactions per second
  Why?
  To avoid thrashing, use admission control

Phantom Problem
  So far we have assumed the database to be a static collection of elements (=tuples)
  If tuples are inserted/deleted then the phantom problem appears

Phantom Problem
  Suppose there are two blue products, A1, A2:
    T1: SELECT * FROM Product WHERE color='blue'
    T2: INSERT INTO Product(name, color) VALUES ('A3','blue')
    T1: SELECT * FROM Product WHERE color='blue'
  Is this schedule serializable?
    R1(A1); R1(A2); W2(A3); R1(A1); R1(A2); R1(A3)
    W2(A3); R1(A1); R1(A2); R1(A1); R1(A2); R1(A3)

Phantom Problem
  A phantom is a tuple that is invisible during part of a transaction execution but not invisible during the entire execution
  In our example:
    T1: reads list of products
    T2: inserts a new product
    T1: re-reads: a new product appears!

Dealing With Phantoms
  Lock the entire table
  Lock the index entry for 'blue'
    If index is available
  Or use predicate locks
    A lock on an arbitrary predicate
  Dealing with phantoms is expensive!

Isolation Levels in SQL
  1. Dirty reads
     SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
  2. Committed reads
     SET TRANSACTION ISOLATION LEVEL READ COMMITTED
  3. Repeatable reads
     SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
  4. Serializable transactions (ACID)
     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

1. Isolation Level: Dirty Reads
  Long duration WRITE locks
    Strict 2PL
  No READ locks
    Read-only transactions are never delayed
  Possible problems: dirty and inconsistent reads

2. Isolation Level: Read Committed
  Long duration WRITE locks
    Strict 2PL
  Short duration READ locks
    Only acquire lock while reading (not 2PL)
  Unrepeatable reads: when reading the same element twice, may get two different values

3. Isolation Level: Repeatable Read
  Long duration WRITE locks
    Strict 2PL
  Long duration READ locks
    Strict 2PL
  This is not serializable yet!!! Why?

4. Isolation Level: Serializable
  Long duration WRITE locks
    Strict 2PL
  Long duration READ locks
    Strict 2PL
  Predicate locking
    To deal with phantoms

Beware!
  In commercial DBMSs:
    Default level is often NOT serializable
    Default level differs between DBMSs
    Some engines support only a subset of levels!
    Serializable may not be exactly ACID
      Locking ensures isolation, not atomicity
    Also, some DBMSs do NOT use locking, and different isolation levels can lead to different problems
  Bottom line: Read the doc for your DBMS!

Recovery

Log-based Recovery
  Basics (based on textbook Ch. 17.2-3)
    Undo logging
    Redo logging

Transaction Abstraction
  Database is composed of elements.
  1 element can be either:
    1 page = physical logging
    1 record = logical logging

Primitive Operations of Transactions
  READ(X,t): copy element X to transaction local variable t
  WRITE(X,t): copy transaction local variable t to element X
  INPUT(X): read element X to memory buffer
  OUTPUT(X): write element X to disk

Running Example

    BEGIN TRANSACTION
    READ(A,t); t := t*2; WRITE(A,t);
    READ(B,t); t := t*2; WRITE(B,t)
    COMMIT;

  Initially, A=B=8.
  Atomicity requires that either (1) T commits and A=B=16, or (2) T does not commit and A=B=8.

READ(A,t); t := t*2; WRITE(A,t);
READ(B,t); t := t*2; WRITE(B,t)
  Column t belongs to the transaction, Mem A and Mem B are in main memory, Disk A and Disk B are on disk

    Action       t    Mem A  Mem B  Disk A  Disk B
    INPUT(A)          8             8       8
    READ(A,t)    8    8             8       8
    t:=t*2       16   8             8       8
    WRITE(A,t)   16   16            8       8
    INPUT(B)     16   16     8      8       8
    READ(B,t)    8    16     8      8       8
    t:=t*2       16   16     8      8       8
    WRITE(B,t)   16   16     16     8       8
    OUTPUT(A)    16   16     16     16      8
    OUTPUT(B)    16   16     16     16      16
    COMMIT
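The trace can be replayed with a toy Python model of the four primitive operations. The class name Sim, the double helper, and the function disk_after_crash are assumptions for illustration. A crash is modeled by stopping after a given number of actions and looking only at the disk, since the buffer and the local variable t are lost.

```python
class Sim:
    """Toy model: disk, memory buffer, and the transaction-local variable t."""

    def __init__(self, disk):
        self.disk = dict(disk)  # survives a crash
        self.mem = {}           # memory buffer, lost on a crash
        self.t = None           # transaction-local variable, lost on a crash

    def INPUT(self, x):
        self.mem[x] = self.disk[x]

    def READ(self, x):
        self.t = self.mem[x]

    def WRITE(self, x):
        self.mem[x] = self.t

    def OUTPUT(self, x):
        self.disk[x] = self.mem[x]

    def double(self):
        self.t = self.t * 2


def disk_after_crash(steps_done):
    """Run the first `steps_done` actions of the trace, then crash."""
    sim = Sim({"A": 8, "B": 8})
    trace = [
        (sim.INPUT, "A"), (sim.READ, "A"), (sim.double,), (sim.WRITE, "A"),
        (sim.INPUT, "B"), (sim.READ, "B"), (sim.double,), (sim.WRITE, "B"),
        (sim.OUTPUT, "A"), (sim.OUTPUT, "B"),
    ]
    for action, *args in trace[:steps_done]:
        action(*args)
    return sim.disk
```

Stopping after 9 actions (just after OUTPUT(A)) leaves A=16 and B=8 on disk, the inconsistent state that atomicity forbids.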

Is this bad?
  Same trace as above, with a crash between OUTPUT(A) and OUTPUT(B)
  Yes it's bad: A=16, B=8.

Is this bad?
  Same trace, with a crash after OUTPUT(B) but before COMMIT
  Yes it's bad: A=B=16, but not committed

Is this bad?
  Same trace, with a crash after COMMIT
  No: that's OK

Typically, OUTPUT is after COMMIT (why?)

    Action       t    Mem A  Mem B  Disk A  Disk B
    INPUT(A)          8             8       8
    READ(A,t)    8    8             8       8
    t:=t*2       16   8             8       8
    WRITE(A,t)   16   16            8       8
    INPUT(B)     16   16     8      8       8
    READ(B,t)    8    16     8      8       8
    t:=t*2       16   16     8      8       8
    WRITE(B,t)   16   16     16     8       8
    COMMIT
    OUTPUT(A)    16   16     16     16      8
    OUTPUT(B)    16   16     16     16      16

  Crash!

Atomic Transactions
  FORCE or NO-FORCE
    Should all updates of a transaction be forced to disk before the transaction commits?
  STEAL or NO-STEAL
    Can an update made by an uncommitted transaction overwrite the most recent committed value of a data item on disk?

Force/No-steal
  FORCE: Pages of committed transactions must be forced to disk before commit
  NO-STEAL: Pages of uncommitted transactions cannot be written to disk
  Easy to implement (how?) and ensures atomicity

No-Force/Steal
  NO-FORCE: Pages of committed transactions need not be written to disk
  STEAL: Pages of uncommitted transactions may be written to disk
  In either case, atomicity is violated; need WAL

Write-Ahead Log
  The log: append-only file containing log records
    Records every single action of every TXN
    Force log entry to disk
    After a system crash, use log to recover
  Three types: UNDO, REDO, UNDO-REDO

UNDO Log
  FORCE and STEAL

Undo Logging
  Log records
    <START T>: transaction T has begun
    <COMMIT T>: T has committed
    <ABORT T>: T has aborted
    <T,X,v>: T has updated element X, and its old value was v

    Action       t    Mem A  Mem B  Disk A  Disk B  UNDO Log
                                                    <START T>
    INPUT(A)          8             8       8
    READ(A,t)    8    8             8       8
    t:=t*2       16   8             8       8
    WRITE(A,t)   16   16            8       8       <T,A,8>
    INPUT(B)     16   16     8      8       8
    READ(B,t)    8    16     8      8       8
    t:=t*2       16   16     8      8       8
    WRITE(B,t)   16   16     16     8       8       <T,B,8>
    OUTPUT(A)    16   16     16     16      8
    OUTPUT(B)    16   16     16     16      16
    COMMIT                                          <COMMIT T>

  Crash after OUTPUT(B), before COMMIT: WHAT DO WE DO?
    We UNDO by setting B=8 and A=8

  Crash after COMMIT: What do we do now?
    Nothing: log contains COMMIT

Recovery with Undo Log

    <T6,X6,v6>
    <START T5>
    <START T4>
    <T1,X1,v1>
    <T5,X5,v5>
    <T4,X4,v4>
    <COMMIT T5>
    <T3,X3,v3>
    <T2,X2,v2>
    Crash!

  Question 1: Which updates are undone?
  Question 2: How far back do we need to read in the log?
  Question 3: What happens if there is a second crash, during recovery?

When must we force pages to disk?
  Same trace and UNDO log as in the undo-logging table above
  FORCE: OUTPUT(A) and OUTPUT(B) happen before COMMIT and <COMMIT T>
  RULES: log entry before OUTPUT before COMMIT

Undo-Logging Rules
  U1: If T modifies X, then <T,X,v> must be written to disk before OUTPUT(X)
  U2: If T commits, then OUTPUT(X) must be written to disk before <COMMIT T>
  Hence: OUTPUTs are done early, before the transaction commits (FORCE)
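Recovery with an undo log can be sketched as a single backward scan. The tuple encoding of log records and the name undo_recover are assumptions for this sketch, and it leaves out details a real system needs, such as appending <ABORT T> records and checkpointing.

```python
def undo_recover(log, disk):
    """Return the disk state after undo recovery.

    log is a list in append order of records:
      ("START", T), ("COMMIT", T), ("ABORT", T), ("UPDATE", T, X, old_value)
    """
    disk = dict(disk)
    finished = set()  # transactions whose COMMIT or ABORT is in the log
    # Scan from the end: a COMMIT/ABORT is always seen before that txn's updates.
    for record in reversed(log):
        kind, txn = record[0], record[1]
        if kind in ("COMMIT", "ABORT"):
            finished.add(txn)
        elif kind == "UPDATE" and txn not in finished:
            _, _, elem, old_value = record
            disk[elem] = old_value  # restore the value from before the update
    return disk
```

Because the scan only writes old values, running it again after a second crash during recovery gives the same result.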

REDO Log
NO-FORCE and NO-STEAL

Is this bad?

Action      t    Mem A  Mem B  Disk A  Disk B
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8
COMMIT
OUTPUT(A)   16   16     16     16      8
Crash!
OUTPUT(B)   16   16     16     16      16

Is this bad? Yes, it's bad: A=16, B=8

Action      t    Mem A  Mem B  Disk A  Disk B
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8
COMMIT
OUTPUT(A)   16   16     16     16      8
Crash!
OUTPUT(B)   16   16     16     16      16

Is this bad?

Action      t    Mem A  Mem B  Disk A  Disk B
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8
COMMIT
Crash!
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16

Is this bad? Yes, it's bad: lost update

Action      t    Mem A  Mem B  Disk A  Disk B
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8
COMMIT
Crash!
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16

Is this bad?

Action      t    Mem A  Mem B  Disk A  Disk B
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8
Crash!
COMMIT
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16

Is this bad? No: that's OK.

Action      t    Mem A  Mem B  Disk A  Disk B
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8
Crash!
COMMIT
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16

Redo Logging
One minor change to the undo log:
<T,X,v> = T has updated element X, and its new value is v

Action      t    Mem A  Mem B  Disk A  Disk B  REDO Log
                                               <START T>
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8       <T,A,16>
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8       <T,B,16>
COMMIT                                         <COMMIT T>
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16
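The trace in this table can be reproduced with a small simulation. This is only a sketch: the function name run, the action encoding, and the "DOUBLE" label for t:=t*2 are hypothetical, and READ is assumed to copy the element into the buffer when it is not there yet (the table has no separate INPUT step).

```python
def run(actions, disk):
    """Replay the actions; return one row per action plus the redo log."""
    mem, t, log, rows = {}, None, ["<START T>"], []
    for op, x in actions:
        if op == "READ":
            mem.setdefault(x, disk[x])   # bring X into the buffer if needed
            t = mem[x]
        elif op == "DOUBLE":             # t := t*2
            t = t * 2
        elif op == "WRITE":
            mem[x] = t                   # only the buffer copy changes
            log.append(f"<T,{x},{t}>")   # redo record holds the NEW value
        elif op == "COMMIT":
            log.append("<COMMIT T>")
        elif op == "OUTPUT":
            disk[x] = mem[x]             # flush the buffer copy to disk
        rows.append((op, x, t, dict(mem), dict(disk)))
    return rows, log

actions = [
    ("READ", "A"), ("DOUBLE", None), ("WRITE", "A"),
    ("READ", "B"), ("DOUBLE", None), ("WRITE", "B"),
    ("COMMIT", None), ("OUTPUT", "A"), ("OUTPUT", "B"),
]
rows, redo_records = run(actions, {"A": 8, "B": 8})
for row in rows:
    print(row)
print(redo_records)
```

Each printed row corresponds to one line of the table: the action, the value of t, the buffer contents, and the disk contents after the action.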

Action      t    Mem A  Mem B  Disk A  Disk B  REDO Log
                                               <START T>
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8       <T,A,16>
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8       <T,B,16>
COMMIT                                         <COMMIT T>
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16
Crash!

How do we recover?

Action      t    Mem A  Mem B  Disk A  Disk B  REDO Log
                                               <START T>
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8       <T,A,16>
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8       <T,B,16>
COMMIT                                         <COMMIT T>
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16
Crash!

How do we recover? We REDO by setting A=16 and B=16.

Recovery with Redo Log

<START T1>
<T1,X1,v1>
<START T2>
<T2,X2,v2>
<START T3>
<T1,X3,v3>
<COMMIT T2>
<T3,X4,v4>
<T1,X5,v5>
Crash!

Show actions during recovery.
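One way to work through this exercise is the sketch below, which replays the log above in Python. The tuple encoding, the function name redo_recover, and the placeholder disk value "old" are assumptions for illustration. The sketch first finds the committed transactions, then scans the log forward and rewrites the new value of each of their updates; updates of uncommitted transactions are ignored, which is safe because under redo logging they cannot have reached disk.

```python
# Hypothetical encoding (not from the slides):
#   ("START", txn), ("COMMIT", txn), ("UPDATE", txn, element, new_value)

def redo_recover(log, disk):
    """Reapply, in log order, every update made by a committed transaction."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:                      # forward scan from the start of the log
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, txn, element, new_value = rec
            disk[element] = new_value
    return disk

redo_log = [
    ("START", "T1"),
    ("UPDATE", "T1", "X1", "v1"),
    ("START", "T2"),
    ("UPDATE", "T2", "X2", "v2"),
    ("START", "T3"),
    ("UPDATE", "T1", "X3", "v3"),
    ("COMMIT", "T2"),
    ("UPDATE", "T3", "X4", "v4"),
    ("UPDATE", "T1", "X5", "v5"),
]

# Assume every element still holds some earlier value "old" on disk.
disk_before = {f"X{i}": "old" for i in range(1, 6)}
redone = redo_recover(redo_log, dict(disk_before))
print(redone)
```

Only T2 has a COMMIT record, so the single recovery action is writing v2 to X2.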

When must we force pages to disk?

Action      t    Mem A  Mem B  Disk A  Disk B  REDO Log
                                               <START T>
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8       <T,A,16>
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8       <T,B,16>
COMMIT                                         <COMMIT T>
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16

Action      t    Mem A  Mem B  Disk A  Disk B  REDO Log
                                               <START T>
READ(A,t)   8    8      -      8       8
t:=t*2      16   8      -      8       8
WRITE(A,t)  16   16     -      8       8       <T,A,16>
READ(B,t)   8    16     8      8       8
t:=t*2      16   16     8      8       8
WRITE(B,t)  16   16     16     8       8       <T,B,16>
COMMIT                                         <COMMIT T>
OUTPUT(A)   16   16     16     16      8
OUTPUT(B)   16   16     16     16      16

RULE (NO-STEAL): OUTPUT after COMMIT.

Redo-Logging Rules
R1: If T modifies X, then both <T,X,v> and <COMMIT T> must be written to disk before OUTPUT(X).
Hence: OUTPUTs are done late (NO-STEAL).
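Rule R1 can also be read as an ordering check on what reaches disk. The sketch below covers a single transaction; the event encoding and the function name violates_redo_rule are hypothetical.

```python
def violates_redo_rule(disk_events):
    """True if some OUTPUT(X) reaches disk before both <T,X,v> and <COMMIT T>.

    disk_events lists, in order, what reaches disk for one transaction T:
      ("log", "A")     the redo record <T,A,v> is flushed
      ("commit",)      <COMMIT T> is flushed
      ("output", "A")  OUTPUT(A) writes element A to disk
    """
    logged, committed = set(), False
    for event in disk_events:
        kind = event[0]
        if kind == "log":
            logged.add(event[1])
        elif kind == "commit":
            committed = True
        elif kind == "output":
            if not committed or event[1] not in logged:
                return True              # dirty element written too early
    return False

late_outputs = [("log", "A"), ("log", "B"), ("commit",), ("output", "A"), ("output", "B")]
stolen_page = [("log", "A"), ("output", "A"), ("log", "B"), ("commit",), ("output", "B")]

print(violates_redo_rule(late_outputs), violates_redo_rule(stolen_page))
```

The first trace follows the order in the table (log records, then COMMIT, then OUTPUT); the second writes A to disk before the transaction commits, which is what NO-STEAL forbids.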

Comparison Undo/Redo
Undo logging: OUTPUT must be done early: inefficient
Redo logging: OUTPUT must be done late: inflexible
Compromise: ARIES (see textbook)

End of CSEP 544
Big data is here to stay
Requires unique techniques / abstractions
  Logic (SQL)
  Algorithms (query processing)
  Conceptual modeling (FDs)
  Transactions
Technology evolving rapidly, but
Techniques/abstractions persist over many years, e.g. "What goes around comes around"