Transactions. Silberschatz, Korth and Sudarshan

Similar documents
Transactions. Prepared By: Neeraj Mangla

Chapter 15: Transactions

Chapter 15: Transactions

Transactions These slides are a modified version of the slides of the book Database System Concepts (Chapter 15), 5th Ed

ICOM 5016 Database Systems. Chapter 15: Transactions. Transaction Concept. Chapter 15: Transactions. Transactions

Chapter 13: Transactions

BİL 354 Veritabanı Sistemleri. Transaction (Hareket)

Transaction Concept. Two main issues to deal with:

Transactions. Lecture 8. Transactions. ACID Properties. Transaction Concept. Example of Fund Transfer. Example of Fund Transfer (Cont.

Chapter 14: Transactions

Transaction Concept. Chapter 15: Transactions. Example of Fund Transfer. ACID Properties. Example of Fund Transfer (Cont.)

Database System Concepts

Advanced Databases (SE487) Prince Sultan University College of Computer and Information Sciences. Dr. Anis Koubaa. Spring 2014

Chapter 14: Transactions

Chapter 9: Transactions

CSIT5300: Advanced Database Systems

Roadmap of This Lecture

Database System Concepts

References. Transaction Management. Database Administration and Tuning 2012/2013. Chpt 14 Silberchatz Chpt 16 Raghu

CS322: Database Systems Transactions

CHAPTER: TRANSACTIONS

Database Management Systems 2010/11

Database System Concepts

Advanced Databases. Transactions. Nikolaus Augsten. FB Computerwissenschaften Universität Salzburg

Lecture 20 Transactions

Transaction Management. Chapter 14

Transactions and Concurrency Control. Dr. Philip Cannata

Introduction Conflict Serializability. Testing for Serializability Applications Scope of Research

Transactions. Chapter 15. New Chapter. CS 2550 / Spring 2006 Principles of Database Systems. Roadmap. Concept of Transaction.

Database Management Systems

UNIT 4 TRANSACTIONS. Objective

Introduction to Transaction Management

DATABASE DESIGN I - 1DL300

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

CS122 Lecture 19 Winter Term,

Transactions. ACID Properties of Transactions. Atomicity - all or nothing property - Fully performed or not at all

Transactions. Kathleen Durant PhD Northeastern University CS3200 Lesson 9

UNIT-IV TRANSACTION PROCESSING CONCEPTS

Comp 5311 Database Management Systems. 14. Timestamp-based Protocols

Transaction Management

TRANSACTION PROCESSING CONCEPTS

UNIT IV TRANSACTION MANAGEMENT

CSIT5300: Advanced Database Systems


Intro to DB CHAPTER 15 TRANSACTION MNGMNT

Transaction Processing: Concurrency Control ACID. Transaction in SQL. CPS 216 Advanced Database Systems. (Implicit beginning of transaction)

CS122 Lecture 15 Winter Term,

CSE 344 MARCH 9 TH TRANSACTIONS

Graph-based protocols are an alternative to two-phase locking Impose a partial ordering on the set D = {d 1, d 2,..., d h } of all data items.

Database Recovery. Dr. Bassam Hammo

Transaction Management & Concurrency Control. CS 377: Database Systems

14.1 Answer: 14.2 Answer: 14.3 Answer: 14.4 Answer:

CS352 Lecture - The Transaction Concept

Announcements. Motivating Example. Transaction ROLLBACK. Motivating Example. CSE 444: Database Internals. Lab 2 extended until Monday

Transactions and Concurrency Control

CPS352 Lecture - The Transaction Concept

L i (A) = transaction T i acquires lock for element A. U i (A) = transaction T i releases lock for element A

Chapter 22. Transaction Management

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Introduction to Data Management CSE 414

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Announcements. Transaction. Motivating Example. Motivating Example. Transactions. CSE 444: Database Internals

CS352 Lecture - Concurrency

! A lock is a mechanism to control concurrent access to a data item! Data items can be locked in two modes :

2 nd Semester 2009/2010

CS352 Lecture - Concurrency

In This Lecture. Transactions and Recovery. Transactions. Transactions. Isolation and Durability. Atomicity and Consistency. Transactions Recovery

CSE 444: Database Internals. Lectures 13 Transaction Schedules

CSE 444: Database Internals. Lectures Transactions

Concurrency Control. Concurrency Control Ensures interleaving of operations amongst concurrent transactions result in serializable schedules

Intro to Transactions

Q.1 Short Questions Marks 1. New fields can be added to the created table by using command. a) ALTER b) SELECT c) CREATE. D. UPDATE.

CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

Transactions and Concurrency Control

Lock Granularity and Consistency Levels (Lecture 7, cs262a) Ali Ghodsi and Ion Stoica, UC Berkeley February 7, 2018

CSE 344 MARCH 25 TH ISOLATION

UNIT 3 UNIT 3. Transaction Management and Concurrency Control, Performance tuning and query optimization of SQL and NoSQL Databases.

Database Systems CSE 414

Transaction Processing: Concurrency Control. Announcements (April 26) Transactions. CPS 216 Advanced Database Systems

Database Tuning and Physical Design: Execution of Transactions

Introduction to Transaction Management

CSE 344 MARCH 21 ST TRANSACTIONS

Chapter 12 : Concurrency Control

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #17: Transac0ons 1: Intro. to ACID

Page 1. CS194-3/CS16x Introduction to Systems. Lecture 8. Database concurrency control, Serializability, conflict serializability, 2PL and strict 2PL

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #6: Transac/ons 1: Intro. to ACID

Motivating Example. Motivating Example. Transaction ROLLBACK. Transactions. CSE 444: Database Internals

Introduction to Data Management CSE 344

Database Management System 20

Transactions. Juliana Freire. Some slides adapted from L. Delcambre, R. Ramakrishnan, G. Lindstrom, J. Ullman and Silberschatz, Korth and Sudarshan

Chapter 15 : Concurrency Control

Fundamentals of Database Systems

Database Systems CSE 414

Recovery System These slides are a modified version of the slides of the book Database System Concepts (Chapter 17), 5th Ed McGraw-Hill by

ECE 650 Systems Programming & Engineering. Spring 2018

Databases - Transactions

Introduction to Transaction Processing Concepts and Theory

Concurrency Control Overview. COSC 404 Database System Implementation. Concurrency Control. Lock-Based Protocols. Lock-Based Protocols (2)

Weak Levels of Consistency

mywbut.com Concurrency Control

Transcription:

Transactions Transaction Concept ACID Properties Transaction State Concurrent Executions Serializability Recoverability Implementation of Isolation Transaction Definition in SQL Testing for Serializability. 1

Transaction Concept A transaction is a unit of program execution that accesses and possibly updates various data items. Less formally, a transaction typically consists of a collection of operations that someone, e.g., a programmer, wants to execute. Up until now, we have only considered individual queries, but most non-trivial database applications perform more sophisticated, longer-running transactions. Some implications: A transaction will put a database in an inconsistent state during its execution, at least temporarily. The probability of system failure during a transaction is non-trivial. 2

Transaction Concept Additionally, DBMSs support concurrent execution of queries/transactions. Additional implications: Multiple transactions may require access to the same data at the same time. One transaction might want to access data being modified by another. 3

ACID Properties General DBMS transaction requirements: A database should not get screwed up, or made inconsistent, by an individual transaction. Transactions should execute quickly. More specifically: After a transaction completes, the database must be in a consistent state. After a transaction completes, the changes it has made must persist. Failures must not corrupt the database, even during transaction execution. A transaction must see a consistent database. Multiple transactions should be able to execute in parallel. 4

ACID Properties More formally, a DBMS must guarantee the ACID properties: Atomicity Consistency Isolation Durability Atomicity - All operations of a transaction are completed successfully or none are. Consistency - Transaction execution in isolation preserves database consistency. Isolation - Multiple transactions executing concurrently, must be unaware of other each other; intermediate transaction results must be hidden from other concurrently executed transactions. For every pair of transactions T i and T j, it appears to T i that either T j finished execution before T i started, or T j started after T i finished. Durability - After a transaction completes successfully, the changes it has made to the database persist, even in the event of system failures. 5

Example of Fund Transfer Consider a transaction to transfer $50 from account A to account B: 1. begin transaction 2. read(a) // read means read the value from disk 3. A := A 50 // performed in memory; could be lost by power failure 4. write(a) // write means write the value to disk 5. read(b) 6. B := B + 50 7. write(b) 8. end transaction; Atomicity : If the system, and hence the transaction, fails after step 4 and before step 7, the DBMS must ensure that either the updates to A are not reflected in the database, or that the transaction gets finished when the system comes back up. 6

Example of Fund Transfer Consider a transaction to transfer $50 from account A to account B: 1. begin transaction 2. read(a) // read means read the value from disk 3. A := A 50 // performed in memory 4. write(a) // write means write the value to disk 5. read(b) 6. B := B + 50 7. write(b) 8. end transaction; Consistency : The sum of A and B is unchanged by the execution of the transaction, i.e., no money gets unintentionally lost. 7

Example of Fund Transfer Consider a transaction to transfer $50 from account A to account B: 1. begin transaction 2. read(a) // read means read the value from disk 3. A := A 50 // performed in memory 4. write(a) // write means write the value to disk 5. read(b) 6. B := B + 50 7. write(b) 8. end transaction; Isolation : If between steps 4 and 7, another transaction is allowed to access the partially updated database, it would see an inconsistent database (the sum A + B will be less than it should be). Isolation can be ensured trivially by running transactions serially. However, executing multiple transactions concurrently has significant benefits, as we will see later. 8

Example of Fund Transfer Consider a transaction to transfer $50 from account A to account B: 1. begin transaction 2. read(a) // read means read the value from disk 3. A := A 50 // performed in memory 4. write(a) // write means write the value to disk 5. read(b) 6. B := B + 50 7. write(b) 8. end transaction; Durability : once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist, even in the even to future system failures. 9

Transaction State (Cont.) A transaction goes through several states during its execution: After the final statement has been executed; final results have not been committed and are still not visible to other transactions. Changes are committed, i.e., they become durable and are made visible to other transactions. The initial state; a transaction stays in this state while it is executing; changes are not visible to other transactions. After the database has been restored to its state prior to the start of the transaction; transaction is killed or restarted at this point. After discovery that normal execution can no longer proceed; any changes made to the database are undone, i.e., the transaction is rolled back.. 10

Implementation of Atomicity and Durability So how are the ACID requirements implemented? Atomicity and durability are implemented by the recovery-management subsystem. A simplistic approach to recovery-management is the shadow-database scheme: A pointer called db_pointer always points to the current consistent copy of the database. All updates are made on a shadow copy of the database (active and partially committed). db_pointer is updated only after all updates have been written to disk (commit). If the transaction fails, the old copy pointed to by db_pointer is retained, and the shadow copy is deleted. 11

Implementation of Atomicity and Durability (Cont.) Notes: Extremely inefficient for large databases. Assumes that at most one modifying transaction is executing at a time. Assumes that writing db_pointer is an atomic operation (probably true). Used for some text, photo and video editors, single-user databases, and other simple programs. More sophisticated schemes will be discussed in the next chapter. 12

Concurrent Executions How about isolation? The easiest way to guarantee isolation is to execute transactions serially. From a performance perspective, this is too restrictive. - multiple transactions should be allowed to run concurrently. Increased throughput Reduced average response time 13

Schedules Given a set of transactions, where each transaction consists of a sequence of instructions, a schedule is a sequence of instructions for the transactions specified in chronological order of execution. SupposeT 1 transfers $50 from A to B, and T 2 transfers 10% from A to B. The following is a serial schedule (#1) wheret 1 is followed by T 2 : Time 14

Schedule 2 The following is also a serial schedule (#2) wheret 2 is followed by T 1 : Note that the result of the two schedules is generally different. So which is correct? 15

Schedule 2 If a particular order is required for two transactions, then that implies they should be one transaction that reflects the required order. In other words, either of the previous serial schedules is correct. They might give different results, but that s ok that s what happens in the real world; people and processes don t always coordinate. 16

Schedule 3 So what does a concurrent schedule look like? Let T 1 and T 2 be the transactions defined previously. The following concurrent schedule is not a serial schedule, but is equivalent to Schedule 1. 17

Schedule 4 The following is also a concurrent schedule, but it is not equivalent to either schedule #1 or #2: Consider what happens when A and B both initially contain $100. 18

Schedule 4 Equivalence to a serial schedule is imperative! (why?) Which one doesn t matter! 19

Schedule 2 So, in addition to concurrency, what do we require of our schedules? A schedule for a set of transactions must: Include all transaction instructions. Preserve the order in which the instructions appear in each individual transaction Additionally, it is assumed that each transaction preserves database consistency. Thus serial execution of a set of transactions preserves database consistency. 20

Serializability A (concurrent) schedule is said to be serializable if it is equivalent to a serial schedule. There are two different forms of schedule equivalence: 1. conflict serializability 2. view serializability For the sake of simplicity, operations other than reads and writes are ignored. It is assumed (conservatively) that transactions may perform arbitrary computations between reads and writes. This will result in false negatives, but not false positives huh? 21

Conflicting Instructions Instructions l i and l j of transactions T i and T j, respectively, conflict if there exists a item Q accessed by both l i and l j, and at least one of these instructions wrote Q. 1. l i = read(q), l j = read(q) - l i and l j don t conflict 2. l i = read(q), l j = write(q) - conflict 3. l i = write(q), l j = read(q) - conflict 4. l i = write(q), l j = write(q) - conflict Intuitively, a conflict between l i and l j means that their relative order of execution can make a difference. On the other hand, if l i and l j do not conflict, then their relative order of execution cannot not make a difference. 22

Conflict Serializability Let S and S be two schedules. If S can be transformed into S by a series of swaps of non-conflicting instructions, then S and S are conflict equivalent. Equivalently, S and S are said to be conflict equivalent if the order of all pairs of conflicting instructions are the same in the two schedules. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. 23

Conflict Serializability (Cont.) Schedule A can be transformed into Schedule B (a serial schedule) where T 2 follows T 1, by series of swaps of non-conflicting instructions. Therefore Schedule A is conflict serializable. Schedule A Schedule B 24

Conflict Serializability (Cont.) Example of a schedule that is not conflict serializable: We are unable to swap instructions in the above schedule to obtain either the serial schedule < T 3, T 4 >, or the serial schedule < T 4, T 3 >. 25

Testing for Conflict Serializability Consider a schedule for a set of transactions T 1, T 2,..., T n A precedence graph for the schedule is a directed graph which has: A vertex for each transaction An edge from T i to T j if they contain conflicting instructions, and the conflicting instruction from T i accessed the data item before the conflicting instruction fromt j. If executes T i write(q) before T j executes read(q) If executes T i read(q) before T j executes write(q) If executes T i write(q) before T j executes write(q). We may label the arc by the item that was accessed. Duplicate edges may results from the above, but can be deleted. 26

Testing for Conflict Serializability Example: 27

Example Schedule (Schedule A) and Precedence Graph T 1 T 2 T 3 T 4 T 5 read(x) read(y) read(z) read(v) read(w) read(w) read(y) write(y) write(z) read(u) read(y) write(y) read(z) write(z) read(u) write(u) T 1 T 2 T 3 T 4 T 5 28

Test for Conflict Serializability Observation: A schedule is conflict serializable if and only if its precedence graph is acyclic. Cycle-detection algorithms exist which take O(n 2 )time, where n is the number of vertices in the graph. Better algorithms take O(n + e) where e is the # of edges. If a precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph. For example, a serializability order for Schedule A would be: T 5 T 1 T 3 T 2 T 4 Are there others? Special case what if each transaction operates (reads/writes) on a unique variable? 29

Test for Conflict Serializability The preceding definition of a serializable schedule is very conservative in that false negatives will be detected. In other words, schedules will be deemed non-serializable that in fact are. This leads to a more relaxed definition of serializability 30

View Serializability Let S and S be two schedules with the same set of transactions. S and S are said to be view equivalent if the following three conditions are met: 1. For each data item Q, if transaction T i reads the initial value of Q in schedule S, then transaction T i reads the initial value of Q in schedule S. 2. For each data item Q, the transaction (if any) that performs the final write(q) operation in schedule S performs the final write(q) operation in schedule S. 3. For each data item Q, if transaction T i executes read(q) in schedule S, and that value was produced by transaction T j in S, then transaction T i reads the value of Q that was produced by transaction T j in schedule S. View equivalence is also based only on reads and writes, but it considers details of those reads and writes a bit more closely: It is therefore less conservative, i.e., fewer false negatives. 31

View Serializability (Cont.) A schedule S is view-serializable if it is view-equivalent to a serial schedule. Every conflict-serializable schedule is also view-serializable (prove as an exercise). All previous conflict-serializable schedules are therefore view-serializable. However, a schedule may be view-serializable, but not conflict-serializable. 32

View Serializability (Cont.) Example #1: What serial schedule is the above view-equivalent to? Is the above schedule conflict-equivalent to that serial schedule? What is the point? 33

View Serializability (Cont.) Example #2: What serial schedule is the above view-equivalent to? Is the above schedule conflict-equivalent to that serial schedule? Every view-serializable schedule that is not conflict-serializable must contain at least one blind write (a write of a variable without first reading that variable; prove as an exercise). 34

Test for View Serializability As described, the precedence graph test for conflict serializability cannot be used to test for view serializability. The test can be modified to work for view serializability, but has cost exponential in the size of the precedence graph (see the example on the class website). In fact, the problem of checking if a schedule is view-serializable is NP-complete. Thus existence of an efficient algorithm is extremely unlikely. 35

Other Notions of Serializability Regardless, even view-serializability is too conservative, however. The schedule below produces the same outcome as the serial schedule T 1, T 5 yet is not conflict equivalent or view equivalent to it. Similarly for the schedule < T 5, T 1 > Determining such equivalence requires analysis of operations other than reads and writes; this is why we said earlier that our approach was conservative. 36

Recoverable Schedules So far we have assumed that transactions execute to completion, i.e., there are no transaction failures. What is the effect of transaction failures on concurrently running transactions? Consider the following schedule, which is conflict and, hence, view serializable. T 8 T 9 read(a) write(a) read(b) read(a) write(c) 37

Recoverable Schedules Now suppose that T 9 commits immediately after the write, but before T 8 commits. T 8 T 9 read(a) write(a) read(b) commit read(a) write(c) commit 38

Recoverable Schedules Now suppose T 8 aborts instead of committing. In that case T 9 would read (and possibly write or display) an inconsistent database state. T 8 T 9 read(a) write(a) read(b) abort read(a) write(c) commit The above schedule is said to be non-recoverable. 39

Recoverable Schedules A schedule is said to be recoverable if whenever a transaction T j reads a data item previously written by a transaction T i, the commit operation of T i appears before the commit operation of T j. Given a schedule, how can we determine if the schedule is recoverable? Label each vertex in the precedence graph with the time of the commit. If there is an edge from v i to v j, then the time on v j must be after the time on v i. The DBMS must ensure that all concurrent schedules are recoverable by imposing the above ordering on transaction commits. However, even if the DBMS does ensure that schedules are recoverable, significant rework might still occur in the context of an abort. 40

Cascading Rollbacks A cascading rollback is when a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable). If T 10 fails, T 11 and T 12 must also be rolled back. This can lead to the undoing of a significant amount of work 41

Cascadeless Schedules A schedule is said to be cascadeless if for each pair of transactions T i and T j such that T j reads a data item previously written by T i, the commit operation of T i appears before the read operation of T j. Cascading rollbacks cannot occur in a cascadeless schedule. Every cascadeless schedule is also recoverable (why?). 42

Summary so Far Focusing only on reads and writes simplifies the process, but may lead to false negatives, thereby hindering concurrency. Conflict-serializability leads to false negatives, thereby reducing concurrency. Similarly for view-serializability but not as bad as for conflict-serialiability. Requiring recoverable schedules hinders concurrency more Requiring cascade-less schedules hinders concurrency even more So why are we doing this? Aren t we trying to encourage concurrency? We are trying to restrict concurrent schedules in order to guarantee isolation and recoverability, while at the same time maximizing concurrency. 43

Weak Levels of Consistency Some applications are willing to live with weaker levels of isolation, allowing schedules that are not serializable: A read-only transaction that calculates an approximate total account balance. Database statistics for query optimization can be approximate. Most DBMSs allow the user to select the level of isolation. A given DMBS, however, may choose to execute a query at a higher level of isolation. 44

Levels of Consistency in SQL A dirty read occurs when a transaction reads a value from a row that has been modified by another transaction but has not yet been committed. A non-repeatable read occurs when a transaction reads the same record more than once, and gets different (possibly committed) values. A phantom read occurs when a transaction issues the same select query twice, and gets a different set of (possibly committed) rows each time. 45

Levels of Consistency in SQL Conflict serializable: No dirty reads. No non-repeatable reads. No phantom reads. Repeatable read: No dirty reads. No non-repeatable reads. Allows phantom reads. 46

Levels of Consistency in SQL Read committed: No dirty reads. Allows non-repeatable reads. Allows phantom reads. Read uncommitted: Allows dirty reads. Allows non-repeatable reads. Allows phantom reads. 47

Concurrency Control A DBMS must provide a mechanism that will ensure that all schedules are: Serializable (at some level) Recoverable, and preferably cascadeless. Testing a schedule for serializability after it has executed is a little too late! Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and cascadeless. Without building or examining the precedence graph. Different concurrency control protocols provide different tradeoffs between the amount of concurrency they allow and the amount of overhead that they incur. 48

Concurrency Control By the way a policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency Are serial schedules recoverable/cascadeless? 49

End of Chapter 50