CS 347 Parallel and Distributed Data Processing

Similar documents
CS 347 Parallel and Distributed Data Processing

Concurrency control in Homogeneous Distributed Databases (2)

Distributed Databases

Concurrency. Consider two ATMs running in parallel. We need a concurrency manager. r1[x] x:=x-250 r2[x] x:=x-250 w[x] commit w[x] commit

Implementing Isolation

Transaction Processing Concurrency control

Multi-User-Synchronization

Concurrency Control! Snapshot isolation" q How to ensure serializability and recoverability? " q Lock-Based Protocols" q Other Protocols"

Transactions and Concurrency Control

CS54200: Distributed Database Systems

Database Tuning and Physical Design: Execution of Transactions

Time stamp ordering. 5 Non-locking concurrency Control

Concurrency Control 9-1

Lecture 22 Concurrency Control Part 2

TRANSACTION MANAGEMENT

Chapter 5. Concurrency Control Techniques. Adapted from the slides of Fundamentals of Database Systems (Elmasri et al., 2006)

Chapter 15 : Concurrency Control

Chapter 9: Concurrency Control

Distributed Transaction Management

Datenbanksysteme II: Implementation of Database Systems Synchronization of Concurrent Transactions

CS 370 Concurrency worksheet. T1:R(X); T2:W(Y); T3:R(X); T2:R(X); T2:R(Z); T2:Commit; T3:W(X); T3:Commit; T1:W(Y); Commit

CSC 261/461 Database Systems Lecture 21 and 22. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

For more Articles Go To: Whatisdbms.com CONCURRENCY CONTROL PROTOCOL

Transaction Processing: Basics - Transactions

Intro to Transactions

Transaction Processing: Concurrency Control ACID. Transaction in SQL. CPS 216 Advanced Database Systems. (Implicit beginning of transaction)

The Issue. Implementing Isolation. Transaction Schedule. Isolation. Schedule. Schedule

Database Management Systems

CS 5300 module6. Problem #1 (10 Points) a) Consider the three transactions T1, T2, and T3, and the schedules S1 and S2.

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 18-1

Part III Transactions

Transaction Processing: Concurrency Control. Announcements (April 26) Transactions. CPS 216 Advanced Database Systems

A lock is a mechanism to control concurrent access to a data item Data items can be locked in two modes:

3. Concurrency Control for Transactions Part One

Concurrency Control. Chapter 17. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Concurrency Control. Concurrency Control Ensures interleaving of operations amongst concurrent transactions result in serializable schedules

Chapter 13 : Concurrency Control

Chapter 18 Concurrency Control Techniques

Transaction in Distributed Databases

Concurrency Control. Chapter 17. Comp 521 Files and Databases Spring

Transaction Management and Concurrency Control. Chapter 16, 17

Transactions. Concurrency. Consistency ACID. Modeling transactions. Transactions and Concurrency Control

Concurrency Control. Transaction Management. Lost Update Problem. Need for Concurrency Control. Concurrency control

transaction - (another def) - the execution of a program that accesses or changes the contents of the database

CSC 261/461 Database Systems Lecture 24

Page 1. Goals of Today s Lecture. The ACID properties of Transactions. Transactions

Comp 5311 Database Management Systems. 14. Timestamp-based Protocols

! A lock is a mechanism to control concurrent access to a data item! Data items can be locked in two modes :

Concurrency Control / Serializability Theory

CMSC 461 Final Exam Study Guide

CSIT5300: Advanced Database Systems

Concurrency Control. [R&G] Chapter 17 CS432 1

Goal of Concurrency Control. Concurrency Control. Example. Solution 1. Solution 2. Solution 3

Transactions: Definition. Concurrent transactions: Problems

Distributed Databases Systems

Datenbanksysteme II: Synchronization of Concurrent Transactions. Ulf Leser

Exercises on Concurrency Control (part 2)

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636)

Database Technology. Topic 9: Concurrency Control

Concurrency Control. Chapter 17. Comp 521 Files and Databases Fall

Concurrency Control in Distributed Systems. ECE 677 University of Arizona

Synchronization. Clock Synchronization

Transaction Management: Introduction (Chap. 16)

Chapter 6 Distributed Concurrency Control

Multiversion Schemes to achieve Snapshot Isolation Timestamped Based Protocols

Transaction Management. Pearson Education Limited 1995, 2005

Concurrency Control CHAPTER 17 SINA MERAJI

Transactional Information Systems:


Concurrency Control Techniques

Concurrency Control. Conflict Serializable Schedules. Example. Chapter 17

Graph-based protocols are an alternative to two-phase locking Impose a partial ordering on the set D = {d 1, d 2,..., d h } of all data items.

Review. Review. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Lecture #21: Concurrency Control (R&G ch.

Transaction Management: Concurrency Control

Concurrency control 12/1/17

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Synchronization Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Lecture 21 Concurrency Control Part 1

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class. Last Class. Faloutsos/Pavlo CMU /615

Lecture 21. Lecture 21: Concurrency & Locking

Concurrency Control. R &G - Chapter 19

Transaction. Primary unit of work Well formed: Sequence of operations between begin and end (commit or abort)

Advanced Databases. Lecture 9- Concurrency Control (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch

0: BEGIN TRANSACTION 1: W = 1 2: X = W + 1 3: Y = X * 2 4: COMMIT TRANSACTION

Chapter 10 : Concurrency Control

230 Chapter 17. (c) Otherwise, T writes O and WTS(O) is set to TS(T).

Transaction Systems. Andrey Gubichev. October 21, Exercise Session 1

Synchronization. Chapter 5

CHAPTER: TRANSACTIONS

VSR, CSR, 2PL, TS, TS-Multi

Database systems. Database: a collection of shared data objects (d1, d2, dn) that can be accessed by users

DB2 Lecture 10 Concurrency Control

Control. CS432: Distributed Systems Spring 2017

Page 1. Goals of Todayʼs Lecture" Two Key Questions" Goals of Transaction Scheduling"

T ransaction Management 4/23/2018 1

CSE 444: Database Internals. Lectures 13 Transaction Schedules

Chapter 12 : Concurrency Control

Concurrent & Distributed Systems Supervision Exercises

Page 1. Goals of Today s Lecture" Two Key Questions" Goals of Transaction Scheduling"

Lock-based concurrency control. Q: What if access patterns rarely, if ever, conflict? Serializability. Concurrency Control II (OCC, MVCC)

Transcription:

CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 5: Concurrency Control Topics Data Database design Queries Decomposition Localization Optimization Transactions Concurrency control Reliability Replication CS 347 Notes 5 2 Transactions Programs with database accesses Always end in commit or abort Transactions Programs with database accesses Always end in commit or abort Properties Atomicity Consistency Isolation Durability CS 347 Notes 5 3 CS 347 Notes 5 4

Transactions Programs with database accesses Always end in commit or abort Concurrency control Properties Atomicity Consistency Isolation Durability Reliability Concurrency Control Synchronization primitives Mutual exclusive access Execution ordering Pessimistic Optimistic CS 347 Notes 5 5 CS 347 Notes 5 6 Concurrency Control Schedules and serializability Locking Timestamp ordering Schedule A schedule represents how a set of transactions are executed Just like in a centralized system Schedules may be good (i.e., preserve constraints) or bad CS 347 Notes 5 7 CS 347 Notes 5 8

Schedule Schedule Example X Y constraint: X = Y Schedule S 1 () a X 5 () c X 2 () X a+100 6 () X 2c 3 () b Y 7 () d Y 4 () Y b+100 8 () Y 2d Precedence relation 1 () a X 2 () X a+100 5 () c X 3 () b Y 6 () X 2c 4 () Y b+100 7 () d Y 8 () Y 2d Precedence Intra-transaction Inter-transaction If X = Y = 0 initially, X = Y = 200 at the end CS 347 Notes 5 9 CS 347 Notes 5 10 Schedule Schedule S 1 () a X 2 () X a+100 5 () c X 3 () b Y 6 () X 2c 4 () Y b+100 7 () d Y 8 () Y 2d If X = Y = 0 initially, X = Y = 200 at the end CS 347 Notes 5 11 Precedence Intra-transaction Inter-transaction Other, schedules with different outcomes? Schedule Formal definition Let T= {,,, T n } be a set of transactions A schedule S over T is a partial order with ordering relation < S where: 1) S = T i i = 1 2) < S i = 1 < T n n i 3) For any two conflicting operations p, q S, either p < S q or q < S p On same data, at least one is a write CS 347 Notes 5 12

Schedule Example r 1[X] w 1[X] T 3 S r 2[X] w 2[Y] w 2[X] r 3[X] w 3[X] w 3[Y] w 3[Z] r 2[X] w 2[Y] w 2[X] r 3[X] w 3[X] w 3[Y] w 3[Z] r 1[X] w 1[X] Schedule Precedence graph The precedence graph P(S) for some schedule is a directed graph Nodes the transactions in S Edges T i T j is an edge iff p T i, q T j such that p, q conflict and p < S q CS 347 Notes 5 13 CS 347 Notes 5 14 Schedule Serializability Example r 3[X] w 3[X] Theorem A schedule S is (conflict-)serializable iff P(S) is acyclic S r 1[X] w 1[X] w 1[Y] r 2[X] w 2[Y] P(S) T 3 T2 T1 because r2[x] w1[x], w2[y] w1[y] T 3 because w 1[X] r 3[X], w 1[X] w 3[X] T2 T3 because r2[x] w3[x] CS 347 Notes 5 15 CS 347 Notes 5 16

Serializability Enforcement Locks Timestamps Locking Just like in a centralized system But with multiple lock managers scheduler 1 D 1 node 1 D1 locks... scheduler 2 D 2 node 2 D2 locks CS 347 Notes 5 17 CS 347 Notes 5 18 Locking Just like in a centralized system But with multiple lock managers Locking Reminder Using locks alone does not guarantee serializability scheduler 1 D 1 D1 locks... scheduler 2 D 2 D2 locks node 1 access & lock data T access & lock data release all locks at the end node 2 CS 347 Notes 5 19 CS 347 Notes 5 20

Locking Reminder Using locks alone does not guarantee serializability L(X) 1 () a X 2 R( X) () X a+100 L(X) L(Y) 3 () c X 5 () d Y 4 R( X) () X 2c 6 R( Y) () Y 2d L(Y) 7 () b Y 8 () Y b+100 R( Y) CS 347 Notes 5 21 Locking L(X) 1 () a X 2 R( X) () X a+100 L(X) L(Y) 3 () c X 5 () d Y 4 R( X) () X 2c 6 R( Y) () Y 2d L(Y) 7 () b Y 8 () Y b+100 R( Y) If X = Y = 0 initially, X = 200 and Y = 100 at the end serializable CS 347 Notes 5 22 Two-Phase Locking One solution locks Two-Phase Locking One solution locks time time May lead to cascading aborts CS 347 Notes 5 23 CS 347 Notes 5 24

Strict Two-Phase Locking Locking with Shared Memory locks P P M Only release on commit or abort to avoid cascading aborts time Where does lock table live? How do we avoid race conditions? CS 347 Notes 5 25 CS 347 Notes 5 26 Locking with Shared Disk Locking with Shared Disk M Where does lock table live? How do we avoid race conditions? P M P Fixed partition Dynamic partition For each data item X need to have lock table entry L(X) For each L(X) need to know current owner processor P(X) Replicate P(X) at all processors CS 347 Notes 5 27 CS 347 Notes 5 28

Deadlocks Remember, 2PL may lead to deadlocks Deadlocks Even if nodes check WFG locally, global deadlocks are possible L(X), r(x), L(Y) L(Y), r(y), L(X) Need to avoid cycles in wait-for graph (WFG) between transactions Many deadlock solutions Detection vs. prevention Timeouts Wait-die Wound-wait Local WFG: No cycles Local WFG: No cycles CS 347 Notes 5 29 CS 347 Notes 6 30 Deadlocks Need to combine local WFG to discover global deadlocks Timestamp Ordering Assign timestamp when transaction begins Local WFG: No cycles Local WFG: No cycles If ts() < ts() < < ts(t n) then scheduler produces history equivalent to,,..., T n E.g., at central detection node CS 347 Notes 6 31 CS 347 Notes 5 32

Timestamp Ordering Rule If p i[x] and q j[x] are conflicting operations then p i[x] < S q j[x] iff ts(t i) < ts (T j) Timestamp Ordering Example Non-serializable schedule S () a X () d Y () X a+100 () Y 2d () c X () b Y () X 2c () Y b+100 CS 347 Notes 5 33 CS 347 Notes 5 34 Timestamp Ordering Timestamp Ordering Example Non-serializable schedule S Example Non-serializable schedule S ts ( T1 ) < ts ( T2 ) ts ( T1 ) < ts ( T2 ) () a X () d Y () X a+100 () Y 2d () c X () b Y () X 2c () Y b+100 reject! () a X () d Y () X a+100 () Y 2d () c X () b Y () X 2c () Y b+100 reject! abort T1 abort T1 abort T2 abort T2 CS 347 Notes 5 35 CS 347 Notes 5 36

Strict TO Lock written items until it is certain that writing transaction has been successful Avoid cascading aborts Strict TO Example Non-serializable schedule S ts ( T1 ) < ts ( T2 ) () a X () d Y () X a+100 () Y 2d () c X delay () b Y abort T1 reject! CS 347 Notes 5 37 CS 347 Notes 5 38 Strict TO TO Scheduler Example Non-serializable schedule S X data item ts ( T1 ) < ts ( T2 ) () a X () d Y () X a+100 () Y 2d () c X delay () b Y reject! mtsr[x] maximum timestamp of a transaction that read X mtsw[x] maximum timestamp of a transaction that wrote X nr[x] number of transactions currently reading X (= 0, 1, 2, ) nw[x] number of transactions currently writing X (= 0 or 1) abort T1 () c X () X 2c abort T1 CS 347 Notes 5 39 CS 347 Notes 5 40

TO Scheduler Part 1 r i[x] arrives TO Scheduler Part 2 w i[x] arrives if ts(t i) < mtsw[x] then abort T i else if ts(t i) > mtsr[x] then mtsr[x] ts(t i) if queue is empty and nw[x] = 0 then nr[x] nr[x] + 1 initiate read X else add r i[x] to queue CS 347 Notes 5 41 if ts(t i) < mtsw[x] or ts(t i) < mtsr[x] then abort T i else mtsw[x] ts(t i) if queue is empty and nw[x] = 0 and nr[x] = 0 then nw[x] 1 write X and wait for T i to finish else add w i[x] to queue CS 347 Notes 5 42 TO Scheduler Part 3 When some operation o (read or write) on X finishes no[x] no[x] 1 not_done true while not_done o j[x] head of queue (with smallest timestamp) if o = w and nr[x] = 0 and nw[x] = 0 then remove o j[x] from queue CS 347 Notes 5 43 TO Scheduler Part 3 When some operation o (read or write) on X finishes nw[x] 1 write X and wait for T j to finish else if o = r and nw[x] = 0 then remove o j[x] from queue nr[x] nr[x] + 1 initiate read X else not_done true CS 347 Notes 5 44

TO Scheduler TO Scheduler For reads nr[x] nr[x] + 1; initiate read X If a transaction is aborted, it must be retried with a new larger timestamp For writes nw[x] 1; write X and wait for T i to finish In part 3, the end of a write is only processed when all other writes for the transaction have completed mtsr[x] = 10 mtsw[x] = 9 ts(t) = 8 read X CS 347 Notes 5 45 CS 347 Notes 5 46 TO Scheduler If a transaction is aborted, it must be retried with a new larger timestamp mtsr[x] = 10 mtsw[x] = 9 ts(t) = 8 starvation (keeps being aborted), should use ts(t) = 11 read X TO Scheduler Theorem If S is a schedule representing an execution by a TO scheduler then S is serializable Proof Assume T i T j in P(S) conflicting p i[x], q j[x] in S: p i[x] < S q j[x] ts(t i) < ts(t j) by TO rule CS 347 Notes 5 47 CS 347 Notes 5 48

TO Scheduler Assume there is a cycle... T n in P(S) ts() < ts() < < ts(t n) < ts (), which is a contradiction Hence, P(S) is acyclic S is serializable Thomas Write Rule mtsr[x] ts(t i) mtsw[x] T i wants to write X CS 347 Notes 5 49 CS 347 Notes 5 50 Thomas Write Rule w i[x] arrives Thomas Write Rule mtsr[x] if ts(t i) < mtsr[x] then abort T i else if ts(t i) < mtsw[x] ignore write else // as before ts(t i) T i wants to write X mtsw[x] Why can t we let T i go ahead? CS 347 Notes 5 51 CS 347 Notes 5 52

Thomas Write Rule T j reads X mtsr[x] TO Optimizations Update mtsr and mtsw when the action is executed, not when it is added to the queue ts(t i) T i wants to write X mtsw[x] mtsw[x] = 9 or 7? Why can t we let T i go ahead? mtsr[x] is only the latest read there could be others before X w @ ts = 9 w @ ts = 8 w @ ts = 7 active write CS 347 Notes 5 53 CS 347 Notes 5 54 TO Optimizations Use multiple versions of data Timestamp Management data mtsr mtsw X Value written @ ts = 9 r i[x] ts(t i) = 8 Value written @ ts = 7 X 1 X 2 X n Tons of space and extra IO CS 347 Notes 5 55 CS 347 Notes 5 56

Timestamp Management Timestamp cache item mtsr mtsw X min Y Z If transaction reads/writes X, add/update entry for X in cache Periodically purge items X with mtsr[x] < min, mtsw[x] < min Track min (e.g., choose min current time d) Timestamp Management Timestamp cache Enforcing TO rule for p i[x]: if X has entry in cache then use mtsr, mtsw values in cache else assume mtsr[x] = mtsw[x] = min Use hashing (on items) for cache CS 347 Notes 5 57 CS 347 Notes 5 58 2PL TO 2PL TO w 1[ Y ] r 2[ X ] r 2[ Y ] w 2[ Z ] ts() < ts() < ts(t 3) T 3 w 3[ X ] S r 2[ X ] w 3[ X ] w 1 [ Y ] r 2[ Y ] w 2[ Z ] S could be produced with TO but not with 2PL Are all 2PL schedules TO? any examples here? 2PL schedules TO schedules previous example CS 347 Notes 5 59 CS 347 Notes 5 60

Distributed TO Scheduler Distributed TO Scheduler scheduler 1 D 1 D1 ts cache... scheduler 2 D 2 D2 ts cache scheduler 1 D 1 D1 ts cache... scheduler 2 D 2 D2 ts cache node 1 node 2 node 1 access data node 2 Each scheduler is independent Signal all schedulers involved at the end of the transaction T access data CS 347 Notes 5 61 CS 347 Notes 5 62 Pessimistic vs. Optimistic Control Pessimistic Pessimistic vs. Optimistic Control Optimistic control enables more parallelism validate read (compute) write R V W R V W Optimistic R V W read (compute) validate write R V W R V W R V W CS 347 Notes 5 63 CS 347 Notes 5 64

Summary 2PL Useful in a distributed system Most popular Deadlocks still possible TO Useful in a distributed system Aborts more likely No deadlocks CS 347 Notes 5 65