Concurrency Control II and Distributed Transactions

Concurrency Control II and Distributed Transactions CS 240: Computing Systems and Concurrency Lecture 18 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material.

Serializability: Execution of a set of transactions over multiple items is equivalent to some serial execution of the transactions.

Lock-based concurrency control
Big Global Lock: Results in a serial transaction schedule, at the cost of performance
Two-phase locking with finer-grain locks:
Growing phase: txn acquires locks
Shrinking phase: txn releases locks (typically at commit)
Allows txns to execute concurrently, improving performance
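The growing/shrinking discipline above can be sketched in a few lines of Python (a minimal, single-process illustration with hypothetical names, not a production lock manager): locks only accumulate while the transaction runs, and all are released together at commit.

```python
import threading

class TwoPhaseLockingTxn:
    """Sketch of two-phase locking: locks accumulate during the growing
    phase and are released only at commit (the shrinking phase)."""
    def __init__(self, lock_table):
        self.lock_table = lock_table   # maps item name -> threading.Lock
        self.held = []                 # locks acquired so far
        self.committed = False

    def read(self, item, store):
        self._acquire(item)
        return store[item]

    def write(self, item, value, store):
        self._acquire(item)
        store[item] = value

    def _acquire(self, item):
        lock = self.lock_table[item]
        if lock not in self.held:
            lock.acquire()             # growing phase: never released early
            self.held.append(lock)

    def commit(self):
        for lock in self.held:         # shrinking phase: release everything
            lock.release()
        self.held.clear()
        self.committed = True

# Usage: a txn that reads y and writes x, holding both locks until commit
store = {"x": 1, "y": 2}
locks = {k: threading.Lock() for k in store}
t = TwoPhaseLockingTxn(locks)
t.write("x", t.read("y", store) + 1, store)
t.commit()
```

Because no lock is released before commit, any conflicting schedule is forced into a serializable order; the price is that non-conflicting transactions still pay locking overhead.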

Q: What if access patterns rarely, if ever, conflict?

Be optimistic! Goal: Low overhead for non-conflicting txns
Assume success: Process the transaction as if it will succeed
Check for serializability only at commit time; if validation fails, abort the transaction
Optimistic Concurrency Control (OCC): Higher performance than locking when conflicts are few; lower performance than locking when conflicts are many

OCC: Three-phase approach
Begin: Record a timestamp marking the transaction's beginning
Modify phase: Txn can read values of committed data items; updates go only to local copies (versions) of items (in the DB cache)
Validate phase
Commit phase: If the txn validates, its updates are applied to the DB; otherwise, the transaction is restarted
Care must be taken to avoid TOCTTOU issues

OCC: Why validation is necessary
(Figure: a new txn and a coordinator operating on objects P and Q.)
New txn creates shadow copies of P and Q
P's and Q's copies are at an inconsistent state
When the txn commits its updates, it creates new versions at some timestamp t

OCC: Validate phase
Transaction is about to commit. System must ensure:
Initial consistency: Versions of accessed objects at start are consistent
No conflicting concurrency: No other txn has committed an operation on an object that conflicts with one of this txn's invocations
Consider transaction T1. For every other txn N that has committed or is in its validation phase, one of the following must hold:
A. N completes its commit before T1 starts its modify phase
B. T1 starts its commit after N completes its commit, and ReadSet(T1) and WriteSet(N) are disjoint
C. Both ReadSet(T1) and WriteSet(T1) are disjoint from WriteSet(N), and N has completed its modify phase
When validating T1, first check (A), then (B), then (C). If all fail, validation fails and T1 is aborted.
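The three conditions can be expressed directly in code. The sketch below is an assumption-laden illustration (the txn records, with read/write sets and phase-boundary timestamps, are hypothetical names, not from any particular system):

```python
def occ_validate(txn, others):
    """Check OCC validation conditions (A), (B), (C) for `txn` against
    every other txn in `others`. Each txn is a dict with `read_set` and
    `write_set` (sets of item names) and phase timestamps
    `start_modify`, `end_modify`, `start_commit`, `end_commit`
    (None if that point has not been reached)."""
    for n in others:
        # (A) N finished committing before txn started its modify phase
        if n["end_commit"] is not None and n["end_commit"] < txn["start_modify"]:
            continue
        # (B) txn starts commit after N completes commit, and txn's read
        #     set is disjoint from N's write set
        if (n["end_commit"] is not None
                and txn["start_commit"] > n["end_commit"]
                and not (txn["read_set"] & n["write_set"])):
            continue
        # (C) N has completed its modify phase, and both txn's read set
        #     and write set are disjoint from N's write set
        if (n["end_modify"] is not None
                and not (txn["read_set"] & n["write_set"])
                and not (txn["write_set"] & n["write_set"])):
            continue
        return False   # all three checks failed: validation fails, abort
    return True

# Usage: t1 read x, but concurrent tn wrote x -> validation fails
t1 = {"read_set": {"x"}, "write_set": {"y"},
      "start_modify": 10, "end_modify": 12,
      "start_commit": 13, "end_commit": None}
tn = {"read_set": {"z"}, "write_set": {"x"},
      "start_modify": 9, "end_modify": 11,
      "start_commit": 11, "end_commit": 12}
occ_validate(t1, [tn])   # False: t1's read of x conflicts with tn's write
```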

2PL & OCC = strict serializability
Provides semantics as if only one transaction were running on the DB at a time, in serial order, plus real-time guarantees
2PL: Pessimistically get all the locks first
OCC: Optimistically create copies, but then recheck all read and written items before commit

Multi-version concurrency control: Generalize the use of multiple versions of objects

Multi-version concurrency control: Maintain multiple versions of objects, each with its own timestamp. Allocate the correct version to reads. Prior example of MVCC:

Multi-version concurrency control: Maintain multiple versions of objects, each with its own timestamp. Allocate the correct version to reads. Unlike 2PL/OCC, reads are never rejected. Occasionally run garbage collection to clean up old versions.

MVCC intuition
Split the transaction into a read set and a write set
All reads execute as if from one snapshot; all writes execute as if in one later snapshot
Yields snapshot isolation, which is weaker than serializability

Serializability vs. snapshot isolation
Intuition: a bag of marbles, ½ white and ½ black
Transactions: T1 changes all white marbles to black; T2 changes all black marbles to white
Serializability (2PL, OCC): T1 → T2 or T2 → T1; in either case, the bag ends up ALL white or ALL black
Snapshot isolation (MVCC): T1 → T2, or T2 → T1, or T1 and T2 concurrently; the bag ends up ALL white, ALL black, or ½ white and ½ black
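The half-and-half outcome is the classic write-skew anomaly. A toy simulation makes it concrete (the function and txn names here are hypothetical; both transactions read from the same starting snapshot and their write sets are disjoint, so snapshot isolation lets both commit):

```python
def run_snapshot_isolated(bag, txns):
    """Toy model of snapshot isolation: every txn reads the same initial
    snapshot; buffered writes are applied afterward. The write sets are
    disjoint, so no txn is aborted."""
    snapshot = dict(bag)               # both txns read this snapshot
    result = dict(bag)
    for txn in txns:
        for marble, color in txn(snapshot).items():
            result[marble] = color     # apply this txn's buffered writes
    return result

def t1(snap):  # T1: change all white marbles to black
    return {m: "black" for m, c in snap.items() if c == "white"}

def t2(snap):  # T2: change all black marbles to white
    return {m: "white" for m, c in snap.items() if c == "black"}

bag = {"m1": "white", "m2": "white", "m3": "black", "m4": "black"}
run_snapshot_isolated(bag, [t1, t2])
# Each txn saw the old snapshot, so the bag stays half white, half black
# (colors swapped) -- an outcome no serial order of T1 and T2 can produce.
```

Under a serial execution, whichever txn runs second would see the first txn's output and recolor everything, so the bag would end up all one color.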

Timestamps in MVCC
Transactions are assigned timestamps, which may get assigned to the objects those txns read/write
Every object version O_V has both a read and a write timestamp:
ReadTS: largest timestamp of a txn that reads O_V
WriteTS: timestamp of the txn that wrote O_V

Executing transaction T in MVCC
Find version of object O to read:
# Determine the last version written before the read snapshot time
Find O_V such that WriteTS(O_V) = max{ WriteTS(O_W) : WriteTS(O_W) <= TS(T) }
Set ReadTS(O_V) = max(TS(T), ReadTS(O_V))
Return O_V to T
Perform write of object O, or abort if conflicting:
Find O_V such that WriteTS(O_V) = max{ WriteTS(O_W) : WriteTS(O_W) <= TS(T) }
# Abort if another txn exists that has read O after T
If ReadTS(O_V) > TS(T): abort and roll back T
Else: create new version O_W; set ReadTS(O_W) = WriteTS(O_W) = TS(T)
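These two rules translate almost line-for-line into code. The sketch below is a minimal single-object model (class and method names are hypothetical), keeping each version as a [WriteTS, ReadTS, value] record:

```python
class MVCCObject:
    """Minimal sketch of the per-object MVCC read/write rules above.
    versions: list of [write_ts, read_ts, value]."""
    def __init__(self, initial_value):
        self.versions = [[0, 0, initial_value]]

    def _visible(self, ts):
        # Last version written at or before ts
        candidates = [v for v in self.versions if v[0] <= ts]
        return max(candidates, key=lambda v: v[0])

    def read(self, ts):
        v = self._visible(ts)
        v[1] = max(v[1], ts)     # bump ReadTS: reads are never rejected
        return v[2]

    def write(self, ts, value):
        v = self._visible(ts)
        if v[1] > ts:            # a later txn already read this version
            raise RuntimeError("abort: write-after-read conflict")
        if v[0] == ts:           # same txn overwrites its own version
            v[2] = value
        else:                    # create a new version at this timestamp
            self.versions.append([ts, ts, value])
            self.versions.sort(key=lambda v: v[0])

# Usage, mirroring the examples in the slides below:
o = MVCCObject(0)
o.write(3, "a")    # creates version with WriteTS = 3
o.read(5)          # returns "a"; ReadTS of that version becomes 5
# o.write(4, "b")  # would raise: ReadTS(1) = 5 > 4, so TS=4 must abort
```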

Digging deeper
Notation: W(1) = 3 means a write creates version 1 with WriteTS = 3; R(1) = 3 means a read of version 1 returns timestamp 3. Three txns run with TS = 3, 4, 5.

write(O) by the txn with TS = 3:
O now has W(1) = 3, R(1) = 3

write(O) by the txn with TS = 5:
O now also has W(2) = 5, R(2) = 5

write(O) by the txn with TS = 4:
Find v such that WriteTS(v) is the max WriteTS <= (TS = 4) ⇒ v = 1, since (WriteTS = 3) <= 4
If ReadTS(1) > 4, abort ⇒ 3 > 4: false
Otherwise, write the object: O now also has W(3) = 4, R(3) = 4

Now restart with O having only W(1) = 3, R(1) = 3, and run the txn with TS = 5:
BEGIN Transaction; tmp = READ(O); WRITE(O, tmp + 1); END Transaction
Read: find v such that WriteTS(v) is the max WriteTS <= (TS = 5) ⇒ v = 1, since (WriteTS = 3) <= 5
Set R(1) = max(5, R(1)) = 5
Write: if ReadTS(1) > 5, abort ⇒ 5 > 5: false; otherwise write, creating W(2) = 5, R(2) = 5

write(O) by the txn with TS = 4:
Find v such that WriteTS(v) is the max WriteTS <= (TS = 4) ⇒ v = 1, since (WriteTS = 3) <= 4
If ReadTS(1) > 4, abort ⇒ 5 > 4: true, so the txn aborts

But a txn with TS = 4 that reads O and writes a different object P succeeds:
BEGIN Transaction; tmp = READ(O); WRITE(P, tmp + 1); END Transaction
Read: v = 1; set R(1) = max(4, R(1)) = 5
Then the write on P succeeds as well

Distributed Transactions

Consider data partitioned over servers
(Figure: objects O, P, Q on different servers; per-object operations L = lock, R = read, W = write, U = unlock.)
Why not just use 2PL?
Grab locks over the entire read and write set
Perform the writes
Release the locks (at commit time)

Consider data partitioned over servers: how do you get serializability?
On a single machine, a single COMMIT op in the WAL
In a distributed setting, assign a global timestamp to the txn (at some time after lock acquisition and before commit), via either:
A centralized txn manager, or
Distributed consensus on the timestamp (not on all ops)

Strawman: consensus per group?
Single Lamport clock, consensus per Paxos group? Linearizability composes!
But this doesn't solve the problem of concurrent, non-overlapping txns

Spanner: Google's Globally-Distributed Database (OSDI 2012)

Google's setting
Dozens of zones (datacenters)
Per zone, 100-1000s of servers
Per server, 100-1000 partitions (tablets)
Every tablet replicated for fault tolerance (e.g., 5x)

Scale-out vs. fault tolerance
Every tablet is replicated via Paxos (with leader election)
So every operation within a transaction across tablets is actually a replicated operation within a Paxos RSM
Paxos groups can stretch across datacenters! (COPS took the same approach within a datacenter)

Disruptive idea: Do clocks really need to be arbitrarily unsynchronized? Can you engineer some max divergence?

TrueTime
Global wall-clock time with bounded uncertainty: TT.now() returns an interval [earliest, latest] of width 2ε
Consider an event e_now which invoked tt = TT.now()
Guarantee: tt.earliest <= t_abs(e_now) <= tt.latest

Timestamps and TrueTime
With locks acquired, pick s > TT.now().latest, then wait until TT.now().earliest > s before releasing locks (commit wait)
Each of the two steps takes ε on average
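The pick-then-wait rule can be sketched as follows. This is a toy model, not Spanner's implementation: TrueTime's uncertainty is faked with a fixed, assumed ε of a few milliseconds, and the class/function names are hypothetical.

```python
import time

class TrueTime:
    """Toy stand-in for TrueTime: now() returns an interval
    (earliest, latest) guaranteed to contain absolute time. Here the
    uncertainty is simply a fixed epsilon around the local clock."""
    def __init__(self, epsilon=0.004):    # assumed 4 ms uncertainty
        self.epsilon = epsilon

    def now(self):
        t = time.time()
        return (t - self.epsilon, t + self.epsilon)

def commit_with_wait(tt):
    # Pick a commit timestamp s strictly above any time that could
    # already have passed anywhere: s > TT.now().latest
    s = tt.now()[1]
    # Commit wait: block until s is guaranteed to be in the past
    # everywhere, i.e., TT.now().earliest > s
    while tt.now()[0] <= s:
        time.sleep(0.0005)
    return s   # safe to release locks and expose the commit now
```

With this rule, once a transaction's locks are released, every later TT.now() interval lies entirely above s, so timestamp order matches real-time commit order; the cost is a wait of roughly 2ε per commit.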

Commit wait and replication
With locks acquired: pick s, then start consensus, achieve consensus, and notify followers
The commit wait overlaps with replication; once both are done, release locks

Client-driven transactions
The client:
1. Issues reads to the leader of each tablet group, which acquires read locks and returns the most recent data
2. Locally performs writes
3. Chooses a coordinator from the set of leaders and initiates commit
4. Sends a commit message to each leader, including the identity of the coordinator and the buffered writes
5. Waits for commit from the coordinator

Commit Wait and 2-Phase Commit
On the commit message from the client, leaders acquire local write locks
If non-coordinator:
Choose prepare ts > previous local timestamps
Log the prepare record through Paxos
Notify the coordinator of the prepare timestamp
If coordinator:
Wait until hearing from all other participants
Choose commit timestamp >= all prepare ts and > local ts
Log the commit record through Paxos
Wait out the commit-wait period
Send the commit timestamp to replicas, other leaders, and the client
All apply at the commit timestamp and release locks
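The timestamp arithmetic of these two roles fits in a few lines. The sketch below (hypothetical names; Paxos logging and commit wait are elided as comments) shows a participant choosing its prepare timestamp and the coordinator choosing the overall commit timestamp:

```python
class Participant:
    """Sketch of a non-coordinator leader's prepare step."""
    def __init__(self, name):
        self.name = name
        self.local_ts = 0

    def prepare(self):
        self.local_ts += 1   # prepare ts > all previous local timestamps
        # (here the prepare record would be logged through Paxos)
        return self.local_ts

def coordinator_commit(participants, coord_local_ts):
    """Sketch of the coordinator's timestamp rule: gather prepare
    timestamps, then pick a commit ts >= every prepare ts and strictly
    greater than the coordinator's own local timestamp."""
    prepare_ts = [p.prepare() for p in participants]
    s_commit = max(max(prepare_ts), coord_local_ts + 1)
    # (commit record logged through Paxos; commit wait happens here
    #  before the commit timestamp is sent out)
    return s_commit
```

For example, with participant prepare timestamps 6 and 8, the coordinator picks a commit timestamp of 8, matching the friend-list example on the next slide.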

Commit Wait and 2-Phase Commit (timeline)
Coordinator leader T_C and participant leaders T_P1, T_P2 each acquire locks
Each participant computes its prepare timestamp s_p, logs a prepared record, and sends s_p to the coordinator
The coordinator computes the overall commit timestamp s_c, logs it, waits out the commit wait, notifies the participants that the txn is committed, and all release locks

Example
T_C: remove X from my friend list (prepare s_p = 6)
T_P: remove myself from X's friend list (prepare s_p = 8)
Commit timestamp s_c = 8; a later risky post P happens at s = 15

Time | My friends | My posts | X's friends
<8   | [X]        | []       | [me]
8    | []         | []       | []
15   | []         | [P]      | []

Read-only optimizations
Given a global timestamp, read-only transactions can be implemented lock-free (snapshot isolation)
Step 1: Choose timestamp s_read = TT.now().latest
Step 2: Do a snapshot read (at s_read) at each tablet; it can be served by any up-to-date replica

Disruptive idea (revisited): Do clocks really need to be arbitrarily unsynchronized? Can you engineer some max divergence?

TrueTime architecture
GPS timemasters (plus an atomic-clock timemaster) in each datacenter
Clients compute a reference interval: [earliest, latest] = now ± ε

TrueTime implementation
now = reference now + local-clock offset
ε = reference ε + worst-case local-clock drift = 1 ms + 200 μs/sec
(Figure: sawtooth plot of ε vs. time over 90 sec; ε grows by up to ~6 ms between 30-second clock polls, then resets.)
What about faulty clocks? Bad CPUs are 6x more likely, in 1 year of empirical data

Known unknowns > unknown unknowns
Rethink algorithms to reason about uncertainty

Sunday topic: Security