CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

Similar documents
CSE 444: Database Internals. Lecture 25 Replication

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 13 - Distribution: transactions

CSE 444: Database Internals. Section 9: 2-Phase Commit and Replication

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein

Distributed Databases

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline

CS 425 / ECE 428 Distributed Systems Fall 2017

Distributed System. Gang Wu. Spring,2018

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Topics in Reliable Distributed Systems

Consistency in Distributed Systems

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases

CS October 2017

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Chapter 11 - Data Replication Middleware

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology

Databases. Laboratorio de sistemas distribuidos. Universidad Politécnica de Madrid (UPM)

Module 8 Fault Tolerance CS655! 8-1!

Lecture 17 : Distributed Transactions 11/8/2017

Distributed Commit in Asynchronous Systems

Intro to Transactions

Control. CS432: Distributed Systems Spring 2017

Replication in Distributed Systems

Module 8 - Fault Tolerance

Introduction to Database Systems CSE 444. Lecture 26: Distributed Transactions

Exam 2 Review. October 29, Paul Krzyzanowski 1

Lecture XII: Replication

Distributed Databases

Distributed Transaction Management. Distributed Database System

TAPIR. By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton

CS5412: TRANSACTIONS (I)

CS 245: Database System Principles

Exam 2 Review. Fall 2011

Transactions. Transaction. Execution of a user program in a DBMS.

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Read Operations and Timestamps. Write Operations and Timestamps

Concurrency Control & Recovery

Atomicity. Bailu Ding. Oct 18, Bailu Ding Atomicity Oct 18, / 38

Defining properties of transactions

(Pessimistic) Timestamp Ordering

Transaction Management Overview. Transactions. Concurrency in a DBMS. Chapter 16

Goal A Distributed Transaction

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

The Implicit{Yes Vote Commit Protocol. with Delegation of Commitment. Pittsburgh, PA Pittsburgh, PA are relatively less reliable.

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

Concurrency Control & Recovery

It also performs many parallelization operations like, data loading and query processing.

Causal Consistency and Two-Phase Commit

Distributed Transaction Management

CPS 512 midterm exam #1, 10/7/2016

Distributed Data Management Transactions

Distributed KIDS Labs 1

XI. Transactions CS Computer App in Business: Databases. Lecture Topics

Distributed Data Management Replication

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

Fault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior

Synchronization. Chapter 5

Distributed Systems 8L for Part IB

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Transactions with Replicated Data. Distributed Software Systems

Problems Caused by Failures

Integrity in Distributed Databases

Homework 5 (by Tupac Shakur) Solutions Due: Monday Dec 3, 11:59pm

Database Management System

Basic vs. Reliable Multicast

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

CS6450: Distributed Systems Lecture 11. Ryan Stutsman

Today: Fault Tolerance. Reliable One-One Communication

Lecture 21: Logging Schemes /645 Database Systems (Fall 2017) Carnegie Mellon University Prof. Andy Pavlo

Distributed Databases. CS347 Lecture 16 June 6, 2001

Recovering from a Crash. Three-Phase Commit

Scaling Database Systems. COSC 404 Database System Implementation Scaling Databases Distribution, Parallelism, Virtualization

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita

Synchronization. Clock Synchronization

Transaction Management Overview

ACID Properties. Transaction Management: Crash Recovery (Chap. 18), part 1. Motivation. Recovery Manager. Handling the Buffer Pool.

Motivating Example. Motivating Example. Transaction ROLLBACK. Transactions. CSE 444: Database Internals

416 practice questions (PQs)

Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all

The transaction. Defining properties of transactions. Failures in complex systems propagate. Concurrency Control, Locking, and Recovery

CSE 444: Database Internals. Lectures 13 Transaction Schedules

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class. Today s Class. Faloutsos/Pavlo CMU /615

Cost Reduction of Replicated Data in Distributed Database System

Database Management Systems

CSE 5306 Distributed Systems. Fault Tolerance

Parallel DBs. April 25, 2017

Introduction to Data Management. Lecture #18 (Transactions)

6.033 Computer System Engineering

Paxos provides a highly available, redundant log of events

Adapting Commit Protocols for Large-Scale and Dynamic Distributed Applications

Transaction Management. Pearson Education Limited 1995, 2005

Distributed File System

PRIMARY-BACKUP REPLICATION

Transcription:

CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

Transactions Main issues: Concurrency control Recovery from failures 2

Distributed Transactions 3

References C. Mohan, B. Lindsay, and R. Obermarck. Transaction Management in the R* Distributed Database Management System. ACM Transactions On Database Systems 11 (4), 1986. Also in the Red Book (3rd and 4th ed). Chapters 8 and 9 in Principles of Transaction Processing. Second Ed. Phil Bernstein and Eric Newcomer. Chapter 22 in Ramakrishnan and Gehrke 4

Distributed Transactions Need to update both partitions UPDATE Employees SET salary = salary * 1.05 WHERE level < 7 UPDATE Employees SET salary = salary * 1.03 WHERE level >= 7 Must preserve ACID! Employees (eid, salary, level) where level < 5 Employees (eid, salary, level) where level >= 5 5

Typical Architecture User doesn t know how data is distributed User Queries Coordinator Subordinate 1 Subordinate 2 Employees (eid, salary, level) where level < 5 Employees (eid, salary, level) where level >= 5 6

Distributed Transactions Concurrency control Allow multiple distributed queries execute at the same time Failure recovery Transaction must be committed at all sites or at none of the sites! No matter what failures occur and when they occur Two-phase commit protocol (2PC) 7

Distributed Concurrency Control Different techniques are possible Pessimistic, optimistic, locking, timestamps Common implementation: distributed two-phase locking Simultaneously hold locks at all sites involved Deadlock detection techniques Global wait-for graph (not very practical) Timeouts If deadlock: abort least costly local transaction How to define cost? 8

What about failures? 9

Two-Phase Commit: Motivation Coordinator 1) User decides to commit 4) Coordinator crashes 2) COMMIT 3) COMMIT Subordinate 1 What do we do now? Subordinate 2 But I already aborted! Subordinate 3 10

Two-Phase Commit Protocol One coordinator and many subordinates Phase 1: prepare All subordinates must flush tail of write-ahead log to disk before ack Must ensure that if coordinator decides to commit, they can commit! Phase 2: commit or abort Log records for 2PC include transaction and coordinator ids Coordinator also logs ids of all subordinates Principle When a process makes a decision: vote yes/no or commit/abort Or when a subordinate wants to respond to a message: ack First force-write a log record (to make sure it survives a failure) Only then send message about decision 11

2PC: Phase 1, Prepare Coordinator 1) User decides to commit 4) YES 2) PREPARE 4) YES 4) YES 2) PREPARE 2) PREPARE Subordinate 1 3) Force-write: prepare Subordinate 2 3) Force-write: prepare Subordinate 3 3) Force-write: prepare 12

2PC: Phase 2, Commit 5) Write: end, then forget transaction Coordinator 2) COMMIT Subordinate 1 1) Force-write: commit 4) ACK Transaction is now committed! 4) ACK 4) ACK 2) COMMIT 3) Force-write: commit 5) Commit transaction and forget it 2) COMMIT Subordinate 2 3) Force-write: commit 5) Commit transaction Subordinate 3 and forget it 3) Force-write: commit 5) Commit transaction and forget it 13

2PC with Abort Coordinator 1) User decides to commit 2) PREPARE 4) YES 2) PREPARE Subordinate 1 3) Force-write: prepare 4) NO 4) No 2) PREPARE Subordinate 2 3) Force-write: abort 5) Abort transaction Subordinate 3 and forget it 3) Force-write: abort 5) Abort transaction and forget it 14

2PC with Abort 5) Write: end, then forget transaction Coordinator 2) ABORT Subordinate 1 1) Force-write: abort 4) ACK 3) Force-write: abort 5) Abort transaction and forget it Subordinate 2 Subordinate 3 15

Coordinator State Machine All states involve waiting for messages INIT Receive: Commit Send: Prepare COLLECTING R: No votes FW: Abort S: Abort R: Yes votes FW: Commit S: Commit ABORTING COMMITTING R: ACKS W: End Forget END R: ACKS W: End Forget

Subordinate State Machine INIT and PREPARED involve waiting R: Prepare FW: Abort S: No vote INIT R: Prepare FW: Prepare S: Yes vote R: Abort FW: Abort S: Ack PREPARED R: Commit FW: Commit S: Ack ABORTING COMMITTING Abort and forget Commit and forget 17

Handling Site Failures Approach 1: no site failure detection Can only do retrying & blocking Approach 2: timeouts Since unilateral abort is ok, Subordinate can timeout in init state Coordinator can timeout in collecting state Prepared state is still blocking 2PC is a blocking protocol 18

Site Failure Handling Principles Retry mechanism In prepared state, periodically query coordinator In committing/aborting state, periodically resend messages to subordinates If doesn't know anything about transaction respond abort to inquiry messages about fate of transaction If there are no log records for a transaction after a crash then abort transaction and forget it 19

Site Failure Scenarios R: No votes FW: Abort S: Abort INIT COLLECTING Receive: Commit Send: Prepare R: Yes votes FW: Commit S: Commit R: Prepare FW: Abort S: No vote R: Abort FW: Abort S: Ack INIT PREPARED R: Prepare FW: Prepare S: Yes vote R: Commit FW: Commit S: Ack ABORTING COMMITTING ABORTING COMMITTING R: ACKS W: End END R: ACKS W: End Abort and forget Commit and forget 20

Observations Coordinator keeps transaction in transactions table until it receives all acks To ensure subordinates know to commit or abort So acks enable coordinator to forget about transaction Read-only transactions: no changes ever need to be undone nor redone After crash, if recovery process finds no log records for a transaction, the transaction is presumed to have aborted 21

Presumed Abort Protocol Optimization goals Fewer messages and fewer force-writes Principle If nothing known about a transaction, assume ABORT Aborting transactions need no force-writing Avoid log records for read-only transactions Reply with a READ vote instead of YES vote Optimizes read-only transactions 22

2PC State Machines (repeat) R: No votes FW: Abort S: Abort INIT COLLECTING Receive: Commit Send: Prepare R: Yes votes FW: Commit S: Commit R: Prepare FW: Abort S: No vote R: Abort FW: Abort S: Ack INIT PREPARED R: Prepare FW: Prepare S: Yes vote R: Commit FW: Commit S: Ack ABORTING COMMITTING ABORTING COMMITTING R: ACKS W: End END R: ACKS W: End Abort and forget Commit and forget 23

Presumed Abort State Machines R: No votes W: Abort S: Abort INIT COLLECTING Receive: Commit Send: Prepare R: Yes votes FW: Commit S: Commit R: Prepare W: Abort S: No vote R: Abort W: Abort INIT PREPARED R: Prepare FW: Prepare S: Yes vote R: Commit FW: Commit S: Ack COMMITTING ABORTING COMMITTING END R: ACKS W: End Abort and forget Commit and forget 24

Presumed Abort for Read-Only INIT COLLECTING Receive: Commit Send: Prepare R: Read Forget INIT DONE Forget R: Prepare S: Read vote END 25

Replication 26

Outline Goals of replication Three types of replication Eager replication Lazy replication Two-tier replication 27

Goals of Replication Goal 1: availability Goal 2: performance Some requests Three replicas Other requests As expected, it s easy to build a replicated system that reduces performance and availability 28

Eager Replication Also called synchronous replication All updates are applied to all replicas (or to a majority) as part of a single transaction (need two phase commit) E.g., triggers on tables apply updates to replicas within transaction Main goal: as if there was only one copy Maintain consistency Maintain one-copy serializability i.e., execution of transactions has same effect as an execution on a non-replicated db Transactions must acquire global locks 29

Eager Master One master for each object holds primary copy The Master is also called Primary To update object, transaction must acquire a lock at the master Lock at the master is global lock Master propagates updates to replicas synchronously Updates propagate as part of the same distributed transaction For example, using triggers 30

Crash Failures What happens when a secondary crashes? Nothing happens When secondary recovers, it catches up What happens when the master/primary fails? Blocking would hurt availability Must chose a new primary: run election See the Paxos algorithm (CSE 550) 31

Network Failures / Partitions Network failures can cause trouble... Secondaries think that primary failed Secondaries elect a new primary But primary can still be running Now have two primaries! 32

Majority Consensus To avoid problem, only majority partition can continue processing at any time In general, Whenever a replica fails or recovers... a set of communicating replicas must determine... whether they have a majority before they can continue 33

Eager Group With n copies Exclusive lock on x copies is global exclusive lock Shared lock on s copies is global shared lock Must have: 2x > n and s + x > n Majority locking s = x = (n+1)/2 No need to run any reconfiguration algorithms Read-locks-one, write-locks-all s=1 and x = n, high read performance Need to make sure algorithm runs on quorum of machines 34

Eager Replication Properties Favors consistency over availability Only majority partition can process requests There appears to be a single copy of the db High runtime overhead Must lock and update at least majority of replicas Two-phase commit Runs at pace of slowest replica in quorum So overall system is now slower Higher deadlock rate (transactions take longer) 35

Lazy Replication Also called asynchronous replication Also called optimistic replication Main goals: availability and performance Approach One replica updated by original transaction Updates propagate asynchronously to other replicas 36

Lazy Master One master holds primary copy Transactions update primary copy Master asynchronously propagates updates to replicas, which process them in same order (e.g. through log shipping) Ensures single-copy serializability What happens when master/primary fails? Can lose most recent transactions when primary fails! After electing a new primary, secondaries must agree who is most up-to-date 37

Lazy Group Also called multi-master Best scheme for availability Cannot guarantee one-copy serializability! Init: x=1 Update x=2 R1 R2 Init: x=1 Update x=3 38

Lazy Group Cannot guarantee one-copy serializability! Instead guarantee convergence DB state does not reflect any serial execution But all replicas have the same state Detect conflicts and reconcile replica states Different reconciliation techniques are possible Manual Most recent timestamp wins Site A wins over site B User-defined rules, etc. 39

Detecting Conflicts Using Timestamps R1 R2 Init: x=1 at T 0 Update at T 1 : x=2 x=2, Old: T 0 New: T 1 Init: x=1 at T 0 x=2 at T 1 x=2 at T 1 40

Detecting Conflicts Using Timestamps R1 R2 Init: x=1 at T 0 Update at T 1 : x=2 x=2, Old: T 0 New: T 1 Init: x=1 at T 0 Update at T 2 : x=3 Conflict! Reconciliation rule T 2 > T 1, so x=3 x=3, Old: T 0 New: T 2 Conflict! Reconciliation rule T 2 > T 1, so x=3 41

Lazy Group Replication Properties Favors availability over consistency Can read and update any replica High runtime performance Weak consistency Conflicts and reconciliation Important principle in systems research: TINSTAAFL 42

Two-Tier Replication Benefits of lazy master and lazy group Each object has a master with primary copy When disconnected from master Secondary can only run tentative transactions When reconnects to master Master reprocesses all tentative transactions Checks an acceptance criterion If passes, we now have final commit order Secondary undoes tentative and redoes committed 43

Conclusion (distributed txns and replication) Distributed transactions are very important Necessary for scalability (throughput and global services) But ACID properties require expensive 2PC protocol Replication is a very important problem Fault-tolerance (various forms of replication) Caching (lazy master) Warehousing (lazy master) Mobility (two-tier techniques) Replication is complex, but basic techniques and trade-offs are very well known Eager or lazy replication Master or no master For eager replication: use quorum 44