Consistency and Scalability

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)
Consistency and Scalability
Noah Mendelsohn, Tufts University
Email: noah@cs.tufts.edu
Web: http://www.cs.tufts.edu/~noah
Copyright 2015 Noah Mendelsohn

What you should get from today's session
- You will explore challenges relating to maintaining data consistency in a computing system
- You will learn about techniques used to make storage systems more reliable
- You will learn about transactions and their implementation using logs
- You will learn about the CAP theorem and why scaling and consistency tend not to come together

A note about scope
- The challenges and principles we cover today reappear at every level of system design:
  - CPU instruction set and memory
  - Parallel programming languages
  - Single-machine databases
  - Distributed applications and databases
- Today we will focus mainly on larger-scale systems

Why Worry About Consistency?

Duplicate information in computing systems
Why complicate things?
- Mirrored disks for reliability
- Parallel processing: higher throughput
- Geographic distribution reduces network delay (one copy each in Europe, Asia, US)
- Higher availability: if the network crashes, each partition may still have a copy
- Inter-dependent data
  - Bank account records have a total for each account
  - The bank record keeps a total for all accounts
- Memory hierarchies: CPU caches, file system caches, Web proxies, etc.
If we allow updates, then maintaining consistency is tricky

Simple Examples: Parallel Disk Systems

Mirrored disks
- Everything is written twice
- Better performance on reads (slower on writes)
[Diagram: a logical disk holding X; the mirrored implementation holds two copies of X]

Duplicate data and crash recovery
- After a crash, data survives
[Diagram: a logical disk holding X; one drive of the mirrored implementation crashes, the other still holds X]

Mirrored disks
- A replacement drive can be reconstructed in the background
[Diagram: a logical disk holding X; the mirrored implementation holds two copies of X]
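The mirroring slides above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `MirroredDisk`, its methods, and the use of in-memory dictionaries as "drives" are all hypothetical and are not taken from the lecture or from any real driver.

```python
class MirroredDisk:
    """Toy mirrored disk: every block is written to two in-memory 'drives'."""

    def __init__(self):
        # Each dict maps a block number to the bytes stored in that block.
        self.drives = [{}, {}]

    def write(self, block, data):
        # A write goes to every working drive, so writes cost twice the I/O.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def read(self, block):
        # A read can be served by any surviving drive that has the block.
        for drive in self.drives:
            if drive is not None and block in drive:
                return drive[block]
        raise KeyError(block)

    def fail(self, index):
        # Simulate losing one drive.
        self.drives[index] = None

    def rebuild(self, index):
        # Copy every block from a surviving drive onto a replacement drive.
        survivor = next(d for d in self.drives if d is not None)
        self.drives[index] = dict(survivor)


disk = MirroredDisk()
disk.write(7, b"X")
disk.fail(0)                       # Crash!
value_after_crash = disk.read(7)   # still readable from the other drive
disk.rebuild(0)                    # replacement drive reconstructed
```

A real mirrored driver would rebuild in the background while still serving requests; the sketch does it in one step for brevity.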

REVIEW: How is the disk used in Unix / Linux?
[Diagram: layers from the application down to the disk; the layers below the application are in the Unix kernel]
- Application
- Filesystem: files/dirs, security, etc.
- Block device driver, with in-memory block cache: buffered block r/w (hides timing); direct read/write of filesystem blocks (hides sector size and device geometry)
- Raw device driver: sector access by cylinder/track/sector

We can use mirrored disks with Unix
Abstraction: the mirrored disk provides the same service as a single disk, just faster and more reliable!
[Diagram: the same kernel layers (application, filesystem, block device driver with in-memory block cache), with a MIRRORED device driver on top of the mirrored implementation]

Atomicity and update synchronization
- Question: when is the update committed?
- Mirrored writes DO NOT happen at quite the same time
[Diagram: a logical disk holding X; the mirrored implementation holds two copies of X]

RAID: Redundant Arrays of Inexpensive Disks
[Diagram: blocks X, Y and Z of the logical disk are stored on separate disks of the RAID implementation, and one more disk stores the parity block XOR(X,Y,Z)]
- Much less space overhead than mirroring, but typically slower
- If any disk is lost, you can reconstruct it from the information on the others!
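The reconstruction claim follows from a property of XOR: XOR-ing the parity block with all surviving data blocks yields the missing block. A minimal Python sketch, assuming equal-length blocks held in memory; the helper name `xor_blocks` and the sample byte values are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Return the byte-wise XOR of several equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per disk (sample values only).
x, y, z = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"

# The parity disk stores XOR(X, Y, Z).
parity = xor_blocks(x, y, z)

# Suppose the disk holding Y is lost: XOR the survivors with the parity block.
recovered_y = xor_blocks(x, z, parity)
```

The same call recovers X or Z if one of those disks is lost instead, which is why a single parity disk protects against the loss of any one disk.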

Why Consistency is Hard

Synchronization problem
Some code to add money to my account:
    NA = Access Noah's Bank account
    Bal = NA.Balance;
    NewBalance = Bal + $1000
    NA.Balance.Write NewBalance
Let's run the code for two deposits in parallel (two copies of the code above).
Can you see the problem?
There's a risk that both copies will read the same balance before either one writes its update. If that happens, I only get $1000, not $2000!
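The bad interleaving can be written out by hand. The Python sketch below does not use real threads; it simply replays one unlucky ordering of the reads and writes of two deposits, starting from a balance of 0 (an assumed starting value, since the slide does not give one).

```python
account = {"balance": 0}

# Both deposits read the balance before either one writes it back.
bal_1 = account["balance"]           # deposit 1 reads 0
bal_2 = account["balance"]           # deposit 2 also reads 0

account["balance"] = bal_1 + 1000    # deposit 1 writes 1000
account["balance"] = bal_2 + 1000    # deposit 2 overwrites it with 1000

lost_update_balance = account["balance"]   # 1000, although 2000 was deposited
```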

Solution: locking
Only one transaction or thread can hold the lock at a time.
Some code to add money to my account:
    Lock Noah's Bank Account
    NA = Access Noah's Bank account
    Bal = NA.Balance;
    NewBalance = Bal + $1000
    NA.Balance.Write NewBalance
    Unlock Noah's Bank Account
Now the two copies can't run at once on the same account, but if each locks a different bank account they can.
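A minimal sketch of the locked deposit using Python's standard `threading.Lock`, with one lock per account so that deposits to different accounts can still run in parallel. The `Account` class and the `deposit` function are hypothetical names chosen for this example.

```python
import threading


class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self.lock = threading.Lock()   # one lock per account


def deposit(account, amount):
    # Only one thread at a time can hold this account's lock, so the
    # read and the write below cannot interleave with another deposit.
    with account.lock:
        bal = account.balance
        account.balance = bal + amount


noah = Account()
threads = [threading.Thread(target=deposit, args=(noah, 1000)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The `with` statement releases the lock even if the body raises, which plays the role of the "Unlock" line in the pseudocode.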

Consistency and Crash Recovery
Some code to transfer money:
    NA = Access Noah's Bank account
    YA = Access Your Bank account
    NBal = NA.Balance;
    YBal = YA.Balance;
    NBal += $1000;
    YBal -= $1000;
    NA.Balance.Write NBal
    YA.Balance.Write YBal      <- this gets lost during the crash
Can you see the problem?
If the system crashes just after writing my balance, the bank loses $1000 (it's still in your account too).

Transactions

Transactions: automated consistency & crash recovery!
Some code to transfer money:
    BEGIN_TRANSACTION
    NA = Access Noah's Bank account
    YA = Access Your Bank account
    NBal = NA.Balance;
    YBal = YA.Balance;
    NBal += $1000;
    YBal -= $1000;
    NA.Balance.Write NBal
    YA.Balance.Write YBal
    END_TRANSACTION
The system guarantees that either everything in the transaction happens, or nothing does, and it guarantees more!
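To see the all-or-nothing guarantee in a real system, here is a hedged sketch using SQLite through Python's standard `sqlite3` module. SQLite is not one of the databases named in the lecture; it is used here only because it ships with Python. The table layout, the `transfer` and `balances` helpers, and the simulated failure (an exception, not a real machine crash) are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("noah", 100), ("you", 1300)])
conn.commit()


def transfer(conn, fail_midway=False):
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if the block raises an exception.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 1000 "
                     "WHERE owner = 'noah'")
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute("UPDATE accounts SET balance = balance - 1000 "
                     "WHERE owner = 'you'")


def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM accounts"))


try:
    transfer(conn, fail_midway=True)
except RuntimeError:
    pass
after_failed = balances(conn)    # the first update was rolled back

transfer(conn)
after_success = balances(conn)   # both updates were committed together
```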

ACID Properties of a Transaction
- Atomicity: everything happens or nothing
- Consistency: if the database has rules, they are obeyed at transaction end (e.g. balance must be < $1,000,000)
- Isolation: any two parallel transactions act as if serial
  - Most transaction systems do the locking automatically!
- Durability: once committed, never lost
That seems almost magic. How can we achieve all this?

How to implement transactions: logging
- The key idea: a shared log records the information needed to undo any change made by any transaction
- When a transaction commits:
  - All data is written to the main data store
  - A commit record is written to the log. This is the atomic point at which the transaction happens
- After a crash, the log is replayed
  - For any transactions that did not commit, the undo operations are performed
  - After the crash, only committed operations have happened!
- When combined with transaction-driven locking, we can automatically support ACID properties with almost no application code complexity
- This is all built into SQL databases like Oracle, Postgres, DB2, and SQL Server
Logging and transaction processing are two of the most important and beautiful data processing technologies

Logging in Action
Some code to transfer money:
    BEGIN_TRANSACTION
    NA = Access Noah's Bank account
    YA = Access Your Bank account
    NBal = NA.Balance;
    YBal = YA.Balance;
    NBal += $1000;
    YBal -= $1000;
    NA.Balance.Write NBal
    YA.Balance.Write YBal
    END_TRANSACTION
The data and the log as the transaction runs:
- Before the transaction: Noah.Bal = $100, Your.Bal = $1300. Log: empty.
- BEGIN_TRANSACTION: Noah.Bal = $100, Your.Bal = $1300. Log: Begin Trans 1.
- Noah's balance is written: Noah.Bal = $1100, Your.Bal = $1300. Log: Begin Trans 1; Old Noah Bal = $100.
- Your balance is written: Noah.Bal = $1100, Your.Bal = $300. Log: Begin Trans 1; Old Noah Bal = $100; Old Your Bal = $1300.
- END_TRANSACTION: Noah.Bal = $1100, Your.Bal = $300. Log: Begin Trans 1; Old Noah Bal = $100; Old Your Bal = $1300; Commit Tr 1.
What if we crash while the data is inconsistent?

Logging in Action
The same transfer code runs again, and this time the system crashes part-way through:
- Before the transaction: Noah.Bal = $100, Your.Bal = $1300. Log: empty.
- BEGIN_TRANSACTION: Noah.Bal = $100, Your.Bal = $1300. Log: Begin Trans 1.
- Noah's balance is written, then Crash! Noah.Bal = $1100, Your.Bal = $1300. Log: Begin Trans 1; Old Noah Bal = $100.

Recovery!
- When the system restarts, the data is inconsistent (Noah.Bal = $1100, Your.Bal = $1300), but we can play the log to restore consistency!
- We notice that Transaction 1 never committed, so we apply all of its undo entries
- Noah.Bal goes from $1100 back to $100; Your.Bal = $1300. Log: Begin Trans 1; Old Noah Bal = $100.
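The crash and recovery walk-through can be sketched with a list standing in for the log. This is a simplified illustration, not the mechanism of any particular database: the record layout (`"begin"`, `"undo"`, `"commit"` tuples) and the function names `run_transfer` and `recover` are assumptions, and the "crash" is just an early return.

```python
def run_transfer(data, log, crash_after_first_write=False):
    log.append(("begin", 1))
    # The old value is logged before the data itself is changed.
    log.append(("undo", 1, "noah", data["noah"]))
    data["noah"] += 1000
    if crash_after_first_write:
        return  # simulated crash: no commit record is ever written
    log.append(("undo", 1, "you", data["you"]))
    data["you"] -= 1000
    log.append(("commit", 1))


def recover(data, log):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Walk the log backwards, restoring the old values written by
    # transactions that never committed.
    for rec in reversed(log):
        if rec[0] == "undo" and rec[1] not in committed:
            _, _, key, old_value = rec
            data[key] = old_value


data = {"noah": 100, "you": 1300}
log = []
run_transfer(data, log, crash_after_first_write=True)
state_at_crash = dict(data)   # inconsistent: noah has gained $1000, you have lost nothing
recover(data, log)            # undo entries restore the old balance
```

Running `recover` on a log that does contain the commit record leaves the data untouched, which matches the rule that only uncommitted transactions are undone.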

Logging: keeping consistency after crashes
Full Disclosure
This explanation is highly simplified, but the spirit is exactly right.
Examples of things not covered:
- Some databases use redo vs. undo logging, or log both old and new values
- Transactions can abort (a ROLLBACK record is logged instead of COMMIT)
  - Useful if the programmer wants to give up
  - The system can abort a transaction if there is an error
  - The system can abort a transaction if locking has caused deadlock
- The same logs, if carefully designed, can be used to help with backup, recovery from disk drive failure, and synchronization of distributed systems

Atomicity and hardware
- Important: transactions are committed by an atomic hardware write to the log
  - Before the commit is written, the transaction has not happened
  - After it's written, all of its work is committed
  - It all happens at once: atomically
- Principle: almost any computing activity that is to be done atomically must be achieved in a single atomic hardware operation!
  - Store, test_and_set or compare_and_swap CPU instructions
  - Write a disk block
- When designing systems that require consistency, start by studying what your hardware can do atomically
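As an illustration of building a larger atomic update out of one atomic primitive, here is a compare-and-swap retry loop. Python does not expose the CPU's compare_and_swap instruction, so the sketch emulates it: the hypothetical `AtomicCell` class uses a lock to stand in for the hardware guarantee. Only the retry pattern in `add` is the point of the example.

```python
import threading


class AtomicCell:
    """Emulates one memory word with compare-and-swap.

    The internal lock is a stand-in for the atomicity that a real CPU
    instruction would provide in hardware.
    """

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Atomically: store `new` only if the cell still holds `expected`.
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def add(cell, amount):
    # Retry until no other thread changed the value between load and swap.
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + amount):
            return


balance = AtomicCell(0)
threads = [threading.Thread(target=add, args=(balance, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_balance = balance.load()
```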

Consistency in Distributed Systems

Problem
- In a distributed system, we want to do work in lots of places
- To get consistency, we need to do an atomic update to the system state
- Challenge: can we get consistency in a distributed system?

Can we get distributed consensus and consistency?
- Yes! (but with some limitations)
- First we need to think about how distributed systems fail:
  - Individual nodes can fail
  - What if the network partitions?
- In general, implementing transactions or other consistency guarantees in distributed systems is hard!

Network Partition
This network is fully connected

Network Partition
- If these links break, the network is partitioned
- All computers are still up!
- Updates in one partition can't be sent to the other

Questions about failures in distributed systems
- Can we support replicated data and maintain consistency?
- Can we run distributed transactions in which work (updating accounts) is spread through the network and achieve consistency?
- How can we do crash recovery?
- How do we continue running when the network partitions?

Voting: a simple approach to replicated data
- Copies of the same data can be kept at any or all nodes
- But when reading, you must use the value stored at a majority of nodes!

Network Partition
- All computers are still up!
- Updates in one partition can't be sent to the other
- During a partition, only one group of nodes can be a majority; the other can't proceed!
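The majority rule can be sketched as a small check. This is a simplified illustration, assuming five nodes and string values for the replicated data; the function names `has_majority` and `majority_read` are hypothetical, and real quorum protocols also track versions, which is left out here.

```python
from collections import Counter


def has_majority(count, total_nodes):
    # A strict majority: more than half of ALL nodes, not just reachable ones.
    return count > total_nodes // 2


def majority_read(replies, total_nodes):
    """Return the value reported by a majority of all nodes, or None."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if has_majority(count, total_nodes) else None


TOTAL_NODES = 5
big_side = ["v2", "v2", "v2"]   # replies gathered inside a 3-node partition
small_side = ["v1", "v1"]       # replies gathered inside a 2-node partition

big_result = majority_read(big_side, TOTAL_NODES)      # a majority exists
small_result = majority_read(small_side, TOTAL_NODES)  # no majority: cannot proceed
```

Because two disjoint groups cannot both contain more than half of the nodes, at most one side of a partition can ever pass the check.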

The Famous CAP Theorem

The CAP Theorem
- When designing a system with distributed data, you would like to have:
  - Consistency: everyone agrees on the data
  - Availability: nobody ever has to stop processing
  - Partition tolerance: keep going even when the network partitions
- The CAP theorem says: you can have any two simultaneously, but not all three!
- If your network can partition, then either some nodes will have to stop working (no availability) or data may become inconsistent (the other partition doesn't see the updates)

Network Partition
- With the voting algorithm, only the orange partition can do work
- The CAP theorem explains why we can never build a system that does better, unless we are willing to sacrifice consistency

Distributed Transactions

Distributed transactions: the challenge
- What if our computation is distributed?
- We still want ACID properties:
  - Atomicity
  - Consistency
  - Isolation
  - Durability
- Per the CAP theorem: let's ignore partition for now
- Amazingly, there are ways to do this:
  - Isolation and Consistency: distributed lock managers
  - Atomicity and Durability: Distributed Two Phase Commit (DTPC)

Distributed two phase commit
- Allows a single transaction to be spread across multiple nodes
- Logging is done at each node, as for traditional transactions
- A special protocol ensures atomic commit of the distributed work
- One of the great achievements of 20th-century distributed computing research

Distributed Two Phase Commit
Node 1 logic:
    BEGIN_DISTRIBUTED_TRANSACTION
    NA = Access Noah's Bank account
    NBal = NA.Balance;
    NBal += $1000;
    NA.Balance.Write NBal
    COMMIT
Node 2 logic:
    JOIN_DISTRIBUTED_TRANSACTION
    YA = Access Your Bank account
    YBal = YA.Balance;
    YBal -= $1000;
    YA.Balance.Write YBal
The data and the two logs as the transaction runs:
- Start: Noah.Bal = $100, Your.Bal = $1300. Node 1 log: Begin Trans 1. Node 2 log: Join Trans 1.
- After the writes: Noah.Bal = $1100, Your.Bal = $300. Node 1 log: Begin Trans 1; Old Noah Balance = $100. Node 2 log: Join Trans 1; Old Your Balance = $1300.
- Prepare: Noah.Bal = $1100, Your.Bal = $300. Node 1 log: Begin Trans 1; Old Noah Balance = $100; Prepared. Node 2 log: Join Trans 1; Old Your Balance = $1300; Prepared.
  Prepared means: if you ask me later to commit or abort, I will be able to do either!
- Commit: Noah.Bal = $1100, Your.Bal = $300. Node 1 log: Begin Trans 1; Old Noah Balance = $100; Prepared; Commit. Node 2 log: Join Trans 1; Old Your Balance = $1300; Prepared; Commit.

What happens if there is a crash?
- If a node goes down before the commit, the master node writes an abort record and tells the other nodes to abort
- When any node comes up after a crash or after a partition, it checks with the master what has happened to any prepared transactions
- Because prepared means it can go either way, that node can either record a commit or execute a rollback using data from the log
- We can see the CAP theorem in action again: the algorithm stalls while the network is partitioned
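The two phases and the abort path can be sketched in a single process. This is a toy model under loud assumptions: the `Participant` class, its method names, the `can_prepare` flag used to simulate a node that cannot prepare, and the in-memory lists standing in for per-node logs are all invented for the example. There is no networking, no timeout handling, and no recovery of the master itself.

```python
class Participant:
    def __init__(self, name, balance, can_prepare=True):
        self.name = name
        self.balance = balance
        self.can_prepare = can_prepare   # False simulates a node that fails
        self.log = []

    def work(self, delta):
        self.log.append(("old", self.balance))   # undo information first
        self.balance += delta

    def prepare(self):
        # Phase 1 vote: promise to be able to commit OR abort later.
        if self.can_prepare:
            self.log.append("prepared")
        return self.can_prepare

    def commit(self):
        self.log.append("commit")

    def abort(self):
        # Roll back using the old value recorded in this node's log.
        for rec in self.log:
            if isinstance(rec, tuple) and rec[0] == "old":
                self.balance = rec[1]
                break
        self.log.append("abort")


def two_phase_commit(participants):
    # Phase 1: the master asks every node to prepare and collects the votes.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if every node voted yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.abort()
    return "abort"


node1, node2 = Participant("node1", 100), Participant("node2", 1300)
node1.work(+1000)
node2.work(-1000)
outcome = two_phase_commit([node1, node2])

node3 = Participant("node3", 100)
node4 = Participant("node4", 1300, can_prepare=False)
node3.work(+1000)
node4.work(-1000)
failed_outcome = two_phase_commit([node3, node4])
```

In the failing run, node3 had already logged "prepared", yet it can still roll back, which is exactly the either-way promise that the prepared state makes.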

Does Everyone Use Distributed Two Phase Commit?
- In the late 1990s everyone thought DTPC would be the key to distributed data
- In practice, systems like Amazon can't stop in case of a network partition or a master node crash
- Today:
  - Massive but non-critical data stores do not even attempt perfect consistency: once in a while your Amazon shopping cart may lose things you've parked there
  - Critical transactions (e.g. when you place your order and charge your credit card) are often recorded in less scalable but fully consistent (usually relational) databases

Summary

Summary
- Keeping data consistent is important
- Techniques like ACID transactions implemented with logs have been spectacularly successful
- Consistency and scalability tend not to come together
- Atomicity in software tends to require reduction to a single atomic operation in hardware
- The CAP theorem says we can't have Consistency, Availability and Partition tolerance all at once
- Techniques like voting and Distributed Two Phase Commit can achieve distributed consistency at the cost of availability
- Many modern systems sacrifice consistency to achieve availability at massive scale