
Atomicity
Bailu Ding
Oct 18, 2012

Outline
1. Introduction
2. State Machine
3. Sinfonia
4. Dangers of Replication

Introduction
Papers covered in this lecture:
- Implementing Fault-Tolerant Services Using the State Machine Approach
- Sinfonia: A New Paradigm for Building Scalable Distributed Systems
- The Dangers of Replication and a Solution

Part 2: State Machine

State Machines
- A state machine runs on a server and consists of state variables and commands. Example: a memory, with read and write commands.
- Clients issue requests; executing a command produces an output for the client.
- The outputs of a state machine are completely determined by the sequence of requests it processes, independent of time or other activity.
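
As a minimal illustration (a Python sketch, not from the paper), here is a toy memory state machine whose outputs depend only on the request sequence, so any replica fed the same sequence produces the same outputs:

```python
# Toy deterministic state machine: a memory with read and write commands.
class MemoryStateMachine:
    def __init__(self):
        self.memory = {}  # the state variables

    def apply(self, request):
        """Execute one command; the output depends only on the sequence
        of requests applied so far, never on timing."""
        op, key, *rest = request
        if op == "write":
            self.memory[key] = rest[0]
            return "ok"
        if op == "read":
            return self.memory.get(key)
        raise ValueError(f"unknown command: {op}")

# Two replicas fed the same request sequence produce identical outputs.
requests = [("write", "x", 1), ("read", "x")]
a, b = MemoryStateMachine(), MemoryStateMachine()
assert [a.apply(r) for r in requests] == [b.apply(r) for r in requests]
```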

Causality
- Requests issued by a single client to a given state machine are processed in the order they were issued.
- If a request r made to state machine sm could have caused another request r' to be made to sm, then sm processes r before r'.

Fault Tolerance
- Byzantine failures: a faulty component may behave arbitrarily, even maliciously.
- Fail-stop failures: a faulty component simply halts, and the failure is detectable.
- A system is t fault tolerant if it satisfies its specification as long as no more than t components fail.

Fault-Tolerant State Machines
- Build a t fault tolerant state machine by replicating it across multiple processors and combining the replicas' outputs.
- Byzantine failures: 2t + 1 replicas suffice, so the t + 1 correct outputs outvote the t faulty ones.
- Fail-stop failures: t + 1 replicas suffice, since any output from a surviving replica is correct.
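
A minimal sketch of the output voting this implies for the Byzantine case (illustrative Python, not from the paper):

```python
from collections import Counter

def voted_output(outputs):
    """Majority vote over replica outputs. With 2t + 1 replicas and at
    most t Byzantine faults, the majority value is the correct output."""
    value, count = Counter(outputs).most_common(1)[0]
    assert count > len(outputs) // 2, "no majority; fault bound exceeded?"
    return value

print(voted_output([1, 1, 2]))  # 1: tolerates t = 1 fault with 3 replicas
```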

Replica Coordination: Requirements
- Agreement: every nonfaulty replica receives the same sequence of requests.
- Order: every nonfaulty replica processes those requests in the same relative order.

Agreement
- A designated transmitter disseminates a value to the other processors.
- All nonfaulty processors must agree on the same value.
- If the transmitter is nonfaulty, all nonfaulty processors must use its value as the one on which they agree.

Order
- Assign each request a unique identifier; replicas process requests in order of unique identifier, once they are stable.
- A request is stable when no request with a lower unique identifier can still arrive.
Challenges:
- Assigning unique identifiers in a way that satisfies causality.
- Implementing the stability test.

Order Implementation I: Logical Clocks
- Each event e carries a timestamp T(e); each processor p maintains a counter T(p).
- Every message sent by p carries its current timestamp T(p); T(p) is incremented on each local event and advanced past the timestamp of every message received.
- Identifiers assigned this way satisfy causality.
- Stability test under fail-stop failures: delivering a request r to processor p forces T(p) > T(r), so r is stable at a replica once it has received a message with timestamp greater than T(r) from every processor.
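
A hedged sketch of such a logical clock with its stability test (illustrative Python; FIFO channels are assumed, and the class and method names are invented):

```python
class LogicalClock:
    def __init__(self, n_processors, my_id):
        self.counter = 0
        self.my_id = my_id
        # highest timestamp seen from each other processor
        self.last_seen = {p: 0 for p in range(n_processors) if p != my_id}

    def tick(self):
        """Advance the clock for a local event; (counter, id) breaks ties."""
        self.counter += 1
        return (self.counter, self.my_id)

    def on_receive(self, sender, ts):
        """Advance past the sender's timestamp and record what we've seen."""
        self.counter = max(self.counter, ts) + 1
        self.last_seen[sender] = max(self.last_seen[sender], ts)

    def is_stable(self, request_ts):
        """Stable once every other processor has sent something with a
        larger timestamp: nothing smaller can still arrive over FIFO links."""
        return all(seen > request_ts for seen in self.last_seen.values())

clk = LogicalClock(n_processors=2, my_id=0)
ts, _ = clk.tick()              # timestamp a local request
clk.on_receive(sender=1, ts=5)  # a peer's message advances our clock
print(clk.is_stable(ts))        # True: peer's timestamp exceeds ts
```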

Order Implementation II: Synchronized Real-Time Clocks
- Use approximately synchronized real-time clocks and take the local clock reading as a request's unique identifier.
- Causality is satisfied provided that no client makes two or more requests between successive clock ticks, and that the degree of clock synchronization is better than the minimum message delivery time.
- Stability test I: a request is stable once a bounded delay delta has elapsed since its timestamp.
- Stability test II: a request is stable once a request with a larger identifier has been received from every client.

Order Implementation III: Replica-Generated Identifiers
- Identifiers are assigned in two phases: state machine replicas propose candidate unique identifiers for a request, then one of the candidates is selected and the request is accepted.
- Communication among all processors is not necessary.
- Stability test: the selected candidate is the maximum of all the candidates, and every candidate a replica proposes is larger than the unique identifier of any request it has already accepted.
- Causality: a client waits until all replicas have accepted its previous request before issuing the next one.
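
A rough sketch of the two-phase assignment (illustrative Python; the candidate scheme, an integer part plus a replica-specific fraction, is one standard way to keep candidates unique, and is an assumption here):

```python
class Replica:
    def __init__(self):
        self.max_seen = 0.0  # largest candidate/identifier seen so far

    def propose(self, my_id, n_replicas):
        """Phase 1: propose a candidate larger than anything accepted here.
        The fraction my_id / n_replicas makes candidates globally unique."""
        self.max_seen = int(self.max_seen) + 1 + my_id / n_replicas
        return self.max_seen

    def accept(self, uid):
        """Phase 2: record the selected (maximum) candidate as the uid."""
        self.max_seen = max(self.max_seen, uid)

replicas = [Replica() for _ in range(3)]
candidates = [r.propose(i, 3) for i, r in enumerate(replicas)]
uid = max(candidates)  # the selected identifier
for r in replicas:
    r.accept(uid)
print(uid)  # every later candidate will exceed this accepted identifier
```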

Faulty Clients
- Tolerate faulty clients by replicating the client itself.
- Challenges: the client replicas may issue the same logical request with different unique identifiers, or even with different content.

Reconfiguration
- Remove faulty state machine replicas so they cannot affect outputs.
- Add new state machine replicas, initializing their state before they process requests.

Part 3: Sinfonia

Sinfonia
Two topics: two-phase commit as background, then Sinfonia itself.

Two-Phase Commit: Problem
- All participants in a distributed atomic transaction must reach the same outcome: commit or abort.
Challenge:
- A transaction can commit its updates on one participant while a second participant fails before the transaction commits there. When the failed participant recovers, it must still be able to commit the transaction.

Two-Phase Commit: Idea
- Each participant must durably store its portion of the updates before the transaction commits anywhere.
- Prepare (voting) phase: the coordinator sends the updates to all participants, each of which logs them durably and votes.
- Commit phase: if every participant voted to commit, the coordinator sends commit requests to all participants; otherwise it aborts.
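
A minimal in-memory sketch of the protocol (illustrative Python; the Participant class is an invented stand-in, and a real implementation would write these logs to stable storage):

```python
class Participant:
    def __init__(self, pid):
        self.id, self.log, self.state = pid, [], {}

    def prepare(self, updates):
        self.log.append(("prepared", updates))  # durable before voting yes
        return True  # vote commit (would vote abort on local failure)

    def commit(self):
        for _, updates in self.log:
            self.state.update(updates)

    def abort(self):
        self.log.clear()

def two_phase_commit(participants, updates_by_pid):
    # Phase 1 (prepare/voting): each participant durably stores its
    # portion of the updates before voting, so one that fails after
    # voting yes can still commit from its log on recovery.
    votes = [p.prepare(updates_by_pid.get(p.id, {})) for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: announce the outcome to everyone.
    for p in participants:
        p.commit() if decision == "commit" else p.abort()
    return decision

ps = [Participant(0), Participant(1)]
print(two_phase_commit(ps, {0: {"x": 1}, 1: {"y": 2}}))  # commit
```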

Motivation
- Problem: data centers are growing quickly, distributed applications need to scale well, and current protocols for building them are often too complex.
- Idea: provide a new, simpler building block for scalable distributed applications.

Scope
- Setting: a system within a single data center, where network latency is low but nodes, and even stable storage, can fail.
- Target: infrastructure applications that must be fault-tolerant and consistent: cluster file systems, distributed lock managers, group communication services, distributed name services.

Approach
- Idea: what can we squeeze out of 2PC?
- Observation: for a pre-defined read set, an entire transaction can be piggybacked onto the two phases of 2PC.
- Solution: the minitransaction, a compare-read-write primitive.

Minitransactions
- A minitransaction consists of compare items, read items, and write items.
- Prepare phase: participants check the compare items.
- Commit phase: if all comparisons succeed, return the read items and apply the write items; otherwise abort.
Applications:
- Compare-and-swap
- Atomic reads of multiple data items
- Acquiring multiple leases at once
- The Sinfonia file system
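
A hedged sketch of a minitransaction piggybacked on 2PC (illustrative Python; MemNode is an invented stand-in for a Sinfonia memory node, not the paper's interface, and locking is elided):

```python
class MemNode:
    def __init__(self, node_id):
        self.id = node_id
        self.store = {}  # address -> value

    def prepare(self, cmp_items):
        """Prepare doubles as the compare phase: vote commit only if
        every compare item matches (lock acquisition elided here)."""
        return all(self.store.get(addr) == val for addr, val in cmp_items)

    def commit(self, read_items, write_items):
        """Commit phase: apply writes and return the requested reads."""
        for addr, val in write_items:
            self.store[addr] = val
        return {addr: self.store.get(addr) for addr in read_items}

    def abort(self):
        pass  # a real node would release its locks

def exec_minitransaction(nodes, cmp_items, read_items, write_items):
    votes = [n.prepare(cmp_items.get(n.id, [])) for n in nodes]
    if not all(votes):
        for n in nodes:
            n.abort()
        return ("aborted", None)
    results = {}
    for n in nodes:
        results.update(n.commit(read_items.get(n.id, []),
                                write_items.get(n.id, [])))
    return ("committed", results)

# Compare-and-swap of address 100 on node 0 from None to 2:
node = MemNode(0)
print(exec_minitransaction([node], {0: [(100, None)]}, {}, {0: [(100, 2)]}))
```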

Architecture (figure-only slide; diagram not transcribed)

Fault Tolerance
- Failures handled: application node crashes, memory node crashes, and loss of stable storage.
- Mechanisms: disk images, logging, replication, and backups.

Part 4: Dangers of Replication

Contribution
Danger of replication:
- A ten-fold increase in nodes and traffic gives a thousand-fold increase in deadlocks or reconciliations.
Solution:
- A two-tier replication algorithm
- Commutative transactions

Existing Replication Algorithms
Propagation:
- Eager replication: updates are applied to all replicas as part of the original transaction.
- Lazy replication: updates are propagated afterwards as separate transactions.
Regulation:
- Group: update anywhere.
- Master: update only the primary copy.

Analytic Model: Parameters
- Nodes: number of nodes
- TPS: transactions per second originating at each node
- Actions: number of items updated per transaction
- Action_Time: duration of one action, so a transaction lasts Actions × Action_Time
- DB_Size: number of items in the database
- Serial replication is assumed: replica updates are not performed in parallel.

Analysis of Eager Replication: Single Node
- Concurrent transactions: Transactions = TPS × Actions × Action_Time
- Resources locked by the transactions in progress: Transactions × Actions / 2
- Fraction of the database locked: Transactions × Actions / (2 × DB_Size)
- Probability of waits per transaction: PW = 1 − (1 − Transactions × Actions / (2 × DB_Size))^Actions ≈ Transactions × Actions^2 / (2 × DB_Size)
- Probability of deadlocks per transaction: PD ≈ PW^2 / Transactions = TPS × Action_Time × Actions^5 / (4 × DB_Size^2)
- Deadlock rate per transaction: DR = PD / (Actions × Action_Time) ≈ TPS × Actions^4 / (4 × DB_Size^2)
- Deadlock rate per node: DT = Transactions × DR = TPS^2 × Actions^5 × Action_Time / (4 × DB_Size^2)
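
A small calculator makes the model concrete (Python; the parameter values are invented for illustration):

```python
# Single-node deadlock rate from the model above.
def single_node_deadlock_rate(tps, actions, action_time, db_size):
    transactions = tps * actions * action_time          # concurrent txns
    pw = transactions * actions**2 / (2 * db_size)      # P(wait) per txn
    pd = pw**2 / transactions                           # P(deadlock) per txn
    return transactions * pd / (actions * action_time)  # deadlocks/second

# e.g. 100 TPS, 10 actions of 10 ms each, a million-item database:
print(single_node_deadlock_rate(tps=100, actions=10,
                                action_time=0.01, db_size=1_000_000))
```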

Analysis of Eager Replication: Multiple Nodes
- Transaction duration: Actions × Nodes × Action_Time
- Concurrent transactions: Transactions_m = TPS × Actions × Action_Time × Nodes^2 (total traffic grows with Nodes and each transaction lasts Nodes times longer)
- Probability of waits per transaction: PW_m ≈ PW × Nodes^2
- Probability of deadlocks per transaction: PD_m ≈ PW_m^2 / Transactions_m = PD × Nodes^2
- Deadlock rate per transaction: DR_m ≈ DR × Nodes
- Total deadlock rate: DT_m ≈ DT × Nodes^3
- If DB_Size also grows linearly with Nodes (unlikely): DT_m ≈ DT × Nodes
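
Continuing with the same invented parameters, the cubic law yields the headline claim of the paper:

```python
# Total deadlock rate in the eager group model grows as Nodes^3.
def eager_group_total_deadlock_rate(tps, actions, action_time,
                                    db_size, nodes):
    # single-node rate: TPS^2 * Action_Time * Actions^5 / (4 * DB_Size^2)
    dt = tps**2 * action_time * actions**5 / (4 * db_size**2)
    return dt * nodes**3

one = eager_group_total_deadlock_rate(100, 10, 0.01, 1_000_000, nodes=1)
ten = eager_group_total_deadlock_rate(100, 10, 0.01, 1_000_000, nodes=10)
print(ten / one)  # 1000.0 -- ten-fold nodes, thousand-fold deadlocks
```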

Analysis of Eager Replication: Master
- Updates are serialized at the master.
- No deadlocks arise if each transaction updates data mastered at a single node.
- Transactions spanning multiple masters can still deadlock.

Lazy Replication
- Lazy group replication: no waits or deadlocks, but conflicting updates require reconciliation.
- Reconciliation rate: TPS^2 × Action_Time × (Actions × Nodes)^3 / (2 × DB_Size)
- Lazy master replication: the reconciliation rate grows quadratically with Nodes.
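
The same back-of-the-envelope calculation for lazy group replication (invented parameter values; note that, as the formula is written, the rate is also cubic in Nodes):

```python
# Reconciliation rate for lazy group replication.
def lazy_group_reconciliation_rate(tps, actions, action_time,
                                   db_size, nodes):
    return tps**2 * action_time * (actions * nodes)**3 / (2 * db_size)

r1 = lazy_group_reconciliation_rate(100, 10, 0.01, 1_000_000, nodes=1)
r10 = lazy_group_reconciliation_rate(100, 10, 0.01, 1_000_000, nodes=10)
print(r10 / r1)  # 1000.0 -- cubic in Nodes, like the eager deadlock rate
```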

Sinfonia Revisited: Analysis of Scalability
- App_Nodes: number of application nodes; Mem_Nodes: number of memory nodes.
- Total throughput: TPS' = TPS × App_Nodes; total database size: DB_Size' = DB_Size × Mem_Nodes.
- Single application/memory node: Rate = TPS^2 × Action_Time × Actions^5 / (4 × DB_Size^2)
- Multiple application/memory nodes: Rate' = TPS'^2 × Action_Time × Actions^5 / (4 × DB_Size'^2) = (App_Nodes / Mem_Nodes)^2 × Rate
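
A one-line consequence worth making explicit (illustrative Python): the conflict rate stays flat only if memory nodes are scaled along with application nodes.

```python
# Conflict rate scales with (App_Nodes / Mem_Nodes)^2.
def sinfonia_rate(base_rate, app_nodes, mem_nodes):
    return base_rate * (app_nodes / mem_nodes) ** 2

print(sinfonia_rate(1.0, app_nodes=10, mem_nodes=10))  # 1.0: flat
print(sinfonia_rate(1.0, app_nodes=10, mem_nodes=1))   # 100.0
```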

Sinfonia Revisited: Analysis (two figure-only slides; plots not transcribed)

Discussion: Parallel Eager Replication
- Suppose replica updates proceed in parallel, so transaction duration stays Actions × Action_Time.
- Concurrent transactions: Transactions_p = TPS × Actions × Action_Time × Nodes
- Probability of waits per transaction: PW_p ≈ PW × Nodes
- Probability of deadlocks per transaction: PD_p ≈ PW_p^2 / Transactions_p = PD × Nodes
- Deadlock rate per transaction: DR_p ≈ DR
- Total deadlock rate: DT_p ≈ DT × Nodes
- If DB_Size grows linearly with Nodes: DT_p ≈ DT / Nodes
- Any problem with this analysis?

Discussion
- Fault tolerance: logging vs. replication?
- Ordering: timestamping in recent systems, e.g., Percolator?