The Chubby lock service for loosely- coupled distributed systems

Similar documents
Recap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Recap: First Requirement. Recap: Second Requirement. Recap: Strengthening P2

Recap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Paxos Phase 2. Paxos Phase 1. Google Chubby. Paxos Phase 3 C 1

The Chubby Lock Service for Distributed Systems

The Chubby Lock Service for Loosely-coupled Distributed systems

Consensus and related problems

Distributed Systems 16. Distributed File Systems II

The Chubby lock service for loosely-coupled distributed systems

SimpleChubby: a simple distributed lock service

Beyond FLP. Acknowledgement for presentation material. Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

AGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016

CS /30/17. Paul Krzyzanowski 1. Google Chubby ( Apache Zookeeper) Distributed Systems. Chubby. Chubby Deployment.

BigTable. Chubby. BigTable. Chubby. Why Chubby? How to do consensus as a service

Comparison of Different Implementations of Paxos for Distributed Database Usage. Proposal By. Hanzi Li Shuang Su Xiaoyu Li

Chubby lock service. Service in a cloud. Big table and Chubby: perfect together. Big table. Badri Nath Rutgers University

Reliability & Chubby CSE 490H. This presentation incorporates content licensed under the Creative Commons Attribution 2.5 License.

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

Applications of Paxos Algorithm

Paxos Made Simple. Leslie Lamport, 2001

Paxos Made Live. An Engineering Perspective. Authors: Tushar Chandra, Robert Griesemer, Joshua Redstone. Presented By: Dipendra Kumar Jha

Distributed Systems 11. Consensus. Paul Krzyzanowski

Hierarchical Chubby: A Scalable, Distributed Locking Service

Strong Consistency and Agreement

Today s Papers. Google Chubby. Distributed Consensus. EECS 262a Advanced Topics in Computer Systems Lecture 24. Paxos/Megastore November 24 th, 2014

CmpE 138 Spring 2011 Special Topics L2

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Distributed Systems Consensus

Paxos Made Live. an engineering perspective. Authored by Tushar Chandra, Robert Griesemer, Joshua Redstone. Presented by John Hudson

Distributed Consensus: Making Impossible Possible

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9b: Distributed File Systems INTRODUCTION. Transparency: Flexibility: Slide 1. Slide 3.

Recap. CSE 486/586 Distributed Systems Paxos. Paxos. Brief History. Brief History. Brief History C 1

Leader or Majority: Why have one when you can have both? Improving Read Scalability in Raft-like consensus protocols

Paxos. Sistemi Distribuiti Laurea magistrale in ingegneria informatica A.A Leonardo Querzoni. giovedì 19 aprile 12

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017

EECS 498 Introduction to Distributed Systems

or? Paxos: Fun Facts Quorum Quorum: Primary Copy vs. Majority Quorum: Primary Copy vs. Majority

Distributed Consensus: Making Impossible Possible

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

CPS 512 midterm exam #1, 10/7/2016

Exam 2 Review. October 29, Paul Krzyzanowski 1

Coordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

Fault-tolerant and Decentralized Lease Coordination in Distributed Systems

BigTable. CSE-291 (Cloud Computing) Fall 2016

Paxos Made Moderately Complex Made Moderately Simple

Today: Fault Tolerance

Fast Paxos Made Easy: Theory and Implementation

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita

Distributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen

Recovering from a Crash. Three-Phase Commit

PushyDB. Jeff Chan, Kenny Lam, Nils Molina, Oliver Song {jeffchan, kennylam, molina,

Distributed Systems. 19. Fault Tolerance Paul Krzyzanowski. Rutgers University. Fall 2013

Bigtable. Presenter: Yijun Hou, Yixiao Peng

416 practice questions (PQs)

Lecture XIII: Replication-II

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Consensus Protocols

Exam 2 Review. Fall 2011

Today: Fault Tolerance. Fault Tolerance

Distributed File Systems II

Integrity in Distributed Databases

Using Paxos For Distributed Agreement

DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612

Bigtable: A Distributed Storage System for Structured Data

Practical Byzantine Fault

EECS 482 Introduction to Operating Systems

Paxos provides a highly available, redundant log of events

Recall our 2PC commit problem. Recall our 2PC commit problem. Doing failover correctly isn t easy. Consensus I. FLP Impossibility, Paxos

Important Lessons. A Distributed Algorithm (2) Today's Lecture - Replication

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

Cheap Paxos. Leslie Lamport and Mike Massa. Appeared in The International Conference on Dependable Systems and Networks (DSN 2004)

CS November 2017

EMPIRICAL STUDY OF UNSTABLE LEADERS IN PAXOS LONG KAI THESIS

DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD

Paxos and Raft (Lecture 21, cs262a) Ion Stoica, UC Berkeley November 7, 2016

Intuitive distributed algorithms. with F#

Distributed Data Management. Christoph Lofi Institut für Informationssysteme Technische Universität Braunschweig

A Rendezvous Framework for the Automatic Deployment of Services in Cluster Computing

Paxos and Replication. Dan Ports, CSEP 552

HT-Paxos: High Throughput State-Machine Replication Protocol for Large Clustered Data Centers

Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database.

Fault-Tolerance & Paxos

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Group Replication: A Journey to the Group Communication Core. Alfranio Correia Principal Software Engineer

GFS: The Google File System

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

Practice: Large Systems

There Is More Consensus in Egalitarian Parliaments

A Distributed Lock Manager Using Paxos

This material is covered in the textbook in Chapter 21.

CSCI 4717 Computer Architecture

WA 2. Paxos and distributed programming Martin Klíma

7680: Distributed Systems

Distributed File Systems. CS 537 Lecture 15. Distributed File Systems. Transfer Model. Naming transparency 3/27/09

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Practice: Large Systems Part 2, Chapter 2

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

Transcription:

The Chubby lock service for loosely- coupled distributed systems Mike Burrows, Google Inc. Presented By- Rahul Malviya

Outline Paxos Algorithm The Chubby Lock Service

What is Paxos Help to implemenjng a fault- tolerant distributed system, provide distributed consensus in a network of several processors a family of algorithms cheap Paxos, fast Paxos, generalized Paxos, Byzantine Paxos

Why Paxos Considered the most effec8ve in the field all working protocols for asynchronous consensus we have so far encountered have Paxos at their core. Raised from intuijve and reasonable reasoning

Paxos Three roles in the Paxos protocol Proposer Acceptor Learner Goal: consensus

How The Safety properjes A value that has been proposed may be chosen Only a single value is chosen A leaner never learns that a value has been chosen unless it actually has been chosen

Reasoning P1: An acceptor must accept the first proposal that it receives. Keeping track of proposals: unique numbers a proposal is pairs: (counter, value) A value has been chosen only if: A single proposal has been accepted by a majority.

Reasoning We can allow muljple proposals to be chosen But must guarantee that all chosen proposals have the same value: P2: If a proposal with value v is chosen, then every higher- numbered proposal that is chosen has value v. P2 guarantees the safety property of consensus.

Reasoning P2a: If a proposal with value v is chosen, then every higher- numbered proposal accepted by any acceptor has value v. P2b: If a proposal with value v is chosen, then every higher- numbered proposal issued by any proposer has value v.

Reasoning P2c: If a proposal (n, v) is issued, then there exists a majority S of acceptors, such that: Either no acceptor in S has accepted any proposal numbered less than n Or v is the value of the highest- numbered proposal among all proposals numbered less than n accepted by the acceptors in S. P2c P2b P2a P2

Algorithm Phase 1a: Prepare Send Msg: Prepare n Phase 1b: Promise For Acceptor, if n > previous proposal number, send back the value. Phase 2a: Accept! proposer fix the value Phase 2b: Accepted Acceptor fix the value, tell the learner

The Chubby Lock Service 1

Chubby's Design IntroducJon System structure File, dirs, handles Locks and sequencers Events API Caching

Basic Idea About Chubby Chubby lock service, is intended to provide coarse- grained locking as well as reliable (though low- volume) storage for a loosely- coupled distributed system. Chubby provides an interface much like a distributed file system with advisory locks, but the design emphasis is on availability and reliability, as opposed to high performance.

IntroducJon It is intended for use within a loosely- coupled distributed system consisjng of moderately large numbers of small machines connected by a high- speed network. The purpose of the lock service is to allow its clients to synchronize their acjvijes and to agree on basic informajon about their environment. The primary goals included reliability, availability to a moderately large set of clients, and throughput and storage capacity were considered secondary.

System structure

System Structure Chubby has two main components that communicate via RPC: a server, and a library that client applicajons link against A Chubby cell consists of a small set of servers (typically five) known as replicas. The replicas use a distributed consensus protocol to elect a master; the master must obtain votes from a majority of the replicas, plus promises that those replicas will not elect a different master for an interval of a few seconds known as the master lease

Files, dirs, handles File interface similar to, but simpler than UNIX /ls/foo/wombat/pouch the name space contains files and directories called nodes each node has various meta- data

Files, dirs, handles (cont.) clients open node to obtain handles handles ~ UNIX file descriptors handle include Check digits prevent client guess handle Sequence number Mode informajon use to recreate the lock state when the master changes

Locks and sequencers Chubby uses a file or directory to act as a reader- writer lock One client may hold the lock in exclusive (writer) mode. Any number of clients hold the lock in shared (reader) mode. Locks are advisory - only conflict with other adempts to acquire the same lock

Locks and sequencers Locking in Distributed is complex Virtual synchrony or virtual Jme costly Chubby - Only interacjons which use locks are numbered Sequencer - a byte string describing the state of the lock afer acquisijon A lock requests a sequencer A client passes the sequencer to the server for protected progress

Events Client can subscribe to some events when a handle is created. Delivered afer the corresponding acjon has taken place Events include file contents modified child node added, removed, or modified chubby master failed over handle has become invalid lock acquired by others conflicjng lock requested from another client

API Handled are created by Open() destroyed by Close() GetContentsAndStat(), GetStat(), ReadDir() SetContents(), SetACL() Delete() Acquire(), TryAcquire(), Release() GetSequencer(), SetSequencer(), CheckSequencer()

Caching Client caches file data and meta- data reduce read traffic Handle and lock can also be cached Master keeps a list of which clients might be caching InvalidaJons keep consistent Clients see consistent data or error

The Chubby Lock Service 2

Client Sessions Sessions maintained between client and server Keep- alive messages required to maintain session every few seconds If session is lost, server releases any client- held handles. What if master is late with next keep- alive? Client has its own (longer) Jmeout to detect server failure Spinnaker Labs, Inc.

Master Failure If client does not hear back about keep- alive in local lease 8meout, session is in jeopardy Clear local cache Wait for grace period (about 45 seconds) ConJnue adempt to contact master Successful adempt => ok; jeopardy over Failed adempt => session assumed lost Spinnaker Labs, Inc.

Master Failure

Scalability Arbitrary number of Chubby cells. System increases lease Jmes from 12 sec up to 60 sec under heavy load Chubby clients cache file data, meta- data, the absence of files, and open handles. Data is small all held in RAM (as well as disk) * Use proxies to save KeepAlive and reads traffic. * Use parjjoning

REFERENCES Paxos [Chandra, et al.,2007] T. D. Chandra, R. Griesemer, and J. Redstone, "Paxos made live: an engineering perspective," in Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing. Portland, Oregon, USA: ACM, 2007, pp. 398-407. http://labs.google.com/papers/paxos_made_live.html [Lamport,2001] L. Lamport, "Paxos made simple," ACM SIGACT News, vol. 32, pp. 18-25, 2001. http://en.wikipedia.org/wiki/paxos_algorithm [Burrows,2006] M. Burrows, "The Chubby Lock Service for Loosely- Coupled Distributed Systems," presented at OSDI'06: Seventh Symposium on OperaJng System Design and ImplementaJon, Seadle, WA, 2006. Zookeeper http://zookeeper.wiki.sourceforge.net http://developer.yahoo.com/blogs/hadoop/2008/03/intro-to-zookeepervideo.html