CSE 486/586 Distributed Systems - Case Study: Amazon Dynamo


CSE 486/586 Distributed Systems
Case Study: Amazon Dynamo
Steve Ko
Computer Science and Engineering, University at Buffalo

Recap
- CAP theorem? Consistency, Availability, Partition tolerance; given P (a partition), choose C or A.
- Eventual consistency? Availability and partition tolerance over consistency.
- Lazy replication? Replicate lazily in the background.
- Gossiping? Contact random targets, infect them, and repeat in the next round.

Amazon Dynamo
- Distributed key-value storage, accessible only by the primary key: put(key, value) & get(key). (A minimal interface sketch appears after the design-technique overview below.)
- Used for many Amazon services ("applications"): shopping cart, best-seller lists, customer preferences, product catalog, etc.
- Now in AWS as well (DynamoDB); if interested, read http://www.allthingsdistributed.com/2012/01/amazondynamodb.html
- Together with Google's GFS & Bigtable, Dynamo marks one of the first non-relational storage systems (a.k.a. NoSQL).

Amazon Dynamo
- A synthesis of techniques we discuss in class (well, not all, but mostly).
- A very good example of developing a principled distributed system, and a comprehensive picture of what it means to design a distributed storage system.
- Main motivation: the shopping cart service, with 3 million checkouts in a single day and hundreds of thousands of concurrent active sessions.
- Properties (in the CAP-theorem sense): eventual consistency, partition tolerance, availability (an "always-on" experience).

Necessary Pieces?
- We want to design a storage service on a cluster of servers. What do we need?
- Membership maintenance
- Object insert/lookup/delete
- (Some) consistency with replication
- Partition tolerance
- Dynamo is a good example of a working system.

Overview of Key Design Techniques
- Gossiping for membership and failure detection: an eventually-consistent membership protocol.
- Consistent hashing for node & key distribution: similar to Chord, but there is no ring-based routing; everyone knows everyone else.
- Object versioning for eventually-consistent data objects: a vector clock is associated with each object.
- Quorums for partition/failure tolerance: a sloppy quorum similar to the available-copies replication strategy.
- Merkle trees for resynchronization after failures/partitions (this was not covered in class).
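To make the put/get interface above concrete, here is a minimal, hypothetical sketch in Python. The class and method bodies are made up for illustration; Dynamo's actual API also passes version context with each operation.

```python
# A toy in-memory key-value store exposing a Dynamo-style put/get interface.
# Illustration only; a real Dynamo node replicates each write and may return
# multiple conflicting versions on a read.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Dynamo acknowledges a put() once enough replicas respond;
        # here we just store the value locally.
        self._data[key] = value

    def get(self, key):
        # Dynamo's get() may return several causally concurrent versions;
        # this sketch returns at most one.
        return self._data.get(key)


store = KeyValueStore()
store.put("cart:alice", ["book", "pen"])
print(store.get("cart:alice"))   # ['book', 'pen']
```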

Membership
- Nodes are organized on a ring, just like Chord, using consistent hashing, but everyone knows everyone else.
- Node join/leave is done manually: an operator uses a console to add/delete a node. Reason: it is a well-maintained system; nodes come back pretty quickly and don't depart permanently most of the time.
- Membership change propagation: each node maintains its own view of the membership and the history of membership changes, propagated using gossiping (every second, pick random targets). This yields an eventually-consistent membership protocol. (A gossip sketch appears after the consistent-hashing slides below.)

Failure Detection
- Dynamo does not use a separate protocol; each request serves as a ping, since Dynamo has enough requests at any moment anyway.
- If a node doesn't respond to a request, it is considered failed.

Consistent Hashing
- Original consistent hashing: load becomes uneven.
- Consistent hashing with virtual nodes gives better load balancing: start with a static number of virtual nodes uniformly distributed over the ring.
- (Ring diagrams) One node joins and gets all virtual nodes; one more node (Node 2) joins and gets 1/2 of them.
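A minimal sketch of the gossip-based membership propagation described in the Membership slide above, assuming a made-up MembershipView class where each node's view maps node ids to version numbers. The real protocol also keeps a history of membership changes and gossips once per second.

```python
import random

class MembershipView:
    """One node's eventually-consistent view: node id -> latest known version."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.view = {node_id: 0}

    def merge(self, other_view):
        # Keep the newest entry for each node; repeated merges make all views
        # converge, i.e., eventually-consistent membership.
        for node, version in other_view.items():
            if version > self.view.get(node, -1):
                self.view[node] = version

def gossip_round(nodes, fanout=2):
    # Each round (every second in Dynamo), every node picks a few random
    # targets and exchanges membership views with them.
    for node in nodes:
        peers = [n for n in nodes if n is not node]
        for target in random.sample(peers, min(fanout, len(peers))):
            target.merge(node.view)
            node.merge(target.view)

nodes = [MembershipView("node%d" % i) for i in range(5)]
for _ in range(4):
    gossip_round(nodes)
print(sorted(nodes[0].view))   # very likely all five node ids by now
```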

Consistent Hashing (continued)
- (Ring diagram, Node 2 and Node 3) One more node joins and gets roughly 1/3 of the virtual nodes from the other two.

CSE 486/586 Administrivia
- PA4 will be released soon.

Replication
- N: the number of replicas; configurable.
- The first copy is stored regularly with consistent hashing; the other N-1 replicas are stored on the N-1 (physical) successor nodes, called the preference list.
- Any server in the preference list can handle a read/write, but it walks over the ring: e.g., try A first, then B, then C, etc. (A ring/preference-list sketch appears after the object-versioning slide below.)
- Update propagation is done by the server that handled the request.

Object Versioning
- Dynamo's replication is lazy: a put() request returns right away (more on this later); it does not wait until the update has propagated to all replicas. This leads to inconsistency, which is handled by object versioning.
- Writes should succeed all the time, e.g., "Add to Cart".
- Object versioning is used to reconcile inconsistent data. Each object has a vector clock, e.g., D1 ([Sx, 1], [Sy, 1]): object D1 has been written once by server Sx and once by server Sy. (A vector-clock sketch appears below.)
- Each node keeps all versions until the data becomes consistent.
- Causally concurrent versions mean inconsistency; if inconsistent, reconcile later. E.g., deleted items might reappear in the shopping cart.
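Here is a minimal sketch of consistent hashing with virtual nodes and the N-successor preference list described above. The Ring class, the use of MD5, and the number of virtual nodes are illustrative assumptions, not Dynamo's actual implementation.

```python
import bisect
import hashlib

class Ring:
    """Consistent-hashing ring with virtual nodes and N-node preference lists."""

    def __init__(self, virtual_nodes=8, n_replicas=3):
        self.virtual_nodes = virtual_nodes
        self.n_replicas = n_replicas          # N in the slides
        self.ring = []                        # sorted (position, physical node)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node owns several positions (virtual nodes) on the
        # ring, which evens out the load as the slides describe.
        for i in range(self.virtual_nodes):
            bisect.insort(self.ring, (self._hash("%s#%d" % (node, i)), node))

    def preference_list(self, key):
        # Walk clockwise from the key's position and collect the first
        # N distinct physical nodes: the preference list.
        start = bisect.bisect(self.ring, (self._hash(key), ""))
        nodes = []
        for i in range(len(self.ring)):
            _, node = self.ring[(start + i) % len(self.ring)]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == self.n_replicas:
                break
        return nodes

ring = Ring()
for node in ["A", "B", "C", "D"]:
    ring.add_node(node)
print(ring.preference_list("cart:alice"))   # three distinct physical nodes
```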

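A minimal sketch of how vector clocks detect conflicting versions, using the D1 ([Sx, 1], [Sy, 1]) example above; the helper names are hypothetical.

```python
def descends(vc_a, vc_b):
    """True if version vc_a is causally at least as new as vc_b.
    A vector clock is a dict such as {"Sx": 1, "Sy": 1}."""
    return all(vc_a.get(server, 0) >= count for server, count in vc_b.items())

def concurrent(vc_a, vc_b):
    # Neither version descends from the other: causally concurrent versions,
    # i.e., an inconsistency that must be reconciled later.
    return not descends(vc_a, vc_b) and not descends(vc_b, vc_a)

d1 = {"Sx": 1, "Sy": 1}    # D1 from the slide
d2 = {"Sx": 2, "Sy": 1}    # a later write handled by Sx
d3 = {"Sx": 1, "Sy": 2}    # a write handled by Sy, concurrent with d2
print(descends(d2, d1))    # True: d2 supersedes d1, so d1 can be discarded
print(concurrent(d2, d3))  # True: reconcile later (e.g., merge the carts)
```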
Object Versioning Example
- (Diagram: the version history of an object, with the vector clock attached to each version.)

Object Versioning
- Consistency revisited:
  - Linearizability: any read operation reads the latest write.
  - Sequential consistency: per client, any read operation reads the latest write.
  - Eventual consistency: a read operation might not read the latest write, and sometimes inconsistent versions need to be reconciled.
- Conflict detection & resolution is therefore required; Dynamo uses vector clocks to detect conflicts.
- Simple resolution is done by the system (a last-write-wins policy); complex resolution is done by each application, with the system presenting all conflicting versions of the data.

Object Versioning Experience
- Over a 24-hour period: 99.94% of requests saw exactly one version, 0.00057% saw 2 versions, 0.00047% saw 3 versions, and 0.00009% saw 4 versions.
- Divergence was usually triggered by many concurrent requests issued by busy robots, not human clients.

Quorums
- Parameters: N replicas, R readers, W writers.
- Static quorum approach: R + W > N.
- Typical Dynamo configuration: (N, R, W) == (3, 2, 2), but it depends: for high-performance reads (e.g., write-once, read-many), R == 1 and W == N. Low R & W might lead to more inconsistency. (A quorum read/write sketch follows the replica-synchronization slides below.)
- Dealing with failures (the sloppy quorum): another node in the preference list handles the requests temporarily and delivers the replicas to the original node upon recovery.

Replica Synchronization
- Key ranges are replicated. Say a node fails and recovers; it needs to quickly determine whether it needs to resynchronize or not. Transferring entire (key, value) pairs for comparison is not an option.
- Merkle trees: leaves are hashes of the values of individual keys; parents are hashes of their (immediate) children. Comparing parents at the same level tells which children differ, so entire (key, value) pairs need not be transferred. (A Merkle-tree sketch also follows below.)

Replica Synchronization
- Comparing two nodes that are synchronized: each holds the same two (key, value) pairs, (k0, v0) & (k1, v1).
- (Diagram: Node0 and Node1 with identical Merkle trees; the root hashes are equal.)
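A minimal sketch of the static quorum condition R + W > N described above. The Replica class and the sequential loops are simplifications (real Dynamo contacts replicas in parallel and attaches vector clocks), so treat this as an illustration of the counting, not of the actual protocol.

```python
N, R, W = 3, 2, 2            # the typical Dynamo configuration from the slide
assert R + W > N             # static quorum condition

class Replica:
    """Toy replica; an unreachable replica would simply not acknowledge."""
    def __init__(self):
        self.data = {}
    def put(self, key, value):
        self.data[key] = value
        return True          # acknowledgement
    def get(self, key):
        return self.data.get(key)

def quorum_write(replicas, key, value, w=W):
    # The write succeeds once at least W replicas acknowledge it.
    acks = 0
    for replica in replicas:
        if replica.put(key, value):
            acks += 1
        if acks >= w:
            return True
    return False

def quorum_read(replicas, key, r=R):
    # Collect at least R responses; because R + W > N, the read set always
    # overlaps the write set of the latest successful write.
    responses = []
    for replica in replicas:
        value = replica.get(key)
        if value is not None:
            responses.append(value)
        if len(responses) >= r:
            break
    return responses         # possibly several versions to reconcile

replicas = [Replica() for _ in range(N)]
quorum_write(replicas, "cart:alice", ["book"])
print(quorum_read(replicas, "cart:alice"))
```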

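Before the unsynchronized-nodes example on the next slide, here is a minimal sketch of building and comparing Merkle trees over a small key range, following the definition above. SHA-1 and the helper names are assumptions made for illustration.

```python
import hashlib

def h(data):
    return hashlib.sha1(data.encode()).hexdigest()

def merkle_root(values):
    # Leaves are hashes of the values of individual keys; each parent is the
    # hash of its (immediate) children. Comparing roots (and then subtrees)
    # locates differing keys without shipping whole (key, value) pairs.
    level = [h(v) for v in values]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

# Synchronized nodes: both hold (k0, v0) & (k1, v1), so the roots match and
# nothing needs to be transferred.
print(merkle_root(["v0", "v1"]) == merkle_root(["v0", "v1"]))   # True

# Unsynchronized nodes: one holds v2 for k0 instead of v0, so the roots differ
# and walking down the trees pinpoints k0 as the key to resynchronize.
print(merkle_root(["v2", "v1"]) == merkle_root(["v0", "v1"]))   # False
```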
Replica Synchronization (continued)
- Comparing two nodes that are not synchronized: one holds (k0, v2) & (k1, v1); the other holds (k0, v0) & (k1, v1).
- On the node holding v2, the changed leaf is h3 = hash(v2), so its root h4 = hash(h3 + h1) differs from the other node's root: not equal.
- (Diagram: Node0 and Node1 with differing Merkle trees.)

Summary
- Amazon Dynamo: distributed key-value storage with eventual consistency.
- Techniques: gossiping for membership and failure detection; consistent hashing for node & key distribution; object versioning for eventually-consistent data objects; quorums for partition/failure tolerance; Merkle trees for resynchronization after failures/partitions.
- A very good example of developing a principled distributed system.

Acknowledgements
- These slides contain material developed and copyrighted by Indranil Gupta (UIUC).