CS 138: Dynamo. Copyright 2017 Thomas W. Doeppner. All rights reserved.

Similar documents
CS Amazon Dynamo

DYNAMO: AMAZON'S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Recap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques

CAP Theorem, BASE & DynamoDB

Dynamo: Key-Value Cloud Storage

Presented By: Devarsh Patel

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage

Scaling Out Key-Value Storage

Dynamo: Amazon's Highly Available Key-value Store

Dynamo: Amazon's Highly Available Key-Value Store

Dynamo: Amazon's Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

Scaling KVS. CS6450: Distributed Systems Lecture 14. Ryan Stutsman

There is a temptation to say it is really used, it must be good

Dynamo Tom Anderson and Doug Woos

Background. Distributed Key/Value stores provide a simple put/get interface. Great properties: scalability, availability, reliability

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

EECS 498 Introduction to Distributed Systems

Performance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences

CS 655 Advanced Topics in Distributed Systems

10. Replication. Motivation

Reprise: Stability under churn (Tapestry) A Simple lookup Test. Churn (Optional Bamboo paper last time)

Distributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Hash Tables Chord and Dynamo

11/12/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data FALL 2018 Colorado State University

The material in this lecture is taken from Dynamo: Amazon's Highly Available Key-value Store, by G. DeCandia, D. Hastorun, M. Jampani, G.

References. NoSQL Motivation. Why NoSQL as the Solution? NoSQL Key Feature Decisions. CSE 444: Database Internals

Eventual Consistency 1

FAQs Snapshots and locks Vector Clock

6.830 Lecture Spark 11/15/2017

Introduction to Distributed Data Systems

SCALABLE CONSISTENCY AND TRANSACTION MODELS

Large-Scale Data Stores and Probabilistic Protocols

Large-Scale Key-Value Stores Eventual Consistency Marco Serafini

Intuitive distributed algorithms with F#

Riak. Distributed, replicated, highly available

Key Value Storage

10.0 Towards the Cloud

Distributed Hash Tables

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

CONSISTENT FOCUS. Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability.

Federated Array of Bricks Y Saito et al HP Labs. CS 6464 Presented by Avinash Kulkarni

Building Consistent Transactions with Inconsistent Replication

CS5412 CLOUD COMPUTING: PRELIM EXAM Open book, open notes. 90 minutes plus 45 minutes grace period, hence 2h 15m maximum working time.

CS5412: DIVING IN: INSIDE THE DATA CENTER

Important Lessons. A Distributed Algorithm (2) Today's Lecture - Replication

Replica Placement. Replica Placement

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm

Consistency and Replication

Distributed Data Management Replication

Indexing Large-Scale Data

Cloud Computing. Lectures 11, 12 and 13 Cloud Storage

Distributed Key-Value Stores UCSB CS170

What Came First? The Ordering of Events in

CS655: Advanced Topics in Distributed Systems [Fall 2013] Dept. Of Computer Science, Colorado State University

DIVING IN: INSIDE THE DATA CENTER

Apache Cassandra - A Decentralized Structured Storage System

Haridimos Kondylakis Computer Science Department, University of Crete

CS5412: OTHER DATA CENTER SERVICES

Dr Robert N. M. Watson

Distributed Systems. replication Johan Montelius ID2201. Distributed Systems ID2201

Advanced Databases ( CIS 6930) Fall Instructor: Dr. Markus Schneider. Group 17 Anirudh Sarma Bhaskara Sreeharsha Poluru Ameya Devbhankar

Cloud Computing. Up until now

Scalable overlay Networks

Replication in Distributed Systems

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8)

11. Replication. Motivation

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

CS November 2017

Distributed Key Value Store Utilizing CRDT to Guarantee Eventual Consistency

Important Lessons. Today's Lecture. Two Views of Distributed Systems

Consistency in Distributed Systems

NoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre

INF-5360 Presentation

Distributed Systems 2016 Exam 2 Review. Paul Krzyzanowski

Consistency and Replication. Why replicate?

Replication and Consistency

Distributed Systems. 19. Spanner. Paul Krzyzanowski. Rutgers University. Fall 2017

Network. CS Advanced Operating Systems Structures and Implementation Lecture 13. File Systems (Con't) RAID/Journaling/VFS.

Distributed Systems (5DV147)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Chapter 4: Distributed Systems: Replication and Consistency. Fall 2013 Jussi Kangasharju

Report to Brewer's original presentation of his CAP Theorem at the Symposium on Principles of Distributed Computing (PODC) 2000

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Dynomite. Yet Another Distributed Key Value Store

CS5412: TRANSACTIONS (I)

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Ch06. NoSQL Part III.

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Extreme Computing. NoSQL.

Recap. CSE 486/586 Distributed Systems Gossiping. Eager vs. Lazy Replication. Recall: Passive Replication. Fault-Tolerance and Scalability

Peer-to-Peer Systems and Distributed Hash Tables

Module 7 - Replication

Transcription:

CS 138: Dynamo

Dynamo
- Highly available and scalable distributed data store
- Manages state of services that have high reliability and performance requirements:
  - best-seller lists
  - shopping carts
  - customer preferences
  - session management
  - sales rank
  - product catalog

Background
- Amazon e-commerce platform:
  - hundreds of services, from recommendations to fraud detection
  - tens of thousands of servers, across many data centers worldwide
- RDBMS? No:
  - don't need complex queries
  - RDBMS replication strategies are inefficient
- Simple queries on key/value pairs (blobs), < 1MB

Transactions at Amazon
- ACID properties:
  - atomicity: pretty important
  - consistency: weak is good
  - isolation: forget it
  - durability: pretty important

Efficiency
- Strict latency requirements for services, measured at the 99.9th percentile
- A typical request to put together a page requires responses from 150 services
- Services rely on Dynamo for storage

Service-Level Agreements
- Contract between client and server
- Example: provide a response within 300 ms for 99.9% of requests at a peak load of 500 requests/sec
- Requires admission control (not discussed)

Amazon Architecture

Design Considerations
- Replication: important for durability and availability
- Tradeoff between consistency and performance:
  - writes should never be delayed
  - reads should return quickly, despite possible inconsistencies

Assigning Data to Nodes
- Consistent hashing (as in Chord), but:
  - no finger tables
  - each node knows the complete assignment
- Issues:
  - replication
  - coping with nodes of varying performance
(A sketch of the lookup follows.)
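The lookup rule is easy to make concrete. Below is a minimal Python sketch (not from the lecture or the paper): node names and keys hash onto a fixed ring, and the coordinator for a key is the first node at or clockwise after the key's position. The MD5-based ring_position helper and the Ring class are illustrative assumptions. Ring(["a", "b", "c"]).coordinator("cart:1234") would then name the node responsible for that key.

    import bisect
    import hashlib

    def ring_position(name: str) -> int:
        # Hash a node name or key to one of 2**32 ring positions.
        digest = hashlib.md5(name.encode()).digest()
        return int.from_bytes(digest[:4], "big")

    class Ring:
        # Every node knows the complete assignment: a sorted list of
        # (ring position, node) pairs -- no finger tables needed.
        def __init__(self, node_names):
            self.slots = sorted((ring_position(n), n) for n in node_names)
            self.positions = [p for p, _ in self.slots]

        def coordinator(self, key: str) -> str:
            # The first node at or clockwise after the key's position.
            pos = ring_position(key)
            i = bisect.bisect_left(self.positions, pos) % len(self.slots)
            return self.slots[i][1]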

Consistent Hashing
[Figure: a ring with positions 0-7 and a single node, at position 3; keys 1, 2, 6, and 7 all map clockwise to it: coordinator(1) = coordinator(2) = coordinator(6) = coordinator(7) = 3]

Adding a Node
[Figure: the same ring as a node is about to be added at position 1; all keys still map to the node at position 3]

Adding a Node
[Figure: after the node at position 1 joins, it takes over keys 6, 7, and 1: coordinator(6) = coordinator(7) = coordinator(1) = 1, while coordinator(2) = 3]

Deleting a Node
[Figure: after the node at position 1 leaves, its keys move back to the node at position 3: coordinator(1) = coordinator(2) = coordinator(6) = coordinator(7) = 3]

Virtual Nodes
[Figure: one real node hosts several virtual nodes, so its key ranges are scattered around the ring]

Virtual Nodes: Adding
[Figure: a second real node is added; each of its virtual nodes claims its own position on the ring]

Virtual Nodes: Adding
[Figure: the new real node's virtual nodes each take over a key range from a different part of the ring]

Virtual Nodes
[Figure: the resulting assignment, with each real node owning several scattered key ranges]

Adding and Deleting Nodes
- Without virtual nodes (e.g., Chord):
  - added nodes acquire objects only from their successors
  - deleted nodes give objects only to their successors
- With (randomly distributed) virtual nodes:
  - added nodes acquire objects from nodes distributed uniformly around the ring
  - deleted nodes give objects to nodes distributed uniformly around the ring
(A sketch of virtual nodes follows.)
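The uniform redistribution comes from placing each real node at several pseudo-random ring positions. A minimal sketch extending the Ring class from the earlier block; the "name#i" naming scheme and the count of eight virtual nodes per real node are assumptions for illustration. coordinator() is inherited unchanged; only the placement of nodes on the ring differs.

    class VirtualRing(Ring):
        # Each real node appears at several pseudo-random ring positions, so
        # a node that joins or leaves exchanges keys with many other nodes.
        def __init__(self, node_names, vnodes_per_node=8):
            slots = []
            for n in node_names:
                for i in range(vnodes_per_node):
                    # "name#i" gives each virtual node its own position
                    slots.append((ring_position(f"{n}#{i}"), n))
            self.slots = sorted(slots)
            self.positions = [p for p, _ in self.slots]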

Replication
- Each key is assigned to a coordinator node via the DHT
- The coordinator replicates data items at n-1 other nodes:
  - n is an application parameter
  - the next n-1 distinct real nodes on the ring
- The set of n nodes for a data item is called its preference list (sketched below)
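Continuing the same sketch: a preference list is the coordinator plus the next n-1 distinct real nodes clockwise. With virtual nodes, consecutive ring positions can belong to the same real node, hence the distinctness check.

    def preference_list(ring, key: str, n: int = 3):
        # Walk clockwise from the key, collecting distinct real nodes;
        # nodes[0] is the coordinator, the rest hold the n-1 replicas.
        pos = ring_position(key)
        i = bisect.bisect_left(ring.positions, pos)
        nodes = []
        for j in range(len(ring.slots)):
            node = ring.slots[(i + j) % len(ring.slots)][1]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == n:
                break
        return nodes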

Versioning
- Eventual consistency:
  - an update request may return to the caller before it is propagated to all replicas
  - a delete request may reach a replica that doesn't have what is being deleted
    - delete requests are treated as put requests
    - deleted later, on reconciliation
  - versions are immutable (but garbage-collectible)

The Shopping Cart?

Vector Clocks
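The figure on this slide isn't reproduced here, but the operations behind it are standard. A minimal sketch, assuming a vector clock is a dict mapping a node name to that node's update counter: one version supersedes another if its clock descends the other's; clocks that descend neither way mark concurrent (divergent) versions.

    def descends(a: dict, b: dict) -> bool:
        # True if clock a has seen everything clock b has (a >= b causally).
        return all(a.get(node, 0) >= count for node, count in b.items())

    def concurrent(a: dict, b: dict) -> bool:
        # Neither clock descends the other: divergent versions.
        return not descends(a, b) and not descends(b, a)

    def advance(clock: dict, node: str) -> dict:
        # The clock of the new version created when `node` handles a write.
        new = dict(clock)
        new[node] = new.get(node, 0) + 1
        return new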

Reconciliation
- Reconciliation is done only on reads, which may return multiple values
- Easy reconciliation if the values are causally ordered (sketched below)
- The application handles it otherwise
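A sketch of the easy (causally ordered) case, reusing descends() from the vector-clock block: any version whose clock is dominated by another returned version is dropped; whatever survives is either a single value or a set of concurrent values handed to the application.

    def reconcile(versions):
        # versions: list of (value, clock) pairs returned by a read.
        survivors = []
        for value, clock in versions:
            dominated = any(descends(other, clock) and other != clock
                            for _, other in versions)
            if not dominated:
                survivors.append((value, clock))
        return survivors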

Quorums
- Gets and puts go to the coordinator, if available
- put: the coordinator
  - writes the new version locally
  - sends it to the next n-1 nodes
  - when w-1 respond, the put is considered successful
- get: the coordinator
  - requests existing versions from the next n-1 nodes
  - waits for r responses before returning to the client
- r + w > n; typically n = 3, r = w = 2
(A sketch of both paths follows.)
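A sketch of the coordinator's side of both paths. send() is a hypothetical blocking transport call that returns None on timeout; a real coordinator would contact replicas in parallel and keep writing to the stragglers even after the quorum is reached. reconcile() is from the sketch above.

    def quorum_put(replicas, key, value, clock, w=2):
        # The coordinator has already written locally (one implicit ack);
        # it needs w-1 acknowledgments from the next n-1 nodes.
        acks = 1
        for node in replicas[1:]:
            if send(node, ("put", key, value, clock)) is not None:
                acks += 1
            if acks >= w:
                return True
        return False

    def quorum_get(replicas, key, r=2):
        # Wait for r nodes to respond, then reconcile their versions.
        responded, versions = 0, []
        for node in replicas:
            resp = send(node, ("get", key))   # a list of (value, clock) pairs
            if resp is not None:
                responded += 1
                versions.extend(resp)
            if responded >= r:
                break
        return reconcile(versions)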

Sloppy Quorum
- What if not all n nodes (or even w nodes) are available?
  - don't want to deny a put request
- Use the first n healthy nodes encountered:
  - data is tagged with the node it should go on
  - written back to that node when it becomes available
- Handles failures of entire data centers!
  - storage nodes are spread across data centers
(A sketch follows.)
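A sketch of picking the first n healthy nodes, reusing preference_list() from above; `healthy` is an assumed set of live nodes maintained by failure detection. Each substitute is paired with a hint naming the intended node, so the data can be written back when that node recovers.

    def sloppy_replicas(ring, key, healthy, n=3):
        # All distinct real nodes, in clockwise order from the key.
        walk = preference_list(ring, key, n=len({nd for _, nd in ring.slots}))
        intended, spares = walk[:n], iter(walk[n:])
        result = []
        for node in intended:
            if node in healthy:
                result.append((node, None))        # no hint needed
            else:
                for spare in spares:
                    if spare in healthy:
                        # Tagged with the node the data should go on.
                        result.append((spare, node))
                        break
        return result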

Anti-Entropy
- Replicas are synced in the background
- How to determine whether replicas differ?
- Merkle trees are used to compare contents:
  - leaves are hashes of contents
  - parents are hashes of their children
- Nodes maintain Merkle trees of key ranges:
  - each virtual node defines a key range
(A sketch follows.)
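A minimal Merkle-tree sketch (illustrative, not Dynamo's code): leaves hash the values in one key range in key order, parents hash their children, and two replicas compare roots first.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_levels(leaves):
        # levels[0] holds the leaf hashes; levels[-1] holds the root.
        level = [h(leaf) for leaf in leaves]
        levels = [level]
        while len(level) > 1:
            level = [h(b"".join(level[i:i + 2]))
                     for i in range(0, len(level), 2)]
            levels.append(level)
        return levels

    def in_sync(tree_a, tree_b) -> bool:
        # Equal roots mean the whole key range matches; on a mismatch the
        # replicas descend level by level to find the divergent leaves,
        # exchanging O(log n) hashes rather than the data itself.
        return tree_a[-1] == tree_b[-1]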

Ring Membership
- A gossip protocol is used to distribute membership info and key ranges (sketched below)
- Failure detection: simple time-out on communication
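A toy gossip round, assuming each node keeps a versioned membership view (member -> (version, info)); this is a generic gossip sketch, not Dynamo's wire protocol. Higher-versioned entries win the merge, so joins, departures, and key-range changes spread to all nodes in a logarithmic number of rounds.

    import random

    def gossip_round(views, node):
        # `node` picks a random peer; the two merge their membership views,
        # keeping the higher-versioned entry for each member.
        peer = random.choice([n for n in views if n != node])
        merged = {}
        for member in views[node].keys() | views[peer].keys():
            a = views[node].get(member, (0, None))
            b = views[peer].get(member, (0, None))
            merged[member] = a if a[0] >= b[0] else b
        views[node] = merged
        views[peer] = dict(merged)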

Performance: Latency

Performance: Buffering

Performance: Load Balancing

Example Uses
- Business-logic-specific reconciliation:
  - data replicated across multiple nodes
  - reconciliation is application-specific, e.g., shopping carts are merged
- Timestamp-based reconciliation:
  - real-time stamp used: last write wins
  - used to maintain customer session information
- High-performance read engine:
  - r = 1, w = n
  - used for the product catalog and promotional items

Divergent Versions
- Is it a real problem with shopping carts?
- In one 24-hour period, 99.94% of requests saw exactly one version
- Divergent versions are usually not due to failures but to concurrent writers
  - the writers probably aren't human!

Post 2007
- Dynamo successful, but not popular among developers:
  - difficult to integrate and manage
- SimpleDB preferred:
  - easier to use
  - much less robust
- DynamoDB released in January 2012:
  - combines both
  - no details

The End
- HW 4 due May 5
- PuddleStore due May 8
  - interactive grading is 11am-3pm, May 11-13
- Final Exam: 2pm, May 16
  - last names beginning A-P: Biomed 202
  - last names beginning Q-Z: Biomed 291
  - covers the entire course
  - help session TBA
  - old finals will soon be on the web page