Dynamo: Amazon's Highly Available Key-value Store

Dynamo: Amazon's Highly Available Key-value Store. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels

Motivation: build a distributed storage system that scales, exposes a simple key-value interface, is highly available, and can guarantee Service Level Agreements (SLAs).

System Assumptions and Requirements. Query model: simple read and write operations to a data item that is uniquely identified by a key. ACID properties: Dynamo trades the strong consistency of ACID (Atomicity, Consistency, Isolation, Durability) for higher availability; it provides no isolation guarantees and permits only single-key updates. Efficiency: latency requirements are in general measured at the 99.9th percentile of the distribution. Other assumptions: the operating environment is assumed to be non-hostile, and there are no security-related requirements such as authentication and authorization.
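To make the query model concrete, a minimal put/get interface might look like the sketch below. The names (KeyValueStore, VersionedValue) are illustrative, not Dynamo's actual API; the point is that operations address a single key and that a read can return several versions together with their context.

```java
import java.util.List;

// Hypothetical sketch of Dynamo's query model (names are illustrative, not
// the real API): operations address a single data item identified by a key,
// and a read may return several concurrent versions plus their context.
public interface KeyValueStore {

    // get() can return more than one version of the object, each carrying
    // the context (e.g. a vector clock) under which it was written.
    List<VersionedValue> get(byte[] key);

    // put() stores the value along with the context obtained from an
    // earlier get(), so the system can track causality between versions.
    void put(byte[] key, Object context, byte[] value);

    // Simple holder pairing a value with its write context.
    class VersionedValue {
        public final byte[] value;
        public final Object context;

        public VersionedValue(byte[] value, Object context) {
            this.value = value;
            this.context = context;
        }
    }
}
```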

Service Level Agreements (SLA). An application can deliver its functionality in a bounded time only if every dependency in the platform delivers its functionality with even tighter bounds. Example: a service guarantees that it will provide a response within 300 ms for 99.9% of its requests at a peak client load of 500 requests per second. Amazon's platform is structured as a service-oriented architecture.

Design Considerations. Sacrifice strong consistency for availability. Conflict resolution is executed during reads instead of writes, i.e. the store is always writeable. Other principles: incremental scalability, symmetry, decentralization, heterogeneity.

Summary of techniques used in Dynamo and their advantages:
Problem: partitioning. Technique: consistent hashing. Advantage: incremental scalability.
Problem: high availability for writes. Technique: vector clocks with reconciliation during reads. Advantage: version size is decoupled from update rates.
Problem: handling temporary failures. Technique: sloppy quorum and hinted handoff. Advantage: provides high availability and durability guarantees when some of the replicas are not available.
Problem: recovering from permanent failures. Technique: anti-entropy using Merkle trees. Advantage: synchronizes divergent replicas in the background.
Problem: membership and failure detection. Technique: gossip-based membership protocol and failure detection. Advantage: preserves symmetry and avoids having a centralized registry for storing membership and node liveness information.

Partition Algorithm. Consistent hashing: the output range of a hash function is treated as a fixed circular space or ring. Virtual nodes: each physical node can be responsible for more than one virtual node, i.e. more than one position on the ring.
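A minimal sketch of consistent hashing with virtual nodes, assuming MD5-derived positions on a 64-bit ring stored in a TreeMap (illustrative only, not Dynamo's production code):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal sketch of consistent hashing with virtual nodes. Each physical
// node is hashed onto the ring several times; a key is served by the first
// virtual node clockwise from the key's hash.
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodesPerNode;

    public ConsistentHashRing(int virtualNodesPerNode) {
        this.virtualNodesPerNode = virtualNodesPerNode;
    }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.put(hash(node + "#" + i), node);     // place one virtual node
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Coordinator for a key: first virtual node clockwise from hash(key),
    // wrapping around to the start of the ring if necessary.
    public String coordinatorFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("empty ring");
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // Position on the ring: first 8 bytes of the MD5 digest packed into a long.
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```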

Advantages of using virtual nodes: If a node becomes unavailable, the load it handled is evenly dispersed across the remaining available nodes. When a node becomes available again, it accepts a roughly equivalent amount of load from each of the other available nodes. The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.

Replication. Each data item is replicated at N hosts. Preference list: the list of nodes responsible for storing a particular key.
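A preference list can be sketched as a clockwise walk over the ring that collects N distinct physical nodes, skipping further virtual nodes of hosts already chosen (illustrative only; it assumes a sorted TreeMap ring like the one in the previous sketch):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch: the preference list for a key is obtained by walking
// the ring clockwise from the key's position and collecting N distinct
// physical nodes, so that multiple virtual nodes of the same host are not
// counted twice.
public class PreferenceList {

    public static List<String> forKey(TreeMap<Long, String> ring, long keyHash, int n) {
        Set<String> nodes = new LinkedHashSet<>();
        // Walk clockwise from the key's position to the end of the ring...
        for (String node : ring.tailMap(keyHash).values()) {
            nodes.add(node);
            if (nodes.size() == n) return new ArrayList<>(nodes);
        }
        // ...then wrap around to the beginning of the ring.
        for (String node : ring.headMap(keyHash).values()) {
            nodes.add(node);
            if (nodes.size() == n) break;
        }
        return new ArrayList<>(nodes);
    }
}
```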

Data Versioning. A put() call may return to its caller before the update has been applied at all replicas, so a subsequent get() call may return many versions of the same object. Challenge: an object can have distinct version sub-histories, which the system will need to reconcile in the future. Solution: use vector clocks to capture causality between different versions of the same object.

Vector Clock. A vector clock is a list of (node, counter) pairs. Every version of every object is associated with one vector clock. If the counters on the first object's clock are less than or equal to all of the corresponding counters in the second object's clock, then the first is an ancestor of the second and can be forgotten.
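A simplified vector clock in this spirit is sketched below; it omits the timestamps Dynamo additionally keeps so that old entries can be truncated:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified vector clock: a map from node identifier to a counter.
// (Dynamo also stores timestamps so old entries can be truncated; that
// detail is omitted in this sketch.)
public class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();

    // Record that `node` coordinated a new write of this object.
    public void increment(String node) {
        counters.merge(node, 1L, Long::sum);
    }

    // True if this clock is an ancestor of (or equal to) `other`: every
    // counter here is less than or equal to the matching counter in `other`.
    public boolean isAncestorOf(VectorClock other) {
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            if (e.getValue() > other.counters.getOrDefault(e.getKey(), 0L)) {
                return false;
            }
        }
        return true;
    }

    // Two versions are in conflict (concurrent) when neither is an ancestor
    // of the other; Dynamo then returns both for the client to reconcile.
    public boolean conflictsWith(VectorClock other) {
        return !this.isAncestorOf(other) && !other.isAncestorOf(this);
    }
}
```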

Vector clock example

Execution of get() and put() operations. A client can use one of two strategies to select a node: 1. Route its request through a generic load balancer that will select a node based on load information. 2. Use a partition-aware client library that routes requests directly to the appropriate coordinator nodes.

Sloppy Quorum R/W is the minimum number of nodes that must participate in a successful read/write operation. Setting R + W > N yields a quorum-like system. In this model, the latency of a get (or put) operation is dictated by the slowest of the R (or W) replicas. For this reason, R and W are usually configured to be less than N, to provide better latency.
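The sketch below shows the kind of coordinator logic this implies for writes (illustrative names and interfaces, not Dynamo's code): the request is fanned out to the first N nodes of the preference list and reported successful once W acknowledgements arrive, so the observed latency is that of the W-th fastest replica.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative coordinator logic: the write is sent to the first N nodes of
// the preference list and succeeds as soon as W of them acknowledge.
public class QuorumWrite {

    // Hypothetical replica interface; store() returns true on success.
    public interface Replica {
        boolean store(byte[] key, byte[] value);
    }

    public static boolean put(List<Replica> preferenceList, byte[] key, byte[] value,
                              int n, int w, long timeoutMs) throws InterruptedException {
        CountDownLatch acks = new CountDownLatch(w);
        for (Replica r : preferenceList.subList(0, Math.min(n, preferenceList.size()))) {
            CompletableFuture.runAsync(() -> {
                if (r.store(key, value)) acks.countDown();   // count each successful ack
            });
        }
        // Succeed once W replicas have acknowledged, or fail on timeout.
        return acks.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```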

Hinted handoff. Assume N = 3. When node A is temporarily down or unreachable during a write, its replica is sent to node D instead. D keeps a hint that the replica belongs to A and delivers it back to A once A has recovered. Again: always writeable.
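A rough sketch of how a substitute node might hold and later deliver hinted replicas (hypothetical names, not Dynamo's code):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Illustrative sketch of hinted handoff. A substitute node (e.g. D) keeps
// hinted replicas in a separate local store and hands them back once the
// intended owner (e.g. A) is reachable again.
public class HintedHandoff {

    // A replica accepted on behalf of a node that was unreachable at write time.
    public static class Hint {
        final String intendedOwner;
        final byte[] key;
        final byte[] value;

        Hint(String intendedOwner, byte[] key, byte[] value) {
            this.intendedOwner = intendedOwner;
            this.key = key;
            this.value = value;
        }
    }

    private final Queue<Hint> localHints = new ArrayDeque<>();

    // Called on the substitute node when it accepts a write meant for another node.
    public void acceptHintedWrite(String intendedOwner, byte[] key, byte[] value) {
        localHints.add(new Hint(intendedOwner, key, value));
    }

    // Periodically retried: deliver hints whose intended owner has recovered,
    // then drop the local copies.
    public void deliverRecoveredHints(Predicate<String> isAlive, Consumer<Hint> sendToOwner) {
        localHints.removeIf(h -> {
            if (isAlive.test(h.intendedOwner)) {
                sendToOwner.accept(h);
                return true;
            }
            return false;
        });
    }
}
```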

Other techniques Replica synchronization: Merkle hash tree. Membership and Failure Detection: Gossip
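As an illustration of the anti-entropy mechanism, a Merkle tree can be built by hashing each key's value at the leaves and hashing pairs of children upward; two replicas then compare hashes top-down and descend only into subtrees that differ. The sketch below assumes SHA-1 and is illustrative only:

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Sketch of the Merkle-tree idea used for replica synchronization: leaves
// hash individual values, parents hash their children, and the root hash
// summarizes a whole key range.
public class MerkleTree {

    // Build the tree bottom-up over the values in one key range and return the root hash.
    public static byte[] rootHash(List<byte[]> leafValues) {
        List<byte[]> level = new ArrayList<>();
        for (byte[] v : leafValues) level.add(sha1(v));            // hash each leaf
        if (level.isEmpty()) return sha1(new byte[0]);             // empty range
        while (level.size() > 1) {                                 // hash pairs upward
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                next.add(sha1(concat(left, right)));
            }
            level = next;
        }
        return level.get(0);
    }

    private static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    private static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```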

Implementation. Written in Java. The local persistence component allows different storage engines to be plugged in: Berkeley Database (BDB) Transactional Data Store for objects of tens of kilobytes, MySQL for objects larger than tens of kilobytes, BDB Java Edition, etc.
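This pluggable design suggests a small storage-engine interface along the following lines (hypothetical names; the concrete engine is chosen per application based on typical object size and access pattern):

```java
// Hypothetical sketch of a pluggable local-persistence interface; concrete
// engines (BDB Transactional Data Store, MySQL, BDB Java Edition, ...)
// would implement it behind the rest of the node.
public interface LocalStorageEngine {
    byte[] read(byte[] key);
    void write(byte[] key, byte[] value);
    void delete(byte[] key);
}
```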

Evaluation