Distributed Key-Value Stores UCSB CS170


Overview: key-value stores and storage architecture; replication management

Key-Value Stores
An important system service on a cluster of machines. They handle huge volumes of data, e.g., petabytes, stored as (key, value) tuples.
Simple interface: put(key, value) inserts/writes the value associated with key; value = get(key) reads the value associated with key.
Sometimes used as a simpler but more scalable alternative to a database.
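As a minimal sketch of this interface on a single machine (the class name and example key are my own, not from the slides), the store is essentially an in-memory hash table wrapped in put()/get():

```python
class LocalKVStore:
    """Single-machine key-value store: put()/get() over an in-memory hash table."""

    def __init__(self):
        self._table = {}                 # Python dict = hash table

    def put(self, key, value):
        """Insert/overwrite the value associated with key."""
        self._table[key] = value

    def get(self, key):
        """Return the value associated with key, or None if absent."""
        return self._table.get(key)


store = LocalKVStore()
store.put("customer:42", {"buying_history": [], "credit_card": "..."})
print(store.get("customer:42"))
```

The rest of the lecture is about providing this same interface when the data is spread over many machines.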

KV-stores and Relational Tables
A KV-store can be viewed as a two-column (key, value) table with a single key column, and it can be used to implement more complicated relational tables, for example:

State        ID  Population   Area     Senator_1
Alabama      1   4,822,023    52,419   Sessions
Alaska       2   731,449      663,267  Begich
Arizona      3   6,553,255    113,998  Boozman
Arkansas     4   2,949,131    53,178   Flake
California   5   38,041,430   163,695  Boxer
Colorado     6   5,187,582    104,094  Bennet

(Index: the State column.)

KV-stores and Relational Tables
The KV version of the previous table uses one table indexed by the actual key (State) and others indexed by the ID:

State        ID      ID  Population     ID  Area      ID  Senator_1
Alabama      1       1   4,822,023      1   52,419    1   Sessions
Alaska       2       2   731,449        2   663,267   2   Begich
Arizona      3       3   6,553,255      3   113,998   3   Boozman
Arkansas     4       4   2,949,131      4   53,178    4   Flake
California   5       5   38,041,430     5   163,695   5   Boxer
Colorado     6       6   5,187,582      6   104,094   6   Bennet

KV-stores and Relational Tables
You can add indices with new KV tables. Thus KV tables support column-based storage, as opposed to the row-based storage typical of older DBMSs:

State        ID      ID  Population     Senator_1  ID
Alabama      1       1   4,822,023      Sessions   1
Alaska       2       2   731,449        Begich     2
Arizona      3       3   6,553,255      Boozman    3
Arkansas     4       4   2,949,131      Flake      4
California   5       5   38,041,430     Boxer      5
Colorado     6       6   5,187,582      Bennet     6
(Index)                                 (Index_2)
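As an illustration of this column-based layout, here is a hedged sketch (the dictionary names and the subset of rows are mine) of the same data held in separate KV tables that share the ID:

```python
# Each relational column becomes its own KV table, joined on the shared ID.
states     = {"Alabama": 1, "Alaska": 2, "Arizona": 3}     # State -> ID (the Index)
population = {1: 4_822_023, 2: 731_449, 3: 6_553_255}      # ID -> Population
senator_1  = {1: "Sessions", 2: "Begich", 3: "Boozman"}    # ID -> Senator_1
senator_ix = {"Sessions": 1, "Begich": 2, "Boozman": 3}    # Senator_1 -> ID (Index_2)

def population_of(state):
    """A row lookup is one get() per column, joined on the ID."""
    return population[states[state]]

print(population_of("Alaska"))               # 731449
print(population[senator_ix["Boozman"]])     # lookup via the secondary index: 6553255
```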

Key Values: Examples
Amazon: key = customer ID; value = customer profile (e.g., buying history, credit card, ...)
Facebook, Twitter: key = user ID; value = user profile (e.g., posting history, photos, friends, ...)
iCloud/iTunes: key = movie/song name; value = the movie or song itself

Key-Value Storage Systems in Real Life
Amazon: DynamoDB (internal key-value store used for Amazon.com, e.g., the shopping cart), Amazon SimpleDB, Simple Storage Service (S3)
BigTable/HBase/Hypertable: distributed, scalable data stores
Cassandra: distributed data management system (developed at Facebook)
Memcached: in-memory key-value store for small chunks of arbitrary data (strings, objects)
BitTorrent distributed file location: peer-to-peer file-sharing system
Redis, Oracle NoSQL Database
Distributed file systems: sets of (file block ID, file block) pairs

Key-Value Store
Also called a Distributed Hash Table (DHT).
Main idea: partition the set of (key, value) pairs across many machines.
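A hedged sketch of the partitioning idea (the node names and the use of MD5 are illustrative; real DHTs such as Chord or Dynamo use consistent hashing so that adding or removing a node moves only a small fraction of the keys):

```python
import hashlib

NODES = ["N1", "N2", "N3", "N50"]          # hypothetical machines

def node_for(key):
    """Map a key to the machine that stores it by hashing the key."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# Every client computes the same owner for a key, with no directory needed.
print(node_for("k14"))
```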

Challenges
Fault tolerance: handle machine failures without losing data and without performance degradation.
Scalability: need to scale to thousands of machines and allow easy addition of new machines.
Consistency: maintain data consistency in the face of node failures and message losses.
Heterogeneity (if deployed as a peer-to-peer system): latency from 1 ms to 1000 ms; bandwidth from 32 Kb/s to 100 Mb/s.

Key Questions
put(key, value): where do we store a new (key, value) tuple?
get(key): where is the value associated with a given key stored?
And do both of the above while providing fault tolerance, scalability, and consistency.

Directory-Based Architecture
Have one node (the master/directory) maintain the mapping between keys and the machines (nodes) that store the values associated with those keys.
[Figure: put(k14, V14) is sent to the master/directory, whose table maps K5 -> N2, K14 -> N3, K105 -> N50; the values V5, V14, V105 are stored on nodes N1..N50.] (from UCB CS162)

Directory-Based Architecture
[Figure: get(k14) is sent to the master/directory, which looks up K14 -> N3, retrieves V14 from N3, and returns it to the client.] (from UCB CS162)

Directory-Based Architecture
Having the master relay the requests is a recursive query. Another method is the iterative query (this slide): the master returns the node's identity to the requester, and the requester contacts that node directly.
[Figure: the master answers put(k14, V14) with N3; the client then sends the value to N3.] (from UCB CS162)

Directory-Based Architecture
[Figure: iterative get(k14): the master answers with N3, and the client fetches V14 from N3 directly.] (from UCB CS162)

Discussion: Iterative vs. Recursive Query
[Figure: recursive and iterative get(k14) side by side, against the master/directory and nodes N1..N50.]
Recursive query advantages: faster, since the master/directory is typically closer to the nodes; easier to maintain consistency, since the master/directory can serialize puts()/gets(). Disadvantage: scalability bottleneck, as all values go through the master/directory.
Iterative query advantage: reduces the chance that the master becomes a bottleneck. Disadvantages: slower; harder to enforce data consistency; exposes the internal node structure.
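A toy sketch of the two query styles (the Directory class and its methods are my own illustration, with in-memory maps standing in for RPCs):

```python
class Directory:
    """Toy master/directory: maps each key to the node that stores its value."""

    def __init__(self, mapping, nodes):
        self.mapping = mapping            # key -> node id, e.g. {"k14": "N3"}
        self.nodes = nodes                # node id -> that node's local store

    def get_recursive(self, key):
        """Recursive query: the directory fetches the value and relays it back."""
        owner = self.mapping[key]
        return self.nodes[owner].get(key)

    def lookup(self, key):
        """Iterative query, step 1: return only which node owns the key."""
        return self.mapping[key]


nodes = {"N3": {"k14": "V14"}, "N50": {"k105": "V105"}}
directory = Directory({"k14": "N3", "k105": "N50"}, nodes)

print(directory.get_recursive("k14"))     # value flows through the directory
owner = directory.lookup("k14")           # iterative: client learns "N3" ...
print(nodes[owner].get("k14"))            # ... and contacts N3 directly
```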

Scalability
More storage: use more nodes; the directory keeps track of the storage available at each node.
More requests: requests for a value can be served in parallel by all nodes on which it is stored; the master can replicate a popular value on more nodes.
Master/directory scalability: replicate it, or partition it so that different keys are served by different masters/directories. How do you partition?

Fault Tolerance
Replicate each value on several nodes. Usually, place the replicas on different racks in a datacenter to guard against rack failures.
[Figure: put(k14, V14) now records K14 -> N1, N3 at the master/directory, and V14 is stored on both N1 and N3.]
What happens when a node fails? The values it held must be re-replicated from the surviving replicas to other nodes.
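A small sketch of rack-aware replica placement (the node names and the rack map are made up; the greedy rule simply takes at most one node per rack):

```python
RACK_OF = {"N1": "rackA", "N2": "rackA", "N3": "rackB", "N50": "rackC"}

def choose_replicas(candidates, num_replicas):
    """Pick replica nodes on distinct racks so that one rack failure
    cannot destroy every copy of a value."""
    chosen, used_racks = [], set()
    for node in candidates:
        rack = RACK_OF[node]
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
        if len(chosen) == num_replicas:
            return chosen
    raise RuntimeError("not enough distinct racks for the requested replicas")

print(choose_replicas(["N1", "N2", "N3", "N50"], 2))   # e.g. ['N1', 'N3']
```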

Consistency
Need to make sure that a value is replicated correctly. How do you know a value has been replicated on every node? Wait for acknowledgements from every node.
What happens if a node fails during replication? Pick another node and try again.
What happens if a node is slow? Slow down the entire put()? Pick another node?
In general, with multiple replicas: slow puts and fast gets.

Consistency (cont'd)
With concurrent updates (i.e., puts to the same key), we may need to make sure the updates happen in the same order at every replica.
[Figure: two concurrent writes, put(k14, V14') and put(k14, V14''), reach N1 and N3 in opposite orders.]
What does get(k14) return? Undefined! This is an inconsistent write/write. (from UCB CS162)

Read after Write
A read is not guaranteed to return the value of the latest write. This can happen if the master processes requests in different threads.
[Figure: get(k14) is issued right after put(k14, V14'), but the get reaches N3 before the put does, so the old V14 is returned.]
This is an inconsistent write/read. (from UCB CS162)

Quorum Consensus for Consistency
Reading or writing involves multiple replicas, but we do not wait for all of them; this improves put() and get() performance.
Define a replica set of size N.
put() issues writes to all replicas and waits for acknowledgements from at least W of them; the writer returns after it hears from those W replicas. This ensures that sufficiently many replicas hold the right version.
get() requests a response from all replicas and waits for responses from at least R of them, using timestamps to pick the latest version.
Require W + R > N. Why does this work? Any R replicas must overlap the W replicas of the latest write in at least one node, so every read quorum sees the latest version.
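A hedged in-memory sketch of the W/R quorum rules (Python lists stand in for replica servers, and a wall-clock timestamp stands in for whatever versioning a real system would use):

```python
import random
import time

N, W, R = 3, 2, 2                        # replica set size; write/read quorums, W + R > N
replicas = [dict() for _ in range(N)]    # each replica maps key -> (timestamp, value)

def quorum_put(key, value):
    """Send the write to every replica; succeed once at least W acknowledge."""
    stamped = (time.time(), value)
    acks = 0
    for rep in replicas:                 # stands in for parallel RPCs that may fail
        rep[key] = stamped
        acks += 1
    return acks >= W

def quorum_get(key):
    """Ask any R of the N replicas and return the newest version seen.
    Because W + R > N, those R replicas overlap the last write's W replicas
    in at least one node, so the highest timestamp is the latest value."""
    responses = [rep[key] for rep in random.sample(replicas, R) if key in rep]
    return max(responses)[1] if responses else None

quorum_put("k14", "V14")
print(quorum_get("k14"))                 # 'V14'
```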

Quorum Consensus Example
N = 3, W = 2, R = 2. Replica set for k14: {N1, N3, N4}.
Assume the put() on N3 fails, but the puts on N1 and N4 succeed; the write is acknowledged by W = 2 replicas, so it completes.
[Figure: after put(k14, V14), V14 is stored on N1 and N4 but not on N3.]

Quorum Consensus Example (cont'd)
Now, issuing get(k14) to any two of the three replicas returns the answer: at least one of the two contacted replicas holds V14, even if the other (e.g., N3, which missed the write) returns NIL.
[Figure: get(k14) contacts two replicas; N1 and N4 hold V14, N3 does not.] (from UCB CS162)

Questions: Key-Value Stores
Q1: True _ False _  On a single machine, a key-value store can be implemented as a hash table.
Q2: 1 _ 2 _ 3 _  How many machine failures can be tolerated in a distributed key-value store with 3 replicas?
Q3: 1 _ 2 _ 3 _  How many replicas must respond to a read in a quorum-based scheme, when the number of replicas is N = 3 and the number of replicas that must respond to a write is W = 2? Remember W + R > N; with W = 2 and N = 3, R > N - W = 1.

Summary: Key-Value Stores
Very large-scale storage systems with two operations: put(key, value) and value = get(key).
Challenges and techniques:
Fault tolerance -> replication
Scalability -> serve get()s in parallel; replicate/cache hot tuples
Consistency -> quorum consensus to improve put/get performance