CS 655 Advanced Topics in Distributed Systems


Presented by: Walid Budgaga, Computer Science Department, Colorado State University

Outline
- Problem
- Solution Approaches
- Comparison
- Conclusion

Problem

Problem
- More data means adding more databases, which increases complexity
- Expensive and requires expertise
- Must deal with failed DBs
- Must upgrade and repair DBs without downtime

Problem Description
Growing global Internet use leads to applications with millions of users. These applications must:
- Support an unpredictable number of concurrent users; overnight, the number can grow from zero to a million
- Efficiently manage big data that must be distributed across several nodes
  - 2012: Amazon sold 27 million items in a day (306 items/s)
  - 2013: Facebook had 1.26 billion users
  - 2013: Flickr had 87 million users and stored 8 billion photos
- Be reliable and always available

Problem Description
The rapidly increasing amount of data, users, and commodity computers is driving the need for storage systems that:
- Target a specific usage pattern
- Are easy to scale

Why not use RDBMSs?
- They do not scale well: designed to run on one server, not on a cluster of nodes
- Availability issues in case of partitioning
- No support for different data models
- The full RDBMS functionality is often not needed
- They cost money

NoSQL Databases
The problem described above is motivating more organizations to move to NoSQL databases, which from the beginning have been built to be:
- Distributed
- Scaled out
- Reliable
- Fault tolerant

NoSQL
NoSQL databases share common characteristics:
- Schema-less
- Non-relational
- Able to run on large clusters
- Horizontally scalable
- Distributed

NoSQL Databases
NoSQL databases use:
- Partitioning, to exploit the cluster's storage capacity and to balance load
- Replication, to increase performance, provide high availability, and recover from failures
- Large clusters, to scale out

NoSQL Databases: Consistency Issue
- Concurrent users and data replicas distributed across different nodes can cause consistency issues
- CAP theorem: under a network partition, a system must trade consistency against availability

NoSQL Databases: Consistency Issue
- Given partition tolerance, the choice can be made based on the business domain: strict consistency or eventual consistency
- Tradeoff: consistency versus response time

NoSQL Databases
Based on the data models they support, NoSQL databases can be classified into four groups:
- Key-value data model, e.g. Dynamo, DynamoDB, Redis
- Column-family data model, e.g. Bigtable, Cassandra
- Graph data model, e.g. Neo4j
- Document data model, e.g. MongoDB, CouchDB

NoSQL Databases: Key-Value Data Model
- Simple and very powerful model; the other models build on it
- Data values are identified by keys
- The system does not care about the internal structure of the data
- The key is used to store and retrieve an object
- Poor aggregation capabilities; value modeling is done at the application level
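
A minimal sketch of this idea, using hypothetical names rather than any particular product's API: the store keeps the value as an opaque byte string, so any internal structure (and any aggregation over it) is the application's concern.

    # Minimal key-value store sketch: the store never inspects the value,
    # so serialization and "schema" live entirely in the application.
    import json

    class KeyValueStore:
        def __init__(self):
            self._data = {}          # key -> opaque bytes

        def put(self, key: str, value: bytes) -> None:
            self._data[key] = value  # stored as-is, structure unknown to the store

        def get(self, key: str) -> bytes:
            return self._data[key]

    store = KeyValueStore()
    # The application decides the value layout (here: a JSON-encoded profile).
    store.put("user:42", json.dumps({"name": "Ada", "city": "Fort Collins"}).encode())
    profile = json.loads(store.get("user:42"))   # any query over fields happens
    print(profile["name"])                       # in application code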

NoSQL Databases: Column-Family Data Model
- An improvement over the key-value model
- Organizes columns into families (locality groups), providing a different level of data access
- Ex: Bigtable uses <column family, column, timestamp>
- Supports aggregation operations
- The value can be anything
Source: http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
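
A sketch of how a column-family cell might be addressed, loosely following the Bigtable-style triple named above; all names here are illustrative, not a real system's API.

    # Illustrative column-family layout: a cell is addressed by
    # (row key, column family, column name) and carries a timestamp,
    # so a read can fetch one family (a locality group) without touching others.
    import time
    from collections import defaultdict

    table = defaultdict(dict)   # (row_key, family) -> {column: (value, timestamp)}

    def put(row_key, family, column, value):
        table[(row_key, family)][column] = (value, time.time())

    def get_family(row_key, family):
        # Finer-grained access than plain key-value: only one family is read.
        return table[(row_key, family)]

    put("user42", "profile", "name", "Ada")
    put("user42", "profile", "city", "Fort Collins")
    put("user42", "inbox", "msg:001", "hello")
    print(get_family("user42", "profile"))   # {'name': (...), 'city': (...)}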

Cassandra

Cassandra
- Implemented by Facebook to solve the Inbox Search problem; open source
- Cassandra = Dynamo (design) + Bigtable (data model)
- P2P, O(1) routing
- Supports a flexible structured data model
- Used by more than 100 companies, including Facebook, Netflix, eBay, and Twitter
- The largest known cluster has over 300 TB of data on over 400 nodes

Data Model

Cassandra Data Model
- Column-oriented data model, the same model as Bigtable
- Rows are identified by a key
- Column families: simple and super columns
- Each column carries a name, value, and timestamp
- Columns are sorted by name or by time

Cassandra Data Model (diagram)

Data Partition

Data Partition (1)
- Uses the same concept as Dynamo to distribute data across nodes
- Nodes are organized in a logical ring
- Consistent hashing determines data locations on the ring
- MD5 produces hashed identifiers in a 2^128 key space
- Each node is assigned a random key (token) between 0 and 2^128 - 1, placing it at a random location on the ring
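
A sketch of the ring behavior described above, under the assumption that a key is stored on the first node whose token is greater than or equal to the key's MD5 hash (wrapping around); the class and function names are illustrative, not Cassandra's actual code.

    # Consistent-hashing ring sketch: MD5 maps keys and node tokens into a
    # 2^128 space; a key belongs to the first node clockwise from its hash.
    import bisect
    import hashlib

    def md5_token(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)   # 0 .. 2**128 - 1

    class Ring:
        def __init__(self, node_names):
            self.tokens = sorted((md5_token(n), n) for n in node_names)

        def coordinator(self, key: str) -> str:
            h = md5_token(key)
            # First token >= hash of the key; wrap to the start of the ring.
            i = bisect.bisect_left(self.tokens, (h, "")) % len(self.tokens)
            return self.tokens[i][1]

    ring = Ring(["nodeA", "nodeB", "nodeC", "nodeD"])
    print(ring.coordinator("user:42"))   # node responsible for this key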

Data Partition (2)
- Each node is responsible for the keys that fall between itself and its predecessor
- Data rows are stored on nodes based on their hashed keys
- The responsible node is called the coordinator
- Unbalanced ring? Move nodes

Data Partition (3): Comparing to Other Systems
- Decentralized: Cassandra and Dynamo, both using consistent hashing
- Centralized: Bigtable, where one master assigns tablets to tablet servers, and GFS, where one master assigns chunks to chunk servers

Replication

Replication (1)
- High availability, durability, and performance are achieved by replicating the data across data centers
- Each data item is replicated at N nodes, with a configurable replication factor N
- When a node joins the system, it receives data items and replicas
- The coordinator is responsible for replicating its data
- Replication can be configured to be performed across many data centers

Replication (2): Replication Strategies
The first replica is always stored at the coordinator; the remainder are placed according to the chosen strategy (see the sketch after this list):
- Rack Unaware: replicate at the N-1 successive nodes
- Rack Aware: place one replica in another data center and the remaining N-2 across different racks in the coordinator's data center
- Datacenter Aware: place M replicas in another data center and the remaining N-(M+1) across different racks in the coordinator's data center
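
A sketch of the rack-unaware case under the assumptions above: the coordinator keeps the first replica and the next N-1 distinct nodes clockwise on the ring hold the rest. The helper name and ring layout are hypothetical.

    # Rack-unaware placement sketch: first replica at the coordinator,
    # remaining N-1 replicas on the next distinct nodes along the ring.
    def replica_nodes(ring_tokens, coordinator_index, n):
        """ring_tokens: list of (token, node) sorted by token."""
        replicas = []
        for step in range(len(ring_tokens)):          # walk at most one full lap
            node = ring_tokens[(coordinator_index + step) % len(ring_tokens)][1]
            if node not in replicas:                  # count each node once
                replicas.append(node)
            if len(replicas) == n:
                break
        return replicas

    tokens = [(10, "A"), (40, "B"), (70, "C"), (120, "D")]
    print(replica_nodes(tokens, 1, 3))   # ['B', 'C', 'D'] for N=3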

Replication (3): Rack Unaware
- Default strategy, intended for a single data center
- Example: N=3
- If a node fails, the data it replicated (ranges from N nodes) must be re-replicated; the resulting data transfer affects only nearby nodes on the ring

Replication (4): Rack Aware & Datacenter Aware
- Provide higher availability but add extra latency
- To keep the effect of failures local, keep neighbors in different data centers: use the first node along the ring that belongs to another data center

Replication (5): Comparing to Other Systems
- Cassandra and Dynamo use the same replication technique; both replicate row data
- PNUTS uses geographic replication: tablets containing rows are replicated, and one replica per record is chosen as master, mainly to provide quick responses
- Bigtable relies on GFS to replicate data: tablets containing column families are replicated, and the main replica is managed by one tablet server

Membership & Failure Detection

Membership & Failure Detection: Membership
- Joining nodes can take the InitialToken specified in the configuration file, or a token chosen so that a highly loaded node is alleviated
- A gossip protocol is used: every second, each node sends state information about itself and about the other nodes it knows to three other nodes
- As a result, every node learns about the other nodes in the system
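
A sketch of one gossip round as described above (each node pushes its view to three random peers); the merge rule used here, keeping the highest version counter seen per node, is an assumption for illustration.

    # Gossip round sketch: each node periodically pushes its membership view
    # (node -> heartbeat/version counter) to a few random peers, which merge
    # it by keeping the freshest entry per node.
    import random

    def gossip_round(views, fanout=3):
        for sender, view in views.items():
            view[sender] += 1                       # bump own heartbeat counter
            peers = random.sample([n for n in views if n != sender],
                                  min(fanout, len(views) - 1))
            for peer in peers:
                for node, version in view.items():
                    # Merge: keep the most recent state seen for each node.
                    views[peer][node] = max(views[peer].get(node, 0), version)

    nodes = ["A", "B", "C", "D", "E"]
    views = {n: {m: 0 for m in nodes} for n in nodes}
    for _ in range(5):
        gossip_round(views)
    print(views["A"])    # A's (converging) view of everyone's state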

Membership & Failure Detection: Failure Detection
- Cassandra uses the Φ Accrual Failure Detector (AFD) to decide whether a node is dead
- Traditional failure detectors implement boolean heartbeats; the AFD instead produces a suspicion level for each node, between the extremes of dead and alive
- The AFD computes a per-node threshold that accounts for network performance, workload, and other parameters
- Lightly loaded nodes can move on the ring to alleviate heavily loaded nodes
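
A sketch of the phi accrual idea under a simplifying assumption (exponentially distributed heartbeat inter-arrival times): phi grows the longer a heartbeat is overdue, and the application chooses the suspicion threshold. The class and parameter names are illustrative.

    # Phi accrual failure detector sketch: instead of a boolean "alive/dead",
    # compute a suspicion level phi from the time since the last heartbeat,
    # using the observed inter-arrival distribution (here: exponential).
    import math
    from collections import deque

    class PhiAccrualDetector:
        def __init__(self, window=100):
            self.intervals = deque(maxlen=window)   # recent inter-arrival times
            self.last_heartbeat = None

        def heartbeat(self, now: float) -> None:
            if self.last_heartbeat is not None:
                self.intervals.append(now - self.last_heartbeat)
            self.last_heartbeat = now

        def phi(self, now: float) -> float:
            if not self.intervals:
                return 0.0
            mean = sum(self.intervals) / len(self.intervals)
            elapsed = now - self.last_heartbeat
            # P(a heartbeat arrives later than 'elapsed') under Exp(mean):
            p_later = math.exp(-elapsed / mean)
            return -math.log10(p_later)             # = elapsed / (mean * ln 10)

    d = PhiAccrualDetector()
    for t in (0.0, 1.0, 2.0, 3.0):                  # heartbeats roughly every 1 s
        d.heartbeat(t)
    print(d.phi(3.5))    # small: heard from recently
    print(d.phi(13.0))   # large: suspect failure if above the chosen threshold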

Failure Detection: Comparing to Other Systems
- Bigtable: joining servers register themselves with the Chubby service; the master detects failed servers by checking the servers directory in Chubby and reassigns their tablets to other servers to balance load
- GFS: uses heartbeats to detect failed nodes; joining servers must contact the master

Consistency

Consistency (1)
The client specifies a consistency level for read and write operations.

Write levels:
  Zero    - return immediately (write is applied in the background)
  Any     - at least one replica, or a hinted handoff
  One     - one replica
  Quorum  - N/2 + 1 replicas
  All     - all replicas

Read levels:
  One     - one replica
  Quorum  - N/2 + 1 replicas
  All     - all replicas
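
A small sketch of how the write levels above could translate into a number of required acknowledgments for replication factor N; the mapping simply mirrors the table and the function name is hypothetical.

    # Required acknowledgments per consistency level, for replication factor N.
    # ZERO returns immediately (write applied in the background), ANY accepts a
    # hinted write, ONE/QUORUM/ALL wait for that many replicas.
    def required_acks(level: str, n: int) -> int:
        return {
            "ZERO": 0,
            "ANY": 1,          # one replica or a hinted handoff counts
            "ONE": 1,
            "QUORUM": n // 2 + 1,
            "ALL": n,
        }[level]

    for level in ("ZERO", "ANY", "ONE", "QUORUM", "ALL"):
        print(level, required_acks(level, 3))   # N = 3 -> 0, 1, 1, 2, 3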

Consistency (2): Quorum Level
Choose R and W so that:
- R + W > N (the overlap prevents read-write conflicts)
- W > N/2 (prevents write-write conflicts)
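
A quick check of the two quorum conditions above for a few (N, R, W) choices; the helper is illustrative.

    # Quorum sanity check: R + W > N guarantees read and write sets overlap in
    # at least one replica; W > N/2 prevents two conflicting writes from both
    # reaching a majority of replicas.
    def valid_quorum(n: int, r: int, w: int) -> bool:
        return (r + w > n) and (w > n / 2)

    print(valid_quorum(3, 2, 2))   # True: classic N=3, R=W=2 quorum
    print(valid_quorum(3, 1, 1))   # False: reads may miss the latest write
    print(valid_quorum(3, 3, 1))   # False: concurrent writes can both "succeed"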

Consistency (3): Comparing to Other Systems
- PNUTS: one replica exists in each region, and one replica per record is always chosen as the master; the master applies updates to all replicas of a record in the same order, though not necessarily at the same time
- Eventual consistency: updates are applied immediately to the local replica and lazily to the other replicas, which can cause race conditions

Consistency (4): Comparing to Other Systems
- GFS: one replica server is chosen by the master as the primary; the primary applies updates to the other replicas
- Applied updates increase the version number of each replica; the version number is used to detect stale replicas

Evaluation (1): High Performance
Results of operations on 50 GB of Inbox Search data, MySQL vs. Cassandra (source: slides by the authors of the paper):

              Writes Average   Reads Average
  MySQL       ~300 ms          ~350 ms
  Cassandra   0.12 ms          15 ms

Evaluation (2): Latency (chart). Source: Benchmarking Top NoSQL Databases, white paper by DataStax Corporation, February 2013

Evaluation (3): Throughput (chart). Source: Benchmarking Top NoSQL Databases, white paper by DataStax Corporation, February 2013

Dynamo

Dynamo
- Highly available: data is partitioned and replicated, and the write operation is always available
- Scalable and decentralized
- Data partitioning uses consistent hashing
- Consistency requirements are relaxed through versioning
- Quorum technique for replica consistency; decentralized replica synchronization
- Eventual consistency

Dynamo
- Membership and failure handling via a gossip protocol
- No authentication required: trusted environment
- Driven by Service Level Agreements (SLAs)
- Dynamo targets applications that require only primary-key access to data and that deal with small values (< 1 MB)

Data Model

Data Model
- Uses an unstructured key-value data model; the value can be anything
- Allows putting and getting a value using its key
- Does not support aggregations
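
A sketch of this put/get style of interface (the Dynamo paper describes get(key) and put(key, context, object)); the class below is a simplified stand-in, not Amazon's implementation.

    # Simplified Dynamo-style interface sketch: values are opaque blobs, keyed
    # by a primary key; get() may return several conflicting versions plus an
    # opaque context that the client passes back on its next put().
    class TinyDynamo:
        def __init__(self):
            self._store = {}                 # key -> list of (context, value)

        def get(self, key):
            versions = self._store.get(key, [])
            values = [v for _, v in versions]
            context = [c for c, _ in versions]   # opaque to the client
            return values, context

        def put(self, key, context, value):
            # A real system would use the context (vector clocks) to decide
            # which versions this write supersedes; here we simply overwrite.
            self._store[key] = [(context, value)]

    db = TinyDynamo()
    db.put("cart:42", None, b'{"items": ["book"]}')
    values, ctx = db.get("cart:42")
    print(values)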

Data Partition

Partitioning (1)
- Data is partitioned evenly across the nodes
- MD5 produces hashed identifiers in a 2^128 key space
- Each node is assigned a random key (token); data items are stored on nodes according to their hashed keys
- Each node is responsible for the keys that fall in the range between it and its predecessor; that node is called the coordinator
- Each node has full information about the others, maintained via a gossip protocol

Partitioning (2)
- Assigning nodes to random positions leads to uneven data distribution and uneven load balance
- To avoid imbalance in the ring, virtual nodes are used: a powerful physical node is split into several virtual nodes
- This avoids uneven data distribution and exploits heterogeneity in the system; the number of virtual nodes depends on the node's capacity
- If a node maintaining virtual nodes fails, its data is dispersed over the available nodes
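
A sketch of virtual nodes under these assumptions: each physical node claims several tokens, with the count scaled by its capacity, so a bigger machine owns a larger share of the ring. The helper names and the tokens_per_unit constant are hypothetical.

    # Virtual node sketch: a physical node claims several ring positions
    # ("tokens"), proportional to its capacity, so load spreads more evenly
    # and heterogeneous machines carry different shares of the data.
    import hashlib

    def token(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def build_ring(capacities, tokens_per_unit=8):
        ring = []
        for node, capacity in capacities.items():
            for i in range(capacity * tokens_per_unit):
                ring.append((token(f"{node}#{i}"), node))   # one token per vnode
        return sorted(ring)

    ring = build_ring({"small-node": 1, "big-node": 4})
    share = sum(1 for _, n in ring if n == "big-node") / len(ring)
    print(f"big-node owns {share:.0%} of the ring positions")   # ~80%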

Replication

Replication (1)
- Dynamo achieves high availability and durability by replicating each piece of data N-1 times
- The coordinator replicates the data on its N-1 successor nodes
- For each key, the coordinator stores a preference list so it knows where to replicate the data

Replication (2)
- The preference list for a particular key contains the replica nodes for that key
- The list maintains information about more than N nodes, to cope with failing nodes and to use some of them for hinted handoff
- Some positions in the ring are skipped, to avoid having virtual nodes of the same physical node in the preference list (see the sketch below)
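
A sketch of building a preference list as described: walk the ring from the key's position and collect more than N entries, skipping tokens that belong to a physical node already in the list. Function and ring names are illustrative.

    # Preference-list sketch: starting at the key's position, walk the ring and
    # keep the first `list_size` distinct *physical* nodes, so replicas (and
    # hinted-handoff fallbacks) never land on two vnodes of the same machine.
    import bisect
    import hashlib

    def md5_token(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def preference_list(ring, key, list_size):
        """ring: sorted list of (token, physical_node), possibly with vnodes."""
        start = bisect.bisect_left(ring, (md5_token(key), ""))
        nodes = []
        for step in range(len(ring)):
            node = ring[(start + step) % len(ring)][1]
            if node not in nodes:            # skip vnodes of an already-chosen node
                nodes.append(node)
            if len(nodes) == list_size:
                break
        return nodes

    ring = sorted((md5_token(f"{n}#{i}"), n)
                  for n in ("A", "B", "C", "D") for i in range(4))
    print(preference_list(ring, "cart:42", list_size=4))   # N replicas plus spares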

Versions

Versions (1)
- Dynamo aims for availability, achieved by performing asynchronous updates to the replicas
- Old versions can be used to provide high availability when the coordinator fails and only out-of-date replicas exist
- Multiple versions of the same object can therefore coexist; Dynamo treats the result of each update as a new, immutable version

Versions (2)
- Concurrent updates combined with failures can result in conflicting versions
- A vector clock is used to capture the causality between conflicting versions
- If the clocks are causally ordered, the newer version always wins
- If the versions are causally concurrent, automatic reconciliation is not possible and the client has to reconcile them, e.g. by merging different versions of a shopping cart
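
A sketch of the vector-clock comparison behind this rule: if one clock dominates the other, that version supersedes it; if neither dominates, the versions are concurrent and the client must reconcile. The representation (a dict of node counters) is an illustrative choice.

    # Vector clock sketch: each version carries {node: counter}. If one clock
    # dominates the other, that version supersedes it; otherwise the versions
    # are concurrent and the client has to reconcile them (e.g. merge carts).
    def dominates(a: dict, b: dict) -> bool:
        return all(a.get(n, 0) >= c for n, c in b.items())

    def compare(a: dict, b: dict) -> str:
        if dominates(a, b) and not dominates(b, a):
            return "a supersedes b"
        if dominates(b, a) and not dominates(a, b):
            return "b supersedes a"
        if a == b:
            return "equal"
        return "concurrent: client must reconcile"

    v1 = {"Sx": 1}
    v2 = {"Sx": 2}                 # written after v1 was read
    v3 = {"Sx": 1, "Sy": 1}        # concurrent update via another coordinator
    print(compare(v2, v1))         # a supersedes b
    print(compare(v2, v3))         # concurrent: client must reconcile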

Consistency

Consistency
- Dynamo uses a quorum protocol to provide different levels of consistency on read and write operations, by choosing the values of R and W
- A small R and a large W reduce the response time of read operations while increasing the latency of write operations

Failure Handling

Failure Handling
- If the coordinator node fails, replicas of the data it coordinated are sent to the successor, the (N+1)th node
- Once the coordinator comes back, the (N+1)th node sends the data back to it and then deletes its local copy
- To handle the failure of an entire data center, replicas of each object are stored in different data centers
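
A sketch of this hinted-handoff behavior: a write meant for a down node is accepted by a fallback node along with a hint naming the real owner, and is handed back (and deleted locally) when the owner recovers. The class and function names are hypothetical.

    # Hinted handoff sketch: when an intended replica is down, the write goes
    # to the next node with a "hint" naming the real owner; once the owner is
    # back, the hint holder forwards the data and drops its temporary copy.
    class Node:
        def __init__(self, name):
            self.name = name
            self.up = True
            self.data = {}      # key -> value
            self.hints = []     # (intended_owner, key, value)

    def write(key, value, intended, fallback):
        if intended.up:
            intended.data[key] = value
        else:
            fallback.data[key] = value                      # keep it readable
            fallback.hints.append((intended, key, value))   # remember the owner

    def handoff(fallback):
        for intended, key, value in list(fallback.hints):
            if intended.up:
                intended.data[key] = value      # deliver to the recovered owner
                fallback.hints.remove((intended, key, value))
                fallback.data.pop(key, None)    # then drop the temporary copy

    a, b = Node("A"), Node("B")
    a.up = False
    write("cart:42", b'{"items": ["book"]}', intended=a, fallback=b)
    a.up = True
    handoff(b)
    print("cart:42" in a.data, "cart:42" in b.data)   # True False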