Trade-Offs in Cloud Storage Architecture
Stefan Tai

Cloud computing is about providing and consuming resources as services. There are five essential characteristics of cloud services [NIST: http://csrc.nist.gov/groups/sns/cloud-computing/]:

1. Elastic Scalability

2. On-demand Self-Service

3. Ubiquitous Network Access

4. Resource Pooling

5. Measured Service

Essential characteristics and related qualities (figure): Security, High Availability, Reliability, Rapid Elasticity, Data Consistency, Measured Service, On-demand Self-service, Demand-side Aggregation, Multi-Channel Access, Multi-Tenancy, Efficiency, Resource Pooling, Supply-side Savings, Broad Network Access, Client Latency.

...and diverse trade-offs (figure) among these same qualities, plus cost ($).

Trade-offs are unavoidable. Trade-off decisions cannot (always) be made at design time. The factors that determine them are likely to change dynamically at runtime, for example:
- Workload patterns might change because
  - user location changes
  - times of (peak) usage change (partly user-controlled)
  - applications and features change
- Failure patterns might change because
  - hardware characteristics change
  - software characteristics change
  - network reliability changes

Key Challenges:
- Measurable qualities and trade-offs: consumer-observable vs. provider-side; metrics in $ vs. msec
- Tunable qualities and trade-offs: at runtime, not (only) at design/deployment time
- Continuous evaluation

Example: Cloud Storage Service

A simple Web API to start with: the storage consumer calls write(file, key) and read(file, key) on the storage provider. The figure separates the consumer-facing programming model from the provider's storage model.
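As a baseline for the slides that follow, here is a minimal single-node sketch of this two-call API. The class name `InMemoryStorageProvider` is hypothetical. The sketch assumes that `read` takes just the key and returns the file, which is one reading of the slide's read (file, key). There is no replication or partitioning at all.

```python
from typing import Dict


class InMemoryStorageProvider:
    """Hypothetical single-node provider exposing write/read over keys.

    No replication or partitioning: every object lives in one dict.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def write(self, file: bytes, key: str) -> None:
        # Store the file under its key, overwriting any previous version.
        self._objects[key] = file

    def read(self, key: str) -> bytes:
        # Return the file stored under the key; raises KeyError if absent.
        return self._objects[key]
```

With a single node, every read sees the latest write. The consistency, availability and durability questions only arise once the provider spreads keys and replicas across many nodes.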

Let's have a closer look at write(file, key). What happens in the provider's storage model when the storage consumer writes a file?

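(N,R,W) Quorum System. The storage consumer calls write(file, key) with the key "the girl with the dragon tattoo". The storage provider uses the key's MD5 hash, 030849cd38, to place it on a ring of nodes A to E (ring labels 000, 333, 666, 999, bbb, fff). The coordinator node for this key is B (at 333), which forwards the write to further replica nodes on the ring.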
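The key-to-node mapping on this slide is consistent hashing. The sketch below is a minimal version with one token per node and no virtual nodes. A key's MD5 digest is read as a 128-bit ring position. The coordinator is the first node at or clockwise after that position, and the next N-1 nodes hold the replicas. The names `HashRing`, `hex_label` and `preference_list_at` are hypothetical. The node positions are also hypothetical: they are evenly spaced, with E at ccc rather than the slide's bbb tick.

```python
import bisect
import hashlib
from typing import List, Tuple

RING_BITS = 128  # MD5 digests are 128 bits


def md5_position(key: str) -> int:
    """Ring position of a key: its MD5 digest read as an unsigned integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def hex_label(label: str) -> int:
    """Turn a short hex prefix such as '333' into a 128-bit ring position."""
    return int(label, 16) << (RING_BITS - 4 * len(label))


class HashRing:
    """Consistent-hashing ring with one token per node (no virtual nodes)."""

    def __init__(self, nodes: List[Tuple[str, int]]) -> None:
        self._ring = sorted(nodes, key=lambda node: node[1])
        self._positions = [pos for _, pos in self._ring]

    def preference_list_at(self, position: int, n: int) -> List[str]:
        # Coordinator = first node at or clockwise after the position
        # (wrapping past the top of the ring), then its n-1 successors.
        start = bisect.bisect_left(self._positions, position) % len(self._ring)
        count = min(n, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][0] for i in range(count)]

    def preference_list(self, key: str, n: int) -> List[str]:
        return self.preference_list_at(md5_position(key), n)


# Hypothetical, evenly spaced node placement.
ring = HashRing([(name, hex_label(label)) for name, label in
                 [("A", "000"), ("B", "333"), ("C", "666"), ("D", "999"), ("E", "ccc")]])
```

Suppose the slide's hash prefix 030849cd38 is taken at face value. It lies between 000 and 333, so `ring.preference_list_at(hex_label("030849cd38"), 3)` yields B as coordinator, followed by C and D.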

(N,R,W) Quorum System with Eventual Consistency. The write path is the same as before, but with a hinted handoff node. If a replica node is unreachable, another node on the ring accepts the write on its behalf and stores a hint naming the intended replica. Once that replica is reachable again, the data is handed off to it. Replicas therefore converge only eventually.
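Hinted handoff can be sketched as follows. This is a simplified model under stated assumptions, not a real store's implementation. The ring is a plain list of nodes, and `start` is the coordinator's index. A down node in the preference list is replaced by the next healthy node beyond the list. That stand-in keeps the write as a hint rather than as its own data, and delivers it later. `Node`, `write_with_hinted_handoff` and `deliver_hints` are hypothetical names. In a real system, the coordinator would compare the number of nodes that stored the write against W.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    up: bool = True
    data: Dict[str, bytes] = field(default_factory=dict)
    # Writes held on behalf of another node: (intended_node, key, value).
    hints: List[Tuple[str, str, bytes]] = field(default_factory=list)


def write_with_hinted_handoff(ring: List[Node], start: int, n: int,
                              key: str, value: bytes) -> List[str]:
    """Write to the n nodes clockwise from ring[start]; return who stored it.

    A down node in that preference list is replaced by the next healthy node
    beyond it, which keeps the write as a hint instead of as its own data.
    """
    size = len(ring)
    preferred = [ring[(start + i) % size] for i in range(min(n, size))]
    missing = [node for node in preferred if not node.up]
    stored = []
    for node in preferred:
        if node.up:
            node.data[key] = value
            stored.append(node.name)
    offset = len(preferred)
    while missing and offset < size:
        stand_in = ring[(start + offset) % size]
        offset += 1
        if stand_in.up:
            stand_in.hints.append((missing.pop(0).name, key, value))
            stored.append(stand_in.name)
    return stored


def deliver_hints(holder: Node, nodes: Dict[str, Node]) -> None:
    """Hand hinted writes back to their intended nodes once those are up."""
    pending = []
    for target_name, key, value in holder.hints:
        target = nodes[target_name]
        if target.up:
            target.data[key] = value
        else:
            pending.append((target_name, key, value))
    holder.hints = pending
```

Until the hint is delivered, a read served by the recovered replica alone could still return stale data. This is the eventual-consistency window the following slides measure.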

A well-known trade-off: the CAP theorem. It concerns Consistency (C), Availability (A), and Tolerance to Network Partitions (P). Network partitions must be assumed in a cloud context, so a system offers either C or A, but not both.

Consistency vs. availability:
- ACID (Atomic, Consistent, Isolated, Durable): traditional DBMS
- BASE (Basically Available, Soft-State, Eventually Consistent): NoSQL cloud storage

Measuring client-observable eventual consistency: how soon (how late) is eventual? At t0, a client sends an update request to the cloud storage provider. At t1, the client receives the response. The update then propagates to replicas #1, #2, ..., #n. The client-observable time window (t1, t2) ends at t2, once the update has reached all replicas (Bermbach/Tai 2011).
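One way to estimate t2 from probe reads is sketched below. This simplified estimator is not necessarily the authors' exact method. Readers repeatedly read the key after the update and record (timestamp, version) pairs, assuming the timestamps come from synchronized clocks. t2 is taken as the earliest read from which every later read returns the new version. Any stale read resets the candidate. The function name `inconsistency_window` is hypothetical.

```python
from typing import List, Optional, Tuple


def inconsistency_window(t1: float, reads: List[Tuple[float, int]],
                         new_version: int) -> Optional[float]:
    """Estimate the client-observable window (t1, t2) for one update.

    reads: (timestamp, version) pairs from readers, on a common clock.
    t2 is the earliest read time from which every later read returns
    new_version (or newer). Returns t2 - t1, clamped at 0, or None if
    the probes never observed a stable new version.
    """
    t2: Optional[float] = None
    for ts, version in sorted(reads):
        if version >= new_version:
            if t2 is None:
                t2 = ts  # candidate start of the consistent period
        else:
            t2 = None  # a stale read means the window is not over yet
    if t2 is None:
        return None
    return max(0.0, t2 - t1)
```

For example, take t1 = 1.0 and reads at 1.2 (new), 1.3 (stale) and 1.5 (new). The stale read at 1.3 pushes t2 to 1.5, giving a window of 0.5.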

Setup (figure): cloud; clock synchronization protocol.

Experimental Findings for Amazon S3 (figure: LOW and SAW periods) [Bermbach2011]

[Bermbach2011]: David Bermbach and Stefan Tai. Eventual Consistency: How soon is eventual? An Evaluation of Amazon S3's Consistency Behavior. Middleware for Services Computing Workshop, ACM, December 2011 (to appear).

Length of S3 LOW/SAW periods, averages (figure; Bermbach/Tai 2011)

Experimental Findings for Apache Cassandra (Bermbach/Tai 2011)

Comparing S3 and Cassandra (Bermbach/Tai 2011)

| | Amazon S3 (1) | Apache Cassandra (2) |
|---|---|---|
| Observed pattern | Two different periodicities | Geometric distribution |
| Monotonic read consistency violations | 12% | 0.0006% |
| Geographic distribution | One availability zone out of three usually lags behind (in 50% of all tests) | No influence |
| Time to consistency | > 99% of all LOW writes create consistent data after 175 ms | > 99% of all writes create consistent data after 35 ms |
| Read availability | > 8 nines | 100% |

(1) Setup: high redundancy with replication over 3 availability zones; test duration 7 days.
(2) Setup: deployed on 3 large EC2 instances in different availability zones; consistency level ONE; 3 replicas; test duration 24 h.
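The monotonic-read figures above count reads that go "back in time". A client violates monotonic read consistency when it reads a version older than one it has already seen. A minimal counter is sketched below. It assumes each read is logged as a hypothetical (client_id, timestamp, version) triple. The function name is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def monotonic_read_violation_rate(reads: Iterable[Tuple[str, float, int]]) -> float:
    """Fraction of reads that violate monotonic read consistency.

    reads: (client_id, timestamp, version) triples. A read is a violation
    if it returns a version older than one the same client saw earlier.
    """
    by_client: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for client, ts, version in reads:
        by_client[client].append((ts, version))
    total = violations = 0
    for observations in by_client.values():
        highest: Optional[int] = None  # newest version this client has seen
        for _, version in sorted(observations):
            total += 1
            if highest is not None and version < highest:
                violations += 1
            else:
                highest = version
    return violations / total if total else 0.0
```

Multiplying the result by 100 gives a percentage comparable in form to the table's 12% and 0.0006% rows.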

Further S3 Measurements: two files in the same bucket show different behavior (Bermbach/Tai 2011).

Client-observable, measurable qualities: as defined, as agreed?

Objective (1.): Understanding the C-A spectrum for use in applications (figure: a spectrum from BASE, on the A side, to ACID, on the C side; axis t).

(2.) Tuning knobs to dynamically manage trade-offs. Alongside the programming model and the storage model, a config model sets the ring configuration, e.g. config_ring (N,R,W) = (3,2,2). This applies to the same ring as before (key "the girl with the dragon tattoo", MD5 hash 030849cd38, coordinator node B).
- N (number of replicas) governs durability.
- R (replicas needed per read) governs read availability.
- W (replicas needed per write) governs write availability.
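These knobs interact through simple arithmetic, sketched here with a hypothetical `QuorumConfig` class (not the slide's config_ring interface). If R + W > N, every read quorum overlaps every write quorum. A read then reaches at least one replica holding the latest acknowledged write, assuming strict quorums; sloppy quorums with hinted handoff weaken this. Reads still succeed with up to N - R replicas unreachable, and writes with up to N - W.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class QuorumConfig:
    n: int  # replicas per key (durability)
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("require 1 <= R <= N and 1 <= W <= N")

    def read_write_overlap(self) -> bool:
        # R + W > N: any R replicas and any W replicas share at least one
        # member (strict quorums only; sloppy quorums break this).
        return self.r + self.w > self.n

    def tolerated_failures(self) -> Dict[str, int]:
        # Unreachable replicas that reads/writes can survive.
        return {"read": self.n - self.r, "write": self.n - self.w}
```

For the slide's (3,2,2), 2 + 2 > 3, so read and write quorums overlap, and reads and writes each tolerate one unreachable replica. Lowering R and W to (3,1,1) raises availability to two tolerated failures each, but gives up the overlap. That is exactly the kind of trade-off such a knob would shift at runtime.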

(3.) Reliable, client-observable behavior

Our Agenda:
- Understanding the relevant qualities and critical trade-off decisions that need to be addressed on a per-application, per-provider, and per-client/tenant basis
- Providing tuning knobs that translate into novel programming / configuration / deployment models for the Cloud
- Continuously monitoring and evaluating runtime data (e.g., number of transactions, load, and utilization) within the trade-off model, and automating trade-off adjustments

Our Vision: The Harmonic Cloud Architecture (Trade-Off Equalizer)

Thank you. Stefan Tai, tai@kit.edu / tai@fzi.de

Acknowledgments: David Bermbach (david.bermbach@kit.edu), Markus Klems (markus.klems@kit.edu)