Resiliency at Scale in the Distributed Storage Cloud


Resiliency at Scale in the Distributed Storage Cloud. Alma Riska, Advanced Storage Division, EMC Corporation. In collaboration with many at the Cloud Infrastructure Group.

Outline: Wide topic, but this talk will focus on: architecture, resiliency, failures, redundancy schemes, and policies to differentiate services. 2

Digital Content Creation & Investment 3

Scaled-out Storage Systems: Large amount of hardware: thousands of disks, tens to hundreds of servers, a significant amount of networking. Wide range of applications: Internet Service Providers, On-line Service Providers, private cloud. Up to millions of users. 4

Storage Requirements: Store massive amounts of data, (tens of) PetaBytes, on direct-attached high-capacity nearline HDDs. Highly available: minimum downtime. Reliably stored: beyond the traditional 5 nines. Ubiquitous access: across geographical boundaries. 5

Scaled-out Storage Architecture: Hardware organized in nodes / racks / geographical sites, connected over a LAN / WAN, with services running on each node. 6

Scalability in Scaled-out Storage: Independence between components, no single point of failure. Hardware: disks, nodes, racks, sites. Software: services such as metadata. Seamlessly add/remove storage devices or nodes. Isolation of failures. Sustaining performance. Shared-nothing architecture: elasticity / resilience / performance. 7

EMC Atmos Architecture: Shared-nothing architecture. Nodes: 15-60 large-capacity SAS HDDs. Racks: up to 8 nodes, or 480 4TB HDDs (>1 PByte). At least two sites, connected over a LAN / WAN. 8

Storage Resiliency: Data Reliability: data is stored persistently on device(s) like HDDs. Data Availability: data is available independently of hardware failures. Data Consistency and Accuracy: returned data is what the user has stored in the system. 9

Failures: Data devices (HDDs). Other components: Hardware: network, power outages, cooling outages. Software: drivers, services (metadata). 10

Transient Failures: Many failures are transient: a temporary interruption of the operation of a component. Variability in component response time can be seen as a transient failure, particularly network delays. System load causes transient failures. Transient failures occur much more often than hardware component failures. 11

Impact of Failures: Reliability: impacted directly by disk failures, but by all other failures too. Availability: directly impacted by any failure, particularly transient ones. Consistency: impacted by service failures (metadata) and transient failures. 12

Criticality of Failures in the Cloud: A large-scale failure, e.g., a node failure, makes a large amount of data and other components unavailable simultaneously. Since there are more components in the system, failures happen more often. The system needs to be designed with high component unavailability in mind, even if the unavailability is transient. 13

Challenges of Handling Failures: Correct identification of failures: many failures have similar symptoms, e.g., a disk unreachable (disk failure, controller failure, power failure, network failure). Effective isolation of failures: limit the cases where a single component failure becomes a node or site failure. Timely detection of failures: in a large system failures may go undetected, particularly transient failures and their impact. 14

Example of System Alerts: HDD events are overwhelming. Events do not necessarily indicate disk failures, but rather temporarily unreachable HDDs, for various reasons; the majority are transient. 15

Fault Tolerance in Cloud Storage: Transparency toward failures: disks / nodes / racks, services, even entire sites. Transparency varies by system goals or targets. 16

Fault Tolerance in Cloud Storage: Transparency toward failures: disks / nodes / racks, services, even entire sites. Transparency varies by system goals or targets. Resilience goals determine fault domains. 17

Fault Domains: The hierarchy of the set of resources whose failure can be tolerated in a system. Example: tolerate a site failure, i.e., two racks, or 16 nodes, or 240 disks. Determines the distribution of data and services. 18
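To make the hierarchy concrete, here is a minimal Python sketch (with illustrative names) of a fault-domain tree whose sizes match the example above: one site of 2 racks, 8 nodes per rack, and 15 disks per node, i.e. 240 disks per site.

from dataclasses import dataclass, field
from typing import List

@dataclass
class FaultDomain:
    """A node in the fault-domain hierarchy (site -> rack -> node -> disk)."""
    name: str
    children: List["FaultDomain"] = field(default_factory=list)

    def leaf_count(self) -> int:
        """Number of leaf devices (disks) underneath this domain."""
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)

def build_site(site_id: int, racks: int = 2, nodes_per_rack: int = 8,
               disks_per_node: int = 15) -> FaultDomain:
    """Builds one site matching the slide's example: 2 racks = 16 nodes = 240 disks."""
    return FaultDomain(
        name=f"site-{site_id}",
        children=[
            FaultDomain(
                name=f"site-{site_id}/rack-{r}",
                children=[
                    FaultDomain(
                        name=f"site-{site_id}/rack-{r}/node-{n}",
                        children=[FaultDomain(name=f"disk-{d}")
                                  for d in range(disks_per_node)],
                    )
                    for n in range(nodes_per_rack)
                ],
            )
            for r in range(racks)
        ],
    )

print(build_site(0).leaf_count())   # 240: what one site failure takes offline

Counting leaves per domain is what lets the placement layer reason about how much data a single site, rack, or node failure would take offline.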

Fault Tolerance and Redundancy: Fault tolerance is primarily achieved via redundancy: more hardware and software than needed. Achieving a fault-tolerance goal depends on: Amount of redundancy (storage capacity): traditionally parity (RAID); in the cloud it is often replication or erasure coding. Pro-active measures: monitoring/analysis/prediction of the system's health, background detection of failures. 19

Fault Tolerance and Data Replication: Replicate data (including metadata) up to 4 times. Pros: high reliability, high availability, good performance and accessibility, easy to implement. Cons: high capacity overhead, up to 300% in 4-way replication. 20

Replication in Scale-Out Cloud Storage: Average case in a cloud storage system: several tens (up to a hundred) of raw PBytes of capacity, multiple tens of user PBytes of capacity. Does not scale well with regard to cost or resilience: with only 3 replicas it is not always possible to tolerate multi-node and site failures. 21

Erasure Coding: A generalization of parity-based fault tolerance (RAID schemes); replication is a special case. Out of n fragments of information, m are actual data and k are additional codes (n = m + k); k missing fragments of data can be tolerated. The code is referred to as an m/n code. 22
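As a toy illustration of the m/n idea (not the production codes mentioned later), the sketch below uses m = 3 data fragments and a single XOR parity fragment (k = 1, n = 4) and rebuilds one missing fragment from the survivors:

from functools import reduce

def xor_bytes(blocks):
    """XOR together a list of equally sized byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"AAAA", b"BBBB", b"CCCC"]          # m = 3 data fragments
parity = xor_bytes(data)                    # k = 1 code fragment, n = m + k = 4

# Lose one data fragment; rebuild it from the surviving n - 1 fragments.
lost_index = 1
survivors = [f for i, f in enumerate(data) if i != lost_index] + [parity]
rebuilt = xor_bytes(survivors)
assert rebuilt == data[lost_index]          # k = 1 missing fragment tolerated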

Erasure Coding: Capacity overhead is k/n; for the same protection, the overhead reduces as n increases. Complexity (computational and management) increases as n increases. As network delays dominate performance, erasure coding becomes a feasible approach. Trade-off between protection, complexity, and overhead. Common EMC Atmos codes are 9/12 and 10/16. 23
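A small worked comparison of the k/n overhead for the codes named on this slide versus plain replication (replication expressed as a degenerate 1/n code); the helper function is illustrative:

def overhead(m: int, n: int) -> float:
    """Fraction of raw capacity spent on redundancy for an m/n code (k/n)."""
    return (n - m) / n

# Codes mentioned on the slide, plus replication as a degenerate code.
schemes = {
    "9/12 erasure code":  (9, 12),   # tolerates 3 lost fragments
    "10/16 erasure code": (10, 16),  # tolerates 6 lost fragments
    "3-way replication":  (1, 3),    # tolerates 2 lost copies
    "4-way replication":  (1, 4),    # tolerates 3 lost copies
}

for name, (m, n) in schemes.items():
    print(f"{name}: tolerates {n - m} failures, "
          f"redundancy = {overhead(m, n):.1%} of raw capacity")
# 9/12 -> 25.0%, 10/16 -> 37.5%, 3-way -> 66.7%, 4-way -> 75.0%

Expressed relative to the user data instead (k/m), 4-way replication is the 300% overhead quoted on the earlier slide, while a 9/12 code is only about 33%.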

EC vs. Other redundancy schemes 24

Erasure Coding at Scale: Data fragments are distributed based on the system's fault domains; placement of these fragments is crucial. Round-robin placement ensures uniform distribution of fragments (assumed in the previous calculations). Placement of data fragments depends on user requirements with regard to performance and priorities. 25
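A minimal sketch of the round-robin placement assumed here, striping the n fragments across sites first and then across nodes within each site; the fault-domain sizes in the example are illustrative:

from itertools import cycle

def round_robin_place(n_fragments: int, sites: int, nodes_per_site: int):
    """Assign fragment i to (site, node) by striping across sites first,
    then across nodes within each site, so fragments spread as evenly as possible."""
    placement = []
    node_cursor = [0] * sites                       # next node to use in each site
    for frag, site in zip(range(n_fragments), cycle(range(sites))):
        node = node_cursor[site] % nodes_per_site
        node_cursor[site] += 1
        placement.append((frag, site, node))
    return placement

# 12 fragments of a 9/12 code across 4 sites with 6 nodes each:
for frag, site, node in round_robin_place(12, sites=4, nodes_per_site=6):
    print(f"fragment {frag:2d} -> site {site}, node {node}")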

EC Data Placement in the Cloud: We develop a model to see the dependencies between EC fragment placement and system size/architecture. It determines the tolerance toward site failures as a function of the number of sites and the m/n erasure code parameters, plus the additional node failure tolerance.

EC Data Placement in the Cloud: Assumptions: homogeneous, geographically distributed sites; equal number of nodes and disks; equal network delays between any pair of sites; equal data priority; round-robin distribution of the fragments across sites / nodes / disks; failures on disks / nodes / sites (power, network).
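Under these assumptions a simplified version of such a model can be written down directly: with round-robin striping, no site holds more than ceil(n / sites) fragments and no node more than its share of those, so the tolerated combinations of site and node failures follow from the budget of k = n - m losable fragments. This is a sketch of one reading of the model, not the authors' exact formulation; the (x, y) output mirrors the tuples on the next two slides.

import math

def site_and_node_tolerance(m: int, n: int, sites: int, nodes_per_site: int):
    """For an m/n code striped round-robin over `sites` homogeneous sites with
    `nodes_per_site` nodes each, return (x, y) tuples: x tolerated site failures
    plus y additional node failures, assuming worst-case placement."""
    k = n - m
    per_site = math.ceil(n / sites)                      # most fragments any one site holds
    per_node = math.ceil(per_site / nodes_per_site)      # most fragments any one node holds
    tolerances = []
    for x in range(sites):
        budget = k - x * per_site                        # fragments we may still lose
        if budget < 0:
            break
        tolerances.append((x, budget // per_node))       # extra node failures that fit
    return tolerances

# Example: 9/12 code over 4 sites with 6 nodes each (as in the four-site slide).
print(site_and_node_tolerance(9, 12, sites=4, nodes_per_site=6))

For a 9/12 code over 4 sites of 6 nodes, the sketch reports (0, 3) and (1, 0): either three node failures with all sites up, or one full site failure with no additional slack.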

Failure Tolerance in a 2-Site System: In a two-site system there is only one-site failure tolerance. Each site has 6 nodes available. The numbers inside each (x, y) tuple are the number of nodes tolerated in addition to the sites tolerated.

Failure Tolerance in a 4-Site System: In a four-site system there are one-, two-, and three-site failure tolerances. Each site has 6 nodes available. The numbers inside each (x, y) tuple are the number of nodes tolerated in addition to the sites tolerated.

Heterogeneous Protection Policies: As systems evolve, their resources become heterogeneous: different node or site sizes, different network bandwidth, different data priority and location of origin. In such a case, uniformity of data distribution is not a requirement; those factors (including performance) should determine data fragment placement. 30

Abstraction of Heterogeneous Cloud Storage: Group components based on affinity criteria, e.g., network bandwidth. Create homogeneous sub-clusters. Determine the redundancy for each sub-cluster and handle each sub-cluster independently. Combine the outcome for system-wide placement.

Abstraction of Heterogeneous Cloud Storage - Example: Two sites are close (e.g., on the same US coast) with a fast network connection, so data can be placed on any of the nodes in both sites and retrieving it will not suffer extra network delay. If a 6/12 redundancy scheme is used and the data's primary location is the upper two-site sub-cluster, then the 6 data fragments can be placed in its two sites and the 6 codes in the other, remote sites. Accessing the data is not affected by network bandwidth, and one site failure is tolerated.
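A sketch of the placement this example describes: group sites into affinity sub-clusters, keep the data fragments of the 6/12 code in the tenant's fast local sub-cluster, and push the code fragments to the remote sub-cluster. Site names and structure are illustrative.

def place_heterogeneous(m: int, n: int, local_sites, remote_sites):
    """Place the m data fragments round-robin over the local (fast) sub-cluster
    and the k = n - m code fragments round-robin over the remote sub-cluster."""
    placement = {}
    for i in range(m):                                   # data stays close to the tenant
        placement[f"data-{i}"] = local_sites[i % len(local_sites)]
    for j in range(n - m):                               # codes go to the remote sub-cluster
        placement[f"code-{j}"] = remote_sites[j % len(remote_sites)]
    return placement

# 6/12 scheme: two nearby sites on the same coast, two remote sites.
layout = place_heterogeneous(6, 12,
                             local_sites=["site-east-1", "site-east-2"],
                             remote_sites=["site-west-1", "site-west-2"])
for fragment, site in sorted(layout.items()):
    print(fragment, "->", site)

Reads are served from the 6 local data fragments, and since k = 6, losing either local site (3 fragments) or either remote site is survivable, matching the one-site failure tolerance stated above.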

Differentiate Protection via Policy: Flexible policy settings for grouping resources and isolating applications/tenants; easily managing a large heterogeneous system. Hybrid protection schemes that combine multiple replication schemes, e.g., a two-replica policy where the first replica is the original data (stored in the site closest to the tenant) and the second replica is a 9/12 EC scheme that distributes the data across the rest of the sites for resilience. 33
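Such a hybrid two-replica policy could be expressed declaratively; the sketch below is illustrative only and is not actual Atmos policy syntax:

# Illustrative policy description: one full copy at the tenant's closest site
# for fast access, plus a 9/12 erasure-coded replica spread over the remaining
# sites for resilience.
hybrid_policy = {
    "tenant": "example-tenant",
    "replicas": [
        {
            "type": "full-copy",
            "placement": "closest-site",      # serves reads with minimal network delay
            "mode": "sync",
        },
        {
            "type": "erasure-code",
            "code": {"m": 9, "n": 12},        # tolerates 3 lost fragments
            "placement": "remaining-sites",   # round-robin across the other sites
            "mode": "async",
        },
    ],
}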

Protection Policies in the Field: [Table of tenant configurations deployed in the field; for each tenant it lists the number of regular replicas (2 or >= 3), the number of EC replicas (1 or >= 2), whether regular and EC replication are mixed, the EC codes used (e.g., 9/3, 10/2, 10/6), and whether replication is sync or async.] 34

Proactive Failure Detection: Monitoring the health of devices and services, logging events, and taking corrective measures before failures happen. This strengthens resilience: issues are addressed without the redundancy being affected by a failure. Example: use SMART logs to determine the health of drives and replace HDDs that are about to fail rather than those that have already failed. 35
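A minimal sketch of the idea: flag drives whose SMART counters suggest impending failure so they can be replaced proactively. The attribute names are standard SMART attributes, but the thresholds and the input format are illustrative assumptions; a real deployment would feed this from the drives' SMART logs (e.g., via a tool such as smartctl) and from trained failure-prediction models rather than fixed cut-offs.

# Illustrative thresholds only; real predictive models combine many attributes
# and their trends over time.
WATCH_ATTRIBUTES = {
    "Reallocated_Sector_Ct": 10,
    "Current_Pending_Sector": 1,
    "Offline_Uncorrectable": 1,
}

def drives_to_replace(smart_readings: dict) -> list:
    """smart_readings maps drive name -> {attribute: raw value} collected from
    each drive's SMART log; returns drives to replace before they fail."""
    flagged = []
    for drive, attrs in smart_readings.items():
        if any(attrs.get(name, 0) >= limit for name, limit in WATCH_ATTRIBUTES.items()):
            flagged.append(drive)
    return flagged

readings = {
    "node3/disk07": {"Reallocated_Sector_Ct": 27, "Current_Pending_Sector": 4},
    "node3/disk08": {"Reallocated_Sector_Ct": 0},
}
print(drives_to_replace(readings))   # ['node3/disk07']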

Proactive Failure Detection: Verify in the background the validity of data and services and the health of hardware. This is a critical aspect of resiliency in the cloud: systems are large and some portions may be idle for extended periods of time, so failures and issues may go undetected. It ensures timely failure detection and improves resilience for a given amount of redundancy. 36
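A minimal sketch of such a background scrub pass: recompute checksums of stored fragments and compare them with the digests recorded at write time, so silent corruption can be repaired from redundancy before a user read hits it. The on-disk layout and the digest store are assumptions.

import hashlib
import os

def scrub(data_dir: str, expected_digests: dict) -> list:
    """Walk stored fragments, recompute SHA-256, and report any mismatch so the
    redundancy layer can rebuild the affected fragment."""
    corrupted = []
    for name, expected in expected_digests.items():
        path = os.path.join(data_dir, name)
        try:
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
        except OSError:                       # missing or unreadable fragment
            corrupted.append(name)
            continue
        if actual != expected:
            corrupted.append(name)
    return corrupted

Such a pass would typically run periodically at low I/O priority so it does not compete with foreground traffic.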

Conclusions: Resilience at scale = reliability + availability + consistency. There is a wide range of large-scale failures. Redundancy aids resiliency at scale, and erasure coding scales resiliency efficiently. Proactive measures ensure resiliency at scale. 37