Using Global Behavior Modeling to improve QoS in Cloud Data Storage Services

Similar documents
Towards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures

Storage-Based Convergence Between HPC and Big Data

BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP

Ambry: LinkedIn s Scalable Geo- Distributed Object Store

Towards a Self-Adaptive Data Management System for Cloud Environments

High Performance Computing on MapReduce Programming Framework

EMC Business Continuity for Microsoft Applications

High Throughput Data-Compression for Cloud Storage

Summary optimized CRUSH algorithm more than 10% read performance improvement Design and Implementation: 1. Problem Identification 2.

GFS: The Google File System

KerData: Scalable Data Management on Clouds and Beyond

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897

Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and

Deployment Planning and Optimization for Big Data & Cloud Storage Systems

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Decentralized Distributed Storage System for Big Data

Dynamics 365. for Finance and Operations, Enterprise edition (onpremises) system requirements

Predictable Time-Sharing for DryadLINQ Cluster. Sang-Min Park and Marty Humphrey Dept. of Computer Science University of Virginia

CA485 Ray Walshe Google File System

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

Algorithms and Data Structures for Efficient Free Space Reclamation in WAFL

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

The Lion of storage systems

SoftNAS Cloud Performance Evaluation on Microsoft Azure

CSE 124: Networked Services Lecture-16

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete

Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads

SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Social Environment

Global Journal of Engineering Science and Research Management

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

BlobSeer: Next Generation Data Management for Large Scale Infrastructures

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

A Fast and High Throughput SQL Query System for Big Data

Pocket: Elastic Ephemeral Storage for Serverless Analytics

Correlation based File Prefetching Approach for Hadoop

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee

Certified Reference Design for VMware Cloud Providers

Power-Aware Throughput Control for Database Management Systems

Analyzing Electroencephalograms Using Cloud Computing Techniques

Ceph vs Swift Performance Evaluation on a Small Cluster. edupert monthly call Jul 24, 2014

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

Distributed computing: index building and use

Massive Online Analysis - Storm,Spark

CloudBATCH: A Batch Job Queuing System on Clouds with Hadoop and HBase. Chen Zhang Hans De Sterck University of Waterloo

Cloud Computing Concepts, Models, and Terminology

PACM: A Prediction-based Auto-adaptive Compression Model for HDFS. Ruijian Wang, Chao Wang, Li Zha

CSE 124: Networked Services Lecture-17

PLANEAMENTO E GESTÃO DE REDES INFORMÁTICAS COMPUTER NETWORKS PLANNING AND MANAGEMENT

SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment

CSE 124: Networked Services Fall 2009 Lecture-19

HPC in Cloud. Presenter: Naresh K. Sehgal Contributors: Billy Cox, John M. Acken, Sohum Sohoni

2012 Enterprise Strategy Group. Enterprise Strategy Group Getting to the bigger truth. TM

New research on Key Technologies of unstructured data cloud storage

Zevenet EE 4.x. Performance Benchmark.

VVD for Cloud Providers: Scale and Performance Guidelines. October 2018

Flash-Conscious Cache Population for Enterprise Database Workloads

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc

Filesystem Performance on FreeBSD

BUILDING A SCALABLE MOBILE GAME BACKEND IN ELIXIR. Petri Kero CTO / Ministry of Games

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

Cloudian Sizing and Architecture Guidelines

Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud

EMC Integrated Infrastructure for VMware. Business Continuity

A Non-Relational Storage Analysis

Centralized versus distributed schedulers for multiple bag-of-task applications

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

An Overview of Fujitsu s Lustre Based File System

BlueGene/L. Computer Science, University of Warwick. Source: IBM

Configuring and Managing Virtual Storage

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8)

Centralized versus distributed schedulers for multiple bag-of-task applications

scc: Cluster Storage Provisioning Informed by Application Characteristics and SLAs

SoftNAS Cloud Performance Evaluation on AWS

Data Intensive Scalable Computing

Maintaining End-to-End Service Levels for VMware Virtual Machines Using VMware DRS and EMC Navisphere QoS

TensorFlow: A System for Learning-Scale Machine Learning. Google Brain

SurFS Product Description

High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han

RIGHTNOW A C E

Massive Data Processing on the Acxiom Cluster Testbed

Lenovo - Excelero NVMesh Reference Architecture

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins

Distributed Filesystem

Chapter 5. The MapReduce Programming Model and Implementation

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Storage Optimization with Oracle Database 11g

FLAT DATACENTER STORAGE CHANDNI MODI (FN8692)

IBM Spectrum NAS. Easy-to-manage software-defined file storage for the enterprise. Overview. Highlights

Certification Document GRAFENTHAL GmbH R2208 S2 01/04/2016. GRAFENTHAL GmbH R2208 S2 Storage system

Users and utilization of CERIT-SC infrastructure

Construct a sharable GPU farm for Data Scientist. Layne Peng DellEMC OCTO TRIGr

Introduction to Grid Computing

ibench: Quantifying Interference in Datacenter Applications

Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2

CS60021: Scalable Data Mining. Sourangshu Bhattacharya

The Fusion Distributed File System

MOHA: Many-Task Computing Framework on Hadoop

Transcription:

2 nd IEEE International Conference on Cloud Computing Technology and Science Using Global Behavior Modeling to improve QoS in Cloud Data Storage Services Jesús Montes, Bogdan Nicolae, Gabriel Antoniu, Alberto Sánchez, María S. Pérez December 2010 Indianapolis (IN), USA

Introduction Cloud Computing Computation as utility New posibilities for the large public, such as complex data processing Dryad MapReduce Jesus Montes - jmontes@cesvima.fi.upm.es 2

MapReduce Platform for large-scale, massively parallel data processing Application that can be deployed in the Cloud Critical component: Underlying storage service Performance: High aggregated throughput under heavy access concurrency Quality of Service: Stable throughput for each individual access Jesus Montes - jmontes@cesvima.fi.upm.es 3

BlobSeer Data management for large, unstructured data Very large data (TB) BLOBs: Binary large objects Highly concurrent, fine-grain access (MB): R/W/A Key desing choices: Decentralized data and metadata management Multiversioning exposed to the user Lock-free concurrent writes (enabled by versioning) Jesus Montes - jmontes@cesvima.fi.upm.es 4

BlobSeer architecture Providers Provider manager Metadata providers Version manager Clients Jesus Montes - jmontes@cesvima.fi.upm.es 5

Storage QoS in MapReduce Resource failures Complex data access patterns Complexity of both hardware and software resources Jesus Montes - jmontes@cesvima.fi.upm.es 6

Proposal Automate the process of identifying and characterizing events that have an impact on the storage service QoS In-depth monitoring Application-side feedback Behavior pattern analysis Jesus Montes - jmontes@cesvima.fi.upm.es 7

GloBeM Global Behavior Modeling A Specific methodology to analyze and construct a model of the global behavior of a large-scale distribited system Behavior model characteristics: Finite state machine State characterization based on monitoring parameters Extended statistical information Jesus Montes - jmontes@cesvima.fi.upm.es 8

Behavior modeling cycle Jesus Montes - jmontes@cesvima.fi.upm.es 9

Proposal 1.Monitor the storage service 2.Identify behavior patterns 3.Classify behavior patterns according to feedback 4.Predict and prevent undesired behavior patterns Jesus Montes - jmontes@cesvima.fi.upm.es 10

Improving BloobSeer's QoS Jesus Montes - jmontes@cesvima.fi.upm.es 11

Experimental setup Scenario: MapReduce data gathering and analysis Read+Write access pattern (10:1 ratio) I/O+Computation time (1:7 ratio) Platform: Grid'5000 resoruces x86_64 CPUs, 2GB RAM, 1Gbit/s standard Ethernet Two clusters (130 and 275 nodes) Failure injection framework Multi-state resource avaliability characterization (Rood and Lewis, 2007) Jesus Montes - jmontes@cesvima.fi.upm.es 12

Experimental settings Setting A (Grid 5000 Lille cluster) 130 nodes Total of 11TB accessed (1.3TB written, rest read) Shared resources for computation and storage Setting B (Grid 5000 Orsay cluster) 275 nodes Total of 17TB accessed (1.5TB written, rest read) Separated resources for computation and storage Jesus Montes - jmontes@cesvima.fi.upm.es 13

Experimental procedure 1.Running the original BlobSeer instance 2.Performing global behavior modeling 3.Improving BlobSeer 4.Running the improved BlobSeer instance Jesus Montes - jmontes@cesvima.fi.upm.es 14

Setting A GloBeM identified 4 states State State 1 24.2 State 2 20.1 State 3 31.5 State 4 23.9 Average read BW (MB/s) Jesus Montes - jmontes@cesvima.fi.upm.es 15

Setting A parameter State 1 State 2 State 3 State 4 Avg. Read ops. 68.9 121.2 60.0 98.7 Read ops stdev. 10.5 15.8 9.9 16.7 Avg. Write ops. 43.2 38.4 45.3 38.5 Write ops stdev. 4.9 4.7 5.2 7.4 Free space stdev. 3.1e7 82.1e7 84.6e7 89.4e7 Nr. of providers 107.0 102.7 96.4 97.2 Jesus Montes - jmontes@cesvima.fi.upm.es 16

Setting B GloBeM identified 3 states State State 1 50.7 State 2 35.0 State 3 47.0 Average read BW (MB/s) Jesus Montes - jmontes@cesvima.fi.upm.es 17

Setting B parameter State 1 State 2 State 3 Avg. Read ops. 98.6 202.3 125.5 Read ops stdev. 17.7 21.6 21.9 Avg. Write ops. 35.2 27.5 33.1 Write ops stdev. 4.5 3.9 4.5 Free space stdev. 17.2e6 13.0e6 15.5e6 Nr. of providers 129.2 126.2 122.0 Jesus Montes - jmontes@cesvima.fi.upm.es 18

Analysis When a node is dispatching many read operations, a node failure causes many read faults Read faults force the client to look for another available replica, reducing the final effective read BW Jesus Montes - jmontes@cesvima.fi.upm.es 19

Optimizing BlobSeer When, in State 2, selecting a node for writing, reduce priority of those with excessive read operations Reduces the pressure on overwhelmed nodes Improves load-balance and stabilizes read bandwidth, increasing QoS Jesus Montes - jmontes@cesvima.fi.upm.es 20

Results: Setting A Initial configuration read BW avg = 24.9 MB/s, stdev = 9.6 Advanced strategy read BW avg = 27.5 MB/s, stdev = 7.3 Jesus Montes - jmontes@cesvima.fi.upm.es 21

Results: Setting B Initial configuration read BW avg = 44.7 MB/s, stdev = 10.5 Advanced strategy read BW avg = 44.7 MB/s, stdev = 8.4 Jesus Montes - jmontes@cesvima.fi.upm.es 22

Conclusions Cloud storage backend has to ensure stable throughput: QoS constraint Our proposal combines component monitoring, application side feedback and global behavior modeling to infer useful knowledge about the storage service We have obtained significative improvement in data access QoS Jesus Montes - jmontes@cesvima.fi.upm.es 23

2 nd IEEE International Conference on Cloud Computing Technology and Science Thank You! Using Global Behavior Modeling to improve QoS in Cloud Data Storage Services Jesús Montes, Bogdan Nicolae, Gabriel Antoniu, Alberto Sánchez, María S. Pérez Jesus Montes - jmontes@cesvima.fi.upm.es 24

Outline Introduction Proposal Experimental Setup and Results Conclusions Jesus Montes - jmontes@cesvima.fi.upm.es 25

BlobSeer A back-end for high-level, sophisticated data management systems Higly scalable distributed file systems Storage for cloud services Extremely large distributed databases Jesus Montes - jmontes@cesvima.fi.upm.es 26