Transparent Throughput Elasticity for IaaS Cloud Storage Using Guest-Side Block-Level Caching

Transparent Throughput Elasticity for IaaS Cloud Storage Using Guest-Side Block-Level Caching
Bogdan Nicolae (IBM Research, Ireland), Pierre Riteau (University of Chicago, USA), Kate Keahey (Argonne National Laboratory, USA)
Presented by Pierre Riteau, UCC 2014

IaaS storage via virtual disks
How does it work?
- Attach/detach a virtual disk to a VM instance using a REST API
- The guest OS sees the virtual disk as a regular block device
- A higher-level abstraction (typically a file system) is used on top of the block device
Cost model
- Users are charged a fixed cost per GB and time unit
- The fixed cost depends on throughput (e.g. an SSD-backed disk is more expensive than an HDD-backed one)
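As an illustration of this attach/detach workflow, the sketch below uses the AWS EC2 API through boto3; the instance ID, region, and volume types are placeholders, and any other IaaS provider API could be substituted.

```python
# Minimal sketch of attaching/detaching a virtual disk through a cloud REST API
# (here AWS EC2 via boto3); IDs and the region are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Provision a virtual disk; the cost per GB and time unit depends on the volume
# type (e.g. "gp2" SSD-backed vs. "st1" HDD-backed).
volume = ec2.create_volume(AvailabilityZone="us-east-1a", Size=50, VolumeType="gp2")

# Attach it to a running VM instance; the guest OS then sees a new block device
# (e.g. /dev/xvdf) that can be formatted with a file system. In practice one
# would wait for the volume to become 'available' before attaching it.
ec2.attach_volume(VolumeId=volume["VolumeId"],
                  InstanceId="i-0123456789abcdef0",   # hypothetical instance ID
                  Device="/dev/sdf")

# Later, detach and delete the disk to stop paying for it.
ec2.detach_volume(VolumeId=volume["VolumeId"])
ec2.delete_volume(VolumeId=volume["VolumeId"])
```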

Elasticity: a key feature of IaaS
The ability of a system to change its capacity in direct response to the workload demand.
Unawareness of elasticity typically leads to overprovisioning (bold dotted line in the figure) and an explosion of costs compared to the average utilization (thin dotted line).

Why is storage elasticity important?
HPC workloads are moving to the cloud:
- They need a large number of VMs
- They store huge amounts of data
At large scale it matters even for short runtimes:
- Overprovisioning costs accumulate and explode
- Space (IPDPS 2014) and throughput requirements fluctuate rapidly

Checkpointing: a common pattern
- HPC applications interleave computationally intensive phases with I/O-intensive phases
- Most common scenario: checkpointing
  - All processes dump in-memory data simultaneously to capture a globally consistent snapshot
  - Happens every N iterations
- Why checkpointing?
  - Fault tolerance: checkpoint-restart
  - Debugging
  - Migration
(Slide figures: simulation of black hole formation; checkpoint-restart)
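To make the pattern concrete, here is a minimal, hypothetical sketch of a compute loop that dumps its in-memory state every N iterations; it is only an illustration of the pattern, not CM1's actual checkpointing code, and in a real MPI application every rank would write its own snapshot at the same logical step.

```python
# Hypothetical sketch of the checkpointing pattern: compute-intensive iterations
# interleaved with periodic dumps of in-memory state (the I/O-intensive phase).
import pickle

CHECKPOINT_EVERY = 100   # "N" in the slide: checkpoint every N iterations

def compute_step(state):
    # Placeholder for the computationally intensive phase.
    state["iteration"] += 1
    return state

def checkpoint(state, rank=0):
    # In an MPI code, every rank writes its snapshot at the same step so that
    # the set of files forms a globally consistent snapshot.
    with open(f"checkpoint_rank{rank}_it{state['iteration']}.pkl", "wb") as f:
        pickle.dump(state, f)

state = {"iteration": 0, "data": [0.0] * 1_000_000}
for it in range(1, 1001):
    state = compute_step(state)
    if it % CHECKPOINT_EVERY == 0:
        checkpoint(state)
```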

Problem summary
- Many applications interleave I/O-intensive periods with compute-intensive periods
- Scientific applications use a large number of instances and virtual disks
- Virtual disks have fixed I/O throughput
- Traditional approach: overprovision with faster virtual disks to get the best performance during I/O peaks
- The disks are underused outside the peaks, which leads to high storage costs

Solution
A block device with elastic disk throughput:
- High performance during I/O peaks
- Minimized storage costs
- A large, slow virtual disk for primary storage
- Throughput boosted with small, short-lived, fast virtual disks used as a caching layer

Three key design principles
1. Transparent block-level caching
2. Adaptive flushing of dirty blocks
3. Access-pattern-aware cache sizing

1. Transparent block-level caching
- Elastic storage is exposed as a single block device
- Transparent to applications
- Transparent from the cloud provider's perspective (guest-side only)
- Monitor I/O throughput utilization
- When utilization rises above the utilization threshold (UT):
  - Provision a new, faster caching device
  - The I/O throughput boost lasts until utilization drops below UT
  - Cached data is then flushed to the backing device
  - The caching device is removed
(A sketch of this control loop follows below.)
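The following Python sketch illustrates one way such a guest-side control loop could look; the three callbacks are hypothetical placeholders for IaaS and cache-management actions, not the authors' implementation.

```python
# Hypothetical sketch of the guest-side control loop behind transparent
# block-level caching: the callbacks are placeholders for IaaS/bcache actions.
import time

UT = 0.8            # utilization threshold (fraction of backing-device throughput)
POLL_INTERVAL = 5   # seconds between utilization samples

def control_loop(read_io_utilization, provision_cache_disk, flush_and_remove_cache):
    """read_io_utilization() -> float in [0, 1]  (e.g. from /proc/diskstats deltas)
    provision_cache_disk()   -> attach a fast virtual disk and register it as cache
    flush_and_remove_cache() -> flush dirty blocks, detach cache, release the disk"""
    cache_active = False
    while True:
        util = read_io_utilization()
        if not cache_active and util > UT:
            # I/O peak detected: boost throughput with a fast caching device.
            provision_cache_disk()
            cache_active = True
        elif cache_active and util < UT:
            # Peak is over: flush cached data back and drop the expensive disk.
            flush_and_remove_cache()
            cache_active = False
        time.sleep(POLL_INTERVAL)
```

In the actual design, the removal side is more careful: it waits for the inactivity delay and goes through the two flushing phases described on the next slide before the cache is detached.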

2. Adaptive flushing of dirty blocks
- Constant data movement from the caching to the backing device can hurt performance
- Read/write cache with a custom dirty-block commit strategy (dynamic writeback)
- First phase: writeback, with priority given to application I/O
  - Asynchronous flush if bandwidth is available or the cache is full
- Phase change after a time ID (inactivity delay) spent under the utilization threshold
- Second phase: priority given to flushing the cache
  - Writes go straight to the backing device
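Since the prototype builds on bcache (see the architecture slide), one plausible way to express the two phases is through bcache's standard sysfs attributes; the sketch below is an assumption about how this could be wired, not the authors' controller, and the inactivity delay and threshold values are made up.

```python
# Hypothetical sketch of the two-phase dynamic writeback policy, expressed with
# standard bcache sysfs attributes (cache_mode, writeback_percent).
import time

BCACHE_DIR = "/sys/block/bcache0/bcache"
ID_SECONDS = 30          # assumed inactivity delay before switching phases
UT = 0.8                 # assumed utilization threshold

def write_sysfs(attr, value):
    with open(f"{BCACHE_DIR}/{attr}", "w") as f:
        f.write(str(value))

def phase_one():
    # Phase 1: writeback mode, application I/O has priority; dirty blocks are
    # flushed asynchronously in the background when bandwidth is available.
    write_sysfs("cache_mode", "writeback")
    write_sysfs("writeback_percent", 10)   # keep some dirty data, throttle flushing

def phase_two():
    # Phase 2: priority to flushing the cache; new writes reach the backing
    # device directly (writethrough) while the remaining dirty blocks drain.
    write_sysfs("cache_mode", "writethrough")
    write_sysfs("writeback_percent", 0)

def below_threshold_for(seconds, read_io_utilization, poll=1):
    """Return True once utilization has stayed under UT for `seconds`."""
    elapsed = 0
    while elapsed < seconds:
        if read_io_utilization() >= UT:
            return False
        time.sleep(poll)
        elapsed += poll
    return True
```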

3. Access-pattern-aware cache sizing
- The size of the caching device must be chosen carefully:
  - Cache too large: long flush delay
  - Cache too small: forced flushes
- Optimize the caching device size according to the access patterns
  - Relies on the repetitive behavior of scientific applications
- Start with a large cache and monitor its utilization
- Decrease or increase the size as required
(A simple sizing heuristic is sketched below.)
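One simple heuristic consistent with this description is sketched below as an assumption (the headroom factor and size bounds are hypothetical): start with a large cache, observe how much of it each I/O phase actually uses, and converge the provisioned size toward that peak.

```python
# Hypothetical sketch of access-pattern-aware cache sizing: start with a large
# cache, observe the peak amount of data cached in each I/O phase, and resize
# the next cache toward that peak plus some headroom.
HEADROOM = 1.2        # assumed safety margin over the observed peak usage
MIN_SIZE_GB = 1
MAX_SIZE_GB = 10      # initial (large) cache size

def next_cache_size_gb(current_size_gb, peak_used_gb):
    """Return the cache size to provision for the next I/O phase."""
    target = peak_used_gb * HEADROOM
    if peak_used_gb >= current_size_gb:
        # Cache filled up (risk of forced flushes): grow for the next phase.
        return min(MAX_SIZE_GB, current_size_gb * HEADROOM)
    if target < current_size_gb:
        # Cache was too large: shrink to avoid long flush delays.
        return max(MIN_SIZE_GB, target)
    return current_size_gb
```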

Architecture
- Based on bcache
- A controller monitors I/O throughput and selects the best size for the caching device
(Architecture diagram: inside the VM instance, the application and file system sit on top of an adaptive block device that combines a backing device and a caching device; the controller reads I/O stats, adds/removes the cache, and attaches/detaches virtual disks through the IaaS API, which provisions fast storage from IaaS storage backed by HDD and SSD.)
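For context on the bcache layer, the sketch below shows how a caching device could be registered and detached at runtime using bcache's standard user-space interface (make-bcache and sysfs); the device names are hypothetical placeholders and this is not the paper's controller code.

```python
# Hypothetical sketch of managing the bcache layer from a guest-side controller:
# /dev/vdb is the slow backing disk and /dev/vdc a freshly attached fast virtual
# disk (device names are placeholders; requires root and bcache-tools).
import subprocess

def sysfs_write(path, value):
    with open(path, "w") as f:
        f.write(value)

def setup_backing_device():
    # Format the slow virtual disk as a bcache backing device; the application
    # and file system then use /dev/bcache0 as a regular block device.
    subprocess.run(["make-bcache", "-B", "/dev/vdb"], check=True)
    sysfs_write("/sys/fs/bcache/register", "/dev/vdb")

def add_cache(cache_set_uuid):
    # Format and register the fast virtual disk as a cache, then attach it to
    # the running bcache device (transparent to the application).
    subprocess.run(["make-bcache", "-C", "/dev/vdc"], check=True)
    sysfs_write("/sys/fs/bcache/register", "/dev/vdc")
    sysfs_write("/sys/block/bcache0/bcache/attach", cache_set_uuid)

def remove_cache():
    # Detach the cache; bcache flushes the remaining dirty blocks to the
    # backing device before the fast virtual disk can be released via the
    # IaaS API.
    sysfs_write("/sys/block/bcache0/bcache/detach", "1")
```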

Experimental evaluation platform
- Cloud environment simulated with QEMU/KVM 1.6 as the hypervisor
- 32 physical nodes on the Shamrock testbed
- VM guest operating system: recent Debian Sid with Linux 3.12
- VM network interfaces:
  - VirtIO driver
  - Bridged to the host physical interface
  - Enables direct L2 communication between VMs

Experimental evaluation
- Static pre-allocation with a slow virtual disk (static-slow):
  - Large, fixed-size virtual disk stored as a RAW file
  - Fixed maximum I/O bandwidth
- Static pre-allocation with a fast virtual disk (static-fast):
  - Same as above but backed by a RAM disk
  - Fixed (but higher) maximum I/O bandwidth; simulates an SSD
- Throughput elasticity with our approach (adaptive):
  - Backing device as in static-slow
  - Caching device as in static-fast

Metrics
- Performance overhead vs. static-fast:
  - Micro-benchmarks: sustained I/O throughput
  - Real-life applications: completion time
- Total adjusted storage accumulation:
  - Sum of the storage cost over all VMs
- Evolution of I/O activity:
  - For understanding the behavior of the system
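As one possible interpretation of the cost metric (the exact weighting is an assumption, not taken from the slides), adjusted storage accumulation can be computed by summing, over all VMs and over time, the provisioned capacity weighted by the relative cost of its device type:

```python
# Hypothetical computation of "adjusted storage accumulation": provisioned GB,
# weighted by the relative cost of the device type, accumulated over time and
# summed across all VMs. The cost weights below are illustrative assumptions.
COST_WEIGHT = {"hdd": 1.0, "ssd": 4.0}   # assumed relative cost per GB per time unit

def adjusted_storage_accumulation(provisioning_log):
    """provisioning_log: list of (vm_id, device_type, size_gb, duration_s) tuples
    describing every virtual disk that was attached and for how long."""
    return sum(COST_WEIGHT[device_type] * size_gb * duration_s
               for _vm_id, device_type, size_gb, duration_s in provisioning_log)
```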

Case study: CM1
- Real-life HPC MPI application
- Alternates between a compute phase and an I/O-intensive phase (results and checkpoints)
- 32 VMs, one per physical node:
  - 11 cores per VM (one for the guest OS and 10 for CM1); one core kept for the host OS
  - 18 GB of RAM (enough for 10 MPI processes)
- Disks:
  - Backing device: 50 GB
  - static-slow: 33 MB/s
  - static-fast: 128 MB/s, with an initial size of 10 GB

CM1 I/O throughput
(Figure: total I/O activity in MB/s over time in seconds, for the adaptive, static-fast, and static-slow configurations.)

CM1 performance

Approach     | Completion Time | Overhead
static-slow  | 1471 s          | 23%
static-fast  | 1190 s          | -
adaptive     | 1231 s          | 3.3%

CM1 storage costs
(Figure: adjusted storage accumulation in AGBs over time in seconds, for the adaptive, static-fast, and static-slow configurations.)
Compared with static-fast, our adaptive approach amounts to a reduction of 65% in cost for a performance overhead of 3.3%. Compared with static-slow, our adaptive approach amounts to a reduction of 7% in cost while being 16% faster.

Conclusions
- Our transparent solution achieves high performance with large cost reductions
- Cost reduction of 65% to 70% versus overprovisioning, with 1% to 3.3% overhead
- Lower cost than underprovisioning too: 16% faster and 7% less expensive
- Caching is no longer limited by physical hardware: it can adapt to the application's demands

Questions?