SRCMap: Energy Proportional Storage using Dynamic Consolidation


SRCMap: Energy Proportional Storage using Dynamic Consolidation By: Akshat Verma, Ricardo Koller, Luis Useche, Raju Rangaswami Presented by: James Larkby-Lahet

Motivation
- storage consumes 10-25% of datacenter power (a higher fraction at lower loads)
- storage virtualization is already happening (in part driven by OS virtualization)
- can we create energy proportionality in virtualized storage systems the same way OS virtualization did for compute?
- workload variability exists, but data migration is far more expensive than VM migration

Virtualized Storage
- a set of logical volumes provided across a SAN (Fibre Channel, iSCSI) to compute nodes
- here, storage is backed by a set of physical volumes, e.g. RAID arrays
- a virtualization server maps logical to physical at some granularity (volumes: low metadata and performance overhead; blocks: efficient space utilization)
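The granularity tradeoff above can be sketched as two lookup tables (an illustrative sketch, not SRCMap's actual data structures; all names are invented):

```python
# Volume-granularity map: one entry per logical volume -> low metadata and
# lookup overhead, but a whole physical volume is committed per logical one.
volume_map = {"vdisk0": "mdisk2", "vdisk1": "mdisk0"}

def resolve_volume(vdisk, block):
    # every block of the vdisk lands on the same mdisk, at the same offset
    return volume_map[vdisk], block

# Block-granularity map: one entry per block -> efficient space utilization,
# but metadata grows with the number of blocks.
block_map = {("vdisk0", 0): ("mdisk2", 17), ("vdisk0", 1): ("mdisk1", 4)}

def resolve_block(vdisk, block):
    return block_map[(vdisk, block)]
```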

Overall Design
- claims: the working set is small and stable, while workload intensity varies within and across volumes
- pre-replicate each working set to other volumes and offload writes
- the virtualization layer redirects accesses away from spun-down disks
- with N volumes, create close to N power levels

Design Goals
- fine-grained energy proportionality: many power levels
- low space overhead: 0.25x is reasonable since replication is coarse-grained and disks are cheap, but Nx is not
- reliability: disks tolerate only a limited number of on-off duty cycles
- workload shift adaptation: must maintain proportionality while adapting
- heterogeneity: multi-vendor and multi-generation datacenters

Existing Solutions
- singly redundant schemes: can power off one (parity) disk
- geared RAID: skewed striping of replicated data
- caching: disk or SSD cache of popular data; popular data concentration (PDC)
- write offloading: cache writes somewhere persistent

Workload Characteristics: 1
- the active dataset is typically a small fraction of the total data

Workload Characteristics: 2
- there is significant variability in workload intensity

Workload Characteristics: 3
- data usage is skewed toward popular and recent data

Workload Characteristics: 4
- read-idle time is dominated by small durations
- write-offloading and spin-down alone won't save much power, and will significantly increase the number of duty cycles, reducing reliability

SRCMap
- goal: power only a subset of disks (volumes); migration is expensive
- assign each logical volume (vdisk) to a physical volume (mdisk)
- store working sets of other vdisks in the free space of each mdisk
- power the minimum subset of mdisks that serves all vdisks at acceptable performance

SRCMap Illustrated

Rationale
- multiple replica targets: at peak load the primary mdisk is active; at lower loads, multiple replica targets are needed to provide fine-grained energy proportionality
- sampling: replicating entire vdisks is impractical; working sets are much smaller and reasonable to replicate
- ordered replica placement: space is still scarce and not all replicas are equal; prefer to replicate idle and small working sets

Rationale, continued
- dynamic vdisk-to-mdisk mapping: workload varies and some vdisks are more highly replicated, so the mapping must be decided online
- dual data synchronization: update a replica on read miss to adapt to workload shifts; lazily and incrementally sync non-active replicas on active mdisks
- coarse-grained power cycling: a consolidation interval (on the order of hours) during which the active mdisks don't change (except on replica misses)

SRCMap Overview

Replica Placement Algorithm
- better vdisks to replicate: smaller working set, stable working set (lower replica miss rate), small average load, hosted on a less power-efficient volume
- Ordering Property: if vdisk Vi is more likely than Vj to require a replica during Active Disk Selection, then Vi is more likely than Vj to find a replica among the active mdisks

Replica Placement Algorithm, continued
- order vdisks by cost-benefit tradeoff
- create a bipartite graph that reflects this ordering
- iteratively create one source-to-target mapping at a time, respecting the ordering
- recalibrate edge weights after each iteration to preserve the Ordering Property

Initial vdisk Ordering
- Pi = probability that vdisk i's primary mdisk is spun down
- w = tunable weights
- WS = size of the working set
- PPR = ratio of peak I/O bandwidth to peak power
- ρ = average IOPS
- m = number of read misses in the working set
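One way to see how these factors interact is a cost-benefit score (a hedged sketch: the weights and the exact combination here are illustrative, not the paper's formula). Smaller and more stable working sets, lighter load, and a less power-efficient host volume all make a vdisk a better replication candidate:

```python
# Illustrative score; higher = better candidate for replication.
# ws_size: working-set size, miss_rate: read misses in the working set,
# avg_iops: average load (rho), ppr: peak-bandwidth-to-peak-power ratio.
def replication_score(ws_size, miss_rate, avg_iops, ppr,
                      w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    benefit = w4 / ppr          # a less power-efficient volume gains more
    cost = (w1 * ws_size        # space needed to replicate
            + w2 * miss_rate    # unstable working set -> replica misses
            + w3 * avg_iops)    # load the replicas must absorb
    return benefit / cost
```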

Bipartite Matching
- maps vdisks to mdisks, with edge weight as the cost-benefit of replication
- vdisks are sorted by inverse P and aligned with their primary mdisks
- a match is made by allocating a replica slot on the topmost mdisk to the vdisk with the highest edge weight
- the selected vdisk's weights are then multiplied by the probability of the target mdisk being active, and the next iteration begins
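The iterative matching loop might look roughly like this (an illustrative sketch with invented names; the paper's actual edge weights and recalibration are more involved):

```python
# weights[(v, m)]: cost-benefit of placing a replica of vdisk v on mdisk m.
# p_active[m]: probability that mdisk m is powered on.
def place_replicas(vdisks, mdisks, weights, p_active, slots_per_mdisk):
    placement = {v: [] for v in vdisks}
    free = {m: slots_per_mdisk for m in mdisks}
    while any(free.values()):
        # pick the (vdisk, mdisk) edge with the highest remaining weight
        candidates = [(w, v, m) for (v, m), w in weights.items()
                      if free[m] > 0 and m not in placement[v]]
        if not candidates:
            break
        _, v, m = max(candidates)
        placement[v].append(m)       # allocate one replica slot
        free[m] -= 1
        # damp the chosen vdisk's remaining edges by the chance its new
        # replica host is active, so less-covered vdisks win next round
        for (v2, m2) in weights:
            if v2 == v:
                weights[(v2, m2)] *= p_active[m]
    return placement
```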

Active Disk Selection Goal

Active Disk Selection
- runs every consolidation interval, or whenever performance degrades
1) estimate each vdisk's load as its load from the prior interval
2) determine the minimum number of mdisks needed to meet the aggregate load, and select the mdisks with the smallest P to be active
3) for any vdisk whose primary mdisk is not selected, find a replica with spare bandwidth on the active mdisks
4) if no replica can be found, increase the number of active mdisks and repeat step 3
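The four steps can be sketched as follows (illustrative names and units; P appears here as `p_down`, the probability that an mdisk's primary workload lets it spin down):

```python
# load[v]: vdisk v's estimated IOPS (step 1, from the prior interval).
# cap[m]: mdisk m's IOPS capacity. primary[v]: v's home mdisk.
# replicas[v]: mdisks holding a replica of v's working set.
def select_active(load, cap, p_down, primary, replicas):
    # step 2: rank mdisks, most-needed (lowest spin-down probability) first
    ranked = sorted(cap, key=lambda m: p_down[m])
    need = sum(load.values())
    k = 0
    while k < len(ranked):
        k += 1
        active = set(ranked[:k])
        if sum(cap[m] for m in active) < need:
            continue  # not enough aggregate bandwidth yet
        # steps 3-4: route every vdisk to its primary or a replica with
        # spare bandwidth; if any vdisk is stranded, grow the active set
        spare = {m: cap[m] for m in active}
        routing, ok = {}, True
        for v in sorted(load, key=load.get, reverse=True):
            targets = [primary[v]] + replicas.get(v, [])
            home = next((m for m in targets
                         if m in active and spare[m] >= load[v]), None)
            if home is None:
                ok = False
                break
            routing[v] = home
            spare[home] -= load[v]
        if ok:
            return active, routing
    return set(ranked), {}  # fall back to powering everything
```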

Optimizations
- sub-volumes: sub-divide vdisks for easier replica packing
- replica scratch space: for write buffering and missed reads

Evaluation
- testbed: system with 8 SATA channels, single disks acting as mdisks; a Watts up? power meter monitoring disk power
- simulator seeded with testbed values for longer-running traces
- workloads: block traces of volume requests for a webserver, home directories, svn, and wikis

Prototype - Peak 8 Hours
- L0: 35.5% average power reduction; L3: 56.6%
- 0.0003% of requests suffer read-miss spin-up delays

Prototype - Peak 8 Hours

Simulation - Competitors

Aggressive Load Consolidation

Sensitivity to Free Space

Energy Proportionality

Overhead
- per-block map: current active replica, version, write redirect
- metadata size = Volumes * Size * % replica space * 13 / 4k
- 10 10TB volumes with 10% over-provisioning: 3.2GB metadata
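That arithmetic can be checked directly (a sketch; 13 bytes of per-block state per 4 KB block of replica space is the figure from the slide, and the example sizes below are chosen to show how the total scales):

```python
# Per-block metadata cost for the replica map (13 bytes per 4 KB block of
# replica space; sizes and fractions below are illustrative).
def metadata_gib(n_volumes, volume_bytes, replica_fraction,
                 block_bytes=4096, entry_bytes=13):
    replica_bytes = n_volumes * volume_bytes * replica_fraction
    return replica_bytes / block_bytes * entry_bytes / 2**30
```

Ten 1 TB volumes at 10% replica space come to about 3.25 GiB of map state, matching the quoted 3.2 GB figure; ten 10 TB volumes at the same fraction would need roughly 32.5 GiB.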

Conclusion
- it is feasible to build a dynamically consolidated, energy-proportional storage system
- meets the goals of fine-grained proportionality, low space overhead, reliability, workload adaptation, and heterogeneity support
- TODO: better synchronization I/O scheduling