Gecko: Contention-Oblivious Disk Arrays for Cloud Storage


Gecko: Contention-Oblivious Disk Arrays for Cloud Storage. Ji-Yong Shin, Cornell University. In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell). FAST 2013.

Cloud and Virtualization: what happens to storage? Multiple guest VMs share the same disks through the VMM, so their streams interleave at the device. [Chart: throughput (MB/s) vs. number of sequential writers on a 4-disk RAID-0, sequential vs. random access under VM+EXT4.] I/O contention causes throughput collapse.

Existing solutions for I/O contention? I/O scheduling (reordering I/Os): entails increased latency for certain workloads and may still require seeking. Workload placement (positioning workloads to minimize contention): requires prior knowledge or dynamic prediction, predictions may be inaccurate, and it limits the freedom of placing VMs in the cloud.

Log-structured file system to the rescue? [Rosenblum et al. 91] Log all writes to the tail of a single log spanning the shared disk's address space (addresses 0 to N). [Diagram: writes appended at the log tail on the shared disk.]
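
A minimal sketch (not from the talk) of the log-structured write path this slide summarizes: every logical write becomes a sequential append at the log tail, and an in-memory map records where the latest copy of each logical address lives. Names and sizes here are illustrative.

```python
# Minimal log-structured write path: all writes become sequential appends.
class LogStructuredDev:
    def __init__(self, num_blocks):
        self.log = []               # physical log: (logical_addr, data), appended in order
        self.map = {}               # logical address -> index of its latest copy in the log
        self.num_blocks = num_blocks  # logical address space 0..N, as on the slide

    def write(self, logical_addr, data):
        self.log.append((logical_addr, data))     # sequential append at the tail
        self.map[logical_addr] = len(self.log) - 1

    def read(self, logical_addr):
        idx = self.map.get(logical_addr)
        return None if idx is None else self.log[idx][1]

dev = LogStructuredDev(num_blocks=8)
dev.write(3, b"A")
dev.write(3, b"B")          # overwrite: a new version is appended, the old one becomes garbage
assert dev.read(3) == b"B"
```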

Challenges of the log-structured file system: garbage collection is the Achilles' heel of LFS [Seltzer et al. 93, 95; Matthews et al. 97]. Garbage collection (GC) reads from the log head while writes go to the log tail on the same shared disks, so GC and reads still cause disk seeks. [Diagram: log head, log tail, GC reads, and application reads contending on the shared disks.]

Challenges of the log-structured file system: garbage collection is the Achilles' heel of LFS. In a 2-disk RAID-0 setting of LFS under a write-only synthetic workload, throughput falls by 10X during GC. [Chart: throughput (MB/s) over time for RAID-0 + LFS, showing GC, aggregate, and application throughput against the max sequential throughput.]

Problem: increased virtualization leads to increased disk seeks and kills performance. RAID and LFS do not solve the problem.

Rest of the talk: motivation; Gecko, contention-oblivious disk storage (sources of I/O contention, new technique: chained logging, implementation); evaluation; conclusion.

What causes disk seeks? Write-write, read-read, and write-read contention between VMs. [Diagram: interleaved writes and reads from VM 1 and VM 2.]

What causes disk seeks? Write-write, read-read, and write-read contention between VMs; with logging, also write-GC-read and read-GC-read contention. [Diagram: writes, reads, and GC reads contending on the same disk.]

Principle: a single sequentially accessed disk is better than multiple randomly seeking disks.

Gecko's chained logging design: the log tail is separated from the body, and garbage collection proceeds from the log head to the tail. This eliminates write-write and write-GC-read contention and reduces write-read contention: GC reads do not interrupt the sequential write at the tail, and one uncontended drive is faster than N contended drives. [Diagram: physical address space chained across Disk 0, Disk 1, and Disk 2, with the log tail on its own disk.]

Gecko's chained logging design: smarter compact-in-body garbage collection. Instead of always moving live data from the log head to the tail, GC can compact it from the head into the body. [Diagram: physical address space across Disk 0, Disk 1, and Disk 2, contrasting GC from the log head to the tail with GC from the head into the body.]
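
A rough sketch of the chained-logging idea from the two slides above, using a toy geometry and assumed names (not the paper's code): the physical address space is a chain of disks, new writes land only at the tail, and GC reclaims blocks at the head, so GC reads never seek the tail disk.

```python
# Toy model of chained logging; geometry and names are illustrative.
BLOCKS_PER_DISK = 4   # no wrap-around handling, for brevity

class ChainedLog:
    def __init__(self, num_disks):
        self.disks = [[None] * BLOCKS_PER_DISK for _ in range(num_disks)]
        self.head = 0          # oldest physical block, where GC starts
        self.tail = 0          # next physical block to write, on the tail disk
        self.fwd = {}          # logical address -> physical address of the live copy

    def _slot(self, phys):
        return divmod(phys, BLOCKS_PER_DISK)   # (disk index, offset on that disk)

    def write(self, logical, data):
        d, off = self._slot(self.tail)
        self.disks[d][off] = (logical, data)   # strictly sequential tail write
        self.fwd[logical] = self.tail
        self.tail += 1

    def read(self, logical):
        d, off = self._slot(self.fwd[logical])
        return self.disks[d][off][1]

    def gc_head(self):
        """Move-to-tail GC: relocate the head block only if it is still live.
        Compact-in-body GC would instead rewrite it into free space inside the
        body disks, sparing tail-write bandwidth; omitted here for brevity."""
        d, off = self._slot(self.head)
        entry = self.disks[d][off]
        if entry is not None and self.fwd.get(entry[0]) == self.head:
            self.write(*entry)                 # copy live data to the tail
        self.disks[d][off] = None              # head block reclaimed
        self.head += 1

chain = ChainedLog(num_disks=3)
chain.write(7, b"x")
chain.write(7, b"y")   # the copy at the head is now garbage
chain.gc_head()        # GC touches only the head disk; the tail keeps writing sequentially
assert chain.read(7) == b"y"
```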

Gecko caching: what happens to reads going to the tail drive? A tail cache (LRU; RAM for hot data plus an MLC SSD for warm data) absorbs them, reducing write-read contention; a flash body cache reduces read-read contention on the body disks. [Diagram: tail cache and body cache in front of Disk 0, Disk 1, and Disk 2.]
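
A hedged sketch of the tail-cache idea on this slide: a small RAM LRU in front of a larger flash tier absorbs reads aimed at recently written tail data, so those reads do not seek the tail disk. Capacities and the exact eviction policy are assumptions for illustration, not the paper's precise design.

```python
from collections import OrderedDict

# Two-level tail cache sketch: RAM LRU for hot data, flash for warm data
# demoted from RAM; the disk is touched only on a miss in both tiers.
class TailCache:
    def __init__(self, ram_blocks, ssd_blocks):
        self.ram = OrderedDict()   # logical addr -> data, in LRU order
        self.ssd = OrderedDict()
        self.ram_blocks = ram_blocks
        self.ssd_blocks = ssd_blocks

    def insert(self, addr, data):
        """Called for every block appended at the log tail."""
        self.ram[addr] = data
        self.ram.move_to_end(addr)
        if len(self.ram) > self.ram_blocks:
            old_addr, old_data = self.ram.popitem(last=False)  # evict LRU from RAM...
            self.ssd[old_addr] = old_data                      # ...demote to flash
            if len(self.ssd) > self.ssd_blocks:
                self.ssd.popitem(last=False)

    def lookup(self, addr):
        """Returns cached data, or None meaning the read must go to the disk."""
        if addr in self.ram:
            self.ram.move_to_end(addr)
            return self.ram[addr]
        if addr in self.ssd:
            return self.ssd[addr]
        return None

cache = TailCache(ram_blocks=2, ssd_blocks=4)
for a in range(4):
    cache.insert(a, f"block{a}".encode())
assert cache.lookup(3) is not None   # hot: served from RAM
assert cache.lookup(0) is not None   # warm: served from the flash tier
```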

Gecko properties summary: no write-write contention, no GC-write contention, and reduced read-write and read-read contention. Mirroring/striping across the chain adds fault tolerance and read performance, plus power saving without consistency concerns. [Diagram: chain with tail cache and body cache, with each chain position mirrored/striped across Disk 0, Disk 1, and Disk 2.]

Gecko implementation: a logical-to-physical primary map kept in memory, with 4-byte entries per 4 KB page, takes less than 8 GB of RAM for 8 TB of storage; a physical-to-logical inverse map kept in flash (written out every 1024 writes) takes 8 GB of flash for 8 TB of storage. Data lives on disk in 4 KB pages across Disk 0, Disk 1, and Disk 2. [Diagram: in-memory primary map, on-disk data with head and tail pointers and empty/filled regions, and on-flash inverse map.]
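
The slide's sizing works out as a short back-of-the-envelope calculation using only the numbers stated on it (4 KB pages, 4-byte entries, 8 TB of storage, inverse map flushed every 1024 writes).

```python
# Back-of-the-envelope sizing for Gecko's maps, using the slide's numbers.
PAGE = 4 * 1024            # 4 KB pages
ENTRY = 4                  # 4-byte map entries
CAPACITY = 8 * 2**40       # 8 TB of storage

pages = CAPACITY // PAGE   # 2^31 pages
primary_map = pages * ENTRY   # in-memory logical-to-physical map
inverse_map = pages * ENTRY   # on-flash physical-to-logical map

print(primary_map / 2**30)    # 8.0 -> roughly 8 GB of RAM (a bit under 8 GB with decimal TB/GB)
print(inverse_map / 2**30)    # 8.0 -> 8 GB of flash
# The inverse map is persisted every 1024 writes, i.e. once per 4 MB of data written.
```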

Evaluation questions: 1. How well does Gecko handle GC? 2. What is the performance of Gecko under real workloads? 3. What is the effect of varying Gecko chain length? 4. How effective is the tail cache? 5. How durable is the flash-based tail cache?

Evaluation setup. In-kernel version: implemented as a block device for portability, similar to software RAID. User-level emulator: for fast prototyping, runs block traces, with tail cache support. Hardware: WD 600 GB HDDs (2.5-inch, 10K RPM, SATA-600; 512 GB of each 600 GB drive used) and Intel MLC (multi-level cell) SSDs (240 GB, SATA-600).

How well does Gecko handle GC? 2-disk setting, write-only synthetic workload. [Charts: throughput (MB/s) over time for Gecko and for Log + RAID-0, each showing GC, aggregate, and application throughput against the max sequential throughput of one disk (Gecko) and two disks (Log + RAID-0).] Gecko's aggregate throughput equals the max throughput and always remains high, while Log + RAID-0's throughput collapses during GC; Gecko delivers 3X higher aggregate and 4X higher application throughput.

How well does Gecko handle GC? [Charts: Gecko throughput (MB/s) over time, showing GC, aggregate, and application throughput, and a comparison of no GC, move-to-tail GC, and compact-in-body GC.] Application throughput can be preserved using the smarter compact-in-body GC.

MS Enterprise and MSR Cambridge Traces [Narayanan et al. 08, 09]

Trace name     Est. addr space (GB)  Data accessed (GB)  Data read (GB)  Data written (GB)  Total I/O reqs  Read reqs    Write reqs
prxy           136                   2,076               1,297           779                181,157,932     110,797,984  70,359,948
src1           820                   3,107               2,224           884                85,069,608      65,172,645   19,896,963
proj           4,102                 2,279               1,937           342                65,841,031      55,809,869   10,031,162
Exchange       4,822                 760                 300             460                61,179,165      26,324,163   34,855,002
usr            2,461                 2,625               2,530           96                 58,091,915      50,906,183   7,185,732
LiveMapsBE     6,737                 2,344               1,786           558                44,766,484      35,310,420   9,456,064
MSNFS          1,424                 303                 201             102                29,345,085      19,729,611   9,615,474
DevDivRelease  4,620                 428                 252             176                18,195,701      12,326,432   5,869,269
prn            770                   271                 194             77                 16,819,297      9,066,281    7,753,016

What is the performance of Gecko under real workloads? Mix of 8 workloads (prn, MSNFS, DevDivRelease, proj, Exchange, LiveMapsBE, prxy, and src1) on a 6-disk configuration with 200 GB of data prefilled. Gecko: mirrored chain of length 3, tail cache (2 GB RAM + 32 GB SSD), body cache (32 GB SSD). Log + RAID-10: mirrored log over a 3-disk RAID-0, LRU cache (2 GB RAM + 64 GB SSD). [Charts: read and write throughput (MB/s) over time for Gecko and for Log + RAID-10.] Gecko showed less read-write contention and a higher cache hit rate; Gecko's throughput is 2X-3X higher.

What is the effect of varying Gecko chain length? Same 8 workloads with 200 GB of data prefilled. [Chart: read and write throughput (MB/s) for varying chain lengths vs. RAID-0 and a single uncontended disk; error bars show standard deviation.] Separating reads and writes yields better performance.

How effective is the tail cache? Read hit rate of the tail cache (2 GB RAM + 32 GB SSD) on a 512 GB disk, over 21 combinations of 4 to 8 MSR Cambridge and MS Enterprise traces. [Charts: hit rate (%) split between RAM and SSD per application mix, and hit rate vs. amount of data in the disk (100-500 GB).] The read hit rate is at least 86%, with RAM handling most of the hot data; the amount of data changes the hit rate, but it still averages above 80%. The tail cache can effectively resolve read-write contention.

How durable is the flash-based tail cache? Static analysis of SSD lifetime based on cache hit rate. [Chart: SSD lifetime (days) per application mix, compared against the lifetime at a constant 40 MB/s write rate.] Using 2 GB of RAM in front of the SSD extends its lifetime by 2X-8X.
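
A hedged back-of-the-envelope version of the static lifetime analysis mentioned on this slide: the SSD's lifetime scales with its rated write endurance divided by the write traffic that actually reaches it, so writes absorbed by the RAM tier extend lifetime proportionally. The endurance figure and the absorbed fraction below are assumed values for illustration, not numbers from the paper.

```python
# Hedged sketch of a static SSD-lifetime estimate driven by the RAM tier's hit rate.
# Endurance and absorbed-fraction values are assumptions for illustration only.
def ssd_lifetime_days(ssd_bytes, pe_cycles, write_rate_bps):
    """Days until the rated write endurance is exhausted at a constant write rate."""
    total_endurance_bytes = ssd_bytes * pe_cycles
    return total_endurance_bytes / write_rate_bps / 86400

SSD = 32 * 10**9   # 32 GB tail-cache SSD (from the evaluation setup)
PE = 3000          # assumed MLC program/erase cycles
RAW = 40 * 10**6   # the slide's reference write rate of 40 MB/s

baseline = ssd_lifetime_days(SSD, PE, RAW)
# If the 2 GB RAM tier absorbs, say, 75% of the traffic, only 25% reaches the SSD:
with_ram = ssd_lifetime_days(SSD, PE, RAW * 0.25)

print(round(baseline), round(with_ram), round(with_ram / baseline, 1))
# -> a 4x extension under these assumed values, within the slide's reported 2X-8X range.
```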

Conclusion. Gecko enables fast storage in the cloud: it scales with increasing virtualization and number of cores, and it is oblivious to I/O contention. Gecko's technical contribution: it separates the log tail from its body, separates reads and writes, and uses a tail cache to absorb reads going to the tail. A single sequentially accessed disk is better than multiple randomly seeking disks.

Questions?