Toward SLO Complying SSDs Through OPS Isolation

Toward SLO Complying SSDs Through OPS Isolation. October 23, 2015. Hongik University. UNIST (Ulsan National Institute of Science & Technology). Sam H. Noh 1

Outline Part 1: FAST 2015 Part 2: Beyond FAST 2

Outline Part 1: FAST 2015 Part 2: Beyond FAST 3

Flash Memory Everywhere: from embedded to server storage; server storage is the target environment here.

Introduction: infrastructure convergence and virtualized servers. 5

Motivation. A virtualization system needs to satisfy a Service Level Objective (SLO) for each VM, and SLOs are provided through hardware resource isolation. Existing solutions isolate CPU and memory: the Distributed Resource Scheduler [VMware Inc.] and memory resource management in VMware ESX Server [SIGOPS OSR 2002]. 6

Motivation. There are few studies on SSD resource isolation (S-CAVE [PACT '13], vCacheShare [USENIX ATC '14]). Isolating SSDs is challenging: performance is quite sensitive to workload characteristics, and the architecture is more complex than an HDD's. [Figure: four VMs on a hypervisor sharing CPU, memory, and SSD; SSD isolation? Any studies?] 7

Questions Raised. Is the I/O bandwidth of the shared SSD distributed among the VMs in proportion to their weights? How does the state of the SSD affect proportionality? Is everybody happy with the given weight? [Figure: VM1-VM4 assigned Bronze/Silver/Gold/Platinum weights; SSD state: clean SSD vs. aged SSD]

Experiments on a Commercial SSD. Linux Kernel-based Virtual Machine (KVM) with 4 VMs; each VM is assigned a proportional I/O weight using the Cgroups feature of the Linux 3.13.x kernel. VM-x: x is the I/O weight value (a higher value allocates higher throughput). The SSD is the shared storage: 128GB capacity, SATA3 interface, MLC flash; a clean SSD is an empty SSD, an aged SSD is a full SSD (busy performing garbage collection). Each VM runs the same workload concurrently: Financial, MSN, and Exchange. [Figure: VM-1, VM-2, VM-5, VM-10 (Bronze/Silver/Gold/Platinum I/O weights) on the KVM hypervisor sharing CPU, memory, and the SSD in two states: clean and aged]
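
As a concrete illustration of this setup, the proportional weights could be applied with the cgroup-v1 blkio controller. A minimal sketch, assuming the cgroup-v1 mount point, placeholder VM process IDs, and a 100:200:500:1000 scaling of the talk's 1:2:5:10 ratio (blkio.weight accepts values in the 10-1000 range):

    import os

    # Assumed mapping of the talk's VM-1/VM-2/VM-5/VM-10 weights onto blkio's
    # valid 10-1000 range, preserving the 1:2:5:10 ratio.
    VM_WEIGHTS = {"vm-1": 100, "vm-2": 200, "vm-5": 500, "vm-10": 1000}
    CGROUP_ROOT = "/sys/fs/cgroup/blkio"          # assumed cgroup-v1 mount point

    def set_io_weight(vm_name: str, weight: int, pid: int) -> None:
        """Create a blkio cgroup for one VM and assign its proportional I/O weight."""
        group = os.path.join(CGROUP_ROOT, vm_name)
        os.makedirs(group, exist_ok=True)
        with open(os.path.join(group, "blkio.weight"), "w") as f:
            f.write(str(weight))                  # proportional share of block I/O
        with open(os.path.join(group, "cgroup.procs"), "w") as f:
            f.write(str(pid))                     # attach the VM's QEMU/KVM process

    if __name__ == "__main__":
        # Placeholder PIDs; in practice they come from the hypervisor/libvirt.
        for (vm, weight), pid in zip(VM_WEIGHTS.items(), (1101, 1102, 1103, 1104)):
            set_io_weight(vm, weight, pid)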

Results (Financial workload). For the HDD, proportionality is close to the I/O weight; not so for the SSD, and it is worse for the aged SSD. Why? [Figure: proportionality of I/O bandwidth relative to VM-1 for VM-1, VM-2, VM-5, VM-10; series: Ideal, HDD, SSD clean, SSD aged]

Monitor Internal Workings of SSD. Commercial SSDs are proprietary black boxes, so the internals are monitored with a simulator: the DiskSim SSD Extension. Workloads are Financial, MSN, and Exchange; traces are captured with blktrace as the VMs (VM-F, VM-M, VM-E) run concurrently under KVM on a real system, and are then fed to the SSD simulator as trace input.
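
A rough sketch of the trace path described here, turning textual blkparse output into a five-column ASCII trace such as DiskSim consumes; the blkparse field positions and the DiskSim flag encoding (bit 0 = read) are assumptions about those tools, not details given in the talk:

    import sys

    def blkparse_to_disksim(lines, devno=0):
        """Convert blkparse 'D' (issued-to-device) events into lines of the form
        '<time-ms> <devno> <blkno> <bcount> <flags>' for replay in a simulator."""
        for line in lines:
            f = line.split()
            if len(f) < 10 or f[5] != "D":        # skip non-issue events and headers
                continue
            time_ms = float(f[3]) * 1000.0        # blkparse timestamps are in seconds
            sector, nblocks = int(f[7]), int(f[9])
            flags = 1 if "R" in f[6] else 0       # RWBS field: 'R' = read, 'W' = write
            yield f"{time_ms:.6f} {devno} {sector} {nblocks} {flags}"

    if __name__ == "__main__":
        for record in blkparse_to_disksim(sys.stdin):
            print(record)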

Analysis #1: Mixture of Data. Within a block (the GC unit) there is a mixture of data from all VMs: pages written by VM-F (Financial), VM-M (MSN), and VM-E (Exchange) are mixed into the same block. The Over-Provisioned Space (OPS) is space reserved for write requests and used for garbage collection (GC). [Figure: data layout of a conventional SSD, with blocks holding a mix of VM-F, VM-M, and VM-E data, invalid data, and free pages] 12

Analysis #2: Interference Among VMs During GC. GC moves data, and the live pages copied out of a victim block belong to workloads other than the one invoking GC: 1) a victim block is selected for GC; 2) its live pages are moved to the OPS. [Figure: data layout of a conventional SSD; the Over-Provisioned Space (OPS) is space reserved for write requests and used for garbage collection (GC)]

Analysis #3: Work Induced by Other VMs. From one VM's viewpoint, GC performs unnecessary work induced by the other workloads: while executing the VM-F (Financial) workload, only 30% of the pages moved during GC are its own. [Figure: number of pages moved for each workload (Financial, MSN, Exchange) during GC, broken down by owner (Financial, MSN, Exchange); y-axis up to 5.0E+5 pages]
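
Analyses #1 to #3 can be illustrated with a toy page-level model: every VM appends into the same open block, so GC triggered by one VM ends up copying live pages owned by the others. The geometry, update ratio, and victim policy below are assumptions for illustration, not the DiskSim configuration used in the talk.

    import random
    from collections import Counter

    PAGES_PER_BLOCK = 128                  # assumed geometry: 512KB block / 4KB page
    VMS = ["VM-F", "VM-M", "VM-E"]

    def simulate(n_writes=200_000, update_ratio=0.9, max_blocks=64, seed=1):
        """Toy conventional SSD: shared open block, random overwrites, greedy GC.
        Returns how many live pages GC copied, broken down by owning VM."""
        rng = random.Random(seed)
        open_block, blocks = [], []        # a block is a list of [owner, live] pages
        pages_of = {vm: [] for vm in VMS}  # every page ever written by each VM
        copied = Counter()

        for _ in range(n_writes):
            vm = rng.choice(VMS)
            if pages_of[vm] and rng.random() < update_ratio:
                rng.choice(pages_of[vm])[1] = False   # overwrite invalidates an old page
            page = [vm, True]
            open_block.append(page)
            pages_of[vm].append(page)
            if len(open_block) == PAGES_PER_BLOCK:    # block full: seal it
                blocks.append(open_block)
                open_block = []
                if len(blocks) > max_blocks:          # out of free blocks: run GC
                    victim = min(blocks, key=lambda b: sum(live for _, live in b))
                    for owner, live in victim:
                        if live:
                            copied[owner] += 1        # the invoking VM moves this page
                    blocks.remove(victim)
        return copied

    if __name__ == "__main__":
        # Pages moved during GC belong to all three VMs, mirroring Analysis #3.
        print(simulate())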

More Closely. GC leads to an interference problem among VMs: a GC operation employed by one VM is burdened with other VMs' pages ("Why do I have to clean others?"). [Figure: GC operation in a conventional SSD; GC employed by VM-F moves garbage belonging to VM-F, VM-M, and VM-E from the data area into the OPS area] 15

Avoiding Interference. The cost of GC is the major factor in SSD I/O performance, so each VM should pay only for its own GC operations. [Figure: per-VM GC over the data area, with separate VM-F OPS, VM-M OPS, and VM-E OPS] 16

Proposed Scheme: OPS Isolation. Dedicate flash memory blocks, including OPS, to each VM separately when allocating pages to VMs, preventing interference during GC. [Figure: conventional SSD vs. SSD with OPS isolation; with OPS isolation, each of VM-F (Financial), VM-M (MSN), and VM-E (Exchange) gets its own blocks and its own OPS, so no block mixes pages from different VMs] 17
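
A sketch of what OPS isolation changes in the FTL's allocator: one open block and one OPS pool per VM, keyed by the VM ID tag delivered over the host interface, so no block ever mixes pages from different VMs. The class below and its Block helper are illustrative assumptions, not the paper's implementation.

    class OpsIsolatedAllocator:
        """Per-VM block allocation: pages of different VMs never share a block,
        so GC run on behalf of one VM only relocates that VM's own pages."""

        def __init__(self, vm_ids, free_blocks):
            self.free = list(free_blocks)                  # pool of erased blocks
            self.open_block = {vm: None for vm in vm_ids}  # one open block per VM
            self.vm_blocks = {vm: [] for vm in vm_ids}     # blocks owned by each VM
            self.ops_quota = {vm: 0 for vm in vm_ids}      # per-VM OPS size, in blocks

        def allocate_page(self, vm_id):
            """Return a free page taken from a block dedicated to vm_id."""
            blk = self.open_block[vm_id]
            if blk is None or blk.is_full():               # Block API is assumed
                blk = self.free.pop()                      # fresh block for this VM only
                self.open_block[vm_id] = blk
                self.vm_blocks[vm_id].append(blk)
            return blk.next_free_page()

        def gc(self, vm_id):
            """Victim selection and page copying are confined to vm_blocks[vm_id]
            and that VM's OPS quota; other VMs' data is never touched."""
            ...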

VM OPS Allocation. How much OPS should each VM get to satisfy its SLO? [Figure: the SSD's data space is fixed in size, while the per-VM OPS regions (VM-F OPS, VM-M OPS, VM-E OPS) are flexible in size; OPS size per VM?] 18

IOPS of SSD. SSD IOPS = 1 / (t_gc + t_prog + t_xfer). Here t_prog (time to program a page) and t_xfer (time to transfer a page) are constant values, while t_gc (time to GC) is variable and is the crucial factor for IOPS: it depends on the utilization u of the victim block at GC, which in turn is determined by the OPS size. 19
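
A worked example of this expression, using the flash timing parameters listed on the later evaluation slide (60us page read, 800us page write, 102us transfer). The GC-cost model t_gc = u/(1-u) * (t_read + t_prog), tied to the victim block's utilization u, is a common first-order approximation and an assumption here; the paper's exact derivation may differ, so treat the numbers as illustrative only.

    # Write IOPS as a function of victim-block utilization u (lower u = more OPS).
    T_READ, T_PROG, T_XFER = 60e-6, 800e-6, 102e-6   # seconds, from the evaluation slide

    def iops(u: float) -> float:
        t_gc = (u / (1.0 - u)) * (T_READ + T_PROG)   # assumed first-order GC cost model
        return 1.0 / (t_gc + T_PROG + T_XFER)

    for u in (0.5, 0.7, 0.9):
        # More OPS -> lower utilization of victim blocks -> cheaper GC -> higher IOPS.
        print(f"u = {u:.1f}: {iops(u):7.0f} IOPS")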

How to Meet the SLO (IOPS) of Each VM? Dynamically adjust the OPS. Going from SSD state #1 to state #2, the data space stays the same while VM1's OPS is enlarged and VM3's OPS is shrunk, so in state #2: IOPS of VM1 = previous IOPS + Δ, IOPS of VM2 = previous IOPS, IOPS of VM3 = previous IOPS - Δ.
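
A minimal sketch of the per-epoch adjustment this slide depicts: shift OPS blocks from the VM furthest above its IOPS target to the VM furthest below it. The step size, thresholds, and decision rule are assumptions; the paper sizes OPS from the IOPS model above rather than from this ad-hoc controller.

    def rebalance_ops(measured_iops, target_iops, ops_blocks, step=8, min_ops=4):
        """One adjustment epoch over dicts keyed by VM: enlarge the OPS of the VM
        that misses its SLO by the most, shrinking the OPS of the VM that exceeds
        its SLO by the most (cf. state #1 -> state #2 above)."""
        deficit = {vm: target_iops[vm] - measured_iops[vm] for vm in ops_blocks}
        needy = max(deficit, key=deficit.get)     # furthest below its SLO
        donor = min(deficit, key=deficit.get)     # furthest above its SLO
        if deficit[needy] <= 0 or ops_blocks[donor] - step < min_ops:
            return ops_blocks                     # all SLOs met, or donor OPS too small
        ops_blocks[donor] -= step                 # shrink the over-performer's OPS
        ops_blocks[needy] += step                 # enlarge the under-performer's OPS
        return ops_blocks

    # Example: VM1 misses its target, VM3 exceeds it, VM2 is on target.
    print(rebalance_ops({"VM1": 800, "VM2": 500, "VM3": 700},
                        {"VM1": 1000, "VM2": 500, "VM3": 400},
                        {"VM1": 16, "VM2": 16, "VM3": 32}))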

Evaluation of OPS Isolation. Evaluation environment: the DiskSim SSD Extension simulator with a page-mapped FTL, a greedy GC policy, an aged-state SSD, and OPS of 5%; the host interface informs the SSD of VM ID tags. Workloads: Financial, MSN, and Exchange traces captured as the VMs run concurrently on a real system. Simulator parameters: page size 4KB, block size 512KB, page read 60us, page write 800us, block erase 1.5ms, transfer latency (per page) 102us. 21

Results. The x-axis shows groups of VMs executed concurrently (weights allotted to VM-F (Financial), VM-M (MSN), and VM-E (Exchange): 1:5:10, 10:2:1, and 1:2:5); the y-axis shows proportionality of I/O bandwidth relative to the smallest weight. The SLO is satisfied (somewhat) by OPS isolation. [Figure: proportionality of I/O bandwidth versus the ideal for each group of VMs] 22

Conclusion of Part I. Performance SLOs cannot be satisfied with current commercial SSDs because of garbage collection interference between VMs. We propose OPS isolation: allocate blocks to each VM in isolation via OPS allocation, do not allow pages of different VMs to mix in the same block, and dynamically adjust the size of each VM's OPS. OPS isolation is effective in providing performance SLOs among competing VMs. 23

- Is OPS isolation satisfactory? - What about other resources? - Channels, buffer, NCQ, etc.

Outline Part 1: FAST 2015 Part 2: Beyond FAST (still ongoing) 26

SSD Components Considered. Channel: the data bus connecting the controller to a flash memory package. Write cache: a small amount of DRAM used as a volatile cache. NCQ: a SATA technology used to internally optimize the execution order of received disk read and write commands. (From: Wikipedia)

Devices. SSD: Samsung 850 PRO 256GB. HDD: Seagate Barracuda 1TB, 7200 RPM. SSD WBuf OFF: the Samsung 850 PRO 256GB SSD with its write cache disabled. SSD 1CH: the commercial Samsung SSD set to use a single channel.
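
For the "SSD WBuf OFF" configuration, the drive's volatile write cache can be toggled from the host with hdparm; a small sketch with a placeholder device path (the 1-channel configuration is a vendor/firmware-level change and cannot be shown this way):

    import subprocess

    DEVICE = "/dev/sdX"   # placeholder for the Samsung 850 PRO under test

    def set_write_cache(enabled: bool) -> None:
        """Enable or disable the drive's volatile write cache via 'hdparm -W'."""
        subprocess.run(["hdparm", "-W", "1" if enabled else "0", DEVICE], check=True)

    set_write_cache(False)   # the 'SSD WBuf OFF' device configuration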

Workloads. Micro-benchmark: random write. Macro-benchmark: the fileserver workload from the fio benchmark. Traces: the proj trace from MSR Cambridge and the Exchange trace from MS corporate mail.

Channel Parallelism & Write Cache. For random writes, proportionality is generally close to the I/O weight. [Figure: random write 32KB; proportionality of I/O bandwidth relative to VM1 for VM1(100), VM2(200), VM3(500), VM4(1000); series: Ideal, SSD, HDD, SSD WBuf OFF, SSD 1CH] 30

Channel Parallelism & Write Cache. For the fileserver workload, proportionality deviates for SSD and SSD 1CH. [Figure: fileserver; proportionality of I/O bandwidth relative to VM1 for VM1(100), VM2(200), VM3(500), VM4(1000); series: Ideal, SSD, HDD, SSD WBuf OFF, SSD 1CH] 31

Time Series Analysis of Fileserver on SSD. With 100% writes, throughput fluctuates; with 100% reads, throughput decreases; with 40% reads, proportionality deviates. [Figure: throughput (MB/s) over time (sec) for VM1(100), VM2(200), VM3(500), VM4(1000) in each of the three cases]

Time Series Analysis of Fileserver on SSD with Write Cache OFF. With 100% writes, proportionality is close to the I/O weight; with 100% reads, throughput decreases (turning the write cache off has minimal effect); with 40% reads, proportionality is close to the I/O weight. [Figure: throughput (MB/s) over time (sec) for VM1(100), VM2(200), VM3(500), VM4(1000) in each of the three cases]

Effect of NCQ. Time series analysis with 2 VMs, VM1 (I/O weight 100) and VM3 (I/O weight 500). The proj trace on the SSD shows throughput fluctuations and inversions, while the Exchange trace on the SSD shows no fluctuations and proportional I/O throughput. [Figure: throughput (MB/s), 0 to 500, over time (sec) for the two traces]

Proj Trace on SSD with NCQ OFF: the effect of NCQ in the SSD. With NCQ depth=31, the proj trace shows the fluctuations seen above; with NCQ depth=1 (NCQ off), there are no fluctuations and I/O throughput is proportional. [Figure: throughput (MB/s), 0 to 500, over time (sec) for VM1 (100) and VM3 (500), proj on SSD with NCQ depth 31 vs. depth 1]
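
The NCQ depth=31 versus depth=1 comparison can be reproduced by writing the SATA device's queue_depth attribute in sysfs; a minimal sketch with a placeholder device name:

    DEVICE = "sdX"   # placeholder block device backing the VMs

    def set_ncq_depth(depth: int) -> None:
        """Set the effective command queue depth (1 disables NCQ, 31 is SATA's max)."""
        with open(f"/sys/block/{DEVICE}/device/queue_depth", "w") as f:
            f.write(str(depth))

    set_ncq_depth(1)   # the 'NCQ depth=1 (NCQ off)' configuration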

Conclusion. We conducted an analysis of I/O SLOs by examining major components of the SSD: GC, write cache, channel, and NCQ. These SSD components affect the I/O SLO under various workloads. Future work: analyze OS components for I/O SLOs on SSDs. 36

Thank you!!! 37 Q & A