Purity: building fast, highly-available enterprise flash storage from commodity components


Purity: building fast, highly-available enterprise flash storage from commodity components. J. Colgrove, J. Davis, J. Hayes, E. Miller, C. Sandvig, R. Sears, A. Tamches, N. Vachharajani, and F. Wang

Gala has already introduced HDDs. What exactly is an SSD? SSDs are the heroes of this talk.

Outline: Introduction to SSD; Unique properties of SSD; The Pure Storage system; System processes; Real-world deployments

SSD 101: An SSD is composed of flash memory arrays. Block: 2-16 MB. Page: 0.5-4 KB.

Write/Read/Erase Operations in SSD: A full bucket means 1; an empty (or almost empty) bucket means 0. We can read and write individual rows, but an erase must clear the entire block. [Figure: cell charge shown as a bucket at level 0 and level 1]

Multi-level flash memories

Write/Read/Erase Operations in MLC SSD: Data is represented by the amount of electrical charge. Cells can be written individually, but they can only be erased by erasing an entire block. [Figure: cell charge levels from level 0 to level q-1]


Google's Data Centers

It even determines the location

So why not always use SSDs? Traditionally, data center software and system designs were optimized for HDDs.

Purity: An all-SSD enterprise storage system. Claims to be cheaper and to deliver higher performance.


SSD: unique properties

SSD unique properties #1: Wear. With each P/E (program/erase) cycle the media degrades. Source: Y. Cai et al., "Threshold Voltage Distribution in MLC NAND Flash Memory: Characterization, Analysis, and Modeling," DATE 2013.

SSD unique properties #2: Garbage collection. We can write a single page but can only erase a whole block. Typical numbers: 64-128 pages per block, ~2048 blocks per flash drive. [Figure: four blocks of four pages each, holding pages 0-15] Suppose we want to update pages 0, 1, and 2. First option: erase the block. Second option: use overprovisioning.

SSD unique properties #2: Garbage collection (cont.). We also want to update pages 4, 5, and 8. [Figure: the updated copies occupy the last free pages] We are full! Now what? Trade-off: space vs. lifetime. This is where garbage collection comes in.
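The page-write versus block-erase asymmetry can be made concrete with a toy simulation. This is only a sketch under stated assumptions: the `ToyFlash` class, its greedy victim selection, and the 4x4 geometry (borrowed from the slide's grid) are illustrative and do not describe any real drive firmware or Purity itself.

```python
STALE = "stale"  # marker for a page whose contents were superseded


class ToyFlash:
    """Toy flash device: pages are programmed one at a time, erased per block."""

    def __init__(self, blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # Each physical page holds a logical page number, None (free), or STALE.
        self.blocks = [[None] * pages_per_block for _ in range(blocks)]
        self.where = {}              # logical page -> (block, page)
        self.host_writes = 0         # writes requested by the host
        self.flash_writes = 0        # pages actually programmed
        self.erases = [0] * blocks   # per-block erase counters

    def _free_slot(self):
        for b, block in enumerate(self.blocks):
            for p, content in enumerate(block):
                if content is None:
                    return b, p
        return None

    def _collect_garbage(self):
        # Greedy victim choice (an assumption): the block with the most stale pages.
        stale = [block.count(STALE) for block in self.blocks]
        victim = stale.index(max(stale))
        if stale[victim] == 0:
            raise RuntimeError("device full: nothing to reclaim")
        survivors = [c for c in self.blocks[victim]
                     if c is not None and c != STALE]
        self.blocks[victim] = [None] * self.pages_per_block  # block erase
        self.erases[victim] += 1
        for logical in survivors:    # still-valid pages must be rewritten
            self._program(logical)

    def _program(self, logical):
        if self._free_slot() is None:
            self._collect_garbage()
        b, p = self._free_slot()
        self.blocks[b][p] = logical
        self.where[logical] = (b, p)
        self.flash_writes += 1

    def write(self, logical):
        """Out-of-place update: mark the old copy stale, program a free page."""
        self.host_writes += 1
        if logical in self.where:
            b, p = self.where[logical]
            self.blocks[b][p] = STALE
        self._program(logical)


dev = ToyFlash()
for page in range(12):       # 12 logical pages on 16 physical pages
    dev.write(page)
for page in (0, 1, 2, 3, 4, 5, 8, 9, 0):
    dev.write(page)
write_amplification = dev.flash_writes / dev.host_writes
```

In this run the spare block absorbs the first four updates for free, but later updates force erases and the copying of still-valid pages, so `flash_writes` ends up larger than `host_writes`. More spare space delays that point, which is the space versus lifetime trade-off named on the slide.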

Main conclusions for SSD: Fast random access (unlike disks). Performance depends heavily on the workload. The response of the device is not uniform. Example: we wish to write 3 in two scenarios, A and B. [Figure: page layouts for Scenario A and Scenario B]

SSD unique properties #3: Wear leveling. Assume we keep updating pages 0, 1, 2, and 3. What will happen? [Figure: four blocks holding pages 0-15]
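One way to explore the question on this slide is to count erases per block with and without wear leveling. The sketch below is a deliberate simplification: the `simulate` function is hypothetical, and it assumes the hot data (pages 0-3) fills exactly one block and that any other block may be used as the destination, which ignores the data already stored there.

```python
def simulate(updates, blocks=4, level_wear=True):
    """Return per-block erase counts after rewriting one hot block's data."""
    erases = [0] * blocks
    current = 0  # block currently holding the hot pages 0-3
    for _ in range(updates):
        if level_wear:
            # Move the hot data to the least-worn block other than the current one.
            candidates = [b for b in range(blocks) if b != current]
            target = min(candidates, key=lambda b: erases[b])
        else:
            # Always erase and rewrite the same physical block.
            target = current
        erases[target] += 1  # the target is erased before it is programmed
        current = target
    return erases


unleveled = simulate(1000, level_wear=False)  # [1000, 0, 0, 0]
leveled = simulate(1000, level_wear=True)     # erases spread across all blocks
```

Without leveling, a single block absorbs every erase and will reach its P/E limit long before the rest of the device; with leveling, the same number of erases is spread almost evenly.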

System benefits from fast access: Some processes no longer need to run from cache. HDD-based systems require much more (expensive) DRAM for caching. SSDs can sustain system processes with many IOPS that are hard to run on HDDs: deduplication, read speedup, log-structured file systems, etc.


The Pure Storage system

Comparison between Purity and a disk-based system. We will try to explain this magic.

Implementation: 12x(10-16) Gb/s; 11-24 MLC drives. [Figure label: "This is actually SLC flash"]

Basic Architecture: Each segment is striped across multiple SSDs. Reed-Solomon coding is used to tolerate two SSD failures. The parity pages enable correcting a single corrupted page without reading the rest of the SSDs (???). Layout: 7+2 drives; 1 MB write block; 8 MB taken from a single SSD.
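The idea of rebuilding a lost page from the surviving drives in a stripe can be illustrated with plain XOR parity. Note the swap: the slide says Purity uses Reed-Solomon with two parity drives (7+2), whereas this sketch uses a single XOR parity page, which survives only one loss. The page size and contents are made up for illustration.

```python
from functools import reduce


def xor_pages(pages):
    """Bytewise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


# Seven data pages, one per data drive (8 bytes each, purely illustrative).
data = [bytes([i] * 8) for i in range(1, 8)]
parity = xor_pages(data)  # stored on a dedicated parity drive

# Pretend drive 3 failed: XOR the six surviving data pages with the parity page.
lost = 3
survivors = [page for i, page in enumerate(data) if i != lost] + [parity]
rebuilt = xor_pages(survivors)
```

Here `rebuilt` equals `data[lost]`. Reed-Solomon generalizes this by computing several independent parity pages over a finite field, so that any two missing pages in a 7+2 stripe can be recovered.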


Processes in the storage system

Compression 101: Column-oriented database management system. Insights: it reduces seek time on HDDs (for example, if we wish to count how many people get more than 48000), and columns contain many repeating patterns.

Compression 101: Run-Length Encoding
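Since the slide's illustration did not survive transcription, here is a minimal sketch of run-length encoding. The function names and the sample string are chosen for illustration only.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)


encoded = rle_encode("aaabccdddd")  # [('a', 3), ('b', 1), ('c', 2), ('d', 4)]
decoded = rle_decode(encoded)       # 'aaabccdddd'
```

The scheme pays off only when runs are long, which is why it pairs well with column-oriented layouts where equal values tend to sit next to each other.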

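Compression 101: Entropy encoding. Data centers use lossless compression. Let us consider a text file such as aabacfffeedcbaaa, in which six symbols occur 45, 13, 12, 16, 9, and 5 thousand times. With a fixed-length code of 3 bits per symbol, the storage needed for the file is (45+13+12+16+9+5)*1000*3 = 300,000 bits.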

Compression 101 (cont.): But what if frequent symbols get shorter codewords? The storage needed is then (45*1+13*3+12*3+16*3+9*4+5*4)*1000 = 224,000 bits, a 25% reduction in needed storage! The price is running an encoder and a decoder on every read/write operation. This is the Huffman code.
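The arithmetic on these two slides can be reproduced by building a Huffman code from the symbol counts. The counts (in thousands) come from the slide's formula; assigning them to the letters a-f in that order is an assumption, since the slide's table was lost, and `huffman_lengths` is a hypothetical helper name.

```python
import heapq


def huffman_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    # Heap entries are (weight, tiebreak, {symbol: depth}); the unique integer
    # tiebreak keeps tuple comparison from ever reaching the dict.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)  # the two lightest subtrees...
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))  # ...become siblings
        counter += 1
    return heap[0][2]


freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}  # in thousands
lengths = huffman_lengths(freqs)

fixed_bits = sum(freqs.values()) * 1000 * 3                          # 300000
huffman_bits = sum(freqs[s] * lengths[s] for s in freqs) * 1000      # 224000
saving = 1 - huffman_bits / fixed_bits                               # about 0.25
```

The most frequent symbol receives a 1-bit codeword and the two rarest receive 4-bit codewords, which matches the multipliers 1, 3, 3, 3, 4, 4 in the slide's formula.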

Deduplication: Elimination of duplicate copies of repeating data, replacing duplicated files or data blocks with pointers. [Figure: 16 logical blocks containing the values 0 3 3 9 1 4 0 9 2 5 7 10 0 6 8 2 are stored as 11 unique blocks, 0-10] Deduplication ratio: 16/11 = 1.45.
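The 16/11 ratio can be reproduced with a small content-addressed store. This is a generic sketch, not Purity's implementation: the `deduplicate` function, the use of SHA-256 as the fingerprint, and the 512-byte filler blocks are assumptions; only the sixteen block values come from the slide.

```python
import hashlib


def deduplicate(blocks):
    """Store each distinct block once and keep one pointer per logical block."""
    store = {}      # fingerprint -> block contents (stored once)
    pointers = []   # one fingerprint per logical block, in write order
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        pointers.append(fingerprint)
    return store, pointers


# The sixteen logical block values from the slide's figure.
values = [0, 3, 3, 9, 1, 4, 0, 9, 2, 5, 7, 10, 0, 6, 8, 2]
blocks = [bytes([v]) * 512 for v in values]

store, pointers = deduplicate(blocks)
ratio = len(pointers) / len(store)  # 16 / 11, about 1.45
```

Reading a logical block back is a lookup of its pointer in `store`, so duplicates cost one pointer each instead of a full block.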

Deduplication (cont.): Real-life cases in which a single file is saved many times on the same server: homework 1 in algebra downloaded to 1000 Dropbox accounts; mail attachments; streaming music files from the cloud; backups! What is the practical storage reduction from deduplication and compression? 3-10x (it can even reach 50x for backups).

Block Deduplication: Deduplication can be performed on blocks of data (and not necessarily on whole files). For example: [Figure: block-level deduplication example]

Deduplication in Purity: Tracks deduplication blocks at 512 B granularity. Keeps the hash value of every 8th block. In case of a hash match, the block is verified byte by byte. Can therefore detect duplicates of 8*512 B = 4 KB. Purity performs: (1) Inline deduplication: detects duplicates before the data is written to SSD; looks only at recently written data and at frequently duplicated data; accounts for most of the deduplication ratio. (2) Deduplication during garbage collection.
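Why hashing only every 8th block still catches 4 KB duplicates: any run of eight consecutive stored blocks contains exactly one sampled block, so an incoming copy of that run must hit the index somewhere. The sketch below illustrates the idea only; the function names, the in-memory index, and SHA-256 are assumptions and are not taken from Purity.

```python
import hashlib

BLOCK = 512   # deduplication granularity in bytes (from the slide)
SAMPLE = 8    # hash every 8th block (from the slide)


def split(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def build_index(stored):
    """Index the hash of every 8th block of already-stored data."""
    blocks = split(stored)
    return {hashlib.sha256(blocks[i]).digest(): i
            for i in range(0, len(blocks), SAMPLE)}


def find_duplicate_run(stored, incoming, index):
    """Return (stored_start, incoming_start, length) in blocks, or None."""
    old, new = split(stored), split(incoming)
    for j, block in enumerate(new):
        i = index.get(hashlib.sha256(block).digest())
        if i is None or old[i] != block:   # byte-by-byte verification
            continue
        back = 0   # extend the verified match backward...
        while i - back > 0 and j - back > 0 and old[i - back - 1] == new[j - back - 1]:
            back += 1
        fwd = 1    # ...and forward
        while i + fwd < len(old) and j + fwd < len(new) and old[i + fwd] == new[j + fwd]:
            fwd += 1
        return i - back, j - back, back + fwd
    return None


stored = b"".join(bytes([k]) * BLOCK for k in range(32))
fresh = [bytes([200]) * BLOCK, bytes([201]) * BLOCK]
incoming = b"".join(fresh) + stored[5 * BLOCK:15 * BLOCK] + bytes([202]) * BLOCK

index = build_index(stored)
match = find_duplicate_run(stored, incoming, index)  # (5, 2, 10)
```

Only stored block 8 of the copied range is in the index, yet the byte-wise extension recovers the whole ten-block duplicate. Sampling shrinks the index eightfold at the cost of missing duplicates shorter than 4 KB.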

Log-Structured File System: In a nutshell: to overcome the high seek times of disks, data is buffered in a log structure to create long sequential writes: File#1, File#2, Update File#1, File#2, Update File#2, File#3. This structure matches the way we write to SSDs! There is no need to buffer the data in expensive RAM. Garbage collection is an inherent process in SSDs. Wear-leveling algorithms in flash memories avoid fragmentation.
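The append-only pattern described on this slide can be sketched with a list and a dictionary. The `AppendOnlyLog` class and its method names are hypothetical and greatly simplified: a real log-structured file system appends to segments on the device rather than to an in-memory list.

```python
class AppendOnlyLog:
    """Updates are appended; an index remembers where the newest copy lives."""

    def __init__(self):
        self.log = []      # sequence of (name, data) records, oldest first
        self.latest = {}   # name -> position of the newest record in the log

    def write(self, name, data):
        self.latest[name] = len(self.log)
        self.log.append((name, data))   # never overwrite in place

    def read(self, name):
        return self.log[self.latest[name]][1]

    def compact(self):
        """Garbage collection: keep only the newest record of each name."""
        live = [self.log[i] for i in sorted(self.latest.values())]
        self.log = live
        self.latest = {name: i for i, (name, _) in enumerate(live)}


fs = AppendOnlyLog()
fs.write("File#1", "v1")
fs.write("File#2", "v1")
fs.write("File#1", "v2")   # update: the old record becomes garbage
fs.write("File#2", "v2")
fs.write("File#3", "v1")
records_before = len(fs.log)   # 5
fs.compact()
records_after = len(fs.log)    # 3
```

Superseded records play the same role as stale flash pages: they occupy space until a cleaning pass copies the live data forward, which is why the log structure fits flash so naturally.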

Data Read speedup: The vast majority of slow SSD reads happen while the SSD is in the middle of a write. Purity avoids writing to more than two SSDs per ECC group at the same time. Reads aimed at a busy SSD can instead be served by reconstructing the data using the erasure codes. [Figure: request latency over time, with a threshold]

Real-world deployments

Reliability: The company collects telemetry data from its customers. Telemetry data includes I/O request rates, request sizes, deduplication ratio, etc. By analyzing this data, the company can foresee failures and replace components before they fail. They reach 99.999% availability (about 5 min of downtime per year).
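The "5 min/year" figure follows directly from the availability percentage. A quick check, using a hypothetical helper name and an average year of 365.25 days:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)  # roughly 5.26 minutes
four_nines = downtime_minutes_per_year(0.9999)   # roughly 52.6 minutes
```

Each additional "nine" cuts the allowed downtime by a factor of ten.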

Reliability (cont.): SSDs are very reliable: across all of Purity's data centers (the number is not given) only two SSDs have failed. Most customers never approach the P/E ratings of consumer MLC drives. The company offers free replacement of SSDs that wear out. To fix errors, Purity uses ECC and rewriting of un-accessed data.

Database deployments: Customers usually deploy dozens or hundreds of database instances on top of a single Purity array. The five-minute rule: data accessed more often than once every 5 min belonged in RAM; colder data belonged on disk. Several assumptions for the analysis: the deduplication ratio is usually 3-8; for document databases it is ~10; I/Os are 55 KB on average.

Relative cost, based on data from customers: Disks are no good for high-performance needs. Without data reduction, store everything that you can afford to lose in RAM. With data reduction, never cache cold data (accessed less frequently than once every 30 min).

Summary: SSDs are fast random-access storage devices. SSDs have unique properties such as wear, garbage collection, etc. An all-SSD storage system makes it possible to enhance some system processes in game-changing ways. Therefore, although SSDs are more expensive, the Purity system is cost-effective.

Questions?