Enhancing SSD Control of NVMe Devices for Hyperscale Applications. Luca Bert (Seagate), Chris Petersen (Facebook)


Agenda: introduction and overview (Luca); problem statement and proposed solution (Chris); SSD implications and prototype work (Luca); test results and analysis (all); conclusion (all)

Problem Statement & Proposed Solution

NAND Flash Trends. [Chart: M.2 SSD capacity (GB) by year, 2006 through 2022, rising toward 16384 GB.]

SSD Capacity = Shared Resource. [Diagram: applications A, B, C, and D sharing a single SSD.]

Noisy Neighbors. [Chart: request latency over time for Applications A, B, and C sharing a drive.]

Introducing I/O Determinism. A new NVMe feature called I/O Determinism: the SSD is configured as multiple NVM Sets (e.g. A, B, C). NVM Sets are QoS-isolated regions, so a write to Set A does not impact a read to Set B or C. One or more namespaces are allocated from each NVM Set.
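The Set-to-namespace relationship above can be sketched as a small data model. This is purely illustrative: the class and field names are invented for the sketch and are not part of the NVMe specification.

```python
# Illustrative model of NVMe I/O Determinism objects: an SSD exposes
# QoS-isolated NVM Sets, and each namespace is carved from exactly one Set.
from dataclasses import dataclass, field

@dataclass
class NVMSet:
    set_id: int
    capacity_gb: int
    namespaces: list = field(default_factory=list)  # (ns_id, size_gb) pairs

    def allocate_namespace(self, ns_id: int, size_gb: int) -> int:
        # A namespace must fit inside the capacity of its parent Set.
        used = sum(size for _, size in self.namespaces)
        if used + size_gb > self.capacity_gb:
            raise ValueError("NVM Set capacity exceeded")
        self.namespaces.append((ns_id, size_gb))
        return ns_id

# An SSD configured as three isolated Sets (A, B, C); a heavy write stream
# into Set A cannot disturb reads served from Sets B or C.
sets = {name: NVMSet(i, 512) for i, name in enumerate("ABC")}
sets["A"].allocate_namespace(ns_id=1, size_gb=256)
sets["B"].allocate_namespace(ns_id=2, size_gb=512)
```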

SSD Implications and Prototype Work: SSD Data Layout Impact

Key Hurdles to Address. Physical resource segregation: dies have to be allocated to NVM Sets. Localization of the globals. Data: maps, journals, logs, and metadata are global and need to be redesigned to be local to NVM Sets. Processes: the same holds for GC and other background processes. Resources: OP, cache, memory, etc. can be local or global.

Baseline: Conventional SSD with 4 NS. [Diagram: 16 dies on 8 ONFi channels; namespaces NS1 through NS4 striped across all channels, with global maps and OP.] Pros: simplest configuration; no logical-to-physical affiliation; most balanced model. Cons: IO pattern conflicts.

IOD: Vertical Sets (VS), Dedicated Channels. [Diagram: 16 dies; each of NS1 through NS4 owns its own channels, maps, and OP.] Pros: a few dedicated channels for each Set; no conflicts in channels or dies; every Set runs on parallel resources. Cons: only 1/4 of the bandwidth is possible per Set (limited by the number of dedicated channels).

IOD: Horizontal Sets (HS), Shared Channels. [Diagram: NS1 through NS4 each own a horizontal slice of dies spanning all 8 channels, with per-NS maps and OP.] Pros: all channels are utilized by all Sets; full bandwidth is possible with a single active Set (or shared bandwidth with all Sets); no conflicts in dies. Cons: no dedicated channels or queues per Set.

IOD: Mixed Sets (MS), Shared Channels. [Diagram: Sets grouped in pairs; each group shares half of the channels, with per-NS maps and OP.] Pros: a few dedicated channels for each group of Sets; no conflicts in dies; groups of Sets run on parallel resources. Cons: only 1/2 of the bandwidth is possible per Set (limited by the number of dedicated/shared channels).
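The three prototype layouts differ mainly in how the channels are divided among Sets, which is what caps per-Set bandwidth. A small sketch makes the slide's fractions explicit; the geometry (8 channels, 4 Sets) comes from the slides, while the function name is illustrative.

```python
# Per-Set channel budget for the three prototype layouts on the slide
# geometry: 16 dies on 8 channels, carved into 4 NVM Sets.
CHANNELS, SETS = 8, 4

def channels_per_set(layout: str) -> int:
    if layout == "vertical":    # dedicated channels: 8 / 4 = 2 per Set
        return CHANNELS // SETS
    if layout == "horizontal":  # every Set can use all 8 channels
        return CHANNELS
    if layout == "mixed":       # channels shared within a group of 2 Sets
        return CHANNELS // 2
    raise ValueError(layout)

for layout in ("vertical", "horizontal", "mixed"):
    frac = channels_per_set(layout) / CHANNELS
    print(f"{layout:10s}: {channels_per_set(layout)} channels/Set, "
          f"up to {frac:.0%} of full bandwidth per Set")
```

This reproduces the slide claims: Vertical Sets cap each Set at 1/4 of drive bandwidth, Mixed Sets at 1/2, and Horizontal Sets allow full bandwidth when only one Set is active.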

Areas Not Covered in the Prototype Work. The read and write paths are HW-assisted: there is no complete separation between them, so conflicts do happen. Cache memory is shared across Vertical Sets: conflicts happen, but are limited by the nature of the test. Reads and writes have the same priority: no priority is given to any specific IO, and there is no Erase Suspend.

Test Results and Analysis

Configuration Under Test. Facebook OCP Leopard system: 2-socket Intel Xeon v4 CPUs, 32 GB DRAM. Seagate Nytro XP7102 (2 TB eMLC), 128 dies on 8 channels, ONFi at 400 MT/s. Tested with Baseline (no IOD), Vertical Sets, and Horizontal Sets. Writes = 32 KB sequential writes at QD4; reads = 4 KB random reads at QD8.


The Noisy Neighbor. One Set is under heavy writes while the other is under heavy reads. The write workload is 32 KB sequential writes at QD=4; the read workload is 4 KB random reads at QD=8. Expected outcome: minimal to no impact on reads from the noisy writes.
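The two workloads above map naturally onto fio job definitions. The sketch below builds the command lines from the slide's parameters; the device paths and job names are placeholders, not details from the original test setup.

```python
# Reconstruct the noisy-neighbor workloads from the slide as fio command
# lines: a 32K sequential writer at QD4 and a 4K random reader at QD8.
def fio_cmd(name: str, dev: str, rw: str, bs: str, iodepth: int,
            runtime_s: int = 1800) -> str:
    # 1800 s matches the 30-minute runs shown in the result charts.
    return (f"fio --name={name} --filename={dev} --rw={rw} --bs={bs} "
            f"--iodepth={iodepth} --ioengine=libaio --direct=1 "
            f"--time_based --runtime={runtime_s}")

# Noisy writer on NS1 (placeholder device path).
writer = fio_cmd("noisy-writer", "/dev/nvme0n1", "write", "32k", 4)
# Victim reader on NS2 (placeholder device path).
reader = fio_cmd("victim-reader", "/dev/nvme0n2", "randread", "4k", 8)
print(writer)
print(reader)
```

Running both jobs concurrently against namespaces in the same versus different NVM Sets is what produces the comparisons in the following charts.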

WR on NS1 + RD on NS2. [Charts: average and max read latency over 30 minutes (1800 seconds) for BS, HS, and VS configurations, with a zoom-in area marked.]

WR on NS1 + RD on NS2. [Chart: average read latency. Baseline AVG = 1.33 ms; HS AVG = 0.19 ms, a 7x improvement; VS AVG = 0.14 ms, a 9.5x improvement.]

WR on NS1 + RD on NS2. [Chart: max read latency. Baseline AVG = 55.7 ms; HS AVG = 4.79 ms, an 11x improvement; VS AVG = 3.41 ms, a 16x improvement.]
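The improvement factors quoted on these charts follow directly from the measured averages; a quick arithmetic check:

```python
# Verify the speedup factors quoted on the slides from the measured
# average and max read latencies (milliseconds).
avg = {"BS": 1.33, "HS": 0.19, "VS": 0.14}   # average read latency
mx  = {"BS": 55.7, "HS": 4.79, "VS": 3.41}   # max read latency

for metric, lat in (("avg", avg), ("max", mx)):
    for cfg in ("HS", "VS"):
        print(f"{metric} {cfg}: {lat['BS'] / lat[cfg]:.1f}x improvement")
# avg HS: 7.0x, avg VS: 9.5x, max HS: 11.6x, max VS: 16.3x
```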

WR on NS1 + RD on NS2. [Chart: nines-probability distribution of 4K random read latency (us, 0 to 7000) for BS, HS, and VS at QD=8, under the combined workload: one Set running 32K SW (QD=4), one Set running 4K RR (QD=8).]

WR on NS1 + RD on NS2. [Chart: average and P99 read latency versus queue depth (1 to 128) for BS, HS, and VS.]

Conclusions

Conclusions. Average values: IOD behaves largely as expected. Max values: there is more under the hood to address, but even the prototype shows huge advantages. IOD is all about die separation, but the controller, DRAM, ONFi, and other resources play key roles under specific conditions. Above all, the improvements are so large that there may be other usage models beyond hyperscalers.

Thank you! Questions? Visit the Seagate booth, #505, to learn about Seagate's portfolio of SSDs, flash solutions, and system-level products for every segment. www.seagate.com/nytro