Performance Benefits of Running RocksDB on Samsung NVMe SSDs


A Detailed Analysis

2015 Samsung Semiconductor Inc.

Executive Summary

The industry has experienced an exponential data explosion over the past decade, a trend that we expect to continue in the coming years and that we call the Data Deluge. As a result, huge amounts of unstructured data are ending up in massive data centers and cloud storage servers across the world. To store this much data and process it at high speed, data center architects are pursuing storage solutions with very low latency and small, power-optimized form factors.

Hard disk drives (HDDs) cannot meet the performance requirements of big data applications, and many solid state drives (SSDs) are constrained by the maximum throughput of the SATA interface. To take advantage of the full speed of NAND flash memory, the industry is moving towards a more scalable enterprise interface known as NVMe.

Although NVMe SSDs are gaining considerable market traction, typical performance specifications do not make it easy for customers to decide between SATA SSDs and NVMe SSDs. Very few benchmark studies compare real-world applications, such as database workloads, running on NVMe SSDs versus SATA SSDs. This whitepaper addresses that void. It runs a persistent key-value store, RocksDB, on NVMe SSDs (with and without multi-streaming) and on a SATA SSD to provide a comparative performance analysis for different workloads.

Using Samsung NVMe SSDs, the essential findings are:

RocksDB is 3.8x faster compared to SATA SSDs for a 20% read/80% update workload.
Average read latency decreased by 73% for an 80% read/20% update workload.
99th percentile read latency decreased by 78% for an 80% read/20% update workload.

Using Samsung NVMe multi-stream SSDs, the performance and endurance benefits are even greater. The essential findings are:

RocksDB is 4.4x faster compared to SATA SSDs for a 20% read/80% update workload.
SSD lifetime increased by 54% (compared to NVMe SSDs without multi-streaming) with a 20% read/80% update workload.

Please note that these benchmark results were obtained without significant tweaking or optimization and, thus, are very achievable.

RocksDB Overview

RocksDB is an open-source, embeddable, persistent key-value store for fast storage. Developed by Facebook, it is based on the LevelDB database. RocksDB uses a log-structured database engine for storage and offers flexible, tunable configuration settings that allow it to operate in a variety of production environments, including pure DRAM and flash, using standard or distributed file systems such as HDFS.

In operation, inserts are fast because new data goes into an in-memory data structure, the memtable, and optionally into a logfile, a write-ahead commit log on persistent storage. When the memtable is full, it is flushed to an sstfile (in the L0 level described below) on non-volatile storage. The entire database is stored in a set of sstfiles arranged at multiple levels (the L0 to L6 levels described below). Each sstfile contains data blocks sorted by key, along with metadata and indexing information. To reduce read and space amplification, concurrent compactions at different levels remove duplicates and overwritten data. Compaction, however, increases the write amplification of the stored data.

Write Amplification in SSDs

The minimum SSD read granularity is a page. Writes also have page granularity, but they may entail an additional erase cycle, and erasures happen only at erase-block granularity. As an SSD fills with data, a garbage collection process must copy valid pages out of fragmented blocks before erasing those blocks to make them available for further writes. Here, fragmented means that a media block has areas of invalidated data intermixed with areas of valid data. Copying valid pages increases write amplification, which in effect takes time away from storing new data.

Introducing the NVMe SSD Multi-Stream Feature

Avoiding garbage collection during write operations lets the SSD use its processing capacity far more efficiently. The multi-stream feature of the Samsung PM953S NVMe SSD does exactly that. Multi-stream reduces the fragmentation of valid data within blocks by grouping data with the same lifetime in the same erase block. To make this possible, host applications provide hints about data characteristics, such as expected data lifetime. An application first selects a stream ID to associate with the data in its write requests, which allows the SSD to store data with the same stream ID in the same erase block. This can significantly decrease garbage collection activity, reducing latency and in turn increasing IOPS throughput. Reducing the associated write amplification also extends the lifetime of the SSD's NAND flash media.

Leveraging RocksDB's SSTFile for Multi-Streaming

RocksDB uses a log-structured merge process. Its sstfiles are organized into multiple levels (L0 to LN), with each lower (higher-numbered) level retaining colder data than its predecessor. This arrangement allows each sstfile level to be assigned a unique stream ID, separating sstfile data that is frequently accessed, merged, or deleted (say, in Level L0) from colder sstfile data that is accessed, merged, or deleted less often (say, in Level L6).

Benchmark Environment

The selected server was a Dell PowerEdge R730xd with dual Intel Xeon E5-2640 v3 processors running at 2.6 GHz and 64 GB of memory. In total, the server provided 16 physical cores. With hyper-threading enabled, the total logical CPU count was 32. We kept a 1:4 ratio of physical cores to main memory (16 cores : 64 GB memory).

For the SSDs, we used the power-optimized, M.2 form factor Samsung PM953 and PM953S NVMe SSDs, which are targeted at data center and cloud service providers. That said, the Samsung PM1725 NVMe SSD continues to be the flagship product within Samsung's NVMe SSD portfolio. In terms of documented performance, the Samsung PM1725 is 4.5x faster at 4K random reads and 5.5x faster at 4K random writes than the Samsung PM953. Therefore, PM1725 performance is much higher.

Table 1. Server Hardware and OS Configuration

Processor/Memory Details
    Processor:          Dual socket Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.6GHz
    Total logical CPUs: 32
    Total memory:       64 GB

Operating System
    Distro:             Ubuntu 14.04.1 LTS
    Kernel:             3.9.--generic, patched for multi-stream support
    Arch:               x86_64

SSD Details
    1 x Samsung NVMe PM953 M.2 480 GB SSD
    1 x Samsung NVMe PM953S M.2 480 GB SSD
    1 x professional-grade commercial SATA 480 GB SSD
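Returning to the level-based stream assignment described under "Leveraging RocksDB's SSTFile for Multi-Streaming", the idea can be illustrated with a minimal sketch. This is not the modified RocksDB code used in the benchmark and not a real driver API: the names `stream_id_for_level`, `tag_write`, and `MAX_STREAMS` are hypothetical, and the stream limit and the convention that ID 0 means "no stream" are assumptions made only for illustration.

```python
# Hypothetical sketch: one stream ID per LSM level, so that data with a
# similar expected lifetime is hinted to share erase blocks on the SSD.
# This is not RocksDB or NVMe driver code.

NUM_LEVELS = 7    # L0..L6, matching the levels described in the paper
MAX_STREAMS = 8   # assumed device limit; the paper does not state one


def stream_id_for_level(level: int, max_streams: int = MAX_STREAMS) -> int:
    """Map an sstfile level to a 1-based stream ID (0 is assumed to mean 'no stream')."""
    if level < 0:
        raise ValueError("level must be non-negative")
    # If there are more levels than streams, the coldest levels share the last stream.
    return min(level + 1, max_streams)


def tag_write(level: int, payload: bytes) -> dict:
    """Bundle a write with its stream hint, standing in for a tagged write request."""
    return {"stream_id": stream_id_for_level(level), "length": len(payload)}


if __name__ == "__main__":
    for lvl in range(NUM_LEVELS):
        print(f"L{lvl} -> stream {stream_id_for_level(lvl)}")
```

The clamp to the last stream is one possible design choice for devices that expose fewer streams than there are levels; the paper does not say how that case was handled.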

The only difference between the PM953 and the PM953S is the multi-stream feature: the PM953S supports it, while the PM953 does not. For a baseline comparison, the selected SATA SSD is a professional-grade, commercially available SSD.

Table 2. Software Details

Software      Functionality                                 Version/Remarks
RocksDB       Persistent embedded key-value store           3. (modified to add multi-stream support)
YCSB          Yahoo! Cloud Serving Benchmark tool           ..4
SSDB-Rocks    Provides an interface to RocksDB for YCSB     .6.6

YCSB and SSDB were set up to run on the same server. The dataset was initially created by performing 37 Million, K key-value record inserts. The same dataset was used for all benchmark runs. To bring the SSD and the database into a stable steady state, we performed preconditioning consisting of 20% read/80% update for 2 Billion operations. Benchmark workloads ran after this preconditioning.

Table 3. Benchmark Workload Details

Parameter              Value
No. of operations      50 Million
No. of YCSB threads    32

With data centers already embracing fast SSDs, the world is ready to replace and, in some cases, complement rotational HDDs in many data centers. Therefore, we did not focus on an in-depth analysis of the performance benefit of migrating from HDDs to SATA SSDs or NVMe SSDs. We used metrics such as throughput and average read and update latency to compare a commercially available SATA SSD, the Samsung PM953 NVMe SSD without multi-stream, and the Samsung PM953S NVMe SSD with multi-stream. Since actual NAND write operation data was unavailable on the SATA SSD, we compared Write Amplification Factors (WAF) only for the NVMe SSDs.
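The Write Amplification Factor compared here is conventionally the ratio of data physically written to NAND to data written by the host, which is why NAND write counts are needed to compute it. A minimal sketch follows; the function names and the counter values in the example are made up for illustration and are not measurements from this benchmark.

```python
def write_amplification_factor(host_bytes_written: int, nand_bytes_written: int) -> float:
    """WAF = bytes physically written to NAND / bytes written by the host."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


def extra_write_fraction(waf: float) -> float:
    """Share of extra (garbage-collection) writes relative to host writes."""
    return waf - 1.0


if __name__ == "__main__":
    # Made-up counters: the host wrote 1000 GiB, the drive wrote 1600 GiB to NAND.
    gib = 1024 ** 3
    waf = write_amplification_factor(1000 * gib, 1600 * gib)
    print(f"WAF = {waf:.2f}, extra writes = {extra_write_fraction(waf):.0%} of host writes")
```

A WAF of 1.0 would mean the drive wrote nothing beyond what the host requested; anything above that is overhead from garbage collection and other internal activity.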

Benchmark Results for 20% Read/80% Update

The benchmark workload of 20% read and 80% update was run for an operation count of 50 Million (i.e., 10 Million reads and 40 Million updates).

Figure 1. 20/80 Workload Throughput and WAF Performance Benefit (panels: Throughput, as relative improvement; WAF for the PM953 and PM953S, annotated "54% Lifetime Increase")

The throughput increase with the Samsung PM953S NVMe SSD (with multi-stream) was 4.4x compared to the SATA SSD (i.e., the benchmark runtime decreased by more than 77.2% when the Samsung NVMe SSD with multi-stream support was used). The PM953S NVMe SSD multi-stream feature increased throughput by 15.45% over the PM953 NVMe SSD (without the multi-stream feature).

Clearly, multi-streaming has a profound effect on the Write Amplification Factor. It reduced extra data writes by 93%, thereby increasing the lifetime of the SSD by 54%.
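The link between the 93% drop in extra writes and the 54% lifetime gain can be worked through with a short sketch. It assumes that, for a fixed host workload and a fixed program/erase budget, SSD lifetime is inversely proportional to WAF. The WAF inputs (1.60 without multi-stream, 1.04 with it) are illustrative assumptions chosen to be consistent with the reported percentages; they are not values quoted from the paper, and the function names are hypothetical.

```python
def extra_write_reduction(waf_before: float, waf_after: float) -> float:
    """Fractional drop in extra writes (WAF - 1) when WAF falls from before to after."""
    if waf_before <= 1.0:
        raise ValueError("waf_before must exceed 1.0 for extra writes to exist")
    return ((waf_before - 1.0) - (waf_after - 1.0)) / (waf_before - 1.0)


def lifetime_increase(waf_before: float, waf_after: float) -> float:
    """Fractional lifetime gain, assuming lifetime is proportional to 1 / WAF."""
    if waf_after <= 0:
        raise ValueError("waf_after must be positive")
    return waf_before / waf_after - 1.0


if __name__ == "__main__":
    before, after = 1.60, 1.04  # assumed WAF values, for illustration only
    print(f"extra writes reduced by {extra_write_reduction(before, after):.1%}")  # 93.3%
    print(f"lifetime increased by {lifetime_increase(before, after):.1%}")        # 53.8%
```

Under these assumptions, a small absolute change in WAF produces a large relative change in extra writes, because only the portion above 1.0 counts as overhead.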

Figure 2. 20/80 Workload Average Read and Update Latency Performance Benefit (panels: Avg Read Latency and Avg Update Latency, each plotted as relative latency)

Compared with the SATA SSD, the average read latency was reduced by 79% with the multi-stream-enabled PM953S NVMe SSD and by 76% with the PM953 NVMe SSD. The average update latency was reduced by 51% with the multi-stream-enabled PM953S NVMe SSD and by 42% with the PM953 NVMe SSD. The 99th percentile read and update latencies were reduced by 7% and 54%, respectively, with the multi-stream-enabled PM953S NVMe SSD when compared with the SATA SSD.

Benchmark Results for 50% Read/50% Update

The benchmark workload of 50% read and 50% update was run for an operation count of 50 Million (i.e., 25 Million reads and 25 Million updates).

The throughput increase with the Samsung PM953S NVMe SSD (with multi-streaming) was 4x compared to the SATA SSD (i.e., the benchmark runtime decreased by more than 75% when using the Samsung PM953S NVMe SSD with multi-stream support). The PM953S NVMe SSD multi-stream feature increased throughput by 4.6% over the PM953 NVMe SSD (without multi-streaming). With a multi-stream-enabled SSD, extra writes decreased by roughly 90%, thereby extending the SSD's lifetime by 39%.
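The runtime reductions quoted alongside each throughput multiple follow from simple arithmetic, sketched below. The sketch assumes a fixed operation count, so that runtime is inversely proportional to throughput; the function name is hypothetical. The speedups used are the 4.4x, 4x, and 3.74x figures reported for the multi-stream drive.

```python
def runtime_reduction(speedup: float) -> float:
    """Fraction of runtime saved when throughput rises by `speedup` for a fixed operation count."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    for speedup in (4.4, 4.0, 3.74):
        print(f"{speedup}x -> {runtime_reduction(speedup):.1%} less runtime")
    # 4.4x -> 77.3% less runtime
    # 4.0x -> 75.0% less runtime
    # 3.74x -> 73.3% less runtime
```

These values line up with the "more than 77.2%", "more than 75%", and "more than 73%" runtime decreases stated in the results.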

Figure 3. 50/50 Workload Throughput and WAF Performance Benefit (panels: Throughput, as relative improvement; WAF for the PM953 and PM953S, annotated "39% Lifetime Increase")

Figure 4. 50/50 Workload Average Read and Update Latency Performance Benefit (panels: Avg Read Latency and Avg Update Latency, each plotted as relative latency)

Compared with the SATA SSD, the average read latency decreased by 76% with the multi-stream-enabled PM953S NVMe SSD and by 75% with the PM953 NVMe SSD. The average update latency decreased by 37% with the multi-stream-enabled PM953S NVMe SSD and by 30% with the PM953 NVMe SSD. The 99th percentile read and update latencies were reduced by 95% and 47%, respectively, with the multi-stream-enabled PM953S NVMe SSD when compared with the SATA SSD.

Benchmark Results for 80% Read/20% Update

The benchmark workload of 80% read and 20% update was run for an operation count of 50 million (i.e., 40 million reads and 10 million updates).

Figure 5. 80/20 Workload Throughput and WAF Performance Benefit (panels: Throughput, as relative improvement; WAF for the PM953 and PM953S, annotated "27% Lifetime Increase")

Throughput with the Samsung PM953S NVMe SSD (with multi-stream) increased 3.74x compared to the SATA SSD (i.e., the benchmark runtime decreased by more than 73% using the Samsung PM953S NVMe SSD with multi-stream support). In addition, the multi-stream-enabled Samsung NVMe SSD reduced extra writes by roughly 90%, thereby extending the lifetime of the SSD by 27%.

Figure 6. 80/20 Workload Average Read and Update Latency Performance Benefit (panels: Avg Read Latency and Avg Update Latency, each plotted as relative latency)

Compared with the SATA SSD, the average read latency decreased by 74% with the multi-stream-enabled PM953S NVMe SSD and by 73% with the PM953 NVMe SSD. The average update latency decreased by 52% with the multi-stream-enabled PM953S NVMe SSD and by 51% with the PM953 NVMe SSD. The 99th percentile read and update latencies were reduced by 64% and 53%, respectively, with the multi-stream-enabled PM953S NVMe SSD when compared with the SATA SSD.

Conclusion

These benchmark results were obtained without significant tweaking or optimization and, therefore, are very achievable. I/O stack optimization opportunities still exist, which could enable higher performance, inching towards the maximum NVMe SSD hardware capability. Also, this whitepaper did not explore the power reduction benefits of the power-optimized Samsung PM953 (M.2 form factor).

Benchmark results show that using the Samsung PM953S NVMe multi-stream SSD instead of a SATA SSD in this application configuration provides:

4.4x overall IOPS throughput improvement
54% SSD lifetime increase (compared to PM953 NVMe SSDs)
79% read latency reduction

This throughput increase and runtime reduction clearly indicate that, to deliver better performance, real-world applications such as RocksDB can exploit faster NVMe devices that provide storage intelligence such as the T10-standardized multi-stream feature. Using such devices also sharply extends an SSD's lifetime.