Optimizing Flash-based Key-value Cache Systems


Optimizing Flash-based Key-value Cache Systems
Zhaoyan Shen, Feng Chen, Yichen Jia, Zili Shao
Department of Computing, Hong Kong Polytechnic University
Computer Science & Engineering, Louisiana State University
The 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage '16)

Key-value Caches
- Key-value access is dominant in web services; many apps simply store and retrieve key-value pairs
- The key-value cache is the first line of defense
  - Benefits: improved throughput, reduced latency, reduced server load
- In-memory KV caches are popular (e.g., Memcached)
  - High speed, but with cost, power, and capacity problems

Flash-based Key-value Cache
- An in-memory hash index tracks the key-to-value mapping: SHA-1(key) → (slab, slot)
- Slabs are used in a log-structured way over the logical flash space
- An updated value item is written to a new location; old values are recycled later
[Diagram: key → SHA-1 → in-memory hash index → (slab, slot) → key-value slabs in the log-structured logical flash space (flash LBAs)]
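The index-plus-log-structured-slab organization above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the `SlabCache` class is hypothetical and its tiny slab size is for demonstration only (real slabs hold thousands of items):

```python
import hashlib

class SlabCache:
    """Sketch of a flash-based KV cache: an in-memory hash index maps each
    key digest to a (slab, slot) location; slabs are filled append-only
    (log-structured), so an update goes to a new slot and the old slot
    becomes garbage to be recycled later by GC."""

    SLOTS_PER_SLAB = 4  # tiny for illustration; real slabs hold far more

    def __init__(self):
        self.index = {}    # SHA-1(key) -> (slab_id, slot)
        self.slabs = [[]]  # slab_id -> list of (key_digest, value)

    @staticmethod
    def _digest(key):
        return hashlib.sha1(key.encode()).digest()

    def set(self, key, value):
        d = self._digest(key)
        slab = self.slabs[-1]
        if len(slab) == self.SLOTS_PER_SLAB:  # current slab full: open a new one
            self.slabs.append([])
            slab = self.slabs[-1]
        slab.append((d, value))               # append-only (log-structured) write
        self.index[d] = (len(self.slabs) - 1, len(slab) - 1)  # old slot is now stale

    def get(self, key):
        loc = self.index.get(self._digest(key))
        if loc is None:
            return None
        slab_id, slot = loc
        return self.slabs[slab_id][slot][1]
```

An overwrite simply appends the new value and repoints the index entry; the stale slot stays behind in its slab until garbage collection reclaims it.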

Critical Issues
- Redundant mapping
- Double garbage collection
- Over-over-provisioning

Critical Issue 1: Redundant Mapping
Redundant mapping exists at both the application and FTL levels:
- KVC: an in-memory hash table (key → slab, offset)
- FTL: an on-device page mapping table (LBA → PBA)
Problems:
- Two mapping structures (unnecessarily) co-exist at two levels
- A significant waste of on-device DRAM space (e.g., 1 GB for 1 TB)
- The on-device DRAM is costly and unreliable, and could otherwise be used for buffering
[Diagram: the KVC software maps SHA-1(key) into the slab space via its hash table, while the FTL hardware redundantly maps each LBA to a PBA]
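The "1 GB for 1 TB" figure is consistent with a page-level FTL mapping table. A back-of-the-envelope check, assuming the common 4 KB mapping granularity and 4-byte table entries (the slide does not spell these out):

```python
flash_capacity = 1 << 40     # 1 TB of flash (2^40 bytes)
page_size = 4 << 10          # 4 KB pages: a typical page-mapping granularity
entry_size = 4               # 4 bytes per LBA -> PBA entry

entries = flash_capacity // page_size   # one table entry per logical page
table_bytes = entries * entry_size
print(table_bytes == 1 << 30)           # -> True: 1 GB of on-device DRAM
```

Unified direct mapping lets the host's must-have hash table do this job instead, freeing the device-side DRAM entirely.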

Critical Issue 2: Double Garbage Collection
Garbage collection (GC) runs at both the application and FTL levels:
- KVC: recycles deleted or changed key-value items
- FTL: recycles trimmed or changed pages
Problems:
- The semantic validity of a key-value entry is invisible to the FTL
- Redundant data copy operations
[Diagram: KVC-level GC copies live items within the slab space, while FTL-level GC independently copies the same pages between PBAs]

Critical Issue 3: Over-over-provisioning
Over-provisioning at the FTL level:
- The FTL reserves a portion (20–30%) of the flash as Over-Provisioning Space (OPS)
- The OPS is invisible and unusable to host applications
Problems:
- OPS is sized for the worst-case (write-intensive) situation, as in general-purpose storage
- This over-over-provisions for key-value caches:
  - Key-value caches are dominated by read (GET) traffic, not writes
  - The key-value cache hit ratio is highly sensitive to the usable cache size
  - If the 20–30% of space could be released, the cache hit ratio could be greatly improved
[Diagram: a slice of the PBA space is reserved as OPS and is unusable by the cache]

The Semantic Gap Problem
A semantic gap separates the key-value cache from the flash SSD:
- The key-value cache knows the key-to-value mapping and the validity of slab entries, and wants fine-grained GC
- The flash SSD controls the physical data layout, direct flash memory access, and the mapping granularity
In the current SW/HW architecture, we can do little to address these three issues.

Optimization Goals
- Redundant mapping → single mapping
- Double garbage collection → app-driven GC
- Over-over-provisioning → dynamic OPS

Design Overview
Our design has three components:
- An enhanced flash-aware key-value cache: slab manager, key/slab mapping, KVC-driven GC, cache manager, and OPS management
- A thin intermediate library layer (libssd): slab/flash translation, operation conversion, garbage collection, and bad block management
- A specialized flash memory PCI-E SSD hardware: an open-channel SSD whose raw flash operations are exposed through the device driver
[Diagram: the conventional stack (key-value cache → OS page cache and I/O scheduling → SSD FTL with page mapping, GC, wear leveling, and OPS management) is replaced by key-value cache → libssd → open-channel SSD]

An Enhanced Flash-aware Key-value Cache
- Slab management
- Unified direct mapping
- Garbage collection
- OPS management

Slab Management
Our choices:
- Directly use a flash block as a slab (8 MB)
- Slab buffer: one in-memory slab is maintained for each slab class
- Parallelize and asynchronize the slab write I/Os to the flash memory
- Allocate in-flash slabs round-robin for load balancing across channels
[Diagram: key/value items fill per-class in-memory slab buffers, which are flushed to slabs spread across the SSD's channels]
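The round-robin placement policy can be sketched as below; the `SlabAllocator` class and the channel count are illustrative assumptions, not the paper's code:

```python
import itertools

NUM_CHANNELS = 4        # illustrative; the paper's device has 12 channels
SLAB_BYTES = 8 << 20    # a slab is one flash block (8 MB in the paper)

class SlabAllocator:
    """Sketch of round-robin in-flash slab allocation: each newly flushed
    slab is placed on the next channel in turn, so slab writes are spread
    evenly and can proceed in parallel across channels."""

    def __init__(self, num_channels=NUM_CHANNELS):
        self._next_channel = itertools.cycle(range(num_channels))
        self.per_channel = [0] * num_channels  # slabs allocated per channel

    def allocate_slab(self):
        ch = next(self._next_channel)
        self.per_channel[ch] += 1
        return ch  # real code would return (channel, block) and issue the write

alloc = SlabAllocator()
channels = [alloc.allocate_slab() for _ in range(8)]
print(channels)            # -> [0, 1, 2, 3, 0, 1, 2, 3]
print(alloc.per_channel)   # -> [2, 2, 2, 2]  (load balanced)
```

Because each flush targets a different channel, consecutive slab writes do not queue behind one another, which is what makes the asynchronous, parallel slab I/O effective.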

Unified Direct Mapping
Collapse multiple levels of indirect mapping into a single one:
- Prior mapping: key → slab → LBA → PBA
- Current mapping: key → slab (i.e., flash block) and offset
Benefits:
- Removes the time overhead of looking up the intermediate layers (speed +)
- Only the single must-have in-memory hash table is needed (cost −)
- On-device RAM can be completely removed (or used for other purposes)
[Diagram: SHA-1(key) → (slab, offset) in the hash table now points directly into flash memory, eliminating the FTL mapping table]

App-driven Garbage Collection
A single GC is driven directly by the key-value cache system:
- All slab writes are in units of flash blocks (no need for device-level GC)
- GC is directly triggered and controlled by the application-level KVC
Two GC policies:
- Greedy GC: evict the least-occupied slab, so that a minimum number of valid slots must be moved
- Quick clean: immediately drop the LRU slab without recycling its valid slots
- The two policies are used adaptively for different circumstances (busy or idle)
[Diagram: greedy GC copies the valid slots of a victim slab to a target slab; quick clean drops the LRU slab outright]
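The two policies differ only in how the victim is chosen and in whether valid slots are copied out. A hedged sketch, in which the dict-based slab representation is hypothetical:

```python
def greedy_gc(slabs):
    """Greedy GC: pick the least-occupied slab as the victim so the fewest
    valid slots must be copied to a target slab before erasing it."""
    victim = min(slabs, key=lambda s: len(s["valid"]))
    copied = list(victim["valid"])   # valid slots migrate to a target slab
    victim["valid"].clear()          # victim is now empty and erasable
    return victim["id"], copied

def quick_clean(slabs, lru_order):
    """Quick clean: drop the LRU slab outright; valid slots are discarded,
    not copied. Cheap under write pressure, since a cache can always
    re-fetch dropped items from the backing store."""
    victim_id = lru_order[0]
    victim = next(s for s in slabs if s["id"] == victim_id)
    dropped = len(victim["valid"])
    victim["valid"].clear()
    return victim_id, dropped

slabs = [
    {"id": 0, "valid": ["k1", "k2", "k3"]},
    {"id": 1, "valid": ["k4"]},
    {"id": 2, "valid": ["k5", "k6"]},
]
victim, moved = greedy_gc(slabs)
print(victim, moved)   # -> 1 ['k4']  (least-occupied slab; one slot copied)
```

Greedy GC minimizes copy traffic when the system is idle; quick clean trades hit ratio for immediate free space when the system is busy.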

Over-Provisioning Space (OPS) Management
Dynamically tune the OPS size online:
- Rationale: a KVC is typically read-intensive, so the OPS can be small
- Goal: keep just enough OPS, adapting to the intensity of writes
OPS management policy (heuristic): an OPS window (W_L, W_H) estimates the needed size
- Low watermark (W_L) hit → trigger quick clean and raise the OPS window
- High watermark (W_H) hit → the OPS is over-allocated; lower the OPS window
[Figure: available OPS oscillates between W_L and W_H over time, bounded by the maximum OPS; quick clean is initiated when the low watermark is hit]
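The watermark heuristic can be sketched as a small controller. The class name, step size, and units here are assumptions for illustration, not the paper's parameters:

```python
class OpsWindow:
    """Sketch of the OPS watermark heuristic: free OPS is kept inside a
    window [w_low, w_high]. Hitting the low watermark means writes are
    outpacing cleaning, so quick clean fires and the window is raised;
    hitting the high watermark means OPS is over-allocated, so the window
    is lowered and the space returns to the usable cache."""

    def __init__(self, w_low, w_high, step, max_ops):
        self.w_low, self.w_high = w_low, w_high
        self.step, self.max_ops = step, max_ops

    def on_free_ops_change(self, free_ops):
        if free_ops <= self.w_low:
            # Writes are consuming OPS: reclaim slabs and grow the window.
            self.w_low = min(self.w_low + self.step, self.max_ops - self.step)
            self.w_high = min(self.w_high + self.step, self.max_ops)
            return "quick_clean"
        if free_ops >= self.w_high:
            # Too much idle OPS: shrink the window, give space to the cache.
            self.w_low = max(self.w_low - self.step, 0)
            self.w_high = max(self.w_high - self.step, self.step)
            return "shrink_ops"
        return "steady"
```

A write burst drives free OPS down through W_L, repeatedly triggering quick clean and pushing the window up; once the burst passes, free OPS drifts above W_H and the reserved space shrinks back toward the read-mostly steady state.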

Preliminary Experiments
Implementation:
- Key-value cache: Twitter's Fatcache, modified to fit our hardware
- libssd library: 621 lines of C code
Experimental setup:
- Intel Xeon E-1225, 32 GB memory, 1 TB disk, open-channel SSD
- Ubuntu 14.04 LTS, Linux 3.17.8, Ext4 file system
- Hardware: a PCI-E based open-channel SSD with 12 channels and 192 LUNs
- Direct control of the device via the ioctl interface

SET Throughput and Latency
[Figure: SET throughput and latency, our scheme vs. stock Fatcache on several SSDs]
- SET workload: 40 million requests of 1 KB key/value pairs
- Our scheme achieves the best SET throughput and latency of the compared configurations

Conclusion
- KV caches have become critical: they are among the most important building blocks in modern web infrastructures and high-performance data-intensive applications
- We build a highly efficient flash-based cache system that enables three benefits: a unified single-level direct mapping, a cache-driven fine-grained garbage collection, and an adaptive over-provisioning scheme
- We are implementing a prototype on the open-channel SSD hardware; our preliminary results show that it is highly promising

Thank You!

Backup Slides

GET Throughput and Latency
[Figure: GET throughput and latency, our scheme vs. stock Fatcache on several SSDs]
- GET performance is largely determined by the raw speed of the device
- Our GET latencies are among the lowest in the set of SSDs

SET Latencies: A Zoom-in View
[Figure: zoomed-in SET latency timelines, our scheme vs. stock Fatcache + KingSpec SSD]
- Direct flash control successfully removes the high-cost GC effects
- Quick clean removes long I/Os under high write pressure

Block Erase Count
[Figure: block erase counts (up to ~3,500) for SSDSim vs. the open-channel SSD; quick clean kicks in partway through the run]
- A block trace was collected by running Fatcache on a Samsung SSD
- The trace was replayed on MSR's SSD simulator to obtain erase counts
- Our solution reduces the erase count by 30%

Effect of the In-memory Slab Buffer
- A 10× difference in buffer size does not affect latency significantly
- A small (56 MB) in-memory slab buffer is sufficient