Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

Kefei Wang and Feng Chen, Louisiana State University. SoCC '18, Carlsbad, CA.

Key-value Systems in Internet Services. Key-value systems are widely used today: online shopping, social media, cloud storage, big data. Example: a key such as a Product_ID or URL maps to a value such as a Product_Name or an image.

Key-value Caching. The first line of defense in today's Internet services: high throughput, low latency. Client requests arrive at the web server, which issues SET, GET, and DELETE operations to the cache server. A hit is served from the cache server; a miss falls through to the database server.

Flash-based Key-value Caching. In-flash key-value caches store key-values in commercial flash SSDs. Examples: Facebook's McDipper, Twitter's Fatcache. Key features: Memcached compatible (SET, GET, DELETE). Advantages: low cost and high performance. McDipper reduced deployed servers by 90%, with 90% of GETs under 1 ms.* Compared with DRAM, flash has lower speed, but also lower power, lower cost, much higher capacity, and persistency. *https://www.facebook.com/notes/facebook-engineering/mcdipper-a-key-value-cache-for-flash-storage/10151347090423920/

Flash-based Key-value Caching. Data is stored in flash, and all the mappings are kept in DRAM. A hash-based mapping structure in DRAM points to key-value slabs on the flash SSD: each mapping entry records MD[20] (a 20-byte key digest), Slab_ID, Slot_ID, and Expiry, locating a slot within a slab.
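To make the entry layout concrete, here is a minimal C sketch of such an in-DRAM mapping entry; the field names and widths are assumptions modeled on the slide, not Fatcache's actual definitions (the next slide quotes 44 bytes per entry in Fatcache):

```c
#include <stdint.h>

/* Hypothetical in-DRAM mapping entry, modeled on the slide:
 * a 20-byte key digest plus the location of the value in flash.
 * Field widths are illustrative, not Fatcache's real layout. */
struct mapping_entry {
    uint8_t  md[20];     /* SHA-1 digest of the key */
    uint32_t slab_id;    /* which flash slab holds the item */
    uint32_t slot_id;    /* which slot within that slab */
    uint32_t expiry;     /* expiration time (epoch seconds) */
    /* Bookkeeping (e.g., hash-chain pointers) brings the total
     * toward the ~44 bytes per entry quoted for Fatcache. */
};
```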

Scalability Challenge: High Index-to-data Ratio. Key-value caches are dominated by small items (90% are under 500 bytes), yet each key-value mapping entry takes 44 bytes in Fatcache. Flash memory vs. DRAM memory: Capacity: a flash cache is 10-100x larger than a memory-based cache. Price: 1 TB of flash costs $200-500, while 1 TB of DRAM costs over $10,000. Growth: flash capacity grows 50-60% per year, DRAM only 25-40% per year. Assuming an average key-value size of 300 bytes, indexing a 1-TB flash cache takes about 150 GB of DRAM, and a 2-TB flash cache about 300 GB (2 TB / 300 B is roughly 6.7 billion entries, times 44 B per entry). A technical dilemma: we have plenty of flash space to cache the data, but we don't have enough DRAM to index the data. [Atikoglu et al., Workload Analysis of a Large-scale Key-value Store, SIGMETRICS '12]

Evolution of Key-value Caching. Stage 1: mapping table and key-value slabs both in DRAM (zero flash I/Os per lookup). Stage 2: mapping table in DRAM, key-value slabs in flash (one flash I/O). Stage 3: mappings split between DRAM and flash, key-value slabs in flash (up to N flash I/Os). The idea: leverage the strong locality to differentiate hot and cold mappings; hold the most popular mappings in a small in-DRAM mapping structure, and leave the majority of mappings in a large in-flash mapping structure, as sketched below.
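A minimal sketch of the resulting lookup path, assuming hypothetical per-tier probe functions (the real tiers are detailed in the slides that follow):

```c
#include <stdbool.h>
#include <stdint.h>

struct mapping_entry;  /* see the entry sketch above */

/* Hypothetical tier probes; each returns true and fills *out
 * if the key's mapping is found at that tier. */
bool tier1_lookup_dram(const uint8_t md[20], struct mapping_entry *out);
bool tier2_scan_flash(const uint8_t md[20], struct mapping_entry *out);
bool tier3_walk_flash(const uint8_t md[20], struct mapping_entry *out);

/* Cascade lookup: hot mappings resolve in DRAM with zero flash
 * I/Os; colder mappings cost one or more flash reads. */
bool cascade_lookup(const uint8_t md[20], struct mapping_entry *out)
{
    if (tier1_lookup_dram(md, out)) return true;   /* hot  */
    if (tier2_scan_flash(md, out))  return true;   /* warm */
    return tier3_walk_flash(md, out);              /* cold */
}
```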

Outline: cascade mapping design, optimizations, evaluation results, conclusions.

Cascade Mapping: Hierarchical Mapping Structure. Tier 1 (hot mappings): hash-index-based search in memory. Tier 2 (warm mappings): high-bandwidth quick scan in flash. Tier 3 (cold mappings): efficient linked-list structure in flash. Tier 1 lives in memory space; Tiers 2 and 3 live in flash space, alongside the key-value slabs.

Tier 1: A Mapping Table in Memory. The key is hashed to a bucket (Bucket 0 through Bucket n), and buckets are grouped into partitions (Partition 1 through Partition n). Each partition maintains a virtual buffer of entries to be demoted to Tier 2. [Figure: hit ratio (%) vs. ratio of Tier 1 (4-20%), comparing CLOCK, LRU, and FIFO demotion policies.]
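The optimization list later names a memory-efficient CLOCK-based demotion policy for Tier 1. Here is a minimal sketch of how such a policy might select demotion victims within one partition; the structure names and the virtual-buffer handoff are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

struct t1_entry {
    struct mapping_entry m;   /* the mapping payload (see above) */
    uint8_t referenced;       /* CLOCK bit, set on each hit      */
};

/* One Tier-1 partition: a circular array swept by a clock hand. */
struct t1_partition {
    struct t1_entry *entries;
    size_t           nentries;
    size_t           hand;    /* current clock-hand position */
};

/* Sweep the clock hand: recently referenced entries get a second
 * chance; the first unreferenced entry becomes the demotion victim,
 * to be appended to the partition's virtual buffer for Tier 2. */
size_t clock_pick_victim(struct t1_partition *p)
{
    for (;;) {
        struct t1_entry *e = &p->entries[p->hand];
        size_t pos = p->hand;
        p->hand = (p->hand + 1) % p->nentries;
        if (e->referenced)
            e->referenced = 0;   /* second chance */
        else
            return pos;          /* demote this entry */
    }
}
```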

Tier 2: Direct Indexing in Flash. A direct mapping block holds a set of mapping entries demoted from Tier 1. The blocks form a FIFO array, so the most recent version of a mapping is always in the latest position. A serial search costs one I/O time T per block (e.g., 3x T to find an entry three blocks deep). Parallelized batch search instead issues parallel I/Os to load multiple mapping blocks into memory, then scans for the most recent version of the data in one I/O time (1x T), exploiting SSD internal parallelism. [Chen et al., Internal Parallelism of Flash-based Solid State Drives, ACM Transactions on Storage 12(3), May 2016]
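As an illustration of the parallelized batch search, here is a sketch using POSIX AIO to issue a batch of block reads at once; the block size, batch width, and I/O interface are assumptions, not the paper's implementation:

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define BLOCK_SIZE 4096   /* assumed mapping-block size */
#define BATCH      4      /* blocks fetched per batch   */

/* Issue BATCH reads at once so the SSD services them in parallel,
 * then wait for all of them; total latency is ~1x T instead of
 * BATCH x T for a serial scan. Offsets index the FIFO block array. */
int batch_read_blocks(int fd, const off_t offsets[BATCH],
                      char bufs[BATCH][BLOCK_SIZE])
{
    struct aiocb cbs[BATCH];
    const struct aiocb *list[BATCH];

    for (int i = 0; i < BATCH; i++) {
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf    = bufs[i];
        cbs[i].aio_nbytes = BLOCK_SIZE;
        cbs[i].aio_offset = offsets[i];
        if (aio_read(&cbs[i]) != 0)
            return -1;
        list[i] = &cbs[i];
    }
    /* Wait until every read in the batch has completed. */
    for (;;) {
        int pending = 0;
        for (int i = 0; i < BATCH; i++)
            if (aio_error(&cbs[i]) == EINPROGRESS)
                pending = 1;
        if (!pending)
            break;
        if (aio_suspend(list, BATCH, NULL) != 0 && errno != EINTR)
            return -1;
    }
    for (int i = 0; i < BATCH; i++)
        if (aio_return(&cbs[i]) != (ssize_t)BLOCK_SIZE)
            return -1;
    return 0;
}
```

Once the batch is in memory, scanning from the newest block to the oldest and taking the first match yields the most recent version, by the FIFO ordering described above.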

Tier 3: Hash Table List Designs. A narrow hash table (e.g., Buckets 0-1023) means a long list to walk through per lookup, but needs fewer memory buffers (e.g., 128 MB). A wide hash table (e.g., Buckets 0-1048575) means a short list to walk through, but needs more memory buffers (e.g., 128 GB). The design must trade off memory efficiency vs. I/O efficiency.

Tier 3: Dual-mode Hash Table. Writes first go through one set of dynamic buffers into a narrow active table (Buckets 0-1023); when a bucket's list reaches its length limit, compaction reorganizes it into a wide inactive table (Buckets 0-1048575) using dedicated buffers. Writing to the active list first and reorganizing into the inactive list combines the advantages: memory efficiency and I/O efficiency are both achieved.
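A structural sketch of the dual-mode arrangement; the bucket counts come from the slide, while the length limit and the on-flash list representation are assumptions:

```c
#include <stdint.h>

#define ACTIVE_BUCKETS   1024u      /* narrow table: cheap buffers */
#define INACTIVE_BUCKETS 1048576u   /* wide table: short lists     */
#define LIST_LIMIT       64u        /* assumed per-bucket length limit */

/* On-flash linked list of mapping blocks, identified by address. */
struct flash_list {
    uint64_t head_block;   /* address of the newest block in flash */
    uint32_t nblocks;      /* current list length                  */
};

struct tier3 {
    /* Narrow active table: absorbs writes via one set of
     * dynamic in-memory buffers. */
    struct flash_list active[ACTIVE_BUCKETS];
    /* Wide inactive table: read-optimized short lists produced
     * by compaction. */
    struct flash_list *inactive;   /* INACTIVE_BUCKETS entries */
};

/* When an active bucket hits its length limit, its entries are
 * re-hashed into the wide inactive table (compaction), keeping
 * both memory use and per-lookup I/O small. */
static inline int needs_compaction(const struct tier3 *t3, uint32_t b)
{
    return t3->active[b].nblocks >= LIST_LIMIT;
}
```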

Outline: cascade mapping design, optimizations, evaluation results, conclusions.

Optimization Techniques. Partition the hash space to create multiple demotion I/O streams. Adopt a memory-efficient CLOCK-based demotion policy. Organize an array of direct mapping blocks in FIFO order. Use parallel batch search to quickly complete a one-to-one scan. Use a dual-mode hash table for both memory and I/O efficiency. Build a jump list using Bloom filters to skip impossible blocks. Make the FIFO-based eviction policy locality-aware. Use a slab sequence counter to realize zero-I/O demapping. Leverage the FIFO nature of slabs for efficient crash recovery.

Optimization: Jump List. A Bloom filter tests whether an element is in a set: a query returns either "possibly in set" or "definitely not in set"; false positives are possible, but false negatives are impossible; elements can be added to the set but not removed. In Tier 3, one single long hash-bucket list is broken into several short lists connected by hops. Bloom filters are stored in flash together with the regular mapping blocks and indicate whether a mapping can be found within the next several blocks; if a filter returns negative, the search jumps to the next Bloom filter block, avoiding unnecessary Tier-3 I/Os.
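A minimal Bloom-filter sketch of the membership test used at each hop; the filter size, hash count, and hash function are assumptions (the slide does not give the paper's parameters):

```c
#include <stdbool.h>
#include <stdint.h>

#define BF_BITS   (1u << 15)   /* assumed filter size: 4 KB */
#define BF_HASHES 4

typedef struct { uint8_t bits[BF_BITS / 8]; } bloom_t;

/* FNV-1a with a per-hash seed; any independent hashes work here. */
static uint32_t bf_hash(const uint8_t md[20], uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (int i = 0; i < 20; i++) { h ^= md[i]; h *= 16777619u; }
    return h % BF_BITS;
}

void bloom_add(bloom_t *bf, const uint8_t md[20])
{
    for (uint32_t s = 0; s < BF_HASHES; s++) {
        uint32_t b = bf_hash(md, s);
        bf->bits[b >> 3] |= (uint8_t)(1u << (b & 7));
    }
}

/* false => definitely not in the next group of mapping blocks,
 * so the walk can jump straight to the next Bloom filter block. */
bool bloom_maybe_contains(const bloom_t *bf, const uint8_t md[20])
{
    for (uint32_t s = 0; s < BF_HASHES; s++) {
        uint32_t b = bf_hash(md, s);
        if (!(bf->bits[b >> 3] & (1u << (b & 7))))
            return false;
    }
    return true;
}
```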

Optimization: Garbage Collection. GC is a must-have for key-value systems: it reclaims flash space and organizes large sequential writes. Traditional approach: free up space immediately by erasing an entire victim slab in FIFO order; this reclaims space quickly but may delete hot data. Our solution: keep hot data in cache. If a key-value item's mapping is in Tier 1, that indicates it is hot data; rewrite the hot data to a new slab, then erase the victim slab. Adaptive two-phase GC: if free flash space is too low, perform fast space reclaim; keep hot data when the system is under moderate pressure.
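A sketch of the adaptive two-phase decision; the watermark threshold and helper routines are hypothetical:

```c
#include <stdint.h>

/* Hypothetical threshold: below LOW_WATERMARK, reclaim space as
 * fast as possible; otherwise take time to preserve hot data. */
#define LOW_WATERMARK 0.05   /* fraction of flash still free */

struct slab;  /* opaque victim slab, chosen in FIFO order */

int      mapping_in_tier1(const struct slab *s, uint32_t slot); /* hot? */
void     rewrite_to_new_slab(struct slab *s, uint32_t slot);
void     erase_slab(struct slab *s);
double   free_space_ratio(void);
uint32_t slab_slot_count(const struct slab *s);

void gc_one_slab(struct slab *victim)
{
    if (free_space_ratio() < LOW_WATERMARK) {
        /* Phase 1: fast space reclaim, like traditional FIFO GC. */
        erase_slab(victim);
        return;
    }
    /* Phase 2: locality-aware reclaim. Items whose mappings still
     * sit in Tier 1 are hot; copy them to a new slab first. */
    for (uint32_t slot = 0; slot < slab_slot_count(victim); slot++)
        if (mapping_in_tier1(victim, slot))
            rewrite_to_new_slab(victim, slot);
    erase_slab(victim);
}
```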

Outline: cascade mapping design, optimizations, evaluation results, conclusions.

Experimental Setup. Implementation: SlickCache, 3,800 lines of C code added to Twitter's Fatcache. Hardware environment: Lenovo ThinkServers with a 4-core Intel Xeon 3.4 GHz CPU and 16 GB DRAM; a 240-GB Intel 730 SSD as the cache device; a 280-GB Intel Optane 900P SSD as the swapping device; a 7,200-RPM Seagate 2-TB HDD as the database device. Software environment: Ubuntu 16.04 with Linux kernel 4.12 and the Ext4 file system; MongoDB 3.4 as the backend database. Workloads: Yahoo! Cloud Serving Benchmark (YCSB) with popular distributions: Hotspot, Zipfian, and Normal.

Evaluation Results: comparison with Fatcache and system swapping (chart callouts: 7x, 2x). Fatcache-Swap-Flash and Fatcache-Swap-Optane are both configured with 10% of physical memory and allowed to swap on the flash SSD and the Optane SSD, respectively.

Evaluation Results: cache effectiveness with a fixed cache size (chart callout: 85%). SlickCache uses only 10% of the memory used by Fatcache while achieving comparable performance. SlickCache-GC increases throughput by up to 85% due to the optimized GC policy.

Evaluation Results: cache effectiveness with a fixed memory size (chart callout: 125x). SlickCache is able to index a 10x larger flash cache with the same amount of memory, which in turn increases the hit ratio by up to 8.2x and the throughput by up to 125x.

Conclusions. Cascade Mapping is a hierarchical mapping structure for flash-based key-value caching, paired with a set of optimizations to improve performance. It uses less memory while performing better than the current design.

Thanks! And questions?