SILT: A Memory-Efficient, High-Performance Key-Value Store

SILT: A Memory-Efficient, High-Performance Key-Value Store (SOSP '11). Presented by Fan Ni, March 2016.

SILT (Small Index Large Tables) is a memory-efficient, high-performance key-value store built on flash storage. It is the design and implementation of three basic key-value stores:
LogStore (writable): writes PUT and DELETE operations sequentially to flash to achieve high write throughput; indexed by an in-memory cuckoo hash table.
HashStore (read-only): once a LogStore fills, SILT freezes it and converts it into this more memory-efficient data structure; indexed by an in-memory hash filter.
SortedStore (read-only): stores (key, value) entries sorted by key on flash, indexed by a new entropy-coded trie data structure; the entropy-coded representation keeps in-memory metadata to a minimum.

Moving from LogStore to HashStore to SortedStore, the in-memory metadata overhead decreases while the store size increases.

Tradeoff between in-memory metadata overhead and performance

Q&A Q1: Figure 1: The memory overhead and lookup performance of SILT and the recent key-value stores. For both axes, smaller is better. Explain the positions of FAWN-DS, SkimpyStash, BufferHash, and SILT on the graph.

Store        Metadata in memory   Flash reads per lookup
FAWN-DS      12 B/key             1x
SkimpyStash  <1 B/key             5x
BufferHash   4 B/key              1x
SILT         0.7 B/key            1.01x

Figure 1 shows the tradeoff between metadata overhead and flash reads per lookup (performance). FAWN-DS and BufferHash consume more memory for better performance; SkimpyStash minimizes metadata overhead at the cost of more flash reads; SILT tries to minimize both metadata overhead and read amplification (RA).

Q2: Two design goals of SILT are low read amplification and low write amplification. Use any KV store we have studied as an example to show how these amplifications are produced.

In LevelDB, if a GET request has to read more than one level before the key is found, read amplification (RA) is introduced, and compaction incurs write amplification (WA). In SkimpyStash, the hash-bucket linked lists are stored on flash, with a pointer in each in-DRAM hash bucket pointing to the head of its list. A read must walk the list on flash, which may span more than one page, so more than one flash read may be performed before the target KV pair is found, incurring RA; garbage collection incurs WA when it moves live items to a new location.

Q3: Describe SILT's structure using Figure 2 (Architecture of SILT). Compared with LevelDB, SILT has only three levels. What is the concern with a multi-level KV store when it has too few levels?

SILT combines three basic KV stores with different tradeoffs between metadata overhead and performance:
LogStore: serves PUT and DELETE by writing sequentially to flash; an in-memory cuckoo hash table maps keys to their locations in the flash log.
HashStore: once a LogStore fills up, SILT freezes it and converts it into this more memory-efficient data structure.
SortedStore: serves reads only, with a very low memory footprint. It stores KV entries sorted by key on flash, indexed by a new entropy-coded trie data structure, achieving low RA (exactly 1 flash read) by pointing directly to the correct location on flash. A configurable number of HashStores are merged into a SortedStore.

What happens when a multi-level KV store has too few levels? Either each level becomes large, so even the very first level may be too big to reside in memory and more memory is needed for good performance, and the cost of a compaction grows as file sizes increase. Or the size ratio between levels becomes sharp; for example, if level n is 100 times larger than level n-1, then more reads are serviced at the lower (larger) levels, incurring RA, and compaction becomes very expensive (a higher write amplification factor), since roughly 101 files must be read into memory for a single compaction (see the sketch below).
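A back-of-the-envelope sketch (my own illustration, not taken from either paper) of why a large per-level size ratio makes compaction expensive: merging one file from level n-1 typically also has to read its roughly `ratio` overlapping files at level n.

```python
def files_read_per_compaction(size_ratio):
    # one input file from the upper level plus its ~size_ratio overlapping files below
    return 1 + size_ratio

for ratio in (10, 100):
    print(f"size ratio {ratio:>3}: ~{files_read_per_compaction(ratio)} files read per compaction")
# size ratio  10: ~11 files read per compaction
# size ratio 100: ~101 files read per compaction
```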

Q4: Use Figure 3 (design of LogStore: an in-memory cuckoo hash table serving as both index and filter) to describe how a PUT request and a GET request are served in a LogStore. In particular, explain how the tag is used in a LogStore. Cuckoo hashing uses more than one hash function, so each key can be stored in one of several candidate locations.

Example with two hash functions h1 and h2:

Key   h1(key)   h2(key)
K1    1         5
K2    1         3

Key   Tag stored in bucket h1(key)   Tag stored in bucket h2(key)
K1    5                              1
K2    3                              1

GET(K1): is the tag in bucket 1 (h1(K1)) equal to 5, or the tag in bucket 5 (h2(K1)) equal to 1? If one matches, use the offset stored in that bucket to retrieve the (key, value) pair from flash; if the key read from flash is K1, return the value. Otherwise, the key is not in this LogStore.

PUT(K1, V1): write (K1, V1) at the end of the on-flash log. Then look into bucket 1 (or 5): if it is empty, record tag 5 (or 1) and an offset pointing to the flash location of (K1, V1). If the bucket is occupied, check whether the occupant is an old item for K1 or some other key K2. If it is K1, update the offset to point to the new flash location of (K1, V1). If it is K2, evict K2 to its alternate bucket h2(K2), which is recorded as its tag; in that bucket, h1(K2) is recorded as the new tag together with K2's offset. Then insert K1 into bucket 1 with tag h2(K1) and the offset of (K1, V1). If bucket h2(K2) is occupied by yet another key, that key is evicted in turn, and the process continues until there is no collision or the maximum number of displacements is reached.
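As a concrete illustration, here is a minimal Python sketch of the idea, not SILT's implementation: two hash functions pick the candidate buckets, the tag stored with an entry is its alternate bucket (as in the tables above), and a plain list stands in for the on-flash log. NUM_BUCKETS, MAX_DISPLACEMENTS, h1, and h2 are made-up parameters for this example.

```python
import hashlib

NUM_BUCKETS = 8          # tiny table for illustration only
MAX_DISPLACEMENTS = 16   # give up after this many evictions: the LogStore is "full"

def _hash(key, seed):
    return int.from_bytes(hashlib.sha1(seed + key).digest()[:4], "big")

def h1(key): return _hash(key, b"\x01") % NUM_BUCKETS
def h2(key): return _hash(key, b"\x02") % NUM_BUCKETS

class LogStore:
    def __init__(self):
        self.buckets = [None] * NUM_BUCKETS   # each slot holds (tag, offset)
        self.log = []                         # stand-in for the on-flash append-only log

    def put(self, key, value):
        self.log.append((key, value))         # 1. append the new entry to the flash log
        offset = len(self.log) - 1
        # 2. if the key is already indexed, just repoint its offset (one flash read to confirm)
        for bucket, tag in ((h1(key), h2(key)), (h2(key), h1(key))):
            slot = self.buckets[bucket]
            if slot is not None and slot[0] == tag and self.log[slot[1]][0] == key:
                self.buckets[bucket] = (tag, offset)
                return True
        # 3. otherwise insert, displacing existing entries cuckoo-style
        bucket, tag = h1(key), h2(key)        # the tag records the entry's alternate bucket
        for _ in range(MAX_DISPLACEMENTS):
            if self.buckets[bucket] is None:
                self.buckets[bucket] = (tag, offset)
                return True
            victim_tag, victim_offset = self.buckets[bucket]   # evict the occupant
            self.buckets[bucket] = (tag, offset)
            # the victim moves to its alternate bucket; its new tag is the bucket it left
            bucket, tag, offset = victim_tag, bucket, victim_offset
        return False                          # too many displacements: convert to a HashStore

    def get(self, key):
        # check bucket h1 expecting tag h2, then bucket h2 expecting tag h1
        for bucket, tag in ((h1(key), h2(key)), (h2(key), h1(key))):
            slot = self.buckets[bucket]
            if slot is not None and slot[0] == tag:
                stored_key, value = self.log[slot[1]]   # one flash read
                if stored_key == key:                   # a tag match can be a false positive
                    return value
        return None                           # not in this LogStore

store = LogStore()
store.put(b"K1", b"V1")
assert store.get(b"K1") == b"V1"
```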

Q5: Use Figure 4 to explain how a LogStore is converted into a HashStore?

In a LogStore, KV items are naturally stored in insertion order because of the log. Once a LogStore fills up, SILT begins converting it into a HashStore. During the conversion, the old (immutable) LogStore continues to serve lookups while a new LogStore receives inserts. The conversion rewrites the KV items on flash in hash order, following the in-memory hash table (e.g., insertion order (K1, K2, K3, K4) becomes hash order (K2, K4, K1, K3)). Since KV items are of equal size, an item's flash offset is simply sizeof(KV item) * bucket_index, so the in-memory hash table can be removed. A hash filter (similar to a Bloom filter, and likewise subject to false positives) is kept in memory to check whether a key exists. The old LogStore is then deleted.
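Continuing the LogStore sketch above (and reusing its NUM_BUCKETS, h1, and h2), here is a rough illustration of the conversion, with assumptions mine rather than SILT's code: live entries are rewritten to flash in hash (bucket) order, so an entry's location is simply its bucket index times the fixed entry size, and only the per-bucket tags need to stay in memory as the hash filter.

```python
def convert_to_hashstore(logstore):
    """Rewrite a full LogStore's live entries in hash order and build a hash filter."""
    hash_flash = [None] * NUM_BUCKETS     # stand-in for the new on-flash file, in bucket order
    hash_filter = [None] * NUM_BUCKETS    # per-bucket tags only; per-key offsets are dropped
    for bucket, slot in enumerate(logstore.buckets):
        if slot is None:
            continue
        tag, offset = slot
        hash_flash[bucket] = logstore.log[offset]   # rewrite the live entry in hash order
        hash_filter[bucket] = tag
    return hash_flash, hash_filter                  # the old LogStore can now be deleted

def hashstore_get(hash_flash, hash_filter, key):
    # On real flash the entry's location is simply bucket * sizeof(entry).
    for bucket, tag in ((h1(key), h2(key)), (h2(key), h1(key))):
        if hash_filter[bucket] == tag:              # filter hit; may be a false positive
            entry = hash_flash[bucket]
            if entry is not None and entry[0] == key:
                return entry[1]
    return None

flash, flt = convert_to_hashstore(store)            # `store` from the LogStore sketch
assert hashstore_get(flash, flt, b"K1") == b"V1"
```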

Q6: Once a LogStore fills up (e.g., the insertion algorithm terminates without finding a vacant slot after a maximum number of displacements in the hash table), SILT freezes the LogStore and converts it into a more memory-efficient data structure. Compared to the LogStore, what is the advantage of the HashStore? Why doesn't SILT create a HashStore at the beginning (without first creating a LogStore)?

With a HashStore, the in-memory hash table can be removed, because KV items are stored in hash order and the location of an entry can be computed directly from the hash functions. However, if a HashStore were used directly to serve inserts, hash collisions would cause many extra flash writes and degrade performance. This is not a big problem in a LogStore, because collisions are resolved in memory.

Q7: When fixed-length key-value entries are sorted by key on flash, a trie of the shortest unique prefixes of the keys serves as an index for the sorted data. While a SortedStore is fully sorted, could you comment on the cost of merging a HashStore into a SortedStore? How does this cost compare to the major compaction cost of LevelDB?

Merging a HashStore into a SortedStore can be very expensive: the SortedStore can be very large, the items in the HashStore may be scattered anywhere in the key space, and the entire SortedStore must be rewritten. In LevelDB, tables at levels L and L+1 are sorted and only a limited number of tables are involved in each compaction (roughly a 1:10 ratio), so the cost can be controlled.

Q8: Figure 5 shows an example of using a trie to index sorted data. Please use Figures 5 and 6 to explain how the index of a SortedStore is produced.

Key prefixes with no shading are the shortest unique prefixes, which are used for indexing. The shaded parts are ignored for indexing, since the suffixes do not change the keys' locations. For instance, to look up the key 10010, follow the trie down to the leaf node that represents the prefix 100; since there are three preceding leaf nodes, the index of the key is 3. The representation of a trie T with left and right subtries L and R is given by:

Repr(T) = |L| Repr(L) Repr(R)

where |L| is the number of leaf nodes in the left subtrie, and the representation of a single leaf is empty.
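A minimal sketch of producing this representation from already-sorted, fixed-length keys. This is my own illustration rather than the paper's Figure 6 code, and the example keys are hypothetical, chosen so that they reproduce the representation 3 2 1 3 1 1 1 used in the walkthrough below.

```python
def trie_repr(sorted_keys, depth=0):
    """Entropy-coded trie representation: Repr(T) = |L| Repr(L) Repr(R); a leaf's is empty."""
    if len(sorted_keys) <= 1:
        return []
    left  = [k for k in sorted_keys if k[depth] == "0"]   # keys whose next bit is 0
    right = [k for k in sorted_keys if k[depth] == "1"]   # keys whose next bit is 1
    return [len(left)] + trie_repr(left, depth + 1) + trie_repr(right, depth + 1)

# Hypothetical sorted 5-bit keys on flash (indices 0..7):
keys = ["00000", "00100", "01000", "10000", "10100", "10110", "11000", "11100"]
assert trie_repr(keys) == [3, 2, 1, 3, 1, 1, 1]
```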

For example, suppose the key to look up is 10010 and the trie representation is 3 2 1 3 1 1 1. Since key[0] = 1, we go to the right subtrie by skipping the left subtrie. But how? Read trie[0] = 3: there are 3 leaf nodes in the left subtrie. Go to trie[1] = 2: that node's left subtrie has 2 leaf nodes, so there is another internal node below it. Go to trie[2] = 1: a single leaf hangs on each side of that node, so the left subtrie of the root has now been fully traversed. At trie[3] we arrive at the right subtrie; the 3 skipped leaves are added to the index. key[1] = 0, so we descend into the left subtrie here, which trie[3] = 3 tells us has 3 leaf nodes. key[2] = 0, so we descend left again; since that subtrie has only one leaf node (the next digit is 1), the key, if it exists, must be there. It is the first item in the right subtrie of the root, so its index equals the number of leaf nodes in the root's left subtrie, which is 3.
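The traversal above can be written as a short routine. This is my own sketch under the same assumptions as the construction sketch (a pre-order list of left-subtrie leaf counts, plus the total number of keys, which the store knows); it is not the paper's code.

```python
def trie_lookup(key_bits, repr_seq, n_leaves):
    """Return the index of key_bits (a '0'/'1' string) among the sorted keys."""
    pos = 0      # cursor into repr_seq
    index = 0    # leaves known to precede the key

    def skip(count):
        # Advance the cursor past the representation of a subtrie with `count` leaves.
        nonlocal pos
        if count <= 1:
            return                      # a leaf has an empty representation
        left = repr_seq[pos]; pos += 1
        skip(left)
        skip(count - left)

    depth = 0
    while n_leaves > 1:                 # stop when a single candidate leaf remains
        left = repr_seq[pos]; pos += 1
        if key_bits[depth] == "0":
            n_leaves = left             # descend into the left subtrie
        else:
            index += left               # all left-subtrie leaves precede the key
            skip(left)                  # skip over the left subtrie's representation
            n_leaves -= left            # descend into the right subtrie
        depth += 1
    return index

# The walkthrough above: key 10010, trie 3 2 1 3 1 1 1, 8 keys -> index 3.
assert trie_lookup("10010", [3, 2, 1, 3, 1, 1, 1], 8) == 3
```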