Towards Weakly Consistent Local Storage Systems


Towards Weakly Consistent Local Storage Systems
Ji-Yong Shin (Cornell University, Yale University), Mahesh Balakrishnan (Yale University), Tudor Marian (Google), Jakub Szefer (Yale University), Hakim Weatherspoon (Cornell University)

2 Consistency/Performance Trade-off in Distributed Systems
Reads of the latest version of data (from the primary) are slower; reads of stale data under weak consistency (from back-ups) are faster.
[Figure: clients issuing reads to a primary and to replicated back-up servers.]

3 Server Comparison
Year               2006                    2016
Model (4U)         Dell PowerEdge 6850     Dell PowerEdge R930
CPU [# of cores]   4 x 2-core Xeon [8]     4 x 24-core Xeon [96]            (12X)
Memory             64GB                    6TB                              (96X)
Network            2 x 1GigE               2 x 1GigE + 2 x 10GigE           (11X)
Storage            8 SCSI/SAS HDD          24 SAS HDD/SSD + 10 x PCIe SSD   (4.2X; 175X)
A modern server resembles a distributed system.

4 Can we apply distributed system principles to local storage systems to improve performance?
The consistency/performance trade-off.

5 Why a Consistency/Performance Trade-off?
Distributed systems: different versions of data exist on different servers due to network delays during replication; older versions are faster to access when the client is closer to the server holding them.
Modern servers: different versions of data exist on different storage media due to logging, caching, copy-on-write, deduplication, etc.; older versions are faster to access when they reside on faster storage media.
Reasons for different access speeds:
- RAM, SSD, HDD, hybrid drives, etc.
- Disk arm contention
- SSD undergoing garbage collection
- RAID in degraded mode

6 Fine-grained Log and Coarse-grained Cache
Multiple logged objects fit in one cache block.
[Figure: the memory cache holds a cache block with A(3), B(4), D(5); the SSD holds the log of KV pairs, Key(Ver): A(3) B(4) C(4) A(5) D(5). Reads of A hit the cache and are fast but stale, while reads of the latest versions (e.g., B, D) require slow reads from the SSD log.]

7 Goal
Speed up local storage systems using stale data (the consistency/performance trade-off).
How should storage systems access older versions? Which version should be returned? What should the interface be? What are the target applications?

8 Rest of the Talk
StaleStore
Yogurt: An Instance of StaleStore
Evaluation
Conclusion

9 StaleStore
A local storage system that can trade off consistency and performance.
Can take any form: KV-store, filesystem, block store, DB, etc.
Maintains multiple versions of data; should have an interface to access older versions.
Can estimate the cost of accessing each version: aware of data locations and storage device conditions.
Aware of consistency semantics: ordered writes with a notion of timestamps and snapshots; distributed weak (client-centric) consistency semantics.

10 StaleStore: Consistency Model
Distributed (client-centric) consistency semantics: per-client, per-object guarantees for reads.
- Bounded staleness
- Read-my-writes
- Monotonic reads: a client reads an object version that is the same as or later than the version it last read (see the sketch after this list).
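A minimal sketch of the monotonic-reads guarantee, assuming each client session keeps a per-object floor of the last version it was served; the names (Session, choose_version) are illustrative, not from the talk.

    class Session:
        """Tracks, per object, the last version this client read."""
        def __init__(self):
            self.floor = {}                  # object -> last version read

        def choose_version(self, obj, available):
            # Any version at or above the floor preserves monotonic reads;
            # among those, a staler (older) version is typically cheaper.
            admissible = [v for v in available if v >= self.floor.get(obj, 0)]
            v = min(admissible)              # e.g., the oldest admissible one
            self.floor[obj] = v              # raise the floor for the next read
            return v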

11 StaleStore: Target Applications
Distributed applications: aware of distributed consistency; can deal with data staleness.
Server applications: can provide per-client guarantees.

12 Rest of the Talk
StaleStore
Yogurt: An Instance of StaleStore
Evaluation
Conclusion

13 Yogurt: A Block-Level StaleStore
A log-structured disk array with a cache [Shin et al., FAST '13], implemented as a Linux kernel module.
Reads prefer non-logging disks and, among those, the most idle disk.
[Figure: a read cache in front of Disks 0-2; reads served by idle, non-logging disks are fast but stale, while reading the latest data from the logging disk is slow.]

14 Yogurt: Basic APIs
Write(Address, Data, Version#): versioned (time-stamped) write; version numbers constitute snapshots.
Read(Address, Version#): versioned (time-stamped) read.
GetCost(Address, Version#): cost estimation for each version.
A sketch of this interface follows.
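The following is a minimal in-memory model of the three calls above, useful for following the later slides; it is a sketch, not the kernel module's actual code, and the append-only multi-version map is an assumption about the internals.

    class YogurtModel:
        def __init__(self):
            self.log = {}       # address -> append-only list of (version, data)
            self.timestamp = 0  # global timestamp, advanced by writes

        def write(self, address, data, version):
            # Versioned (time-stamped) write; versions constitute snapshots.
            self.log.setdefault(address, []).append((version, data))
            self.timestamp = max(self.timestamp, version)

        def read(self, address, version):
            # Versioned read: newest copy at or below the requested version.
            copies = [(v, d) for v, d in self.log.get(address, []) if v <= version]
            if not copies:
                raise KeyError(f"block {address}: no copy at version <= {version}")
            return max(copies, key=lambda c: c[0])[1]

        def get_cost(self, address, version):
            # Placeholder; the estimator is the subject of the next slide.
            return 0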

15 Yogurt Cost Estimation
GetCost(Address, Version) returns an integer.
Disk vs. memory cache: the cache always has a lower cost (e.g., cache = -1, disk = a positive integer).
Disk vs. disk: compare the number of queued I/Os, with weights; queued writes carry a higher weight than reads. A sketch follows.
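A sketch of that heuristic, with assumed data structures (Copy, DiskQueue) and illustrative weights; the slide fixes only the relative ordering (cache cheapest, writes weighted above reads), not these exact values.

    from dataclasses import dataclass

    @dataclass
    class Copy:
        in_cache: bool   # True if this (address, version) copy is cached
        disk: int        # index of the disk holding the copy (unused if cached)

    @dataclass
    class DiskQueue:
        reads: int       # read I/Os currently queued at the disk
        writes: int      # write I/Os currently queued at the disk

    WRITE_WEIGHT = 3     # assumption: any value > READ_WEIGHT fits the slide
    READ_WEIGHT = 1

    def get_cost(copy, queues):
        if copy.in_cache:
            return -1    # the cache always wins (the slide's example value)
        q = queues[copy.disk]
        return WRITE_WEIGHT * q.writes + READ_WEIGHT * q.reads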

16 Reading Blocks from Yogurt
Monotonic-reads example: a client session tracks the lowest admissible version of each block (here, Ver 3 for block 1, recorded at its last read). To read block 1, the client:
1. Checks the current global timestamp: highest Ver = 8.
2. Issues GetCost() for block 1 for versions between 3 and 8 (N queries with uniform spacing).
3. Reads the cheapest version, e.g., version 5: Read(1, 5).
4. Records the version read for block 1 [Blk 1: Ver 5].
[Figure: a cache plus the logged copies 3(3) 1(4) 2(4) 1(5) 3(5) 1(6) 2(6) 3(7) 2(8), Block(Ver), laid out along the global timestamp.]
The read path is sketched below.
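A sketch of these four steps against the YogurtModel above; `session` maps a block to the last version returned to this client, and n_probes stands in for the slide's N uniformly spaced GetCost() queries.

    def monotonic_read(store, session, block, n_probes=4):
        low = session.get(block, 1)             # lowest admissible version
        high = store.timestamp                  # step 1: current global timestamp
        step = max(1, (high - low) // n_probes)
        probes = range(low, high + 1, step)     # step 2: N uniformly spaced versions
        best = min(probes, key=lambda v: store.get_cost(block, v))
        data = store.read(block, best)          # step 3: read the cheapest version
        session[block] = best                   # step 4: record it for monotonicity
        return data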

17 Data Constructs on Yogurt
High-level data constructs span multiple blocks, and the blocks should be read from a consistent snapshot. Later reads depend on prior reads: GetVersionRange(). A consistent multi-block read is sketched below.
[Figure: a three-block construct (block addresses 0-2) evolving over timestamps 0-3: "Ithaca College in Ithaca" at timestamp 0, then "Cornell University in Ithaca", and finally "Cornell University in NYC" at timestamp 3; mixing blocks from different timestamps would yield a value that never existed.]
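One way such a read could work, assuming GetVersionRange(block, version) returns the timestamp interval during which that block version was current; this reading of the API, and the cheapest_version() helper, are assumptions for illustration.

    def read_construct(store, blocks):
        low, high = 0, store.timestamp        # admissible snapshot window so far
        parts = []
        for b in blocks:
            # cheapest copy of b whose validity interval overlaps [low, high]
            v = store.cheapest_version(b, low, high)      # hypothetical helper
            start, end = store.get_version_range(b, v)
            low, high = max(low, start), min(high, end)   # later reads depend on prior ones
            parts.append(store.read(b, v))
        return parts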

18 Rest of the Talk
StaleStore
Yogurt: An Instance of StaleStore
Evaluation
Conclusion

19 Evaluation
Yogurt in a 3-disk setting with a memory cache. The focus is read latency while using monotonic reads; clients simultaneously access the servers.
Primary-backup setting, with a 100 ms delay to the primary:
- Baseline 1: reads the latest data from the primary server.
- Baseline 2: reads the latest data from a local server.
- Yogurt: utilizes stale data in a local back-up server.
[Figure: clients, a primary, and a back-up running Yogurt that receives a stream of versioned writes.]

20 Evaluation: Block Access
Uniform random workload; 8 clients each access one block at a time. The x-axis is the number of available older versions built up during warm-up.
[Plot: average read latency (ms, 0-200) vs. number of stale versions available at start time (1-8), for Primary, Local latest, and Yogurt MR.]

21 Evaluation: Key-Value Store
YCSB Workload A (Zipf, 50% reads / 50% writes); 16 clients access multiple blocks of key-value pairs. The KV store greedily searches for the cheapest copies using the Yogurt APIs; KV pairs can be partially updated.
[Plot: average read latency (ms, 0-180) vs. key-value pair size (4KB-20KB), for Primary, Local latest, and Yogurt MR.]

22 Conclusion
Modern servers resemble distributed systems, so local storage systems can trade off consistency and performance; we call such systems StaleStores, and many existing systems have the potential to exploit this trade-off.
Yogurt, a block-level StaleStore, effectively trades off consistency and performance and supports high-level constructs that span multiple blocks.

23 Thank you. Questions?

24 Extra slides

25 Fine-grained Log and Coarse-grained Cache
Multiple logged objects fit in one cache block.
[Figure: the memory cache holds a cache block with A(3), B(1), C(1), D(2); the SSD holds the log of KV pairs, Key(Ver): A(3) B(1) C(0) A(4) C(1) D(2). Cached reads (e.g., A at version 3) are fast but stale; reading the latest version (e.g., A(4)) requires a slow read from the SSD log.]

26 Fine-grained Log and Coarse-grained Cache: Results
8 threads reading and writing at a 9:1 ratio; KV pairs per cache block varied from 2 to 16; allowed staleness varied from 0 to 15 updates (bounded staleness).
[Plot: average read latency (us, 0-1200) vs. allowed staleness (0-15 updates) for 2, 4, 8, and 16 items per cache block; maximum improvement 60%.]

27 Deduplicated System with a Read Cache
Systems that cache deduplicated chunks keep a map from logical blocks to physical chunks.
[Figure: logical blocks B0-B2 map onto physical chunks C2 and C3 on disk; reading B1 from the cached chunk C2 is a fast (stale) read, while reading the latest B2 is a slow read from disk.]

28 Deduplicated System with a Read Cache: Results
8 threads reading and writing at a 9:1 ratio; deduplication ratio varied from 30% to 90%; allowed staleness varied from 0 to 15 updates (bounded staleness).
[Plot: average read latency (us, 0-25000) vs. allowed staleness (0-15 updates) for 30%, 50%, 70%, and 90% deduplication ratios; maximum improvement 30%.]

29 Write Cache That Is Slow for Reads
Griffin: a disk-based write cache in front of an SSD, used to extend SSD lifetime; the disk logs incoming blocks and flushes the latest versions to the SSD.
[Figure: the disk cache logs blocks 3(5), 1(2), 1(3) (Addr(Ver)); the SSD holds the linear block address space 0(3) 1(1) 2(0) 3(4). Reading the latest copy of Addr 1 from the disk log is slow; reading the stale copy from the SSD is fast.]

30 Write Cache That Is Slow for Reads: Results
8 threads reading and writing at a 9:1 ratio; data flushed from disk to SSD every 128MB to 1GB of writes; allowed staleness varied from 0 to 7 updates (bounded staleness).
[Plot: average read latency (us, 0-8000) vs. allowed staleness (0-7 updates) for flush intervals of 128MB, 256MB, 512MB, and 1024MB; maximum improvement 95%.]