Request-Oriented Durable Write Caching for Application Performance appeared in USENIX ATC '15. Jinkyu Jeong Sungkyunkwan University


Introduction
- Volatile DRAM cache is ineffective for writes
- Writes are dominant I/Os [FAST 09, FAST 10, FAST 14]
- Non-volatile write cache (NVWC) provides fast response for writes w/o loss of durability
- NVWC candidates: Flash, 3D XPoint, PCM, MRAM, NV-DRAM (figure axes: GB/$ and performance)
References
- [Bhadkamkar et al., FAST 09] BORG: Block-reORGanization for self-optimizing storage systems
- [Koller et al., FAST 10] I/O deduplication: Utilizing content similarity to improve I/O performance
- [Harter et al., FAST 14] Analysis of HDFS under HBase: a Facebook messages case study

Non-volatile Write Cache Usage
- Simple caching policy
  - Blindly caching all writes
  - Lazily writing back to storage
- No consideration for application performance
(Diagram: application processes P1, P2, P3 issue writes through the operating system to the NVWC, which sits in front of the backing storage.)

Impact on Application Performance: illustrative experiment
- Application: PostgreSQL RDBMS (processes P1, P2, P3) running a TPC-C workload
- NVWC: 32MB NV-DRAM or 4GB Flash SSD
- Backing storage: 2 HDDs (Data/Log)

Impact on Application Performance: experimental result
- System perf.: improved by up to ~2.1X (figure labels: 1.7X, 2.1X)
- Application perf.: degraded by up to ~50% (figure labels: "Marginal gain", "Performance drop by 50%!")

What's the Problem?
- Criticality-agnostic contention

Criticality-Agnostic Contention: different write criticality
- Application performance is measured from client request to response
- Writes issued on that request path (P1 in the diagram) are critical; writes from background processes/threads (P2, P3) are non-critical
- Contentions between critical and non-critical writes
  - Capacity contention
  - Bandwidth contention
(Diagram: client request/response, application processes P1, P2, P3, operating system, NVWC, backing storage.)

Criticality-Agnostic Contention: capacity contention
- Writeback throughput to the backing storage is bounded
- With critical and non-critical writes sharing the cache, the result is frequent write stalls
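The capacity-contention effect can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name, the tick-based timing, and every number below are hypothetical and are not taken from the talk's experiments.

```python
def simulate_stalls(capacity, writeback_per_tick, critical_per_tick,
                    noncritical_per_tick, ticks, cache_noncritical=True):
    """Count critical writes that find the cache full (toy model).

    Each tick: writeback frees at most `writeback_per_tick` blocks, then
    non-critical writes arrive (only if they are cached), then critical
    writes arrive. A critical write that finds no free block is counted
    as stalled; it is not retried.
    """
    used = 0
    stalled_critical = 0
    for _ in range(ticks):
        # Bounded writeback: only a fixed number of blocks drain per tick.
        used = max(0, used - writeback_per_tick)
        if cache_noncritical:
            used = min(capacity, used + noncritical_per_tick)
        for _ in range(critical_per_tick):
            if used < capacity:
                used += 1
            else:
                stalled_critical += 1
    return stalled_critical


# Hypothetical parameters: a 100-block cache draining 10 blocks per tick.
stalls_cache_all = simulate_stalls(100, 10, 5, 20, 100, cache_noncritical=True)
stalls_critical_only = simulate_stalls(100, 10, 5, 20, 100, cache_noncritical=False)
```

In this model, caching everything fills the cache faster than writeback can drain it, so critical writes start to stall; admitting only critical writes keeps the arrival rate below the drain rate.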

Criticality-Agnostic Contention: bandwidth contention
- Even with sufficient free blocks, a critical write suffers excessive queueing delay
(Diagram: a device queue in which a critical write (C) sits behind non-critical writes (NC) and writeback requests (WB) that are closer to the head.)
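A back-of-the-envelope sketch of the queueing delay, assuming a strict FIFO device queue and a uniform per-request service time. Both are simplifying assumptions, and the 1 ms figure is hypothetical.

```python
def completion_time_ms(queue, target, service_ms):
    """Time until the first `target` entry completes in a FIFO queue.

    `queue` is ordered head first; every request is assumed to take
    `service_ms` to serve.
    """
    position = queue.index(target)
    return (position + 1) * service_ms


# Queue contents as in the slide's diagram, head first: four writebacks,
# five non-critical writes, then the critical write.
shared_queue = ["WB"] * 4 + ["NC"] * 5 + ["C"]
critical_only_queue = ["C"]

shared_delay = completion_time_ms(shared_queue, "C", 1.0)
isolated_delay = completion_time_ms(critical_only_queue, "C", 1.0)
```

The cache has free blocks in both cases; the difference comes only from how many requests are served ahead of the critical write.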

Our Approach: request-oriented caching policy
- Definitions
  - Critical process (CP): a process handling a request
  - Critical write: a write awaited by a critical process
- Caching critical writes only
(Diagram: P1 is a CP; P2 and P3 are non-critical processes (NCPs). Legend: Sync I/O, Critical I/O, Async I/O.)
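The two definitions translate into a small admission check. The following is a minimal sketch, not the authors' kernel implementation: the `Write` record, the `awaited_by` field, and the `route` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Write:
    pid: int                                      # process that issued the write
    awaited_by: set = field(default_factory=set)  # pids blocked on this write


def is_critical_write(write, critical_pids):
    """A write is critical if at least one critical process awaits it."""
    return bool(write.awaited_by & critical_pids)


def route(write, critical_pids):
    """Send critical writes to the cache and everything else to storage."""
    return "nvwc" if is_critical_write(write, critical_pids) else "backing_storage"


critical_pids = {101}                             # hypothetical CP
awaited = Write(pid=202, awaited_by={101})        # issued by an NCP, awaited by the CP
background = Write(pid=202)                       # nobody is waiting on it
```

Note that `awaited` counts as critical even though a non-critical process issued it, because the definition depends on who waits for the write, not on who issued it.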

Challenge: how to accurately detect critical writes
- Types of critical write
  - Sync. writes from critical processes
  - Dependency-induced critical writes
    - Process dependency-induced
    - I/O dependency-induced

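Dependency Problem: process dependency
(Diagram: an NCP takes a lock and issues write B1 while the CP is blocked on that lock. The CP is woken only after the NCP finishes, so it in effect waits for B1. B3, B4, B5 are the other writes shown on the timeline.)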

Dependency Problem: I/O dependency
(Diagram: an NCP has issued writes B1 and B2; the CP issues a sync and must wait for B2 to complete. B3, B4, B5 are the other writes shown.)
- Example scenarios
  - CP calls fsync() on a block under writeback issued by an NCP
  - CP tries to overwrite a file system journal buffer under writeback

Critical Write Detection
- Critical process identification
  - Application-guided identification

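Critical Process Identification: application-guided identification
(Diagram: the application serves Client 1 and Client 2 with two CPs and also runs three NCPs; it passes this information to the operating system through an API. Below the operating system are the NVWC and the backing storage.)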

Critical Write Detection
- Critical process identification
  - Application-guided identification
- Dependency resolution
  - Criticality inheritance protocols
    - Process criticality inheritance
    - I/O criticality inheritance
    - Blocking object tracking

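Criticality Inheritance Protocols: process criticality inheritance
(Diagram: the CP blocks on a lock held by an NCP; the NCP inherits the CP's criticality, so its write B1 is treated as critical, and it then wakes the CP. B2, B3, B4 are the other writes shown.)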
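Process criticality inheritance resembles classic priority inheritance on locks. A minimal single-lock sketch, assuming a boolean criticality flag; `Task` and `InheritingLock` are hypothetical names, and a real implementation would also need to handle a holder that owns several locks.

```python
class Task:
    def __init__(self, name, critical=False):
        self.name = name
        self.base_critical = critical   # set by the application (CP or NCP)
        self.inherited = False          # set while a CP is blocked on this task

    @property
    def critical(self):
        return self.base_critical or self.inherited


class InheritingLock:
    def __init__(self):
        self.owner = None
        self.waiters = []

    def acquire(self, task):
        """Return True if the lock was taken, False if the task must wait."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        if task.critical:
            self.owner.inherited = True     # the holder inherits criticality
        return False

    def release(self):
        """Hand the lock to the next waiter and return it (the task to wake)."""
        self.owner.inherited = False        # inheritance ends with the lock
        self.owner = self.waiters.pop(0) if self.waiters else None
        if self.owner is not None and any(w.critical for w in self.waiters):
            self.owner.inherited = True     # remaining critical waiters still block
        return self.owner


lock = InheritingLock()
ncp = Task("ncp")
cp = Task("cp", critical=True)
lock.acquire(ncp)    # the NCP takes the lock first
lock.acquire(cp)     # the CP blocks, so the NCP becomes critical
```

While `ncp.critical` is true, a policy that caches writes from critical processes would also cache the NCP's writes, which shortens the time the CP spends blocked.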

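Criticality Inheritance Protocols: I/O criticality inheritance
(Diagram: the CP syncs on B2, which an NCP has already issued to disk; B2 is discarded from the disk path and reissued as a critical write, then completes. B1, B3, B4, B5 are the other writes shown.)
- Key issue: caching the dependent write outstanding to disk w/o side effects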
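The discard-and-reissue step can be modeled in a few lines. This sketch assumes the dependent write still sits in a software queue from which it can be removed safely, which sidesteps the "without side effects" problem named on the slide; `sync_block`, `disk_queue`, and `nvwc` are hypothetical names.

```python
def sync_block(block, disk_queue, nvwc):
    """Model a CP waiting on `block`.

    If the block is still queued for the disk, discard that request and
    reissue it to the cache so the CP waits on the fast path instead.
    """
    if block in disk_queue:
        disk_queue.remove(block)   # discard the outstanding non-critical write
        nvwc.append(block)         # reissue it as a critical write
        return "reissued"
    if block in nvwc:
        return "cached"            # already on the fast path
    return "clean"                 # nothing outstanding for this block


disk_queue = ["B1", "B2", "B3", "B4", "B5"]   # writes issued by the NCP
nvwc = []
outcome = sync_block("B2", disk_queue, nvwc)  # the CP syncs on B2
```

After the call, B2 no longer waits behind B1 in the disk queue; the remaining non-critical writes are left untouched.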

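Criticality Inheritance Protocols: blocking object tracking
- Handling cascading dependencies
(Diagram: the CP waits on a lock held by an NCP that is itself blocked on write B1. The NCP inherits criticality, B1 is reissued, and the wake-ups then propagate back to the CP. B2, B3, B4 are the other writes shown.)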
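Cascading dependencies can be resolved by recording what each blocked task waits on and walking that chain. A minimal sketch, assuming a lock has exactly one holder and an I/O has none; the function and both dictionaries are hypothetical.

```python
def propagate_criticality(start, blocked_on, owner_of):
    """Return, in order, every object and task that inherits from `start`.

    blocked_on: task -> the object (lock or I/O) it is currently blocked on
    owner_of:   lock -> the task holding it; I/O objects have no entry
    """
    inherited = []
    seen = set()
    obj = blocked_on.get(start)
    while obj is not None and obj not in seen:
        seen.add(obj)                 # guard against cycles
        inherited.append(obj)
        holder = owner_of.get(obj)
        if holder is None:            # an I/O: reissue it, the chain ends here
            break
        inherited.append(holder)      # a lock: its holder inherits criticality
        obj = blocked_on.get(holder)  # follow what the holder is blocked on
    return inherited


# The slide's scenario: the CP waits on a lock held by an NCP blocked on B1.
blocked_on = {"CP": "lock", "NCP": "B1"}
owner_of = {"lock": "NCP"}
chain = propagate_criticality("CP", blocked_on, owner_of)
```

Here the chain visits the lock, then the NCP holding it, then the write B1 that the NCP is blocked on, which is the write that must be reissued.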

Evaluation
- Implementation on Linux 3.13 w/ FlashCache 3.1
- Application studies
  - PostgreSQL database: Backend1 and Backend2 (serving Client 1 and Client 2), checkpointer, log writer, writer
  - Redis key-value store: master (serving Clients 1, 2, 3, ...), snapshotter, log rewriter

Evaluation: experimental setup
- Server machine: PostgreSQL / Redis on FlashCache, with a 4GB ramdisk or 256GB SSD as the cache and 10K RPM HDD x2 as backing storage
- Client machine: TPC-C / YCSB, connected over 1Gbps
- Caching policies (what goes to the cache; the rest of the writes go to the HDDs)
  - ALL (default): all writes, no discretion
  - SYNC: sync. writes (async. writes are not cached)
  - CP: CP sync. writes
  - CP+PI: CP + process criticality inheritance
  - CP+PI+IOI: CP+PI + I/O criticality inheritance
  - WAL (PostgreSQL): trx log writes
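The evaluated policies can be read as admission predicates over a write's attributes. The sketch below is one interpretation of the slide labels, not the FlashCache patch itself: the attribute names (`sync`, `from_cp`, `proc_inherited`, `io_inherited`, `is_wal`) are hypothetical.

```python
def should_cache(policy, w):
    """Decide whether write `w` (a dict of booleans) is admitted to the cache."""
    cp_sync = w["sync"] and w["from_cp"]
    pi_sync = w["sync"] and (w["from_cp"] or w["proc_inherited"])
    if policy == "ALL":
        return True                        # no discretion
    if policy == "SYNC":
        return w["sync"]                   # any synchronous write
    if policy == "CP":
        return cp_sync                     # sync. writes from critical processes
    if policy == "CP+PI":
        return pi_sync                     # plus processes that inherited criticality
    if policy == "CP+PI+IOI":
        return pi_sync or w["io_inherited"]  # plus reissued dependent writes
    if policy == "WAL":
        return w["is_wal"]                 # transaction log writes only
    raise ValueError(f"unknown policy: {policy}")


# A sync write from an NCP that currently holds a lock a CP is waiting for.
example = {"sync": True, "from_cp": False, "proc_inherited": True,
           "io_inherited": False, "is_wal": False}
decisions = {p: should_cache(p, example)
             for p in ["ALL", "SYNC", "CP", "CP+PI", "CP+PI+IOI", "WAL"]}
```

For this example write, CP alone would send it to the HDDs even though a critical process is waiting behind it, which is the gap the inheritance variants are meant to close.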

PostgreSQL Performance: TPC-C workload w/ ramdisk
- Same performance w/ 72% less cached writes
- Our scheme resolves capacity contention & runtime dependencies
(Figure labels: 80%; Scarce, Sufficient)

PostgreSQL Performance: TPC-C workload w/ SSD
- Our scheme resolves bandwidth contention & runtime dependencies
(Figure labels: 2.2X; Sufficient, Sufficient)

Redis Performance: update-heavy workload w/ 16GB SSD
- 47% better throughput
- Improved tail latency: 13X better at the 99.9th percentile (50ms vs. 649ms)
- Our scheme improves request throughput & request latency

Conclusion
- Key observation
  - Each write has different performance-criticality
- Request-oriented caching policy
  - Solely utilizes NVWC for application performance
  - Improves performance while reducing cached writes
- Future work
  - Criticality-aware I/O management without NVWC
  - Application to user-interactive environments

Q&A
Thank you