Azor: Using Two-level Block Selection to Improve SSD-based I/O caches


Azor: Using Two-level Block Selection to Improve SSD-based I/O Caches
Yannis Klonatos, Thanos Makatos, Manolis Marazakis, Michail D. Flouris, Angelos Bilas
{klonatos, makatos, maraz, flouris, bilas}@ics.forth.gr
Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science (ICS)
July 23, 2011
Yannis Klonatos et al., FORTH-ICS, Greece

Table of contents
- Background
- Previous Work
- Our goal

Background
Increased need for high-performance storage I/O:
1. Larger file-set sizes → more I/O time
2. Server virtualization and consolidation → more I/O pressure
SSDs can mitigate I/O penalties:

                            SSD             HDD
  Throughput (R/W) (MB/s)   277/202         100/90
  Response time (ms)        0.17            12.6
  IOPS (R/W)                30,000/3,500    150/150
  Price/capacity ($/GB)     $3              $0.3
  Capacity per device       32-120 GB       Up to 3 TB

Mixed SSD and HDD environments are necessary.
Cost-effectiveness: deploy SSDs as HDD caches.

Previous Work
- Web servers using flash as a secondary file cache [Kgil et al., 2006]: requires application knowledge and intervention
- ReadyBoost feature in Windows: static file preloading; requires user interaction
- bcache module in the Linux kernel: has no admission control
- NetApp's Performance Acceleration Module: needs specialized hardware

Our goal
Design Azor, a transparent SSD cache:
- Move SSD caching to the block level
- Hide the address space of the SSDs
Thorough analysis of design parameters:
1. Dynamic differentiation of blocks
2. Cache associativity
3. I/O concurrency

Table of contents
- Overall design space
- Dynamic block differentiation
- Cache Associativity
- I/O Concurrency

Overall design space [figure]

Writeback Cache Design Issues
1. Requires synchronous metadata updates: for write I/Os, the HDDs may not have the up-to-date blocks, so the cache must know the location of each block in case of failure.
2. Reduces system resilience to failures: a failing SSD results in data loss, and since the SSDs are hidden, other layers cannot handle these failures.
Our write-through design avoids these issues.
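The write-through behavior the slide argues for can be sketched as follows. This is a minimal illustrative model, not Azor's actual code: every write reaches the HDD synchronously, so the SSD only ever holds clean copies and losing it loses no data.

```python
# Minimal write-through SSD-cache sketch (illustrative names, not Azor's code).
# Writes always reach the HDD, so the SSD only ever holds clean copies:
# losing the SSD loses no data and needs no synchronous metadata updates.

class WriteThroughCache:
    def __init__(self):
        self.hdd = {}   # block number -> data (backing store)
        self.ssd = {}   # block number -> data (clean cached copies only)

    def write(self, block, data):
        self.hdd[block] = data          # HDD is always up to date
        if block in self.ssd:           # keep any cached copy coherent
            self.ssd[block] = data

    def read(self, block):
        if block in self.ssd:           # cache hit: serve from SSD
            return self.ssd[block]
        data = self.hdd[block]          # miss: fetch from HDD...
        self.ssd[block] = data          # ...and admit into the cache
        return data

cache = WriteThroughCache()
cache.write(7, b"v1")
assert cache.read(7) == b"v1"          # miss, then cached
cache.write(7, b"v2")                  # update reaches both HDD and SSD copy
assert cache.ssd[7] == b"v2"           # SSD never holds stale or dirty data
```

The design choice is visible in the invariant: the SSD map is always a subset of the HDD map, which is exactly why the failure-handling problems of write-back do not arise.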


Dynamic block differentiation
Blocks are not equally important to performance, so it makes sense to differentiate them during admission to the SSD cache. We introduce a 2-Level Block Selection scheme (2LBS).
First level: prioritize filesystem metadata over data
- Many more small files → more FS metadata
- Additional FS metadata is introduced for data protection
- Cannot rely on DRAM for effective metadata caching
- Metadata requests represent 50%-80% of total I/O accesses [Roselli and Anderson]
Second level: prioritize between data blocks
- Some data are accessed more frequently
- Some data are used for faster accesses to other data

D. Roselli and T. E. Anderson, "A Comparison of File System Workloads", USENIX ATC 2000.
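The two selection levels can be captured in a few lines. This is a hypothetical sketch of the idea only (the function names and the tuple encoding are mine, not Azor's): level 1 always ranks filesystem metadata above data, and level 2 breaks ties among data blocks by their observed access counts.

```python
# Hypothetical sketch of the 2-Level Block Selection (2LBS) idea:
# level 1 prefers filesystem metadata over data; level 2 prefers
# data blocks with higher observed access counts. An incoming block
# may replace a cached block only if its priority is at least as high.

def priority(is_fs_metadata, access_count):
    # Metadata always outranks data (level 1); among blocks of the same
    # kind, the DRAM-resident access estimate decides (level 2).
    return (1 if is_fs_metadata else 0, access_count)

def should_admit(incoming, cached):
    """Each block is described by an (is_fs_metadata, access_count) tuple."""
    return priority(*incoming) >= priority(*cached)

# Metadata may evict even a hot data block...
assert should_admit((True, 1), (False, 99))
# ...but a cold data block cannot evict a hotter one.
assert not should_admit((False, 2), (False, 10))
```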

Two-level Block Selection
- Modify the XFS filesystem to tag FS metadata requests (transparent metadata detection is also possible)
- Keep in DRAM an estimate of each HDD block's accesses
- Static allocation: 256 MB of DRAM required per TB of HDD
- The DRAM space required is amortized by the better performance; dynamic allocation of counters reduces the DRAM footprint
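The 256 MB-per-TB figure is consistent with keeping one small counter per block. A quick back-of-the-envelope check, assuming 4 KiB cache blocks and one-byte counters (neither is stated on the slide, so both are assumptions):

```python
# Back-of-the-envelope check of the "256 MB DRAM per TB of HDD" figure,
# assuming 4 KiB blocks and one byte of access counter per block
# (both sizes are assumptions, not taken from the slides).
TiB = 2**40
block_size = 4 * 2**10                      # 4 KiB per cache block
blocks_per_tib = TiB // block_size          # 2**28 blocks per TiB of HDD
counters_mib = blocks_per_tib / 2**20       # one byte per counter, in MiB
assert blocks_per_tib == 2**28
assert counters_mib == 256.0                # matches the slide's 256 MB/TB
```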

Cache Associativity
Associativity is a tradeoff between performance and metadata footprint: higher-way associativities need more DRAM space for metadata.
Direct-mapped (DM) cache:
- Minimizes metadata requirements
- Suffers from conflict misses
Fully-set-associative (FA) cache:
- 4.7× more metadata than the direct-mapped cache
- Proper choice of replacement policy is important
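The conflict-miss problem of the direct-mapped design can be shown with a toy model (illustrative only; line counts and the modulo placement function are assumptions): two hot blocks that map to the same line keep evicting each other, while a fully-associative cache holds both.

```python
# Toy model of why a direct-mapped cache suffers conflict misses while a
# fully-associative cache of the same size does not (illustrative only).

NUM_LINES = 4

def dm_line(block):
    return block % NUM_LINES       # direct-mapped: one fixed line per block

dm_cache = {}                      # line -> cached block number
fa_cache = set()                   # fully associative: any block, any line

def dm_access(block):
    line = dm_line(block)
    hit = dm_cache.get(line) == block
    dm_cache[line] = block         # on a miss, evict whatever was there
    return hit

def fa_access(block):
    hit = block in fa_cache
    if not hit and len(fa_cache) < NUM_LINES:
        fa_cache.add(block)
    return hit

# Blocks 3 and 7 collide in the direct-mapped cache (7 % 4 == 3)...
for b in (3, 7, 3, 7):
    dm_access(b)
    fa_access(b)
assert not dm_access(3)            # still a conflict miss in DM
assert fa_access(3)                # but a hit in the fully-associative cache
```

The extra metadata of the FA design pays for exactly this freedom: any block can occupy any line, so placement conflicts disappear at the cost of tracking every line's tag.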

Cache Associativity - Replacement policy
A large variety of replacement algorithms is used in CPUs and DRAM, but they are prohibitively expensive in terms of metadata size, assume knowledge of the workload's I/O patterns, and may cause up to 40% performance variance.
We choose the LRU replacement policy:
- A good reference point for more sophisticated policies
- A reasonable choice, since the buffer cache also uses LRU
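For reference, LRU replacement of the kind the slide settles on can be sketched with an ordered dictionary (an illustrative sketch, not Azor's implementation):

```python
# Minimal LRU replacement sketch using an ordered dict, matching the
# slide's choice of LRU as a reference policy (illustrative, not Azor's code).
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()     # block -> data, oldest first

    def access(self, block, data=None):
        if block in self.lines:
            self.lines.move_to_end(block)      # refresh recency on a hit
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)     # evict the least recently used
        self.lines[block] = data
        return False                           # miss: block was admitted

c = LRUCache(2)
c.access(1); c.access(2)
c.access(1)                # 1 becomes most recent; 2 is now the LRU victim
c.access(3)                # miss: evicts 2
assert 2 not in c.lines and 1 in c.lines and 3 in c.lines
```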

I/O Concurrency
A high degree of I/O concurrency allows overlapping I/O with computation and effectively hides I/O latency.
1. Allow concurrent read accesses on the same cache line
- Track only pending I/O requests
- Reader-writer locks per cache line are prohibitively expensive
2. Hide the SSD write I/Os of read misses
- Copy the filled buffers to a new request
- Introduces a memory copy and must maintain state for pending I/Os
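The first point, tracking only pending I/Os instead of taking a reader-writer lock per cache line, can be sketched as follows (names and structure are illustrative assumptions, not Azor's code): the first miss on a line registers an in-flight fill, and later misses on the same line just wait for that fill to complete.

```python
# Sketch of per-line pending-I/O tracking: only lines with an I/O in
# flight carry any state, so concurrent reads of filled lines proceed
# without per-line locks (illustrative names, not Azor's actual code).
import threading

class PendingIO:
    def __init__(self):
        self.lock = threading.Lock()
        self.pending = {}              # line -> Event set when the fill completes

    def begin_fill(self, line):
        """Return True if the caller should issue the I/O itself."""
        with self.lock:
            ev = self.pending.get(line)
            if ev is None:
                self.pending[line] = threading.Event()
                return True            # first miss: caller issues the read
        ev.wait()                      # later misses: wait for the in-flight I/O
        return False

    def complete_fill(self, line):
        with self.lock:
            self.pending.pop(line).set()

p = PendingIO()
assert p.begin_fill(42)                # first miss issues the I/O
p.complete_fill(42)                    # afterwards the line carries no state
assert not p.pending
```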

Table of contents
- Experimental Setup
- Benchmarks
- Experimental Questions

Experimental Setup
- Dual-socket, quad-core Intel Xeon 5400 (64-bit)
- Twelve 500 GB SATA-II disks with write-through caching
- Areca 1680D-IX-12 SAS/SATA RAID storage controller
- Four 32 GB Intel SLC SSDs (NAND Flash)
- HDDs and SSDs in RAID-0 setups, 64 KB chunks
- CentOS 5.5, kernel version 2.6.18-194
- XFS filesystem
- 64 GB DRAM, varied by experiment

Benchmarks
I/O-intensive workloads; each run takes between hours and days.

  Benchmark    Type             Properties                        File Set    RAM    SSD cache sizes (GB)
  TPC-H        Data warehouse   Read only                         28 GB       4 GB   7, 14, 28
  SPECsfs CIFS File server      Write-dominated, latency-         Up to 2 TB  32 GB  128
                                sensitive
  TPC-C        OLTP workload    Highly concurrent                 155 GB      4 GB   77.5

Experimental Questions
- Which is the best static decision for handling I/O misses?
- Does dynamically differentiating blocks improve performance?
- How does cache associativity impact performance?
- Can our design options cope with a black-box workload?

Table of contents
- Static decision for handling I/O misses
- Dynamic differentiation of blocks
- Importance of cache associativity
- A black box workload

Static decision for I/O misses (SPECsfs2008)
[Charts: CIFS ops/sec and latency (msec/op) for Native-12HDDs, write-hdd-ssd, and write-hdd-update]
- 11% to 66% better performance than HDDs
- Huge file set, only 30% of it accessed: the write-hdd-ssd policy evicts useful blocks
- Up to 5,000 CIFS ops/sec difference for the same latency

Differentiating filesystem metadata (SPECsfs2008)
- FS metadata continuously increases during execution
[Charts: maximum sustained load and number of metadata requests vs. RAM size; latency vs. CIFS ops/sec for Native-12HDDs, base fully-set-associative, and 2LBS fully-set-associative]
- Metadata DRAM misses reach up to 71%; DRAM data hit ratio is less than 5%
- 3,000 more CIFS ops/sec between HDDs and Azor
- 23% latency reduction when using 2LBS in Azor

Differentiating filesystem data blocks (TPC-H)
- Filesystem data such as indices are important for databases
- Data differentiation improves performance
[Charts: speedup over native and hit ratio for DM and FA caches at 14 GB and 28 GB, base vs. 2LBS]
- 1.95× and 1.53× improvement for the DM and FA caches, respectively
- The medium-size DM cache is 20% better than the large-size DM cache, despite a 10% lower hit ratio

Importance of cache associativity (TPC-H)
[Charts: speedup over HDDs and hit ratio for DM and FA caches of 7, 14, and 28 GB]
- FA is better than DM for all cache sizes
- The large-size FA cache is 1.36× better than its DM counterpart, with up to 15% fewer conflict misses
- The medium-size FA cache is 32% better than the large-size DM cache

A black box workload (TPC-C)
We choose the best parameters found so far:
- Fully-set-associative cache design
- SSD cache size of half the workload size
[Charts: NOTPM and hit ratio for Native-12HDDs, base, and 2LBS]
- Base cache: 55% improvement over native
- 2LBS cache: 34% additional improvement
- Hit ratio remains the same in both versions
- Disk utilization is 100%, SSD utilization under 7%


Conclusions
- We use SSD-based I/O caches to increase storage performance
- Performance is improved with higher-way associativities, at the cost of a 4.7× higher metadata footprint
- The write policy can make up to a 50% performance difference
- We explore differentiation of HDD blocks according to their expected importance to system performance
- Design and evaluation of a two-level block selection scheme
Overall, our work shows that differentiation of blocks is a promising technique for improving SSD-based I/O caches.