ZEA, A Data Management Approach for SMR. Adam Manzanares


Transcription:

ZEA, A Data Management Approach for SMR Adam Manzanares

Co-Authors
Western Digital Research: Cyril Guyot, Damien Le Moal, Zvonimir Bandic
University of California, Santa Cruz: Noah Watkins, Carlos Maltzahn

Why SMR?
- HDDs are not going away: data is still growing exponentially, and HDD $/TB is still much lower than flash. We want to continue this trend!
- Traditional recording (PMR) is reaching its scalability limits.
- SMR is a density-enhancing technology that is shipping right now.
- Future recording technologies such as HAMR may behave like SMR, since they share similar write constraints.

Flavors of SMR
SMR constraint: writes must be sequential to avoid data loss.
- Drive Managed: transparent to the user; comes at the cost of predictability and additional drive resources.
- Host Aware: the host is aware of the SMR working model; if the user does something wrong, the drive fixes the problem internally.
- Host Managed: the host is aware of the SMR working model; if the user does something wrong, the drive rejects the IO request.

SMR Drive Device Model
- New SCSI standard: Zoned Block Commands (ZBC); the SATA equivalent is ZAC.
- The drive is described by zones and their restrictions:

  Type                  Write Restriction   Intended Use               Con                    Abbreviation
  Conventional          None                In-place updates           Increased resources    CZ
  Sequential Preferred  None                Mostly sequential writes   Variable performance   SPZ
  Sequential Required   Sequential write    Only sequential writes     -                      SRZ

- Our user space library (libzbc) queries zone information from the drive: https://github.com/hgst/libzbc
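As an illustration of the device model, the sketch below shows roughly how zone information might be queried through libzbc. The zbc_open / zbc_list_zones / zbc_close calls and the zone accessor macros follow the usage documented in the hgst/libzbc repository, but names, signatures, and struct fields have changed across libzbc versions, so treat this as an illustrative sketch rather than the presenters' code.

```cpp
// Sketch: enumerate zones on a ZBC/ZAC drive with libzbc.
// Assumes the zbc_open/zbc_list_zones API from https://github.com/hgst/libzbc;
// exact signatures and accessor names differ between libzbc versions.
#include <libzbc/zbc.h>
#include <fcntl.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv) {
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
        return 1;
    }

    struct zbc_device *dev = nullptr;
    if (zbc_open(argv[1], O_RDONLY, &dev) != 0) {
        std::perror("zbc_open");
        return 1;
    }

    struct zbc_zone *zones = nullptr;
    unsigned int nr_zones = 0;
    // Report all zones, starting from the beginning of the device.
    if (zbc_list_zones(dev, 0, ZBC_RO_ALL, &zones, &nr_zones) == 0) {
        for (unsigned int i = 0; i < nr_zones; i++) {
            std::printf("zone %u: start %llu, length %llu, wp %llu\n", i,
                        (unsigned long long)zbc_zone_start(&zones[i]),
                        (unsigned long long)zbc_zone_length(&zones[i]),
                        (unsigned long long)zbc_zone_wp(&zones[i]));
        }
        std::free(zones);  // zone array is allocated by the library
    }

    zbc_close(dev);
    return 0;
}
```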

Why Host Managed SMR?
We are chasing predictability:
- Large systems with complex data flows demand relatively tight latency windows from storage devices, which translate into latency guarantees from the larger system.
- Drive managed HDDs: when is GC triggered, and how long does it take?
- Host aware HDDs: seem OK if you do the right thing, but degrade to drive managed behavior when you don't.
- Host managed: the complexity and variance of drive-internal schemes are gone.

Host Managed SMR seems great, but...
- All of these layers must be SMR compatible.
- Any IO reordering causes a problem, and it must not happen at any point between the user and the device.
- What about my new SMR FS? What is its GC policy? Is it tunable? Does the FS introduce latency at its own discretion?
- What about my new SMR-compatible KV store? See above; in addition, is it built on top of a file system?

[Figure: The Linux Storage Stack Diagram, version 4.0 (2015-06-01), outlining the Linux storage stack as of kernel 4.0, from applications and the VFS through file systems, the block layer, and the SCSI/driver layers down to physical devices. Created by Werner Fischer and Georg Schönberger, CC-BY-SA 3.0, http://www.thomas-krenn.com/en/wiki/linux_storage_stack_diagram]
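To make the ordering constraint concrete, here is a minimal POSIX sketch of what "SMR compatible" means at the lowest layer: writes to a sequential-required zone are issued with O_DIRECT, so no intermediate cache can reorder them, and always at the zone's current write pointer, which the host tracks itself. The device path, zone offset, and block size are made-up values for illustration; this is not code from the talk.

```cpp
// Minimal sketch: strictly sequential, direct writes at a zone's write pointer.
// Any layer that buffers or reorders these writes would make a host managed
// drive reject them (or a host aware drive silently clean up after us).
// Device path, zone start, and block size below are illustrative only.
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const char  *dev_path   = "/dev/sdX";  // hypothetical host managed SMR drive
    const off_t  zone_start = 0;           // byte offset of the target zone
    const size_t block_size = 4096;        // logical block size

    int fd = open(dev_path, O_WRONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    // O_DIRECT requires an aligned buffer.
    void *buf = nullptr;
    if (posix_memalign(&buf, block_size, block_size) != 0) { close(fd); return 1; }
    memset(buf, 0xab, block_size);

    // The host tracks the write pointer; every write lands exactly there.
    off_t write_pointer = zone_start;
    for (int i = 0; i < 4; i++) {
        ssize_t n = pwrite(fd, buf, block_size, write_pointer);
        if (n != (ssize_t)block_size) { perror("pwrite"); break; }
        write_pointer += n;  // advance sequentially; never write behind or ahead
    }

    free(buf);
    close(fd);
    return 0;
}
```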

What Did We Do?
Introduced ZEA, a zone- and block-based extent allocator:
- Write logical extent [zone block address]: returns the drive LBA.
- Read extent [logical block address]: returns the data if the extent is valid.
- Iterators over extent metadata: allow one to rebuild the ZBA -> LBA mapping.
[Figure: ZEA + SMR drive layout, with per-write metadata.]
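The transcript only names these operations, so the following is a hypothetical sketch of what a ZEA-style interface could look like: write an extent at a zone block address and get back the drive LBA where it landed, read an extent back by LBA if it is still valid, and iterate over per-extent metadata to rebuild the ZBA -> LBA mapping. The names and types are illustrative, not the actual ZEA API.

```cpp
// Hypothetical sketch of a ZEA-style extent allocator interface.
// The real ZEA API is not shown in the slides; names and types are illustrative.
#include <cstdint>
#include <cstddef>
#include <functional>

using zba_t = uint64_t;  // application-visible Zone Block Address
using lba_t = uint64_t;  // drive Logical Block Address

// Per-write metadata stored alongside each extent; scanning it at startup
// is enough to rebuild the ZBA -> LBA mapping.
struct zea_extent_md {
    zba_t    zba;     // address the application wrote to
    lba_t    lba;     // where the extent actually landed on the drive
    uint32_t length;  // extent length in blocks
    bool     valid;   // false once the extent has been superseded or reclaimed
};

class ZeaAllocator {
public:
    // Append an extent sequentially into the current zone.
    // Returns the drive LBA where the extent was placed.
    virtual lba_t write_extent(zba_t zba, const void *buf, uint32_t blocks) = 0;

    // Read an extent back by drive LBA; returns false if the extent
    // is no longer valid (e.g., rewritten or garbage collected).
    virtual bool read_extent(lba_t lba, void *buf, uint32_t blocks) = 0;

    // Iterate over extent metadata, e.g., to rebuild the ZBA -> LBA map
    // or to pick candidate zones for garbage collection.
    virtual void for_each_extent(
        const std::function<void(const zea_extent_md &)> &fn) = 0;

    virtual ~ZeaAllocator() = default;
};
```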

ZEA Performance vs. Block Device

ZEA Is Just an Extent Allocator, What Next? ZEA + LevelDB
- LevelDB is a widely used KV store library.
- LevelDB's backend API is a good fit for SMR: files are written append-only and read randomly, files are created and deleted, and the namespace is a flat directory.
- So let's build a simple FS compatible with LevelDB on top of ZEA.
[Figure: ZEA + LevelDB architecture.]
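The access pattern listed above (append-only writes, random reads, create/delete, flat namespace) maps naturally onto extents. The sketch below, which assumes the hypothetical ZeaAllocator interface shown earlier, illustrates the kind of flat, append-only file layer such an integration needs; it is illustrative only and not the implementation described in the talk.

```cpp
// Hypothetical sketch of a flat, append-only file layer over a ZEA-style
// allocator, covering the operations LevelDB needs from its storage backend:
// append-only writes, random reads, create/delete, and a flat directory.
// Assumes the ZeaAllocator / zea_extent_md sketch shown earlier.
#include <cstdint>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct FileExtent {
    uint64_t lba;     // where ZEA placed this extent on the drive
    uint32_t blocks;  // extent length in blocks
};

class ZeaFlatFs {
public:
    explicit ZeaFlatFs(ZeaAllocator *zea) : zea_(zea) {}

    // Flat namespace: creating a file is just a directory-table entry.
    bool create(const std::string &name) {
        return files_.emplace(name, std::vector<FileExtent>{}).second;
    }

    // Append-only write: every append becomes one ZEA extent.
    bool append(const std::string &name, const void *buf, uint32_t blocks) {
        auto it = files_.find(name);
        if (it == files_.end()) return false;
        uint64_t lba = zea_->write_extent(next_zba_++, buf, blocks);
        it->second.push_back({lba, blocks});
        return true;
    }

    // Random read: the extent list maps file position to (LBA, length).
    // (For brevity this reads whole extents; a real layer would slice.)
    bool read_extent(const std::string &name, size_t idx, void *buf) {
        auto it = files_.find(name);
        if (it == files_.end() || idx >= it->second.size()) return false;
        const FileExtent &e = it->second[idx];
        return zea_->read_extent(e.lba, buf, e.blocks);
    }

    // Delete just drops the mapping; space is reclaimed later by GC.
    void remove(const std::string &name) { files_.erase(name); }

    // Flat directory listing.
    std::vector<std::string> list() const {
        std::vector<std::string> names;
        for (const auto &kv : files_) names.push_back(kv.first);
        return names;
    }

private:
    ZeaAllocator *zea_;
    zba_t next_zba_ = 0;
    std::map<std::string, std::vector<FileExtent>> files_;
};
```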

ZEA + LevelDB Performance

Lessons Learned
- ZEA is a lightweight abstraction: it hides the sequential write constraint from the application, adds low overhead vs. a raw block device when the extent size is reasonable, and gives the application addressing flexibility that is useful for GC.
- The LevelDB integration opens up the usability of host managed SMR HDDs.
- Unfortunately, LevelDB is not so great for large objects, while the ideal use case for an SMR drive would be large static blobs.

What Is Left?
- What is a good interface above ZEA?
- Garbage collection policies: when, and how?
- How to use multiple zones efficiently, for both allocation and garbage collection?
Thanks for listening!

adam.manzanares@hgst.com