Accelerate block service built on Ceph via SPDK. Ziye Yang, Intel

Agenda: SPDK introduction; Accelerate block service built on Ceph; SPDK support in Ceph BlueStore; Summary

What is SPDK?
- Storage Performance Development Kit
- Software building blocks
- Open source, BSD licensed
- Userspace and polled mode
- http://spdk.io

The problem: software is becoming the bottleneck
- HDD: >2 ms latency, <500 IO/s
- SATA NAND SSD: <100 µs latency, >25,000 IO/s
- NVMe* NAND SSD: <100 µs latency, >400,000 IO/s
- Intel Optane SSD: faster still
The opportunity: use Intel software ingredients to unlock the potential of new media.

Architecture (released Q4 '17)
- Storage protocols: iSCSI target, vhost-scsi target (SCSI), NVMe-oF* target (NVMe), vhost-blk target
- Storage services: block device abstraction (bdev), logical volumes, 3rd-party bdevs, Blobstore, BlobFS; bdev backends include NVMe, Linux async IO, and Ceph RBD
- Drivers: NVMe-oF* initiator, NVMe* PCIe driver (NVMe devices), Intel QuickData Technology driver
- Integration: RocksDB, Ceph, VTune Amplifier
- Core: application framework

Why? Efficiency and performance
SPDK delivers more performance from Intel CPUs, non-volatile media, and networking:
- Up to 10x more IOPS/core for NVMe-oF* vs. the Linux kernel
- Up to 8x more IOPS/core for NVMe vs. the Linux kernel
- Up to 350% better tail latency for RocksDB workloads
- Faster TTM / fewer resources than developing components from scratch
- Provides future proofing as NVM technologies increase in performance
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance

How? The SPDK community
- GitHub: https://github.com/spdk/spdk
- Trello: https://trello.com/spdk
- GerritHub: https://review.gerrithub.io/#/q/project:spdk/spdk+status:open
- IRC: #spdk on Freenode (https://freenode.net/)
- Home page: http://www.spdk.io/

1st SPDK Hackathon: Nov 6-8, 2017, Phoenix

Agenda: SPDK introduction; Accelerate block service built on Ceph; SPDK support in Ceph BlueStore; Summary

Leverage SPDK to accelerate the block service built on Ceph
- Block service daemon optimization outside Ceph:
  - Use an optimized block service daemon, e.g., the SPDK iSCSI target or NVMe-oF target
  - Introduce a proper cache policy in the optimized block service daemon
- OSD optimization inside Ceph:
  - Use SPDK's userspace NVMe driver instead of the kernel NVMe driver in BlueStore (already available; see the ceph.conf sketch below)
  - Bring some ideas from SPDK BlobFS/Blobstore into Ceph BlueFS/BlueStore
- Network optimization (e.g., leverage a userspace network stack on DPDK or RDMA; not discussed in this talk)
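As a rough illustration of the BlueStore bullet above, the SPDK userspace NVMe driver is selected through ceph.conf. This is a minimal sketch, not taken from the talk: the device-path syntax differs across Ceph releases (older releases took the NVMe controller serial number after the spdk: prefix, newer ones a transport ID), and the PCIe address and core mask are placeholder values.

    [osd]
    # Hand the NVMe device to BlueStore's SPDK/NVMEDEVICE backend instead of
    # the kernel driver. The value after "spdk:" is release-dependent; the
    # PCIe address below is a placeholder.
    bluestore_block_path = spdk:trtype:PCIe traddr:0000:01:00.0
    # SPDK runs polled-mode threads; pin them with a core mask (placeholder).
    bluestore_spdk_coremask = 0x3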

Export block service (diagram)
- SPDK-optimized iSCSI target and SPDK-optimized NVMe-oF target in front of the Ceph RBD service, via the SPDK Ceph RBD bdev module (leverages librbd); an SPDK cache module sits in between (see the RPC sketch below)
- Ceph backends behind the RBD service: FileStore, KVStore, and BlueStore (metadata in RocksDB over BlueRocksEnv and BlueFS, data on an NVMe device via the kernel or SPDK driver)
- Enhanced BlueFS with the SPDK NVMe driver on the NVMe device: bring some ideas from SPDK in to enhance BlueFS
- Legend: existing SPDK apps and modules, existing Ceph components, and modules to be developed
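For the front-end path above (an SPDK iSCSI target in front of the Ceph RBD service), the setup with SPDK's JSON-RPC script looks roughly like this. It is a sketch against a recent SPDK release; the 2017-era RPC names differed (e.g., construct_rbd_bdev), and the pool, image, address, and target names are hypothetical.

    # Create a bdev backed by librbd: pool "rbd", image "vm01", 512-byte blocks
    scripts/rpc.py bdev_rbd_create -b rbd_vm01 rbd vm01 512
    # Expose it through the SPDK iSCSI target: portal group 1, initiator group 2
    scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
    scripts/rpc.py iscsi_create_initiator_group 2 ANY 10.0.0.0/24
    # Target node mapping the bdev to LUN 0, queue depth 64, CHAP disabled
    scripts/rpc.py iscsi_create_target_node Target1 Target1_alias rbd_vm01:0 1:2 64 -d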

SPDK iSCSI target vs. Linux LIO performance for locally attached storage
(Chart: kIO/s delivered and CPU cores consumed, Linux LIO vs. SPDK; exact bar values are not recoverable from the transcript.)
- SPDK improves efficiency almost 2x
- iSCSI target improvements stem from: non-blocking TCP sockets, pinned iSCSI connections, and the SPDK storage access model
- TCP processing is the limiting factor: 70%+ of CPU cycles are consumed in the kernel network stack; userspace polled-mode TCP is required for further improvement
System configuration: 2S Intel Xeon E5-2699 v3 (18C, 2.3 GHz, HT off), Intel SpeedStep enabled, Intel Turbo Boost Technology disabled, 8x 4GB DDR4 2133 MT/s (1 DIMM per channel), Ubuntu* Server 14.10, 3.16.0-30-generic kernel, Ethernet Controller XL710 for 40GbE, 8x Intel P3700 NVM Express* SSD 800GB (4 per CPU socket), FW 8DV10102. As measured by fio: direct=1, 4KB random read, queue depth 32, ramp time 30s, run time 180s, norandommap=1, ioengine=libaio, numjobs=1 (see the job-file sketch below).
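The fio parameters quoted in the footnote correspond to roughly this job file; the target device is a placeholder.

    ; 4KB random read, QD32, libaio, matching the footnote above
    [global]
    ioengine=libaio
    direct=1
    rw=randread
    bs=4k
    iodepth=32
    ramp_time=30
    runtime=180
    norandommap=1
    numjobs=1

    [nvme-test]
    ; placeholder target device
    filename=/dev/nvme0n1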

Agenda: SPDK introduction; Accelerate block service built on Ceph (case study: accelerating the iSCSI service exported by Ceph, from Istuary's talk at the 2016 SPDK meetup); SPDK support in Ceph BlueStore; Summary

Block service exported by Ceph via the iSCSI protocol (diagram)
- Client: application on top of multipath (dm-1 over sdx/sdy) via the iSCSI initiator
- iSCSI gateway: two iSCSI targets, each backed by RBD
- Ceph cluster: OSDs behind the gateway
- Cloud service providers that provision VM services can use iSCSI; if Ceph can export a block service with good performance, it becomes easy to glue those providers onto a Ceph cluster solution.

Test setup: iSCSI + RBD gateway
- Ceph server: Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00 GHz; four Intel P3700 SSDs with one OSD per SSD (4 OSDs total); 4 pools with PG number 512 and one 10G image per pool (see the CLI sketch below)
- iSCSI target server (librbd + SPDK / librbd + tgt): Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz, only one core enabled
- iSCSI initiator: Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz
- Topology: iSCSI initiator -> iSCSI target server (librbd) -> Ceph server with OSD0-OSD3
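The pool and image layout above can be reproduced with the standard Ceph CLI; the names below are placeholders, and the commands are repeated for pool2..pool4.

    # One pool with pg_num 512 and one 10 GiB image per pool
    ceph osd pool create pool1 512 512
    rbd create pool1/image1 --size 10240   # size in MiB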

Read performance comparison (chart): 4K random read IOPS (K) for one core vs. two cores, tgt vs. SPDK iSCSI target, with 1, 2, and 3 streams; exact bar values are not recoverable from the transcript.

Write performance comparison (chart): 4K random write IOPS (K) for one core vs. two cores, tgt vs. SPDK iSCSI target, with 1, 2, 3, and 4 streams; exact bar values are not recoverable from the transcript.

Proposals/opportunities for better leveraging SPDK in Ceph (1)
- Support multiple OSDs on the same NVMe device by using SPDK: leverage SPDK's NVMe-oF target together with the SPDK NVMe driver (see the rpc.py sketch below)
- Risk: same as with the kernel driver, i.e., all OSDs on the device fail if the target daemon crashes
- Diagram: OSD1..OSDn, each running BlueStore with the NVMEDEVICE backend and SPDK driver, connect to a shared SPDK NVMe-oF target over IPC (vhost), TCP/IP, or RDMA
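One way to sketch the shared-device idea with SPDK's rpc.py is shown below. This assumes a recent SPDK (RPC names were different in 2017), uses the split vbdev to carve the device into per-OSD partitions (an assumption for illustration, not something the slide specifies), and all addresses and NQNs are placeholders.

    # Attach the local NVMe device once, inside the SPDK NVMe-oF target
    scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -t pcie -a 0000:01:00.0
    # Carve it into 4 partitions, one per OSD (split vbdevs nvme0n1p0..p3)
    scripts/rpc.py bdev_split_create nvme0n1 4
    # Export each partition over RDMA in its own subsystem
    scripts/rpc.py nvmf_create_transport -t RDMA
    scripts/rpc.py nvmf_create_subsystem nqn.2017-10.io.spdk:osd1 -a -s SPDK001
    scripts/rpc.py nvmf_subsystem_add_ns nqn.2017-10.io.spdk:osd1 nvme0n1p0
    scripts/rpc.py nvmf_subsystem_add_listener nqn.2017-10.io.spdk:osd1 -t rdma -a 192.168.0.10 -s 4420
    # ...repeat the last three commands for osd2..osdn with nvme0n1p1, p2, ...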

Proposals/opportunities for better leveraging SPDK in Ceph (2)
- Enhance cache support in NVMEDEVICE when using the SPDK NVMe driver: currently there is no read/write cache with the SPDK NVMe driver; a better cache/buffer strategy is needed to improve read/write performance
- Enable zero copy: currently NVMEDEVICE performs memory copies on I/O reads and writes; these copies may need to be eliminated (possible solution: use DPDK memory when starting the OSD)

Agenda: SPDK introduction; Accelerate block service built on Ceph; SPDK support in Ceph BlueStore; Summary

Current SPDK support in Ceph BlueStore
- SPDK upgrades in Ceph: upgraded to SPDK 16.11 in December 2016, 17.03 in April 2017, and 17.07 in August 2017
- Stability: several compilation issues and runtime bugs encountered while using SPDK have been fixed in the code base
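For reference, a minimal sketch of enabling the bundled SPDK when building Ceph from source, assuming current Ceph build tooling (the option has not always been enabled by default):

    # Configure Ceph with its bundled SPDK enabled, then build as usual
    ./do_cmake.sh -DWITH_SPDK=ON
    cd build && make    # or ninja, depending on the generator do_cmake.sh picked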

Future SPDK support for Ceph
To make SPDK really useful in Ceph, we will continue the following work with partners:
- Stability maintenance: version upgrades, fixing compilation and runtime bugs
- Performance enhancement: continue optimizing the NVMEDEVICE module according to customer and partner feedback
- New feature development: occasionally pick up common requirements/feedback from the community and upstream those features in the NVMEDEVICE module

Agenda: SPDK introduction; Accelerate block service built on Ceph; SPDK support in Ceph BlueStore; Summary

Summary
- SPDK has proven useful for exploiting the capabilities of fast storage devices (e.g., NVMe SSDs) in many scenarios; however, extra development effort is still needed to make SPDK useful for BlueStore in Ceph.
- Call to action: participate in the SPDK community; you are welcome to leverage SPDK for Ceph optimization and to contact the SPDK dev team for help and collaboration.

Q & A