Ceph BlueStore Performance on Latest Intel Server Platforms. Orlando Moreno Performance Engineer, Intel Corporation May 10, 2018



Legal Disclaimers

© 2017 Intel Corporation. Intel, the Intel logo, Xeon and Xeon logos are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Intel processors of the same SKU may vary in frequency or power as a result of natural variability in the production process. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks/datacenter.

The cost reduction scenarios described are intended to enable you to get a better understanding of how the purchase of a given Intel-based product, combined with a number of situation-specific variables, might affect future costs and savings. Circumstances will vary and there may be unaccounted-for costs related to the use and deployment of a given product. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs or cost reduction.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804.

No computer system can be absolutely secure. Intel Advanced Vector Extensions (Intel AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration; learn more at http://www.intel.com/go/turbo. Available on select Intel processors. Requires an Intel HT Technology-enabled system. Your performance varies depending on the specific hardware and software you use. Learn more by visiting http://www.intel.com/info/hyperthreading.

AGENDA
- Background
- Hardware and software configurations
- Performance overview
- Summary

BACKGROUND Storage and Application Workloads

Introducing Innovative NVMe*-Based Storage Solutions for Today and the Future

Red Hat Ceph Storage* with the Intel Optane SSD DC P4800X combined with the Intel SSD DC P4500 delivers exceptional performance, lower latency, and reduced TCO.

1. Responsiveness defined as average read latency measured at queue depth 1 during a 4K random write workload. Measured using FIO 2.15. Common configuration: Intel 2U Server System; OS CentOS 7.2, kernel 3.10.0-327.el7.x86_64; CPU 2 x Intel Xeon E5-2699 v4 @ 2.20GHz (22 cores); RAM 396GB DDR @ 2133MHz. Intel drives evaluated: Intel Optane SSD DC P4800X 375GB and Intel SSD DC P3700 1600GB. Samsung* drives evaluated: Samsung SSD PM1725a, Samsung SSD PM1725, Samsung PM963, Samsung PM953. Micron* drive evaluated: Micron 9100 PCIe* NVMe* SSD. Toshiba* drives evaluated: Toshiba ZD6300. Tests: QD1 random read 4K latency, QD1 random RW 4K 70% read latency, QD1 random write 4K latency, using FIO 2.15. *Other names and brands may be claimed as the property of others.

STORAGE EVOLUTION

PLATFORM EVOLUTION Generation-to-Generation

BLUESTORE BACKEND
BlueStore is a new Ceph storage backend optimized for modern media:
- Key/value database (RocksDB) for metadata
- All data written directly to raw device(s)
- Can combine HDD, SSD, NVMe, NVRAM
~2X faster than FileStore:
- Better parallelism and efficiency on fast devices
- No double writes for data
- Performs well with very small journals
Separate caching and data drives still recommended!

HARDWARE AND SOFTWARE CONFIGURATION
6-Node Disaggregated All-Flash Ceph Cluster
- Ceph 12.1.1-175 (Luminous RC), BlueStore
- 2x replication pool, 8192 PGs
- 1, 2, and 4 OSDs per NVMe SSD
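The pool described above can be reproduced with standard Ceph CLI commands. A sketch follows; the pool name `rbd` is an assumption, not stated in the deck:

```shell
# Create a replicated pool with 8192 placement groups
# (pool name "rbd" is illustrative; the PG count matches the test setup)
ceph osd pool create rbd 8192 8192 replicated

# 2x replication, as in the test configuration
ceph osd pool set rbd size 2

# Since Luminous, pools must be tagged with an application before use
ceph osd pool application enable rbd rbd
```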

BENCHMARKING CEPH
Determining Ceph RBD performance: using the Ceph Benchmarking Tool (CBT), FIO was run against several RBD volumes. Several metrics were collected:
- Aggregate IOPS and bandwidth
- Average and 99th percentile latency
- CPU utilization
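CBT drives FIO internally, but an equivalent standalone FIO invocation for the 4KB random-write case might look like the sketch below. The pool and image names, queue depth, and runtime are assumptions, not values taken from the deck:

```shell
# 4KB random writes against an RBD image using fio's librbd engine;
# fio reports aggregate IOPS/bandwidth plus average and 99th percentile latency
fio --name=rbd-4k-randwrite \
    --ioengine=rbd --clientname=admin --pool=rbd --rbdname=vol0 \
    --rw=randwrite --bs=4k --iodepth=32 --direct=1 \
    --time_based --runtime=300 \
    --percentile_list=99 --group_reporting
```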

BLUESTORE METADATA ON INTEL OPTANE SSD

USING INTEL OPTANE SSD FOR METADATA
RocksDB and WAL
Adding an Intel Optane SSD as the metadata drive provides write latency improvements:
- ~25% more IOPS with Optane for small block random writes
- ~50% increase in aggregate throughput (GB/s) for large (1MB) sequential writes
- Average latency decreases by up to 25%
- 2x lower long-tail latency
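On current Ceph releases, one way to place the RocksDB metadata and WAL on an Optane device is via `ceph-volume`. A sketch, with assumed device paths (the deck's Luminous RC build may have used the older ceph-disk tooling instead):

```shell
# BlueStore OSD: data on a capacity NVMe SSD, RocksDB on an Optane partition.
# When --block.wal is not given, the WAL is co-located on the DB device.
# Device paths are illustrative.
ceph-volume lvm create --bluestore \
    --data /dev/nvme1n1 \
    --block.db /dev/nvme0n1p1
```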

SCALING STORAGE PERFORMANCE
Two vectors to scale Ceph performance:
- Co-locate multiple OSD processes on an NVMe device
- Add more NVMe devices per node
Trade-offs for each method
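Co-locating several OSDs on one NVMe SSD is typically done by partitioning the device and creating one OSD per partition. A sketch for 4 OSDs per SSD; the device path is an assumption:

```shell
# Split one NVMe SSD into four equal GPT partitions, one BlueStore OSD each
parted -s /dev/nvme1n1 mklabel gpt \
    mkpart osd1 0% 25% mkpart osd2 25% 50% \
    mkpart osd3 50% 75% mkpart osd4 75% 100%

for part in /dev/nvme1n1p1 /dev/nvme1n1p2 /dev/nvme1n1p3 /dev/nvme1n1p4; do
    ceph-volume lvm create --bluestore --data "$part"
done
```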

OSD AND NVME SCALING 4KB Random Performance 2-4 OSDs/NVMe SSD and 4-6 NVMe SSDs per node are sweet spots

FUTURE WORK
RDMA in Ceph
- Default Ceph networking stack uses the Async Messenger (TCP)
- Leverage RDMA to reduce CPU utilization and network-layer latency
- Async Messenger is compatible with RDMA (RoCE and iWARP)
- Functionally ready, but optimizations and testing are ongoing
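At the time of this deck, switching the Async Messenger to its RDMA transport was a ceph.conf change along these lines. A sketch; the RDMA device name is an assumption for a Mellanox NIC, and the feature was experimental:

```shell
# Enable the RDMA-capable Async Messenger (experimental in the Luminous era)
cat >> /etc/ceph/ceph.conf <<'EOF'
[global]
ms_type = async+rdma
ms_async_rdma_device_name = mlx5_0
EOF
```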

SUMMARY
- Using the Intel Optane SSD DC P4800X combined with the Intel SSD DC P4500 for Ceph storage provides a high-performance, high-capacity, and more cost-effective solution
- Ceph BlueStore presents opportunities to utilize fast technology such as the Intel Optane SSD
- Ongoing work to improve Ceph performance on NVMe and to enable new technologies, such as RDMA

THANK YOU plus.google.com/+redhat facebook.com/redhatinc linkedin.com/company/red-hat twitter.com/redhat youtube.com/user/redhatvideos

BACKUP

STORAGE EVOLUTION

CEPH PARAMETERS (Global)

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
perf = true
mutex_perf_counter = true
throttler_perf_counter = false
rbd cache = false
rbd_cache_writethrough_until_flush = false
rbd_op_threads = 2
osd scrub load threshold = 0.01
osd scrub min interval = 137438953472
osd scrub max interval = 137438953472
osd deep scrub interval = 137438953472
osd max scrubs = 16
log file = /var/log/ceph/$name.log
log to syslog = false
mon compact on trim = false
osd pg bits = 8
osd pgp bits = 8
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
osd_crush_chooseleaf_type = 0

CEPH PARAMETERS (OSD)

osd_op_num_shards = 8
osd_op_num_threads_per_shard = 2
filestore_max_sync_interval = 1
filestore_op_threads = 10
filestore_queue_max_ops = 5000
filestore_queue_committing_max_ops = 5000
journal_max_write_entries = 1000
journal_queue_max_ops = 3000
objecter_inflight_ops = 102400
filestore_wbthrottle_enable = false
filestore_queue_max_bytes = 1048576000
filestore_queue_committing_max_bytes = 1048576000
journal_max_write_bytes = 1048576000
journal_queue_max_bytes = 1048576000
ms_dispatch_throttle_bytes = 1048576000
objecter_inflight_op_bytes = 1048576000

PERFORMANCE OVERVIEW 4KB Random Workload

99TH PERCENTILE LATENCY 4KB Random Workload

OSD AND NVME SCALING 4KB Random Read Performance 2-4 OSDs/NVMe SSD and 4-6 NVMe SSDs per node are sweet spots

OSD AND NVME SCALING 1MB Sequential Performance Writes benefit from more OSDs Reads are bottlenecked by network

OSD AND NVME SCALING 1MB Sequential CPU Utilization Sequential workloads are not CPU intensive