Accelerating Ceph with Flash and High Speed Networks


Transcription:

Accelerating Ceph with Flash and High Speed Networks
Dror Goldenberg, VP Software Architecture, Mellanox Technologies
Santa Clara, CA

The New Open Cloud Era
Software-defined compute, software-defined networking, and software-defined storage (object and block): open, scale-out infrastructure.

Ceph Architecture
The architecture enables object, block, and file access, fully distributed and scale-out.
Interconnect capabilities determine scale-out performance.
Source: http://ceph.com/docs/master/architecture/
Source: http://ceph.com/docs/master/rados/configuration/network-config-ref/
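The front-side versus back-side split that the architecture diagrams rely on is configured in ceph.conf, as described in the network-config-ref page cited above. A minimal sketch (the subnets are placeholders):

    [global]
        # front-side (public) network: client I/O and monitor traffic
        public_network  = 192.168.10.0/24
        # back-side (cluster) network: OSD replication, recovery, heartbeats
        cluster_network = 192.168.20.0/24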

Mellanox - Leading Supplier of End-to-End Interconnect Solutions

Entering the Era of 100Gb/s Networks
Adapters: 100Gb/s with 0.7us latency, RDMA, and 150 million messages per second (10 / 25 / 40 / 50 / 56 / 100Gb/s).
InfiniBand switches: 36 EDR (100Gb/s) ports, <90ns latency, 7.2Tb/s throughput.
Ethernet switches: 32 100GbE ports or 64 25/50GbE ports (10 / 25 / 40 / 50 / 100GbE), 6.4Tb/s throughput.
Cables: copper (passive and active), optical cables (VCSEL), silicon photonics.

Building a Balanced Element for the Scale-Out Storage System
Network I/O should match flash drive I/O: roughly 40GbE to 100GbE per storage element.
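As a back-of-envelope illustration of that matching exercise, a short Python sketch; the per-drive and link figures are assumed round numbers, not values from the slide:

    # Rough balance check: can the NIC drain the node's flash tier?
    # All throughput figures are assumed round numbers for illustration.
    SSD_READ_GB_PER_S = 2.0      # assumed per-drive sequential read
    DRIVES_PER_NODE = 6
    NIC_GBIT = 100               # e.g. one 100GbE port

    aggregate_gbit = SSD_READ_GB_PER_S * DRIVES_PER_NODE * 8  # GB/s -> Gb/s
    print(f"flash supplies ~{aggregate_gbit:.0f} Gb/s vs. {NIC_GBIT} Gb/s of network")
    if aggregate_gbit > NIC_GBIT:
        print("network-bound: add ports or faster links to stay balanced")
    else:
        print("drive-bound: the network can keep up with the flash")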

Cluster Network Traffic: Sunny-Day Scenarios (Replication)
Read (N replicas): the client reads from the primary OSD and gets the reply directly, so there is no extra cluster network traffic.
Write (N replicas): the primary OSD forwards the write to the other replicas before acknowledging the client, so (N-1)x the written data crosses the cluster network, typically 2x for 3-way replication.
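A small Python sketch of that bookkeeping; the 3-way replication default is just an assumption for the example:

    def replication_cluster_traffic(client_bytes, replicas=3):
        """Cluster-network bytes generated by a replicated client write.

        The primary OSD forwards the payload to each of the other replicas,
        so the multiplier is (N - 1); reads add no cluster traffic.
        """
        return client_bytes * (replicas - 1)

    # a 1 GB client write with 3-way replication -> 2 GB of cluster traffic
    print(replication_cluster_traffic(10**9, replicas=3) / 10**9, "GB")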

Cluster Network Traffic: Sunny-Day Scenarios (Erasure Coding, K data + M coding shards)
Read: the primary OSD reads shards from the other OSDs and decodes before replying, roughly 1x cluster network traffic, i.e. ((K-1)/K)x the read data.
Write: the primary OSD encodes and writes shards to the other OSDs before acknowledging, typically ~1.4x cluster network traffic, i.e. ((K+M-1)/K)x the written data.
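The same bookkeeping for erasure coding, using the multipliers from the slide; the (10, 4) profile matches the recovery example later in the deck:

    def ec_cluster_traffic(client_bytes, k=10, m=4, op="write"):
        """Cluster-network bytes for an erasure-coded I/O, per the slide's formulas.

        read:  the primary fetches (k - 1) of the k data shards remotely -> ((k-1)/k)x
        write: the primary ships (k + m - 1) of the k + m shards it encodes -> ((k+m-1)/k)x
        """
        if op == "read":
            return client_bytes * (k - 1) / k
        return client_bytes * (k + m - 1) / k

    print(ec_cluster_traffic(10**9, op="read") / 10**9)   # 0.9x for k=10
    print(ec_cluster_traffic(10**9, op="write") / 10**9)  # 1.3x for k=10, m=4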

Cluster Network Traffic: Recovery (Replication)
Recovery back-end traffic is about 1x the lost data.
Example, net networking time to move the data over 40GbE: about 1.1 hours for a 20TB system and about 11.1 hours for a 200TB system.
Scrubbing follows similar flows but is more demanding on I/O.

Cluster Network Traffic: Recovery (Erasure Coding)
With a (10+4) profile, recovery back-end traffic is typically ~14x the lost data ((K+M-1)x), because surviving shards must be read and decoded.
Example, net networking time to move the data over 40GbE: about 14.4 hours for a 20TB system and about 144.4 hours for a 200TB system.
Scrubbing follows similar flows.
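Those recovery-time figures are pure link-rate arithmetic, which the sketch below reproduces; it assumes the whole quoted capacity must be re-read and ignores disk and CPU limits:

    def recovery_hours(lost_tb, multiplier, link_gbit=40):
        """Net time to move the recovery traffic across the cluster network.

        multiplier: cluster bytes moved per byte of lost data
        (1x for replication; K+M-1 = 13 for a 10+4 EC profile, which the
        slide rounds to ~14x and which reproduces its 14.4-hour figure).
        """
        bytes_to_move = lost_tb * 1e12 * multiplier
        link_bytes_per_s = link_gbit * 1e9 / 8
        return bytes_to_move / link_bytes_per_s / 3600

    print(recovery_hours(20, 1))    # ~1.1 h  (replication slide)
    print(recovery_hours(20, 13))   # ~14.4 h (10+4 erasure-coding slide)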

RDMA Enables Efficient Data Movement
Hardware network acceleration delivers higher bandwidth and lower latency.
Offloading data movement to RDMA gives the highest CPU efficiency, leaving more CPU power to run applications.

Accelio: High-Performance Reliable Messaging and RPC Library
Open source: https://github.com/accelio/accelio/ and www.accelio.org
Enables faster RDMA integration for applications; asynchronous, maximizing message and CPU parallelism; >10GB/s from a single node and <10usec latency under load.
Integrated with Ceph through XioMessenger, an RDMA abstraction layer built on top of Accelio; beta available in Hammer.
A collaboration between Mellanox, Red Hat, CohortFS, and the community.
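For flavor, wiring the XioMessenger transport in looked roughly like the ceph.conf fragment below. This is a hedged sketch: the option name follows the experimental Hammer-era Accelio/XIO branches and is an assumption here; it changed between releases and the transport never shipped as a stable feature.

    [global]
        # assumption: experimental XioMessenger (Accelio/RDMA) transport,
        # Hammer-era branches only; the option name may differ by release
        ms_type = xio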

RDMA and 56GbE Contribution to Performance (benchmark chart)

Optimizing Ceph for Throughput and Price/Throughput
Joint work by Red Hat, Supermicro, Seagate, Mellanox, and Intel on a 40GbE network setup.
40GbE advantages: up to 2x read throughput per server and up to 50% lower latency.

InfiniFlash Storage with IFOS 1.0 EAP3
Setup: 2 Ceph nodes connected to SanDisk InfiniFlash, up to 4 RBDs, Mellanox 40GbE NICs.
(Charts: SanDisk InfiniFlash random read IOPS and random read latency in ms, maximizing Ceph random read IOPS.)

Ceph Optimizations for Flash: Example Setups
SanDisk InfiniFlash: Dell R720 servers, 2 nodes, one InfiniFlash enclosure (64 x 8TB = 512TB), 40GbE cluster network, 71.6 Gb/s total read throughput.
Scalable Informatics: SI Unison servers, 2 nodes, 24 SATA SSDs per node, 100GbE cluster network, 70 Gb/s total read throughput.
Supermicro: Supermicro servers, 3 nodes, 2x PCIe SSDs per node, 40GbE cluster network, 43 Gb/s total read throughput.
Mellanox: Supermicro servers, 2 nodes, 12x SAS SSDs per node, 56GbE cluster network, 44 Gb/s total read throughput.

High Speed, Efficient RDMA Networks Benefit Ceph
Balanced systems for true scale out: storage and network bandwidth match per system element.
Optimal networking performance for the key scenarios: replication, erasure coding, rebuild, scrubbing, and cache tiering.
A scale-out, non-blocking network avoids traffic jams, delivers I/O at the lowest latency, and drains the fabric efficiently in incast scenarios.
Efficient data movement with RDMA offloads the CPU.
Future work: QoS to improve degraded-state behavior, converged networks, hyper-converged systems, advanced offloads such as erasure coding, and further optimizations.

Thank You! gdror at mellanox.com