RoCE vs. iWARP: A Great Storage Debate. Live Webcast August 22, 2018, 10:00 am PT

RoCE vs. iWARP: A Great Storage Debate. Live Webcast, August 22, 2018, 10:00 am PT

Today's Presenters: John Kim, SNIA ESF Chair, Mellanox; Tim Lustig, Mellanox; Fred Zhang, Intel

SNIA-At-A-Glance

SNIA Legal Notice
The material contained in this presentation is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:
- Any slide or slides used must be reproduced in their entirety without modification.
- The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

Agenda
- Introductions: John Kim, Moderator
- What is RDMA?
- Technology Introductions: RoCE, Tim Lustig, Mellanox Technologies; iWARP, Fred Zhang, Intel Corporation
- Similarities and Differences
- Use Cases
- Challenge Topics: performance, manageability, security, cost, etc.

What is RDMA?
Remote Direct Memory Access: DMA from the memory of one node into the memory of another node without involving either one's operating system. Transfers are performed by the network adapter itself, so the host CPUs do no work: no extra copies through the caches and no context switches.
Benefits: high throughput, low latency, reduced CPU utilization.
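To ground the description (an illustrative sketch, not part of the SNIA deck): the verbs API, libibverbs, is the common programming interface for InfiniBand, RoCE and iWARP, and a one-sided RDMA WRITE looks roughly like the following. It assumes a queue pair (qp) has already been created and connected and that the peer's buffer address and rkey were exchanged out of band; rdma_write_example is a hypothetical name.

```c
/* Minimal sketch of a one-sided RDMA WRITE with libibverbs. The remote
 * CPU is never involved: the local adapter DMAs from buf into the
 * peer's registered memory. Setup/teardown omitted for brevity. */
#include <infiniband/verbs.h>
#include <stdint.h>

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       uint64_t remote_addr, uint32_t rkey)
{
    static char buf[4096] = "payload";

    /* Register local memory so the NIC can DMA it directly. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, sizeof(buf),
                                   IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = sizeof(buf),
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr = {
        .wr_id               = 1,
        .sg_list             = &sge,
        .num_sge             = 1,
        .opcode              = IBV_WR_RDMA_WRITE,   /* one-sided write */
        .send_flags          = IBV_SEND_SIGNALED,
        .wr.rdma.remote_addr = remote_addr,
        .wr.rdma.rkey        = rkey,
    };
    struct ibv_send_wr *bad_wr = NULL;

    /* Hand the transfer to the adapter; the completion is later
     * reaped from the completion queue with ibv_poll_cq(). */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```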

RDMA as a Transport
- Block storage networking and networked file storage: the SCSI protocol running (usually) on TCP/IP or UDP; iSCSI usually means SCSI on TCP/IP, almost always over Ethernet; also SMB Direct, NFS v4 and Storage Spaces Direct
- RDMA is supported by the native InfiniBand, RoCE and iWARP network protocols
- Standardization: RoCE by the IBTA, iWARP by the IETF; iSCSI in RFCs 3721, 3722, 4018, 4056, 7143, etc.
- First available: iSCSI 2003 (Windows) / 2005 (Linux) / 2006 (VMware); iWARP 2007
[Slide diagram: protocol stacks from block device / native application through SCSI, iSCSI and RDMA (iWARP, RoCE) down to the SCSI target software]

RoCE: Tim Lustig, Mellanox

What is RoCE?
RoCE (RDMA over Converged Ethernet) is the most popular RDMA implementation over Ethernet.
- Enables the highest throughput, lowest latency and lowest CPU overhead for RDMA
- Designed for enterprise, virtualized, cloud, Web 2.0 and storage platforms
- Increases performance in congested networks
- Deployed in large data centers; the proven, most widely deployed RDMA transport
- Server efficiency and scaling to 1000s of nodes
- Supports 10/25/40/50/100GbE and beyond

RoCE Overview
RoCE v1:
- Needs custom settings on the switch: priority queues to guarantee lossless L2 delivery
- Takes advantage of PFC (Priority Flow Control) in DCB Ethernet
RoCE v2 (lossless):
- Improved efficiency
The RDMA transport paradigm depends on a set of characteristics: no dropped packets, arbitrary topologies, and traffic class types.
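A practical aside (not from the slides): on Linux the RoCE version is chosen per GID table entry, so a quick way to see whether an adapter exposes RoCE v2 is to read the sysfs GID type attributes, as in this hedged sketch. The device name mlx5_0 and port 1 are placeholder assumptions.

```c
/* Sketch: list which GID indexes of a RoCE port are v1 vs. v2 by
 * reading the per-GID type files Linux exposes under sysfs. */
#include <stdio.h>

int main(void)
{
    char path[128], type[32];
    for (int idx = 0; idx < 16; idx++) {
        snprintf(path, sizeof(path),
                 "/sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/%d",
                 idx);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                         /* index not populated */
        if (fgets(type, sizeof(type), f))
            printf("GID %2d: %s", idx, type); /* "IB/RoCE v1" or "RoCE v2" */
        fclose(f);
    }
    return 0;
}
```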

RoCE Support in Ethernet Standards
Acronyms: DCB (Data Center Bridging), DCBX (Data Center Bridging Exchange), ECN (Explicit Congestion Notification), PFC (Priority Flow Control), ETS (Enhanced Transmission Selection).

IEEE 802.1 capability   | Supported
Congestion Notification | Yes (802.1Qau), plus ECN and DCB
Lossless operation      | Yes (802.1Qbb), PFC
Classes of Service      | Yes (802.1Qaz), ETS

Wide Adoption and Support
- VMware
- Microsoft SMB 3.0 (Storage Spaces Direct) and Azure
- Oracle
- IBM Spectrum Scale (formerly known as IBM GPFS)
- Gluster, Lustre, Apache Spark, Hadoop and Ceph
- Software-defined storage (SDS) and hyperconverged vendors
- Nearly all NVMe-oF demonstrations, designs, and customer deployments use RoCE

RoCE Benchmarks
[Slide charts: TCP vs. RoCE benchmark results for Microsoft SMB 3.0 and Ceph]

RoCE Future-Proofs the Data Center
- Transforms Ethernet networks and removes network, storage and CPU bottlenecks
- Support for NVMe eliminates the throughput and latency bottlenecks of slower SAS and SATA drives; an NVMe SSD can provide the sustained bandwidth of about 50 HDDs
- RoCE extends NVMe to NVMe-oF: access remote storage systems as if they were locally attached
- Next-generation solid-state NVM (3D XPoint, Optane) is expected to be 1,000 times faster than flash

Additional Resources
- RoCE Initiative: http://www.roceinitiative.org/resources/
- Demartek: https://www.demartek.com/

iWARP: Fred Zhang, Intel

What is iWARP?
- iWARP is often expanded as "Internet Wide-Area RDMA Protocol," but iWARP is NOT an acronym.
- iWARP can be used in different network environments: LAN, storage network, data center, or even WAN.
- See http://www.rdmaconsortium.org/home/faqs_apr25.htm

What is iWARP?
iWARP extensions to TCP/IP were standardized by the Internet Engineering Task Force (IETF) in 2007. These extensions eliminated three major sources of networking overhead: TCP/IP stack processing, memory copies, and application context switches.
- TCP/IP offload: moves TCP/IP processing from the CPU to the RDMA-enabled NIC (RNIC), eliminating CPU overhead for network stack processing.
- Zero copy: iWARP lets the application place data directly into the destination application's memory buffer, without unnecessary buffer copies. This significantly relieves CPU load and frees memory bandwidth.
- Less application context switching: iWARP can bypass the OS and work in user space, posting commands directly to the RNIC without expensive system calls into the OS. This can dramatically reduce application context switching and latency.
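The zero-copy path above can be made concrete with a small verbs sketch (an illustration, not material from the webcast): the target application registers its own buffer for remote write and publishes the address and rkey, after which the peer's RNIC writes payloads straight into it with no intermediate socket-buffer copies. The function name expose_buffer and the surrounding setup are assumptions for illustration.

```c
/* Hedged sketch: expose an application buffer for direct data placement.
 * Assumes a protection domain (pd) already exists; how the address/rkey
 * pair reaches the peer (e.g. via an rdmacm exchange) is not shown. */
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

int expose_buffer(struct ibv_pd *pd, void *app_buf, size_t len,
                  uint64_t *addr_out, uint32_t *rkey_out)
{
    /* REMOTE_WRITE lets the peer's RNIC place data straight into
     * app_buf, bypassing kernel socket buffers entirely. */
    struct ibv_mr *mr = ibv_reg_mr(pd, app_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        return -1;

    *addr_out = (uintptr_t)app_buf;  /* advertise these two values */
    *rkey_out = mr->rkey;            /* to the peer out of band    */
    return 0;
}
```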

iWARP Protocols
[Slide diagram of the iWARP protocol stack]

Top Tier iWARP Applications

Application                                 | Category                       | User/Kernel | OS
SMB Direct Client/Server                    | Storage: network file system   | Kernel      | Windows
Storage Spaces Direct                       | Storage: network block storage | Kernel      | Windows
NVMe over Fabrics Initiator/Target          | Storage: network block storage | Kernel      | Linux
NVMe over Fabrics Initiator/Target for SPDK | Storage: network block storage | User        | Linux
LIO iSER Initiator/Target                   | Storage: network block storage | Kernel      | Linux
uDAPL                                       | Messaging middleware           | User        | Linux
OFI/libfabric provider for verbs            | Messaging middleware           | User        | Linux
Open MPI/Intel MPI Library                  | HPC                            | User        | Linux
NFS/RDMA client/server                      | Storage: network file system   | Kernel      | Linux
rsockets                                    | Messaging middleware           | User        | Linux
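To illustrate the last row of the table (an illustrative sketch, not webcast material): rsockets, shipped with librdmacm, mirrors the BSD socket calls, so existing TCP socket code can move onto RDMA, over iWARP or RoCE, with little more than an r- prefix. The address 192.0.2.10 and port 7471 below are placeholders.

```c
/* Minimal rsockets client sketch: same flow as socket()/connect()/send(),
 * but the data path runs over RDMA hardware. */
#include <rdma/rsocket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int rsockets_client(void)
{
    int fd = rsocket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(7471) };
    inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr);

    if (rconnect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        rclose(fd);
        return -1;
    }

    const char msg[] = "hello over RDMA";
    rsend(fd, msg, sizeof(msg), 0);  /* same semantics as send(2) */
    rclose(fd);
    return 0;
}
```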

iWARP Performance
More than 1M IOPS of SMB Direct storage performance, 1.67x TCP, with the Intel Ethernet Connection X722 featuring 4x10GbE iWARP; 4K I/O, 70% read / 30% write.

Accelerate Live Migration with iWARP
Live migration on Windows Server 2016 with FastLinQ QL41xxx 25GbE, SMB Direct with iWARP. Time to migrate, in seconds (three runs per configuration):

Concurrent VM migrations | TCP/IP          | iWARP
2                        | 51 / 53 / 55    | 25 / 25 / 24
4                        | 92 / 93 / 98    | 44 / 44 / 43
6                        | 155 / 161 / 171 | 69 / 68 / 69

FastLinQ QL41xxx iWARP reduces live migration time by up to 58% and makes migrations highly predictable. Benefits: shorter maintenance windows, adaptive load balancing, SLAs, and less flight time = less risk.

Similarities and Differences

RoCE vs. iWARP: Network Stack Differences
On top of every stack sits the same vendor software: the RDMA application/ULP, the RDMA API (verbs) and the RDMA software stack. The layers below, typically implemented in hardware, are defined partly by the IBTA (the InfiniBand-derived transport and network layers) and partly by the IEEE/IETF (the Ethernet, IP, UDP, TCP and iWARP layers):

Layer      | InfiniBand            | RoCE v1                | RoCE v2                | iWARP
Transport  | IB transport protocol | IB transport protocol  | IB transport protocol  | iWARP protocol
Network    | IB network layer      | IB network layer       | UDP/IP                 | TCP/IP
Link       | IB link layer         | Ethernet link layer    | Ethernet link layer    | Ethernet link layer
Management | InfiniBand management | Ethernet/IP management | Ethernet/IP management | Ethernet/IP management

RoCE portion adopted from the Supplement to InfiniBand Architecture Specification Volume 1 Release 1.2.1, Annex A17: RoCEv2, September 2, 2014.
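One consequence of this layering, worth making concrete (an observation with a hedged sketch, not slide content): everything above the transport is shared, so connection setup written against librdmacm runs unchanged over RoCE or iWARP; the CM infers the transport from the device behind the resolved address. Error handling is trimmed and 192.0.2.20:10021 is a placeholder.

```c
/* Transport-neutral connection setup with librdmacm: identical code
 * for RoCE and iWARP adapters. */
#include <rdma/rdma_cma.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int cm_connect(void)
{
    struct rdma_event_channel *ch = rdma_create_event_channel();
    struct rdma_cm_id *id;
    if (!ch || rdma_create_id(ch, &id, NULL, RDMA_PS_TCP))
        return -1;

    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(10021) };
    inet_pton(AF_INET, "192.0.2.20", &dst.sin_addr);

    struct rdma_cm_event *ev;
    if (rdma_resolve_addr(id, NULL, (struct sockaddr *)&dst, 2000))
        return -1;
    rdma_get_cm_event(ch, &ev);   /* expect RDMA_CM_EVENT_ADDR_RESOLVED  */
    rdma_ack_cm_event(ev);

    if (rdma_resolve_route(id, 2000))
        return -1;
    rdma_get_cm_event(ch, &ev);   /* expect RDMA_CM_EVENT_ROUTE_RESOLVED */
    rdma_ack_cm_event(ev);

    /* From here one would create a QP on id->verbs and call
     * rdma_connect(); omitted to keep the sketch short. */
    return 0;
}
```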

Key Differences

                       | RoCEv2                                  | iWARP
Underlying network     | UDP                                     | TCP
Congestion management  | Relies on DCB                           | TCP does flow control / congestion management
Adapter offload option | Full DMA                                | Full DMA and TCP/IP*
Routability            | Yes                                     | Yes
Cost                   | Comparable NIC price; best practice is DCB on the switch; requires DCB configuration experience | Comparable NIC price; integrated with the Intel platform (4x10Gb); no requirement on the switch

* Depending on vendor implementations

Key Differences: RoCE
- Light-weight RDMA transport: RDMA transfers are done by the adapter with no involvement by the OS or device drivers
- Based on ECN and DCB (RoCE v1) standards that provide a lossless network and the ability to optimally allocate bandwidth to each protocol on the network
- Scalable to thousands of nodes, based on Ethernet technologies that are widely used and well understood by network managers
- Widely deployed by Web 2.0 companies, supported by OS vendors and storage manufacturers
- RNICs command a slight premium but are becoming commodity NICs

Key Differences: iWARP
- Built on TCP instead of UDP: TCP provides flow control and congestion management, so iWARP can still provide high throughput in congested environments
- DCB is not necessary
- Can scale to tens of thousands of nodes
- Can span multiple hops, or across multiple data centers

Use Cases

Use Cases: RoCE
- Cloud computing: efficient, scalable clustering and higher-performance virtualized servers in VMware, Red Hat KVM, Citrix Xen, Microsoft Azure, Amazon EC2, Google App Engine
- Storage: performance increase of 20 to 100% when using RoCE instead of TCP, and latency typically reduced by 15 to 50%, across Microsoft SMB Direct, Ceph and Lustre
- Big data / data warehousing: accelerates data sharing/sorting, with higher IOPS and linear scaling under exponential growth; ideal for Oracle RAC, IBM DB2 PureScale, and Microsoft SQL
- Virtualization: VMware ESX and Windows Hyper-V now ship inbox drivers to reduce migration time
- Hyper-converged (HCI): faster performance for storage replication and live migrations
- Financial services: unleashes scalable CPU performance on low-latency applications like Tibco, Wombat/NYSE, IBM WebSphere MQ, Red Hat MRG, and 29West/Informatica
- Web 2.0: RoCE minimizes response time, maximizes jobs per second, and enables highly scalable infrastructure designs; ideal for applications like Hadoop, Memcached, Eucalyptus, and Cassandra

Use Cases: iWARP
- High performance computing: low-latency message passing over an Ethernet network; optimized for Open MPI/Intel MPI
- Storage, hyper-converged or disaggregated: low latency and high throughput; built into Microsoft SMB Direct, Storage Spaces Direct and Storage Replica; supports NVMe over Fabrics and Persistent Memory over Fabrics; well suited to hyper-converged storage thanks to TCP-based flow control and congestion management
- Big data: accelerates Hadoop MapReduce and Spark shuffling; Alluxio acceleration
- Virtualization: Windows Server Hyper-V; accelerates Windows VM live migration

Summary

          | RoCE                 | iWARP
Transport | UDP/IP               | TCP/IP
Network   | Lossless             | Standard
Adapter   | RNIC (Soft-RoCE)     | NIC
Offload   | Hardware             | Hardware
Switch    | DCB (Resilient RoCE) | Standard

Our Next Great Storage Debate: Centralized vs. Distributed
September 11, 2018. Register: https://www.brighttalk.com/webcast/663/332357

More Webcasts
Other Great Storage Debates:
- FCoE vs. iSCSI vs. iSER: https://www.brighttalk.com/webcast/663/318003
- Fibre Channel vs. iSCSI: https://www.brighttalk.com/webcast/663/297837
- File vs. Block vs. Object Storage: https://www.brighttalk.com/webcast/663/308609
On-demand: the "Everything You Wanted To Know About Storage But Were Too Proud To Ask" series: https://www.snia.org/forums/esf/knowledge/webcasts-topics

After This Webcast
- Please rate this webcast and provide us with feedback
- This webcast and a PDF of the slides will be posted to the SNIA Ethernet Storage Forum (ESF) website and available on-demand at www.snia.org/forums/esf/knowledge/webcasts
- A full Q&A from this webcast, including answers to questions we couldn't get to today, will be posted to the SNIA-ESF blog: sniaesfblog.org
- Follow us on Twitter @SNIAESF

Thank You