SER1740BU RDMA: The World Of Possibilities Sudhanshu (Suds) Jain # SER1740BU #VMworld2017
Disclaimer
  This presentation may contain product features that are currently under development. This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
  1 Infrastructure Challenges & Trends
  2 Introducing RDMA
  3 vSphere 6.5 with RDMA
  4 RDMA: The World of Possibilities
Rise of the Third Platform
  Transforming the World of Applications
  Delivers risk and opportunity for the future
  Need to deliver business value in real time
  Artificial Intelligence, Predictive Analytics, Real-time Analytics, Business Intelligence, Transactional Processing, Machine Learning
  Highly parallel processing on very large data sets
  Low latency for mission-critical transactions
  Hyper-scale datacenter
  $5.3 trillion by 2020 (source: IDC)
Key Infrastructure Challenges
  Explosive growth of data: build capacity
  Near real-time: need efficient access to capacity
  Highly distributed: need efficient access to capacity everywhere
  Towards a data-centric future!
  VMworld 2017 Content: Not for publication
Explosive Growth of Data
  Stressing your storage and network
  [Chart labels: benchmarks, Cloudera, Zookeeper. Source: Intel IDF 2016]
Business Value of Data Diminishes Fast
  Time = $$$
  What we do when data is changing
  Act fast to gain business value
  Source: http://jtonedm.com/2012/11/21/decision-latecy-revisited/
Speed of Analysis Depends on Storage and Database Access Latency
  Results in cost savings and strategic advantage
Highly Distributed
  Build, deploy and manage interconnected, collaborative workloads & infrastructure
  Big Data, Cloud-Native App, Multi-tier App, Distributed Storage, Virtual SAN
  Deploy multiple workloads with strong demand for inter-VM traffic
  Optimize data delivery to applications
  An adaptive storage tier is critical for improved application performance
To Summarize
  Application Needs Are Transforming!
  All-flash, software-defined
  Applications work on data at tera/peta/exabyte scale instead of mega/gigabyte scale
  Scale-out applications dictate specific ultra-low-latency and sync requirements: high network bandwidth and high packet rate, with latency in the low μs
  Availability is built into the application instead of the infrastructure
  HPC infrastructure design principles are no longer exclusive to HPC!
Key Technology Trends
  Evolving space of compute, storage and interconnect
  CPU densities continue to increase
  High-density flash and NVDIMM will dominate enterprise storage
  High-speed interconnects are needed to keep up with fast storage
  Higher CPU density, faster data access and high-speed interconnects have all already proven their value in the high-performance computing domain
  Some key concepts of HPC are already changing how enterprise storage solutions are designed and deployed, disrupting the complete ecosystem of the SAN as well as the DAS market
  Source: http://www.theregister.co.uk/2016/09/05/wikibon_server_san_takeover/
IO Acceleration over RDMA
  Proven in HPC; now the enterprise wants it!
  [Diagram: TCP/IP vs. Remote Direct Memory Access (RDMA)]
Benefits of RDMA?
  Software Defined Acceleration
  [Diagram: RDMA App 1 and RDMA App 2, each with buffers, RDMA stack, device driver and RDMA NIC; direct copy between application buffers, bypassing the kernel; connected through an InfiniBand switch or an Ethernet switch]
  Transports: InfiniBand, iWARP (Internet Wide Area RDMA Protocol), RoCE (RDMA over Converged Ethernet)
  RDMA enables:
    Low latency, on the order of 1 μs
    OS bypass to reduce software stack latency
    CPU offload to give more CPU to applications
    Zero-copy to avoid unnecessary data I/O
Introducing RDMA over Converged Ethernet (RoCE)
  A New Fabric To Run Your Applications
  Source: http://www.roceinitiative.org/
Introducing RDMA over Converged Ethernet (RoCE)
  Brings true convergence over a single fabric
  RDMA over Converged Ethernet (RoCE): provides the benefits of RDMA for existing Ethernet data center infrastructure
  [Diagram: classical NIC vs. RDMA NIC]
  Why RoCE?
    Deployment: most widely deployed RDMA solution over Ethernet
    Link speeds: available for all Ethernet speeds, including 25/50/100G
    OS ubiquity: drivers available in Red Hat, SUSE, Microsoft Windows and other common operating systems
    Low latency: lowest latency for Ethernet in the industry
    Tremendous ecosystem support: IBTA standard, major NIC vendors, all major OEMs, all major OSs
  Build a single Ethernet fabric to deliver all your datacenter needs!
RoCE v1/v2 Management in vSphere
  RDMA stack integrated with vSphere networking
  vmrdma# and vmnic# are always coupled
  vmknic# properties {IP, VLAN, MAC, MTU} are used for RoCE configuration
  A single IP can be used for both RDMA and regular Ethernet
  esxcli rdma device list shows the vmrdma devices present on the system
  RoCEv1 and RoCEv2 protocols are supported
  One vmrdma device can support both RoCE protocols concurrently
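A single device can carry both RoCE versions because they differ on the wire: RoCEv1 is a Layer 2 protocol with its own EtherType (0x8915), while RoCEv2 rides in UDP with destination port 4791. The sketch below illustrates that distinction only. The helper name `classify_roce` and its already-parsed inputs are assumptions made for illustration; this is not vSphere or driver code.

```python
from typing import Optional

ROCE_V1_ETHERTYPE = 0x8915   # RoCEv1: dedicated EtherType, Layer 2 only
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
IPPROTO_UDP = 17
ROCE_V2_UDP_DPORT = 4791     # RoCEv2: UDP destination port


def classify_roce(ethertype: int,
                  ip_proto: Optional[int] = None,
                  udp_dport: Optional[int] = None) -> str:
    """Classify already-parsed header fields (hypothetical helper)."""
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCEv1"
    is_ip = ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6)
    if is_ip and ip_proto == IPPROTO_UDP and udp_dport == ROCE_V2_UDP_DPORT:
        return "RoCEv2"
    return "not RoCE"


if __name__ == "__main__":
    print(classify_roce(0x8915))            # RoCEv1
    print(classify_roce(0x0800, 17, 4791))  # RoCEv2
    print(classify_roce(0x0800, 6, 443))    # not RoCE
```

Because RoCEv2 is ordinary UDP/IP, it can cross routed networks, whereas RoCEv1 stays within one Ethernet broadcast domain.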
Introducing Para-Virtualized RDMA
  High-performance RDMA with Live-Migration!
  [Diagram: VM1 and VM2, each running an app, RDMA stack and PVRDMA driver; ESXi runs the PVRDMA backend, RDMA stack and HCA driver over Ethernet (RoCE)]
  Virtual Machine
    Exposes a dual-function virtual PCIe device: a network interface (vmxnet3) and an RDMA provider (pvrdma)
    The RDMA provider plugs into the OpenFabrics Enterprise Distribution (OFED) stack, in kernel and user space
    Full support for the Verbs RDMA API
    Live vMotion
  ESXi
    Leverages the native RDMA stack
    Physical HCA services all VMs
  Delivers the best of both worlds: ultra-low latency to applications with live migration!
Virtualizing RDMA Devices
  [Diagram, guest side: RDMA app, User Verbs API, RDMA stack, vRDMA device driver (guest kernel), vRDMA NIC; requests go to the emulation layer and on to the VMkernel]
  Para-virtual PCIe device exposes Verbs capabilities
  Encapsulate the Verbs request
  Issue the request to hardware
vRDMA in ESXi
  [Diagram: in the virtual machine, the vRDMA NIC; in the VMkernel, the vRDMA device emulation layer, VMkernel RDMA stack and RDMA device driver; below, the RDMA NIC, with a path to the app buffers]
  OS bypass: map guest pages in the VMkernel; bypass the guest OS
  Zero copy: store guest memory translations in the RDMA NIC
Transparently Support vMotion
  Suspending VM for vMotion
  [Diagram: VM 1 and VM 2, each with a vRDMA device emulation layer, linked by a control channel; VM 1 is being vMotioned]
  Steps shown: tell VM 2 to quiesce; retrieve RDMA connection state for VM 1; retrieve RDMA connection state for VM 2; disconnect control channel; vMotion
Transparently Support vMotion
  Resuming VM after vMotion
  [Diagram: vMotioned VM 1 and VM 2, each with a vRDMA device emulation layer]
  1 Restore RDMA connection state (vMotioned VM 1)
  2 Reconnect control channel
  3 Restore RDMA connection state (VM 2)
  4 Resume RDMA connections
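The suspend and resume sequences from the two vMotion slides can be sketched as a toy state machine. Everything here is a hypothetical model: the `VrdmaEndpoint` class, its fields and the string states are invented for illustration, and the order of the suspend steps is an assumption, since only the resume steps are clearly numbered on the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VrdmaEndpoint:
    """Toy stand-in for one VM's vRDMA device emulation layer (hypothetical)."""
    name: str
    connections: Dict[int, str] = field(default_factory=dict)  # QP number -> state
    control_channel_up: bool = True
    saved_state: Optional[Dict[int, str]] = None

    def quiesce(self) -> None:
        self.connections = {qp: "quiesced" for qp in self.connections}

    def save_state(self) -> None:
        self.saved_state = dict(self.connections)

    def restore_state(self) -> None:
        if self.saved_state is None:
            raise RuntimeError(f"{self.name}: no saved connection state")
        self.connections = dict(self.saved_state)

    def resume(self) -> None:
        self.connections = {qp: "active" for qp in self.connections}


def suspend_for_vmotion(migrating: VrdmaEndpoint, peer: VrdmaEndpoint) -> List[str]:
    """Suspend side; the step order is an assumption."""
    for ep in (peer, migrating):
        ep.quiesce()
    for ep in (migrating, peer):
        ep.save_state()
    for ep in (migrating, peer):
        ep.control_channel_up = False
    return ["quiesce", "retrieve connection state", "disconnect control channel"]


def resume_after_vmotion(migrated: VrdmaEndpoint, peer: VrdmaEndpoint) -> List[str]:
    """Resume side, following steps 1 to 4 on the resume slide."""
    migrated.restore_state()                                  # 1
    migrated.control_channel_up = peer.control_channel_up = True  # 2
    peer.restore_state()                                      # 3
    for ep in (migrated, peer):                               # 4
        ep.resume()
    return ["restore state (migrated VM)", "reconnect control channel",
            "restore state (peer VM)", "resume connections"]
```

The point of the model is that both endpoints hold saved connection state across the migration, so the application-level RDMA connections survive even though the underlying channel is torn down and rebuilt.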
Agenda
  1 What is RDMA?
  2 Why RDMA?
  3 vSphere 6.5 with RDMA
  4 RDMA: The World of Possibilities
Versatile Use of RDMA
  World of Possibilities
  Applications over Virtualized Infrastructure
    Business-critical applications over virtualized RDMA: SAP, Oracle-RAC, DB2
    High-performance computing over virtualized infrastructure
    Big Data over RDMA
  Accelerating Infrastructure
    Scale-out storage: VSAN, third-party scale-out storage solutions
    Storage services: iSCSI over RDMA, NFS over RDMA, NVMe over Fabric
    PMEM over RDMA
  Offload vSphere Traffic
    vMotion over RDMA
    Fault Tolerance and High Availability over RDMA
    IO-Filter over RDMA
  High-performing cable
Business Critical Application Acceleration
  Scale, Scale-out, Efficient Interconnect
  Key challenges:
    Larger active data sets
    More users running complex queries
    Higher operation rates in near real time
  Approach:
    Unlock the reach of data
    Scale to more compute nodes
    Efficient access to active data sets
  PVRDMA and RDMA-based storage are foundational for building scale-out business-critical applications!
HPC over Virtualized Environment
  Virtualization and Cloud are Transforming HPC Infrastructure
  Throughput workloads, MPI workloads, highly parallel workloads
  Scientific or technical workloads
    Often floating-point intensive
    Often storage intensive
    Often parallel
    Run on server-class systems
  [Diagram: workloads running on a virtualization layer]
  PVRDMA enables true consolidation of HPC workloads
PVRDMA: Performance Benchmark Configuration
  Servers: HP ProLiant DL360 Gen9, Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz, 256 GB RAM
  RDMA adapters: Mellanox Technologies MT27520 Family [ConnectX-3 Pro] 40GE; Mellanox Technologies MT27700 Family [ConnectX-4] 100GE
  Hypervisor: ESXi 6.5
  Guest: CentOS 7.2, kernel 3.10.0-327.22.2.el7.x86_64, 4 GB RAM, 2 vCPUs
  Switch: Mellanox SN2700 Spectrum 32-port non-blocking 100GbE
  OpenMPI version 2.0.1a1
  [Diagram: four ESXi hosts, each running a test VM, plus a management VM on ESXi, connected through the switch]
PVRDMA 40GE Latency
  [Chart: PVRDMA vs. TCP latency. Y axis: latency (μs), 0 to 160. X axis: message size (bytes), 2 to 65536. Series: RDMA WRITE, RDMA READ, TCP_RR]
  TCP: 1 thread, MTU of 9K
  RDMA: 1 thread, RC QP, MTU of 4K
PVRDMA 40GE Bandwidth
  [Chart: PVRDMA vs. TCP bandwidth. Y axis: bandwidth (Gbit/s), 10 to 40. X axis: message size (bytes), 4096 to 65536. Series: RDMA WRITE, RDMA READ, TCP_STREAM]
  TCP: 1 thread, MTU of 9K
  RDMA: 1 thread, RC QP, MTU of 4K
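Latency and bandwidth charts of this kind are often read against a first-order cost model: each message pays a fixed per-message latency plus a serialization time proportional to its size. The sketch below implements that model only. The function names are hypothetical and the parameter values used in the example are placeholders chosen for illustration; they are not measurements from this benchmark.

```python
def transfer_time_us(size_bytes: int, base_latency_us: float, link_gbps: float) -> float:
    """Fixed per-message cost plus serialization time, in microseconds."""
    bits_per_us = link_gbps * 1_000          # 1 Gbit/s = 1000 bits per microsecond
    return base_latency_us + size_bytes * 8 / bits_per_us


def effective_gbps(size_bytes: int, base_latency_us: float, link_gbps: float) -> float:
    """Throughput seen by one message stream with no pipelining, in Gbit/s."""
    time_us = transfer_time_us(size_bytes, base_latency_us, link_gbps)
    return size_bytes * 8 / time_us / 1_000


if __name__ == "__main__":
    # Placeholder parameters, NOT benchmark results.
    for size in (64, 4096, 65536):
        print(size, round(effective_gbps(size, base_latency_us=5.0, link_gbps=40.0), 2))
```

Under this model small messages are dominated by the fixed latency term, which is where OS bypass helps most, while large messages approach the link rate.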
Demo: HPC Workloads over vSphere Using PVRDMA
  Build ultra-low latency applications with confidence!
Convergence Using iSER (iSCSI Extensions for RDMA)
  iSER is iSCSI with an RDMA data path
  Tech Preview
  [Diagram: block storage access through SCSI and iSCSI; below the Datamover Interface, either iSER over Verbs on an R-NIC (shared with other RDMA apps) or iSCSI TCP over TCP/IP on a NIC]
  The hybrid and all-flash market is growing fast; expected to grow 30% YoY for the next 5 years
  Need an ultra-fast storage solution for the SDDC; Fibre Channel is not an option
  Accelerates iSCSI data movement using general-purpose RDMA, with lower TCO
  Enterprise ready today with iSCSI management
  Highly scalable, with a dedicated RDMA queue pair (QP) per target connection
  Delivers ultra-low latency and high IOPS with lower CPU utilization!
iSER: Current State of Affairs
  Ready for prime time!
  Tech Preview
  Low latency, low CPU utilization (eliminates copies to/from TCP/IP buffers)
  No changes to iSCSI administration (vSphere, Windows, OpenStack work as is)
  Vendor and technology independent (works on iWARP, RoCE and InfiniBand HCAs)
  Works on standard Ethernet equipment (10G and 25/50/100G switches)
  Suitable for all-flash over high-speed Ethernet (10, 25, 40, 50, 100 Gbps and beyond)
  No disruption to the administration model
  Fits well into the Software Defined Storage (SDS) paradigm
  Cost savings
  Enterprise applications just work! (VVols, clustering, multipath, etc.)
  Ideal for shared storage (both flash and HDD)
  iSER is ready for shared all-flash SAN storage today!
iSER vs Fibre Channel
  Leverage iSCSI services with FC-like reliability
  Tech Preview
  Feature/Protocol    iSER (40Gb)                                  Fibre Channel (16Gb)
  Read latency (4K)   50 us                                        80 us
  Write latency (4K)  139 us                                       195 us
  Bandwidths          10/25/40/50/100 Gb                           8/16/32 Gb
  CPU utilization     Low                                          Low
  Security            Authentication, Confidentiality, Integrity   Integrity
  Ownership cost      Low                                          Medium - High
  Market              Growing rapidly and evolving                 Mature and stable
  Workloads           Cloud, Analytics, Enterprise                 Enterprise
  iSER: Fibre Channel benefits minus the additional costs
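The 4K latency figures in the comparison above can be turned into relative reductions with a few lines of arithmetic. The sketch uses only the four latency values from the slide; the dictionary layout and function name are illustrative. Note that the two columns run at different link speeds (40Gb vs 16Gb), so this is not a like-for-like protocol comparison.

```python
# Latencies in microseconds, copied from the comparison above.
LATENCY_US = {
    "read_4k": {"iser_40gb": 50, "fc_16gb": 80},
    "write_4k": {"iser_40gb": 139, "fc_16gb": 195},
}


def reduction_pct(baseline_us: float, candidate_us: float) -> float:
    """Percentage by which candidate latency is lower than the baseline."""
    return (baseline_us - candidate_us) / baseline_us * 100


if __name__ == "__main__":
    for op, values in LATENCY_US.items():
        pct = reduction_pct(values["fc_16gb"], values["iser_40gb"])
        print(f"{op}: {pct:.1f}% lower latency with iSER")
    # read_4k: 37.5% lower latency with iSER
    # write_4k: 28.7% lower latency with iSER
```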
iSCSI Extensions for RDMA (iSER)
  Turbo Charge your SAN on Converged Fabric
  Tech Preview
  [Charts. Data source: FMS, 2016]
Demo: vSphere with iSER Target
  Ethernet economics with best-of-breed performance!
NVMe over Fabric
  Revolutionizing the SAN Architecture
  POC
  [Diagram labels: NVMf, NVMe; Scalability, Bandwidth, IOPS, Latency]
  Unlocking the reach of data @speed!
  The value of data is based on how fast it can be accessed and processed
  Faster storage access enables cost reduction through consolidation
  Independently scale storage and compute infrastructures
  Data resilience
  Mitigate the server stranded-storage problem (with DAS storage)
  Optimized data delivery to applications
  Efficiency of shared storage with no compromise
NVMe over Fabrics
  Use-cases & Deployment
  POC
  NVMe transports:
    Memory: data and commands/responses use shared memory. Example: PCI Express
    Message: data and commands/responses use capsules. Example: Fibre Channel
    Message and memory: commands/responses use capsules; data uses a fabric-specific data transfer mechanism. Examples: RDMA (InfiniBand, RoCE, iWARP)
  Multiple options: RDMA vs Fibre Channel vs TCP/IP
  Needed for both the front-end fabric and the back-end fabric
  Leverage the end-to-end efficient NVMe protocol
  Implementations vary from simple DAS-like allocation to enterprise-class service delivery
  Ethernet-based RDMA is the optimal and most cost-efficient approach to deliver NVMe-oF @Scale!
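The three transport models on this slide differ only in how commands/responses and data are carried, which can be captured in a small lookup table. The class and function names below are hypothetical and exist only to restate the slide's taxonomy; they do not correspond to any NVMe library API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Carrier(Enum):
    SHARED_MEMORY = "shared memory"
    CAPSULE = "capsule"
    FABRIC_SPECIFIC = "fabric-specific data transfer"


@dataclass(frozen=True)
class TransportModel:
    name: str
    commands_and_responses: Carrier
    data: Carrier
    examples: Tuple[str, ...]


# One entry per transport model listed on the slide.
TRANSPORT_MODELS = (
    TransportModel("Memory", Carrier.SHARED_MEMORY, Carrier.SHARED_MEMORY,
                   ("PCI Express",)),
    TransportModel("Message", Carrier.CAPSULE, Carrier.CAPSULE,
                   ("Fibre Channel",)),
    TransportModel("Message and Memory", Carrier.CAPSULE, Carrier.FABRIC_SPECIFIC,
                   ("InfiniBand", "RoCE", "iWARP")),
)


def model_for(transport: str) -> TransportModel:
    """Look up the transport model for an example transport named on the slide."""
    for model in TRANSPORT_MODELS:
        if transport in model.examples:
            return model
    raise KeyError(transport)


if __name__ == "__main__":
    m = model_for("RoCE")
    print(m.name, "|", m.commands_and_responses.value, "|", m.data.value)
```

For the RDMA transports, commands still travel in capsules while bulk data moves through the fabric's own transfer mechanism, which is what lets the RDMA NIC place data directly into host buffers.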
Smart Ethernet Fabric
  Build the Data-centric Infrastructure
  POC
  [Diagram: pools of compute (ESXi CPU hosts) and storage/GPUs/FPGAs/accelerators, each attached through Ethernet rNICs (10G/25G/50G/100G) to an Ethernet fabric. Data source: FMS, 2016]
  The Ethernet fabric provides multiple benefits:
    Ubiquity: Ethernet is everywhere
    High performance: 25/50/100G Ethernet and low latency
    High scalability: non-blocking topology and multiple Ethernet connections
    Deterministic: QoS and advanced congestion control
    Cost/performance optimized: single fabric for all your scale-out needs
NVM Express, Inc. Roadmap
  [Timeline: 2014, 2015, 2016, 2017, future direction, by quarter; released and planned NVMe specification releases. * Subject to change]
  NVM Express
    NVMe 1.2 (Nov '14): Namespace Management, Controller Memory Buffer, Host Memory Buffer, Live Firmware Update
    NVMe 1.2.1 (May '16)
    NVMe 1.3 (May '17): Sanitize, Streams, Virtualization
    NVMe (next)*: IO Determinism, Persistent Controller Memory Buffer, Multipathing
  NVMe over Fabrics
    NVMe-oF 1.0 (May '16): transport and protocol, RDMA binding
    NVMe-oF (next)*: Enhanced Discovery, Authentication, TCP Transport
  NVMe-MI
    NVMe-MI 1.0 (Nov '15): out-of-band management, device discovery, health & temp monitoring, firmware update
    NVMe-MI 1.1*: SES, NVMe-MI in-band, native enclosure management
Next-Generation Hardware Evolution
  NVMe as Caching Tier & Performance Tier
  [Diagram: evolution of the storage tier, starting from today's hardware; caching tier and persistence tier built from SSD, NVMe and NVDIMM]
NVMe over Fabric: Demo
  Next Generation SAN is Here!
  POC
Demo: vMotion with RDMA
  Faster host evacuations with minimal impact on performance!
  POC
Key Takeaways
  Enterprise IT infrastructure challenges need a fresh approach
  RDMA as a technology has a lot of promise
  vSphere is tirelessly innovating to address enterprise needs
  Leverage vSphere with RDMA to address your IT infrastructure needs and lower the TCO!
Questions?