RDMA on vSphere: Update and Future Directions
Bhavesh Davda & Josh Simons, Office of the CTO, VMware
March 26, 2012

Agenda
- Guest-level InfiniBand: preliminary results
- Virtual machine / guest OS level RDMA
- Hypervisor-level RDMA

Guest-level InfiniBand with VM DirectPath I/O: Initial Results

Guest-level InfiniBand with VM DirectPath I/O: Initial Results
Research note currently under review: "RDMA Performance in Virtual Machines using QDR InfiniBand on VMware vSphere 5"

Options to Offer RDMA to vSphere Virtual Machines
A. Full-function VM DirectPath (passthrough)
B. SR-IOV VF VM DirectPath (passthrough)
C. SoftRoCE over 10GbE in VM DirectPath mode
D. SoftRoCE over paravirtual Ethernet vNIC over a 10GbE uplink
E. SoftRoCE over paravirtual Ethernet vNIC between VMs
F. Paravirtual RDMA HCA (vRDMA) offered to the VM

Option A: Full-function VM DirectPath
- Directly assign a physical RDMA HCA (IB / RoCE / iWARP) to the VM
- The physical HCA cannot be shared between VMs or used by the ESXi hypervisor
- VM DirectPath is incompatible with many important vSphere features: memory overcommit, Fault Tolerance, snapshots, suspend/resume, vMotion
[Diagram: guest OS running the OFED stack and an RDMA HCA device driver, accessing the physical RDMA HCA directly through the I/O MMU, bypassing the virtualization layer]
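
With full passthrough the guest's RDMA stack is unmodified, so visibility of the device can be checked with a stock libibverbs program. A minimal sketch, assuming libibverbs and the vendor OFED driver are installed in the guest (not part of the original slides):

```c
/* list_hcas.c - confirm the passthrough RDMA HCA is visible inside the guest.
 * Build (assumed environment): gcc list_hcas.c -libverbs -o list_hcas */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);

    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }
    printf("%d RDMA device(s) visible in this VM\n", num);
    for (int i = 0; i < num; i++)
        printf("  %s\n", ibv_get_device_name(devs[i]));

    ibv_free_device_list(devs);
    return 0;
}
```

With VM DirectPath, the same binary reports the physical HCA exactly as it would on the native host.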

Option B: SR-IOV VF VM DirectPath
- Single Root I/O Virtualization (SR-IOV) is a PCI-SIG standard
- The physical (IB / RoCE / iWARP) HCA can be shared between VMs and the ESXi hypervisor
- Virtual Functions (VFs) are directly assigned to VMs; the Physical Function (PF) is controlled by the hypervisor
- Still VM DirectPath, so it remains incompatible with many important vSphere features
[Diagram: an SR-IOV RDMA HCA exposing a PF to the hypervisor's PF device driver and multiple VFs, each passed through the I/O MMU to a guest OS running the OFED stack and an RDMA HCA VF driver]

Option C: SoftRoCE over 10GbE with VM DirectPath
- The RXE (SoftRoCE) driver in the guest OS binds to a passthrough 10GbE NIC (potentially an SR-IOV VF)
- Can be used to communicate with RDMA applications on other hosts over Ethernet
- Requires soft or hard RoCE on the other Ethernet-connected hosts as well
- Still VM DirectPath, so vSphere features are sacrificed
- Did not work in our testing (needs debugging): "Completion with error at client: Failed status 12: wr_id 1 syndrom 0x81 scnt=1, ccnt=0"
[Diagram: guest OS running the OFED stack and RXE driver over a passthrough 10GbE NIC driver, reaching the physical 10GbE NIC through the I/O MMU]
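
The quoted failure is a Verbs work-completion error; in libibverbs, status 12 is IBV_WC_RETRY_EXC_ERR (transport retry counter exceeded) and the 0x81 value is the vendor syndrome. A hedged sketch of how such completions are typically decoded (illustrative only, not the benchmark used on the slide):

```c
/* poll_cq_check.c - decode work-completion errors such as "Failed status 12"
 * (status 12 == IBV_WC_RETRY_EXC_ERR in libibverbs). */
#include <stdio.h>
#include <infiniband/verbs.h>

/* 'cq' is assumed to have been created elsewhere with ibv_create_cq(). */
int drain_completions(struct ibv_cq *cq)
{
    struct ibv_wc wc;
    int n;

    while ((n = ibv_poll_cq(cq, 1, &wc)) > 0) {
        if (wc.status != IBV_WC_SUCCESS) {
            fprintf(stderr,
                    "completion error: wr_id=%llu status=%d (%s) vendor_err=0x%x\n",
                    (unsigned long long)wc.wr_id, wc.status,
                    ibv_wc_status_str(wc.status), wc.vendor_err);
            return -1;
        }
    }
    return n; /* 0 when the CQ is empty, negative on a polling error */
}
```

A retry-exceeded status generally means the requester never received an acknowledgement from the responder, which is consistent with the RXE-over-passthrough path needing debugging.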

Option D: SoftRoCE / Paravirtual vNIC / 10GbE Uplink
- The RXE driver in the guest OS binds to a paravirtual vmxnet3 vNIC
- vmxnet3 is emulated in the hypervisor and connected to a vSwitch with a 10GbE physical NIC uplink
- Can be used to communicate with RDMA applications on other hosts (soft or hard RoCE) over Ethernet
- Does not rely on VM DirectPath, so all vSphere features remain usable
[Diagram: guest OS running the OFED stack and RXE driver over the vmxnet3 vNIC driver; the hypervisor's vmxnet3 emulation and I/O stack forward traffic to the physical 10GbE NIC driver and NIC]
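
Because SoftRoCE presents an ordinary RDMA device to the application, connection setup in this configuration looks the same as it would over a hardware RoCE NIC. A minimal, hedged sketch of client-side address and endpoint setup with librdmacm (the peer address and port are placeholders, not from the slides):

```c
/* rxe_connect.c - resolve an RDMA CM endpoint over the SoftRoCE (rxe) device.
 * Build (assumed environment): gcc rxe_connect.c -lrdmacm -libverbs -o rxe_connect */
#include <stdio.h>
#include <rdma/rdma_cma.h>

int main(int argc, char **argv)
{
    const char *server = argc > 1 ? argv[1] : "192.0.2.10"; /* placeholder peer */
    struct rdma_addrinfo hints = { .ai_port_space = RDMA_PS_TCP }, *res;
    struct rdma_cm_id *id;

    /* Address resolution runs over the vmxnet3 vNIC when rxe is bound to it. */
    if (rdma_getaddrinfo(server, "7471", &hints, &res)) {
        perror("rdma_getaddrinfo");
        return 1;
    }
    if (rdma_create_ep(&id, res, NULL, NULL)) {
        perror("rdma_create_ep");
        return 1;
    }
    printf("RDMA endpoint created on device %s\n",
           id->verbs ? ibv_get_device_name(id->verbs->device) : "(none)");

    rdma_destroy_ep(id);
    rdma_freeaddrinfo(res);
    return 0;
}
```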

Option E: VM-to-VM SoftRoCE / Paravirtual vNIC
- The RXE driver in the guest OS binds to a paravirtual vmxnet3 vNIC
- Can be used to communicate with RDMA applications in other VMs over Ethernet through the vSwitch
- Requires SoftRoCE in the other VMs as well
- Does not rely on VM DirectPath, so all vSphere features remain usable
[Diagram: guest OS running the OFED stack and RXE driver over a vmxnet3 vNIC; the hypervisor's vmxnet3 emulation and I/O stack connect the VMs through the vSwitch]

Option F: Paravirtual RDMA HCA (vRDMA) Offered to the VM
- A new paravirtualized device exposed to the virtual machine; implements the Verbs interface
- The device is emulated in the ESXi hypervisor, translating Verbs calls from the guest into Verbs calls to the ESXi OFED stack
- Guest physical memory regions are mapped into ESXi and passed down to the physical RDMA HCA
- Zero-copy DMA directly from/to guest physical memory
- Completions and interrupts are proxied by the emulation
- The "holy grail" of RDMA options for vSphere VMs
[Diagram: guest OS running the OFED stack over a vRDMA HCA device driver; the vRDMA device emulation in ESXi sits on the ESXi OFED and I/O stacks and drives the physical RDMA HCA through its device driver]
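
To make the translation concrete: the guest-side calls that a vRDMA device would intercept are ordinary Verbs operations, such as registering a memory region and posting a work request against it. A hedged sketch of that guest-side sequence (buffer size and flags are illustrative; the proxying happens transparently below the Verbs API):

```c
/* vrdma_guest_view.c - the guest-side Verbs calls a paravirtual RDMA device
 * would translate: register memory, then post a send against it. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <infiniband/verbs.h>

/* 'pd' and 'qp' are assumed to have been created earlier with
 * ibv_alloc_pd() and ibv_create_qp(); sizes and flags are illustrative. */
int post_one_send(struct ibv_pd *pd, struct ibv_qp *qp)
{
    size_t len = 4096;
    void *buf = malloc(len);
    if (!buf)
        return -1;
    memset(buf, 0, len);

    /* Guest memory region: the emulation maps these guest-physical pages
     * into ESXi and registers them with the physical HCA. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr) {
        free(buf);
        return -1;
    }

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
    }, *bad_wr;

    /* The completion is proxied back to the guest CQ by the emulation.
     * Cleanup (ibv_dereg_mr, free) omitted for brevity. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```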

Performance Comparison
[Bar chart: RDMA write latency (HRT us), lower is better. Native IB to native IB: 1.20 us; native IB to passthrough IB (Option A): 1.53 us; the two SoftRoCE configurations, native RoCE to paravirtual RXE/vmxnet3/10GbE (Option D) and paravirtual RXE/vmxnet3 to paravirtual RXE/vmxnet3 (Option E): 15.01 us and 18.37 us; expected paravirtual RDMA (vRDMA, Option F): about 5 us.]

Summary: VM / Guest-Level RDMA
- Option A: full-function VM DirectPath is already used by some customers
- Option B: SR-IOV VM DirectPath will be included in an upcoming vSphere release
- Options C/D/E: SoftRoCE can enable RDMA-based guest applications today, at the cost of higher latencies
- Option F: the vRDMA paravirtual device is being prototyped in the Office of the CTO

RDMA for vSphere Hypervisor Services
[Diagram: the ESXi hypervisor running an OFED stack with a 10/40 GbE (RoCE) uplink alongside the virtual switch with a 10 GbE uplink]

vMotion Using an RDMA Transport
- vMotion: live migration of virtual machines; a key enabler for Distributed Resource Scheduler (DRS) and Distributed Power Management (DPM)
- The vMotion data transfer mechanism is currently network based (TCP/IP, BSD sockets, mbufs)
- Why RDMA? Performance:
  - Zero-copy send and receive, which reduces CPU utilization
  - Lower latency
  - Eliminates TCP/IP protocol and processing overheads

vMotion (ESXi 5.0) Basics
- Stages: pre-copy -> quiesce -> page-in -> unstun
- VM memory is transferred during pre-copy and page-in: the source runs "Send Pages", the destination runs "Receive Pages", with multiple vmkernel helper tasks for send and receive
- SDPS (Stun During Page Send): stall vCPUs if the page dirty rate exceeds the page transfer rate; this helps pre-copy converge so that few pages remain outstanding during page-in (see the sketch below)
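
Purely as an illustration of the SDPS rule stated above (stall vCPUs when the dirty rate exceeds the transfer rate), the sketch below restates it in code; every name in it is hypothetical and it is not VMware's implementation:

```c
/* sdps_sketch.c - illustrative restatement of the SDPS rule only.
 * All names and units are hypothetical; this is not VMware code. */

struct precopy_stats {
    double page_dirty_rate;     /* pages dirtied by the guest per second */
    double page_transfer_rate;  /* pages sent by pre-copy per second */
};

/* Hypothetical hook: a real VMM would inject short delays into vCPU execution. */
static void throttle_vcpus(double slowdown_factor)
{
    (void)slowdown_factor; /* stub */
}

/* Called once per sampling interval during pre-copy. */
static void sdps_tick(const struct precopy_stats *s)
{
    if (s->page_dirty_rate > s->page_transfer_rate)
        /* Dirtying outpaces sending: slow the vCPUs so pre-copy can converge. */
        throttle_vcpus(s->page_dirty_rate / s->page_transfer_rate);
    else
        throttle_vcpus(1.0);
}
```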

Send Pages and Receive Pages Operations over RDMA
[Diagram comparing the two transfer paths.
TCP/IP path: (1) identify the PPNs (guest physical page numbers), (2) pin PPNs to MPNs (machine page numbers), (3) map the pages on both sides, (4) send the page contents, (5) copy into destination memory.
RDMA path: (1) identify the PPNs, (2) pin PPNs to MPNs, (3) map the pages into memory regions (MRs) on both sides, (4) exchange the MR key and IOVA, (5) the destination issues an RDMA read.]
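
Step 5 of the RDMA path, where the destination pulls page contents using the advertised MR key and IOVA, corresponds to a one-sided Verbs work request like the hedged sketch below (queue-pair setup is omitted and parameter names are illustrative):

```c
/* rdma_read_pages.c - sketch of step 5: the destination pulls page contents
 * via a one-sided RDMA read using the remote MR key and IOVA it received. */
#include <stdint.h>
#include <infiniband/verbs.h>

/* 'qp' is a connected RC queue pair; 'local_mr' covers 'local_buf'.
 * 'remote_iova' and 'remote_rkey' were advertised by the source host. */
int read_remote_pages(struct ibv_qp *qp, struct ibv_mr *local_mr,
                      void *local_buf, uint32_t len,
                      uint64_t remote_iova, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = local_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 0x5151,            /* arbitrary tag for the completion */
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_READ,  /* one-sided: the source CPU is not involved */
        .send_flags = IBV_SEND_SIGNALED,
        .wr.rdma = {
            .remote_addr = remote_iova,
            .rkey        = remote_rkey,
        },
    }, *bad_wr;

    return ibv_post_send(qp, &wr, &bad_wr);
}
```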

Test Setup
- Two HP ProLiant ML350 G6 machines, 2x Intel Xeon (E5520, E5620), hyper-threading enabled, 60 GB RAM
- Mellanox 40GbE RoCE cards: ConnectX-2 VPI, PCIe 2.0 x8, 5.0 GT/s
- SPECjbb2005 with a 50 GB workload in a 56 GB, 4-vCPU Linux VM
- Run-time configuration switch selects the TCP/IP or RDMA transport
- Single stream helper for the RDMA transport vs. two for the TCP/IP transport
- SDPS disabled to reduce variance in the results

Total vMotion Time
[Bar chart: total vMotion time in seconds. TCP/IP: 70.63 s; RDMA: 45.31 s, about 36% faster.]

Pre-copy Bandwidth
[Bar chart: pre-copy bandwidth in pages per second. RDMA: 432,757.73 pages/s (14.18 Gbps), about 30% higher than TCP/IP at 330,813.66 pages/s (10.84 Gbps).]
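
As a quick consistency check on these two numbers (assuming the standard 4 KB page size, which the slide does not state), the page rate and the line rate agree:

```latex
432{,}757.73 \,\tfrac{\text{pages}}{\text{s}}
  \times 4096 \,\tfrac{\text{bytes}}{\text{page}}
  \times 8 \,\tfrac{\text{bits}}{\text{byte}}
  \approx 1.418 \times 10^{10} \,\tfrac{\text{bits}}{\text{s}}
  = 14.18~\text{Gbps}
```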

Destination CPU Utilization
[Time-series chart: percentage of a core used by vMotion on the destination host, sampled every ~5 s over the migration. TCP/IP ranges roughly 20-47% of a core, while RDMA stays at about 4-6%: roughly 92% lower.]

Source CPU Utilization
[Time-series chart: percentage of a core used by vMotion on the source host, sampled every ~5 s over the migration. TCP/IP ranges roughly 10-25% of a core, while RDMA stays below about 6%: roughly 84% lower.]

Hypervisor-Level RDMA: Requirements
- Integrate the OFED kernel-space mid-layer and provider components into the ESXi hypervisor; IP and licensing issues with the OFED components need to be sorted out
- Add RDMA Verbs support for other hypervisor services: FT, NFS, iSCSI, and others
- Create an abstraction layer from common patterns to ease porting other TCP/IP/BSD mbuf-based hypervisor services
- Under discussion with the vSphere product teams

Summary
- Hypervisor-level RDMA: proven benefits for hypervisor services (vMotion, etc.); currently under discussion internally
- Passthrough IB delivers credible performance
- Paravirtual vRDMA is the most attractive option for VM-level RDMA: it maintains key virtualization capabilities and delivers better latencies than the other approaches; prototyping is underway

Acknowledgements
- VM DirectPath InfiniBand: VMware intern from Georgia Tech: Adit Ranadive
- vMotion over RDMA: VMware: Anupam Chanda, Gabe Tarasuk-Levin, Scott Goldman; Mellanox: Ali Ayoub
- RDMA for virtual machines: Mellanox: Ali Ayoub, Motti Beck, Dror Goldenberg; SFW: Paul Grun, Bob Pearson