RDMA Container Support. Liran Liss Mellanox Technologies

Similar documents
Containing RDMA and High Performance Computing

Virtual RDMA devices. Parav Pandit Emulex Corporation

RoCE Update. Liran Liss, Mellanox Technologies March,

OS Virtualization. Linux Containers (LXC)

Introduction to Container Technology. Patrick Ladd Technical Account Manager April 13, 2016

OpenFabrics Interface WG A brief introduction. Paul Grun co chair OFI WG Cray, Inc.

Container Adoption for NFV Challenges & Opportunities. Sriram Natarajan, T-Labs Silicon Valley Innovation Center

Management Scalability. Author: Todd Rimmer Date: April 2014

Voltaire. Fast I/O for XEN using RDMA Technologies. The Grid Interconnect Company. April 2005 Yaron Haviv, Voltaire, CTO

Network Adapter Flow Steering

Linux Containers Roadmap Red Hat Enterprise Linux 7 RC. Bhavna Sarathy Senior Technology Product Manager, Red Hat

Introduction to Virtualization and Containers Phil Hopkins

Docker A FRAMEWORK FOR DATA INTENSIVE COMPUTING

OS Containers. Michal Sekletár November 06, 2016

Agenda. About us Why para-virtualize RDMA Project overview Open issues Future plans

12th ANNUAL WORKSHOP 2016 RDMA RESET SUPPORT. Liran Liss. Mellanox Technologies. [ April 4th, 2016 ]

Deployment Patterns using Docker and Chef

NTRDMA v0.1. An Open Source Driver for PCIe NTB and DMA. Allen Hubbe at Linux Piter 2015 NTRDMA. Messaging App. IB Verbs. dmaengine.h ntb.

USER SPACE IPOIB PACKET PROCESSING

OPENSHIFT FOR OPERATIONS. Jamie Cloud Guy - US Public Sector at Red Hat

PARAVIRTUAL RDMA DEVICE

CONTAINERS AND MICROSERVICES WITH CONTRAIL

High Performance Containers. Convergence of Hyperscale, Big Data and Big Compute

1 Virtualization Recap

Application Acceleration Beyond Flash Storage

SRP Update. Bart Van Assche,

An introduction to Docker

IBM Bluemix compute capabilities IBM Corporation

Welcome to the IBTA Fall Webinar Series

SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG

OFED Storage Protocols

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.

On the Performance Impact of Virtual Link Types to 5G Networking

Ethernet Services over IPoIB

Introduction to containers

深 入解析 Docker 背后的 Linux 内核技术. 孙健波浙江 大学 SEL/VLIS 实验室

Introduction to Infiniband

SAINT LOUIS JAVA USER GROUP MAY 2014

LXC(Linux Container) Lightweight virtual system mechanism Gao feng

NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications

Infrastructure at your Service. Oracle over Docker. Oracle over Docker

Memory Management Strategies for Data Serving with RDMA

Containers and isolation as implemented in the Linux kernel

ETHERNET OVER INFINIBAND

Virtualizaton: One Size Does Not Fit All. Nedeljko Miljevic Product Manager, Automotive Solutions MontaVista Software

The new Docker networking put into action to spin up a SLURM cluster

Storage Protocol Offload for Virtualized Environments Session 301-F

InfiniBand Linux Operating System Software Access Layer

PVS Deployment in the Cloud. Last Updated: June 17, 2016

Introduction to High-Speed InfiniBand Interconnect

14th ANNUAL WORKSHOP 2018 NVMF TARGET OFFLOAD. Liran Liss. Mellanox Technologies. April 2018

STATUS OF PLANS TO USE CONTAINERS IN THE WORLDWIDE LHC COMPUTING GRID

RDMA in Embedded Fabrics

Wolfram Richter Red Hat. OpenShift Container Netzwerk aus Sicht der Workload

Infrastructure Security 2.0

Container mechanics in Linux and rkt FOSDEM 2016

Survey of ETSI NFV standardization documents BY ABHISHEK GUPTA FRIDAY GROUP MEETING FEBRUARY 26, 2016

13th ANNUAL WORKSHOP 2017 VERBS KERNEL ABI. Liran Liss, Matan Barak. Mellanox Technologies LTD. [March 27th, 2017 ]

Performance Considerations of Network Functions Virtualization using Containers

VNS3 3.5 Container System Add-Ons

Generic RDMA Enablement in Linux

ISLET: Jon Schipp, AIDE jonschipp.com. An Attempt to Improve Linux-based Software Training

Novell Infiniband and XEN

EXPERIENCES WITH NVME OVER FABRICS

CREATING A COMMON SOFTWARE VERBS IMPLEMENTATION

Kubernetes The Path to Cloud Native

SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience

Writing RDMA applications on Linux

Learn Your Alphabet - SRIOV, NPIV, RoCE, iwarp to Pump Up Virtual Infrastructure Performance

Cloud & container monitoring , Lars Michelsen Check_MK Conference #4

Allowing Users to Run Services at the OLCF with Kubernetes

Kubernetes networking in the telco space

The Exascale Architecture

Mellanox InfiniBand Training IB Professional, Expert and Engineer Certifications

Windows Azure Services - At Different Levels

InfiniBand OFED Driver for. VMware Virtual Infrastructure (VI) 3.5. Installation Guide

Advanced Computer Networks. End Host Optimization

State of Containers. Convergence of Big Data, AI and HPC

Dockercon 2017 Networking Workshop

BUILDING A BLOCK STORAGE APPLICATION ON OFED - CHALLENGES

Infiniband and RDMA Technology. Doug Ledford

Think Small to Scale Big

Dockerized Tizen Platform

ehca Virtualization on System p

Networking in Virtual Infrastructure and Future Internet. NCHC Jen-Wei Hu

InfiniBand OFED Driver for. VMware Infrastructure 3. Installation Guide

Creating an agile infrastructure with Virtualized I/O

Containers Do Not Need Network Stacks

Network Function Virtualization over Open DC/OS Yung-Han Chen

Container System Overview

For personnal use only

Proceedings of NetDev 1.1: The Technical Conference on Linux Networking (February 10th-12th Seville, Spain) VRF Tutorial

WINDOWS RDMA (ALMOST) EVERYWHERE

Amir Zipory Senior Solutions Architect, Redhat Israel, Greece & Cyprus

Hardened Security in the Cloud Bob Doud, Sr. Director Marketing March, 2018

ViryaOS RFC: Secure Containers for Embedded and IoT. A proposal for a new Xen Project sub-project

CLOUD-NATIVE APPLICATION DEVELOPMENT/ARCHITECTURE

VM Migration, Containers (Lecture 12, cs262a)

12th ANNUAL WORKSHOP Experiences in Writing OFED Software for a New InfiniBand HCA. Knut Omang ORACLE. [ April 6th, 2016 ]

Secure Kubernetes Container Workloads

Transcription:

RDMA Container Support Liran Liss Mellanox Technologies

Agenda Containers 101 RDMA isolation Namespace support Controller support Putting it all together Status Conclusions March 15 18, 2015 #OFADevWorkshop 2

Containers 101 A server-virtualization technology for running multiple isolated user-space instances Each instance Has the look and feel of running over a dedicated server Cannot impact the activity of other instances Containers and Virtual Machines (VMs) provide virtualization at different levels Server virtualization VM contains complete OS + App Virtual Machines VM1 VM2 App App OS OS Hypervisor Containers App App OS System call virtualization Container has only App + libraries March 15 18, 2015 #OFADevWorkshop 3

Example: Docker Open platform to build, ship, and run distributed apps Based on Linux container technology Main promise Easily package an application and its dependencies Regardless of the language, tool chain, and distribution Layered images Large application repository Basis for further specialization Deploy on any Server Regardless of OS distribution Regardless of underlying architecture Lightweight runtime Rapid scale-up/down of services Docker Image App Math libs gcc Lustre OpenMPI OFED-2.4 RH6.5 App layers Base layers March 15 18, 2015 #OFADevWorkshop 4

Linux Containers = Namespaces + cgroups Namespaces Provide the illusion of running in isolation Implemented for multiple OS sub-systems cgroups Restrict resource utilization Controllers for multiple resource types Name space pid net ipc mnt uts uid Namespace examples Description Process IDs Network interfaces, routing tables, and netfilter Semaphores, shared memory, and message queues Root and file-system mounts Host name User IDs cgroup examples Controller Description blkio Access to block devices cpu CPU time cpuset CPU cores devices Device access memory Memory usage net_cls Packet classification net_prio Packet priority March 15 18, 2015 #OFADevWorkshop 5

Container IP Networking Common models Host (e.g., Mezos) Physical interface / VLAN / macvlan Container has global IP Bridge Container has global IP Pod (e.g., GCE) Multi-container scheduling unit Global IP per POD NAT (e.g., Docker) Tunneling Building blocks Network namespaces Interfaces, IP tables, netfilter Virtual networking bridge, ovs, NAT macvlan, vlan, veth veth 10.4.0.2/24 veth Bridge0 10.4.0.1/24 eth0 veth 10.4.0.3/24 veth March 15 18, 2015 #OFADevWorkshop 6

RDMA Isolation Design Goals Simplicity and efficiency Containers share the same RDMA device instance Leverage existing isolation infrastructure Network namespaces and cgroups Focus on application APIs Verbs / RDMACM Exclude management and low-level APIs (e.g., umad, ucm) Deny access using device controller Exclude kernel ULPs (e.g., iser, SRP) Not directly exposed to applications Controlled by other means (blk_io) Subject for future work March 15 18, 2015 #OFADevWorkshop 7

Namespace Observations Isolating Verbs resources is not necessary Only QPNs and RKeys are visible on the wire Both don t have well-known names Applications don t choose them rdmacm maps nicely to network namespaces IP addresses stem from network interfaces Protocols and port numbers map to ServiceID port-spaces RoCE requires a namespace for L3 L2 address resolution Namespace determined by interface Physical port interfaces of PFs/VFs P_Key child devices Additional child devices on same P_Key VLAN child devices macvlan child-devices Conclusions QP and AH API calls should be processed within a namespace context Associate RDMA IDs with namespaces Maintain isolated ServiceID port-space per network namespace March 15 18, 2015 #OFADevWorkshop 8

Resource Namespace Association QP and AH namespaces Used for RoCE L3 L2 address resolution Determined by the process namespace during API calls Default to Host namespace for kernel threads RDMA IDs namespaces Used for Binding to ServiceIDs and solicited MAD steering (see below) Determined by the process namespace upon creation Matched asynchronously with incoming requests Default to Host namespace for kernel threads March 15 18, 2015 #OFADevWorkshop 9

MAD ServiceID Resolution ib_cma Lookup NS ib_cm Get cm_id NS Get netdev NS Match ServiceID Host NS Lookup NS NO YES Is IP CM ServiceID? Match cm_id by <loc_comm_id, rem_comm_id> YES Is solicited? Match netdev by <device, port, VLAN/P_Key, IP> NO CM MADs ib_core

RDMA cgroup Controller Governs application resource utilization per RDMA device For a process or a group of processes Possible controlled resources Opened HCA contexts CQs, PDs, QPs, SRQs, MRs Service Levels (SLs) and User Priorities (UPs) Can t mark individual packets in SW March 15 18, 2015 #OFADevWorkshop 11

Putting it All Together Net NS: 1 cpu: 10% QPs: 10 CQs: 10 App A listen rdma_id: TCP port-space 2000 Net NS: 2 cpu: 20% QPs: 50 CQs: 50 App B listen rdma_id: TCP port-space 2000 Net NS: 3 cpu: 30% QPs: 100 CQs: 100 App C ib_0 0x8001 10.2.0.1 ib_1 0x8001 10.2.0.2 ib_2 0x8002 10.3.0.1 Linux eth0.100 10.4.0.1 eth0.101 10.5.0.1 IB HCA IB core RoCE HCA eth0 11.1.0.1 March 15 18, 2015 #OFADevWorkshop 12

Status ServiceID namespace support for IB completed May be used with any IPoIB interface or child interface Patches sent upstream Coming up RoCE support RDMA cgroup controllers March 15 18, 2015 #OFADevWorkshop 13

Conclusions Container technology is gaining considerable traction The intrinsic efficiency of containers make them an attractive virtualization and deployment solution for high-performance applications E.g., HPC clouds RDMA container support provides such applications access to high-performance networking in a secure and isolated manner March 15 18, 2015 #OFADevWorkshop 14

Thank You #OFADevWorkshop