ehca Virtualization on System p

Christoph Raisch, Technical Lead, ehca InfiniBand and HEA device drivers
2008-04-04

Trademark Statements

IBM, the IBM logo, ibm.com, System p, System p5 and POWER Hypervisor are registered trademarks of International Business Machines Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product or service names may be trademarks or service marks of others. (C) Copyright IBM Corp. 2008. All Rights Reserved.

Overview

- I/O Virtualization on System p
- Memory registration interface
- Send virtualization
- Receive virtualization
- QP 0/1 flow
- Additional ehca capabilities

I/O Virtualization on System p

[Diagram: LPARs with virtual CPUs, partition memory, and virtual HCAs run above the Hypervisor, which provides isolation and abstraction of the physical CPUs, physical memory, PCI cards, and partitioned physical HCAs through an architected H_CALL interface; a VIOS partition acts as storage router and network router.]

- Paravirtualization: H_CALLs (hypervisor calls) are used instead of manipulating hardware resources directly
- Partitioning: CPU granularity of 1/10 of a CPU, memory granularity of 16 MB blocks
- I/O hardware with partitioning support: HCA (InfiniBand, shared by 16 partitions) and HEA/IVE (Ethernet, shared by 32 partitions)
- I/O infrastructure for non-shareable PCI cards: PCI slots are assigned to single partitions; PCI-based Ethernet and storage can be shared through VIOS

Registration interface for CQs/EQs/MRs/QPs

1. H_ALLOCATE_RESOURCE_MR(size, attributes, ...): allocate an MR and its L_KEY for the partition, allocate memory for the MR page pointers, set the partition number in the MR
2. H_REGISTER_RPAGE(mr_pages): translate each partition address to a physical address and add the physical address to the MR page tables
3. H_REGISTER_RPAGE(last_mr_page): enable the MR

This allows the HCA to work with a single level of address translation, and the hypervisor gains more information about how pages are used.
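As an illustration, the following C sketch mirrors that three-step flow. The hcall_* wrappers, the MR token type, and the page-size constant are hypothetical stand-ins for the real PAPR hypervisor-call plumbing in the ehca driver; only the call sequence reflects the slide above.

    /* Illustrative sketch of the MR registration flow described above.
     * The hcall_* wrappers are hypothetical; in the real driver they
     * issue the H_ALLOCATE_RESOURCE_MR and H_REGISTER_RPAGE H_CALLs. */
    #include <stddef.h>
    #include <stdint.h>

    #define EHCA_PAGE_SIZE 4096UL  /* assumed page granularity */

    /* Hypervisor allocates the MR and its L_KEY, allocates memory for
     * the MR page pointers, and stamps the MR with the calling
     * partition's number. (Hypothetical wrapper.) */
    int hcall_allocate_resource_mr(size_t size, uint64_t attrs,
                                   uint64_t *mr_token);

    /* Hypervisor translates the partition address to a physical address
     * and adds it to the MR page table; registering the last page
     * enables the MR. (Hypothetical wrapper.) */
    int hcall_register_rpage(uint64_t mr_token, uint64_t part_addr);

    /* Register a partition-virtual buffer as a memory region. */
    static int register_mr(void *buf, size_t size, uint64_t attrs,
                           uint64_t *mr_token)
    {
        uint64_t addr = (uintptr_t)buf;
        uint64_t end  = addr + size;
        int rc = hcall_allocate_resource_mr(size, attrs, mr_token);

        if (rc)
            return rc;

        /* One H_REGISTER_RPAGE per page; the hypervisor enables the MR
         * once the last page arrives, leaving the HCA with a single
         * level of address translation. */
        for (; addr < end; addr += EHCA_PAGE_SIZE) {
            rc = hcall_register_rpage(*mr_token, addr);
            if (rc)
                return rc;
        }
        return 0;
    }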

ehca send virtualization

[Diagram: an LPAR posts a work request (virtual address, L_KEY) into QP pages; in the HCA, each QP carries hypervisor-provided attributes (LPAR ID, QP number, source LID, P_KEY, PD), and the L_KEY translation table maps to MRs tagged with LPAR ID and PD; the HCA routing layer connects the QPs to the ports. Only the work request and user data are LPAR-provided.]

- The LPAR triggers the HCA directly
- The HCA enforces hypervisor-provided parameters: QP number, P_KEY, source LID, QP type
- The HCA uses the L_KEY to locate the MR and cross-checks LPAR ID and PD
- The HCA creates the IB frames
- The HCA routing layer identifies LIDs assigned to LPARs; IB frames are routed either to an external port or to receive processing
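For comparison, this is what the LPAR-provided side of a send looks like through the standard OFED verbs API that the ehca driver plugs into: user code supplies only a virtual address and the L_KEY obtained from ibv_reg_mr(), while QP number, P_KEY, and source LID were fixed when the QP was set up. A minimal sketch, assuming the usual setup (context, PD, MR, connected QP) has already succeeded:

    #include <infiniband/verbs.h>
    #include <stdint.h>

    /* Post a send on an already-connected QP. The caller supplies only
     * a partition-virtual address and the L_KEY; the HCA resolves the
     * MR through its L_KEY table and cross-checks LPAR ID and PD, while
     * QP number, P_KEY, and source LID are enforced outside the LPAR's
     * control. */
    static int post_send(struct ibv_qp *qp, struct ibv_mr *mr,
                         void *buf, uint32_t len)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,  /* partition-virtual address */
            .length = len,
            .lkey   = mr->lkey,        /* L_KEY, checked by the HCA */
        };
        struct ibv_send_wr wr = {
            .wr_id      = 1,
            .sg_list    = &sge,
            .num_sge    = 1,
            .opcode     = IBV_WR_SEND,
            .send_flags = IBV_SEND_SIGNALED,
        };
        struct ibv_send_wr *bad_wr = NULL;

        return ibv_post_send(qp, &wr, &bad_wr);
    }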

ehca receive virtualization

[Diagram: an LPAR posts receive requests (virtual address, L_KEY) into QP pages; in the HCA, each QP (QP10, QP11, QP12) carries hypervisor-provided attributes (LPAR ID, DLID); inbound frames pass through the HCA routing layer before reaching the QPs and user data.]

- The HCA routing layer identifies LIDs assigned to LPARs; IB frames are either routed to receive processing or dropped
- The HCA uses the QP as the primary selector
- The HCA verifies the destination LID
- The HCA uses the L_KEY to locate the MR and cross-checks LPAR ID and PD
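The receive side looks symmetric at the verbs level: again only the buffer address and L_KEY come from the LPAR, while the DLID check happens in hardware against hypervisor-provided QP state. A minimal sketch under the same setup assumptions as above:

    #include <infiniband/verbs.h>
    #include <stdint.h>

    /* Post a receive buffer. The HCA selects the QP from the inbound
     * frame, verifies the destination LID against hypervisor-provided
     * state, then uses the L_KEY to locate the MR, cross-checking
     * LPAR ID and PD before writing the payload to the
     * partition-virtual address. */
    static int post_recv(struct ibv_qp *qp, struct ibv_mr *mr,
                         void *buf, uint32_t len)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = len,
            .lkey   = mr->lkey,
        };
        struct ibv_recv_wr wr = {
            .wr_id   = 2,
            .sg_list = &sge,
            .num_sge = 1,
        };
        struct ibv_recv_wr *bad_wr = NULL;

        return ibv_post_recv(qp, &wr, &bad_wr);
    }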

QP 0/1 traffic

[Diagram: LPAR alias QPs post work requests (virtual address, L_KEY, user data); the HCA's QP 0/1 filter sits between the LPARs and the external fabric, with the hypervisor's SMA and virtual switch in the path.]

Inbound processing:
- The HCA identifies packets on QP 0/1 and forwards them to the Hypervisor
- The Hypervisor responds to QP 0/1 requests itself or forwards them through an alias QP to the LPAR

Outbound processing:
- An alias QP sends packets; the HCA redirects packets to the Hypervisor if necessary
- The Hypervisor forwards packets through the HCA to the external fabric
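A conceptual sketch of the inbound filter decision described above; this is pseudocode distilled from the slide, not the actual HCA or POWER Hypervisor logic, and all type and helper names are hypothetical.

    /* Conceptual pseudocode for the inbound QP 0/1 filter. All names
     * are hypothetical; the real logic lives in the HCA and the
     * POWER Hypervisor. */
    enum inbound_action { TO_HYPERVISOR, TO_LPAR_RECV, DROP };

    struct ib_frame_hdr {
        unsigned dest_qp;  /* destination QP number */
        unsigned dlid;     /* destination LID */
    };

    static enum inbound_action
    classify_inbound(const struct ib_frame_hdr *hdr, int dlid_owned_by_lpar)
    {
        /* Management traffic (QP 0/1) never reaches an LPAR directly;
         * the Hypervisor answers it or relays it to the owning LPAR
         * through an alias QP. */
        if (hdr->dest_qp == 0 || hdr->dest_qp == 1)
            return TO_HYPERVISOR;

        /* Ordinary frames reach receive processing only if the routing
         * layer maps the DLID to an LPAR on this HCA. */
        return dlid_owned_by_lpar ? TO_LPAR_RECV : DROP;
    }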

ehca capabilities

ehca 1 (GA in 2006):
- 2k MTU
- Low-latency RC queues (user data in the work request)
- Used in System p570 for the PCI-E expansion network, 12x SDR mode

ehca 2:
- 4k MTU
- Low-latency UD/RC queues
- Shared receive queue
- System p Linux drivers available in OFED-1.3
- Currently available in System z for the I/O expansion network and System z coupling, 12x DDR mode

ehca summary

- LPAR isolation designed in right from the start: protection against address spoofing, no unfiltered packet-stream access, IB parameter enforcement on all send operations for LPARs
- Each virtual HCA is visible on the subnet and has its own GUID
- Paravirtualized driver interface: the hardware can change while the device driver stack stays the same; zero overhead for main-path operations
- Full integration in OFED-1.1, 1.2, 1.2.5, and 1.3

For more information

- Implementing InfiniBand on IBM System p (including OFED): http://www.redbooks.ibm.com/redbooks/pdfs/sg247351.pdf
- IBM High Performance Computing for Linux on POWER Using InfiniBand: http://www-03.ibm.com/systems/p/software/whitepapers/hpc_linux.html
- Getting Started with InfiniBand on System z10 and System z9: http://www.redbooks.ibm.com/redbooks.nsf/redpieceabstracts/sg247539.html?open

Questions?

Backup