How to Turbocharge Network Throughput


How to Turbocharge Network Throughput. Tony Amies, RSM Partners. Tuesday 1st November. Session EC

Bio. [Timeline slide, tongue-in-cheek against geological eras from the Triassic, Jurassic, Cretaceous and Cenozoic: "245 million years ago - started work as an IMS DB/DC programmer", then the IMS Programmer Era, Network SysProg Period, IBM Era, Network Consultant Period, Director & Consultant Period, Architect Period, and the RSM Era, "New Millennium - Now".]

Agenda. Typical datacentre connections; key considerations to improve performance; some of the available options (HiperSockets, Dynamic XCF, SMC-R, SMC-D); performance comparisons.

Typical Datacentre Connections. Applications on the same LPAR (same TCP/IP stack): bypass much of the lower TCP/IP layers; very fast, close to memory-to-memory. Applications on different LPARs sharing an OSA: traffic does not leave the OSA; still quite fast.

Typical Datacentre Connections. Applications on different LPARs, using different OSAs on the same subnet: traffic traverses a network segment; data is now on a physical wire; not quite so fast. Applications on different LPARs, using different OSAs and subnets: traffic traverses a network segment and one or more routers; data is now on a physical wire and routers; a smaller MTU size is likely; comparatively slow.

Typical Datacentre Connections. Applications on different LPARs, using different OSAs on multiple subnets: traffic traverses multiple network segments, with a z/OS LPAR used as a router; data traverses lots of things, which raises security considerations; quite slow. Isn't that just bad routing definitions - would this really happen?

Typical Datacentre Connections. Well, it's not impossible - think APPN/EE, network nodes and end nodes. An SNA application needs to get to another SNA application; connectivity is UDP over IP (Enterprise Extender), with the APPN network node LPAR used as an APPN router.

Considerations for improving performance. Which applications need to communicate? Are the LPARs hosting them in the same CPC/CEC? In the same Sysplex? Are zLinux or z/VM LPARs involved? What Z hardware do they run on, and which z/OS (and z/VM) version? Security considerations (firewall, PCI compliance). Traffic footprint: TCP or UDP; streaming or interactive; typical message sizes and volumes (MTU sizes). Budget. A couple of console displays help gather some of these facts, as shown below.
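
A hedged sketch of gathering some of this from the z/OS console; the stack name TCPIP is an assumption, so substitute your own procedure name:

   D TCPIP,TCPIP,NETSTAT,DEVLINKS
   D M=CPU

The first command lists the stack's interfaces with their device types and MTU sizes; the second shows the machine type and model, which tells you whether the CPC is, for example, a z13 (machine type 2964).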

Turbocharging Options. Optional technologies: HiperSockets, Dynamic XCF, SMC-R, SMC-D. All of these could potentially improve performance, but not all of them are supported in every CPC/Sysplex combination, supported by all hardware, supported by all operating systems, supported by all applications, supported by all security compliance regimes, or supported by your accountant.

HiperSockets. High speed communication between LPARs within the same CPC/CEC. Internal Queued Direct I/O (iQDIO), based on the QDIO architecture; communication via shared memory; high speed, low latency. Multiple operating system support: z/OS, zLinux, z/VM and even z/VSE. Transparent to applications. No additional hardware or software required (aka free!). Easy to configure. Large MTU capability (up to 56K).

HiperSockets. [CPC/CEC with HiperSockets shared memory between the application LPARs.] HiperSockets can be shared by all the LPARs within a CPC/CEC; multiple HiperSockets are supported per CPC/CEC for isolation of traffic (also VLANs). TCP/IP send writes to memory, TCP/IP receive reads from memory. Configurable MTU size up to 56K, global across the HiperSocket. Defined in HCD/IOCDS as a shared CHPID with type IQD. Defined in the TCP/IP profile in a similar way to an OSA QDIO interface: INTERFACE ... DEFINE IPAQIDIO CHPID xx (z/OS V2.1+), or DEVICE IUTIQDxx MPCIPA; LINK name IPAQIDIO IUTIQDxx. A static route is required if not using dynamic routing (such as OSPF). A configuration sketch follows below.
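
A minimal configuration sketch, assuming a hypothetical shared IQD CHPID F4, device numbers E800-E80F, interface name HIPERF4 and addresses in 10.10.1.0/24 (all illustrative, not from the session); check against your own HCD and profile conventions:

   * IOCP/HCD: shared IQD CHPID; CHPARM=C0 selects the 64K frame size (56K MTU)
   CHPID    PATH=(CSS(0),F4),SHARED,TYPE=IQD,CHPARM=C0
   CNTLUNIT CUNUMBR=F400,PATH=((CSS(0),F4)),UNIT=IQD
   IODEVICE ADDRESS=(E800,016),CUNUMBR=(F400),UNIT=IQD

   ; TCP/IP profile, z/OS V2.1+ INTERFACE form shown on the slide
   INTERFACE HIPERF4 DEFINE IPAQIDIO CHPID F4 IPADDR 10.10.1.1/24
   ;
   ; Older DEVICE/LINK form, also shown on the slide
   ; DEVICE IUTIQDF4 MPCIPA
   ; LINK   HIPERLF4 IPAQIDIO IUTIQDF4
   ;
   ; Static route when no dynamic routing protocol is used
   BEGINROUTES
     ROUTE 10.10.1.0/24 = HIPERF4 MTU 57344
   ENDROUTES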

HiperSockets - multiple OS images. [CPC/CEC with one HiperSockets shared memory; zLinux and z/OS LPARs connect natively and as z/VM guests, some through a VSWITCH.] Native operating systems connect directly to HiperSockets via an IQD channel, with 3 device addresses per stack (read, write and data). Guest operating systems can also connect directly to HiperSockets via the IQD channel (zLinux or z/OS running under z/VM), again with 3 device addresses per stack per guest OS. z/VM itself can connect directly to HiperSockets. Guest operating systems could optionally route via a VSwitch and z/VM instead. Special HiperSockets types exist for the z/VM Bridge Port (VSwitch link) and the IEDN. A z/VM sketch follows.
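
A hedged z/VM sketch of the two guest options above; the device numbers, LAN name and guest setup are assumptions, and the exact CP syntax should be checked against your z/VM level:

   * User directory: dedicate three real IQD subchannels to a guest
   *                 (the guest then drives the HiperSockets channel itself)
   DEDICATE E800 E800
   DEDICATE E801 E801
   DEDICATE E802 E802

   * Alternatively, a simulated HiperSockets guest LAN owned by SYSTEM:
   *   CP DEFINE LAN HIPERL1 OWNERID SYSTEM TYPE HIPERSOCKETS
   *   CP DEFINE NIC 0600 TYPE HIPERSOCKETS
   *   CP COUPLE 0600 TO SYSTEM HIPERL1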

HiperSockets - additional technologies. Multi-write capability (CPU reduction): multiple buffers are written to HiperSockets in a single write operation, and the processing can be offloaded to a zIIP; supported on z10+ and z/OS 1.10+. Completion queues: a queueing mechanism for when the target system is saturated; HiperSockets data is sent synchronously if the target has available buffers, and queued and sent asynchronously if it has none; supported on z196+ and z/OS 1.13+. QDIO Accelerator: when z/OS is used as a router for traffic arriving via OSA, processing of the first packet determines that forwarding is via HiperSockets, and subsequent packets are routed at the DLC layer, bypassing most of the TCP/IP stack. The profile statements involved are sketched below.
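
A hedged TCP/IP profile sketch for enabling the first and last of these (this sketch assumes completion queues are exploited by the stack without a dedicated profile knob):

   ; Enable HiperSockets multi-write and offload it to a zIIP
   GLOBALCONFIG IQDMULTIWRITE ZIIP IQDIOMULTIWRITE
   ; Accelerate routed QDIO/iQDIO traffic at the DLC layer
   IPCONFIG QDIOACCELERATOR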

Dynamic XCF. High speed communication between LPARs within the same Sysplex. Communication is via the XCF transport; VTAM and TCP/IP create an XCF data group. High speed, low latency. Single operating system support: z/OS only. Provides a logical LAN between the TCP/IP instances. Transparent to applications. No additional hardware or software required (aka free), assuming a Sysplex and Coupling Facility are already implemented. Very easy to configure.

Dynamic XCF. [Sysplex with a Coupling Facility linking the application LPARs.] The XCF network is shared by all z/OS TCP/IP stacks in the Sysplex. Defined in each TCP/IP profile using the DYNAMICXCF keyword, with XCFINIT=YES in the VTAM start options. IUTSAMEH and the VTAM TRLEs are defined dynamically. APPN/HPR CP-CP connections go over XCF instead of across the network with Enterprise Extender - more secure. A minimal definition sketch follows.
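
A minimal sketch, assuming a hypothetical XCF address of 10.20.1.1 for this stack:

   ; TCP/IP profile: dynamic XCF address, subnet mask and cost metric
   IPCONFIG DYNAMICXCF 10.20.1.1 255.255.255.0 2

and, in the VTAM ATCSTRxx start options member:

   XCFINIT=YES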

Dynamic XCF - additional features. [Two CECs in a Sysplex, each with its own HiperSockets shared memory, joined by the CF.] Dynamic XCF uses the best transport medium available: connections between CECs use XCF; connections within a CEC go over HiperSockets (a dynamically defined HiperSockets network - it cannot share a manually defined one); connections within the same LPAR use IUTSAMEH. A hybrid is possible: a connection can be routed over a HiperSocket to an LPAR which then routes it over the CF.

SMC-R - acronyms rule! RDMA: Remote Direct Memory Access. SMC-R: Shared Memory Communications over RDMA. RNIC: RDMA Network Interface Card. RoCE: RDMA over Converged Ethernet ("Rocky"). RMB: Remote Memory Buffer. RDMA means direct read/write to/from memory: a memory area is registered for RDMA use (the RMBs); no CPU cycles are required for the I/O; TCP/IP processing is mainly bypassed, with CPU used only for the interrupt to the TCP/IP API. RoCE ("Rocky") is a new hardware adapter; one or more per CEC, sharable between LPARs (only on z13).

SMC-R continued. Very high speed communication between LPARs in different CECs. Distances of 300m-600m, extendable to km distances with cascaded switches, and up to 100km+ with multiplexers (typically used for GDPS). Transparent to applications - TCP only. Still requires normal TCP/IP connectivity in parallel. MTU sizes of 1K and 2K, and up to 4K in z/OS 2.2. Failover capability back to the TCP/IP transport. Security considerations: SMC-R traffic flows over the Ethernet wire; it cannot be routed (single switched network); it cannot support or traverse firewalls; it could compromise security rules or PCI compliance.

SMC-R. [Two CECs, each with an application LPAR, shared memory buffers (SMBs) and a RoCE adapter, connected over 10Gb Ethernet alongside the routed OSA path.] The RoCE adaptor is configured alongside the normal OSA interface. All RoCE interfaces must be connected to the same physical 10Gb Ethernet network; RDMA capable switches can be used and cascaded to extend distance. Reads and writes go directly into the shared memory buffers. Very fast, very low latency.

SMC-R connection setup. During connection setup, TCP/IP negotiates with the partner over the normal path to see whether RDMA is supported and whether it should be used. Subsequent packets on the connection flow across the RoCE adaptors: RoCE reads and writes directly to and from the shared memory buffers, and the TCP/IP API reads and writes directly to and from the same buffers. A profile sketch follows.
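
A hedged TCP/IP profile sketch for enabling SMC-R; the PFID, port, interface name and addresses are assumptions, and the PFID values come from the FUNCTION definitions for the RoCE feature in your HCD/IOCDS:

   ; Associate the stack with a RoCE Express PFID
   GLOBALCONFIG SMCR PFID 0018 PORTNUM 1
   ;
   ; The parallel OSA path must be an IPAQENET INTERFACE (CHPIDTYPE OSD);
   ; SMC-R is then attempted for eligible TCP connections on its subnet.
   ; Code NOSMCR on an interface to exclude it.
   INTERFACE OSD10GA DEFINE IPAQENET CHPIDTYPE OSD PORTNAME OSA10GA IPADDR 10.30.1.1/24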

SMC-D - more acronyms! SMC-D: Shared Memory Communications - Direct Memory Access (over ISM). ISM: Internal Shared Memory. ISM is a new virtual PCI (vPCI) network adapter defined in the IOCDS, similar to IQD: no hardware, just System z firmware, and similar technology to RDMA. Very high speed communication between LPARs within the same CEC - so similar to HiperSockets, and also similar to SMC-R but without the hardware or network infrastructure. Transparent to applications. Requires normal TCP/IP connectivity in parallel. The fastest possible communication within a CEC, and no additional hardware or software required... other than a z13.

SMC-D. [One CEC with two application LPARs; the SMC layer in each stack reads and writes ISM, alongside the routed path used for setup.] Applications remain unchanged; the socket API bypasses TCP/IP and SMC reads and writes to and from ISM. Very fast, very, very low latency. The definitions involved are sketched below.
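
A hedged sketch of the two definitions involved; the FID, VF, partition names and PNETID are assumptions, and the PNETID on the ISM function must match the PNETID of the OSA (or HiperSockets) interfaces it pairs with:

   * IOCP/HCD: ISM virtual PCI function shared by two LPARs
   FUNCTION FID=100,VF=1,PART=((LPAR1,LPAR2)),PNETID=NETA,TYPE=ISM

   ; TCP/IP profile: enable SMC-D on this stack
   GLOBALCONFIG SMCD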

SMC technology - the complete picture. SMC-D is used within a CEC; SMC-R is used across CECs. The TCP/IP (OSA) based path is only used for connection setup, and the TCP/IP stack is bypassed for data. Very fast, very low latency, any-to-any communications. SMCAT is an applicability tool to assess SMC-D and SMC-R in your environment; a usage sketch follows.
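
A hedged sketch of driving the applicability tool from the console; the stack name and data set name are assumptions, and the data set contains SMCAT configuration statements naming the IP addresses or subnets to monitor:

   V TCPIP,TCPIP,SMCAT,USER1.SMCAT.CONFIG
   V TCPIP,TCPIP,SMCAT,OFF

The first command starts monitoring using the named configuration data set; the second stops it, at which point a report estimating how much of the observed TCP traffic would have been eligible for SMC is produced.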

Performance comparisons. Relative performance varies depending on many factors: traffic type (request/response or streaming), message sizes, and the availability of other resources. But in summary: the new shared memory communications provide the ultimate performance, but you need a z13 for SMC-D and RoCE adapters for SMC-R. HiperSockets are the best alternative within a CEC, with no hardware or software costs. Dynamic XCF is a good alternative within a Sysplex, especially when combined with HiperSockets, again with no hardware or software costs.

HiperSockets performance. [Chart: average round-trip time in microseconds, OSA vs HiperSockets, for response sizes of 1K, 4K, 16K, 32K and 64K.] Tests used a packet driver running on zLinux and z/OS, with zLinux and z/OS on different subnets in the same CEC. zLinux sent a short request (256 bytes) and z/OS responded with 1K to 64K packets; the average round-trip time was measured in microseconds.

Review: z/OS SMC-R performance relative to TCP (OSA Express4 10Gb). Request/response workloads with different payload sizes; SMC-R provides significantly better performance compared to TCP over OSA Express4 10Gb. [Chart: zEC12, 2 CPs, z/OS V2R1; workloads RR1(1/1) through RR10(32k/32k); bars show raw throughput, server CPU, client CPU and response time relative to TCP over OSA Express4 10Gb, with throughput gains ranging from roughly 105% to 732%, response time reductions of up to roughly 88%, and CPU savings of up to roughly 56% (server) and 54% (client), depending on payload size.] March 24, 2014. Client and server: 2 CPs, 2827-791. Interfaces: SMC-R and OSA Express4 10Gb.

SMC-D / ISM versus HiperSockets comparison: up to 9x the throughput! See the breakout summary on the next chart.

SMC-D / ISM versus HiperSockets - summary highlights. Request/response workloads with 1k/1k to 4k/4k payloads: up to 48% reduction in latency, up to 91% increase in throughput, and up to 47% reduction in network related CPU cost. Request/response workloads with 8k/8k to 32k/32k payloads: up to 82% reduction in latency, up to 475% (~6x) increase in throughput, and up to 82% reduction in network related CPU cost. Streaming workload: up to 89% reduction in latency, up to 800% (~9x) increase in throughput, and up to 89% reduction in network related CPU cost.

SMC-D / ISM versus OSA comparison: up to 21x the throughput! See the breakout summary on the next chart.

SMC-D / ISM versus OSA - summary highlights. Request/response workloads with 1k/1k to 4k/4k payloads: up to 94% reduction in latency, up to 1601% (~17x) increase in throughput, and up to 40% reduction in network related CPU cost. Request/response workloads with 8k/8k to 32k/32k payloads: up to 93% reduction in latency, up to 1313% (~14x) increase in throughput, and up to 67% reduction in network related CPU cost. Streaming workload: up to 95% reduction in latency, up to 2001% (~21x) increase in throughput, and up to 85% reduction in network related CPU cost. FTP (binary get and put): up to 58% lower receive-side CPU cost, up to 26% lower send-side CPU cost, and equivalent throughput.

Compatibility summary. The chart compares the LPAR-to-LPAR connection options - HiperSockets, DynamicXCF, SMC-R* and SMC-D** - across: within the same CPC/CEC; same Sysplex; different CECs, same Sysplex; z/OS LPARs; z/VM and zLinux LPARs; z/OS guests; TCP traffic; other IP traffic; APPN/HPR traffic (EE, XCF); no additional hardware required; z13 required; secure (no data goes on the wire). * z13 and z/OS V2.2 (V2.1 + PTF) required for sharing RoCE adapters between LPARs. ** z13 and z/OS V2.2 required for SMC-D.

Summary. [Diagram: three CPC/CECs, each hosting z/OS, z/VM and zLinux LPARs, spread across two Sysplexes; the connections are annotated with the applicable option groups - "SMC-D, HiperSockets, DynamicXCF", "HiperSockets, SMC-R" and "SMC-R, DynamicXCF" - depending on whether the LPARs share a CEC and/or a Sysplex.]

Session feedback Please submit your feedback at http://conferences.gse.org.uk/2016/feedback/ec Session is EC