How to Turbocharge Network Throughput
Tony Amies, RSM Partners
Tuesday 1st November, Session EC

Bio
(Career timeline drawn against geological time: Triassic Period, Jurassic Period, Cretaceous Period, Cenozoic Era)
- Time axis: 245 million years ago (started work as an IMS DB/DC programmer), New Millennium, Now
- Career stages shown: IMS Prog Era, Network SysProg Period, IBM Era, Network Consultant Period, Director & Consultant Period, Architect Period, RSM Era

Agenda
- Typical Datacentre Connections
- Key Considerations to Improve Performance
- Some of the available options
  - Hipersockets
  - Dynamic XCF
  - SMC-R
  - SMC-D
- Performance Comparisons

Typical Datacentre Connections
Applications on the same LPAR:
- Bypass much of the lower TCP/IP layers
- Very fast, close to memory-to-memory
Applications on different LPARs, shared OSA:
- Traffic does not leave the OSA
- Still quite fast

Typical Datacentre Connections
Applications on different LPARs, using different OSAs on the same subnet:
- Traffic traverses a network segment
- Data is now on the physical wire
- Not quite so fast
Applications on different LPARs, using different OSAs and subnets:
- Traffic traverses a network segment
- Traffic traverses one or more routers
- Data is now on the physical wire and router
- Smaller MTU size likely
- Comparatively slow

Typical Datacentre Connections
Applications on different LPARs, using different OSAs on multiple subnets:
- Traffic traverses multiple network segments
- An LPAR is used as a router
- Data traverses lots of things, so there are security considerations
- Quite slow
Isn't that just bad routing definitions? Would this really happen?

Typical Datacentre Connections
(Diagram labels: Application A on an APPN EN, an APPN NN, Application B on an APPN EN, two routers)
Well, it's not impossible:
- Think APPN/EE, with network nodes and end nodes
- An SNA application needs to get to another SNA application
- Connectivity is UDP over IP
- An LPAR is used as an APPN router

Considerations for improving performance
- Which applications need to communicate?
- Are the LPARs hosting these in the same CPC/CEC?
- Are the LPARs hosting these in the same Sysplex?
- Are there zLinux or z/VM LPARs involved?
- What z hardware do these run on?
- z/OS (and z/VM) version
- Security considerations (firewall, PCI compliance)
- Traffic footprint
  - TCP or UDP
  - Streaming or interactive
  - Typical message sizes and volumes (MTU sizes)
- Budget

Turbocharging Options
Optional technologies:
- Hipersockets
- Dynamic XCF
- SMC-R
- SMC-D
All of these could potentially improve performance, but not all are:
- Supported in different CPC/Sysplex combinations
- Supported by all hardware
- Supported by all operating systems
- Supported by all applications
- Supported by all security compliances
- Supported by your accountant

Hipersockets
- High speed communication between LPARs within the same CPC/CEC
- Internal Queued Direct I/O (iQDIO)
  - Based on the QDIO architecture
  - Communication via shared memory
  - High speed, low latency
- Multiple operating system support: z/OS, zLinux, z/VM and even z/VSE
- Transparent to applications
- No additional hardware or software required (aka free!)
- Easy to configure
- Large MTU capability (up to 56K)

Hipersockets
(Diagram: two applications in one CPC/CEC communicating through Hipersocket shared memory)
- Hipersockets can be shared by all the LPARs within a CPC/CEC
- Multiple hipersockets are supported per CPC/CEC for isolation of traffic (also VLANs)
- TCP/IP send writes to memory, TCP/IP receive reads from memory
- Configurable MTU size up to 56K, global across the Hipersocket
- Defined in HCD/IOCDS as a shared CHPID with type IQD
- Defined in the TCP/IP profile in a similar way to an OSA QDIO interface, using either:
  - INTERFACE name DEFINE IPAQIDIO CHPID xx (z/OS V2.1+)
  - DEVICE IUTIQDxx MPCIPA; LINK name IPAQIDIO IUTIQDxx
- A static route is required if not using dynamic routing (such as OSPF)

Hipersockets: multiple OS images
(Diagram: one CPC/CEC with Hipersocket shared memory; LPAR 1 and LPAR 2 hosting zLinux, z/OS and z/VM images, with a VSWITCH and ethx interfaces)
- Native operating systems connect directly to Hipersockets via the IQD channel
  - 3 control unit addresses for each stack (data, read/write, control)
- Guest operating systems (zLinux, z/OS running under z/VM) can connect directly to Hipersockets via the IQD channel
  - 3 control unit addresses for each stack for each guest OS
- z/VM can also connect directly to Hipersockets
- Guest operating systems could optionally route via a VSWITCH and z/VM
- Special Hipersocket types exist for z/VM Bridge Port (VSWITCH link) and IEDN

Hipersockets: additional technologies
Multi-write capability:
- CPU reduction
- Multiple buffers written to Hipersockets in a single write operation
- Processing can be offloaded to zIIP
- Supported on z10+ and z/OS 1.10+
Completion Queues:
- Queue mechanism used if the target system for the data is saturated
- Hipersocket data is sent synchronously if the target has available buffers
- Hipersocket data is queued and sent asynchronously if the target has no buffers
- Supported on z196+ and z/OS 1.13+
QDIO Accelerator:
- Applies when an LPAR is used as a router for traffic arriving via OSA
- Processing of the first packet determines forwarding via Hipersockets
- Subsequent packets are routed down at the DLC layer, bypassing the TCP/IP stack

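The Completion Queues behaviour described above (send synchronously when the target has buffers, otherwise queue and send later) can be illustrated with a toy model. This is only a sketch: the `Target` and `Sender` classes and their methods are hypothetical names invented for illustration, and the real mechanism lives in firmware and the TCP/IP stack, not in application code.

```python
from collections import deque


class Target:
    """Toy receiver with a fixed number of free input buffers (hypothetical)."""

    def __init__(self, free_buffers):
        self.free_buffers = free_buffers
        self.received = []

    def accept(self, data):
        # Consume one buffer for each delivered block of data.
        self.free_buffers -= 1
        self.received.append(data)

    def release_buffers(self, count):
        # The receiver has processed data and freed some buffers.
        self.free_buffers += count


class Sender:
    """Toy sender that falls back to a queue when the target is saturated."""

    def __init__(self, target):
        self.target = target
        self.completion_queue = deque()

    def send(self, data):
        # Synchronous path: target has a free buffer and nothing is waiting.
        # (Checking the queue first keeps delivery in order; that ordering
        # rule is an assumption of this sketch.)
        if self.target.free_buffers > 0 and not self.completion_queue:
            self.target.accept(data)
            return "sync"
        # Asynchronous path: park the data until buffers become available.
        self.completion_queue.append(data)
        return "queued"

    def drain(self):
        # Deliver queued data while the target has buffers; return the count.
        delivered = 0
        while self.completion_queue and self.target.free_buffers > 0:
            self.target.accept(self.completion_queue.popleft())
            delivered += 1
        return delivered
```

In this model a sender never blocks: a saturated target simply causes data to accumulate on the queue until `drain()` is called after buffers are released.
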
Dynamic XCF
- High speed communication between LPARs within the same Sysplex
- Communication via XCF transport
  - VTAM and TCP/IP create an XCF data group
  - High speed, low latency
- Single operating system support: z/OS only
- Provides a logical LAN between TCP/IP stack instances
- Transparent to applications
- No additional hardware or software required (aka free), assuming a Sysplex and Coupling Facility are implemented
- Very easy to configure

Dynamic XCF
(Diagram: two applications in a Sysplex communicating via the CF)
- XCF network shared by all z/OS stacks in the Sysplex
- Defined in each TCP/IP profile using the DYNAMICXCF keyword
- XCFINIT=YES in the VTAM start options
- Dynamically defines IUTSAMEH and the VTAM TRL
- APPN/HPR CP-CP connections go over XCF instead of across the network with Enterprise Extender, which is more secure

Dynamic XCF: additional features
(Diagram: a Sysplex spanning two CECs, each with Hipersocket shared memory, applications and a CF)
- Uses the best transport medium available
  - Connections between CECs use XCF
  - Connections within a CEC go over Hipersockets
    - Dynamically defined Hipersocket network
    - Cannot share a manually defined Hipersocket network
  - Connections within an LPAR use IUTSAMEH
- Hybrid possible: a connection routed over a Hipersocket to an LPAR which routes over the CF

SMC-R
Acronyms rule!
- RDMA: Remote Direct Memory Access
- SMC-R: Shared Memory Communications over RDMA
- RNIC: RDMA Network Interface Card
- RoCE: RDMA over Converged Ethernet ("Rocky")
- RMB: RDMA Remote Memory Buffers
RDMA:
- Direct read/write to/from memory
- Memory area registered for RDMA use (RMBs)
- No CPU cycles required for I/O
- TCP/IP processing mainly bypassed; CPU is used for the interrupt to the TCP/IP API
RoCE ("Rocky"):
- New hardware adapter
- One or more in each CEC, sharable between LPARs (only on z13)

SMC-R continued
- Very high speed communication between LPARs in different CECs
  - Over a distance of 300m-600m (extendable to kilometres with cascaded switches)
  - Up to 100KM+ with multiplexers, typically used for GDPS
- Transparent to applications
- TCP only
- Still requires normal IP connectivity (for example via OSA) in parallel
- MTU size 1K, 2K and up to 4K in z/OS 2.2
- Failover capability back to standard TCP/IP transport
- Security considerations
  - SMC-R traffic flows over the Ethernet wire
  - Cannot be routed; single switched network only
  - Cannot support or traverse firewalls
  - Could compromise security rules or PCI compliance

SMC-R
(Diagram: two CECs, each with an application, SMB and RoCE adaptor, plus a router)
- RoCE adaptor configured alongside the normal OSA interface
- All RoCE interfaces must be connected on the same physical 10Gb Ethernet
- RDMA-capable switches can be used and cascaded to extend distance
- Reads and writes go into Shared Memory Buffers (SMBs)
- Very fast, very low latency

SMC-R connection setup
(Diagram: two CECs, each with an application, SMB and RoCE adaptor, plus a router)
- During connection setup, TCP/IP negotiates with the partner
  - To see if RDMA is supported
  - To see if RDMA should be used
- Subsequent packets on the connection flow across the RoCE adaptors
- RoCE reads/writes directly to/from the shared memory buffers
- The TCP/IP API reads/writes directly to/from the shared memory buffers

SMC-D
More acronyms!
- SMC-D: Shared Memory Communications - Direct Memory Access, over ISM
- ISM: Internal Shared Memory
ISM:
- New virtual PCI (vPCI) network adapter defined in the IOCDS, similar to IQD
- No hardware, just System z firmware
- Similar technology to RDMA
SMC-D:
- Very high speed communication between LPARs
- Within the same CEC, so similar to Hipersockets
- Also similar to SMC-R, but without hardware or network infrastructure
- Transparent to applications
- Requires normal IP connectivity in parallel
- Fastest possible communication within a CEC
- No additional hardware or software required, other than a z13

SMC-D
(Diagram: one CEC with two applications, each using SMC over ISM, plus a router)
- Applications remain unchanged
- Socket API bypasses TCP/IP
- SMC reads/writes to/from ISM
- Very fast, very very low latency

SMC Technology: the complete picture
(Diagram: two CECs with three applications, each using SMC; ISM within a CEC, RoCE between CECs, plus a router)
- SMC-D used within a CEC
- SMC-R used across CECs
- OSA-based path only used for connection setup
- TCP/IP stack bypassed for data
- Very fast, very low latency any-to-any communications
- SMCAT: applicability tool to assess SMC-D and SMC-R in your environment

Performance comparisons
Relative performance varies depending on many factors:
- Traffic type
  - Request/response
  - Streaming
- Message sizes
- Availability of other resources
But in summary:
- The new shared memory communications provide ultimate performance, but you need a z13 for SMC-D and RoCE adapters for SMC-R
- Hipersockets are the best alternative within a CEC, with no H/W or S/W costs
- DynamicXCF is a good alternative within a Sysplex, especially when combined with Hipersockets, with no H/W or S/W costs

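The summary above can be turned into a rough selection rule for a pair of LPARs. The following is a minimal sketch, not a definitive compatibility check: the function name and parameters are hypothetical, SMC-R is only offered for LPARs in different CECs (a simplification based on "SMC-D used within CEC, SMC-R used across CECs"), and operating system support for SMC and all security constraints are deliberately not modelled.

```python
def candidate_transports(same_cec, same_sysplex, all_zos, tcp_traffic,
                         has_z13, has_roce):
    """Return transport options for one LPAR pair, best first.

    All parameters are booleans describing the two LPARs and their traffic.
    The ordering follows the deck's summary: shared memory communications
    first, then Hipersockets, then DynamicXCF, then the normal OSA path.
    """
    options = []
    # SMC-D: within a CEC, needs a z13; SMC applies to TCP traffic only.
    if tcp_traffic and same_cec and has_z13:
        options.append("SMC-D")
    # SMC-R: across CECs (simplification), needs RoCE adapters; TCP only.
    if tcp_traffic and not same_cec and has_roce:
        options.append("SMC-R")
    # Hipersockets: any supported OS, but only within one CPC/CEC.
    if same_cec:
        options.append("Hipersockets")
    # DynamicXCF: z/OS only, within one Sysplex.
    if same_sysplex and all_zos:
        options.append("DynamicXCF")
    # The normal network path is always available as the fallback.
    options.append("OSA")
    return options


if __name__ == "__main__":
    # Two z/OS LPARs in the same CEC and Sysplex on a z13, TCP traffic.
    print(candidate_transports(True, True, True, True, True, False))
```

A real assessment would also weigh the security and budget considerations listed earlier, which a boolean rule like this cannot capture.
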
Hipersockets Performance
(Chart: OSA vs Hipersockets; y-axis 0 to 6000, average round-trip time in microseconds; x-axis response sizes 1K, 4K, 16K, 32K, 64K)
- Tests used a packet driver running on zLinux and z/OS
- zLinux and z/OS on different subnets, same OSA
- zLinux sent a short request (256 bytes)
- z/OS responded with 1K to 64K packets
- Average round-trip time measured in microseconds

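The test method described above (a short 256-byte request, a 1K to 64K response, average round-trip time in microseconds) can be approximated with a small request/response driver. This is a sketch only: it is not the packet driver used for the measurements, it assumes TCP, and it runs client and server over the loopback address in one process purely so that it is self-contained. All function names are illustrative.

```python
import socket
import threading
import time

REQUEST_SIZE = 256  # bytes, matching the short request in the test description


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def serve(listener, response_size, rounds):
    """Answer each fixed-size request with a response of response_size bytes."""
    conn, _ = listener.accept()
    with conn:
        payload = b"x" * response_size
        for _ in range(rounds):
            recv_exact(conn, REQUEST_SIZE)
            conn.sendall(payload)


def average_rtt_us(response_size, rounds=50):
    """Return the mean request/response round-trip time in microseconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(
            target=serve, args=(listener, response_size, rounds))
        server.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            # Disable Nagle so short requests are sent immediately.
            client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            request = b"r" * REQUEST_SIZE
            start = time.perf_counter()
            for _ in range(rounds):
                client.sendall(request)
                recv_exact(client, response_size)
            elapsed = time.perf_counter() - start
        server.join()
    return elapsed / rounds * 1_000_000


if __name__ == "__main__":
    for size in (1024, 4096, 16384, 32768, 65536):
        print(f"{size:>6} byte response: {average_rtt_us(size):10.1f} us")
```

To compare two network paths, the server and client halves would run on different systems and the same driver would be pointed at addresses reached over each path in turn.
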
Review: z/OS SMC-R Performance Relative to TCP (OSA Exp4 10Gb)
Request/response workload with different payloads. SMC-R provides significantly better performance compared to TCP (OSA Exp4 10Gb).
(Chart: zEC12, 2 CPs, V2R1; SMC-R vs. TCP performance, request/response workload; values are % relative to TCP over OSA Exp4 10Gb)

Workload         Raw Tput   CPU-Server   CPU-Client   Resp Time
RR1(1/1)          209.85       -3.72        -3.32       -67.72
RR10(1k/1k)       290.11       -1.82        -0.79       -74.41
RR10(2k/2k)       731.91       -9.57        -8.51       -88
RR10(4k/4k)       706.28      -20.97       -19.36       -87.59
RR10(8k/8k)       440.62      -29.43       -29.54       -81.52
RR10(16k/16k)     228.8       -41.93       -39.32       -69.55
RR10(32k/32k)     105.32      -56.3        -53.62       -51.28

March 24, 2014. Client, Server: 2 CPs, 2827-791. Interfaces: SMC-R and OSA Exp4 10Gb.
2016 IBM Corporation

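The raw throughput and response time figures on this chart appear to be two views of the same measurement. Assuming a closed-loop request/response driver (each connection issues its next request only after the previous response arrives, which the slide does not state), throughput is inversely proportional to response time, so a response-time change of r% implies a throughput change of (1 / (1 + r/100) - 1) x 100%. The sketch below checks that relation against the chart values; the pairing of each response-time value with a throughput value is inferred from the chart, not given explicitly.

```python
def throughput_change_pct(resp_time_change_pct):
    """Throughput change (%) implied by a response-time change (%).

    Assumes a closed-loop request/response workload, where transactions
    per second are inversely proportional to response time.
    """
    ratio = 1 + resp_time_change_pct / 100  # new response time / old
    return (1 / ratio - 1) * 100


# (response time %, raw throughput %) pairs as read from the chart.
CHART_PAIRS = [
    (-67.72, 209.85),
    (-74.41, 290.11),
    (-88.0, 731.91),
    (-87.59, 706.28),
    (-81.52, 440.62),
    (-69.55, 228.8),
    (-51.28, 105.32),
]

if __name__ == "__main__":
    for resp, reported in CHART_PAIRS:
        implied = throughput_change_pct(resp)
        print(f"resp {resp:7.2f}%  implied {implied:7.2f}%  "
              f"reported {reported:7.2f}%")
```

The implied and reported values agree to within about two percentage points, which is consistent with rounding of the published response-time figures.
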
HiperSockets Comparison
Up to 9x the throughput! See breakout summary on next chart.

SMC-D / ISM to HiperSockets Summary Highlights
Request/response summary for workloads with 1k/1k to 4k/4k payloads:
- Latency: up to 48% reduction in latency
- Throughput: up to 91% increase in throughput
- CPU cost: up to 47% reduction in network related CPU cost
Request/response summary for workloads with 8k/8k to 32k/32k payloads:
- Latency: up to 82% reduction in latency
- Throughput: up to 475% (~6x) increase in throughput
- CPU cost: up to 82% reduction in network related CPU cost
Streaming workload:
- Latency: up to 89% reduction in latency
- Throughput: up to 800% (~9x) increase in throughput
- CPU cost: up to 89% reduction in network related CPU cost

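The "~6x" and "~9x" figures follow from the percentages: an increase of p% means the new value is (1 + p/100) times the old one, so 800% more throughput is 9x, not 8x. A short sketch of that arithmetic, with helper names chosen only for illustration:

```python
def throughput_multiplier(pct_increase):
    """Convert a percentage increase into a 'times the original' factor."""
    return 1 + pct_increase / 100


def latency_speedup(pct_reduction):
    """Convert a percentage latency reduction into a speed-up factor."""
    return 1 / (1 - pct_reduction / 100)


if __name__ == "__main__":
    for pct in (91, 475, 800):
        print(f"+{pct}% throughput = {throughput_multiplier(pct):.2f}x")
    # An 89% latency reduction is roughly a 9x speed-up, which lines up
    # with the streaming throughput figure.
    print(f"-89% latency = {latency_speedup(89):.2f}x faster")
```
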
OSA Comparison
Up to 21x the throughput! See breakout summary on next chart.

SMC-D / ISM to OSA Summary Highlights
Request/response summary for workloads with 1k/1k to 4k/4k payloads:
- Latency: up to 94% reduction in latency
- Throughput: up to 1601% (~17x) increase in throughput
- CPU cost: up to 40% reduction in network related CPU cost
Request/response summary for workloads with 8k/8k to 32k/32k payloads:
- Latency: up to 93% reduction in latency
- Throughput: up to 1313% (~14x) increase in throughput
- CPU cost: up to 67% reduction in network related CPU cost
Streaming workload:
- Latency: up to 95% reduction in latency
- Throughput: up to 2001% (~21x) increase in throughput
- CPU cost: up to 85% reduction in network related CPU cost
FTP:
- For binary get and put: up to 58% lower (receive side) CPU cost, up to 26% lower (send side) CPU cost, and equivalent throughput

Compatibility Summary: LPAR to LPAR Connection
(Table; the per-cell marks are not preserved in this text)
Columns: Hipersockets, DynamicXCF, SMC-R*, SMC-D**
Rows:
- Within the same CPC/CEC
- Same Sysplex
- Different CECs, same Sysplex
- z/OS LPARs
- z/VM, zLinux LPARs
- z/OS guest
- TCP traffic
- Other IP traffic
- APPN/HPR traffic (EE, XCF)
- No additional hardware required
- z13 required
- Secure: no data goes on the wire
* z13 and z/OS V2.2 (V2.1 + PTF) required for sharing RoCE adapters between LPARs
** z13 and z/OS V2.2 required for SMC-D

Summary
(Diagram: two Sysplexes spanning three CPC/CECs, each CPC/CEC hosting z/OS, z/OS, z/VM and zLinux images. Connection labels shown: "SMC-D, Hipersockets, DynamicXCF"; "Hipersockets"; "SMC-R"; "SMC-R, DynamicXCF")

Session feedback
Please submit your feedback at http://conferences.gse.org.uk/2016/feedback/ec
Session is EC