OFA Developer Workshop 2014

Similar documents
OFA Developer Workshop 2013

IBM z/os SMC-R Performance Considerations December IBM z/os Shared Memory Communications. Performance Considerations. David Herr Dan Patel

z/os V2R1 CS: Shared Memory Communications - RDMA (SMC-R), Part 1

IBM z13. Frequently Asked Questions. Worldwide

10GbE RoCE Express and Shared Memory Communications RDMA (SMC-R) Frequently Asked Questions

Independent Submission Request for Comments: 7609 Category: Informational ISSN: August 2015

High Performance Highly Available z Systems RoCE Networks

Networking in zenterprise between z/os and Linux on System z

How to Turbocharge Network Throughput

IBM z/os Communications Server Shared Memory Communications (SMC)

IBM. Communications Server support for RoCE Express2 features. z/os Communications Server. Version 2 Release 1

IBM C IBM z Systems Technical Support V7.

IBM z13 Hardware Innovation

Shared Memory Communications over RDMA adapter (RoCE) virtualization documentation updates for APARs OA44576 and PI12223

IBM z13 (z13) Highlights. Performance and Scale help improve client experience. IBM Systems Data Sheet

Using WebSphere Application Server Optimized Local Adapters (WOLA) to Integrate COBOL and zaap-able Java

IBM z13 Hardware Innovation Overview

Leading-edge Technology on System z

RoCE vs. iwarp Competitive Analysis

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies

IBM Db2 Warehouse on Cloud

Optimize Your Heterogeneous SOA Infrastructure

NVMe over Universal RDMA Fabrics

IBM Db2 Analytics Accelerator Version 7.1

Messaging Overview. Introduction. Gen-Z Messaging

IBM C IBM z Systems Technical Support V6.

WebSphere Commerce Developer Professional 9.0

The NE010 iwarp Adapter

Software Defined Storage at the Speed of Flash. PRESENTATION TITLE GOES HERE Carlos Carrero Rajagopal Vaideeswaran Symantec

IBM Tivoli System Automation for z/os

DataON and Intel Select Hyper-Converged Infrastructure (HCI) Maximizes IOPS Performance for Windows Server Software-Defined Storage

Innovate 2013 Automated Mobile Testing

WebSphere MQ Low Latency Messaging V2.1. High Throughput and Low Latency to Maximize Business Responsiveness IBM Corporation

Advanced Computer Networks. End Host Optimization

IMS Connect Much More Than a TCP/IP Gateway

Hardware Cryptography and z/tpf

IMS V13 Overview. Deepak Kohli IMS Product Management

IBM zenterprise System Innovation

DB2 REST API and z/os Connect SQL/Stored Procedures Play a Role in Mobile and API Economics

20 years of Lotus Notes and a look into the next 20 years Lotusphere Comes To You

FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC

Reducing MIPS Using InfoSphere Optim Query Workload Tuner TDZ-2755A. Lloyd Matthews, U.S. Senate

IBM Application Performance Analyzer for z/os Version IBM Corporation

IBM Europe Announcement ZP , dated November 6, 2007

Application Acceleration Beyond Flash Storage

WebSphere Commerce Professional

F5 Networks F5LTM12: F5 Networks Configuring BIG-IP LTM: Local Traffic Manager. Upcoming Dates. Course Description. Course Outline

SAP on IBM z Systems. Customer Conference. April 12-13, 2016 IBM Germany Research & Development

Stellar performance for a virtualized world

Voltaire. Fast I/O for XEN using RDMA Technologies. The Grid Interconnect Company. April 2005 Yaron Haviv, Voltaire, CTO

WebSphere Commerce Developer Professional

Optimize and Accelerate Your Mission- Critical Applications across the WAN

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.

IBM Z servers running Oracle Database 12c on Linux

Best Practices for Deployments using DCB and RoCE

by Brian Hausauer, Chief Architect, NetEffect, Inc

CICS Product Update. Danny Mace Director, CICS Products IBM Software. August 2012 Session Number 11417

QuickSpecs. HP Z 10GbE Dual Port Module. Models

IBM Real-time Compression and ProtecTIER Deduplication

IBM Infrastructure Suite for z/vm and Linux: Introduction IBM Tivoli OMEGAMON XE on z/vm and Linux

IBM CICS Transaction Gateway for Multiplatforms V7.1 delivers access to CICS containers and extended systems monitoring capabilities

High performance and functionality

Enterprise Caching in a Mobile Environment IBM Redbooks Solution Guide

InfoSphere Warehouse with Power Systems and EMC CLARiiON Storage: Reference Architecture Summary

Flex System EN port 10Gb Ethernet Adapter Product Guide

OpenOnload. Dave Parry VP of Engineering Steve Pope CTO Dave Riddoch Chief Software Architect

Extremely Fast Distributed Storage for Cloud Service Providers

IBM DB2 Analytics Accelerator for z/os, v2.1 Providing extreme performance for complex business analysis

HATS 7.1 Performance and Capacity Planning

Virtual WAN Optimization Controllers

Empowering DBA's with IBM Data Studio. Deb Jenson, Data Studio Product Manager,

WP710 Language: English Additional languages: None specified Product: WebSphere Portal Release: 6.0

IBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data. IBM Db2 Event Store

Storage Protocol Offload for Virtualized Environments Session 301-F

z/os Thru-V2R1 Communications Server Performance Functions Update - Session David Herr IBM Raleigh, NC

Solutions in an SNA/IP (EE) Network

An Oracle White Paper December Accelerating Deployment of Virtualized Infrastructures with the Oracle VM Blade Cluster Reference Configuration

Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd

Creating an agile infrastructure with Virtualized I/O

Red Hat Enterprise Linux on IBM System z Performance Evaluation

TALK THUNDER SOFTWARE FOR BARE METAL HIGH-PERFORMANCE SOFTWARE FOR THE MODERN DATA CENTER WITH A10 DATASHEET YOUR CHOICE OF HARDWARE

GUIDE. Optimal Network Designs with Cohesity

QLogic TrueScale InfiniBand and Teraflop Simulations

iser as accelerator for Software Defined Storage Rahul Fiske, Subhojit Roy IBM (India)

z/os Communications Server Performance Functions Update - Session 16746

SMB Direct Update. Tom Talpey and Greg Kramer Microsoft Storage Developer Conference. Microsoft Corporation. All Rights Reserved.

Learn Your Alphabet - SRIOV, NPIV, RoCE, iwarp to Pump Up Virtual Infrastructure Performance

Exploiting IT Log Analytics to Find and Fix Problems Before They Become Outages

z/vm 6.3 A Quick Introduction

IBM zenterprise System: What s New - Announcement Overview

Cisco Wide Area Application Services (WAAS) Mobile

2011 IBM Research Strategic Initiative: Workload Optimized Systems

Generic RDMA Enablement in Linux

Hitachi Virtual Storage Platform Family

IBM i 7.3 Features for SAP clients A sortiment of enhancements

DELL EMC ISILON F800 AND H600 I/O PERFORMANCE

The zenterprise Unified Resource Manager IBM Corporation

IBM LinuxONE Rockhopper

SNIA Developers Conference - Growth of the iscsi RDMA (iser) Ecosystem

Transcription:

OFA Developer Workshop 2014 Shared Memory Communications over RDMA (SMC-R): Update Jerry Stevens IBM sjerry@us.ibm.com

Trademarks, copyrights and disclaimers IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of other IBM trademarks is available on the web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml Other company, product, or service names may be trademarks or service marks of others. THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS OR SOFTWARE. Copyright International Business Machines Corporation 2013. All rights reserved. 2

Agenda Topics 1. SMC-R Review 2. Status (what s available) 3. Performance Overview 4. Linux SMC-R Status 3

Topic 1 SMC-R Review 4

Shared Memory Communications over RDMA Concepts / Overview SMC-R enabled platform Shared Memory Communications via RDMA SMC-R enabled platform OS image OS image shared memory shared memory server Sockets SMC Sockets SMC client Virtual server instance RNIC RNIC RDMA enabled (RoCE) Virtual server instance RDMA technology provides the capability to allow hosts to logically share memory. The SMC-R protocol defines a means to exploit the shared memory for communications - transparent to the applications! Clustered Systems SMC-R is an open sockets over RDMA protocol that provides transparent exploitation of RDMA (for TCP based applications) while preserving key functions and qualities of service from the TCP/IP ecosystem that enterprise level servers/network depend on! Draft IETF RFC for SMC-R: http://tools.ietf.org/html/draft-fox-tcpm-shared-memory-rdma-03 5

Dynamic Transition from TCP to SMC-R z/os System A Middleware/Application z/os System B Middleware/Application Sockets Sockets SMC-R TCP IP TCP IP SMC-R Interface Interface data exchanged using RDMA ROCE OSA TCP connection establishment over IP OSA ROCE data exchanged using RDMA TCP syn flows (with TCP Options indicating SMC-R capability) RDMA Network RoCE IP Network (Ethernet) Dynamic (in-line) negotiation for SMC-R is initiated by presence of TCP Options TCP connection transitions to SMC-R allowing application data to be exchanged using RDMA 6

SMC-R (Contact and RDMA Processing) z/os V2R1 System A Client application RoCE z/os V2R1 System B Server application Socket API TCP IP SMC-R O S A (1) TCP/IP handshake (2) CLC O S A Socket API SMC-R TCP IP SMC RDMA Memory Block server written socket data QP R O C E (3) LLC (4) Data SMC Link (RC-QPs) R O C E QP 4) SMC RDMA Memory Block client written socket data 1) Application issues standard TCP Connect; Normal TCP/IP connection (3-way syn) handshake; Determine ability/desire to support SMC-Remote (based on TCP option) 2) When both hosts provide SMC TCP option then exchange RDMA credentials (QPs, RMBEs, GIDs, etc.) within TCP data stream (CLC messages Connection Level Control messages) can still fall back to IP 3) If first contact. then establish point-to-point SMC Link via SMC LLC (Link Layer Control) commands (RDMA- Memory-Block (RMB) pair over RC-QP the same link (QP/RMB) can be used for multiple TCP connections across same 2 peers) 4) Applications issue standard socket send; SMC-R performs RDMA-write into partner s RMBE slot (RMB Element); peer consumes data via standard socket read 7

SMC-R Link Group Architecture RMBs 3 & 4 RMB 1 SMC Rkey 4 Rkey 3 QP 8 Rkey 1 Rkey 2 QP 64 RNIC X A RNIC B SMC Link Group SMC Link 1 SMC Link 2 RNIC C RNIC D SMC RMB 2 QP 12 Rkey 1 Rkey 3 Rkey 4 (RoCE) Rkey 2 QP 68 If one path (e.g. an RNIC) becomes unavailable (in this example RNIC A) then: traffic on the SMC Link 1 is transparently moved to SMC Link 2 using the redundant hardware all application workload RDMA traffic continues without interruption once SMC Link 1 is recovered then traffic can resume using both paths. Note that all paths (SMC Links) have equal access to all RMBs! 8

TCP Connection Load Balancing with SMC-R RDMA direct data path Tier 2 Web Server/Application Cluster Any platform Tier 3 Application/Database Server Cluster- Any platform Cluster Tier 1 Clients Work requests (TCP connections) Server instance Load Balancer Work requests Load Balancer Server instance Work requests VIP1 VIP2 TCP/IP connectio n path Server instance Server clustering is a prevalent deployment pattern for Enterprise class servers Provide High Available, eliminate single points of failure, ability to grow/shrink capacity dynamically, ability to perform non-disruptive planned maintenance, etc. TCP connection load balancing is a key solution for load balancing within a cluster environment External or Internal load balancers provide this capability Existing TCP connection load balancing solutions are not compatible with other RDMA solutions They are not aware of the RDMA protocol AND RDMA flows can not flow through intermediate nodes The SMC-R protocol allows existing TCP load balancing solutions to be deployed with no changes TCP Connection load balancing for SMC-R connections is actually more efficient than normal TCP/IP connections Load balancer selects optimal back end server, data flows can then bypass the load balancer

SMC-R Overview Shared Memory Communications over RDMA (SMC-R) is a protocol that allows TCP sockets applications to transparently exploit RDMA (RoCE) SMC-R is a hybrid solution that: Uses TCP connection (3-way handshake) to establish SMC-R connection Each TCP end point exchanges TCP options that indicate whether it supports the SMC-R protocol SMC-R rendezvous (RDMA attributes) information is then exchanged within the TCP data stream (similar to SSL handshake) Socket application data is exchanged via RDMA (write operations) TCP connection remains active (controls SMC-R connection) This model preserves many critical existing operational and network management features of TCP/IP 10

SMC-R Key Attributes - Summary Optimized Network Performance (leveraging RDMA technology) Transparent to (TCP socket based) application software Leverages existing Ethernet infrastructure (RoCE) Preserves existing network security model Resiliency (dynamic failover to redundant hardware) Transparent to Load Balancers Preserves existing IP topology and network administrative and operational model 11

Topic 2 SMC-R Availability 12

New innovations available on IBM zbc12 and zec12 Data Compression Acceleration High Speed Communication Fabric Flash Technology Exploitation Proactive Systems Health Analytics Hybrid Computing Enhancements Reduce CP consumption, free up storage & speed cross platform data exchange Optimize server to server networking with reduced latency and lower CPU overhead Improve availability and performance during critical workload transitions, now with dynamic reconfiguration; Coupling Facility exploitation (SOD) Increase availability by detecting unusual application or system behaviors for faster problem resolution before they disrupt business x86 blade resource optimization; New alert & notification for blade virtual servers; Latest x86 OS support; Expanding futures roadmap zedc Express 10GbE RoCE Express IBM Flash Express IBM zaware zbx Mod 003; zmanager Automate; Ensemble Availability Manager; DataPower Virtual appliance SoD 13

10GbE RoCE Express with SMC-R: Transparent optimized server to server networking! Network latency for z/os TCP/IP based OLTP workloads reduced by up to 80% 1 zec12 Shared Memory Communications (SMC-R): zbc12 Exploit RDMA over Converged Ethernet (RoCE) to deliver superior communications performance for TCP based applications Networking related CPU consumption for z/os TCP/IP based workloads with streaming data patterns reduced by up to 60% 2 with a network throughput increase of up to 60% 2 z/os V2.1 SMC-R Typical Client Use Cases: Help to reduce both latency and CPU resource consumption over traditional TCP/IP for communications across z/os systems Any z/os TCP sockets based workload can seamlessly use SMC-R without requiring any application changes z/vm 6.3 support for guests 10GbE RoCE Express 1 Based on internal IBM benchmarks in a controlled environment of modeled z/os TCP sockets-based workloads with request/response traffic patterns using SMC-R (10GbE RoCE Express feature) vs TCP/IP (10GbE OSA Express feature). The actual response times and CPU savings any user will experience will vary. 2 Based on internal IBM benchmarks in a controlled environment of modeled z/os TCP sockets-based workloads with streaming traffic patterns using SMC-R (10GbE RoCE Express feature) vs TCP/IP (10GbE OSA Express feature). The actual response times and CPU savings any user will experience will vary. 14

Topic 3 SMC-R Performance 15

z/os to z/os SMC-R Performance (micro benchmarks) 9.1 Gb/sec (bi-directional) CPU savings increases with payload 16

Performance Benefits: Actual Workloads 40% reduction in overall transaction response time for WebSphere Application Server v8.5 Liberty profile TradeLite workload accessing z/os DB2 in another system measured in internal benchmarks 1 Linux on x Workload Client Simulator (JIBE) WebSphere to DB2 communications using SMC-R z/os SYSA SMC-R z/os SYSB TCP/IP HTTP/REST WAS JDBC/DRDA Liberty TradeLite RoCE DB2 File Transfers (FTP) using SMC-R z/os SYSA FTP Client SMC-R FTP RoCE z/os SYSB FTP Server Up to 50% CPU savings for FTP binary file transfers across z/os systems when using SMC-R vs standard TCP/IP 2 1. Based on projections and measurements completed in a controlled environment. Results may vary by customer based on individual workload, configuration and software levels. 2. Based on internal IBM benchmarks in a controlled environment using z/os V2R1 Communications Server FTP client and FTP server, transferring a 1.2GB binary file using SMC-R (10GbE RoCE Express feature) vs standard TCP/IP (10GbE OSA Express4 feature). The actual CPU savings any user will experience may vary. 17

Performance Benefits: Actual Workloads (continued) Up to 48% reduction in response time and up to 10% CPU savings for CICS transactions using DPL (Distributed Program Link) to invoke programs in remote CICS regions in another z/os system via CICS IP interconnectivity (IPIC) when using SMC-R vs standard TCP/IP 3 CICS to CICS IP Intercommunications (IPIC) using SMC-R z/os SMC-R z/os SYSA SYSB CICS A DPL calls IPIC RoCE CICS B Program X WebSphere MQ for z/os using SMC-R z/os SMC-R z/os SYSA SYSB WebSphere MQ MQ messages RoCE WebSphere MQ WebSphere MQ for z/os realizes up to 200% increase in messages per second it can deliver across z/os systems when using SMC-R vs standard TCP/IP 4 3 Based on internal IBM benchmarks using a modeled CICS workload driving a CICS transaction that performs 5 DPL (Distributed Program Link) calls to a CICS region on a remote z/os system via CICS IP interconnectivity (IPIC), using 32K input/output containers. Response times and CPU savings measured on z/os system initiating the DPL calls. The actual response times and CPU savings any user will experience will vary. 4 Based on internal IBM benchmarks using a modeled WebSphere MQ for z/os workload driving non-persistent messages across z/os systems in a request/response pattern. The benchmarks included various data sizes and number of channel pairs The actual throughput and CPU savings users will experience may vary based on the user workload and configuration. 18

Topic 4 Linux SMC-R Status 19

SMC-R for Linux: Overview IBM is in the process of developing an SMC-R solution for Linux kernel based solution (working towards upstream kernel acceptance) No special license requirements (GPL) Introduces a new AF_SMC socket family and a preload library to transparently run AF_INET socket applications for SMC-R (no application changes required) 20

SMC-R for Linux: Key Functions Uses BSD sockets interface Conforms to SMC-R specifications (RFC) Provide key QoS (SSL, load balancer compatibility, HCA fail-over etc.) Kernel solution enables key functions (e.g. QP sharing, memory mgt, fork(), link groups etc.) Will be Java enabled Usability: minimal configuration steps required (i.e. zero IP topology changes) 21

SMC-R for Linux: Project Status Linux internal testing is in progress (significant progress with various levels of testing) Posting code: objective is summer 2014 22

Linux to Linux SMC-R Performance (micro benchmarks) 12 Gb/sec (bi-directional) CPU savings increases Note. Results are preliminary and with are payload subject to change. 23

Linux Hardware Software Performance Configuration Hardware Software Configuration used for measurements zec12 Client, Server Linux on System z images with dedicated 2 CPs each Linux version (~ Red Hat 6.0) Linux SMC-R code is preliminary code z /OS V2R1 Network interfaces used 10GbE RoCE Express OSA Express4s 10Gb For TCP (Linux on System z) : Running Layer 2 Mode (Checksum Offload and Segmentation Offload not applicable) 24

SMC-R References http://www-01.ibm.com/software/network/commserver/smcr/ 25

Thank You #OFADevWorkshop