Meltdown and Spectre Interconnect Performance Evaluation, January 2018, Mellanox Technologies

Meltdown and Spectre Interconnect Evaluation, January 2018

Meltdown and Spectre - Background
- Most modern processors perform speculative execution. This speculation can be measured, disclosing information about data regions that are supposed to be protected.
- The attacks use the speculative execution process to gain access to restricted or confidential information.
- The Meltdown and Spectre fixes cause processor performance degradation, roughly as follows (see the syscall-cost sketch after this slide):
  - Large: 8% to 19%+. Highly cached random memory with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions. Examples include OLTP workloads and random I/O to NVMe.
  - Modest: 3% to 7%. Database analytics and Java VMs. These applications may have significant sequential disk or network traffic, but kernel/device drivers are able to aggregate requests to a moderate level of kernel-to-user transitions.
  - Small: 2% to 5%. Workloads that spend little time in the kernel, i.e. jobs that run mostly in user space and are scheduled using CPU pinning or NUMA control. Examples include Linpack NxN on x86 and SPEC CPU2006.
  - Minimal to none. Accelerator technologies that generally bypass the kernel in favor of direct user access. Examples include DPDK, RDMA and other offloads that bypass the kernel.
- The following slides provide the measured performance impact on various interconnect technologies.
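To illustrate why syscall-heavy workloads pay the largest penalty, the following minimal C microbenchmark times a tight loop of cheap syscalls; it is a generic sketch for comparing a patched and an unpatched kernel, not part of the original measurements, and the iteration count and choice of syscall are arbitrary.

```c
/* Minimal sketch: estimate the per-syscall cost of kernel-to-user
 * transitions. Run on an unpatched and on a patched kernel and compare
 * the reported ns/syscall. Not from the original slides. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const long iters = 5 * 1000 * 1000;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);           /* enters and leaves the kernel each time */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per syscall\n", ns / iters);
    return 0;
}
```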

Offload Interconnect: RDMA and Kernel Bypass are Critical to Ensure Highest Performance and ROI
- Offload-based interconnect impact: 0%
- Onload-based interconnect impact: up to -47%
- Offload-based interconnect technologies bypass the kernel and therefore maintain the best performance. Examples: RDMA (InfiniBand and Ethernet), RDMA-based NVMe-over-Fabrics, and other interconnect offloads (see the data-path sketch after this slide).
- Onload-based interconnect performance is negatively impacted by the Meltdown and Spectre fixes. This includes TCP/IP over Ethernet, OmniPath, and other onload-based products.
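As a hedged illustration of what "kernel bypass" means here, the fragment below posts an RDMA write and busy-polls its completion entirely from user space with libibverbs. It assumes an already-connected queue pair, completion queue, registered memory region, and the peer's remote address and rkey (connection setup via rdma_cm or verbs is omitted), so it is a sketch of the per-message data path rather than a complete program.

```c
/* Sketch of an RDMA kernel-bypass data path using libibverbs.
 * qp, cq, mr, buf, len, remote_addr and rkey are assumed to have been set
 * up earlier. No syscalls are issued on this path, which is why the
 * Meltdown/Spectre patches do not slow it down. Link with -libverbs. */
#include <stdint.h>
#include <infiniband/verbs.h>

int rdma_write_once(struct ibv_qp *qp, struct ibv_cq *cq,
                    struct ibv_mr *mr, void *buf, uint32_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .opcode     = IBV_WR_RDMA_WRITE,
        .sg_list    = &sge,
        .num_sge    = 1,
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr))    /* work request handed to the NIC from user space */
        return -1;

    struct ibv_wc wc;
    int n;
    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ;                                   /* busy-poll the completion queue, no kernel entry */
    return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -1 : 0;
}
```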

RoCE vs. TCP Results
- RoCE (offload) impact: 0%
- TCP (onload) impact: -47%
- Results measured before and after applying the software patches.

InfiniBand vs. OmniPath Results
- InfiniBand (offload) impact: 0%
- OmniPath (onload) impact: -26%
- Results measured before and after applying the software patches.

NVMe-over-Fabrics Results - RDMA Guarantees the Highest NVMe-oF Performance
- RDMA-based NVMe-oF impact: 0%
- Results measured before and after applying the software patches.

Data Streaming Results - Video Streaming (64 Streams = 96 Gb/s)
[Chart: number of CPU cores needed for the same throughput; lower is better]
- Kernel streaming: 21 cores needed before applying the software patches, 31 cores needed after the patches (-44% impact)
- Mellanox interconnect streaming (VMA): less than 1 core needed, both before and after the patches (see the socket sketch after this slide)
- 30 fewer CPU cores needed for the same throughput
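For context on the VMA result: libvma is a socket-API-compatible user-space acceleration library that is typically preloaded under an unmodified application. The sketch below is an ordinary UDP streaming sender, not taken from the slides; the destination address, port, packet size and message count are illustrative placeholders. Run as-is it uses the kernel network stack; the same binary can run unchanged under VMA for kernel bypass.

```c
/* Plain BSD-socket UDP sender: one video stream would look roughly like
 * this. With the kernel stack every sendto() is a syscall; preloading
 * Mellanox's libvma serves the same socket calls from user space. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(5001) };
    inet_pton(AF_INET, "192.168.1.2", &dst.sin_addr);   /* placeholder receiver */

    char payload[1400];
    memset(payload, 0, sizeof(payload));

    for (long i = 0; i < 1000000; i++) {
        if (sendto(fd, payload, sizeof(payload), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("sendto");
            break;
        }
    }
    close(fd);
    return 0;
}
```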

Setup Information
RoCE and TCP:
- CPU: Intel Xeon E5-2697A v4 (x86_64) @ 2.60 GHz
- Operating System: Red Hat Enterprise Linux Server 7.4
- Kernel versions: 3.10.0-693.el7.x86_64 and 3.10.0-693.11.6.el7.x86_64
- Description: gen-l-vrt-149_gen-l-vrt-159, kernel 3.10.0-693.11.6.el7.x86_64, back-to-back
- Adapter: ConnectX-5, firmware 16.22.0170, driver MLNX_OFED_LINUX-4.3-0.0.5.0

InfiniBand and OmniPath:
- CPU: Intel Xeon Gold 6138 @ 2.00 GHz
- Kernel versions: 3.10.0-693.el7.x86_64 and 3.10.0-693.11.6.el7.x86_64
- Operating System: Red Hat Enterprise Linux Server 7.4
- OPA driver: IntelOPA-IFS.RHEL74-x86_64.10.6.1.0.2
- InfiniBand EDR driver: MLNX_OFED 4.2

NVMe-over-Fabrics:
- Adapter: ConnectX-5
- CPU: Intel Xeon E5-2690 v3, 64 GB RAM
- Operating System: Red Hat Enterprise Linux Server 7.4
- OFED: 4.3-0.1.0.0
- Kernel: 3.10.0-693.11.6.el7.x86_64
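When reproducing a before/after comparison like this, it helps to confirm what a given kernel reports about its mitigations. Upstream kernels since the fixes, and many distribution backports, expose this under /sys/devices/system/cpu/vulnerabilities; the small reader below is a generic sketch and was not part of the original test setup.

```c
/* Print the kernel's reported Meltdown/Spectre mitigation status.
 * If the files are missing, the kernel either predates the fixes or
 * reports mitigation state through a different interface. */
#include <stdio.h>

int main(void)
{
    const char *files[] = {
        "/sys/devices/system/cpu/vulnerabilities/meltdown",
        "/sys/devices/system/cpu/vulnerabilities/spectre_v1",
        "/sys/devices/system/cpu/vulnerabilities/spectre_v2",
    };
    char line[256];

    for (unsigned i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
        FILE *f = fopen(files[i], "r");
        if (!f) {
            printf("%s: not present\n", files[i]);
            continue;
        }
        if (fgets(line, sizeof(line), f))
            printf("%s: %s", files[i], line);   /* line already ends with '\n' */
        fclose(f);
    }
    return 0;
}
```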

Thank You