The Future of High Performance Interconnects

Size: px
Start display at page:

Download "The Future of High Performance Interconnects"

Transcription

1 The Future of High Performance Interconnects Ashrut Ambastha HPC Advisory Council Perth, Australia :: August 2017

2 When Algorithms Go Rogue 2017 Mellanox Technologies 2

3 When Algorithms Go Rogue 2017 Mellanox Technologies 3

4 InfiniBand Accelerates AI Facebook AI Supercomputer #31 on the TOP500 List NVIDIA AI Supercomputer #32 on the TOP500 List EDR InfiniBand In-Network Computing technology key for scalable Deep Learning systems RDMA accelerates Deep Learning performance by 2X, becomes de-facto solution for AI 2017 Mellanox Technologies 4

5 Mellanox Accelerates The World s Fastest Supercomputers Accelerates the #1 supercomputer InfiniBand connects 36% of the total TOP500 systems (179 systems) InfiniBand connects 60% of the HPC TOP500 systems InfiniBand accelerates 45% of the Petascale systems EDR InfiniBand solutions grow 2.5 Times in six months Connects All of 40G Ethernet systems (3 systems), connects the first 100G Ethernet system on the list InfiniBand most used HPC interconnect in first half of 2017 connects 2.5X more end-user projects versus Omni-Path InfiniBand is the Interconnect of Choice for HPC Infrastructures Enabling Machine Learning, High-Performance, Web 2.0, Cloud, Storage, Big Data Applications 2017 Mellanox Technologies 5

6 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric (Offload) Must Wait for the Data Creates Performance Bottlenecks Analyze Data as it Moves! Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale 2017 Mellanox Technologies 6

7 In-Network Computing to Enable Data-Centric Data Centers CPU GPU Adapters Mesh Switches Torus GPU CPU CPU FPGA Co-Processor SmartNIC GPU GPU Dragonfly+ CPU In-Network Computing Key for Highest Return on Investment 2017 Mellanox Technologies 7

8 In-Network Computing to Enable Data-Centric Data Centers CPU GPU RDMA CORE-Direct Tag-Matching GPUDirect NVMe over Fabrics Mesh SHARP SHIELD Torus GPU CPU Security CPU Security Programmable (FPGA) NVMe Storage Programmable (ARM) GPU GPU Dragonfly+ CPU In-Network Computing Key for Highest Return on Investment 2017 Mellanox Technologies 8

9 Making Interconnect Smart Again!!! Critical for High Performance Computing and Machine Learning Applications 2017 Mellanox Technologies 9

10 Data Centric Architecture to Overcome Latency Bottlenecks CPU-Centric (Onload) Data-Centric (Offload) Network In-Network Computing HPC / Machine Learning Communications Latencies of 30-40us HPC / Machine Learning Communications Latencies of 3-4us Intelligent Interconnect Paves the Road to Exascale Performance 2017 Mellanox Technologies 10

11 MPI profile of an application (eg. WRF) Take MPI_Reduce for example In this application for eg; 4-8 byte MPI_Reduce has a significant component SHARP can reduce this latency from ~100us to <10us for a 512 node run 2017 Mellanox Technologies 11

12 SHARP Allreduce Performance Advantages SHARP enables 75% Reduction in Latency Providing Scalable Flat Latency 2017 Mellanox Technologies 12

13 Performance of MPI with GPUDirect RDMA GPU-GPU Internode MPI Latency GPU-GPU Internode MPI Bandwidth 10x 9.3X Lower is Better Higher is Better 2.18 usec Source: Prof. DK Panda 88% Lower Latency 10X Increase in Throughput 2017 Mellanox Technologies 13

14 RDMA & GPUDirect for High Performance Machine Learning GPUDirect Accelerates communication between servers and storage Allows peer to peer transfers between GPUs Requires RDMA & RoCE ML Training Scenario Billions of TFLOPS per training run RoCE Enabled Network Years of compute-days on Xeon GPUDirect turns years to days Without GPU Direct - Same Data Copied 3x With GPUDirect (Requires RoCE) 2017 Mellanox Technologies 14

15 Accelerating TensorFlow with grpc over RDMA Completion Time (in ms) RDMA TCP Lower is better >2x Faster 1K 2K 4K 8K 16K 32K 64K 128K 256K Message Size (in bytes) Open Source Machine Learning from Google Hottest ML Software; Popular than Linux Powers 40 Different Google Products Distributed Training With grpc Framework Google s Optimized RPC for Distributed Network RDMA Acceleration over UCX Unified Communication X (UCX) - Initiative by ORNL, Mellanox, IBM, NVidia, etc. Integration With Upstream grpc/tensorflow >2X Performance Gain With RDMA 2017 Mellanox Technologies 15

16 Mellanox 100GbE RoCE Powers Baidu Self Driving Car Core Switch Leaf1 100GbE Leaf2 Leaf18 Node 4 Node 3 Node 4 Node 3 Node 4 Node 3 Node 4 Node 3 Node 4 Node 3 Node 4 Node 3 Node 2 Node 2 Node 2 Node 2 Node 2 Node 2 Node 1 Node 1 Node 1 Node 1 Node 1 Node 1 Rack1 Rack2 Rack3 Rack4 Rack37 Rack38 Server Configuration 16x Tesla K40M 2x Mellanox ConnectX-4 100GbE Adapters 38-racks of U GPU Box Node Network Configuration 18x 100GbE Mellanox Spectrum SN GbE Mellanox DAC & Optical Cables ~2X Faster Training with Paddle Paddle Machine Learning Framework 2017 Mellanox Technologies 16

17 Multi-Host Socket Direct Technology Innovative Solution to Dual-Socket Servers 2017 Mellanox Technologies 17

18 Multi-Host Socket-Direct Adapters Increase Server Return on Investment 30%-60% Better CPU Utilization 50%-80% Lower Data Latency 15%- 28% Better Data Throughout Available for All Servers (x86, Power, ARM, etc.) Highest Applications Performance, Scalability and Productivity 2017 Mellanox Technologies 18

19 Multi-Host Socket-Direct Adapters Increase Server Return on Investment 30%-60% Better CPU Utilization 50%-80% Lower Data Latency 15%- 28% Better Data Throughout Available for All Servers (x86, Power, ARM, etc.) Highest Applications Performance, Scalability and Productivity 2017 Mellanox Technologies 19

20 Highest-Performance 200Gb/s Interconnect Solutions 200Gb/s Adapter, 0.6us latency 200 million messages per second (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s) 40 HDR (200Gb/s) InfiniBand Ports 80 HDR100 InfiniBand Ports Throughput of 16Tb/s, <90ns Latency GbE, GbE, /50GbE Ports (10 / 25 / 40 / 50 / 100 / 200 GbE) Throughput of 6.4Tb/s Transceivers Active Optical and Copper Cables (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s) VCSELs, Silicon Photonics and Copper MPI, SHMEM/PGAS, UPC For Commercial and Open Source Applications Leverages Hardware Accelerations 2017 Mellanox Technologies 20

21 Not a mere CAD Drawing.. Design HDR uplinked EDR network today Move to 2x EDR nodes on same network with ConnectX6 40-port 1U HDR Platform 80-port EDR 2017 Mellanox Technologies 21

22 What We Don t Want to Achieve 2017 Mellanox Technologies 22

23 Mellanox technology options to scale for more bandwidth Mellanox is pushing the boundary on every dimension to increase the bandwidth of LinkX Line Rate (Gb/ s) x 2x 400G 100G Parallel 4 25G NRZ 4 100G WDM 400G WDM 80x 80 Wavelength Channel 4-8x 4x SMF/MMF PAM16 Physical Channel 32 Complex Modulations 2017 Mellanox Technologies 23

24 InfiniBand Delivers Best Return on Investment 30%-250% Higher Return on Investment Up to 50% Saving on Capital and Operation Expenses Highest Applications Performance, Scalability and Productivity Weather Automo=ve Chemistry Molecular Dynamics 1.3X Be(er 2X Be(er 1.4X Be(er 2.5X Be(er 2017 Mellanox Technologies Genomics 1.3X Be(er 24

25 Let s Summarize Scalable, intelligent, flexible, high performance, end-to-end connectivity Standards-based (InfiniBand, Ethernet), supported by large eco-system Supports all compute architectures: x86, Power, ARM, GPU, FPGA etc. Offloading architecture: RDMA, application acceleration engines, etc. Flexible topologies: Fat Tree, Mesh, 3D Torus, Dragonfly+, etc. Converged I/O: compute, storage, management on single fabric Backward and future compatible The Future Depends On Smart Interconnect 2017 Mellanox Technologies 25

26 More Question Time Ashrut Ambastha 2015 Mellanox Technologies - Mellanox Confidential - 26

27 Thank You

In-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017

In-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017 In-Network Computing Paving the Road to Exascale 5th Annual MVAPICH User Group (MUG) Meeting, August 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric

More information

Interconnect Your Future

Interconnect Your Future Interconnect Your Future Paving the Road to Exascale August 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric (Offload) Must Wait for the Data

More information

Building the Most Efficient Machine Learning System

Building the Most Efficient Machine Learning System Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide

More information

Building the Most Efficient Machine Learning System

Building the Most Efficient Machine Learning System Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide

More information

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017 Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, November 2017 InfiniBand Accelerates Majority of New Systems on TOP500 InfiniBand connects 77% of new HPC

More information

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio

More information

Interconnect Your Future

Interconnect Your Future Interconnect Your Future Paving the Path to Exascale November 2017 Mellanox Accelerates Leading HPC and AI Systems Summit CORAL System Sierra CORAL System Fastest Supercomputer in Japan Fastest Supercomputer

More information

Interconnect Your Future

Interconnect Your Future Interconnect Your Future Gilad Shainer 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014 Complete High-Performance Scalable Interconnect Infrastructure Comprehensive End-to-End Software Accelerators

More information

In-Network Computing. Paving the Road to Exascale. June 2017

In-Network Computing. Paving the Road to Exascale. June 2017 In-Network Computing Paving the Road to Exascale June 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect -Centric (Onload) Data-Centric (Offload) Must Wait for the Data Creates

More information

Paving the Road to Exascale

Paving the Road to Exascale Paving the Road to Exascale Gilad Shainer August 2015, MVAPICH User Group (MUG) Meeting The Ever Growing Demand for Performance Performance Terascale Petascale Exascale 1 st Roadrunner 2000 2005 2010 2015

More information

Solutions for Scalable HPC

Solutions for Scalable HPC Solutions for Scalable HPC Scot Schultz, Director HPC/Technical Computing HPC Advisory Council Stanford Conference Feb 2014 Leading Supplier of End-to-End Interconnect Solutions Comprehensive End-to-End

More information

In-Network Computing. Sebastian Kalcher, Senior System Engineer HPC. May 2017

In-Network Computing. Sebastian Kalcher, Senior System Engineer HPC. May 2017 In-Network Computing Sebastian Kalcher, Senior System Engineer HPC May 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric (Offload) Must Wait

More information

Interconnect Your Future

Interconnect Your Future Interconnect Your Future Smart Interconnect for Next Generation HPC Platforms Gilad Shainer, August 2016, 4th Annual MVAPICH User Group (MUG) Meeting Mellanox Connects the World s Fastest Supercomputer

More information

High Performance Computing

High Performance Computing High Performance Computing Dror Goldenberg, HPCAC Switzerland Conference March 2015 End-to-End Interconnect Solutions for All Platforms Highest Performance and Scalability for X86, Power, GPU, ARM and

More information

Interconnect Your Future Paving the Road to Exascale

Interconnect Your Future Paving the Road to Exascale Interconnect Your Future Paving the Road to Exascale CHPC, December 2017 90% of the World Data was Created in the Last 2 Years 2017 Mellanox Technologies 2 The Future Depends on Fastest Interconnects 1Gb/s

More information

The Future of Interconnect Technology

The Future of Interconnect Technology The Future of Interconnect Technology Michael Kagan, CTO HPC Advisory Council Stanford, 2014 Exponential Data Growth Best Interconnect Required 44X 0.8 Zetabyte 2009 35 Zetabyte 2020 2014 Mellanox Technologies

More information

Interconnect Your Future

Interconnect Your Future #OpenPOWERSummit Interconnect Your Future Scot Schultz, Director HPC / Technical Computing Mellanox Technologies OpenPOWER Summit, San Jose CA March 2015 One-Generation Lead over the Competition Mellanox

More information

Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability

Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox InfiniBand Host Channel Adapters (HCA) enable the highest data center

More information

N V M e o v e r F a b r i c s -

N V M e o v e r F a b r i c s - N V M e o v e r F a b r i c s - H i g h p e r f o r m a n c e S S D s n e t w o r k e d f o r c o m p o s a b l e i n f r a s t r u c t u r e Rob Davis, VP Storage Technology, Mellanox OCP Evolution Server

More information

The Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011

The Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011 The Road to ExaScale Advances in High-Performance Interconnect Infrastructure September 2011 diego@mellanox.com ExaScale Computing Ambitious Challenges Foster Progress Demand Research Institutes, Universities

More information

VPI / InfiniBand. Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability

VPI / InfiniBand. Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability VPI / InfiniBand Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox enables the highest data center performance with its

More information

The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations

The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations Ophir Maor HPC Advisory Council ophir@hpcadvisorycouncil.com The HPC-AI Advisory Council World-wide HPC non-profit

More information

Paving the Road to Exascale Computing. Yossi Avni

Paving the Road to Exascale Computing. Yossi Avni Paving the Road to Exascale Computing Yossi Avni HPC@mellanox.com Connectivity Solutions for Efficient Computing Enterprise HPC High-end HPC HPC Clouds ICs Mellanox Interconnect Networking Solutions Adapter

More information

VPI / InfiniBand. Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability

VPI / InfiniBand. Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability VPI / InfiniBand Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox enables the highest data center performance with its

More information

Corporate Update. Enabling The Use of Data January Mellanox Technologies

Corporate Update. Enabling The Use of Data January Mellanox Technologies Corporate Update Enabling The Use of Data January 2018 Safe Harbor Statement These slides and the accompanying oral presentation contain forward-looking statements and information. The use of words such

More information

InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment. TOP500 Supercomputers, June 2014

InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment. TOP500 Supercomputers, June 2014 InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment TOP500 Supercomputers, June 2014 TOP500 Performance Trends 38% CAGR 78% CAGR Explosive high-performance

More information

MICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE

MICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE MICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE LEVERAGE OUR EXPERTISE sales@microway.com http://microway.com/tesla NUMBERSMASHER TESLA 4-GPU SERVER/WORKSTATION Flexible form factor 4 PCI-E GPUs + 3 additional

More information

Ethernet. High-Performance Ethernet Adapter Cards

Ethernet. High-Performance Ethernet Adapter Cards High-Performance Ethernet Adapter Cards Supporting Virtualization, Overlay Networks, CPU Offloads and RDMA over Converged Ethernet (RoCE), and Enabling Data Center Efficiency and Scalability Ethernet Mellanox

More information

Benefits of 25, 40, and 50GbE Networks for Ceph and Hyper- Converged Infrastructure John F. Kim Mellanox Technologies

Benefits of 25, 40, and 50GbE Networks for Ceph and Hyper- Converged Infrastructure John F. Kim Mellanox Technologies Benefits of 25, 40, and 50GbE Networks for Ceph and Hyper- Converged Infrastructure John F. Kim Mellanox Technologies Storage Transitions Change Network Needs Software Defined Storage Flash Storage Storage

More information

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC

More information

High-Performance Training for Deep Learning and Computer Vision HPC

High-Performance Training for Deep Learning and Computer Vision HPC High-Performance Training for Deep Learning and Computer Vision HPC Panel at CVPR-ECV 18 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda

More information

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING Meeting Today s Datacenter Challenges Produced by Tabor Custom Publishing in conjunction with: 1 Introduction In this era of Big Data, today s HPC systems are faced with unprecedented growth in the complexity

More information

How to Network Flash Storage Efficiently at Hyperscale. Flash Memory Summit 2017 Santa Clara, CA 1

How to Network Flash Storage Efficiently at Hyperscale. Flash Memory Summit 2017 Santa Clara, CA 1 How to Network Flash Storage Efficiently at Hyperscale Manoj Wadekar Michael Kagan Flash Memory Summit 2017 Santa Clara, CA 1 ebay Hyper scale Infrastructure Search Front-End & Product Hadoop Object Store

More information

Accelerating Ceph with Flash and High Speed Networks

Accelerating Ceph with Flash and High Speed Networks Accelerating Ceph with Flash and High Speed Networks Dror Goldenberg VP Software Architecture Santa Clara, CA 1 The New Open Cloud Era Compute Software Defined Network Object, Block Software Defined Storage

More information

Deep Learning mit PowerAI - Ein Überblick

Deep Learning mit PowerAI - Ein Überblick Stephen Lutz Deep Learning mit PowerAI - Open Group Master Certified IT Specialist Technical Sales IBM Cognitive Infrastructure IBM Germany Ein Überblick Stephen.Lutz@de.ibm.com What s that? and what s

More information

ABySS Performance Benchmark and Profiling. May 2010

ABySS Performance Benchmark and Profiling. May 2010 ABySS Performance Benchmark and Profiling May 2010 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource - HPC

More information

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start

More information

Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures

Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures M. Bayatpour, S. Chakraborty, H. Subramoni, X. Lu, and D. K. Panda Department of Computer Science and Engineering

More information

Cray XC Scalability and the Aries Network Tony Ford

Cray XC Scalability and the Aries Network Tony Ford Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?

More information

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007 Mellanox Technologies Maximize Cluster Performance and Productivity Gilad Shainer, shainer@mellanox.com October, 27 Mellanox Technologies Hardware OEMs Servers And Blades Applications End-Users Enterprise

More information

Mapping MPI+X Applications to Multi-GPU Architectures

Mapping MPI+X Applications to Multi-GPU Architectures Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under

More information

IBM CORAL HPC System Solution

IBM CORAL HPC System Solution IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy

More information

By John Kim, Chair SNIA Ethernet Storage Forum. Several technology changes are collectively driving the need for faster networking speeds.

By John Kim, Chair SNIA Ethernet Storage Forum. Several technology changes are collectively driving the need for faster networking speeds. A quiet revolutation is taking place in networking speeds for servers and storage, one that is converting 1Gb and 10 Gb connections to 25Gb, 50Gb and 100 Gb connections in order to support faster servers,

More information

Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions

Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions Providing Superior Server and Storage Performance, Efficiency and Return on Investment As Announced and Demonstrated at

More information

Sharing High-Performance Devices Across Multiple Virtual Machines

Sharing High-Performance Devices Across Multiple Virtual Machines Sharing High-Performance Devices Across Multiple Virtual Machines Preamble What does sharing devices across multiple virtual machines in our title mean? How is it different from virtual networking / NSX,

More information

OCP3. 0. ConnectX Ethernet Adapter Cards for OCP Spec 3.0

OCP3. 0. ConnectX Ethernet Adapter Cards for OCP Spec 3.0 OCP3. 0 ConnectX Ethernet Adapter Cards for OCP Spec 3.0 High Performance 10/25/40/50/100/200 GbE Ethernet Adapter Cards in the Open Compute Project Spec 3.0 Form Factor For illustration only. Actual products

More information

Birds of a Feather Presentation

Birds of a Feather Presentation Mellanox InfiniBand QDR 4Gb/s The Fabric of Choice for High Performance Computing Gilad Shainer, shainer@mellanox.com June 28 Birds of a Feather Presentation InfiniBand Technology Leadership Industry Standard

More information

Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet

Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet WHITE PAPER Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet Contents Background... 2 The MapR Distribution... 2 Mellanox Ethernet Solution... 3 Test

More information

2008 International ANSYS Conference

2008 International ANSYS Conference 2008 International ANSYS Conference Maximizing Productivity With InfiniBand-Based Clusters Gilad Shainer Director of Technical Marketing Mellanox Technologies 2008 ANSYS, Inc. All rights reserved. 1 ANSYS,

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies Meltdown and Spectre Interconnect Evaluation Jan 2018 1 Meltdown and Spectre - Background Most modern processors perform speculative execution This speculation can be measured, disclosing information about

More information

Introduction to Infiniband

Introduction to Infiniband Introduction to Infiniband FRNOG 22, April 4 th 2014 Yael Shenhav, Sr. Director of EMEA, APAC FAE, Application Engineering The InfiniBand Architecture Industry standard defined by the InfiniBand Trade

More information

Maximizing Cluster Scalability for LS-DYNA

Maximizing Cluster Scalability for LS-DYNA Maximizing Cluster Scalability for LS-DYNA Pak Lui 1, David Cho 1, Gerald Lotto 1, Gilad Shainer 1 1 Mellanox Technologies, Inc. Sunnyvale, CA, USA 1 Abstract High performance network interconnect is an

More information

ARISTA: Improving Application Performance While Reducing Complexity

ARISTA: Improving Application Performance While Reducing Complexity ARISTA: Improving Application Performance While Reducing Complexity October 2008 1.0 Problem Statement #1... 1 1.1 Problem Statement #2... 1 1.2 Previous Options: More Servers and I/O Adapters... 1 1.3

More information

Why AI Frameworks Need (not only) RDMA?

Why AI Frameworks Need (not only) RDMA? Why AI Frameworks Need (not only) RDMA? With Design and Implementation Experience of Networking Support on TensorFlow GDR, Apache MXNet, WeChat Amber, and Tencent Angel Bairen Yi (byi@connect.ust.hk) Jingrong

More information

CUDA Kernel based Collective Reduction Operations on Large-scale GPU Clusters

CUDA Kernel based Collective Reduction Operations on Large-scale GPU Clusters CUDA Kernel based Collective Reduction Operations on Large-scale GPU Clusters Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan and Dhabaleswar K. (DK) Panda Speaker: Sourav Chakraborty

More information

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to

More information

InfiniBand Networked Flash Storage

InfiniBand Networked Flash Storage InfiniBand Networked Flash Storage Superior Performance, Efficiency and Scalability Motti Beck Director Enterprise Market Development, Mellanox Technologies Flash Memory Summit 2016 Santa Clara, CA 1 17PB

More information

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved. Ethernet Storage Fabrics Using RDMA with Fast NVMe-oF Storage to Reduce Latency and Improve Efficiency Kevin Deierling & Idan Burstein Mellanox Technologies 1 Storage Media Technology Storage Media Access

More information

Atos announces the Bull sequana X1000 the first exascale-class supercomputer. Jakub Venc

Atos announces the Bull sequana X1000 the first exascale-class supercomputer. Jakub Venc Atos announces the Bull sequana X1000 the first exascale-class supercomputer Jakub Venc The world is changing The world is changing Digital simulation will be the key contributor to overcome 21 st century

More information

EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures

EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures Haiyang Shi, Xiaoyi Lu, and Dhabaleswar K. (DK) Panda {shi.876, lu.932, panda.2}@osu.edu The Ohio State University

More information

IBM WebSphere MQ Low Latency Messaging Software Tested With Arista 10 Gigabit Ethernet Switch and Mellanox ConnectX

IBM WebSphere MQ Low Latency Messaging Software Tested With Arista 10 Gigabit Ethernet Switch and Mellanox ConnectX IBM WebSphere MQ Low Latency Messaging Software Tested With Arista 10 Gigabit Ethernet Switch and Mellanox ConnectX -2 EN with RoCE Adapter Delivers Reliable Multicast Messaging With Ultra Low Latency

More information

19. prosince 2018 CIIRC Praha. Milan Král, IBM Radek Špimr

19. prosince 2018 CIIRC Praha. Milan Král, IBM Radek Špimr 19. prosince 2018 CIIRC Praha Milan Král, IBM Radek Špimr CORAL CORAL 2 CORAL Installation at ORNL CORAL Installation at LLNL Order of Magnitude Leap in Computational Power Real, Accelerated Science ACME

More information

IBM SpectrumAI with NVIDIA Converged Infrastructure Solutions for AI workloads

IBM SpectrumAI with NVIDIA Converged Infrastructure Solutions for AI workloads IBM SpectrumAI with NVIDIA Converged Infrastructure Solutions for AI workloads The engine to power your AI data pipeline Introduction: Artificial intelligence (AI) including deep learning (DL) and machine

More information

RapidIO.org Update. Mar RapidIO.org 1

RapidIO.org Update. Mar RapidIO.org 1 RapidIO.org Update rickoco@rapidio.org Mar 2015 2015 RapidIO.org 1 Outline RapidIO Overview & Markets Data Center & HPC Communications Infrastructure Industrial Automation Military & Aerospace RapidIO.org

More information

Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL710

Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL710 COMPETITIVE BRIEF April 5 Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL7 Introduction: How to Choose a Network Interface Card... Comparison: Mellanox ConnectX

More information

The Best Ethernet Storage Fabric

The Best Ethernet Storage Fabric The Best Ethernet Storage Fabric John F. Kim & Amit Katz Santa Clara, CA August 2017 1 Storage Networking Background: From Fibre Channel to Ethernet 1997 2017 Feature Fibre Channel Ethernet Bandwidth 1

More information

Сетевые технологии для систем хранения данных

Сетевые технологии для систем хранения данных Сетевые технологии для систем хранения данных Nov, 2018 Boris Neiman Sr. System Engineer 1 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide Offices ~2,900 Employees

More information

S THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE. Presenter: Louis Capps, Solution Architect, NVIDIA,

S THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE. Presenter: Louis Capps, Solution Architect, NVIDIA, S7750 - THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE Presenter: Louis Capps, Solution Architect, NVIDIA, lcapps@nvidia.com A TALE OF ENLIGHTENMENT Basic OK List 10 for x = 1 to 3 20 print

More information

Mellanox Virtual Modular Switch

Mellanox Virtual Modular Switch WHITE PAPER July 2015 Mellanox Virtual Modular Switch Introduction...1 Considerations for Data Center Aggregation Switching...1 Virtual Modular Switch Architecture - Dual-Tier 40/56/100GbE Aggregation...2

More information

Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K.

Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K. Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K. Panda Department of Computer Science and Engineering The Ohio

More information

SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience

SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience Jithin Jose, Mingzhe Li, Xiaoyi Lu, Krishna Kandalla, Mark Arnold and Dhabaleswar K. (DK) Panda Network-Based Computing Laboratory

More information

Application Acceleration Beyond Flash Storage

Application Acceleration Beyond Flash Storage Application Acceleration Beyond Flash Storage Session 303C Mellanox Technologies Flash Memory Summit July 2014 Accelerating Applications, Step-by-Step First Steps Make compute fast Moore s Law Make storage

More information

iser as accelerator for Software Defined Storage Rahul Fiske, Subhojit Roy IBM (India)

iser as accelerator for Software Defined Storage Rahul Fiske, Subhojit Roy IBM (India) iser as accelerator for Software Defined Storage Rahul Fiske, Subhojit Roy IBM (India) Agenda Network storage virtualization Current state of Fiber Channel iscsi seeing significant adoption Emergence of

More information

InfiniBand Strengthens Leadership as The High-Speed Interconnect Of Choice

InfiniBand Strengthens Leadership as The High-Speed Interconnect Of Choice InfiniBand Strengthens Leadership as The High-Speed Interconnect Of Choice Providing the Best Return on Investment by Delivering the Highest System Efficiency and Utilization Top500 Supercomputers June

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Computing Technology LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton

More information

High Performance Interconnects: Landscape, Assessments & Rankings

High Performance Interconnects: Landscape, Assessments & Rankings High Performance Interconnects: Landscape, Assessments & Rankings Dan Olds Partner, OrionX April 12, 2017 Specialized 100G InfiniBand MPI Multi-Rack OPA Single-Rack HPI market segment 100G 40G 10G TCP/IP

More information

Gen-Z Memory-Driven Computing

Gen-Z Memory-Driven Computing Gen-Z Memory-Driven Computing Our vision for the future of computing Patrick Demichel Distinguished Technologist Explosive growth of data More Data Need answers FAST! Value of Analyzed Data 2005 0.1ZB

More information

Hardened Security in the Cloud Bob Doud, Sr. Director Marketing March, 2018

Hardened Security in the Cloud Bob Doud, Sr. Director Marketing March, 2018 Hardened Security in the Cloud Bob Doud, Sr. Director Marketing March, 2018 1 Cloud Computing is Growing at an Astounding Rate Many compelling reasons for business to move to the cloud Cost, uptime, easy-expansion,

More information

How to Boost the Performance of Your MPI and PGAS Applications with MVAPICH2 Libraries

How to Boost the Performance of Your MPI and PGAS Applications with MVAPICH2 Libraries How to Boost the Performance of Your MPI and PGAS s with MVAPICH2 Libraries A Tutorial at the MVAPICH User Group (MUG) Meeting 18 by The MVAPICH Team The Ohio State University E-mail: panda@cse.ohio-state.edu

More information

Real Application Performance and Beyond

Real Application Performance and Beyond Real Application Performance and Beyond Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400 Fax: 408-970-3403 http://www.mellanox.com Scientists, engineers and analysts

More information

Characterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures

Characterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures Characterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures Talk at Bench 2018 by Xiaoyi Lu The Ohio State University E-mail: luxi@cse.ohio-state.edu http://www.cse.ohio-state.edu/~luxi

More information

Networking at the Speed of Light

Networking at the Speed of Light Networking at the Speed of Light Dror Goldenberg VP Software Architecture MaRS Workshop April 2017 Cloud The Software Defined Data Center Resource virtualization Efficient services VM, Containers uservices

More information

IBM Power AC922 Server

IBM Power AC922 Server IBM Power AC922 Server The Best Server for Enterprise AI Highlights More accuracy - GPUs access system RAM for larger models Faster insights - significant deep learning speedups Rapid deployment - integrated

More information

LAMMPSCUDA GPU Performance. April 2011

LAMMPSCUDA GPU Performance. April 2011 LAMMPSCUDA GPU Performance April 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Dell, Intel, Mellanox Compute resource - HPC Advisory Council

More information

Future Routing Schemes in Petascale clusters

Future Routing Schemes in Petascale clusters Future Routing Schemes in Petascale clusters Gilad Shainer, Mellanox, USA Ola Torudbakken, Sun Microsystems, Norway Richard Graham, Oak Ridge National Laboratory, USA Birds of a Feather Presentation Abstract

More information

Architectures for Scalable Media Object Search

Architectures for Scalable Media Object Search Architectures for Scalable Media Object Search Dennis Sng Deputy Director & Principal Scientist NVIDIA GPU Technology Workshop 10 July 2014 ROSE LAB OVERVIEW 2 Large Database of Media Objects Next- Generation

More information

Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries

Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries Jeffrey Young, Alex Merritt, Se Hoon Shon Advisor: Sudhakar Yalamanchili 4/16/13 Sponsors: Intel, NVIDIA, NSF 2 The Problem Big

More information

Introduction to High-Speed InfiniBand Interconnect

Introduction to High-Speed InfiniBand Interconnect Introduction to High-Speed InfiniBand Interconnect 2 What is InfiniBand? Industry standard defined by the InfiniBand Trade Association Originated in 1999 InfiniBand specification defines an input/output

More information

LAMMPS-KOKKOS Performance Benchmark and Profiling. September 2015

LAMMPS-KOKKOS Performance Benchmark and Profiling. September 2015 LAMMPS-KOKKOS Performance Benchmark and Profiling September 2015 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, NVIDIA

More information

Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand

Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand Presentation at GTC 2014 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda

More information

Messaging Overview. Introduction. Gen-Z Messaging

Messaging Overview. Introduction. Gen-Z Messaging Page 1 of 6 Messaging Overview Introduction Gen-Z is a new data access technology that not only enhances memory and data storage solutions, but also provides a framework for both optimized and traditional

More information

IBM Power Advanced Compute (AC) AC922 Server

IBM Power Advanced Compute (AC) AC922 Server IBM Power Advanced Compute (AC) AC922 Server The Best Server for Enterprise AI Highlights IBM Power Systems Accelerated Compute (AC922) server is an acceleration superhighway to enterprise- class AI. A

More information

Designing High-Performance MPI Collectives in MVAPICH2 for HPC and Deep Learning

Designing High-Performance MPI Collectives in MVAPICH2 for HPC and Deep Learning 5th ANNUAL WORKSHOP 209 Designing High-Performance MPI Collectives in MVAPICH2 for HPC and Deep Learning Hari Subramoni Dhabaleswar K. (DK) Panda The Ohio State University The Ohio State University E-mail:

More information

Power Systems AC922 Overview. Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017

Power Systems AC922 Overview. Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017 Power Systems AC922 Overview Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017 IBM POWER HPC Platform Strategy High-performance computer and high-performance

More information

Dr. Jean-Laurent PHILIPPE, PhD EMEA HPC Technical Sales Specialist. With Dell Amsterdam, October 27, 2016

Dr. Jean-Laurent PHILIPPE, PhD EMEA HPC Technical Sales Specialist. With Dell Amsterdam, October 27, 2016 Dr. Jean-Laurent PHILIPPE, PhD EMEA HPC Technical Sales Specialist With Dell Amsterdam, October 27, 2016 Legal Disclaimers Intel technologies features and benefits depend on system configuration and may

More information

SNIA Developers Conference - Growth of the iscsi RDMA (iser) Ecosystem

SNIA Developers Conference - Growth of the iscsi RDMA (iser) Ecosystem SNIA Developers Conference - Growth of the iscsi RDMA (iser) Ecosystem Rob Davis Mellanox Technologies robd@mellanox.com The FASTEST Storage Protocol: iser The FASTEST Storage: Flash What it is: iscsi

More information

Building NVLink for Developers

Building NVLink for Developers Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized

More information

Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters

Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters Designing High Performance Heterogeneous Broadcast for Streaming Applications on Clusters 1 Ching-Hsiang Chu, 1 Khaled Hamidouche, 1 Hari Subramoni, 1 Akshay Venkatesh, 2 Bracy Elton and 1 Dhabaleswar

More information

ETHERNET OPTICS TODAY: 25G NRZ

ETHERNET OPTICS TODAY: 25G NRZ ETHERNET OPTICS TODAY: 25G NRZ THE STATE OF ETHERNET OPTICS PANEL Brad Smith, Director of Marketing, LinkX Interconnects, Mellanox March 23, 2016 OFC 2016 Anaheim, CA BradS@Mellanox.com 1GBASE-T CAT5 is

More information