Interconnect Your Future

Interconnect Your Future: Paving the Road to Exascale. August 2017

Exponential Data Growth: The Need for Intelligent and Faster Interconnect
CPU-Centric (Onload): must wait for the data, which creates performance bottlenecks. Data-Centric (Offload): analyze data as it moves. Faster data speeds and In-Network Computing enable higher performance and scale.

Data-Centric Architecture to Overcome Latency Bottlenecks
CPU-Centric (Onload): HPC / machine learning communication latencies of 30-40us. Data-Centric (Offload) with In-Network Computing: HPC / machine learning communication latencies of 3-4us. Intelligent interconnect paves the road to exascale performance.

In-Network Computing to Enable Data-Centric Data Centers
[Diagram: adapters, switches, SmartNICs and FPGA co-processors interconnecting CPUs and GPUs.] In-Network Computing is key to the highest return on investment.

In-Network Computing to Enable Data-Centric Data Centers
[Diagram: in-network computing technologies across the fabric: RDMA, CORE-Direct, Tag-Matching, GPUDirect, NVMe over Fabrics, SHARP, SHIELD, Security, Programmable (FPGA), Programmable (ARM) and NVMe storage.] In-Network Computing is key to the highest return on investment.

In-Network Computing Advantages with SHARP Technology: Critical for High Performance Computing and Machine Learning Applications

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
Reliable, scalable, general-purpose primitive: in-network, tree-based aggregation mechanism; large number of groups; multiple simultaneous outstanding operations.
Applicable to multiple use cases: HPC applications using MPI / SHMEM; distributed machine learning applications.
Scalable, high-performance collective offload: Barrier, Reduce, All-Reduce, Broadcast and more; Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND; integer and floating-point, 16/32/64/128 bits.
[Diagram: SHARP tree with a root, aggregation nodes and endnodes (processes running on HCAs).]
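As a concrete illustration, the sketch below is plain MPI in C (not Mellanox-specific code) showing the kind of collective SHARP can offload to the switch fabric; whether the reduction actually runs in-network depends on the MPI library and fabric configuration, not on the application code.

    /* Minimal MPI_Allreduce sketch: the kind of collective that SHARP can
     * offload to the switch tree. Illustrative only; the offload is
     * transparent to the application. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = (double)rank;   /* each rank contributes one value */
        double sum   = 0.0;

        /* Global sum across all ranks; with SHARP enabled the aggregation
         * happens on the switch tree instead of on the host CPUs. */
        MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f\n", sum);

        MPI_Finalize();
        return 0;
    }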

SHARP Allreduce Performance Advantages: SHARP enables a 75% reduction in latency, providing scalable, flat latency.

SHIELD: Self-Healing Interconnect Technology

Consider a Flow from A to B. [Diagram: a data path from Server A to Server B across the fabric.]

The Simple Case: Local Fix. [Diagram: the data path from Server A to Server B is rerouted locally around the failed link.]

The Remote Case: Using FRNs (Fault Recovery Notifications). [Diagram: an FRN propagates to an upstream switch, which reroutes the data from Server A to Server B around the fault.]

MPI Tag-Matching Hardware Engines. Mellanox In-Network Computing technology delivers the highest performance.

MPI Tag-Matching Hardware Engines. [Diagram only.]

MPI Tag-Matching Offload Advantages: 31% lower latency and 97% lower CPU utilization for MPI operations (lower is better; performance comparisons based on ConnectX-5). Mellanox In-Network Computing technology delivers the highest performance.
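For context, the sketch below is a minimal, generic MPI example (not Mellanox-specific) of the tagged point-to-point traffic whose matching these engines offload: the receiver pre-posts tagged receives and incoming messages are matched by (source, tag, communicator), work that otherwise falls on the host CPU.

    /* Minimal tag-matching sketch: run with at least two ranks.
     * The hardware engines take over the matching of arriving messages
     * against the pre-posted receives; the offload is transparent here. */
    #include <mpi.h>
    #include <stdio.h>

    #define TAG_A 100
    #define TAG_B 200

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int a = 1, b = 2;
            /* Send two messages with different tags to rank 1. */
            MPI_Send(&a, 1, MPI_INT, 1, TAG_A, MPI_COMM_WORLD);
            MPI_Send(&b, 1, MPI_INT, 1, TAG_B, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int a, b;
            MPI_Request reqs[2];
            /* Pre-posted receives: each arriving message is matched to the
             * request with the corresponding (source, tag, communicator). */
            MPI_Irecv(&b, 1, MPI_INT, 0, TAG_B, MPI_COMM_WORLD, &reqs[0]);
            MPI_Irecv(&a, 1, MPI_INT, 0, TAG_A, MPI_COMM_WORLD, &reqs[1]);
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            printf("rank 1 received a=%d b=%d\n", a, b);
        }

        MPI_Finalize();
        return 0;
    }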

Mellanox Roadmap: Future-Proof Your Data Center

Mellanox to Connect Future #1 HPC Systems (CORAL): the Summit and Sierra systems. Paving the path to exascale.

Mellanox to Connect Future #1 HPC Systems (CORAL): the Summit system. Paving the path to exascale.

Highest-Performance 100Gb/s Interconnect Solutions
Adapters: 100Gb/s adapter, 0.6us latency, 175-200 million messages per second (10 / 25 / 40 / 50 / 56 / 100Gb/s).
InfiniBand switch: 36 EDR (100Gb/s) ports, <90ns latency, 7.2Tb/s throughput, 7.02 billion msg/sec (195M msg/sec/port).
Ethernet switch: 32 100GbE ports or 64 25/50GbE ports (10 / 25 / 40 / 50 / 100GbE), 3.2Tb/s throughput.
Cables and transceivers: active optical and copper cables (10 / 25 / 40 / 50 / 56 / 100Gb/s); VCSELs, silicon photonics and copper.
Software: MPI, SHMEM/PGAS, UPC for commercial and open-source applications; leverages hardware accelerations.
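As a quick sanity check on the quoted switch figures, the numbers are consistent with the per-port line rates: the InfiniBand throughput counts both directions of every port, while the Ethernet throughput is quoted per direction.

    \[
    36 \times 100~\text{Gb/s} \times 2 = 7.2~\text{Tb/s},
    \qquad
    36 \times 195~\text{M msg/s} \approx 7.02~\text{billion msg/s},
    \qquad
    32 \times 100~\text{GbE} = 3.2~\text{Tb/s}.
    \]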

Highest-Performance 200Gb/s Interconnect Solutions
Adapters: 200Gb/s adapter, 0.6us latency, 200 million messages per second (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s).
InfiniBand switch: 40 HDR (200Gb/s) ports or 80 HDR100 ports, 16Tb/s throughput, <90ns latency.
Ethernet switch: 16 400GbE, 32 200GbE or 128 25/50GbE ports (10 / 25 / 40 / 50 / 100 / 200GbE), 6.4Tb/s throughput.
Cables and transceivers: active optical and copper cables (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s); VCSELs, silicon photonics and copper.
Software: MPI, SHMEM/PGAS, UPC for commercial and open-source applications; leverages hardware accelerations.
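The same accounting applies at 200Gb/s (InfiniBand counted in both directions, Ethernet per direction):

    \[
    40 \times 200~\text{Gb/s} \times 2 = 80 \times 100~\text{Gb/s} \times 2 = 16~\text{Tb/s},
    \qquad
    16 \times 400~\text{GbE} = 6.4~\text{Tb/s}.
    \]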

InfiniBand Delivers Best Return on Investment
30%-250% higher return on investment; up to 50% savings on capital and operating expenses; highest application performance, scalability and productivity.
Weather: 1.3X better. Automotive: 2X better. Chemistry: 1.4X better. Molecular Dynamics: 2.5X better. Genomics: 1.3X better.

Thank You