In-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017


Exponential Data Growth: The Need for Intelligent and Faster Interconnect
CPU-Centric (Onload): must wait for the data, creating performance bottlenecks.
Data-Centric (Offload): analyze data as it moves.
Faster data speeds and In-Network Computing enable higher performance and scale.

Data-Centric Architecture to Overcome Latency Bottlenecks
CPU-Centric (Onload): HPC / machine learning communication latencies of 30-40us.
Data-Centric (Offload) with In-Network Computing: HPC / machine learning communication latencies of 3-4us.
The intelligent interconnect paves the road to exascale performance.

In-Network Computing to Enable Data-Centric Data Centers
[Diagram: CPUs, GPUs, FPGA co-processors and SmartNICs connected through intelligent adapters and switches]
In-Network Computing is key for the highest return on investment.

In-Network Computing to Enable Data-Centric Data Centers
[Diagram: in-network computing engines across CPUs, GPUs, adapters and switches: RDMA, GPUDirect, SHARP, SHIELD, MPI Tag-Matching, NVMe over Fabrics, security offloads, programmable FPGA and ARM engines, NVMe storage]
In-Network Computing is key for the highest return on investment.
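These offloads are exposed to applications through the RDMA verbs API (libibverbs). As a minimal, hedged sketch not taken from the slides, the following C program simply enumerates the RDMA-capable devices on a host and prints basic attributes; device names such as mlx5_0 are typical but system-dependent.

```c
/* Minimal sketch (not from the slides): enumerate RDMA-capable devices
 * with libibverbs. Build with:  cc rdma_list.c -o rdma_list -libverbs   */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **dev_list = ibv_get_device_list(&num_devices);
    if (!dev_list) {
        perror("ibv_get_device_list");
        return 1;
    }

    for (int i = 0; i < num_devices; i++) {
        struct ibv_context *ctx = ibv_open_device(dev_list[i]);
        if (!ctx)
            continue;

        struct ibv_device_attr attr;
        if (!ibv_query_device(ctx, &attr))
            printf("%-10s ports=%u max_qp=%d\n",
                   ibv_get_device_name(dev_list[i]),
                   (unsigned)attr.phys_port_cnt, attr.max_qp);

        ibv_close_device(ctx);
    }

    ibv_free_device_list(dev_list);
    return 0;
}
```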

In-Network Computing Advantages with SHARP Technology
Critical for high performance computing and machine learning applications.

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
Reliable, scalable, general-purpose primitive: in-network tree-based aggregation mechanism, large number of groups, multiple simultaneous outstanding operations.
Applicable to multiple use cases: HPC applications using MPI / SHMEM, distributed machine learning applications.
Scalable high-performance collective offload: Barrier, Reduce, Allreduce, Broadcast and more; Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND; integer and floating-point, 16/32/64/128 bits.
[Diagram: SHARP tree with a tree root, aggregation nodes, and end nodes (processes running on HCAs)]
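SHARP offloads standard MPI collectives transparently, so application code does not change; the reduction happens in the switch tree when the MPI stack has SHARP enabled. As a minimal sketch not taken from the slides, the C program below performs the kind of MPI_Allreduce SHARP can offload; the HCOLL_ENABLE_SHARP setting mentioned in the comment follows Mellanox HPC-X conventions and may differ between releases.

```c
/* Minimal sketch (not from the slides): an MPI_Allreduce of the kind SHARP
 * can offload to the switch fabric. The application code is unchanged;
 * offload is enabled by the MPI stack, e.g. with HPC-X something like
 *   mpirun -x HCOLL_ENABLE_SHARP=3 ./allreduce
 * (variable name and value are release-dependent -- check your HPC-X docs). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes its rank id; the reduction is summed across all ranks. */
    double local = (double)rank;
    double global_sum = 0.0;
    MPI_Allreduce(&local, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %.0f (expected %.0f)\n",
               size, global_sum, (double)size * (size - 1) / 2.0);

    MPI_Finalize();
    return 0;
}
```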

SHARP Allreduce Performance Advantages
SHARP enables a 75% reduction in latency, providing scalable, flat latency.

SHIELD: Self-Healing Interconnect Technology

MPI Tag-Matching Hardware Engines
Mellanox In-Network Computing technology delivers the highest performance.

MPI Tag-Matching Hardware Engines

MPI Tag-Matching Offload Advantages
31% lower latency and 97% lower CPU utilization for MPI operations (performance comparisons based on ConnectX-5).
Mellanox In-Network Computing technology delivers the highest performance.
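Hardware tag matching moves the matching of incoming messages against posted receives (by communicator, source, and tag) from the host CPU to the HCA. As a hedged sketch not taken from the slides, the C program below posts non-blocking tagged receives and sends of the kind the offload accelerates; whether matching actually runs in hardware depends on the MPI/UCX configuration (UCX exposes transport-specific tag-matching options), not on the application code.

```c
/* Minimal sketch (not from the slides): tagged, non-blocking MPI traffic.
 * Matching arrivals against these posted receives is what ConnectX-5
 * tag-matching engines can perform in hardware; enabling the offload is
 * an MPI/UCX configuration choice, not an application change. */
#include <stdio.h>
#include <mpi.h>

#define TAG_A 100
#define TAG_B 200

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        int a = 0, b = 0;
        MPI_Request reqs[2];
        /* Pre-post receives; incoming messages are matched by tag. */
        MPI_Irecv(&a, 1, MPI_INT, 0, TAG_A, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&b, 1, MPI_INT, 0, TAG_B, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank 1 received a=%d (tag %d), b=%d (tag %d)\n",
               a, TAG_A, b, TAG_B);
    } else if (rank == 0) {
        int x = 1, y = 2;
        /* Send in the opposite order; tag matching routes each message
         * to the correct posted receive. */
        MPI_Send(&y, 1, MPI_INT, 1, TAG_B, MPI_COMM_WORLD);
        MPI_Send(&x, 1, MPI_INT, 1, TAG_A, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Run with at least two ranks, e.g. mpirun -np 2 ./tag_match.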

Multi-Host Socket Direct Technology
An innovative solution for dual-socket servers.

Multi-Host Socket-Direct Adapters Increase Server Return on Investment
30%-60% better CPU utilization, 50%-80% lower data latency, 15%-28% better data throughput.
Available for all servers (x86, Power, ARM, etc.).
Highest application performance, scalability and productivity.


InfiniBand Router

InfiniBand Architecture
Native InfiniBand connectivity between different InfiniBand subnets (each subnet can include 40K nodes).
Isolation between different InfiniBand networks (each network can be managed separately).
Native InfiniBand connectivity between different network topologies (Fat-Tree, Torus, Dragonfly, etc.).

InfiniBand Isolation Enables Great Flexibility
Enables separation and fault resilience between multiple InfiniBand subnets.
Enables sharing a common storage network across multiple compute infrastructures.
Connects the different topologies used by the different subnets.
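Each InfiniBand subnet is identified by a 64-bit subnet prefix carried in the port GIDs, which is what subnet managers and an InfiniBand router work with when isolating and connecting subnets. As a hedged sketch not taken from the slides, the C program below reads GID index 0 of port 1 on the first RDMA device and prints its subnet prefix; the device and port numbering are assumptions about the local system.

```c
/* Minimal sketch (not from the slides): read the subnet prefix of the
 * local port from GID index 0. Assumes the first RDMA device, port 1.
 * Build with:  cc subnet_prefix.c -o subnet_prefix -libverbs            */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **dev_list = ibv_get_device_list(NULL);
    if (!dev_list || !dev_list[0]) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(dev_list[0]);
    if (!ctx) {
        perror("ibv_open_device");
        return 1;
    }

    union ibv_gid gid;
    if (ibv_query_gid(ctx, 1 /* port */, 0 /* gid index */, &gid)) {
        perror("ibv_query_gid");
        return 1;
    }

    /* The first 8 bytes of the GID hold the subnet prefix (big-endian). */
    printf("%s port 1 subnet prefix: 0x", ibv_get_device_name(dev_list[0]));
    for (int i = 0; i < 8; i++)
        printf("%02x", gid.raw[i]);
    printf("\n");

    ibv_close_device(ctx);
    ibv_free_device_list(dev_list);
    return 0;
}
```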

Application Performance
Native InfiniBand router, native InfiniBand performance.

Mellanox Solutions: Future-Proof Your Data Center

Mellanox to Connect the Future #1 HPC Systems (CORAL)
Summit system and Sierra system: paving the path to exascale.

Highest-Performance 100Gb/s Interconnect Solutions
Adapters: 100Gb/s, 0.6us latency, 175-200 million messages per second (10/25/40/50/56/100Gb/s).
InfiniBand switch: 36 EDR (100Gb/s) ports, <90ns latency, 7.2Tb/s throughput, 7.02 billion msg/sec (195M msg/sec per port).
Ethernet switch: 32 100GbE ports or 64 25/50GbE ports (10/25/40/50/100GbE), 3.2Tb/s throughput.
Cables and transceivers: active optical and copper cables (10/25/40/50/56/100Gb/s); VCSELs, silicon photonics and copper.
Software: MPI, SHMEM/PGAS, UPC for commercial and open-source applications, leveraging hardware accelerations.

Highest-Performance 200Gb/s Interconnect Solutions
Adapters: 200Gb/s, 0.6us latency, 200 million messages per second (10/25/40/50/56/100/200Gb/s).
InfiniBand switch: 40 HDR (200Gb/s) ports or 80 HDR100 ports, 16Tb/s throughput, <90ns latency.
Ethernet switch: 16 400GbE, 32 200GbE, or 128 25/50GbE ports (10/25/40/50/100/200GbE), 6.4Tb/s throughput.
Cables and transceivers: active optical and copper cables (10/25/40/50/56/100/200Gb/s); VCSELs, silicon photonics and copper.
Software: MPI, SHMEM/PGAS, UPC for commercial and open-source applications, leveraging hardware accelerations.

Thank You