
Mellanox InfiniBand QDR 40Gb/s: The Fabric of Choice for High Performance Computing. Gilad Shainer, shainer@mellanox.com, June 2008. Birds of a Feather Presentation

InfiniBand Technology Leadership
- Industry standard: hardware, software, cabling, management; designed for clustering and storage interconnect
- Price and performance: 40Gb/s node-to-node, 120Gb/s switch-to-switch, 1us application latency; most aggressive roadmap in the industry
- Reliable, with congestion management
- Efficient RDMA and transport offload: kernel bypass, so the CPU focuses on application processing (see the sketch below)
- Scalable for Petascale computing and beyond
- End-to-end quality of service
- Virtualization acceleration
- I/O consolidation, including storage
[Charts: Performance Roadmap (Gigabits per second); InfiniBand Delivers Ultra Low Latency]
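
The "kernel bypass" point refers to the verbs programming model: once queues and memory regions are set up, data-path operations are issued from user space without involving the kernel. As a hedged illustration only (not taken from the deck), the minimal libibverbs sketch below opens the first RDMA device, queries its limits, and registers a buffer for RDMA access; the device choice, buffer size and build line are assumptions, and connection setup and data transfer are omitted.

```c
/* Minimal user-space verbs sketch (illustrative only, not from the
 * original deck): open an RDMA device, query its limits, and register
 * a buffer for RDMA access. Assumes libibverbs and at least one
 * RDMA-capable device are present.
 * Build (assumption): cc verbs_open.c -libverbs -o verbs_open
 */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **dev_list = ibv_get_device_list(&num);
    if (!dev_list || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(dev_list[0]);
    if (!ctx) {
        fprintf(stderr, "ibv_open_device failed\n");
        return 1;
    }

    struct ibv_device_attr attr;
    if (ibv_query_device(ctx, &attr) == 0)
        printf("device %s: max_qp=%d max_cq=%d max_mr=%d\n",
               ibv_get_device_name(dev_list[0]),
               attr.max_qp, attr.max_cq, attr.max_mr);

    /* Register a user buffer with a protection domain. Once queue pairs
     * are connected, sends/receives and RDMA reads/writes against this
     * region are posted directly from user space with no system call on
     * the data path - the "kernel bypass" the slide refers to. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    size_t len = 4096;                 /* arbitrary example size */
    void *buf = malloc(len);
    struct ibv_mr *mr = pd ? ibv_reg_mr(pd, buf, len,
                                        IBV_ACCESS_LOCAL_WRITE |
                                        IBV_ACCESS_REMOTE_READ |
                                        IBV_ACCESS_REMOTE_WRITE) : NULL;
    if (mr)
        printf("registered %zu bytes: lkey=0x%x rkey=0x%x\n",
               len, (unsigned)mr->lkey, (unsigned)mr->rkey);

    /* QP creation, connection setup and ibv_post_send() would follow in
     * a real application; omitted here. */
    if (mr) ibv_dereg_mr(mr);
    if (pd) ibv_dealloc_pd(pd);
    free(buf);
    ibv_close_device(ctx);
    ibv_free_device_list(dev_list);
    return 0;
}
```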

InfiniBand in the TOP500
[Charts: TOP500 interconnect placement (number of clusters per ranking range, 1-100 through 401-500) for InfiniBand, all proprietary high-speed interconnects and GigE; InfiniBand clusters performance (Gflops), 36% CAGR from Nov 2005 to June 2008]
- InfiniBand makes the most powerful clusters: 5 of the top 10 (#1, #4, #7, #8, #10) and 49 of the Top100
- The leading interconnect for the Top200
- InfiniBand clusters responsible for ~40% of the total TOP500 performance
- InfiniBand enables the most power-efficient clusters
- InfiniBand QDR expected Nov 2008
- No 10GigE clusters exist on the list

Mellanox InfiniBand End-to-End Products
[Diagram: software, adapter ICs & cards, switch ICs, cables and end-to-end validation across blade/rack servers, switches and storage (ADAPTER - SWITCH - ADAPTER)]
- High throughput - 40Gb/s
- Kernel bypass
- Low latency - 1us
- Remote DMA (RDMA)
- Low CPU overhead
- Reliability
- Maximum productivity

ConnectX - Fastest InfiniBand Technology
- Performance-driven architecture: MPI latency 1us, ~6.5GB/s with 40Gb/s InfiniBand (bi-directional), MPI message rate of >4 Million/sec
- Superior real application performance: engineering, automotive, oil & gas, financial analysis, etc.
[Charts: ConnectX IB MPI latency (PCIe Gen2) and ConnectX IB QDR 40Gb/s MPI bandwidth (PCIe Gen2, uni- and bi-directional) vs. message size; ~6.47GB/s bi-directional]
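
Latency and bandwidth figures of this kind are normally produced by a ping-pong microbenchmark between two ranks. The program below is a simplified, hedged sketch of such a measurement using plain MPI; the iteration count, message-size sweep and half-round-trip convention are illustrative assumptions rather than the benchmark actually used for the slide.

```c
/* Simplified MPI ping-pong latency/bandwidth sketch (illustrative; not
 * the exact benchmark behind the slide). Run with two ranks, e.g.:
 *   mpicc pingpong.c -o pingpong && mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int iters = 1000;                  /* assumed iteration count */
    const int max_bytes = 4 * 1024 * 1024;   /* sweep up to 4MB messages */
    char *buf = malloc(max_bytes);

    for (int bytes = 1; bytes <= max_bytes; bytes *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0) {
            /* Report one-way (half round-trip) latency and the implied
             * uni-directional bandwidth for this message size. */
            double lat_us = (t1 - t0) / iters / 2.0 * 1e6;
            double mb_s = bytes / (lat_us * 1e-6) / 1e6;
            printf("%8d bytes: %8.2f us  %10.1f MB/s\n", bytes, lat_us, mb_s);
        }
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```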

ConnectX Multi-core MPI Scalability
[Charts: Mellanox ConnectX MPI latency (usec) vs. number of CPU cores (number of processes), measured for 1-8 and 1-16 cores]
- Scalability to 64+ cores per node, to 2K+ nodes per subnet
- Guarantees the same low latency regardless of the number of cores
- Guarantees linear scalability for real applications

InfiniScale IV Switch: Unprecedented Scalability
- 36 40Gb/s or 12 120Gb/s InfiniBand ports
- Adaptive routing and congestion control
- Virtual subnet partitioning
- 6X switching and data capacity vs. using 24-port 10GigE Ethernet switch devices
- 4X storage I/O throughput, critical for backup, snapshot and quickly loading large datasets, vs. deploying 8Gb/s Fibre Channel SANs
- 10X lower end-to-end latency vs. using 10GigE/DCE switches and iWARP-based adapters
- 3X the server and storage node cluster scalability when building a 3-tier CLOS fabric vs. using 24-port 10GigE Ethernet switch devices (see the sizing sketch below)
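
The 3X cluster-scalability claim follows from fat-tree arithmetic: a non-blocking 3-tier CLOS built from k-port switches can attach roughly k^3/4 end nodes. The short program below is an illustrative calculation (not from the original slides) comparing 36-port and 24-port building blocks under that formula.

```c
/* Illustrative fat-tree sizing arithmetic (not from the original deck).
 * A non-blocking 3-tier CLOS/fat-tree built from k-port switches can
 * attach about k^3/4 end nodes: k pods, k/2 edge switches per pod,
 * k/2 hosts per edge switch.
 */
#include <stdio.h>

static long max_nodes_3tier(long k)
{
    return k * k * k / 4;
}

int main(void)
{
    long ib  = max_nodes_3tier(36);  /* 36-port InfiniScale IV building block */
    long eth = max_nodes_3tier(24);  /* 24-port Ethernet switch building block */
    printf("36-port 3-tier fat-tree: %ld nodes\n", ib);     /* 11664 */
    printf("24-port 3-tier fat-tree: %ld nodes\n", eth);    /*  3456 */
    printf("scalability ratio: %.1fx\n", (double)ib / eth); /* ~3.4x */
    return 0;
}
```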

Addressing the Needs for Petascale Computing
- Faster network streaming propagation (network speed capabilities). Solution: InfiniBand QDR
- Large clusters, scaling to many nodes and many cores per node. Solution: high-density InfiniBand switch
- Balanced random network streaming (one-to-one random streaming). Solution: adaptive routing
- Balanced known network streaming (one-to-one known streaming). Solution: static routing
- Un-balanced network streaming (many-to-one streaming). Solution: congestion control
- Designed to handle all communications in hardware

HPC Applications Demand Highest Throughput
[Charts: LS-DYNA profiling and Fluent message-size profiling - total data sent/transferred (Bytes) per MPI message size bin, from [0..64] up to [4194305..infinity], for 16 vs. 32 cores and for 2 vs. 7 servers; annotations mark "the need for bandwidth" and "the need for latency"]
Scalability mandates highest bandwidth and lowest latency.
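
Message-size profiles like these are typically gathered by interposing on MPI calls through the standard PMPI profiling interface. The sketch below is a hedged illustration of the idea for MPI_Send only: it bins bytes per power-of-two message size and prints a histogram at MPI_Finalize. The bin layout, the rank-0-only report and the choice to wrap only MPI_Send are simplifying assumptions, not how the slide's data was collected.

```c
/* Hedged sketch: histogram MPI_Send message sizes via the PMPI
 * profiling interface (compile and link this file with the application;
 * the MPI library's own entry points remain reachable as PMPI_*).
 */
#include <mpi.h>
#include <stdio.h>

#define NBINS 32
static long long bin_count[NBINS];
static long long bin_bytes[NBINS];

static int size_bin(long long bytes)
{
    int b = 0;
    while ((1LL << b) < bytes && b < NBINS - 1)
        b++;
    return b;                 /* bin b holds messages of up to 2^b bytes */
}

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int type_size;
    PMPI_Type_size(datatype, &type_size);
    int b = size_bin((long long)count * type_size);
    bin_count[b]++;
    bin_bytes[b] += (long long)count * type_size;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        printf("MPI_Send message-size histogram (rank 0 only):\n");
        for (int b = 0; b < NBINS; b++)
            if (bin_count[b])
                printf("  <= %10lld bytes: %lld msgs, %lld bytes total\n",
                       1LL << b, bin_count[b], bin_bytes[b]);
    }
    return PMPI_Finalize();
}
```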

HPC Advisory Council
- Distinguished HPC alliance (OEMs, IHVs, ISVs, end-users)
- Member activities: qualify and optimize HPC solutions; early access to new technology and mutual development of future solutions; explore new opportunities within the HPC market; HPC-targeted joint marketing programs; a community-effort support center for HPC end-users
- Mellanox Cluster Center: latest InfiniBand and HPC Advisory Council member technology; development, testing, benchmarking and optimization environment; end-user support center - HPCHelp@mellanox.com
- For details: HPC@mellanox.com

Providing advanced, powerful, and stable high performance computing solutions.