Paving the Road to Exascale Computing. Yossi Avni


Transcription:

Paving the Road to Exascale Computing. Yossi Avni, HPC@mellanox.com

Connectivity Solutions for Efficient Computing
Markets: Enterprise HPC, high-end HPC, HPC clouds
Portfolio: ICs, adapter cards, host/fabric software, switches/gateways, cables
Mellanox interconnect networking solutions: the leading connectivity solution provider for servers and storage

Complete End-to-End Connectivity
Host/fabric software and management:
- UFM, FabricIT
- Integration with job schedulers
- Inbox drivers
Application accelerations:
- Collectives accelerations (FCA/CORE-Direct)
- GPU accelerations (GPUDirect)
- MPI/SHMEM
- RDMA
- Quality of Service
Networking efficiency/scalability:
- Adaptive routing
- Congestion management
- Traffic-aware routing (TARA)
Server and storage high-speed connectivity:
- Latency
- Bandwidth
- CPU utilization
- Message rate

Mellanox's Interconnect Leadership
Highest performance:
- Highest throughput, lowest latency
- CPU availability, message rate
- RDMA
End-to-end quality, from silicon to system:
- Auto-negotiation, power management
- Signal integrity, cable reach
Advanced HPC, complete eco-system:
- GPU acceleration, adaptive routing
- Congestion control, MPI/SHMEM offloads
- Topologies/routing

InfiniBand Link Speed Roadmap
Per-link bandwidth per direction (Gb/s), by lanes per direction and per-lane signaling rate:

Lanes per direction   5G-IB DDR   10G-IB QDR   14G-IB FDR (14.0625)   26G-IB EDR (25.78125)
12                    60+60       120+120      168+168                300+300
8                     40+40       80+80        112+112                200+200
4                     20+20       40+40        56+56                  100+100
1                     5+5         10+10        14+14                  25+25

[Roadmap chart: per-direction link bandwidth versus year (2005 through 2014 and beyond), tracking market demand, for x1, x4, x8, and x12 links from 20G-IB-DDR and 40G/80G/120G-IB-QDR through 56G/112G/168G-IB-FDR and 100G/200G/300G-IB-EDR, with future HDR and NDR generations.]
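As a rough aside (not from the slides): the per-link figures in the table are simply lane count times per-lane signaling rate, while the usable data rate is somewhat lower because of line encoding. A back-of-the-envelope sketch in C, assuming the standard encodings (8b/10b for DDR/QDR, 64b/66b for FDR/EDR):

/* Back-of-the-envelope InfiniBand link throughput per direction.
 * Assumes standard encodings: 8b/10b for DDR/QDR, 64b/66b for FDR/EDR. */
#include <stdio.h>

int main(void) {
    struct { const char *name; double lane_gbps; double efficiency; } gen[] = {
        { "DDR",  5.0,       8.0 / 10.0 },  /* 8b/10b encoding  */
        { "QDR", 10.0,       8.0 / 10.0 },  /* 8b/10b encoding  */
        { "FDR", 14.0625,   64.0 / 66.0 },  /* 64b/66b encoding */
        { "EDR", 25.78125,  64.0 / 66.0 },  /* 64b/66b encoding */
    };
    int lanes[] = { 1, 4, 8, 12 };

    for (unsigned g = 0; g < sizeof gen / sizeof gen[0]; g++)
        for (unsigned l = 0; l < sizeof lanes / sizeof lanes[0]; l++)
            printf("%s x%-2d: %7.2f Gb/s signaling, %7.2f Gb/s data per direction\n",
                   gen[g].name, lanes[l],
                   gen[g].lane_gbps * lanes[l],
                   gen[g].lane_gbps * lanes[l] * gen[g].efficiency);
    return 0;
}

For example, a 4x FDR link signals at 4 x 14.0625 = 56.25 Gb/s per direction (the "56" in the table) and carries roughly 54.5 Gb/s of data after 64b/66b encoding.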

Next-Generation InfiniBand Technology
Available in 2011, end to end: adapters, switches, and cables
The highest-throughput connectivity for servers and storage

Scalable MPI Collectives Acceleration with FCA
- Node offloading/acceleration management (FCA)
- Offloading at the HCA (CORE-Direct)
- Offloading at the network/switches (icpu)
~20% performance increase at 16 nodes
The most scalable offloading for MPI applications
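FCA and CORE-Direct accelerate standard MPI collectives beneath the MPI interface, so application code does not change. A minimal sketch of the kind of collective they target, in plain MPI (the values are illustrative and nothing here is specific to Mellanox's implementation):

/* Minimal MPI collective: the kind of operation FCA/CORE-Direct offload.
 * The application code is unchanged; any acceleration happens in the MPI
 * library, the HCA, and the switches underneath this call. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank;   /* per-rank contribution (illustrative) */
    double global = 0.0;

    /* Global sum across all ranks: a latency-sensitive collective that
     * offload engines aim to keep off the host CPUs. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over ranks = %f\n", global);

    MPI_Finalize();
    return 0;
}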

Mellanox Message Rate Performance Results
- Highest MPI message rate: 90 million messages per second
- Highest InfiniBand message rate: 23 million messages per second
(PPN = processes per node, or cores per node)
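Message-rate figures like these are normally measured by keeping a window of small non-blocking sends in flight and counting completed messages per second. A simplified sketch of such a loop between two ranks, purely illustrative (the published numbers come from tuned benchmarks such as osu_mbw_mr, not from this code):

/* Rough message-rate micro-benchmark sketch between rank 0 and rank 1.
 * Simplified illustration only. Run with at least two ranks. */
#include <mpi.h>
#include <stdio.h>

#define WINDOW   64      /* small messages kept in flight per iteration */
#define ITERS    10000
#define MSG_SIZE 8       /* bytes; message rate is quoted for small messages */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char sbuf[WINDOW][MSG_SIZE] = {{0}}, rbuf[WINDOW][MSG_SIZE];
    MPI_Request req[WINDOW];

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {                 /* sender */
            for (int w = 0; w < WINDOW; w++)
                MPI_Isend(sbuf[w], MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(rbuf[0], MSG_SIZE, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {          /* receiver, then a short ack */
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(rbuf[w], MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(sbuf[0], MSG_SIZE, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
        }
    }

    double elapsed = MPI_Wtime() - t0;
    if (rank == 0)
        printf("%.2f million messages per second\n",
               (double)ITERS * WINDOW / elapsed / 1e6);

    MPI_Finalize();
    return 0;
}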

Network Utilization via Traffic-Aware Routing
- Job submitted in the scheduler
- Jobs matched automatically
- Application-level monitoring and optimization measurements
- Fabric-wide policy pushed to match application requirements
Maximizing network utilization

Hardware-Based Congestion Control

HPCC network benchmark    Improvement
Ping-pong latency         88%
Natural ring latency      81.6%
Random ring latency       81.3%
Ping-pong bandwidth       85.5%

HPCC application kernel   Improvement
PTRANS                    76%
FFT                       40%

For more performance examples, see "First Experiences with Congestion Control in InfiniBand Hardware"; Ernst Gunnar Gran, Magne Eimot, Sven-Arne Reinemo, Tor Skeie, Olav Lysne, Lars Paul Huse, Gilad Shainer; IPDPS 2010.
A congestion-free network for highest efficiency

Highest-Performance GPU Clusters with GPUDirect
- GPU computing mandates Mellanox solutions
- GPUDirect: 35% application performance increase (3-node comparison)
Mellanox InfiniBand accelerates GPU communications
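GPUDirect shortens the path between GPU memory and the InfiniBand adapter so that MPI transfers of GPU-resident data avoid redundant host-side copies. Purely as an illustration of the data path it accelerates, a minimal sketch assuming a CUDA-aware MPI build (the slide shows no code; the buffer and size below are hypothetical):

/* Sketch of the GPU-to-GPU exchange that GPUDirect accelerates.
 * Assumes a CUDA-aware MPI build; illustrative only. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t n = 1 << 20;           /* 1M floats (hypothetical size) */
    float *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(float));
    cudaMemset(d_buf, 0, n * sizeof(float));

    /* Pass the device pointer straight to MPI; a CUDA-aware MPI with
     * GPUDirect moves the data without extra staging copies. */
    if (rank == 0)
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

Without such acceleration, the same exchange would need explicit cudaMemcpy staging through host buffers on both sides before and after the MPI calls.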

Superior InfiniBand Solutions
Target markets: universities, academic labs and research; cloud and Web 2.0; financial services; clustered databases; computer-aided engineering; bioscience; oil and gas; weather; digital media
Mellanox connectivity solutions:
- Performance: 45% lower latency, highest throughput, and 3x the message rate
- Scalability: proven for petascale computing, highest scalability through accelerations
- Reliability: from silicon to system, highest signal integrity, two orders of magnitude lower BER
- Efficiency: highest CPU/GPU availability through complete offloading, low power consumption
- Certification: complete ISV support and qualification, MPI vendors, job schedulers
- Return on investment: most cost-effective, simple to manage, 40 Gb/s end-to-end connectivity

Bottom Line: Mellanox Benefits for HPC
Segments: high-end HPC, enterprise HPC, HPC clouds, entry-level HPC
- Performance: 100+% increase
- TCO: 50+% reduction
- Energy costs: 65+% reduction
- Infrastructure: 60+% saving
Complete high-performance, scalable interconnect solutions for servers and storage

Performance Leadership Across Industries
- 30%+ of the Fortune 100 and of the top global high-performance computers
- 6 of the top 10 global banks
- 9 of the top 10 automotive manufacturers
- 4 of the top 10 pharmaceutical companies
- 7 of the top 10 oil and gas companies

Thinking, Designing and Building Scalable HPC
Thank you. HPC@mellanox.com