Design challenges of High-Performance and Scalable MPI over InfiniBand. Presented by Karthik
- Wilfrid Woods
Transcription
1 Design challenges of High-Performance and Scalable MPI over InfiniBand Presented by Karthik
2 Presentation Overview - In-depth analysis of High-Performance and Scalable MPI with Reduced Memory Usage - Zero-Copy protocol using Unreliable Datagram - MVAPICH-Aptus: A Scalable High-Performance Multi-Transport MPI over InfiniBand
3 High Performance and Scalable MPI with Reduced Memory usage Motivation Does aggressively reducing communication buffer memory lead to degradation of end application performance? How much memory can we expect the MPI library to consume during execution of a typical application, while still providing the best available performance?
4 High Performance and Scalable MPI with Reduced Memory usage IB provides several types of transport services Reliable Connection (RC) - Used as the primary transport for MVAPICH and other MPIs over InfiniBand. - Most feature-rich -- supports RDMA and provides reliable service. - Dedicated QP must be created for each communicating peer. Reliable Datagram (RD) - Most of the same features as RC, however, a dedicated QP is not required. - Not implemented with current hardware. Unreliable Connection (UC) - Provides RDMA capability. - No guarantees on ordering or reliability. - Dedicated QP must be created for each communicating peer. Unreliable Datagram (UD) - Connection-less. Single QP can communicate with any other peer QP. - Limited message size. - No guarantees on ordering or reliability.
5 High Performance and Scalable MPI with Reduced Memory usage Upper level software service Shared Receive Queue - This allows multiple QPs to be attached to one receive queue (even for connection oriented transport) - This approach is memory efficient
6 High Performance and Scalable MPI with Reduced Memory usage Remote Direct Memory Access (RDMA) - An application can directly access the memory of the remote process. - RDMA has very low latency.
7 High Performance and Scalable MPI with Reduced Memory usage MVAPICH Design Overview MVAPICH uses two major protocols 1. Eager Protocol - It is used to transfer small messages. - The messages are buffered inside the MPI library. - Pre-allocated communication buffers are required on the sender and receiver side. 2. Rendezvous Protocol - It is used to transfer large messages. - The messages are sent directly to the receiver's user memory.
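The split between the eager and rendezvous protocols can be sketched as a simple size check. This is a minimal illustration, not MVAPICH code: the 8 KB threshold and the name `choose_protocol` are assumptions made for the example.

```python
EAGER_THRESHOLD = 8 * 1024  # bytes; assumed value for illustration only


def choose_protocol(message_size: int, threshold: int = EAGER_THRESHOLD) -> str:
    """Return the protocol a message of the given size would use."""
    if message_size < 0:
        raise ValueError("message size must be non-negative")
    # Small messages are copied through pre-allocated library buffers.
    if message_size <= threshold:
        return "eager"
    # Large messages are delivered directly into the receiver's user memory.
    return "rendezvous"


if __name__ == "__main__":
    for size in (64, 8 * 1024, 1024 * 1024):
        print(size, choose_protocol(size))
```

The threshold is the tuning knob: raising it sends more messages through the eager path, which in turn needs more pre-allocated buffer memory.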
8 High Performance and Scalable MPI with Reduced Memory usage 1. Adaptive RDMA with Send/Receive - In order to avoid a memory-scalability problem when the number of nodes increases, this channel is adaptive. - Limited buffers are allocated initially. - Once a threshold number of messages has been exchanged, subsequent messages are transferred using RDMA.
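The adaptive behaviour described on this slide can be sketched as a per-peer message counter. The class name `AdaptiveChannel`, the default threshold of 16, and the path labels are hypothetical; the sketch only shows the idea of starting on send/receive and switching a peer to RDMA once it has exchanged enough messages.

```python
from collections import defaultdict


class AdaptiveChannel:
    """Tracks messages per peer and switches busy peers to RDMA."""

    def __init__(self, rdma_threshold: int = 16):
        self.rdma_threshold = rdma_threshold  # assumed value
        self.sent = defaultdict(int)          # messages sent so far, per peer

    def path_for(self, peer: int) -> str:
        """Return the path for the next message to `peer` and count it."""
        if self.sent[peer] >= self.rdma_threshold:
            path = "rdma"        # frequent peer: dedicated RDMA buffers
        else:
            path = "send_recv"   # infrequent peer: limited initial buffers
        self.sent[peer] += 1
        return path


if __name__ == "__main__":
    channel = AdaptiveChannel(rdma_threshold=2)
    print([channel.path_for(0) for _ in range(4)])
```

Only peers that actually communicate often pay the RDMA buffer cost, which is what keeps memory use from growing with the total number of processes.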
9 High Performance and Scalable MPI with Reduced Memory usage 2. Adaptive RDMA with SRQ Channel - The idea is based on ARDMA-SR. The only difference is that the Shared Receive Queue (SRQ) is used. - Drawback: The sender doesn't know the receiver's buffer availability. - Solution: Setting a low-watermark for the SRQ.
10 High Performance and Scalable MPI with Reduced Memory usage 3. Shared Receive Queue - This channel exclusively utilizes the SRQ feature. - This follows the same low-watermark technique as ARDMA-SRQ. - Even though the RDMA channels have low latency, they consume more memory.
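The low-watermark technique used by the SRQ-based channels can be sketched with a small simulation. This is not the verbs API: `SharedReceiveQueue`, its capacity, and its watermark are hypothetical values chosen to show that the receiver refills the shared queue when the number of posted buffers drops below the watermark, since the sender cannot see buffer availability.

```python
class SharedReceiveQueue:
    """Simulated shared receive queue with low-watermark refill."""

    def __init__(self, capacity: int = 64, low_watermark: int = 16):
        if not 0 <= low_watermark <= capacity:
            raise ValueError("low_watermark must be between 0 and capacity")
        self.capacity = capacity
        self.low_watermark = low_watermark
        self.posted = capacity   # receive buffers currently posted
        self.refills = 0         # how many times the queue was refilled

    def on_receive(self) -> bool:
        """Consume one buffer for an incoming message; refill if low."""
        if self.posted == 0:
            return False         # no buffer posted: message cannot land
        self.posted -= 1
        if self.posted < self.low_watermark:
            # Watermark crossed: post buffers back up to full capacity.
            self.posted = self.capacity
            self.refills += 1
        return True


if __name__ == "__main__":
    srq = SharedReceiveQueue(capacity=4, low_watermark=2)
    for _ in range(3):
        srq.on_receive()
    print(srq.posted, srq.refills)
```

Because all peers share one queue, the buffer count is sized by the watermark and capacity rather than by the number of connections.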
11 High Performance and Scalable MPI with Reduced Memory usage NAS Benchmark
12 High Performance and Scalable MPI with Reduced Memory usage High Performance Linpack - Benchmark that solves a system of linear equations. - It is used as the primary measure for ranking the twice-yearly TOP500 list of the world's fastest supercomputers.
13 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram
14 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Motivation 1. Performance Scalability - Memory copies are detrimental to the overall performance of the application. - HCA cache can only hold a limited number of QPs 2. Resource Scalability - With a connection oriented transport the memory requirements increase linearly with the number of connected processes.
15 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Traditional Zero-Copy 1. Matched Queues Interface - The receiver deciphers the message tag from the sent message and matches it with the posted receive operations. 2. Rendezvous Protocol using RDMA - Initially a handshake protocol is used, followed by RDMA.
16 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram UD vs RC memory usage For 16k connections UD = 40 MB / process RC = 240 MB / process
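The two figures on this slide can be turned into a quick back-of-the-envelope comparison. The sketch assumes "16k" means 16384 connections and "MB" means 2**20 bytes, and it attributes all of the RC memory to connections, so the per-connection number is an upper-bound average rather than a measured value.

```python
CONNECTIONS = 16 * 1024   # "16k" read as 16384 (assumption)
MIB = 1024 * 1024         # "MB" read as 2**20 bytes (assumption)


def memory_summary(connections: int, ud_bytes: int, rc_bytes: int) -> dict:
    """Compare per-process memory of UD and RC at a given connection count."""
    return {
        # How many times more memory RC needs than UD.
        "rc_over_ud": rc_bytes / ud_bytes,
        # Average RC memory per connection, in KiB (upper bound).
        "rc_kib_per_connection": rc_bytes / connections / 1024,
        # Memory saved per process by using UD, in MiB.
        "saved_mib": (rc_bytes - ud_bytes) / MIB,
    }


if __name__ == "__main__":
    print(memory_summary(CONNECTIONS, 40 * MIB, 240 * MIB))
```

Under these assumptions RC uses six times the memory of UD, about 15 KiB per connection on average, and UD saves 200 MiB per process.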
17 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Challenges for true zero copy design Limited MTU Size - UD transport has a Maximum Transfer Unit (MTU) limit of 2KB. - Segmentation required. Lack of dedicated Receive Buffers - Difficult to post receive buffers for a particular peer as they are all shared. - If no buffer is posted to a QP, the message sent is silently dropped. Lack of Reliability - There is no guarantee that a message will arrive at the receiver. Lack of ordering - Messages may not arrive in the same order in which they were sent. Lack of RDMA - RDMA only works for connection oriented transport.
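The segmentation requirement follows directly from the 2KB MTU: a message of n bytes needs ceil(n / 2048) packets. The helper below is a sketch with a hypothetical name, `segment`, and it ignores any per-packet header the MPI library may add, which would reduce the usable payload.

```python
UD_MTU = 2048  # bytes; the 2KB UD limit stated on the slide


def segment(message_len: int, mtu: int = UD_MTU) -> list:
    """Split a message into (offset, length) pairs no larger than the MTU."""
    if message_len < 0 or mtu <= 0:
        raise ValueError("message_len must be >= 0 and mtu must be > 0")
    return [
        (offset, min(mtu, message_len - offset))
        for offset in range(0, message_len, mtu)
    ]


if __name__ == "__main__":
    print(segment(5000))
    print(len(segment(1024 * 1024)))
```

A 5000-byte message becomes three packets (2048, 2048, and 904 bytes), and a 1 MiB message becomes 512 packets, each of which can be lost or reordered independently.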
18 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Proposed Design - The design is based on serialized communication, since RDMA is not specified for UD transport. - Serialized implies that the order of transfer is agreed beforehand, and only one sender transmits to a QP at a time.
19 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Solutions to design challenges 1. Efficient Segmentation - The design chooses to get a completion signal only for the last packet. - The underlying reliability layer would mark packets as missing at the receiver's end and the sender is notified. 2. Zero Copy Pool - A pool of QPs is maintained. - When a message transfer is initiated, a QP is taken from the pool and the application receive buffer is posted to it. 3. Optimized Reliability and Ordering for Large Messages - One approach is to perform a checksum for the entire receive buffer. - Each operation can specify a 32-bit immediate field that will be available to the receiver as part of the completion entry.
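One way the 32-bit immediate field could be used for reliability and ordering is to carry a message id and a packet sequence number in it. The slide does not give the actual layout, so the 16/16 bit split and the function names below are assumptions for illustration; with this split a message is limited to 65536 segments before the sequence number wraps.

```python
SEQ_BITS = 16                     # assumed split: 16-bit id, 16-bit sequence
SEQ_MASK = (1 << SEQ_BITS) - 1
ID_MAX = (1 << (32 - SEQ_BITS)) - 1


def pack_imm(msg_id: int, seq: int) -> int:
    """Pack a message id and packet sequence number into 32 bits."""
    if not (0 <= msg_id <= ID_MAX and 0 <= seq <= SEQ_MASK):
        raise ValueError("msg_id or seq does not fit in its field")
    return (msg_id << SEQ_BITS) | seq


def unpack_imm(imm: int) -> tuple:
    """Recover (msg_id, seq) from a 32-bit immediate value."""
    return imm >> SEQ_BITS, imm & SEQ_MASK


def missing_segments(received_imms, msg_id: int, total: int) -> list:
    """List sequence numbers of `msg_id` that never arrived."""
    seen = {seq for mid, seq in map(unpack_imm, received_imms) if mid == msg_id}
    return [seq for seq in range(total) if seq not in seen]


if __name__ == "__main__":
    arrived = [pack_imm(7, s) for s in (0, 1, 3)]
    print(missing_segments(arrived, msg_id=7, total=4))
```

With this encoding the receiver can check every completion entry for gaps and ask the sender to retransmit only the missing segments, instead of checksumming the whole buffer.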
20 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Experimental Evaluation Ping Pong Latency
21 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Uni-Directional Bandwidth
22 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Bi-Directional Bandwidth
23 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand
24 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand Motivation This paper seeks to address two main questions: 1. What are the different protocols developed for MPI over IB? How well do they perform at scale? 2. Given this knowledge, can the MPI library be designed to dynamically select protocols to optimize for performance and scalability?
25 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand IB provides several types of transport services Reliable Connection (RC) - Used as the primary transport for MVAPICH and other MPIs over InfiniBand. - Most feature-rich -- supports RDMA and provides reliable service. - Dedicated QP must be created for each communicating peer. Reliable Datagram (RD) - Most of the same features as RC, however, a dedicated QP is not required. - Not implemented with current hardware. Unreliable Connection (UC) - Provides RDMA capability. - No guarantees on ordering or reliability. - Dedicated QP must be created for each communicating peer. Unreliable Datagram (UD) - Connection-less. Single QP can communicate with any other peer QP. - Limited message size. - No guarantees on ordering or reliability.
26 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand Eager Protocol Channel Message Channel
27 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand Rendezvous Protocol Channel Message Channel
28 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand Performance : Eager Latency Channel Evaluation
29 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand Channel Evaluation Performance : Uni-Directional Bandwidth
30 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand Scalability Test : Memory Usage Channel Evaluation
31 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand Scalability Test : Latency Channel Evaluation
32 MVAPICH-Aptus : Scalable High-Performance Multi-Transport MPI over InfiniBand Channel Characteristics Summary
33 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Overview of Design As seen from the experimental results, using only one channel is not sufficient to achieve performance and scalability. The solution is to use a combination of message channels and transports to optimize for performance as well as scalability. Design Challenges 1. When should a channel be created? 2. When should a channel be used?
34 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Channel Allocation
35 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Channel Usage From the experimental results we can see that the channels behave differently for different message sizes. A flexible form is defined for sending a message. Using this flexible framework, send rules can be changed at a per-system or per-job level to meet application needs without changing the code within the MPI library.
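A send-rule framework of this kind can be sketched as a table that maps message-size limits to channels and is read from configuration rather than compiled into the library. The rule syntax ("max_size:channel", with "*" for no limit) and the channel labels below are hypothetical and are not the format used by MVAPICH-Aptus.

```python
def parse_rules(spec: str) -> list:
    """Parse 'max_size:channel, ...' into (limit, channel) pairs."""
    rules = []
    for item in spec.split(","):
        limit, channel = item.strip().split(":")
        # '*' means the rule has no upper size bound.
        rules.append((float("inf") if limit == "*" else int(limit), channel))
    return rules


def select_channel(rules: list, message_size: int) -> str:
    """Return the channel of the first rule whose limit fits the message."""
    for limit, channel in rules:
        if message_size <= limit:
            return channel
    raise LookupError("no send rule matches a message of this size")


if __name__ == "__main__":
    # Hypothetical per-job rule string, e.g. read from an environment variable.
    rules = parse_rules("2048:ud_eager, 16384:rc_eager, *:rc_rendezvous")
    for size in (100, 4096, 1024 * 1024):
        print(size, select_channel(rules, size))
```

Changing the rule string changes which channel each message size uses, without touching the selection code.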
36 Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram Performance Evaluation
37 QUESTIONS?
Question 1 (6 points) Compare circuit-switching and packet-switching networks based on the following criteria: (a) Reserving network resources ahead of data being sent: (2pts) In circuit-switching networks,
More informationAdvancing RDMA. A proposal for RDMA on Enhanced Ethernet. Paul Grun SystemFabricWorks
Advancing RDMA A proposal for RDMA on Enhanced Ethernet Paul Grun SystemFabricWorks pgrun@systemfabricworks.com Objective: Accelerate the adoption of RDMA technology Why bother? I mean, who cares about
More informationMicro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects
Micro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects Jiuxing Liu Balasubramanian Chandrasekaran Weikuan Yu Jiesheng Wu Darius Buntinas Sushmitha Kini Peter Wyckoff Dhabaleswar
More informationUNIT IV -- TRANSPORT LAYER
UNIT IV -- TRANSPORT LAYER TABLE OF CONTENTS 4.1. Transport layer. 02 4.2. Reliable delivery service. 03 4.3. Congestion control. 05 4.4. Connection establishment.. 07 4.5. Flow control 09 4.6. Transmission
More informationHPC Customer Requirements for OpenFabrics Software
HPC Customer Requirements for OpenFabrics Software Matt Leininger, Ph.D. Sandia National Laboratories Scalable Computing R&D Livermore, CA 16 November 2006 I'll focus on software requirements (well maybe)
More informationDesigning Multi-Leader-Based Allgather Algorithms for Multi-Core Clusters *
Designing Multi-Leader-Based Allgather Algorithms for Multi-Core Clusters * Krishna Kandalla, Hari Subramoni, Gopal Santhanaraman, Matthew Koop and Dhabaleswar K. Panda Department of Computer Science and
More informationAccelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures
Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures M. Bayatpour, S. Chakraborty, H. Subramoni, X. Lu, and D. K. Panda Department of Computer Science and Engineering
More informationDesign Alternatives for Implementing Fence Synchronization in MPI-2 One-Sided Communication for InfiniBand Clusters
Design Alternatives for Implementing Fence Synchronization in MPI-2 One-Sided Communication for InfiniBand Clusters G.Santhanaraman, T. Gangadharappa, S.Narravula, A.Mamidala and D.K.Panda Presented by:
More informationExperimental Analysis of InfiniBand Transport Services on WAN
International Conference on Networking, Architecture, and Storage Experimental Analysis of InfiniBand Transport Services on WAN Abstract InfiniBand Architecture (IBA) has emerged as a standard system-area
More informationEnhancing Checkpoint Performance with Staging IO & SSD
Enhancing Checkpoint Performance with Staging IO & SSD Xiangyong Ouyang Sonya Marcarelli Dhabaleswar K. Panda Department of Computer Science & Engineering The Ohio State University Outline Motivation and
More informationProgress Report on Transparent Checkpointing for Supercomputing
Progress Report on Transparent Checkpointing for Supercomputing Jiajun Cao, Rohan Garg College of Computer and Information Science, Northeastern University {jiajun,rohgarg}@ccs.neu.edu August 21, 2015
More informationPerformance of HPC Middleware over InfiniBand WAN
Performance of HPC Middleware over InfiniBand WAN SUNDEEP NARRAVULA, HARI SUBRAMONI, PING LAI, BHARGAVI RAJARAMAN, RANJIT NORONHA, DHABALESWAR K. PANDA Technical Report OSU-CISRC-12/7-TR77. Performance
More informationCRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart
CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Hao Wang, Jian Huang, Dhabaleswar K. Panda Department of Computer
More informationPerformance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA
Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to
More informationApplication Acceleration Beyond Flash Storage
Application Acceleration Beyond Flash Storage Session 303C Mellanox Technologies Flash Memory Summit July 2014 Accelerating Applications, Step-by-Step First Steps Make compute fast Moore s Law Make storage
More informationSCTP versus TCP for MPI
SCTP versus TCP for MPI Humaira Kamal Department of Computer Science University of British Columbia Vancouver, BC kamal@cs.ubc.ca Brad Penoff Department of Computer Science University of British Columbia
More informationMM5 Modeling System Performance Research and Profiling. March 2009
MM5 Modeling System Performance Research and Profiling March 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center
More informationLeveraging Burst Buffer Coordination to Prevent I/O Interference
Leveraging Burst Buffer Coordination to Prevent I/O Interference Anthony Kougkas akougkas@hawk.iit.edu Matthieu Dorier, Rob Latham, Rob Ross, Xian-He Sun Wednesday, October 26th Baltimore, USA Outline
More information