Developing a Thin and High Performance Implementation of Message Passing Interface 1


Theewara Vorakosit and Putchong Uthayopas
Parallel Research Group, Computer and Network System Research Laboratory
Department of Computer Engineering, Faculty of Engineering, Kasetsart University,
50 Phaholyotin Rd, Chatuchak, Bangkok 900, Thailand
E-mail: g @ku.ac.th and pu@ku.ac.th

Abstract

The communication library is a critically important component in the development of parallel applications on PC clusters. MPI is currently the most widely used message passing standard. Although powerful, MPI is very complex and requires a considerable effort to learn, even though only a basic set of MPI functions is enough to develop a large class of parallel applications. There is also a need to explore aspects of message passing programming that remain inadequately studied, such as fault tolerant programming, debugging, performance optimization, and the usability of the programming environment. This paper presents work on MPITH, a compact but high performance MPI implementation. MPITH is a communication library for PC clusters that conforms to a subset of the most frequently used functions in the MPI 1.1 standard. The paper discusses the architecture, design, and implementation of MPITH, along with a comparison of its performance against MPICH and LAM on a PC cluster. The experimental results show that MPITH delivers performance comparable to both MPICH and LAM.

Introduction and Related Works

Commodity PC clusters have proven to be a viable way to provide very high computing power, and cluster systems are used in many fields such as scientific computing, high performance web serving, and high availability systems. To run compute intensive workloads on a cluster, users need parallel applications designed specifically for it. These applications usually communicate by passing messages through a communication library, so the communication library plays an important role in the development of parallel applications for clusters.

1 This research is supported in part by KURDI grant SRU and Advanced Micro Devices Far East Inc.

The performance of a communication library depends mostly on its internal algorithms, such as data buffering, the communication protocol, and the communication algorithms. Most communication related algorithms in a library involve collective operations. A communication model such as LogP [5] can be used to create a communication schedule, and the optimization of communication schedules based on LogP can be found in [8]; a rough illustration of this kind of cost estimate is sketched after Table 1. A communication library not only provides efficient communication but also offers other useful features such as data manipulation and group communication. Data manipulation helps the programmer compute values such as sums, maxima, and minima. Group communication is a mechanism for communicating among more than two processes at a time. To make parallel programs portable, several standard library interfaces have been defined, such as PVM [6, 12] and MPI [10, 11]. MPI is currently the most important standard; it defines both the syntax and the semantics of MPI functions. The most widely used communication libraries that conform to the MPI standard are MPICH [7] and LAM [3]. MPICH is developed by Argonne National Laboratory; the current version supports the MPI 1.2 standard and part of MPI 2.0 and is freely distributed under an open source license. The core of MPICH is written in C, but it also includes a C++ interface implemented by the University of Notre Dame. LAM is developed by the University of Notre Dame; it is written in C++ but provides a C interface to the programmer.

The MPI standard consists of a large and complex set of functions. For example, MPICH supports as many as 280 functions. This complexity makes MPI difficult to learn. In addition, an MPI implementation that supports all of these functions is large, complex, slow, and potentially less reliable. An important question is whether programmers of parallel applications actually need this complexity. Therefore, a study has been conducted to count how many MPI functions are used by several popular packages: PETSc [2], MPI Blacs [4], MPI Povray, the HPL benchmark, and PGAPack [9]. Table 1 and Figure 1 show the number of functions used in each package.

Table 1: MPI functions used by popular packages

Application    MPI function count
PETSc          52
MPI Blacs      38
MPI Povray     11
HPL            21
PGAPack        14
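To illustrate how a LogP style model can guide such scheduling decisions, the following self-contained sketch estimates the completion time of a simple binomial tree broadcast from the latency L, the per-message overhead o, and the number of processes P. The per-round cost model and the parameter values are assumptions made for this illustration only; the optimal schedules studied in [8] are derived with considerably more care, and the gap parameter g is ignored here because each process issues at most one send per round.

#include <cstdio>

// Estimate the completion time of a binomial-tree broadcast: in each
// round, every process that already holds the message forwards it to
// one process that does not, so the number of holders doubles.  One
// transfer is charged o (send overhead) + L (wire latency) + o
// (receive overhead).
static double binomial_broadcast_time(double L, double o, int P)
{
    double t = 0.0;
    for (int holders = 1; holders < P; holders *= 2)
        t += 2.0 * o + L;          // cost of one doubling round
    return t;
}

int main()
{
    // Hypothetical Fast Ethernet style parameters, in microseconds.
    std::printf("16-process broadcast estimate: %.1f microseconds\n",
                binomial_broadcast_time(50.0, 10.0, 16));
    return 0;
}

Comparing such estimates for different tree shapes is the kind of reasoning a LogP based schedule optimizer performs.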

Figure 1: MPI functions used by popular software packages and libraries

One can observe that the number of functions used by practical software packages and libraries is in fact very low: most parallel programs and libraries rely on only a small set of MPI functions, which indicates that most of the functionality in MPI goes unused. Instead of trying to include every function, the effort should focus on building robust implementations of the necessary functions and on exploring other directions such as fault tolerant programming and automatic load balancing; a complete parallel program that needs only a handful of calls is sketched at the end of this section. Current MPI implementations seem to carry too many functions to optimize or to explore these new features.

Fault tolerance is one of the most interesting features of a communication library that needs to be explored. As the size of a cluster increases, the probability of node failure caused by hardware, the operating system, or memory increases substantially. With current MPI implementations, if a node fails, all processes have to stop and roll back to the beginning. Thus, there is a need for a library with such a feature.

These problems motivate the development of MPITH, a communication library for clusters. MPITH is designed to conform to only the most frequently used subset of the MPI standard; the current version supports basic point-to-point and collective communication. The goal of MPITH is to explore mostly ignored but important aspects of parallel programming, such as load balancing and fault tolerant programming.

The rest of this paper is organized as follows. First, the architecture of MPITH is described, followed by a discussion of its implementation. The performance of MPITH is then compared with that of MPICH and LAM. Finally, conclusions and future work are given.
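To make the observation above concrete, the following small program is a complete parallel computation that relies only on the basic subset discussed here: initialization and finalization, rank and size queries, and blocking send and receive. It is an illustration written for this discussion, not code taken from any of the packages in Table 1.

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        // Rank 0 collects one partial result from every other rank.
        for (int src = 1; src < size; ++src) {
            int value;
            MPI_Status status;
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, &status);
            std::printf("received %d from rank %d\n", value, src);
        }
    } else {
        int value = rank * rank;   // stands in for a per-process computation
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}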

MPITH Architecture

The MPITH architecture is illustrated in Figure 2. MPITH is divided into four layers: the communication device layer, the device handler layer, the communication engine layer, and the API layer.

Figure 2: MPITH architecture. The API layer sits on top of the communication engine layer, which uses the device handler layer to manage devices (such as TCP, UDP, and VIA) in the device layer.

Device Layer

MPITH is designed to support multiple underlying network protocols through the concept of a device. The communication device layer is responsible for interfacing with the various abstract devices. MPITH requires that each device conform to the following requirements:

- Reliable transfer between the two endpoints of a communication.
- Support for the select() system call.
- Operation in synchronous mode. In the next version, devices must be able to operate in both synchronous and asynchronous mode.
- A device may or may not buffer send/receive data.

Each device in MPITH is also separated into a server device and a client device. A server device waits for connections from client devices. When a server device accepts an incoming connection, it creates a new client device to communicate with its peer. The client device is responsible for the subsequent data transmission; it is also used to initiate communication with the server devices of other processes.
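The device interface itself is not listed in this paper, so the following C++ sketch only illustrates, with invented names, the kind of abstraction that the requirements above suggest: a reliable, blocking endpoint that exposes a descriptor usable with select(), and a server device whose accept() produces a new client device for each peer.

#include <cstddef>

// Invented interface names; a sketch of the abstract device suggested above.
class Device {
public:
    virtual ~Device() {}
    virtual int descriptor() const = 0;                      // usable with select()
    virtual int send(const void *buf, std::size_t len) = 0;  // reliable, blocking
    virtual int recv(void *buf, std::size_t len) = 0;        // reliable, blocking
};

// A server device only listens; accepting a connection yields a new
// client device that carries all further traffic with that peer.
class ServerDevice {
public:
    virtual ~ServerDevice() {}
    virtual int descriptor() const = 0;
    virtual Device *accept() = 0;
};

A concrete TCP device, for example, would implement these operations over a connected socket, while the same interface could wrap UDP (with added reliability) or VIA.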

Device Handler Layer

The device handler is responsible for runtime device management. The upper layer modules must send and receive data through this layer, except at the beginning of program execution, when upper layer code may contact the communication devices directly. This layer manages proper device opening and closing and guarantees that the engine layer can send and receive data at any time, regardless of device status. The strategy used is to keep devices open (connected) most of the time in order to avoid the latency of opening new devices. This layer also buffers incoming data in order to reduce the waiting time of the sender.

Communication Engine Layer

The communication engine layer is the part that executes all of the communication algorithms and the collective operations such as MPI_BCAST and MPI_REDUCE. This layer also manages the processes and services requests from the API layer. The engine layer is separated from the API layer so that additional features can be added in the API layer; for example, message queue logging or debugging support can be inserted in the API layer before the engine layer is called.

API Layer

The API layer implements the interface presented to the parallel application programmer. The syntax of the API functions conforms to the MPI standard. Currently, MPITH supports 13 functions, divided into four groups:

- Point-to-point communication functions: MPI_SEND and MPI_RECV.
- Collective communication functions: MPI_BCAST, MPI_REDUCE, MPI_GATHER, and MPI_SCATTER.
- Information query functions, which provide information about the runtime environment: MPI_WTIME, MPI_WTICK, and MPI_GET_PROCESSOR_NAME.
- Administrative functions, which cover process creation and data manipulation and may involve collective operations to broadcast control messages among the processes: MPI_INIT and MPI_FINALIZE.
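The following self-contained C++ sketch illustrates the layering just described; all class names are invented for this example and do not correspond to actual MPITH classes. The API layer entry point adds a feature (here, simple message logging) and then delegates to the engine layer, which in a real implementation would buffer the data and hand it to the device handler.

#include <cstdio>
#include <cstddef>

class Engine {                         // stands in for the MPITH engine layer
public:
    int send(const void *buf, std::size_t bytes, int dest) {
        (void)buf;                     // a real engine would copy into its buffer
        std::printf("engine: sending %zu bytes to rank %d\n", bytes, dest);
        return 0;                      // and pass the buffer to the device handler
    }
};

class ApiLayer {                       // stands in for the MPI_* entry points
public:
    explicit ApiLayer(Engine &e) : engine_(e) {}
    int Send(const void *buf, std::size_t bytes, int dest) {
        std::printf("log: queueing %zu bytes for rank %d\n", bytes, dest);
        return engine_.send(buf, bytes, dest);   // delegate to the engine
    }
private:
    Engine &engine_;
};

int main() {
    Engine engine;
    ApiLayer api(engine);
    int payload = 42;
    return api.Send(&payload, sizeof payload, 1);
}

Because the API layer is this thin, such features can be added or removed without touching the communication engine.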

Implementation

MPITH is developed in C++ so that object-oriented techniques can be fully utilized; the object-oriented paradigm decreases coupling and increases cohesion between modules. This section discusses some of the implementation concepts.

Process Creation

A process in MPITH runs in one of two modes: master mode or slave mode. The user starts an MPI program via the "mpirun" utility, which accepts the number of processes, a machine file, and the program name. Using this information, mpirun builds the command line arguments and executes the given program, which then runs in master mode. When a process is created in master mode, a server device is also created to listen for incoming connections from other processes. In addition, the master process spawns the slave processes via a system executor; in this version the executor is based on the fork/exec mechanism, and remote process creation is done using RSH or, when operating under the SCE environment, KSIX middleware calls. After spawning the slaves, the master waits for connection requests from them. The received slave addresses and port numbers are used to build a global process table, and the master assigns an MPI process identifier (MPID) to each slave. After receiving the connection requests from every slave, the master broadcasts the process table to its children.

A slave process starts by sending a connection request to the master; the address and port number of the master are taken from the command line arguments. After connecting, the slave sends its own address and server port number to the master. Finally, the slave receives the process table from the master and broadcasts the table to its own children.
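The following self-contained simulation only illustrates the ordering of the start-up steps described above: spawn the slaves, collect a connect request from each one, assign MPIDs, and broadcast the resulting table. All names are invented and the spawning and connections are faked with stubs; it is not the MPITH start-up code.

#include <cstdio>
#include <string>
#include <vector>

struct ProcessEntry { std::string host; int port; int mpid; };

// Stand-ins for the real fork/exec + RSH/KSIX spawn and the socket accept.
static void spawn_slaves(int n) { std::printf("spawning %d slaves\n", n); }
static ProcessEntry accept_connect_request(int i) {
    return { "node" + std::to_string(i), 40000 + i, -1 };
}
static void broadcast_table(const std::vector<ProcessEntry> &table) {
    for (const ProcessEntry &e : table)
        std::printf("table entry: mpid=%d %s:%d\n", e.mpid, e.host.c_str(), e.port);
}

int main() {
    const int nslaves = 3;
    std::vector<ProcessEntry> table;
    table.push_back({ "master", 39999, 0 });     // the master takes MPID 0
    spawn_slaves(nslaves);                       // fork/exec locally, RSH/KSIX remotely
    for (int i = 1; i <= nslaves; ++i) {
        ProcessEntry e = accept_connect_request(i);  // slave reports address + port
        e.mpid = i;                                  // master assigns the MPID
        table.push_back(e);
    }
    broadcast_table(table);                      // children then receive the full table
    return 0;
}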

Send/Receive Operation

An MPITH process always starts with two threads: the first thread executes the application program, while the second thread controls I/O. When a process wants to send data to another process, the MPI_SEND function is used. The MPI_SEND call in the API layer is passed to the engine layer, which copies the data from the user into its own buffer and then calls the device handler to send that buffer to the destination process.

On the receiving side, the I/O thread is responsible for polling the devices that are ready to read. If a server device becomes readable, the I/O thread accepts the incoming connection. If a client device becomes readable, the I/O thread reads the data from the device and adds it to an internal queue. When the application calls MPI_RECV, the engine simply passes data from the incoming queue to the application, and control returns once all of the requested data has been taken from the queue.

Results

The performance of MPITH was evaluated on a 16-node Beowulf cluster. Each node is a PC with a 1 GHz Athlon processor and 512 Mbytes of RAM, and the nodes are connected by a 100 Mbps Fast Ethernet switch. Most of the tests compare the performance of MPITH with the popular MPICH and LAM implementations.

For the first test, the send/receive time between two nodes was measured; the results are shown in Figure 3. The speed of MPITH is comparable to that of MPICH, and both are slightly slower than LAM. MPICH seems to have a slight problem with buffer handling over a certain range of message sizes, as can be seen in Figure 3.

Figure 3: Send/receive time (microseconds) versus message size (Kbytes) for MPITH, MPICH, and LAM
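Send/receive times of this kind are typically obtained with a simple ping-pong loop timed with MPI_WTIME; the sketch below, which must be run with at least two processes, is an illustration written for this discussion and not the benchmark code actually used in these experiments. The message size and repetition count are arbitrary choices.

#include <mpi.h>
#include <cstdio>
#include <vector>

// Ping-pong between rank 0 and rank 1: rank 0 sends a message and waits
// for the echo, so half of the average round-trip time approximates the
// one-way send/receive time for that message size.
int main(int argc, char **argv)
{
    const int reps = 100;
    const int bytes = 1024;            // one example message size
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    std::vector<char> buf(bytes);
    MPI_Status status;

    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; ++i) {
        if (rank == 0) {
            MPI_Send(buf.data(), bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf.data(), bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf.data(), bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf.data(), bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0)
        std::printf("one-way time: %.1f microseconds\n",
                    elapsed / reps / 2.0 * 1e6);
    MPI_Finalize();
    return 0;
}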

Next, the performance of the broadcast, reduce, scatter, and gather operations was measured; the results are illustrated in Figures 4 through 7.

Figure 4: Broadcast time (microseconds) versus message size (Kbytes) for MPITH, MPICH, and LAM

Figure 5: Reduce time (microseconds) versus message size (Kbytes) for MPITH, MPICH, and LAM

Figure 6: Scatter time (microseconds) versus message size (Kbytes) for MPITH, MPICH, and LAM

In summary, these results show that MPITH performs comparably to MPICH. For small messages MPITH is slightly faster than MPICH, while for large messages MPICH becomes faster. This behavior is caused by the MPITH buffering policy: MPITH buffers all incoming messages regardless of their size, whereas MPICH buffers only the small messages.

Figure 7: Gather time (microseconds) versus message size (Kbytes) for MPITH, MPICH, and LAM

When MPITH is compared with LAM, LAM performs better for small message sizes, but the performance of MPITH approaches that of LAM as the message size increases. Note that in the graphs the operations are much faster for messages of about 1 Kbyte than for smaller messages. This peculiar result is caused by the implementation of the Linux TCP/IP stack.

Normally, the Linux kernel buffers incoming data and delivers it to user level only when the amount of data reaches a certain threshold. For small messages the threshold is not exceeded, so the kernel waits for more incoming data before returning to the user process, which slows down TCP transmission for those message sizes.

Finally, the running time of a parallel matrix multiplication program was measured on 15 nodes; the results are reported in Table 2.

Table 2: Matrix multiplication running time (s) for MPITH, MPICH, and LAM at several matrix sizes

It can be seen that the application-level performance of the matrix multiplication program is almost the same for all of the MPI implementations used. These results demonstrate that MPITH performance is comparable with that of MPICH and LAM.

Conclusions and Future Work

In this paper, the design and implementation of MPITH, a communication library for clusters, have been presented. MPITH complies with a subset of the MPI standard, and the goal of the project is to build a software infrastructure for exploring other aspects of MPI based parallel programming. The performance comparison with MPICH and LAM has shown very promising results: MPITH delivers performance rivaling both popular implementations. In the future, more MPI functions will be added to make it more convenient to write parallel applications, and the buffering policy will be improved. The next generation of MPITH will run on the KSIX [1] middleware, which provides features such as process management and fault tolerance; these features are essential for developing fault tolerant MPI programs.

References

[1] Angskun, T., Uthayopas, P. & Rungsawang, A. Dynamic Process Management in KSIX Cluster Middleware. In Proceedings of Euro PVM/MPI 2001, Santorini (Thera) Island, Greece, September 2001.
[2] Balay, S., McInnes, L. C., Gropp, W. D. & Smith, B. F. PETSc 2.0 Users Manual. ANL Report ANL-95/11, Argonne National Laboratory, Argonne, Ill.
[3] Burns, G., Daoud, R. & Vaigl, J. LAM: An Open Cluster Environment for MPI. Technical report, Ohio Supercomputer Center, Columbus, Ohio.
[4] Whaley, R. C. Outstanding Issues in the MPIBLACS. Available on netlib from the blacs/ directory.
[5] Culler, D., Karp, R., Patterson, D., Sahay, A., Schauser, K., Santos, E., Subramonian, R. & von Eicken, T. LogP: Towards a Realistic Model for Parallel Computation. In Proceedings of the 5th Symposium on Parallel Algorithms and Architectures.
[6] Geist, G. A., Beguelin, A., Dongarra, J. J., Jiang, W., Manchek, R. & Sunderam, V. S. PVM 3 User's Guide and Reference Manual. Technical Report ORNL/TM-12187, Oak Ridge National Laboratory.
[7] Gropp, W., Lusk, E., Doss, N. & Skjellum, A. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard. Parallel Computing, 22(6).
[8] Karp, R., Sahay, A. & Santos, E. Optimal Broadcast and Summation in the LogP Model. Technical Report CSD, University of California, Berkeley.
[9] Levine, D. PGAPack Parallel Genetic Algorithm Library. ANL-95/18, Argonne National Laboratory, Argonne, Ill.
[10] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard.
[11] Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface. Draft.
[12] Zhou, H. & Geist, A. LPVM: A Step Towards Multithread PVM. Technical report, Oak Ridge National Laboratory.
