BUILDING A HIGHLY SCALABLE MPI RUNTIME LIBRARY ON GRID USING HIERARCHICAL VIRTUAL CLUSTER APPROACH


Theewara Vorakosit and Putchong Uthayopas
High Performance Computing and Networking Center, Faculty of Engineering, Kasetsart University,
50 Phaholyotin Rd, Chatuchak, Bangkok 10900, Thailand
thvo@hpcnc.cpe.ku.ac.th, pu@ku.ac.th

ABSTRACT

For large computational grid systems, the Message Passing Interface (MPI) is one of the most widely used programming models for distributed computing. However, current MPI implementations cannot utilize all computing nodes on a grid, because network configurations effectively hide nodes behind a front-end node or gateway. This work presents an approach called Hierarchical Virtual Clustering (HVC) that overcomes this problem in an efficient manner. The idea is to separate a grid into physical and hierarchically organized logical clusters, and to incorporate routing between these clusters at the MPI runtime library level. This approach has been implemented in an experimental grid-enabled runtime system called MPITH. Experiments comparing the performance of MPITH with MPICH show that comparable speed can be obtained with very little loss of performance. Hence, the approach presented here can be used to vastly increase the grid computing resources available to MPI applications without impairing efficient utilization of those resources.

KEY WORDS
MPI, grid, cluster, NAT transparency, routing algorithm.

1. Introduction

MPI [1],[2] is one of the most established parallel programming models for distributed computing on a grid [3]. However, most of the nodes on a grid are hidden behind cluster front-end nodes; in this situation a grid-based MPI runtime can only utilize the front-end node of each cluster for computation. This substantially reduces the amount of computing power available to grid users. In this paper, a new and systematic approach to enable the MPI runtime to use all nodes on the grid is proposed. This Hierarchical Virtual Cluster, or HVC, approach provides a uniform way to make all nodes addressable by a grid-based MPI runtime environment. Hence, it practically eliminates the problems caused by NAT and gateways. The HVC approach has been incorporated into MPITH [4], our grid-based MPI implementation. Experiments show that this system can utilize more nodes in a scalable and efficient way.

The rest of this paper is organized as follows. Section 2 describes previous work in this area. Section 3 describes the system model. Section 4 gives an overview of the model's implementation in MPITH. Experimental results are presented in Section 5. Finally, conclusions and future work are given in Section 6.

2. Related Work

Many programming models have been employed to harvest the enormous computing power of a grid system; an extensive survey of various approaches can be found in [5]. MPI is one of the most established parallel programming models that can be used on a grid. Many MPI implementations have been developed; of these, two of the most widely used are LAM [6] and MPICH [7]. MPICH, developed at Argonne National Laboratory, provides a portable, high-performance MPI implementation for clusters and grid systems; MPICH supports both the MPI 1.2 and 2.0 standards. LAM is another MPI implementation that provides many additional features such as OpenPBS integration, beta grid support, high-speed interconnect support, checkpoint and restart, and fast start-up.
Open MPI [8], a recent collaboration between several universities and Los Alamos National Laboratory, aims to integrate the technologies of several MPI projects into a fast, efficient, MPI-2 compliant implementation. Open MPI currently focuses on fault tolerance, heterogeneity, and performance rather than providing seamless grid support. There are also special-purpose MPI implementations that focus on specific issues. MPI/FT [9] focuses on making the MPI runtime fault-tolerant in a semi-transparent way. MPICH-G2 [10] is an extension of MPICH to grid systems, supporting the authentication, scheduling, and resource management requirements of a grid environment. However, MPICH-G2 does not yet provide a solution for running tasks across NAT gateways. MagPIe [11], another MPI library that extends MPICH, attempts to optimize the MPI communication algorithms for wide area networks using a two-layer hierarchy: cluster level (LAN) and grid level (WAN). Optimizing MPI communications is also the focus of other projects such as GridMPI [12], which incorporates an algorithm to build a latency-aware communication library.

The MPITH project has also proposed topology-aware communications using a genetic algorithm and smart message scheduling.

Another challenge presented by the grid is how to build an efficient communication library that is aware of the presence of NAT gateways. This is crucial since, as mentioned above, most current MPI implementations cannot utilize the majority of computing resources hidden behind NAT gateways. There are two approaches to this problem. The first is to use an MPI routing daemon. PACX-MPI [13] utilizes this approach: intra-cluster communication takes place directly among tasks, while inter-cluster communication is done through communication nodes called MPI-servers. An MPI-server compresses data and transfers it via TCP/IP to a destination, called a PACX-server, which decompresses the data and sends it to the target node. Although this is a viable solution, it has some drawbacks. First, the MPI-server and PACX-server have to be installed and maintained on each cluster, which complicates administration. Second, the MPI-server runs and consumes resources continuously even if no MPI programs are being used. Third, the presence of separate MPI routing processes complicates the runtime system and makes it harder to enforce the grid security model. The second approach is to implement an MPI message forwarder in the MPI runtime itself. This eliminates the need for a separate MPI routing daemon and promotes strict enforcement of the grid security model. MPICH/MADIII [14] uses this approach. MPICH/MADIII is designed to be a complementary tool for MPICH-G2, implementing MPICH device names on top of the Madeleine III communication library. However, interfacing MPICH/MADIII and MPICH-G2 complicates the runtime. Our approach integrates both grid-level and cluster-level message routing into a single MPI implementation. Hence, it provides full transparency and global optimization across the grid and cluster levels.

3. Hierarchical Virtual Clusters

In order to support NAT transparency and provide a highly scalable MPI runtime environment, a new approach called the Hierarchical Virtual Cluster (HVC) model is proposed. This section describes the model and the MPI process routing mechanism.

3.1. The Proposed Model

A cluster is a group of computers or nodes connected together through an interconnection network. A designated front-end node acts as a management point for the cluster. In this paper, we use the term physical cluster to refer to this form of cluster. There are two types of physical clusters: closed and open. In a closed physical cluster, all compute nodes are placed behind the front-end node. The front-end or gateway node provides the NAT facility for the other nodes. In an open physical cluster, all compute nodes are directly addressable from the outside network. A grid is an interconnected set of these clusters. Figure 1 shows a grid consisting of three clusters: Mercury, Venus, and Jupiter. Venus and Jupiter are closed physical clusters while Mercury is an open physical cluster. Venus1, Jupiter1, and Mercury1 are the front-end nodes of their respective clusters.

Figure 1: Clusters environment

Connection initiation between nodes in a physical cluster can be represented by a graph termed a Connectivity Graph (CG), as defined in Definition 1.

Definition 1. A Connectivity Graph (CG) is a directed graph G = (V, E) where V is the set of nodes in a grid and E is the set of edges. An edge (u, v) ∈ E if and only if node u can initiate a connection to node v.

Figure 2 shows the CG derived from Figure 1.

Figure 2: Connectivity Graph of Figure 1

In a NAT environment, a node behind a gateway can initiate a connection to outside nodes directly, but an outside node cannot initiate a connection to a node behind the gateway since it cannot identify the destination address. For example, Mercury2 cannot initiate a connection to Venus2 if Venus2 is in a NAT environment behind Venus1. Although a connectivity graph provides a lot of useful information, this representation of network topology is too complex for our purposes. A connection between two nodes is useful only if initiation can be done bi-directionally. Based on this observation, a connectivity graph can be reduced to a simpler Direct Connectivity Graph (DCG), as defined in Definition 2.

Definition 2. A Direct Connectivity Graph (DCG) is an undirected graph G' = (V', E') where V' = V and E' is the set of edges such that an edge (u, v) ∈ E' if and only if (u, v) ∈ E and (v, u) ∈ E.

Figure 3: DCG derived from Figure 2

Figure 3 shows the DCG resulting from Figure 2. With this concept, the task we need to pursue is finding a systematic way to map MPI tasks onto a grid and provide routing that ultimately organizes the tasks into a DCG. Hence, all tasks will have a way to communicate with each other regardless of any NAT encountered. The mapping proposed in this paper is based on the concept of building a multi-level set of virtual clusters on a grid of physical clusters. A virtual cluster is a set of nodes such that every node can initiate a bi-directional connection to every other node in the cluster. Let G be the DCG of a grid. We define virtual clusters and vertices of virtual clusters as follows.

Definition 3. A Level 1 Virtual Cluster, denoted by VC[1], is a maximal subgraph of G such that the shortest path length from u to v is 1 for all nodes u, v in the subgraph.

Definition 4. A Level i Virtual Cluster, denoted by VC[i], is a maximal subgraph of G such that the shortest path length from u to v is at most i for all nodes u, v in the subgraph.

Definition 5. A gateway vertex of a VC is a vertex v ∈ V whose removal, i.e., taking the reduced vertex set V − {v}, causes the resulting subgraph of G to be divided into two connected components.

Figure 4 illustrates three VC[1]s in G. The group of VC[1]s in G is called a VC[1] set. Note that a VC[1] is a strongly connected component in G, and an edge (u, v) ∈ E if and only if u and v are in the same VC[1]. As with VC[1], there can be more than one VC at each level in G; these VC[i] can be grouped into a VC[i] set. Figures 5 and 6 show how VC[1]s can be merged into a higher level. The gateway vertex is important because in MPI process mapping we always map one process onto this node. This process not only performs computation, but also helps route messages to and from nodes inside the cluster.

Figure 4: VC[1]s in G
Figure 5: Intermediate step of VC merging
Figure 6: Final VC merging

The maximum level of VC in G is equal to the longest path length in G. Using this approach, a mapping of MPI tasks onto physical clusters can be performed by iteratively merging lower level VCs into a higher level VC. Hence, routing can easily be derived by sending messages from a top-level VC in the HVC down the hierarchy of VCs. Another important property is that a VC[i] is always a connected component.
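To make the CG-to-DCG reduction of Definitions 1 and 2 and the level-i condition of Definitions 3 and 4 concrete, the following Python sketch (not taken from the paper; the sample connectivity graph and node names are hypothetical) builds a DCG from a CG and checks whether a given set of nodes satisfies the pairwise shortest-path condition of a VC[i].

    from itertools import combinations
    from collections import deque

    # Hypothetical directed Connectivity Graph (Definition 1): u -> nodes u can reach.
    # Venus2 sits behind the Venus1 gateway, so no outside node can initiate a connection to it.
    cg = {
        "Venus1":   {"Venus2", "Mercury1", "Jupiter1"},
        "Venus2":   {"Venus1", "Mercury1", "Jupiter1"},
        "Mercury1": {"Venus1", "Jupiter1"},
        "Jupiter1": {"Venus1", "Mercury1"},
    }

    def to_dcg(cg):
        # Definition 2: keep an undirected edge (u, v) only if both (u, v) and (v, u) are in the CG.
        nodes = set(cg) | {v for vs in cg.values() for v in vs}
        dcg = {n: set() for n in nodes}
        for u, vs in cg.items():
            for v in vs:
                if u in cg.get(v, set()):
                    dcg[u].add(v)
                    dcg[v].add(u)
        return dcg

    def satisfies_vc_level(dcg, nodes, level):
        # Definitions 3 and 4: every pair in 'nodes' must be within 'level' hops in the DCG.
        def dist(src, dst):
            seen, queue = {src}, deque([(src, 0)])
            while queue:
                n, d = queue.popleft()
                if n == dst:
                    return d
                for m in dcg[n]:
                    if m not in seen:
                        seen.add(m)
                        queue.append((m, d + 1))
            return float("inf")
        return all(dist(u, v) <= level for u, v in combinations(nodes, 2))

    dcg = to_dcg(cg)
    print(satisfies_vc_level(dcg, {"Venus1", "Mercury1", "Jupiter1"}, 1))  # True: a VC[1]
    print(satisfies_vc_level(dcg, {"Venus2", "Mercury1"}, 1))              # False: only reachable via Venus1
    print(satisfies_vc_level(dcg, {"Venus2", "Mercury1"}, 2))              # True: within a VC[2]

A full VC, as required by Definitions 3 and 4, is in addition maximal: no further node can be added without violating the distance condition.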

3.2. Routing Discovery Algorithm

Having established that a VC is always a connected graph, a route can be considered as a shortest path from u to v. The Floyd-Warshall algorithm [15] can be used to find all-pairs shortest paths. First, process mappings are defined based on the HVC model. Once this information has been obtained, the Floyd-Warshall algorithm is used to compute the routing table. This routing table is then passed to all nodes. The HVC concept always ensures that one process is placed on the gateway node or gateway vertex of each VC and provides the necessary routing capability.

To provide routing between each pair of nodes, we define a route from u to v in G as a simple path <u_0, u_1, ..., u_k> such that u = u_0 and v = u_k. Vertex u_1 in the path is called the next hop of the route from u to v. The source vertex only needs to know the next hop to a given destination. Routes are maintained in a routing table, which stores the next-hop vertex for each source and destination pair. Definition 6 defines the routing table formally (the term routing matrix is used interchangeably).

Definition 6. A routing table T is an n × n matrix, where n is the number of vertices in V, such that t_uv = u_1, the next hop on the route from u to v.

The Floyd-Warshall algorithm needs a weight for each edge. We associate a weight with each edge as w_uv = 1 if (u, v) ∈ E; otherwise w_uv = ∞. After the routing table is created, a routing function r can be defined as r(u, v) = t_uv.

3.3. Process-Level Routing

An MPI application consists of a group of processes running cooperatively. Normally, the mapping of processes onto physical cluster nodes is known prior to the startup of the MPI tasks. After the application is started, MPI processes are identified by MPI rank. Given a DCG G = (V, E) that represents the grid, process mapping can be defined as follows.

Definition 7. A process mapping is a function m: P → V where P is the set of processes and V is the set of vertices in G.

Definition 8. The reverse of the process mapping function is a relation μ: V → P where V is the set of vertices in G and P is the set of processes.

Both m and μ are used in the routing routine for MPI message forwarding. Let n(v) be the number of processes that are mapped onto vertex v. At run time, an MPI process can also act as a forwarder, so n(v) ≥ 1 for each v that is a gateway vertex. Routing and message forwarding are implemented as follows. Suppose process p1 sends a message to p2. Let u = m(p1) and v = m(p2). The next-hop vertex is v' = t_uv. Then, find the process that is mapped onto v'; suppose that p3 = μ(v'). At this point, p1 creates an MPI message and sends it to p3, specifying that the destination is p2. After p3 receives the message, it uses the same method to find the next-hop process. This forwarding routine stops when p2 receives the message.
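As a minimal sketch of Sections 3.2 and 3.3 (this is not the MPITH code; the DCG and the process mapping used here are hypothetical), the routing table can be computed with Floyd-Warshall using w_uv = 1 for edges and infinity otherwise, and the forwarding step then reduces to a next-hop lookup followed by the reverse mapping μ:

    INF = float("inf")

    def build_routing_table(dcg):
        # Floyd-Warshall over the DCG; nxt[u][v] is t_uv, the next hop on a shortest u -> v path.
        nodes = list(dcg)
        dist = {u: {v: 0 if u == v else (1 if v in dcg[u] else INF) for v in nodes} for u in nodes}
        nxt = {u: {v: (v if v in dcg[u] else None) for v in nodes} for u in nodes}
        for k in nodes:
            for u in nodes:
                for v in nodes:
                    if dist[u][k] + dist[k][v] < dist[u][v]:
                        dist[u][v] = dist[u][k] + dist[k][v]
                        nxt[u][v] = nxt[u][k]
        return nxt

    def next_process(nxt, m, mu, src_proc, dst_proc):
        # Process-level routing: which process should src_proc hand the message to next?
        u, v = m[src_proc], m[dst_proc]           # Definition 7: process -> vertex
        if u == v:
            return dst_proc                       # destination is on the same node
        return mu[nxt[u][v]]                      # Definition 8, simplified to one resident process per vertex

    # Hypothetical grid: rank 0 on Mercury1 must reach rank 3 hidden behind the Venus1 gateway.
    dcg = {"Mercury1": {"Venus1"}, "Venus1": {"Mercury1", "Venus2"}, "Venus2": {"Venus1"}}
    m = {0: "Mercury1", 1: "Venus1", 3: "Venus2"}
    mu = {"Mercury1": 0, "Venus1": 1, "Venus2": 3}
    nxt = build_routing_table(dcg)
    print(next_process(nxt, m, mu, 0, 3))  # 1: forward through the gateway process on Venus1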
4. MPITH Implementation

In order to evaluate the proposed model, we implemented HVC in MPITH, an experimental library that has been developed as a framework for testing various message passing-based research projects. MPITH supports both intra-cluster and inter-cluster environments. MPITH uses a remote execution command such as rsh or ssh to start processes within the same cluster and GRAM/DUROC for inter-cluster execution. It also provides a utility to create an RSL file for use with the globusrun command.

The implementation of HVC is as follows. In MPITH, processes or tasks are categorized as primary master, secondary master, and slave. A master-type process is responsible for spawning the processes within its physical cluster. There is one primary master; the remaining master-type processes are secondary masters. The primary master differs from the secondaries in that it is the source of the configuration files. A process that is not a master-type process is a slave process. All process types run the same executable code; the process type is specified in the configuration file. There are two configuration files: a grid description file and a process mapping file. The grid description file describes the HVC; it identifies the clusters, the number of nodes in each cluster, and the interconnections among the clusters. The process mapping file describes the process mapping function, including the number of processes mapped onto each cluster, and specifies which process is the primary master.

On startup, master-type processes are started in each cluster using the globusrun command with a generated RSL script. The primary master process takes the two configuration files as additional arguments. After these processes are started, the MPI_Init() function is executed. In MPI_Init(), the master processes synchronize using DUROC and learn the IP address and port of all other master processes. Then, the primary master broadcasts the grid description and process mapping files to the other masters. After the secondary master processes receive the configuration files, they spawn child processes in their clusters according to the requirements in the process mapping file. Child processes then execute the MPI_Init() function to register themselves with their master. Finally, all processes return from MPI_Init() and execute the user code.
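Purely as an illustration (this is not MPITH's actual code or configuration format), the sketch below shows one way the per-cluster process counts from a process mapping file could be turned into the rank lists that each master-type process spawns; the cluster names and counts are hypothetical.

    def assign_ranks(mapping):
        """mapping: dict of cluster name -> number of MPI processes on that cluster (insertion order kept)."""
        ranks, next_rank = {}, 0
        for cluster, count in mapping.items():
            ranks[cluster] = list(range(next_rank, next_rank + count))
            next_rank += count
        return ranks

    # Hypothetical process mapping: cluster -> number of processes.
    mapping = {"Maeka": 16, "Magi": 8, "Gass": 8}
    for cluster, rank_list in assign_ranks(mapping).items():
        # One process per cluster runs on the gateway node and also acts as the message forwarder.
        print(cluster, rank_list)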

5. Experimental Results

The proposed algorithm is tested on the Kasetsart University Grid (KUGrid), which is part of ThaiGrid [16]. Four clusters are included in the evaluation. All clusters are linked through KU's Gigabit campus network. Table 1 shows the clusters' configuration.

Table 1: Test bed configuration
Cluster  Type    Nodes  Brief configuration             Network
Maeka    Closed  32     Opteron, 3 GB RAM               Gigabit
Gass     Open    6      dual Athlon MP 1800+, 1 GB RAM  Fast Ethernet
Magi     Closed         Athlon XP, 512 MB RAM           Fast Ethernet
Amata    Open    5      Athlon, 512 MB RAM              Fast Ethernet

MPITH is evaluated in three areas: point-to-point, broadcast, and application performance, in comparison with MPICH2. All tests presented here were run ten times and the average values are reported.

First, point-to-point performance is evaluated. We compare the time spent in send/receive operations between MPITH and MPICH2. The results are shown in Figure 7.

Figure 7: Point-to-point performance (transmission time in milliseconds versus message size in bytes for the MPICH2 cluster, MPITH cluster, MPITH grid, and MPITH NAT cases)

Figure 7 shows the point-to-point performance of MPITH and MPICH2 for message sizes from 1 byte to 2 MB. The series labeled MPICH2 cluster and MPITH cluster show the transmission time of MPI_Send/MPI_Recv within the Magi cluster. MPITH grid is the transmission time of MPI_Send/MPI_Recv between the head nodes of the Magi and Maeka clusters. Finally, MPITH NAT is the transmission time between the head node of Magi and a compute node of Maeka. In the cluster environment, MPITH performance is better than that of MPICH2, especially for smaller message sizes; for larger message sizes, both MPITH and MPICH2 exhibit similar levels of performance. In the grid environment, the transmission time is larger than in the cluster case. The transmission times of the direct-connect and NAT cases are essentially the same, especially for larger message sizes. This shows that MPITH's message forwarding routine performs very well.

Figure 8 shows MPI_Bcast performance in two cluster configurations and two grid configurations. Table 2 lists the four configurations used in this test.

Figure 8: MPI_Bcast performance (total transmission time in milliseconds versus message size in bytes)

Table 2: Test bed for MPI_Bcast (nodes used from the Maeka, Magi, and Gass clusters for the configurations MPICH2 6 nodes, MPITH 6 nodes, MPITH/grid #1, and MPITH/grid #2)

For small message sizes, the total transmission time depends mainly on latency arising from the Linux buffering policy: when a message fits in the Linux kernel buffer, Linux waits until the buffer is full, whereas a message larger than the buffer is flushed immediately. Figure 8 shows that MPITH is slightly better than MPICH2 in the cluster environment. In the grid environment, MPITH uses the LPBF [17] algorithm for the MPI_Bcast operation. For small messages, the time varies greatly due to the buffer management policy; for larger messages, the two grid configurations show a constant difference in total transmission time.
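For reference, the kind of MPI_Bcast measurement behind Figure 8 can be sketched as follows (this is not the paper's benchmark code; it uses the mpi4py binding, and the message sizes are illustrative):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    size = 1
    while size <= 2 * 1024 * 1024:
        buf = np.zeros(size, dtype=np.uint8)
        comm.Barrier()                                    # align all ranks before timing
        t0 = MPI.Wtime()
        comm.Bcast(buf, root=0)
        elapsed = MPI.Wtime() - t0
        total = comm.reduce(elapsed, op=MPI.MAX, root=0)  # the slowest rank defines the broadcast cost
        if rank == 0:
            print(size, total * 1000.0)                   # total transmission time in milliseconds
        size *= 2

Averaging several such runs, as done in the paper, smooths out the buffering effects discussed above.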

Figure 9: Gaussian elimination on the grid environment (running time in seconds versus problem size for the MPICH2 32-node cluster, MPITH 32-node cluster, and MPITH/grid #1 through #4 configurations)

Table 3: Grid configuration in the test environment (nodes used from the Maeka, Amata, Magi, and Gass clusters for the configurations MPITH/grid #1 through MPITH/grid #4)

Figure 9 shows the running times of a Gaussian elimination application in the grid environments compared with the cluster environment. Table 3 shows the configuration of each grid environment. The results show that the application performs better in a cluster than in a grid environment. This is because of three factors. The first is the implementation of the MPITH algorithm itself: Gaussian elimination uses MPI_Scatter, but the current version of MPITH is not optimized for the grid version of this operation. Second, Gaussian elimination is a fine-grained application, so it performs better on a network with a faster interconnect. The third factor is the processor speed of the compute nodes: the nodes in the Maeka cluster are faster than those of the other clusters.
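To illustrate the communication pattern that makes this application sensitive to the interconnect (a schematic sketch, not the paper's benchmark code; it uses mpi4py, an illustrative problem size, and assumes the number of ranks divides the matrix dimension), a row-wise parallel Gaussian elimination scatters row blocks once and then broadcasts a pivot row at every elimination step:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    n = 1024                                   # illustrative problem size (assumes nprocs divides n)
    rows_per_rank = n // nprocs
    a = np.random.rand(n, n) if rank == 0 else None

    local = np.empty((rows_per_rank, n))
    comm.Scatter(a, local, root=0)             # one MPI_Scatter distributes contiguous row blocks

    pivot = np.empty(n)
    for k in range(n):                         # one broadcast per elimination step: fine-grained communication
        owner = k // rows_per_rank
        if rank == owner:
            pivot[:] = local[k % rows_per_rank]
        comm.Bcast(pivot, root=owner)
        # ... local row updates against the pivot row omitted ...

The single scatter is the operation the text notes is not yet grid-optimized in MPITH, while the per-step broadcasts explain the sensitivity to network latency.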
6. Conclusion

This paper presented a model of MPI process creation and routing based on the Hierarchical Virtual Cluster concept, together with its implementation as a component of a grid-enabled MPI runtime library. The model supports the proposed routing algorithm and ensures that tasks on a grid can always communicate, even when the tasks are located on nodes hidden behind NAT gateways or other closed cluster configurations. A master process running on the gateway node of each cluster performs the required message forwarding. For grid computing applications, this means that all nodes on a grid can be fully utilized, substantially increasing the computing power available to grid applications. Tests conducted using the MPITH experimental MPI runtime show that MPI routing imposes very little overhead and provides good performance. This approach requires neither kernel modification nor a separate routing process, both desirable features for administration and for enforcement of the grid security model.

7. References

[1] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, MPI: The Complete Reference, Volume 1 - The MPI Core, 2nd edition (MIT Press, 1998).
[2] W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, and M. Snir, MPI: The Complete Reference, Volume 2 - The MPI-2 Extensions (MIT Press, 1998).
[3] G. Fox, D. Gannon, and M. Thomas, Special Issue: Grid Computing Environments, Concurrency and Computation: Practice and Experience, 14(13-15), 2002.
[4] T. Vorakosit and P. Uthayopas, Developing a Thin and High Performance Implementation of Message Passing Interface, Proceedings of the Sixth Annual National Symposium on Computational Science and Engineering, Nakhonsithammarat, Thailand.
[5] C. Lee and D. Talia, Grid Programming Models: Current Tools, Issues and Directions, in Grid Computing: Making the Global Infrastructure a Reality (John Wiley and Sons, Chichester, UK, 2003).
[6] G. Burns, R. Daoud, and J. Vaigl, LAM: An Open Cluster Environment for MPI, Proceedings of the Supercomputing Symposium, 1994.
[7] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard, Parallel Computing, 22(6), 1996.
[8] E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain, D. J. Daniel, R. L. Graham, and T. S. Woodall, Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings of the 11th European PVM/MPI Users' Group Meeting, Budapest, Hungary, September 2004.
[9] R. Batchu, Y. S. Dandass, A. Skjellum, and M. Beddhu, MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware, Cluster Computing, 7, 2004.
[10] N. T. Karonis, B. Toonen, and I. Foster, MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface, Journal of Parallel and Distributed Computing, 63(5), 2003.
[11] T. Kielmann, R. F. H. Hofman, H. E. Bal, A. Plaat, and R. A. F. Bhoedjang, MagPIe: MPI's Collective Communication Operations for Clustered Wide Area Systems, Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Atlanta, GA, 1999, 131-140.
[12] Y. Ishikawa, M. Matsuda, T. Kudoh, H. Tezuka, and S. Sekiguchi, The Design of a Latency-Aware MPI Communication Library, Proceedings of SWOPP03, 2003.
[13] R. Keller, B. Krammer, M. S. Mueller, M. M. Resch, and E. Gabriel, MPI Development Tools and Applications for the Grid, Workshop on Grid Applications and Programming Tools, held in conjunction with the GGF8 meeting, Seattle, WA, USA, June 2003.
[14] O. Aumage and G. Mercier, MPICH/MADIII: A Cluster of Clusters Enabled MPI Implementation, Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid), 2003.
[15] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd edition (MIT Press, 2001).
[16] V. Varavidthaya and P. Uthayopas, ThaiGrid: Architecture and Overview, NECTEC Technical Journal, 2(9), 2000.
[17] T. Vorakosit and P. Uthayopas, Improving MPI Multicast Performance over Grid Environment using Intelligent Message Scheduling, Proceedings of the International Conference on Scientific and Engineering Computation, Singapore, 2004.
