Normal mode acoustic propagation models on heterogeneous networks of workstations

E.A. Vavalis
University of Crete, Mathematics Department, 714 09 Heraklion, GREECE
and IACM, FORTH, 711 10 Heraklion, GREECE

Abstract

A parallel implementation on a Single Instruction Multiple Data (SIMD) platform of an operational numerical sound propagation model is presented. We describe the parallel methodology used and we present certain implementation issues for porting the computer code to a network of heterogeneous workstations using the Parallel Virtual Machine (PVM) infrastructure.

1 Work supported in part by PENED grant 95-08.

Preprint submitted to Elsevier Science, 19 November 1996

1 Introduction

A parallel implementation on a Single Instruction Multiple Data (SIMD) platform of an operational numerical sound propagation model is presented. We describe the parallel methodology used and we present certain implementation issues for porting the computer code to a network of heterogeneous workstations using the Parallel Virtual Machine (PVM) infrastructure. The performance of the code on such a network is presented and analyzed. Instructions on using the parallelized computational model are also given.

2 Parallelism in SNAP

SNAP [3] is a sound propagation model based on normal mode theory, designed to treat a shallow-water ocean environment as realistically as possible. In this

model the acoustic field from a harmonic source at position $(0, z_0)$ can be written as

$$P(r, z) = \frac{\omega \rho_0}{4} \sum_{n=1}^{N} u_n(z_0)\, u_n(z)\, H_0^{(1)}(k_n r), \qquad (1)$$

where $\omega$ is the source frequency, $\rho_0$ is the water density, $(u_n, k_n)$ is the $n$-th modal eigenpair and $H_0^{(1)}$ is the zeroth-order Hankel function of the first kind. Most of the computation involved in this model is contained in a doubly nested loop. The outer loop runs over a certain large set of frequencies, while the inner one computes the eigenpairs by solving the eigenvalue problem defined by the Helmholtz equation

$$\frac{d^2 u_n(z)}{dz^2} + \left[ \frac{\omega^2}{c(z)^2} - k_n^2 \right] u_n(z) = 0, \qquad (2)$$

and appropriate boundary conditions. For production runs both the number of modes $n$ and the number of frequencies $n_f$ are large. Present-day computer power limits the size of these two numbers, leading to approximations that might be unsatisfactory from a physics point of view. Based on this computational structure, one can exploit parallelism in both nested loops, since each computational module inside them can be carried out independently. There are thus two levels of inherent parallelism, and in our approach we naturally decided to parallelize the outer loop.

For the parallel implementation of the SNAP code we used the PVM [2] infrastructure. PVM is a software package that allows a heterogeneous network of parallel and serial computers to appear as a single concurrent computational resource. Our PVM/SNAP code consists of a host/master program which, after performing certain initializations, splits the rest of the computation into a set of independent tasks by simply partitioning the frequencies. The master then spawns the child/slave processes associated with each subtask, assigns them to the members of the computer network, and waits for a termination message from each child/slave process; the children perform all the computation. Each child process, after receiving the range of frequencies assigned to it, computes the associated acoustic field, writes it to a file on its local disk (or to a common disk if all members of the computer network are on a network file system (NFS)), sends a completion message to the parent process and exits. It is worth pointing out that there is no interprocess communication between the child processes.
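The master/slave organization just described maps directly onto PVM's C interface. The following two-file sketch is only an illustration of that structure, not the actual PVM/SNAP source: the executable name snap_slave, the message tags and the compute_field() stub are our own hypothetical names, and the real code performs the full modal computation of equations (1) and (2).

/* ---------------- master.c : cc master.c -lpvm3 ---------------- */
#include <stdio.h>
#include "pvm3.h"

#define NFREQ     201   /* frequencies in the test problem (Section 3) */
#define NSLAVE    6     /* one slave task per machine */
#define TAG_RANGE 1     /* master -> slave: assigned frequency range */
#define TAG_DONE  2     /* slave -> master: completion message */

int main(void)
{
    int tids[NSLAVE], i, dummy;

    pvm_mytid();   /* enroll this process in the virtual machine */

    /* Spawn the slaves; PVM places one task per available host. */
    if (pvm_spawn("snap_slave", NULL, PvmTaskDefault, "",
                  NSLAVE, tids) != NSLAVE) {
        fprintf(stderr, "spawn failed\n");
        pvm_exit();
        return 1;
    }

    /* Equal static partition of the frequency indices. */
    for (i = 0; i < NSLAVE; i++) {
        int lo = i * NFREQ / NSLAVE;
        int hi = (i + 1) * NFREQ / NSLAVE - 1;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&lo, 1, 1);
        pvm_pkint(&hi, 1, 1);
        pvm_send(tids[i], TAG_RANGE);
    }

    /* Wait for a termination message from every child. */
    for (i = 0; i < NSLAVE; i++) {
        pvm_recv(-1, TAG_DONE);
        pvm_upkint(&dummy, 1, 1);
    }
    pvm_exit();
    return 0;
}

/* ------------- snap_slave.c : cc snap_slave.c -lpvm3 ------------- */
#include <stdio.h>
#include "pvm3.h"

#define TAG_RANGE 1
#define TAG_DONE  2

/* Stand-in for the real modal computation: the actual code solves
   eq. (2) for each frequency, evaluates eq. (1) and writes the
   field to a local file, as described above. */
static void compute_field(int lo, int hi)
{
    FILE *f = fopen("foo", "w");
    if (f) { fprintf(f, "frequencies %d..%d\n", lo, hi); fclose(f); }
}

int main(void)
{
    int lo, hi, done = 1;
    int ptid = pvm_parent();      /* task id of the spawning master */

    pvm_recv(ptid, TAG_RANGE);    /* receive the assigned range */
    pvm_upkint(&lo, 1, 1);
    pvm_upkint(&hi, 1, 1);

    compute_field(lo, hi);        /* no slave-to-slave communication */

    pvm_initsend(PvmDataDefault); /* report completion and exit */
    pvm_pkint(&done, 1, 1);
    pvm_send(ptid, TAG_DONE);
    pvm_exit();
    return 0;
}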

3 Performance

In this section we present some preliminary timing results that exhibit the increased efficiency of the parallelized SNAP code. Specifically, we have considered a sound propagation problem involving 201 frequencies and 35 modes. We solved this problem on a heterogeneous computer network consisting of:

- A SUN4 Sparcstation with domain name sonar.iacm.forth.gr.
- Two IBM/RS6000 workstations. These machines, with domain names akkali and apollon, are coupled together with NFS and belong in the domain iesl.forth.gr.
- Three HP-9000 workstations. Two of them (named nireas and orfeas) belong in the domain iesl.forth.gr, are configured as a cluster and are connected together with NFS. The third one (named n08hp) is in the cc.uch.gr domain and writes on its local disk.

It should be pointed out that the above workstations are physically located in three buildings which are several kilometers apart.

In Figure 1 we present the trace of the parallel execution of the program using three IBMs only, which was obtained using the XPVM parallel performance tool [1]. As expected, we can easily see that the master/host program, after performing a limited amount of computation, spawns the three child processes, which run in parallel until termination.

In Table 1 we give the total wall-clock execution times (in hours and minutes), the associated speedups and the configurations of the network system used. Specifically, we started our measurements using sonar only and kept adding machines as shown in the second column of Table 1. In the third column we give the total execution time (obtained using the timex command) and in the fourth the speedup obtained. (As the speedup with i processors we denote the ratio of the total time using one processor over the total time using i processors.) As this table shows, we achieve almost optimum speedup in all configurations, and we were able to reduce the total elapsed time by a factor of more than 5 using six different machines. It should be remarked here that the workload was equally distributed among the machines, which appear in the second column ordered from the slowest (sonar) to the fastest. In order to use arbitrarily selected machines, a workload partition strategy based on the speed and the load of each machine should be used.
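To make the speedup definition concrete, here is the six-machine entry of Table 1 below worked out, with each time converted to minutes (5:11 = 311, 1:00 = 60):

$$S_6 = \frac{T_1}{T_6} = \frac{311}{60} \approx 5.2,$$

which matches the last row of the table.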

Table 1
Total execution time and speed-up

Processors   Configuration   Total time   Speed-up
1            sonar           5:11         -
2            + akkali        2:37         1.98
3            + apollon       2:10         2.39
4            + n08hp         1:23         3.75
5            + nireas        1:07         4.64
6            + proteas       1:00         5.20

4 Load balancing the normal mode computations

5 Usage

We first assume that PVM is installed on all the machines we plan to use. The parallel SNAP code consists of a master/host program and a child/node program. To build the two associated executables one needs to compile the code with the -lfpvm3 -lpvm3 flags. The node executables should be placed in the directory $(HOME)/pvm3/bin/ARCH, where ARCH represents the architecture of each machine and has the values HPPA for the HP-9000 workstations, SUN4 for the SUN4 Sparcstation, CNVXN for the CONVEX C2 and RS6K for the IBM/RS6000.

To specify the machines we plan to use, we should create a file (named, say, hostfile) which contains all the hosts chosen to run SNAP. The first row of this file should contain the domain name of the host machine, while the remaining lines hold the names of the node machines and the user ids (they do not have to be the same on all nodes), as shown below.

n08hp.cc.uch.gr
nireas.iesl.forth.gr lo=mav pw
orpheas.iesl.forth.gr lo=mav pw
ikaros.cc.uch.gr lo=mav pw
athina.cc.uch.gr lo=mav pw
pasifae.cc.uch.gr lo=mav pw

To run SNAP on the specified computer platform we need to have, in the $(HOME) directory, two input files named data.dat and numbers.dat. The first one contains the input data for SNAP, and the second contains the integer i on line i, for i = 1, ..., nmachs - 1, where nmachs is the number of machines to be used. We start the execution by typing pvmd hostfile.
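Since the numbers.dat layout is easy to get wrong, the short helper below writes the file in the format just described. It is our own sketch, not part of the SNAP distribution, and the machine count is an assumed example value.

#include <stdio.h>

/* Write numbers.dat: line i holds the integer i, for i = 1 .. nmachs-1. */
int main(void)
{
    int nmachs = 6;  /* assumed number of machines, cf. Table 1 */
    int i;
    FILE *f = fopen("numbers.dat", "w");

    if (f == NULL)
        return 1;
    for (i = 1; i <= nmachs - 1; i++)
        fprintf(f, "%d\n", i);
    fclose(f);
    return 0;
}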

Each child process writes its output to a file named foo. Additional details on the usage of PVM and XPVM can be found in [2] and [1], and a complete running example of the parallel SNAP is in the directory mav/snap on n08hp.cc.uch.gr.

References

[1] T. Dunigan, XPVM, Tech. Report ORNL/TM-10881, Mathematical Sciences Section, Oak Ridge National Laboratory, Oak Ridge, TN, Sept. 1988. 18 pages.

[2] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Networked Parallel Computing, The MIT Press, Cambridge, MA, 1994.

[3] F. Jensen and M. Ferla, SNAP: the SACLANTCEN normal-mode acoustic propagation model, Tech. Report SM-121, SACLANT ASW Research Centre, La Spezia, Italy, Jan. 1979.