Normal mode acoustic propagation models on heterogeneous networks of workstations

E.A. Vavalis

University of Crete, Mathematics Department, 714 09 Heraklion, Greece, and IACM, FORTH, 711 10 Heraklion, Greece.

Work supported in part by PENED grant 95-08.

Preprint submitted to Elsevier Science, 19 November 1996.

Abstract

A parallel implementation on a Single Program Multiple Data (SPMD) platform of an operational numerical sound propagation model is presented. We describe the parallel methodology used and we present certain implementation issues for porting the computer code to a network of heterogeneous workstations using the Parallel Virtual Machine (PVM) infrastructure.

1 Introduction

A parallel implementation on a Single Program Multiple Data (SPMD) platform of an operational numerical sound propagation model is presented. We describe the parallel methodology used and we present certain implementation issues for porting the computer code to a network of heterogeneous workstations using the Parallel Virtual Machine (PVM) infrastructure. The performance of the code on such a network is presented and analyzed. Instructions on using the parallelized computational model are also given.

2 Parallelism in SNAP

SNAP [3] is a sound propagation model based on normal mode theory, designed to treat a shallow-water ocean environment as realistically as possible.
In this model the acoustic field from a harmonic source at position $(0, z_0)$ can be written as

$$P(r,z) = \frac{\omega \rho_0}{4} \sum_{n=1}^{N} u_n(z_0)\, u_n(z)\, H_0^{(1)}(k_n r), \qquad (1)$$

where $\omega$ is the source frequency, $\rho_0$ is the water density, $(u_n, k_n)$ is the $n$-th modal eigenpair, and $H_0^{(1)}$ is the zeroth-order Hankel function of the first kind. Most of the computation involved in this model is contained in a doubly nested loop. The outer loop runs over a certain large set of frequencies, while the inner one computes the eigenpairs by solving the eigenvalue problem defined by the Helmholtz equation

$$\frac{d^2 u_n(z)}{dz^2} + \left[ \frac{\omega^2}{c^2(z)} - k_n^2 \right] u_n(z) = 0, \qquad (2)$$

where $c(z)$ is the sound-speed profile, together with appropriate boundary conditions. For production runs both the number of modes $N$ and the number of frequencies $n_f$ are large. Present-day computer power limits the size of these two numbers, leading to approximations that might be unsatisfactory from a physics point of view.

Based on this computational structure one can exploit parallelism in both nested loops, since each computational module inside them can be carried out independently. There are thus two levels of inherent parallelism, and in our approach we naturally decided to parallelize the outer loop.

For the parallel implementation of the SNAP code we used the PVM [2] infrastructure. PVM is a software package that allows a heterogeneous network of parallel and serial computers to appear as a single concurrent computational resource. Our PVM/SNAP code consists of a host/master program which, after performing certain initializations, splits the rest of the computation into a set of independent tasks by simply partitioning the frequencies. The master then spawns the child/slave processes associated with each subtask, assigns them to the members of the computer network, and waits for a termination message from each child/slave process; the children perform all the computation. Each child process, after receiving the range of frequencies assigned to it, computes the associated acoustic field, writes it to a file on its local disk (or to a common disk if all members of the computer network share a network file system (NFS)), sends a completion message to the parent process, and exits. It is worth pointing out that there is no interprocess communication between the child processes.
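This master/node scheme can be summarized by the following minimal C sketch built on the standard PVM 3 interface. It is only a sketch under stated assumptions, not the actual SNAP source (which is Fortran, cf. the -lfpvm3 flag in Section 5): the executable name snap_node, the message tags, and the frequency and task counts are all illustrative.

/* Minimal sketch of the master/node scheme described above, using
 * the standard PVM 3 C interface.  Compile to an executable named
 * snap_node so that the spawned copies run the node branch.
 * Names, tags, and counts are illustrative assumptions. */
#include <stdio.h>
#include "pvm3.h"

#define NTASK     6   /* number of node processes (assumed)    */
#define NFREQ   200   /* total number of frequencies (assumed) */
#define TAG_WORK  1   /* master -> node: frequency range       */
#define TAG_DONE  2   /* node -> master: completion message    */

int main(void)
{
    int tids[NTASK], i, first = 0, chunk = NFREQ / NTASK;

    (void) pvm_mytid();                       /* enroll in PVM */
    if (pvm_parent() == PvmNoParent) {        /* master branch */
        /* spawn the node processes; with PvmTaskDefault PVM picks the
         * hosts itself (PvmTaskHost would pin a task to a named host) */
        if (pvm_spawn("snap_node", NULL, PvmTaskDefault, "",
                      NTASK, tids) != NTASK) {
            fprintf(stderr, "spawn failed\n");
            pvm_exit();
            return 1;
        }
        for (i = 0; i < NTASK; i++) {         /* partition the frequencies */
            int last = (i == NTASK - 1) ? NFREQ - 1 : first + chunk - 1;
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&first, 1, 1);
            pvm_pkint(&last, 1, 1);
            pvm_send(tids[i], TAG_WORK);
            first = last + 1;
        }
        for (i = 0; i < NTASK; i++)           /* wait for all children */
            pvm_recv(-1, TAG_DONE);
    } else {                                  /* node (child) branch */
        int lo, hi;
        pvm_recv(pvm_parent(), TAG_WORK);     /* receive frequency range */
        pvm_upkint(&lo, 1, 1);
        pvm_upkint(&hi, 1, 1);
        /* ... compute the acoustic field for frequencies lo..hi and
         * write it to a local or NFS-shared file ... */
        pvm_initsend(PvmDataDefault);         /* empty completion message */
        pvm_send(pvm_parent(), TAG_DONE);
    }
    pvm_exit();
    return 0;
}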
3 Performance

In this section we present some preliminary timing results that exhibit the increased efficiency of the parallelized SNAP code. Specifically, we considered a sound propagation problem involving 01 frequencies and 35 modes. We solved this problem on a heterogeneous computer network consisting of:

- A SUN4 Sparcstation with domain name sonar.iacm.forth.gr.
- Two IBM RS/6000 workstations. These machines, with domain names akkali and apollon, belong to the domain iesl.forth.gr and are coupled together with NFS.
- Three HP-9000 workstations. Two of them (named nireas and orfeas) belong to the domain iesl.forth.gr, are configured as a cluster, and are connected together with NFS. The third one (named n08hp) is in the cc.uch.gr domain and writes on its local disk.

It should be pointed out that the above workstations are physically located in three buildings which are several kilometers apart.

In Figure 1 we present the trace of the parallel execution of the program using three IBMs only, obtained with the XPVM parallel performance tool [1]. As expected, we can easily see that the master/host program, after performing a limited amount of computation, spawns the three child processes, which run in parallel until termination.

In Table 1 we give the total wall-clock execution time (in hours and minutes), the associated speedups, and the configurations of the network system used. Specifically, we started our measurements using sonar only and kept adding machines as shown in the second column of Table 1. In the third column we give the total execution time (obtained using the timex command) and in the fourth the speedup obtained. (As the speedup with i processors we denote the ratio of the total time using one processor over the total time using i processors.) As this table shows, we achieve almost optimal speed-up in all configurations, and we were able to reduce the total elapsed time by a factor of more than 5 using six different machines. It should be remarked here that the workload was equally distributed among the machines, which appear in the second column ordered from the slowest (sonar) to the fastest. In order to use arbitrarily selected machines, a workload partition strategy based on the speed and the load of each machine should be used; a minimal sketch of such a strategy is given below.
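The speed-based partition just mentioned (the subject of Section 4) could look like the following C sketch, which assigns to each machine a share of the frequencies proportional to its relative speed. The speed figures, machine count, and frequency count are invented for illustration; they are not measurements from this paper.

/* Sketch of a speed-weighted workload partition: machine i receives
 * a share of the NFREQ frequencies proportional to speed[i].  All
 * numbers below are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    double speed[] = { 1.0, 1.6, 1.6, 2.2, 2.4, 2.4 };  /* assumed */
    int nmachs = 6, nfreq = 200, i, first = 0;          /* assumed */
    double total = 0.0;

    for (i = 0; i < nmachs; i++)
        total += speed[i];

    for (i = 0; i < nmachs; i++) {
        /* proportional share; the last machine absorbs the rounding
         * remainder so that every frequency is assigned exactly once */
        int n = (i == nmachs - 1) ? nfreq - first
                                  : (int)(nfreq * speed[i] / total);
        printf("machine %d: frequencies %d..%d\n", i, first, first + n - 1);
        first += n;
    }
    return 0;
}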
Table 1
Total execution time and speed-up

Processors   Configuration   Total Time   Speed-up
    1        sonar              5:11
    2        + akkali           2:37        1.98
    3        + apollon          2:10        2.39
    4        + n08hp            1:23        3.75
    5        + nireas           1:07        4.64
    6        + proteas          1:00        5.0

4 Load balancing the normal mode computations

5 Usage

We first assume that PVM is installed on all the machines we plan to use. The parallel SNAP code consists of a master/host program and a child/node program. To build the two associated executables one needs to compile the code with the -lfpvm3 -lpvm3 flags. The node executables should be placed in the directory $(HOME)/pvm3/bin/ARCH, where ARCH represents the architecture of each machine and has the values HPPA for the HP-9000 workstations, SUN4 for the SUN4 Sparcstation, CNVXN for the CONVEX C, and RS6K for the IBM RS/6000. To specify the machines we plan to use, we should create a file (named, say, hostfile) which contains all the hosts chosen to run SNAP. The first row of this file should contain the domain name of the host machine, while the remaining lines hold the names of the node machines and the user ids (they do not have to be the same on all nodes), as shown below.

n08hp.cc.uch.gr
nireas.iesl.forth.gr lo=mav pw
orpheas.iesl.forth.gr lo=mav pw
ikaros.cc.uch.gr lo=mav pw
athina.cc.uch.gr lo=mav pw
pasifae.cc.uch.gr lo=mav pw

To run SNAP on the specified computer platform we need to have, in the $(HOME) directory, two input files named data.dat and numbers.dat. The first one contains the input data for SNAP; the second contains nmachs - 1 lines, with line i holding the number i, for i = 1, ..., nmachs - 1, where nmachs is the number of machines to be used. A minimal sketch generating this file is given below. We start the execution by typing pvmd hostfile.
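The description of numbers.dat above can be made concrete with the following C sketch, which writes one line per node process. The interpretation of the file format is our reading of the text and should be checked against the actual SNAP input routines; the machine count is an assumption.

/* Sketch: generate numbers.dat as described above, with line i
 * holding the number i for i = 1..nmachs-1.  The format is our
 * assumed reading of the text; nmachs is illustrative. */
#include <stdio.h>

int main(void)
{
    int i, nmachs = 6;                       /* assumed machine count */
    FILE *fp = fopen("numbers.dat", "w");

    if (fp == NULL) {
        perror("numbers.dat");
        return 1;
    }
    for (i = 1; i <= nmachs - 1; i++)        /* lines 1 .. nmachs-1 */
        fprintf(fp, "%d\n", i);
    fclose(fp);
    return 0;
}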
Each node process writes its output to a file named foo. Additional details on the usage of PVM and XPVM can be found in [2] and [1], and a complete running example of the parallel SNAP is in the directory mav/snap on n08hp.cc.uch.gr.

References

[1] T. Dunigan, XPVM, Tech. Report ORNL/TM-10881, Mathematical Sciences Section, Oak Ridge National Laboratory, Oak Ridge, TN, Sept. 1988. 18 pages.

[2] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Networked Parallel Computing, The MIT Press, Cambridge, MA, 1994.

[3] F. Jensen and M. Ferla, SNAP: the SACLANTCEN normal-mode acoustic propagation model, Tech. Report SM-121, SACLANT ASW Research Centre, La Spezia, Italy, Jan. 1979.