Introduction to Parallel Processing. Lecture #10 May 2002 Guy Tel-Zur

1 Introduction to Parallel Processing Lecture #10 May 2002 Guy Tel-Zur

2 Topics: Parallel Numerical Algorithms (Wilkinson & Allen's book, Chapter 10); More MPI Commands; Home assignment #5

3 Wilkinson & Allen PDF

4 Direct, Recursive and Mesh; Gauss Elimination; Jacobi; Gauss-Seidel; Red-Black ordering; Over-relaxation

5 Matrix Addition; Matrix Multiplication; Matrix-Vector Multiplication; Linear Equations; Matrix Multiplication; Recursive Implementation; Mesh Implementation; 2D pipeline; Systolic Array; Gauss Elimination; Jacobi Iteration

6 Gauss-Seidel Relaxation; Red-Black Ordering; Over-relaxation; Multi-Grid

7 Intermediate MPI. Parts from the book Using MPI by Gropp, Lusk and Skjellum, Chapter 4. The source code can be downloaded from:

8 Topics: The Poisson Problem; Topologies; Jacobi Iterations

9 Terminology
The general form of a second-order linear PDE:
$$a\,\frac{\partial^2 u}{\partial x^2} + b\,\frac{\partial^2 u}{\partial x\,\partial y} + c\,\frac{\partial^2 u}{\partial y^2} + d\,\frac{\partial u}{\partial x} + e\,\frac{\partial u}{\partial y} + f = 0$$
(y denotes time for hyperbolic and parabolic equations)
Analogous to the solutions of the general quadratic equation
$$a x^2 + b x y + c y^2 + d x + e y + f = 0$$
Ellipse: $4ac - b^2 > 0$
Hyperbola: $4ac - b^2 < 0$
Parabola: $4ac - b^2 = 0$ (heat equation)
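The classification rule above can be sketched in a few lines of C. This is only an illustration: the names `pde_kind` and `classify_pde` are hypothetical and do not come from the lecture, and the exact comparison against zero assumes the coefficients are given exactly.

```c
#include <assert.h>

/* Hypothetical names, introduced only for this sketch. */
typedef enum { PDE_ELLIPTIC, PDE_HYPERBOLIC, PDE_PARABOLIC } pde_kind;

/* Classify a*u_xx + b*u_xy + c*u_yy + (lower-order terms) = 0
   by the sign of 4ac - b^2, as on the Terminology slide. */
pde_kind classify_pde(double a, double b, double c)
{
    /* At least one second-order coefficient must be nonzero. */
    assert(a != 0.0 || b != 0.0 || c != 0.0);

    double disc = 4.0 * a * c - b * b;
    if (disc > 0.0) return PDE_ELLIPTIC;    /* ellipse   */
    if (disc < 0.0) return PDE_HYPERBOLIC;  /* hyperbola */
    return PDE_PARABOLIC;                   /* parabola  */
}
```

For example, the 2D Laplace operator (a = 1, b = 0, c = 1) comes out elliptic, while the heat equation with y as time (a = 1, b = 0, c = 0) comes out parabolic.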

10 Poisson's equation arises in many models
1D: $\partial^2 u/\partial x^2 = f(x)$
2D: $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = f(x,y)$
3D: $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 + \partial^2 u/\partial z^2 = f(x,y,z)$
Heat flow: Temperature(position, time)
Diffusion: Concentration(position, time)
Electrostatic or Gravitational Potential: Potential(position)
Fluid flow: Velocity, Pressure, Density(position, time)
Quantum mechanics: Wave-function(position, time)
Elasticity: Stress, Strain(position, time)

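11 Poisson's equation in 1D: $\partial^2 u/\partial x^2 = f(x)$
Discretize on a regular mesh $u_i = u(i h)$ to get
$$\frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} = f(x_i), \qquad x_i = i h$$
Write as solving $T u = -h^2 f$ for $u$, where
$$T = \begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix}$$
(Figure: graph and stencil.)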
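Multiplying the difference formula by $-h^2$ gives rows of the form $-u_{i-1} + 2u_i - u_{i+1}$, so T is tridiagonal with 2 on the diagonal and -1 next to it. A minimal sketch of applying T without storing it follows; the function name `apply_t_1d` is hypothetical, and zero values outside the mesh (homogeneous Dirichlet boundaries) are an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* out = T*u, where T is the n-by-n tridiagonal matrix with 2 on the
   diagonal and -1 on the sub- and super-diagonals. Points outside
   0..n-1 are treated as zero (assumed homogeneous Dirichlet boundary). */
void apply_t_1d(const double *u, double *out, size_t n)
{
    assert(u != NULL && out != NULL && u != out);
    for (size_t i = 0; i < n; ++i) {
        double left  = (i > 0)     ? u[i - 1] : 0.0;
        double right = (i + 1 < n) ? u[i + 1] : 0.0;
        out[i] = -left + 2.0 * u[i] - right;
    }
}
```

Each output entry touches only a point and its two neighbors, which is the 1D stencil and the reason the matrix never needs to be formed explicitly.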

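12 2D Poisson's equation: $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = f(x,y)$
Similar to the 1D case, but the matrix T now has 4 on the diagonal and -1 for each of a point's four grid neighbors.
Grid points are numbered left to right, top row to bottom row.
(Figure: graph and stencil.)
3D is analogous.
Similar adjacency matrix for an arbitrary graph.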

13 Composite mesh from a mechanical structure

14 Converting the mesh to a matrix

15 Irregular mesh: NASA Airfoil in 2D (direct solution)

16 Adaptive Mesh Refinement (AMR) Adaptive mesh around an explosion John Bell and Phil Colella at LBL

17 Problem Definition
$$\nabla^2 u = f(x,y) \quad \text{in the interior}$$
$$u(x,y) = g(x,y) \quad \text{on the boundary}$$
Define a square mesh (grid):
$$x_i = \frac{i}{n+1},\ i = 0,\dots,n+1; \qquad y_j = \frac{j}{n+1},\ j = 0,\dots,n+1$$

18 Discretization
Poisson equation: $-4u_{i,j} + u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} = h^2 f_{i,j}$
Jacobi iterations: $u^{k+1}_{i,j} = \tfrac{1}{4}\left(u^{k}_{i-1,j} + u^{k}_{i+1,j} + u^{k}_{i,j-1} + u^{k}_{i,j+1} - h^2 f_{i,j}\right)$
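A minimal serial sketch of one Jacobi sweep in C may help connect the update formula to code. It is not the listing from the slides: the function name `jacobi_sweep`, the row-major storage and the (n+2)-by-(n+2) layout with a one-point boundary frame are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One Jacobi sweep on an (n+2)-by-(n+2) grid stored row-major.
   Interior points have indices 1..n in each direction; rows and
   columns 0 and n+1 hold boundary values and are left untouched.
   unew(i,j) = 1/4 * (four neighbors of u - h^2 * f(i,j)). */
void jacobi_sweep(const double *u, double *unew, const double *f,
                  size_t n, double h)
{
    assert(u != unew);          /* Jacobi reads old values only */
    size_t w = n + 2;           /* row width including boundary */
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= n; ++j) {
            unew[i * w + j] = 0.25 * (u[(i - 1) * w + j] + u[(i + 1) * w + j]
                                    + u[i * w + j - 1]   + u[i * w + j + 1]
                                    - h * h * f[i * w + j]);
        }
    }
}
```

Because every new value depends only on the previous iterate, the sweep writes into a second array; the caller swaps the roles of the two arrays between iterations.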

19 5-point stencil approximation for the 2-D Poisson problem

20 Jacobi Iteration Serial Version (Fortran)

21 Jacobi Iteration Serial Version (C)

22 Finite Difference Algorithm

23 Jacobi Iteration for a Slice (s = start, e = end)

24 1-D Decomposition of the Domain

26 Ghost Points
      double precision u(0:n+1,s-1:e+1)

27 Topology: Virtual Topology; Cartesian Topology. In the next slides: 2-D Cartesian Topology

28 2D Cartesian Decomposition 4 x 3 domain

29 Defining Cartesian Topologies
Our next task is to define how to assign processes to each part of the decomposed domain.
MPI lets the user specify various application topologies.
The routine MPI_Cart_create() creates a Cartesian decomposition of the processes, with the number of dimensions given by the ndim argument.
It creates a new communicator with the same processes as the input communicator, but with the specified topology.
    dims[0] = 4; dims[1] = 3;
    periods[0] = 0; periods[1] = 0;   /* specify if connection is with wrap round */
    ndim = 2;
    MPI_Cart_create(MPI_COMM_WORLD, ndim, dims, periods, reorder, &comm2d);

30 Domain Decomposition
C bindings:
    MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *isperiodic, int reorder, MPI_Comm *new_comm)
    MPI_Cart_get -

31 MPI_CART_CREATE
      integer dims(2)
      logical isperiodic(2), reorder
      dims(1) = 4
      dims(2) = 3
      isperiodic(1) = .false.
      isperiodic(2) = .false.
      reorder = .true.
      ndim = 2
      call MPI_CART_CREATE(MPI_COMM_WORLD, ndim, dims, isperiodic,
     $                     reorder, comm2d, ierr)

32 To determine the coordinates of a calling process
FORTRAN examples:
      call MPI_CART_GET(comm2d, 2, dims, periods, coords, ierr)
      print *, '(', coords(1), ',', coords(2), ')'

      call MPI_COMM_RANK(comm2d, myrank, ierr)
      call MPI_CART_COORDS(comm2d, myrank, 2, coords, ierr)

33 2-Step Process to Transfer Data

34 More Exotic MPI Functions MPI_Cart_shift(MPI_Comm comm, int direction, int displ, int *src, int *dest)

35 MPI_Cart_shift
    /* create cartesian topology for processes */
    dims[0] = nrow;   /* number of rows */
    dims[1] = mcol;   /* number of columns */
    period[0] = 1;    /* cyclic in this direction */
    period[1] = 0;    /* not cyclic in this direction */
    MPI_Cart_create(MPI_COMM_WORLD, ndim, dims, period, reorder, &comm2d);
    MPI_Comm_rank(comm2d, &me);
    MPI_Cart_coords(comm2d, me, ndim, coords);
    index = 0;        /* shift along the 1st index (out of 2) */
    displ = 1;        /* shift by 1 */
    /* source and dest1 are outputs: the ranks to receive from and to send to */
    MPI_Cart_shift(comm2d, index, displ, &source, &dest1);
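To see what the shift returns, the neighbor arithmetic can be emulated without MPI. The sketch below relies on the row-major rank numbering that MPI uses for Cartesian topologies; the function names are hypothetical, and `SKETCH_PROC_NULL` is a stand-in for `MPI_PROC_NULL`, whose actual value is implementation-defined.

```c
#include <assert.h>

/* Stand-in for MPI_PROC_NULL; -1 is an assumption made for this sketch. */
#define SKETCH_PROC_NULL (-1)

/* Row-major mapping from rank to coordinates on a dims[0] x dims[1] grid. */
void rank_to_coords(int rank, const int dims[2], int coords[2])
{
    coords[0] = rank / dims[1];
    coords[1] = rank % dims[1];
}

/* Map coordinates back to a rank, wrapping in periodic directions and
   returning SKETCH_PROC_NULL when a non-periodic edge is crossed. */
int coords_to_rank(const int coords[2], const int dims[2],
                   const int periods[2])
{
    int c[2];
    for (int d = 0; d < 2; ++d) {
        c[d] = coords[d];
        if (c[d] < 0 || c[d] >= dims[d]) {
            if (!periods[d]) return SKETCH_PROC_NULL;
            c[d] = ((c[d] % dims[d]) + dims[d]) % dims[d];
        }
    }
    return c[0] * dims[1] + c[1];
}

/* Emulates the result of a shift: *dest is the rank displ steps ahead
   along `direction`, *src is the rank displ steps behind. */
void cart_shift_2d(int rank, const int dims[2], const int periods[2],
                   int direction, int displ, int *src, int *dest)
{
    assert(direction == 0 || direction == 1);
    int ahead[2], behind[2];
    rank_to_coords(rank, dims, ahead);
    rank_to_coords(rank, dims, behind);
    ahead[direction]  += displ;
    behind[direction] -= displ;
    *dest = coords_to_rank(ahead, dims, periods);
    *src  = coords_to_rank(behind, dims, periods);
}
```

On a 4 x 3 grid that is cyclic only in the first direction, rank 0 shifting by 1 along direction 0 gets dest = 3 and src = 9 (wrapped around), while along direction 1 it gets dest = 1 and no source.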

36 MPI_PROC_NULL
! Compute neighbors
      IF (myrank .eq. 0) THEN
         left = MPI_PROC_NULL
      ELSE
         left = myrank - 1
      END IF
      IF (myrank .eq. p-1) THEN
         right = MPI_PROC_NULL
      ELSE
         right = myrank + 1
      END IF

37 MPE_DECOMP1D
Determine the array limits (s and e in our code):
      call MPE_DECOMP1D(n, nprocs, myrank, s, e)
Where:
nprocs = number of processes in the Cartesian coordinates
myrank = Cartesian coordinate of the calling process
n = size of the array (1..n)

38 MPE_DECOMP1D
When nprocs divides n evenly (with myrank counted from 0), this is similar to:
      s = 1 + myrank*(n/nprocs)
      e = s + (n/nprocs) - 1

39 MPE_DECOMP1D
C This file contains a routine for producing a decomposition of
C a 1-d array when given a number of processors.
C It may be used in "direct" product decomposition.
C The values returned assume a "global" domain in [1:n]
      subroutine MPE_DECOMP1D( n, numprocs, myid, s, e )
      integer n, numprocs, myid, s, e
      integer nlocal
      integer deficit

40 MPE_DECOMP1D
      nlocal = n / numprocs
      s = myid * nlocal + 1
      deficit = mod(n,numprocs)
      s = s + min(myid,deficit)
      if (myid .lt. deficit) then
         nlocal = nlocal + 1
      endif
      e = s + nlocal - 1
      if (e .gt. n .or. myid .eq. numprocs-1) e = n
      return
      end
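The same block decomposition can be written in C to check the index arithmetic by hand. This is a direct port of the Fortran logic above under a hypothetical name, `decomp1d`; it is not part of the MPE library.

```c
#include <assert.h>

/* Hypothetical C port of the block decomposition shown above.
   Global indices run 1..n, as in the Fortran routine. The first
   (n mod numprocs) processes each receive one extra element. */
void decomp1d(int n, int numprocs, int myid, int *s, int *e)
{
    assert(numprocs > 0 && myid >= 0 && myid < numprocs);

    int nlocal  = n / numprocs;
    int deficit = n % numprocs;

    /* Start index, shifted by the extra elements given to lower ranks. */
    *s = myid * nlocal + 1 + (myid < deficit ? myid : deficit);
    if (myid < deficit) nlocal++;

    *e = *s + nlocal - 1;
    if (*e > n || myid == numprocs - 1) *e = n;
}
```

With n = 10 and three processes, the ranges come out as 1..4, 5..7 and 8..10, so the remainder is absorbed by the lowest-numbered process rather than left to the last one.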

41 A code to exchange data for ghost points using blocking send/recv
      subroutine exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
      include "mpif.h"
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1)
      integer comm1d, nbrbottom, nbrtop
      integer status(MPI_STATUS_SIZE), ierr
c Send the top interior row up, receive the bottom ghost row
      call MPI_SEND( a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0,
     $               comm1d, ierr )
      call MPI_RECV( a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0,
     $               comm1d, status, ierr )
c Send the bottom interior row down, receive the top ghost row
      call MPI_SEND( a(1,s), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1,
     $               comm1d, ierr )
      call MPI_RECV( a(1,e+1), nx, MPI_DOUBLE_PRECISION, nbrtop, 1,
     $               comm1d, status, ierr )
      return
      end

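42 The previous example was simple, but it is not necessarily the best way to implement the exchange of ghost points: each MPI_SEND may block until the matching receive is posted, so the exchange can serialize across processes.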

43 sendrecv (exchange data ver. 2)
      subroutine exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
      include "mpif.h"
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1)
      integer comm1d, nbrbottom, nbrtop
      integer status(MPI_STATUS_SIZE), ierr
      call MPI_SENDRECV(
     $     a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0,
     $     a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0,
     $     comm1d, status, ierr )
      call MPI_SENDRECV(
     $     a(1,s), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1,
     $     a(1,e+1), nx, MPI_DOUBLE_PRECISION, nbrtop, 1,
     $     comm1d, status, ierr )
      return
      end

44 Implementation of the Jacobi Iteration-1
      program main
      include "mpif.h"
      integer maxn
      parameter (maxn = 128)
      double precision a(maxn,maxn), b(maxn,maxn), f(maxn,maxn)
      integer nx, ny
      integer myid, numprocs, ierr
      integer comm1d, nbrbottom, nbrtop, s, e, it
      double precision diff, diffnorm, dwork
      double precision t1, t2
      double precision MPI_WTIME
      external MPI_WTIME
      external diff
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )

45 Implementation of the Jacobi Iteration-2
      if (myid .eq. 0) then
c
c        Get the size of the problem
c
         print *, 'Enter nx'
         read *, nx
         nx = 110
      endif
      call MPI_BCAST(nx,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
      ny = nx
c Get a new communicator for a decomposition of the domain
      call MPI_CART_CREATE(MPI_COMM_WORLD, 1, numprocs, .false.,
     $                     .true., comm1d, ierr )

46 Implementation of the Jacobi Iteration-3
c
c Get my position in this communicator, and my neighbors
c
      call MPI_COMM_RANK( comm1d, myid, ierr )
      call MPI_Cart_shift( comm1d, 0, 1, nbrbottom, nbrtop, ierr )
c
c Compute the actual decomposition
c
      call MPE_DECOMP1D( ny, numprocs, myid, s, e )
c
c Initialize the right-hand-side (f) and the initial solution guess (a)
c
      call onedinit( a, b, f, nx, s, e )

47 Implementation of the Jacobi Iteration-4
C Actually do the computation. Note the use of a collective
C operation to check for convergence, and a do-loop to bound the
C number of iterations.
      call MPI_BARRIER( MPI_COMM_WORLD, ierr )
      t1 = MPI_WTIME()
      do 10 it=1, 100
         call exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
         call sweep1d( a, f, nx, s, e, b )
         call exchng1( b, nx, s, e, comm1d, nbrbottom, nbrtop )
         call sweep1d( b, f, nx, s, e, a )
         dwork = diff( a, b, nx, s, e )
         call MPI_Allreduce( dwork, diffnorm, 1,
     $        MPI_DOUBLE_PRECISION, MPI_SUM, comm1d, ierr )
         if (diffnorm .lt. 1.0e-5) goto 20
         if (myid .eq. 0) print *, 2*it, ' Difference is ', diffnorm
 10   continue

48 Implementation of the Jacobi Iteration-5
      if (myid .eq. 0) print *, 'Failed to converge'
 20   continue
      t2 = MPI_WTIME()
      if (myid .eq. 0) then
         print *, 'Converged after ', 2*it, ' Iterations in ',
     $        t2 - t1, ' secs '
      endif
c
      call MPI_FINALIZE(ierr)
      end

49 Implementation of the Jacobi Iteration-6
c Perform a Jacobi sweep for a 1-d decomposition.
c Sweep from a into b
      subroutine sweep1d( a, f, nx, s, e, b )
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1), f(0:nx+1,s-1:e+1),
     +                 b(0:nx+1,s-1:e+1)
      integer i, j
      double precision h
      h = 1.0d0 / dble(nx+1)
      do 10 j=s, e
         do 10 i=1, nx
c           h*h*f enters with a minus sign inside the 1/4 factor,
c           matching the Jacobi update on the Discretization slide
            b(i,j) = 0.25 * (a(i-1,j)+a(i,j+1)+a(i,j-1)+a(i+1,j)
     +               - h * h * f(i,j))
 10   continue
      return
      end

50 Implementation of the Jacobi Iteration-7
c
c The rest of the 1-d program
c
      double precision function diff( a, b, nx, s, e )
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1), b(0:nx+1,s-1:e+1)
      double precision sum
      integer i, j
      sum = 0.0d0
      do 10 j=s, e
         do 10 i=1, nx
            sum = sum + (a(i,j) - b(i,j)) ** 2
 10   continue
      diff = sum
      return
      end
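The convergence test works because the squared difference is a plain sum: each process computes the sum over its own slice, and adding the per-process results (the job of MPI_Allreduce with MPI_SUM in the main loop) gives the same value as a sum over the whole array. The sketch below shows this on a flat array without MPI; the name `sq_diff` and the half-open index range are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squared differences over the half-open index range [lo, hi).
   In the parallel code each process would call this on its own slice
   and the partial sums would then be combined by a global reduction. */
double sq_diff(const double *a, const double *b, size_t lo, size_t hi)
{
    assert(lo <= hi);
    double sum = 0.0;
    for (size_t i = lo; i < hi; ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Note that the value compared against the tolerance is the squared norm, since no square root is taken, which matches the Fortran function above.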

51 Timing for variants of the 1-D decomposition of the Poisson problem
Table columns: P; Blocking Send; Ordered Send; Sendrecv; Buffered Send; Non-blocking Isend
e ~ 20% (1/14 faster than 1 proc)

53 Grid computing
