CPS343 Parallel and High Performance Computing Project 1 Spring 2018



Assignment

Write a program using OpenMP to compute an estimate of the dominant eigenvalue of a matrix.

Due: Wednesday, March 21

The program /gc/cps343/matrix/pm seqcc (found on the workstation cluster, along with the associated file /gc/cps343/matrix/readmatrixseqcc) is a sequential program that opens and reads an HDF5 file to obtain matrix data and then uses the power method to estimate the dominant eigenvalue. Starting with this program (or writing your own from scratch), you will need to use OpenMP so that your program takes advantage of all available CPU cores when carrying out the power method.

Minimum Requirement (60% of maximum): Make appropriate use of the #pragma omp parallel for directive to speed up your program. Focus on the power method section of the main program. Note that with this approach a team of threads is started for each for-loop and then joined when the loop completes, and this process is repeated for every parallel loop.

Option 1 (additional 30%): Create a version of your program that uses a single #pragma omp parallel directive. In contrast to the original version, this version starts only one team of threads, which work collaboratively to carry out the power method and then terminate when the work is complete. You may assume that the matrix dimension is a multiple of the number of threads, so that each thread is responsible for the same number of matrix rows as every other thread.

Option 2 (additional 10%): Extend your Option 1 program so that it allows for matrix dimensions that are not a multiple of the number of threads.

The Power Method

Suppose $A$ is an $n \times n$ matrix. The eigenvalues of $A$ are the scalars $\lambda$ that satisfy

$$Av = \lambda v, \tag{1}$$

and the nonzero vectors $v$ are the associated eigenvectors. Finding the eigenvalues and eigenvectors of a matrix is an important problem in many applications. In some of these only the eigenvalue of largest magnitude, called the dominant eigenvalue, is needed. One way to find it and its associated eigenvector is with the power method. See the Appendix for an explanation of the method.

The Power Method Algorithm

Given an $n \times n$ matrix $A$, a tolerance $\varepsilon > 0$, and the maximum allowed number of iterations $M > 0$, the general power method algorithm can be formulated as follows:

    x := (1, 1, 1, ..., 1)^T       initial eigenvector estimate
    λ := 0                         initialized to any value
    λ0 := λ + 2ε                   make sure |λ − λ0| > ε
    k := 0
    while |λ − λ0| ≥ ε and k ≤ M do
        x := x/‖x‖                 normalize x
        y := Ax                    compute next eigenvector estimate
        λ0 := λ                    previous eigenvalue estimate
        λ := x^T y                 compute new estimate: λ ≈ x^T A x
        x := y                     next eigenvector estimate (normalized at the top of the loop)
        k := k + 1
    end while

If the while-loop terminates with $k \le M$, then we conclude that the algorithm has terminated successfully. In this case $\lambda$ approximates the dominant eigenvalue and $x$ approximates a corresponding eigenvector. The power method's rate of convergence depends on how much the magnitude of the dominant eigenvalue exceeds the magnitudes of the other eigenvalues. The smaller the ratio $|\lambda_2|/|\lambda_1|$, the faster the convergence (see the Appendix). Also, the power method will fail if the matrix does not have any real eigenvalues.

Parallel Implementation

For the Minimum Requirement you will only need to make minor changes, most of which will be inserting appropriate #pragma omp parallel for directives. Option 1 will require significantly more work. You should specify a single parallel thread body with something like

    #pragma omp parallel
    {
        while ( fabs( lambda - lambda_old ) > tol && numiter <= maxiter )
        {
            /* ... body of the power method iteration ... */
        }
    }

Note that this includes the power method's main loop. You'll need to think carefully about shared variables and critical sections. You'll also need to use #pragma omp barrier to synchronize the threads at several points. The following discussion should help you think about how to create the body of each thread.

Suppose we have a parallel machine with $p$ processors. Following the PCAM design approach, we begin by partitioning the problem into many small individual tasks. We can partition the matrix $A$ into individual rows

$$A = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix},$$

where $a_i$ is a $1 \times n$ row vector that is the $i$th row of $A$. Then the product $Ax$ becomes

$$y = Ax = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} x = \begin{bmatrix} a_1 x \\ a_2 x \\ a_3 x \\ \vdots \\ a_n x \end{bmatrix},$$

where the $i$th entry of $y$ is computed via an inner product:

$$y_i = a_i x = \sum_{j=1}^{n} a_{ij} x_j.$$

Each of these products is a task that can be computed in parallel. If $n$ is larger than the number of processors $p$, we can assign multiple tasks to each processor during the agglomeration and mapping phases. Each task uses a single row of $A$ and the entire vector $x$ in order to compute a single $y_i$. Every task must then make its $y_i$ value available to all other tasks, since the entire vector $y$ is normalized to become the vector $x$ for the next iteration.

One obvious way to agglomerate and map is to group tasks together based on the rows of $A$ they use. The rows may be interleaved or consecutive, but it is probably most natural to group tasks that use consecutive rows of $A$. If there are $p$ threads, the matrix $A$ is partitioned into individual blocks $A_i$, $i = 1, \ldots, p$, where each block has approximately $n/p$ rows. The product $y = Ax$ then can be written as

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix} = \begin{bmatrix} A_1 x \\ A_2 x \\ \vdots \\ A_p x \end{bmatrix}.$$

Here $y_i$ now denotes the block of entries of $y$ that corresponds to the rows in $A_i$. Processor $i$ can compute $y_i = A_i x$ independently of the other processors. Note, however, that every processor must have the entire vector $y$ before it can compute the estimate of $\lambda$ and create a normalized eigenvector estimate.

Helps and Hints

Using OpenMP

To adapt your program to use OpenMP you will need to

- compile with the -fopenmp flag,
- include the header file omp.h,
- insert appropriate #pragma omp directives, and
- think carefully about what variables should be shared between threads.

Computing partition sizes

When attempting the final option (Option 2), you will need some way to assign an unequal number of rows to the threads. When partitioning domains, we frequently need to compute starting and ending indices in an array for a particular part of the partition. For example, if a 100-element array is to be partitioned into four parts in a balanced way, each part should have one quarter of the elements. In this case we would have

    Part   Length   Start   End
      0      25        0     24
      1      25       25     49
      2      25       50     74
      3      25       75     99

The computations here are simple. The length of each part is just the total array length divided by the number of parts, and the starting index of the ith part is i times the part length. When the array length is not a multiple of the number of parts, we will have some slight imbalance, but we still want all parts to be almost the same size. For example, if we wanted to split a 100-element array into 8 parts, the parts cannot all have 100/8 = 12.5 elements. A balanced split gives four parts of 13 elements and four parts of 12 elements.

The following C code will determine the starting and ending indices (inclusive) of the ith part (starting from 0) in a zero-based n-element array that is partitioned into m parts. It is a useful routine, worth keeping around somewhere, since you might have occasion to use it in the future.

    /*
     * Computes the starting and ending displacements for the ith
     * subinterval in an n-element array given that there are m
     * subintervals of approximately equal size.
     *
     * Input:
     *    int n  - length of array (array indexed [0]..[n-1])
     *    int m  - number of subintervals
     *    int i  - subinterval number
     *
     * Output:
     *    int* s - location to store subinterval starting index
     *    int* e - location to store subinterval ending index
     *
     * Suppose we want to partition a 100-element array into 3
     * subintervals of roughly the same size.  The following three
     * pairs of calls find the starting and ending indices of each
     * subinterval:
     *    decompose1d( 100, 3, 0, &s, &e );  (now s =  0, e = 33)
     *    decompose1d( 100, 3, 1, &s, &e );  (now s = 34, e = 66)
     *    decompose1d( 100, 3, 2, &s, &e );  (now s = 67, e = 99)
     *
     * The subinterval length can be computed with e - s + 1.
     *
     * Based on the FORTRAN subroutine MPE_DECOMP1D in the file
     * UsingMPI/intermediate/decomp.f supplied with the book
     * "Using MPI" by Gropp et al.  It has been adapted to use
     * 0-based indexing.
     */
    void decompose1d( int n, int m, int i, int* s, int* e )
    {
        const int length  = n / m;   /* base length of every part */
        const int deficit = n % m;   /* first 'deficit' parts get one extra */
        *s = i * length + ( i < deficit ? i : deficit );
        *e = *s + length - ( i < deficit ? 0 : 1 );
        if ( ( *e >= n ) || ( i == m - 1 ) ) *e = n - 1;
    }

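Appendix

Derivation of the Power Method

To see why the power method works, consider the $n \times n$ matrix $A$ with eigenvectors $v_1, v_2, \ldots, v_n$ and associated eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, where

$$|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|,$$

so $\lambda_1$ is the dominant eigenvalue. Let $x$ be any vector that can be written in the form

$$x = c_1 v_1 + c_2 v_2 + c_3 v_3 + \cdots + c_n v_n$$

with $c_1 \ne 0$ (to ensure that $x$ has some component parallel to $v_1$). Then

$$\begin{aligned}
Ax &= A(c_1 v_1 + c_2 v_2 + c_3 v_3 + \cdots + c_n v_n) \\
   &= c_1 A v_1 + c_2 A v_2 + c_3 A v_3 + \cdots + c_n A v_n \\
   &= c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + c_3 \lambda_3 v_3 + \cdots + c_n \lambda_n v_n.
\end{aligned}$$

When we repeatedly multiply by $A$ we have

$$\begin{aligned}
A^k x &= c_1 A^k v_1 + c_2 A^k v_2 + c_3 A^k v_3 + \cdots + c_n A^k v_n \\
      &= c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + c_3 \lambda_3^k v_3 + \cdots + c_n \lambda_n^k v_n \\
      &= c_1 \lambda_1^k \left[ v_1 + \frac{c_2}{c_1}\left(\frac{\lambda_2}{\lambda_1}\right)^k v_2 + \frac{c_3}{c_1}\left(\frac{\lambda_3}{\lambda_1}\right)^k v_3 + \cdots + \frac{c_n}{c_1}\left(\frac{\lambda_n}{\lambda_1}\right)^k v_n \right].
\end{aligned}$$

Notice that since $|\lambda_i/\lambda_1| < 1$ for $i = 2, \ldots, n$, the bracketed expression tends to the eigenvector $v_1$ as $k \to \infty$. Thus, up to scaling, the iteration $x_{k+1} \leftarrow A x_k$ will converge to an eigenvector associated with the dominant eigenvalue of $A$. The estimated value of $\lambda_1$ can be computed with

$$\lambda_1 = \lim_{k \to \infty} \frac{x_k^T A x_k}{x_k^T x_k} = \lim_{k \to \infty} \frac{x_k^T x_{k+1}}{x_k^T x_k}.$$

It is common to normalize the vectors $x_k$ as they are produced so that $x_k^T x_k = 1$. In this case

$$\lambda_1 = \lim_{k \to \infty} x_k^T x_{k+1}.$$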
