PARALLEL AND DISTRIBUTED COMPUTING

Gordon Hodge

PARALLEL AND DISTRIBUTED COMPUTING 2013/2014
1st Semester, 1st Exam, January 7, 2014
Duration: 2h00

- No extra material allowed. This includes notes, scratch paper, calculator, etc.
- Give your answers in the available space after each question. You can use either Portuguese or English.
- Be sure to write your name and number on all pages; non-identified pages will not be graded!
- Justify all your answers.
- Don't hurry; you should have plenty of time to finish this exam. Skip questions that you feel less comfortable with and come back to them later on.

IST ID:                Name:

I. (1, ,5 = 5 val.)

1. The following program is running on a machine with 3 SMP processors.

    int function (int *a, int N, int value)
    {
        int retval = 0;
        int i;
        #pragma omp parallel for
        for (i = 0; i < N; i++) {
            if (a[i] == value)
                retval = 1;
        }
        return retval;
    }

Re-write the loop above so that it computes the number of array elements equal to value.
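One possible answer sketch replaces the shared flag with a counter. The obvious change, count++ on a shared variable, would be a data race. The sketch therefore uses a reduction clause: each thread keeps a private partial count, and OpenMP adds the partial counts together when the loop ends. The name count_equal is hypothetical. Without OpenMP enabled (for example, when compiling without -fopenmp), the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical rewrite of the exam's function: returns how many elements of
   a[0..N-1] are equal to value. */
int count_equal(const int *a, int N, int value)
{
    int count = 0;
    int i;

    /* reduction(+:count): each thread gets a private count initialised to 0,
       and the private counts are summed into the shared one after the loop,
       so the increments below never race with each other. */
    #pragma omp parallel for reduction(+:count)
    for (i = 0; i < N; i++) {
        if (a[i] == value)
            count++;
    }
    return count;
}
```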

2. By default, #pragma omp parallel for splits the iterations of a for loop into blocks.
a) Indicate, using OpenMP directives, how to obtain an interleaved distribution of the iterations across threads.
b) Assume dynamic scheduling with i iterations and t threads. Present one argument in favor of choosing larger blocks and one argument in favor of choosing smaller blocks.

3. Cache coherence is an issue in both shared-memory (UMA) systems and Distributed Shared Memory (DSM) systems. Describe how cache coherence can be implemented in each of these two types of system, explaining why they typically use different solutions.
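The following is a hedged sketch for question 2. schedule(static, 1) is one standard way to get an interleaved (cyclic) distribution, and schedule(dynamic, chunk) sets the block size discussed in part b). The function names are hypothetical. static_chunk_owner only restates the round-robin rule that the OpenMP specification gives for schedule(static, chunk). Without OpenMP enabled, the pragmas are ignored and the loops run serially.

```c
/* a) Interleaved (cyclic) distribution: schedule(static, 1) deals chunks of
   one iteration to the threads round-robin, so with T threads, thread t runs
   iterations t, t + T, t + 2T, ... */
void fill_squares_interleaved(int *out, int n)
{
    int i;
    #pragma omp parallel for schedule(static, 1)
    for (i = 0; i < n; i++)
        out[i] = i * i;
}

/* b) Dynamic scheduling with a block (chunk) size: an idle thread grabs the
   next `chunk` iterations. Larger chunks mean fewer grabs (less scheduling
   overhead); smaller chunks balance uneven iterations better. */
void fill_squares_dynamic(int *out, int n, int chunk)
{
    int i;
    #pragma omp parallel for schedule(dynamic, chunk)
    for (i = 0; i < n; i++)
        out[i] = i * i;
}

/* Thread that schedule(static, chunk) assigns iteration i to, for a team of
   nthreads threads (chunks are handed out round-robin by thread number). */
int static_chunk_owner(int i, int chunk, int nthreads)
{
    return (i / chunk) % nthreads;
}
```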

II. ( = 5 val.)

1. MPI_Scatter takes the following parameters:

    MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
                void *recvbuf, int recvcnt, MPI_Datatype recvtype,
                int root, MPI_Comm comm)

a) Write the most trivial implementation of the MPI_Scatter routine in C. Use only MPI_Send and MPI_Recv as communication functions. Routines that may be useful:

    int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
    int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
    int MPI_Comm_size(MPI_Comm comm, int *size)
    int MPI_Type_size(MPI_Datatype datatype, int *size)
    void *memcpy(void *restrict dst, const void *restrict src, size_t n)

b) Let:
    x: the time to set up a message
    y: the time required to send one byte of data
    P: the number of processors
    N: the size of the data in bytes
The best MPI_Scatter implementation runs in Θ(x log P + (N/P) y). If the message set-up time x is negligible, how much faster is this than your previous solution?

2. When does a call to the function MPI_Irecv return?
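The sketch below makes the comparison in question 1 b) concrete with two modelled running times. Both models, and the function names, are assumptions:

- the trivial version is modelled as P - 1 sequential sends of N/P bytes each, (P - 1)(x + (N/P) y);
- the quoted best bound is read as x log P + (N/P) y, with constant factors dropped.

```c
/* Number of message rounds in a tree scatter: ceil(log2 P). */
static int ceil_log2(int P)
{
    int rounds = 0;
    while ((1 << rounds) < P)
        rounds++;
    return rounds;
}

/* Assumed model of the trivial scatter: the root sends P-1 messages of N/P
   bytes one after another, each costing x + (N/P) y. */
double trivial_scatter_time(double x, double y, int P, double N)
{
    return (P - 1) * (x + (N / P) * y);
}

/* Leading terms of the bound quoted for the best implementation,
   x log P + (N/P) y, with constant factors dropped. */
double best_scatter_time(double x, double y, int P, double N)
{
    return x * ceil_log2(P) + (N / P) * y;
}
```

With x = 0 the ratio of the two models is P - 1, that is, Θ(P). For example, P = 8 and N = 800 bytes give 700y against 100y.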

III. ( ,5 + 1,5 = 5 val.)

1. Suppose it has been determined that 10% of the execution time of a sequential program is not amenable to parallelization.
a) Compute the absolute maximum speedup achievable by parallelizing this program.
b) What is the serial fraction of this program when running on a system with p = 9 processors?
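Here is a small worked check, written as a sketch. Amdahl's law bounds the speedup by ψ ≤ 1/(f + (1 - f)/p) for a serial fraction f, and the bound tends to 1/f as p grows. The Karp-Flatt metric, e = (1/ψ - 1/p)/(1 - 1/p), turns a speedup back into an experimentally determined serial fraction. Two readings are assumptions: that part b) asks for the Karp-Flatt serial fraction, and that the speedup on p = 9 follows Amdahl's law exactly. The function names are hypothetical.

```c
/* Amdahl's law: upper bound on speedup with serial fraction f on p processors. */
double amdahl_speedup(double f, double p)
{
    return 1.0 / (f + (1.0 - f) / p);
}

/* Karp-Flatt metric: experimentally determined serial fraction e obtained
   from a speedup psi measured on p processors. */
double karp_flatt(double psi, double p)
{
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p);
}
```

With f = 0.1 the limit is 1/f = 10. On p = 9 the predicted speedup is 1/(0.1 + 0.9/9) = 5, and feeding 5 back into karp_flatt gives 0.1.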

2. Suppose that, for p = 8 processors, the experimentally determined serial fraction evaluated to 0.2. Is it reasonable to expect it to evaluate to 0.1 for p = 16? If so, what does it mean? If not, justify.

3. The scalability function of a parallel application has been determined to be M(f(p))/p = p log p. Is this a good or a bad indicator? Give a careful justification in terms of the definition of the scalability function.
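The Karp-Flatt relation can be solved for the speedup: ψ = 1/(e + (1 - e)/p). The helper below uses that form to show which speedups the two serial-fraction values in question 2 would correspond to. It is a hedged arithmetic aid rather than an answer, and the function name is hypothetical.

```c
/* Speedup implied by an experimentally determined serial fraction e on p
   processors: the Karp-Flatt definition solved for psi. */
double speedup_from_serial_fraction(double e, double p)
{
    return 1.0 / (e + (1.0 - e) / p);
}
```

For e = 0.2 on p = 8 this gives about 3.33. For e = 0.1 on p = 16 it gives 6.4.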

IV. (1 + 1, ,5 = 5 val.)

1. The Backtrack Search algorithm performs a depth-first search on a tree. In a parallel implementation, all tasks generally expand the first levels of the tree themselves, which constitutes redundant work. Explain what problem this wasteful computation addresses. What is the trade-off being targeted?

2. Explain the limitation of Hyperquicksort that the Parallel Sorting by Regular Sampling algorithm tries to solve. How do the two algorithms compare in terms of scalability?
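The redundant-expansion scheme from question 1 can be sketched on a hypothetical stand-in problem, since the exam does not fix one: counting the subsets of w that add up to target.

- Every task walks the first cutoff levels itself.
- As a result, all tasks number the cutoff-level nodes identically without exchanging any messages.
- Each task then searches only below the nodes whose number is congruent to its id modulo p.

The names Search, dfs and count_subsets_task, and the round-robin (modulo) assignment of subtrees, are assumptions.

```c
/* Hypothetical example problem: count subsets of w[0..n-1] whose sum is
   target, by depth-first search (level k decides whether w[k] is taken). */
typedef struct {
    const int *w;
    int n, target;
    int cutoff;     /* levels expanded redundantly by every task */
    int id, p;      /* this task's rank and the number of tasks */
    int next_node;  /* numbering of the nodes reached at depth cutoff */
} Search;

static int dfs(Search *s, int level, int sum)
{
    if (level == s->cutoff) {
        /* All tasks visit the cutoff-level nodes in the same order, so they
           agree on this numbering without any communication. */
        int node = s->next_node++;
        if (node % s->p != s->id)
            return 0;                              /* another task's subtree */
    }
    if (level == s->n)
        return sum == s->target;
    return dfs(s, level + 1, sum + s->w[level])    /* take w[level] */
         + dfs(s, level + 1, sum);                 /* skip w[level] */
}

/* Work done by task `id` of `p`; requires 0 <= cutoff <= n. */
int count_subsets_task(const int *w, int n, int target,
                       int cutoff, int id, int p)
{
    Search s = { w, n, target, cutoff, id, p, 0 };
    return dfs(&s, 0, 0);
}
```

Summing the per-task counts over id = 0..p-1 reproduces the sequential count. A deeper cutoff yields more, smaller subtrees, which improves load balance, at the price of more redundant expansion.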

3. In class, we discussed the parallel implementation of a 2D heat distribution problem, analyzing both row-wise and checkerboard decompositions. In this context, we also introduced the concept of ghost points.
a) Explain what ghost points are.
b) Discuss the trade-off of having more ghost points than strictly necessary. Clearly state the gains and costs.
c) In the row-wise decomposition, if k ghost points need to be exchanged, we can simply add multiples of k ghost points to achieve the objective of the previous question. However, the checkerboard decomposition requires more than that. Why? How many more?
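The idea behind part b) can be shown in one dimension. This sketch rests on three assumptions:

- the 2D grid is reduced to a line of GRID_N points, each point standing in for a row of the row-wise decomposition;
- the update is a Jacobi-style average of the two neighbours;
- the two "processes" are simulated inside one program, with memcpy playing the role of the ghost-point messages.

With k ghost points per side, one exchange buys k time steps. The price is recomputing the shrinking ghost region on both sides. All names are hypothetical.

```c
#include <string.h>

#define GRID_N 12   /* global number of points (hypothetical size) */

/* One Jacobi step of a 1D heat update over indices [lo, hi): each point
   becomes the average of its two neighbours (assumed update rule). */
static void sweep(const double *u, double *v, int lo, int hi)
{
    for (int i = lo; i < hi; i++)
        v[i] = 0.5 * (u[i - 1] + u[i + 1]);
}

/* Reference solution on the whole grid; both end points stay fixed. */
void heat_sequential(double *u, int steps)
{
    double v[GRID_N];
    for (int s = 0; s < steps; s++) {
        memcpy(v, u, sizeof v);
        sweep(u, v, 1, GRID_N - 1);
        memcpy(u, v, sizeof v);
    }
}

/* Two simulated processes: the left one owns [0, m), the right one owns
   [m, GRID_N). Each keeps k ghost points next to the boundary, so after one
   exchange it can take k steps alone: its valid region shrinks by one point
   per step and still covers its own points after k steps.
   Requires 1 <= k <= m and m + k <= GRID_N. */
void heat_two_procs(double *u, int steps, int m, int k)
{
    double loc[2][GRID_N], tmp[GRID_N];
    memcpy(loc[0], u, sizeof tmp);
    memcpy(loc[1], u, sizeof tmp);

    for (int s = 0; s < steps; s += k) {
        /* Exchange: k ghost values in each direction (one message each). */
        memcpy(&loc[0][m], &loc[1][m], k * sizeof(double));
        memcpy(&loc[1][m - k], &loc[0][m - k], k * sizeof(double));

        int burst = (steps - s < k) ? steps - s : k;
        for (int j = 1; j <= burst; j++) {
            /* Redundant work: ghost points are updated too, on both sides. */
            memcpy(tmp, loc[0], sizeof tmp);
            sweep(loc[0], tmp, 1, m + k - j);
            memcpy(loc[0], tmp, sizeof tmp);

            memcpy(tmp, loc[1], sizeof tmp);
            sweep(loc[1], tmp, m - k + j, GRID_N - 1);
            memcpy(loc[1], tmp, sizeof tmp);
        }
    }
    /* Gather each process's owned points back into u. */
    memcpy(u, loc[0], m * sizeof(double));
    memcpy(&u[m], &loc[1][m], (GRID_N - m) * sizeof(double));
}
```

With m = 6 and any k from 1 to 4, the owned points match heat_sequential exactly. The case k = 1 is the usual one-point halo exchanged every step.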
