PARALLEL AND DISTRIBUTED COMPUTING
2013/2014, 1st Semester
2nd Exam, January 29, 2014
Duration: 2h00

- No extra material allowed. This includes notes, scratch paper, calculators, etc.
- Give your answers in the available space after each question. You can use either Portuguese or English.
- Be sure to write your name and number on all pages; non-identified pages will not be graded!
- Justify all your answers.
- Do not hurry; you should have plenty of time to finish this exam. Skip questions you are less comfortable with and come back to them later.

I. (1 + 1 + 1.5 + 1 + 0.5 = 5 points)

1. Define functional and data parallelism. What are the typical reasons that limit the scalability of functional parallelism? And of data parallelism?

IST ID: Name: 1/9
2. Consider the following program:

     1  unsigned int row, column, t, i;
     2  unsigned int m[n*n];   /* assume m is initialized with random integers */
     3  unsigned int v[5] = {0,0,0,0,0};
     4  register unsigned int tmp = 0;   /* unused for now */
     5
     6  for (row = 0; row < N; row++) {
     7    m[row*n] = tmp = 0;
     8    for (column = 1; column < N; column++) {
     9      if (m[row*n+column] == 7)
    10        m[row*n]++;   /* accumulate in the first column */
    11    }
    12    /* empty line*/
    13  }
    14  for (row = 0; row < N; row++)
    15    for (t = 0; t < 5; t++)
    16      if (m[row*n] > v[t]) {
    17        for (i = 3; i > t; i--)
    18          v[i+1] = v[i];
    19        v[t] = m[row*n];
    20        break;
    21      }

a) Explain in detail what the content of v is at the end of the program.

b) Parallelize the program using OpenMP.
c) Suggest a change to the program that improves its parallelization.

d) Assume N = 1,000,000. Does replacing lines 10 to 12 with the lines below improve the program? Why?

    10        tmp++;
    11    }
    12    m[row*n] = tmp;   /* accumulate in the first column */
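To experiment with the listing in question 2, its two loop nests can be wrapped in functions and run on a small matrix. The sketch below is one way to do that and is not a model answer: the names count_sevens, collect and TOP are hypothetical, and it assumes that n and N in the listing denote the same dimension, passed here as a parameter. The loop bodies are copied from the listing unchanged.

```c
#define TOP 5   /* length of v in the listing (hypothetical name) */

/* Lines 6-13 of the listing: overwrite m[row*n] with the number of
   entries equal to 7 found in columns 1..n-1 of that row. */
void count_sevens(unsigned int *m, unsigned int n)
{
    unsigned int row, column;

    for (row = 0; row < n; row++) {
        m[row * n] = 0;
        for (column = 1; column < n; column++) {
            if (m[row * n + column] == 7)
                m[row * n]++;
        }
    }
}

/* Lines 14-21 of the listing, copied as written, including the
   bounds of the shift loop on i. */
void collect(const unsigned int *m, unsigned int n, unsigned int v[TOP])
{
    unsigned int row, t, i;

    for (row = 0; row < n; row++)
        for (t = 0; t < TOP; t++)
            if (m[row * n] > v[t]) {
                for (i = 3; i > t; i--)
                    v[i + 1] = v[i];
                v[t] = m[row * n];
                break;
            }
}
```

For example, with the 3x3 matrix {9,7,7, 9,0,7, 9,7,0} and v initialized to zeros, calling count_sevens(m, 3) and then collect(m, 3, v) leaves v as {2, 1, 1, 0, 0}. Trying other row orders is a quick way to check an answer to a).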
II. (1.5 + 1.5 + 2 = 5 points)

1. Write a simple MPI program to estimate the bandwidth of the network. Discuss the limitations of the solution you propose.
2. What is the best network topology to use for the MPI_Alltoall function?

3. Explain what the MPI_Barrier function does. Give an example of an algorithm where it is required.
III. (1.5 + 1.5 + 2 = 5 points)

1. Amdahl's Law is given by $\psi(n, p) \le \frac{1}{f + \frac{1 - f}{p}}$ and the Gustafson-Barsis Law is given by $\psi(n, p) \le p + (1 - p)s$. Explain clearly what f and s represent. How do f and s change as the size of the problem, n, grows?

2. Explain how the Karp-Flatt Metric, the experimentally determined serial fraction, can be used to optimize a parallel program. What experimental measurements does it use?
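The formulas in this group are easy to evaluate numerically, which helps when comparing them. The sketch below is a minimal illustration in C, with hypothetical function names. The first two functions follow the bounds stated in question 1; the third uses the standard definition of the Karp-Flatt metric, $e = \frac{1/\psi - 1/p}{1 - 1/p}$, where $\psi$ is the speedup measured on p processors. It assumes p > 1 and a positive speedup.

```c
/* Amdahl's Law: upper bound on speedup with p processors when a
   fraction f of the sequential execution time is inherently serial. */
double amdahl_bound(double f, int p)
{
    return 1.0 / (f + (1.0 - f) / (double)p);
}

/* Gustafson-Barsis Law: scaled speedup bound when a fraction s of the
   parallel execution time is spent in serial code. */
double gustafson_bound(double s, int p)
{
    return (double)p + (1.0 - (double)p) * s;
}

/* Karp-Flatt metric: experimentally determined serial fraction,
   computed from a measured speedup on p > 1 processors. */
double karp_flatt(double speedup, int p)
{
    double inv_p = 1.0 / (double)p;
    return (1.0 / speedup - inv_p) / (1.0 - inv_p);
}
```

For instance, amdahl_bound(0.1, 10) is about 5.26 and gustafson_bound(0.1, 10) is 9.1. Feeding the Amdahl value back into karp_flatt with p = 10 recovers e = 0.1, as expected when the only loss of speedup comes from the serial fraction.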
3. Consider a problem with a sequential algorithm that runs in $\Theta(n^2 \log n)$ and with a parallel implementation whose overhead (communication + redundant computation) per processor is given by $\Theta(\log n)$. If the required memory grows with $n^3$, compute the scalability function for this parallel algorithm. Discuss the result obtained. What does it mean?
IV. (1 + 0.5 + 0.5 + 1.5 + 1.5 = 5 points)

1. Algorithms for Branch and Bound Search use a priority queue to allow the selection of the most promising node to explore next.

a) Discuss why parallel implementations of this algorithm use multiple priority queues.

b) State the potential inefficiency for the parallel algorithm created by these multiple queues.

c) How can this inefficiency be minimized? What is the tradeoff?
2. The scalability of the Parallel Sorting by Regular Sampling (PSRS) algorithm we derived in class was $p^{C-1}$ and the scalability of the Odd-Even Sort algorithm was C. Yet, in practice, PSRS is much more widely used. Discuss why this is so.

3. Give the main advantage and main disadvantage of Cache-Coherent NUMA (aka ccNUMA) systems versus Message-Passing systems.