Assignment 3 Key CSCI 351 PARALLEL PROGRAMMING FALL, Q1. Calculate log n, log n and log n for the following: Answer: Q2. mpi_trap_tree.


Josephine Collins
6 years ago

CSCI 351 PARALLEL PROGRAMMING FALL, 2015
Assignment 3 Key

Q1. Calculate log n, log n and log n for the following:
a. n = 3
b. n = 13
c. n = 32
d. n = 123
e. n = 321
Answer:

Q2. mpi_trap_tree.c
The mpi_trap_time.c program you developed in class lets each process send the sub-area it calculated to P0, and P0 then calculates the global area. Modify this program (save as mpi_trap_tree.c) to use a tree-structured global sum of the sub-areas instead.

ANSWER1: Uses the algorithm from Assignment 1 (with a small correction to the else part).

/* File:     mpi_trap_tree2.c
 * Purpose:  Implement the parallel trapezoidal rule and determine its
 *           run-time vs. the serial trapezoidal rule.
 * Input:    a, b, n
 * Output:   Estimate of the area between x = a, x = b, the x-axis, and
 *           the graph of f(x) using the trapezoidal rule and n trapezoids.
 *           Use a tree-structured global sum of the process areas, with
 *           processes paired as described in the Global_sum notes.
 *           Also output the elapsed time to run the parallel version.
 * Compile:  mpicc -g -Wall -o mpi_trap_tree2 mpi_trap_tree2.c -lm
 * Run:      mpiexec -n <number of processes> ./mpi_trap_tree2
 */

/* Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Barrier.
 *    2.  Start timer on each process.
 *    3.  Each process calculates "its" subinterval of integration.
 *    4.  Each process estimates the area of f(x) over its interval
 *        using the trapezoidal rule.
 *    5.  Tree-structured global sum of process estimates to process 0.
 *    6.  Stop timer on each process.
 *    7.  Find max time, store on process 0.
 *    8.  Time serial trap on process 0.
 *    9.  Print speedup, efficiency.
 *
 * Note:  f(x) is hardwired.
 */
#include <stdio.h>
#include <math.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void   Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
            double h);    /* Calculate local area */
double f(double x);       /* function we're integrating */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm);
double Get_max_time(double par_elapsed, int my_rank, int p);

int main(int argc, char** argv) {
   int    my_rank;    /* My process rank           */
   int    p;          /* The number of processes   */
   double a;          /* Left endpoint             */
   double b;          /* Right endpoint            */
   int    n;          /* Number of trapezoids      */
   double h;          /* Trapezoid base length     */
   double local_a;    /* Left endpoint my process  */
   double local_b;    /* Right endpoint my process */
   int    local_n;    /* Number of trapezoids for  */
                      /* my calculation            */
   double area;       /* My subarea                */
   double total = 0;  /* Total area                */
   double start, finish, par_elapsed;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &p);

   Get_data(p, my_rank, &a, &b, &n);

   MPI_Barrier(MPI_COMM_WORLD);
   start = MPI_Wtime();
   h = (b-a)/n;      /* h is the same for all processes */
   local_n = n/p;    /* So is the number of trapezoids  */

   local_a = a + my_rank*local_n*h;
   local_b = local_a + local_n*h;
   area = Trap(local_a, local_b, local_n, h);

   total = Global_sum(area, my_rank, p, MPI_COMM_WORLD);
   finish = MPI_Wtime();
   par_elapsed = finish - start;
   par_elapsed = Get_max_time(par_elapsed, my_rank, p);

   /* Print the result (only process 0 has the global sum) */
   if (my_rank == 0) {
      printf("with n = %d trapezoids, our estimate\n", n);
      printf("of the area from %f to %f = %23.16e\n", a, b, total);
      printf("parallel elapsed time = %e seconds\n", par_elapsed);
   }

   /* Shut down MPI */
   MPI_Finalize();
   return 0;
}  /* main */

/*------------------------------------------------------------------
 * Function:     Get_data
 * Purpose:      Read in the data on process 0 and send to other
 *               processes
 * Input args:   p, my_rank
 * Output args:  a_p, b_p, n_p
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
   int q;
   MPI_Status status;

   if (my_rank == 0) {
      printf("enter a, b, and n\n");
      scanf("%lf %lf %d", a_p, b_p, n_p);

      for (q = 1; q < p; q++) {
         MPI_Send(a_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
         MPI_Send(b_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
         MPI_Send(n_p, 1, MPI_INT, q, 0, MPI_COMM_WORLD);
      }
   } else {
      MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
   }
}  /* Get_data */

/*------------------------------------------------------------------
 * Function:     Trap
 * Purpose:      Estimate a definite area using the trapezoidal rule
 * Input args:   local_a (my left endpoint)
 *               local_b (my right endpoint)
 *               local_n (my number of trapezoids)
 *               h (stepsize = length of base of trapezoids)
 * Return val:   Trapezoidal rule estimate of area from
 *               local_a to local_b
 */
double Trap(
      double  local_a  /* in */,
      double  local_b  /* in */,
      int     local_n  /* in */,
      double  h        /* in */) {
   double area;   /* Store result in area */
   double x;
   int i;

   area = (f(local_a) + f(local_b))/2.0;
   x = local_a;
   for (i = 1; i <= local_n-1; i++) {
      x = local_a + i*h;
      area = area + f(x);
   }
   area = area*h;

   return area;
}  /* Trap */

/*------------------------------------------------------------------
 * Function:     f
 * Purpose:      Compute value of function to be integrated
 * Input args:   x
 */
double f(double x) {
   double return_val;

   // return_val = x*x;
   return_val = exp(sin(x));
   return return_val;
}  /* f */

/*------------------------------------------------------------------
 * Function:  Get_max_time
 * Purpose:   Find the maximum elapsed time across the processes
 * In args:   my_rank:      calling process' rank
 *            p:            total number of processes
 *            par_elapsed:  elapsed time on calling process
 * Ret val:   Process 0:    max of all processes' times
 *            Other procs:  input value for par_elapsed
 */
double Get_max_time(double par_elapsed, int my_rank, int p) {
   int source;
   MPI_Status status;
   double temp;

   if (my_rank == 0) {
      for (source = 1; source < p; source++) {
         MPI_Recv(&temp, 1, MPI_DOUBLE, source, 0, MPI_COMM_WORLD, &status);
         if (temp > par_elapsed) par_elapsed = temp;
      }
   } else {
      MPI_Send(&par_elapsed, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
   }
   return par_elapsed;
}  /* Get_max_time */

/*------------------------------------------------------------------
 * Function:    Global_sum
 * Purpose:     Compute global sum of values distributed across processes
 * Input args:  my_contrib: the calling process' contribution to the global sum
 *              my_rank:    the calling process' rank in the communicator
 *              p:          the number of processes in the communicator
 *              comm:       the communicator used for sends and receives
 * Return val:  the sum of the my_contrib values contributed by each process.
 *
 * Algorithm:   Use tree-structured communication, pairing processes
 *              to communicate.
 *
 * Notes:
 * 1.  The value returned by Global_sum on processes other than 0 is
 *     meaningless.
 * 2.  The pairing of the processes is done using the algorithm we
 *     developed in Assignment 1:
 *
 *        divisor = 2;
 *        core_difference = 1;
 *        sum = my_value;
 *        while (divisor <= number of cores) {
 *           if (my_rank % divisor == 0) {
 *              partner = my_rank + core_difference;
 *              receive value from partner core;
 *              sum += received value;
 *           } else if (my_rank % (divisor/2) == 0) {
 *              partner = my_rank - core_difference;
 *              send my sum to partner core;
 *           }
 *           divisor *= 2;
 *           core_difference *= 2;
 *        }
 *
 * 3.  This pairing assumes the number of processes is a power of two;
 *     otherwise some process tries to receive from a rank >= p.
 */
double Global_sum(double my_value, int my_rank, int number_of_cores,
      MPI_Comm comm) {
   int divisor = 2;
   int core_difference = 1;
   double sum = my_value;
   int partner;
   double received_value;

   while (divisor <= number_of_cores) {
      if (my_rank % divisor == 0) {
         partner = my_rank + core_difference;
         // receive value from partner core
         MPI_Recv(&received_value, 1, MPI_DOUBLE, partner, 0, comm,
               MPI_STATUS_IGNORE);
         sum += received_value;
      } else if ((my_rank % (divisor/2)) == 0) {
         partner = my_rank - core_difference;
         // send my sum to partner core
         MPI_Send(&sum, 1, MPI_DOUBLE, partner, 0, comm);
      }
      divisor *= 2;
      core_difference *= 2;
   }
   // make sure the tree-structured communication is complete
   // and thus the global sum is calculated
   MPI_Barrier(comm);
   return sum;
}  /* Global_sum */

/* output
[rdissanayaka@hpc0 158 ~/CS351/mpi]$ mpicc -g -Wall -o mpi_trap_tree2 mpi_trap_tree2.c -lm
[rdissanayaka@hpc0 159 ~/CS351/mpi]$ mpiexec -n 4 ./mpi_trap_tree2
Enter a, b, and n
With n = 1024 trapezoids, our estimate
of the area from to = e+00
Parallel elapsed time = e-04 seconds
[rdissanayaka@hpc0 160 ~/CS351/mpi]$
*/

ANSWER2: Uses bitwise XOR for tree-structured communication.

/* File:     mpi_trap_tree.c
 * Purpose:  Implement the parallel trapezoidal rule and determine its
 *           run-time vs. the serial trapezoidal rule.
 * Input:    a, b, n
 * Output:   Estimate of the area between x = a, x = b, the x-axis, and
 *           the graph of f(x) using the trapezoidal rule and n trapezoids.
 *           Use a tree-structured global sum of the process areas.
 *           Also output the elapsed time to run the parallel version.
 * Compile:  mpicc -g -Wall -o mpi_trap_tree mpi_trap_tree.c -lm
 * Run:      mpiexec -n <number of processes> ./mpi_trap_tree
 *
 * Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Barrier.
 *    2.  Start timer on each process.
 *    3.  Each process calculates "its" subinterval of integration.
 *    4.  Each process estimates the area of f(x) over its interval
 *        using the trapezoidal rule.
 *    5.  Tree-structured global sum of process estimates to process 0.
 *    6.  Stop timer on each process.
 *    7.  Find max time, store on process 0.
 *    8.  Time serial trap on process 0.
 *    9.  Print speedup, efficiency.
 *
 * Note:  f(x) is hardwired.
 */
#include <stdio.h>
#include <math.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void   Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
            double h);    /* Calculate local area */
double f(double x);       /* function we're integrating */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm);
double Get_max_time(double par_elapsed, int my_rank, int p);

int main(int argc, char** argv) {
   int    my_rank;    /* My process rank           */
   int    p;          /* The number of processes   */
   double a;          /* Left endpoint             */
   double b;          /* Right endpoint            */
   int    n;          /* Number of trapezoids      */
   double h;          /* Trapezoid base length     */
   double local_a;    /* Left endpoint my process  */
   double local_b;    /* Right endpoint my process */
   int    local_n;    /* Number of trapezoids for  */
                      /* my calculation            */
   double area;       /* My subarea                */
   double total = 0;  /* Total area                */
   double start, finish, par_elapsed;

   /* Let the system do what it needs to start up MPI */
   MPI_Init(&argc, &argv);

   /* Get my process rank */
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   /* Find out how many processes are being used */
   MPI_Comm_size(MPI_COMM_WORLD, &p);

   Get_data(p, my_rank, &a, &b, &n);

   MPI_Barrier(MPI_COMM_WORLD);
   start = MPI_Wtime();
   h = (b-a)/n;      /* h is the same for all processes */
   local_n = n/p;    /* So is the number of trapezoids  */

   /* Length of each process' interval of
    * integration = local_n*h.  So my interval
    * starts at: */
   local_a = a + my_rank*local_n*h;
   local_b = local_a + local_n*h;
   area = Trap(local_a, local_b, local_n, h);

   /* Add up the areas calculated by each process */
   total = Global_sum(area, my_rank, p, MPI_COMM_WORLD);
   finish = MPI_Wtime();
   par_elapsed = finish - start;
   par_elapsed = Get_max_time(par_elapsed, my_rank, p);

   /* Print the result (only process 0 has the global sum) */
   if (my_rank == 0) {
      printf("with n = %d trapezoids, our estimate\n", n);
      printf("of the area from %f to %f = %23.16e\n", a, b, total);
      printf("parallel elapsed time = %e seconds\n", par_elapsed);
   }

   /* Shut down MPI */
   MPI_Finalize();
   return 0;
}  /* main */

/*------------------------------------------------------------------
 * Function:     Get_data
 * Purpose:      Read in the data on process 0 and send to other
 *               processes
 * Input args:   p, my_rank
 * Output args:  a_p, b_p, n_p
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
   int q;
   MPI_Status status;

   if (my_rank == 0) {
      printf("enter a, b, and n\n");
      scanf("%lf %lf %d", a_p, b_p, n_p);

      for (q = 1; q < p; q++) {
         MPI_Send(a_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
         MPI_Send(b_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
         MPI_Send(n_p, 1, MPI_INT, q, 0, MPI_COMM_WORLD);
      }
   } else {
      MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
   }
}  /* Get_data */

/*------------------------------------------------------------------
 * Function:     Trap
 * Purpose:      Estimate a definite area using the trapezoidal rule
 * Input args:   local_a (my left endpoint)
 *               local_b (my right endpoint)
 *               local_n (my number of trapezoids)
 *               h (stepsize = length of base of trapezoids)
 * Return val:   Trapezoidal rule estimate of area from
 *               local_a to local_b
 */
double Trap(
      double  local_a  /* in */,
      double  local_b  /* in */,
      int     local_n  /* in */,
      double  h        /* in */) {
   double area;   /* Store result in area */
   double x;
   int i;

   area = (f(local_a) + f(local_b))/2.0;
   x = local_a;
   for (i = 1; i <= local_n-1; i++) {
      x = local_a + i*h;
      area = area + f(x);
   }
   area = area*h;

   return area;
}  /* Trap */

/*------------------------------------------------------------------
 * Function:     f
 * Purpose:      Compute value of function to be integrated
 * Input args:   x
 */
double f(double x) {
   double return_val;

   // return_val = x*x;
   return_val = exp(sin(x));
   return return_val;
}  /* f */

/*------------------------------------------------------------------
 * Function:  Get_max_time
 * Purpose:   Find the maximum elapsed time across the processes
 * In args:   my_rank:      calling process' rank
 *            p:            total number of processes
 *            par_elapsed:  elapsed time on calling process
 * Ret val:   Process 0:    max of all processes' times
 *            Other procs:  input value for par_elapsed
 */
double Get_max_time(double par_elapsed, int my_rank, int p) {
   int source;
   MPI_Status status;
   double temp;

   if (my_rank == 0) {
      for (source = 1; source < p; source++) {
         MPI_Recv(&temp, 1, MPI_DOUBLE, source, 0, MPI_COMM_WORLD, &status);
         if (temp > par_elapsed) par_elapsed = temp;
      }
   } else {
      MPI_Send(&par_elapsed, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
   }
   return par_elapsed;
}  /* Get_max_time */

/*------------------------------------------------------------------
 * Function:    Global_sum
 * Purpose:     Compute global sum of values distributed across processes
 * Input args:  my_contrib: the calling process' contribution to the global sum
 *              my_rank:    the calling process' rank in the communicator
 *              p:          the number of processes in the communicator
 *              comm:       the communicator used for sends and receives
 * Return val:  the sum of the my_contrib values contributed by each process.
 *
 * Algorithm:   Use tree-structured communication, pairing processes
 *              to communicate.
 *
 * Notes:
 * 1.  The value returned by Global_sum on processes other than 0 is
 *     meaningless.
 * 2.  The pairing of the processes is done using bitwise exclusive or.
 *     Here's a table showing the rule for bitwise exclusive or:
 *
 *        X  Y  X^Y
 *        0  0   0
 *        0  1   1
 *        1  0   1
 *        1  1   0
 *
 *     Here's a table showing the process pairing with 8 processes
 *     (r = my_rank, other column heads are bitmask; x means the process
 *     has already sent its sum and is done):
 *
 *        r   001  010  100
 *        -   ---  ---  ---
 *        0    1    2    4
 *        1    0    x    x
 *        2    3    0    x
 *        3    2    x    x
 *        4    5    6    0
 *        5    4    x    x
 *        6    7    4    x
 *        7    6    x    x
 */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm) {
   double    sum = my_contrib;
   double    temp;
   int       partner;
   int       done = 0;
   unsigned  bitmask = (unsigned) 1;

#  ifdef DEBUG
   int my_pass = -1;
   partner = -1;
   printf("proc %d > partner = %d, bitmask = %u, pass = %d\n",
         my_rank, partner, bitmask, my_pass);
   fflush(stdout);
#  endif

   while (!done && bitmask < p) {
      partner = my_rank ^ bitmask;
#     ifdef DEBUG
      my_pass++;
      printf("proc %d > partner = %d, bitmask = %u, pass = %d\n",
            my_rank, partner, bitmask, my_pass);
      fflush(stdout);
#     endif
      if (my_rank < partner) {
         if (partner < p) {
            MPI_Recv(&temp, 1, MPI_DOUBLE, partner, 0, comm,
                  MPI_STATUS_IGNORE);
            sum += temp;
         }
         bitmask <<= 1;
      } else {
         MPI_Send(&sum, 1, MPI_DOUBLE, partner, 0, comm);
         done = 1;
      }
   }

   /* Valid only on 0 */
   return sum;
}  /* Global_sum */

Q3. [Pacheco Q3.2] mpi_trap2.c
Modify mpi_trap_time.c (save as mpi_trap2.c) so that it correctly estimates the integral even if comm_sz doesn't evenly divide n. (You can still assume n >= comm_sz.)

Solution: CS351/mpi/trapezoid2_mpi.c

/* File:     mpi_trap2_2.c in Melchior CS351/mpi
 * Purpose:  Implement parallel trapezoidal rule allowing user input of data.
 *           Use MPI_Bcast to broadcast user input to all processes.
 */

/* Input:    a, b, n
 * Output:   Estimate of the area between x = a, x = b, x-axis, and
 *           graph of f(x) using the trapezoidal rule and n trapezoids.
 * Compile:  mpicc -g -Wall -o mpi_trap2_2 mpi_trap2_2.c
 * Run:      mpiexec -n <number of processes> ./mpi_trap2_2
 *
 * Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Each process calculates local_n, local_a, and local_b
 *        (n is not assumed to be evenly divisible by comm_sz):
 *
 *           quotient = n / p;
 *           remainder = n % p;
 *           if (my_rank < remainder) {
 *              local_n = quotient + 1;
 *              local_a = a + my_rank*local_n*h;
 *              local_b = local_a + local_n*h;
 *           } else {
 *              local_n = quotient;
 *              local_a = a + my_rank*local_n*h + remainder*h;
 *              local_b = local_a + local_n*h;
 *           }
 *
 *    2.  Each process estimates the area of f(x) over its subinterval
 *        using the trapezoidal rule.
 *    3a. Each process != 0 sends its area to 0.
 *    3b. Process 0 sums the calculations received from the individual
 *        processes and prints the result.
 *
 * Notes:  f(x) is hardwired.  n may or may not be evenly divisible by p.
 */
#include <stdio.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void   Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
            double h);    /* Calculate local area */
double f(double x);       /* function we're integrating */

int main(int argc, char** argv) {
   int    my_rank;    /* My process rank           */
   int    p;          /* The number of processes   */
   double a;          /* Left endpoint             */
   double b;          /* Right endpoint            */
   int    n;          /* Number of trapezoids      */
   double h;          /* Trapezoid base length     */
   double local_a;    /* Left endpoint my process  */
   double local_b;    /* Right endpoint my process */
   int    local_n;    /* Number of trapezoids for  */
                      /* my calculation            */
   double my_area;    /* Integral over my interval */
   double total;      /* Total area                */
   int    source;     /* Process sending area      */
   int    dest = 0;   /* All messages go to 0      */
   int    tag = 0;
   MPI_Status status;

   /* Let the system do what it needs to start up MPI */
   MPI_Init(&argc, &argv);

   /* Get my process rank */
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   /* Find out how many processes are being used */
   MPI_Comm_size(MPI_COMM_WORLD, &p);

   Get_data(p, my_rank, &a, &b, &n);

   h = (b-a)/n;    /* h is the same for all processes */
   int quotient = n / p;
   int remainder = n % p;

   if (my_rank < remainder) {
      // assign the extra (remainder) trapezoids to the first remainder processes
      local_n = quotient + 1;
      local_a = a + my_rank*local_n*h;
      local_b = local_a + local_n*h;
   } else {
      local_n = quotient;
      local_a = a + my_rank*local_n*h + remainder*h;
      local_b = local_a + local_n*h;
   }
   my_area = Trap(local_a, local_b, local_n, h);

   /* Add up the areas calculated by each process */
   if (my_rank == 0) {
      total = my_area;
      for (source = 1; source < p; source++) {
         MPI_Recv(&my_area, 1, MPI_DOUBLE, source, tag,
               MPI_COMM_WORLD, &status);
         total = total + my_area;
      }
   } else {
      MPI_Send(&my_area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
   }

   /* Print the result */
   if (my_rank == 0) {
      printf("with n = %d trapezoids, our estimate\n", n);
      printf("of the area from %f to %f = %.15f\n", a, b, total);
   }

   /* Shut down MPI */
   MPI_Finalize();
   return 0;
}  /* main */

/*------------------------------------------------------------------
 * Function:     Get_data
 * Purpose:      Read in the data on process 0 and send to other
 *               processes
 * Input args:   p, my_rank
 * Output args:  a_p, b_p, n_p
 * Note:         _p for pointer
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
   if (my_rank == 0) {
      printf("enter a, b, and n\n");
      scanf("%lf %lf %d", a_p, b_p, n_p);
   }
   MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   MPI_Bcast(n_p, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* n is an int */
}  /* Get_data */

/*------------------------------------------------------------------
 * Function:     Trap
 * Purpose:      Estimate a definite area using the trapezoidal rule
 * Input args:   local_a (my left endpoint)
 *               local_b (my right endpoint)
 *               local_n (my number of trapezoids)
 *               h (stepsize = length of base of trapezoids)
 * Return val:   Trapezoidal rule estimate of area from
 *               local_a to local_b
 */
double Trap(
      double  local_a  /* in */,
      double  local_b  /* in */,
      int     local_n  /* in */,
      double  h        /* in */) {
   double my_area;   /* Store my result in my_area */
   double x;
   int i;

   my_area = (f(local_a) + f(local_b))/2.0;
   x = local_a;
   for (i = 1; i <= local_n-1; i++) {
      x = local_a + i*h;
      my_area = my_area + f(x);
   }
   my_area = my_area*h;

   return my_area;
}  /* Trap */

/*------------------------------------------------------------------
 * Function:     f
 * Purpose:      Compute value of function to be integrated
 * Input args:   x
 */
double f(double x) {
   double return_val;

   return_val = x*x + 1.0;
   return return_val;
}  /* f */

/* output
[rdissanayaka@hpc0 143 ~/CS351/mpi]$ mpicc -o mpi_trap2_2 mpi_trap2_2.c
[rdissanayaka@hpc0 144 ~/CS351/mpi]$ mpirun -np 4 mpi_trap2_2
Enter a, b, and n
With n = 1029 trapezoids, our estimate
of the area from to =
[rdissanayaka@hpc0 145 ~/CS351/mpi]$
*/
