Assignment 3 Key: CSCI 351 Parallel Programming, Fall 2015
CSCI 351 PARALLEL PROGRAMMING, FALL 2015
Assignment 3 Key

Q1. Calculate log n, log n and log n for the following:
   a. n = 3
   b. n = 13
   c. n = 32
   d. n = 123
   e. n = 321

Answer:

Q2. mpi_trap_tree.c
The mpi_trap_time.c program you developed in class has each process send its calculated sub-area to P0, and P0 calculates the global area. Modify this program (save it as mpi_trap_tree.c) to use a tree-structured global sum of the sub-areas instead.

ANSWER1: Uses the Assignment 1 algorithm (with a small correction to the else part)

/* File:    mpi_trap_tree2.c
 * Purpose: Implement parallel trapezoidal rule and determine its
 *          run-time vs. serial trap rule
 *
 * Input:   a, b, n
 * Output:  Estimate of the area between x = a, x = b, the x-axis,
 *          and the graph of f(x) using the trapezoidal rule and n
 *          trapezoids.  Use a tree-structured global sum of the
 *          process areas, pairing processes with the
 *          divisor/core_difference algorithm from Assignment 1.
 *          Also output the elapsed time to run the parallel version.
 *
 * Compile: mpicc -g -Wall -o mpi_trap_tree2 mpi_trap_tree2.c -lm
 * Run:     mpiexec -n <number of processes> ./mpi_trap_tree2
 *
 * Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Barrier.
 *    2.  Start timer on each process.
 *    3.  Each process calculates "its" subinterval of integration.
 *    4.  Each process estimates the area of f(x) over its interval
 *        using the trapezoidal rule.
 *    5.  Tree structured global sum of process estimates to process 0.
 *    6.  Stop timer on each process.
 *    7.  Find max time, store on process 0.
 *    8.  Time serial trap on process 0.
 *    9.  Print speedup, efficiency.
 *
 * Note:  f(x) is hardwired.
 */
#include <stdio.h>
#include <math.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
      double h);     /* Calculate local area */
double f(double x);  /* function we're integrating */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm);
double Get_max_time(double par_elapsed, int my_rank, int p);

int main(int argc, char** argv) {
   int     my_rank;    /* My process rank            */
   int     p;          /* The number of processes    */
   double  a;          /* Left endpoint              */
   double  b;          /* Right endpoint             */
   int     n;          /* Number of trapezoids       */
   double  h;          /* Trapezoid base length      */
   double  local_a;    /* Left endpoint my process   */
   double  local_b;    /* Right endpoint my process  */
   int     local_n;    /* Number of trapezoids for   */
                       /* my calculation             */
   double  area;       /* My subarea                 */
   double  total = 0;  /* Total area                 */
   double  start, finish, par_elapsed;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &p);

   Get_data(p, my_rank, &a, &b, &n);

   MPI_Barrier(MPI_COMM_WORLD);
   start = MPI_Wtime();

   h = (b-a)/n;       /* h is the same for all processes */
   local_n = n/p;     /* So is the number of trapezoids  */
   local_a = a + my_rank*local_n*h;
   local_b = local_a + local_n*h;
   area = Trap(local_a, local_b, local_n, h);

   total = Global_sum(area, my_rank, p, MPI_COMM_WORLD);

   finish = MPI_Wtime();
   par_elapsed = finish - start;
   par_elapsed = Get_max_time(par_elapsed, my_rank, p);

   /* Print the result (only process 0 holds the total and max time) */
   if (my_rank == 0) {
      printf("With n = %d trapezoids, our estimate\n", n);
      printf("of the area from %f to %f = %23.16e\n", a, b, total);
      printf("Parallel elapsed time = %e seconds\n", par_elapsed);
   }

   /* Shut down MPI */
   MPI_Finalize();
   return 0;
}  /* main */

/* Function:    Get_data
 * Purpose:     Read in the data on process 0 and send to other processes
 * Input args:  p, my_rank
 * Output args: a_p, b_p, n_p
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
   int q;
   MPI_Status status;

   if (my_rank == 0) {
      printf("Enter a, b, and n\n");
      scanf("%lf %lf %d", a_p, b_p, n_p);
      for (q = 1; q < p; q++) {
         MPI_Send(a_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
         MPI_Send(b_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
         MPI_Send(n_p, 1, MPI_INT, q, 0, MPI_COMM_WORLD);
      }
   } else {
      MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
   }
}  /* Get_data */

/* Function:    Trap
 * Purpose:     Estimate a definite area using the trapezoidal rule
 * Input args:  local_a (my left endpoint)
 *              local_b (my right endpoint)
 *              local_n (my number of trapezoids)
 *              h (stepsize = length of base of trapezoids)
 * Return val:  Trapezoidal rule estimate of area from local_a to local_b
 */
double Trap(
      double  local_a  /* in */,
      double  local_b  /* in */,
      int     local_n  /* in */,
      double  h        /* in */) {
   double area;   /* Store result in area */
   double x;
   int i;

   area = (f(local_a) + f(local_b))/2.0;
   x = local_a;
   for (i = 1; i <= local_n-1; i++) {
      x = local_a + i*h;
      area = area + f(x);
   }
   area = area*h;

   return area;
}  /* Trap */

/* Function:    f
 * Purpose:     Compute value of function to be integrated
 * Input args:  x
 */
double f(double x) {
   double return_val;

   // return_val = x*x;
   return_val = exp(sin(x));
   return return_val;
}  /* f */

/* Function:  Get_max_time
 * Purpose:   Find the maximum elapsed time across the processes
 * In args:   my_rank:      calling process' rank
 *            p:            total number of processes
 *            par_elapsed:  elapsed time on calling process
 * Ret val:   Process 0:    max of all processes' times
 *            Other procs:  input value for par_elapsed
 */
double Get_max_time(double par_elapsed, int my_rank, int p) {
   int source;
   MPI_Status status;
   double temp;

   if (my_rank == 0) {
      for (source = 1; source < p; source++) {
         MPI_Recv(&temp, 1, MPI_DOUBLE, source, 0, MPI_COMM_WORLD, &status);
         if (temp > par_elapsed) par_elapsed = temp;
      }
   } else {
      MPI_Send(&par_elapsed, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
   }
   return par_elapsed;
}  /* Get_max_time */

/* Function:    Global_sum
 * Purpose:     Compute global sum of values distributed across processes
 * Input args:  my_contrib: the calling process' contribution to the
 *                 global sum
 *              my_rank: the calling process' rank in the communicator
 *              p: the number of processes in the communicator
 *              comm: the communicator used for sends and receives
 * Return val:  the sum of the my_contrib values contributed by each
 *              process.
 * Algorithm:   Use tree structured communication, pairing processes
 *              to communicate.
 * Notes:
 *    1.  The value returned by Global_sum on processes other than 0
 *        is meaningless.
 *    2.  The pairing of the processes is done using the algorithm we
 *        developed in Assignment 1:
 *
 *           divisor = 2;
 *           core_difference = 1;
 *           sum = my_value;
 *           while ( divisor <= number of cores ) {
 *              if ( my_rank % divisor == 0 ) {
 *                 partner = my_rank + core_difference;
 *                 receive value from partner core;
 *                 sum += received value;
 *              } else if ( (my_rank % (divisor/2)) == 0 ) {
 *                 partner = my_rank - core_difference;
 *                 send my sum to partner core;
 *              }
 *              divisor *= 2;
 *              core_difference *= 2;
 *           }
 *    3.  The loop has no check that partner < number of cores, so it
 *        assumes the number of processes is a power of two.
 */
double Global_sum(double my_value, int my_rank, int number_of_cores,
      MPI_Comm comm) {
   int divisor = 2;
   int core_difference = 1;
   double sum = my_value;
   int partner;
   double received_value;

   while ( divisor <= number_of_cores ) {
      if ( my_rank % divisor == 0 ) {
         partner = my_rank + core_difference;
         // receive value from partner core
         MPI_Recv(&received_value, 1, MPI_DOUBLE, partner, 0, comm,
               MPI_STATUS_IGNORE);
         sum += received_value;
      } else if ( (my_rank % (divisor/2)) == 0 ) {
         partner = my_rank - core_difference;
         // send my sum to partner core
         MPI_Send(&sum, 1, MPI_DOUBLE, partner, 0, comm);
      }
      divisor *= 2;
      core_difference *= 2;
   }
   // make sure the tree structured communication is complete and
   // thus the global sum is calculated
   MPI_Barrier(MPI_COMM_WORLD);
   return sum;
}  /* Global_sum */
/* output
[rdissanayaka@hpc0 158 ~/CS351/mpi]$ mpicc -g -Wall -o mpi_trap_tree2 mpi_trap_tree2.c -lm
[rdissanayaka@hpc0 159 ~/CS351/mpi]$ mpiexec -n 4 ./mpi_trap_tree2
Enter a, b, and n
With n = 1024 trapezoids, our estimate
of the area from to = e+00
Parallel elapsed time = e-04 seconds
[rdissanayaka@hpc0 160 ~/CS351/mpi]$
*/

ANSWER2: Uses bitwise XOR for tree-structured communication

/* File:    mpi_trap_tree.c
 * Purpose: Implement parallel trapezoidal rule and determine its
 *          run-time vs. serial trap rule
 *
 * Input:   a, b, n
 * Output:  Estimate of the area between x = a, x = b, the x-axis,
 *          and the graph of f(x) using the trapezoidal rule and n
 *          trapezoids.  Use a tree-structured global sum of the
 *          process areas.  Also output the elapsed time to run the
 *          parallel version.
 *
 * Compile: mpicc -g -Wall -o mpi_trap_tree mpi_trap_tree.c -lm
 * Run:     mpiexec -n <number of processes> ./mpi_trap_tree
 *
 * Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Barrier.
 *    2.  Start timer on each process.
 *    3.  Each process calculates "its" subinterval of integration.
 *    4.  Each process estimates the area of f(x) over its interval
 *        using the trapezoidal rule.
 *    5.  Tree structured global sum of process estimates to process 0.
 *    6.  Stop timer on each process.
 *    7.  Find max time, store on process 0.
 *    8.  Time serial trap on process 0.
 *    9.  Print speedup, efficiency.
 *
 * Note:  f(x) is hardwired.
 */
#include <stdio.h>
#include <math.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
      double h);     /* Calculate local area */
double f(double x);  /* function we're integrating */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm);
double Get_max_time(double par_elapsed, int my_rank, int p);

int main(int argc, char** argv) {
   int     my_rank;    /* My process rank            */
   int     p;          /* The number of processes    */
   double  a;          /* Left endpoint              */
   double  b;          /* Right endpoint             */
   int     n;          /* Number of trapezoids       */
   double  h;          /* Trapezoid base length      */
   double  local_a;    /* Left endpoint my process   */
   double  local_b;    /* Right endpoint my process  */
   int     local_n;    /* Number of trapezoids for   */
                       /* my calculation             */
   double  area;       /* My subarea                 */
   double  total = 0;  /* Total area                 */
   double  start, finish, par_elapsed;

   /* Let the system do what it needs to start up MPI */
   MPI_Init(&argc, &argv);

   /* Get my process rank */
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   /* Find out how many processes are being used */
   MPI_Comm_size(MPI_COMM_WORLD, &p);

   Get_data(p, my_rank, &a, &b, &n);

   MPI_Barrier(MPI_COMM_WORLD);
   start = MPI_Wtime();

   h = (b-a)/n;       /* h is the same for all processes */
   local_n = n/p;     /* So is the number of trapezoids  */

   /* Length of each process' interval of integration = local_n*h.
    * So my interval starts at: */
   local_a = a + my_rank*local_n*h;
   local_b = local_a + local_n*h;
   area = Trap(local_a, local_b, local_n, h);

   /* Add up the areas calculated by each process */
   total = Global_sum(area, my_rank, p, MPI_COMM_WORLD);

   finish = MPI_Wtime();
   par_elapsed = finish - start;
   par_elapsed = Get_max_time(par_elapsed, my_rank, p);

   /* Print the result (only process 0 holds the total and max time) */
   if (my_rank == 0) {
      printf("With n = %d trapezoids, our estimate\n", n);
      printf("of the area from %f to %f = %23.16e\n", a, b, total);
      printf("Parallel elapsed time = %e seconds\n", par_elapsed);
   }

   /* Shut down MPI */
   MPI_Finalize();
   return 0;
}  /* main */

/* Function:    Get_data
 * Purpose:     Read in the data on process 0 and send to other processes
 * Input args:  p, my_rank
 * Output args: a_p, b_p, n_p
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
   int q;
   MPI_Status status;

   if (my_rank == 0) {
      printf("Enter a, b, and n\n");
      scanf("%lf %lf %d", a_p, b_p, n_p);
      for (q = 1; q < p; q++) {
         MPI_Send(a_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
         MPI_Send(b_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
         MPI_Send(n_p, 1, MPI_INT, q, 0, MPI_COMM_WORLD);
      }
   } else {
      MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
   }
}  /* Get_data */

/* Function:    Trap
 * Purpose:     Estimate a definite area using the trapezoidal rule
 * Input args:  local_a (my left endpoint)
 *              local_b (my right endpoint)
 *              local_n (my number of trapezoids)
 *              h (stepsize = length of base of trapezoids)
 * Return val:  Trapezoidal rule estimate of area from local_a to local_b
 */
double Trap(
      double  local_a  /* in */,
      double  local_b  /* in */,
      int     local_n  /* in */,
      double  h        /* in */) {
   double area;   /* Store result in area */
   double x;
   int i;

   area = (f(local_a) + f(local_b))/2.0;
   x = local_a;
   for (i = 1; i <= local_n-1; i++) {
      x = local_a + i*h;
      area = area + f(x);
   }
   area = area*h;

   return area;
}  /* Trap */

/* Function:    f
 * Purpose:     Compute value of function to be integrated
 * Input args:  x
 */
double f(double x) {
   double return_val;

   // return_val = x*x;
   return_val = exp(sin(x));
   return return_val;
}  /* f */

/* Function:  Get_max_time
 * Purpose:   Find the maximum elapsed time across the processes
 * In args:   my_rank:      calling process' rank
 *            p:            total number of processes
 *            par_elapsed:  elapsed time on calling process
 * Ret val:   Process 0:    max of all processes' times
 *            Other procs:  input value for par_elapsed
 */
double Get_max_time(double par_elapsed, int my_rank, int p) {
   int source;
   MPI_Status status;
   double temp;

   if (my_rank == 0) {
      for (source = 1; source < p; source++) {
         MPI_Recv(&temp, 1, MPI_DOUBLE, source, 0, MPI_COMM_WORLD, &status);
         if (temp > par_elapsed) par_elapsed = temp;
      }
   } else {
      MPI_Send(&par_elapsed, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
   }
   return par_elapsed;
}  /* Get_max_time */

/* Function:    Global_sum
 * Purpose:     Compute global sum of values distributed across processes
 * Input args:  my_contrib: the calling process' contribution to the
 *                 global sum
 *              my_rank: the calling process' rank in the communicator
 *              p: the number of processes in the communicator
 *              comm: the communicator used for sends and receives
 * Return val:  the sum of the my_contrib values contributed by each
 *              process.
 * Algorithm:   Use tree structured communication, pairing processes
 *              to communicate.
 * Notes:
 *    1.  The value returned by Global_sum on processes other than 0
 *        is meaningless.
 *    2.  The pairing of the processes is done using bitwise
 *        exclusive or.  Here's a table showing the rule for
 *        bitwise exclusive or:
 *
 *           X Y X^Y
 *           0 0  0
 *           0 1  1
 *           1 0  1
 *           1 1  0
 *
 *        Here's a table showing the process pairing with 8
 *        processes (r = my_rank, other column heads are bitmask;
 *        x = the process has already sent its sum and is done):
 *
 *           r     001 010 100
 *           -     --- --- ---
 *           0 000 001 010 100
 *           1 001 000  x   x
 *           2 010 011 000  x
 *           3 011 010  x   x
 *           4 100 101 110 000
 *           5 101 100  x   x
 *           6 110 111 100  x
 *           7 111 110  x   x
 */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm) {
   double   sum = my_contrib;
   double   temp;
   int      partner;
   int      done = 0;
   unsigned bitmask = (unsigned) 1;

#  ifdef DEBUG
   int      my_pass = -1;
   partner = -1;
   printf("Proc %d > partner = %d, bitmask = %u, pass = %d\n",
         my_rank, partner, bitmask, my_pass);
   fflush(stdout);
#  endif

   while (!done && bitmask < (unsigned) p) {
      partner = my_rank ^ bitmask;
#     ifdef DEBUG
      my_pass++;
      printf("Proc %d > partner = %d, bitmask = %u, pass = %d\n",
            my_rank, partner, bitmask, my_pass);
      fflush(stdout);
#     endif
      if (my_rank < partner) {
         /* Receiver: only receive if the partner actually exists */
         if (partner < p) {
            MPI_Recv(&temp, 1, MPI_DOUBLE, partner, 0, comm,
                  MPI_STATUS_IGNORE);
            sum += temp;
         }
         bitmask <<= 1;
      } else {
         /* Sender: pass the partial sum down the tree and stop */
         MPI_Send(&sum, 1, MPI_DOUBLE, partner, 0, comm);
         done = 1;
      }
   }

   /* Valid only on 0 */
   return sum;
}  /* Global_sum */

Q3. [Pacheco Q3.2] mpi_trap2.c
Modify mpi_trap_time.c (save it as mpi_trap2.c) so that it correctly estimates the integral even if comm_sz doesn't evenly divide n. (You can still assume n >= comm_sz.)

Solution: CS351/mpi/trapezoid2_mpi.c

/* File:    mpi_trap2_2.c in Melchior CS351/mpi
 * Purpose: Implement parallel trapezoidal rule allowing user input of
 *          data.  Use MPI_Bcast to broadcast user input to all
 *          processes.
 *
 * Input:   a, b, n
 * Output:  Estimate of the area between x = a, x = b, x-axis, and
 *          graph of f(x) using the trapezoidal rule and n trapezoids.
 *
 * Compile: mpicc -g -Wall -o mpi_trap2_2 mpi_trap2_2.c
 * Run:     mpiexec -n <number of processes> ./mpi_trap2_2
 *
 * Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Each process calculates its local n, local a, and local b
 *        (n evenly divisible by comm_size is not assumed):
 *
 *           int quotient = n / p;
 *           int remainder = n % p;
 *           if (my_rank < remainder) {
 *              local_n = quotient + 1;
 *              local_a = a + my_rank*local_n*h;
 *              local_b = local_a + local_n*h;
 *           } else {
 *              local_n = quotient;
 *              local_a = a + my_rank*local_n*h + remainder*h;
 *              local_b = local_a + local_n*h;
 *           }
 *    2.  Each process estimates the area of f(x) over its
 *        subinterval using the trapezoidal rule.
 *    3a. Each process != 0 sends its area to 0.
 *    3b. Process 0 sums the calculations received from the
 *        individual processes and prints the result.
 *
 * Note:  f(x) is hardwired.  n may or may not be evenly divisible by p.
 */
#include <stdio.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
      double h);     /* Calculate local area */
double f(double x);  /* function we're integrating */

int main(int argc, char** argv) {
   int         my_rank;   /* My process rank            */
   int         p;         /* The number of processes    */
   double      a;         /* Left endpoint              */
   double      b;         /* Right endpoint             */
   int         n;         /* Number of trapezoids       */
   double      h;         /* Trapezoid base length      */
   double      local_a;   /* Left endpoint my process   */
   double      local_b;   /* Right endpoint my process  */
   int         local_n;   /* Number of trapezoids for   */
                          /* my calculation             */
   double      my_area;   /* Integral over my interval  */
   double      total;     /* Total area                 */
   int         source;    /* Process sending area       */
   int         dest = 0;  /* All messages go to 0       */
   int         tag = 0;
   MPI_Status  status;

   /* Let the system do what it needs to start up MPI */
   MPI_Init(&argc, &argv);

   /* Get my process rank */
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   /* Find out how many processes are being used */
   MPI_Comm_size(MPI_COMM_WORLD, &p);

   Get_data(p, my_rank, &a, &b, &n);

   h = (b-a)/n;    /* h is the same for all processes */

   int quotient = n / p;
   int remainder = n % p;
   if (my_rank < remainder) {
      /* assign the extra remainder trapezoids to the first
       * remainder processes, one each */
      local_n = quotient + 1;
      local_a = a + my_rank*local_n*h;
      local_b = local_a + local_n*h;
   } else {
      /* skip past the remainder extra trapezoids held by lower ranks */
      local_n = quotient;
      local_a = a + my_rank*local_n*h + remainder*h;
      local_b = local_a + local_n*h;
   }
   my_area = Trap(local_a, local_b, local_n, h);

   /* Add up the areas calculated by each process */
   if (my_rank == 0) {
      total = my_area;
      for (source = 1; source < p; source++) {
         MPI_Recv(&my_area, 1, MPI_DOUBLE, source, tag,
               MPI_COMM_WORLD, &status);
         total = total + my_area;
      }
   } else {
      MPI_Send(&my_area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
   }

   /* Print the result */
   if (my_rank == 0) {
      printf("With n = %d trapezoids, our estimate\n", n);
      printf("of the area from %f to %f = %.15f\n", a, b, total);
   }

   /* Shut down MPI */
   MPI_Finalize();
   return 0;
}  /* main */

/* Function:    Get_data
 * Purpose:     Read in the data on process 0 and send to other processes
 * Input args:  p, my_rank
 * Output args: a_p, b_p, n_p
 * Note:        _p for pointer
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
   if (my_rank == 0) {
      printf("Enter a, b, and n\n");
      scanf("%lf %lf %d", a_p, b_p, n_p);
   }
   MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   MPI_Bcast(n_p, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* n is an int */
}  /* Get_data */

/* Function:    Trap
 * Purpose:     Estimate a definite area using the trapezoidal rule
 * Input args:  local_a (my left endpoint)
 *              local_b (my right endpoint)
 *              local_n (my number of trapezoids)
 *              h (stepsize = length of base of trapezoids)
 * Return val:  Trapezoidal rule estimate of area from local_a to local_b
 */
double Trap(
      double  local_a  /* in */,
      double  local_b  /* in */,
      int     local_n  /* in */,
      double  h        /* in */) {
   double my_area;   /* Store my result in my_area */
   double x;
   int i;

   my_area = (f(local_a) + f(local_b))/2.0;
   x = local_a;
   for (i = 1; i <= local_n-1; i++) {
      x = local_a + i*h;
      my_area = my_area + f(x);
   }
   my_area = my_area*h;

   return my_area;
}  /* Trap */

/* Function:    f
 * Purpose:     Compute value of function to be integrated
 * Input args:  x
 */
double f(double x) {
   double return_val;

   return_val = x*x + 1.0;
   return return_val;
}  /* f */

/* output
[rdissanayaka@hpc0 143 ~/CS351/mpi]$ mpicc -o mpi_trap2_2 mpi_trap2_2.c
[rdissanayaka@hpc0 144 ~/CS351/mpi]$ mpirun -np 4 mpi_trap2_2
mv: cannot stat ‘/net/people/faculty/cs/rdissanayaka/.kde-el7/share/config//profilerc.new’: No such file or directory
mv: cannot stat ‘/usr/people/faculty/cs/rdissanayaka/.local/share/applications/mimeapps.list.new’: No such file or directory
Enter a, b, and n
With n = 1029 trapezoids, our estimate
of the area from to =
[rdissanayaka@hpc0 145 ~/CS351/mpi]$
*/
More informationIntroduction to MPI. Ricardo Fonseca. https://sites.google.com/view/rafonseca2017/
Introduction to MPI Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Distributed Memory Programming (MPI) Message Passing Model Initializing and terminating programs Point to point
More informationReport S1 C. Kengo Nakajima. Programming for Parallel Computing ( ) Seminar on Advanced Computing ( )
Report S1 C Kengo Nakajima Programming for Parallel Computing (616-2057) Seminar on Advanced Computing (616-4009) Problem S1-1 Report S1 (1/2) Read local files /a1.0~a1.3, /a2.0~a2.3. Develop
More informationReport S1 C. Kengo Nakajima
Report S1 C Kengo Nakajima Technical & Scientific Computing II (4820-1028) Seminar on Computer Science II (4810-1205) Hybrid Distributed Parallel Computing (3747-111) Problem S1-1 Report S1 Read local
More informationTutorial: parallel coding MPI
Tutorial: parallel coding MPI Pascal Viot September 12, 2018 Pascal Viot Tutorial: parallel coding MPI September 12, 2018 1 / 24 Generalities The individual power of a processor is still growing, but at
More informationHIGH PERFORMANCE SCIENTIFIC COMPUTING
( HPSC 5576 ELIZABETH JESSUP ) HIGH PERFORMANCE SCIENTIFIC COMPUTING :: Homework / 8 :: Student / Florian Rappl 1 problem / 10 points Problem 1 Task: Write a short program demonstrating the use of MPE's
More informationMPI MESSAGE PASSING INTERFACE
MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions
More informationParallel Programming, MPI Lecture 2
Parallel Programming, MPI Lecture 2 Ehsan Nedaaee Oskoee 1 1 Department of Physics IASBS IPM Grid and HPC workshop IV, 2011 Outline 1 Introduction and Review The Von Neumann Computer Kinds of Parallel
More informationCOMP/CS 605: Introduction to Parallel Computing Topic : Distributed Memory Programming: Message Passing Interface
COMP/CS 605: Introduction to Parallel Computing Topic : Distributed Memory Programming: Message Passing Interface Mary Thomas Department of Computer Science Computational Science Research Center (CSRC)
More informationDistributed Systems + Middleware Advanced Message Passing with MPI
Distributed Systems + Middleware Advanced Message Passing with MPI Gianpaolo Cugola Dipartimento di Elettronica e Informazione Politecnico, Italy cugola@elet.polimi.it http://home.dei.polimi.it/cugola
More informationMPI: The Message-Passing Interface. Most of this discussion is from [1] and [2].
MPI: The Message-Passing Interface Most of this discussion is from [1] and [2]. What Is MPI? The Message-Passing Interface (MPI) is a standard for expressing distributed parallelism via message passing.
More informationProgramming with MPI. Pedro Velho
Programming with MPI Pedro Velho Science Research Challenges Some applications require tremendous computing power - Stress the limits of computing power and storage - Who might be interested in those applications?
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 4 Message-Passing Programming Learning Objectives Understanding how MPI programs execute Familiarity with fundamental MPI functions
More informationFaculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering
Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering Course Number EE 8218 011 Section Number 01 Course Title Parallel Computing
More informationITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 2016 Solutions Name:...
ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 016 Solutions Name:... Answer questions in space provided below questions. Use additional paper if necessary but make sure
More informationMessage Passing Interface - MPI
Message Passing Interface - MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico October 24, 2011 Many slides adapted from lectures by
More informationPractical Course Scientific Computing and Visualization
July 5, 2006 Page 1 of 21 1. Parallelization Architecture our target architecture: MIMD distributed address space machines program1 data1 program2 data2 program program3 data data3.. program(data) program1(data1)
More informationChip Multiprocessors COMP Lecture 9 - OpenMP & MPI
Chip Multiprocessors COMP35112 Lecture 9 - OpenMP & MPI Graham Riley 14 February 2018 1 Today s Lecture Dividing work to be done in parallel between threads in Java (as you are doing in the labs) is rather
More informationParallel Computing Paradigms
Parallel Computing Paradigms Message Passing João Luís Ferreira Sobral Departamento do Informática Universidade do Minho 31 October 2017 Communication paradigms for distributed memory Message passing is
More informationMessage Passing Interface - MPI
Message Passing Interface - MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico March 31, 2016 Many slides adapted from lectures by Bill
More informationProgramming with MPI on GridRS. Dr. Márcio Castro e Dr. Pedro Velho
Programming with MPI on GridRS Dr. Márcio Castro e Dr. Pedro Velho Science Research Challenges Some applications require tremendous computing power - Stress the limits of computing power and storage -
More informationMPI 2. CSCI 4850/5850 High-Performance Computing Spring 2018
MPI 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationMPI introduction - exercises -
MPI introduction - exercises - Paolo Ramieri, Maurizio Cremonesi May 2016 Startup notes Access the server and go on scratch partition: ssh a08tra49@login.galileo.cineca.it cd $CINECA_SCRATCH Create a job
More informationHigh Performance Computing Course Notes Message Passing Programming I
High Performance Computing Course Notes 2008-2009 2009 Message Passing Programming I Message Passing Programming Message Passing is the most widely used parallel programming model Message passing works
More informationLecture 6: Parallel Matrix Algorithms (part 3)
Lecture 6: Parallel Matrix Algorithms (part 3) 1 A Simple Parallel Dense Matrix-Matrix Multiplication Let A = [a ij ] n n and B = [b ij ] n n be n n matrices. Compute C = AB Computational complexity of
More informationScientific Computing
Lecture on Scientific Computing Dr. Kersten Schmidt Lecture 21 Technische Universität Berlin Institut für Mathematik Wintersemester 2014/2015 Syllabus Linear Regression, Fast Fourier transform Modelling
More informationLecture 7: Distributed memory
Lecture 7: Distributed memory David Bindel 15 Feb 2010 Logistics HW 1 due Wednesday: See wiki for notes on: Bottom-up strategy and debugging Matrix allocation issues Using SSE and alignment comments Timing
More informationParallel Applications Design with MPI
Parallel Applications Design with MPI Killer applications Science Research Challanges Challenging use of computer power and storage Who might be interested in those applications? Simulation and analysis
More informationRecap of Parallelism & MPI
Recap of Parallelism & MPI Chris Brady Heather Ratcliffe The Angry Penguin, used under creative commons licence from Swantje Hess and Jannis Pohlmann. Warwick RSE 13/12/2017 Parallel programming Break
More informationShared-memory Programming
Shared-memory Programming Introduction to High Performance Computing Systems (CS1645) Esteban Meneses Programming a Supercomputer Languages: Fortran, C/C++, Python. Nodes Cores Accelerators Supercomputer
More informationDistributed Memory Programming with MPI
Distributed Memory Programming with MPI Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna moreno.marzolla@unibo.it Algoritmi Avanzati--modulo 2 2 Credits Peter Pacheco,
More informationMPI Program Structure
MPI Program Structure Handles MPI communicator MPI_COMM_WORLD Header files MPI function format Initializing MPI Communicator size Process rank Exiting MPI 1 Handles MPI controls its own internal data structures
More informationO.I. Streltsova, D.V. Podgainy, M.V. Bashashin, M.I.Zuev
High Performance Computing Technologies Lecture, Practical training 9 Parallel Computing with MPI: parallel algorithm for linear algebra https://indico-hlit.jinr.ru/event/120/ O.I. Streltsova, D.V. Podgainy,
More informationParallel Programming Using MPI
Parallel Programming Using MPI Short Course on HPC 15th February 2019 Aditya Krishna Swamy adityaks@iisc.ac.in SERC, Indian Institute of Science When Parallel Computing Helps? Want to speed up your calculation
More informationCSE 613: Parallel Programming. Lecture 21 ( The Message Passing Interface )
CSE 613: Parallel Programming Lecture 21 ( The Message Passing Interface ) Jesmin Jahan Tithi Department of Computer Science SUNY Stony Brook Fall 2013 ( Slides from Rezaul A. Chowdhury ) Principles of
More informationLecture 3 Message-Passing Programming Using MPI (Part 1)
Lecture 3 Message-Passing Programming Using MPI (Part 1) 1 What is MPI Message-Passing Interface (MPI) Message-Passing is a communication model used on distributed-memory architecture MPI is not a programming
More informationIntroduction to MPI-2 (Message-Passing Interface)
Introduction to MPI-2 (Message-Passing Interface) What are the major new features in MPI-2? Parallel I/O Remote Memory Operations Dynamic Process Management Support for Multithreading Parallel I/O Includes
More informationCOSC 6374 Parallel Computation. Message Passing Interface (MPI ) I Introduction. Distributed memory machines
Network card Network card 1 COSC 6374 Parallel Computation Message Passing Interface (MPI ) I Introduction Edgar Gabriel Fall 015 Distributed memory machines Each compute node represents an independent
More informationME964 High Performance Computing for Engineering Applications
ME964 High Performance Computing for Engineering Applications Parallel Computing with MPI Building/Debugging MPI Executables MPI Send/Receive Collective Communications with MPI April 10, 2012 Dan Negrut,
More informationTutorial 2: MPI. CS486 - Principles of Distributed Computing Papageorgiou Spyros
Tutorial 2: MPI CS486 - Principles of Distributed Computing Papageorgiou Spyros What is MPI? An Interface Specification MPI = Message Passing Interface Provides a standard -> various implementations Offers
More informationParallel Programming Assignment 3 Compiling and running MPI programs
Parallel Programming Assignment 3 Compiling and running MPI programs Author: Clayton S. Ferner and B. Wilkinson Modification date: October 11a, 2013 This assignment uses the UNC-Wilmington cluster babbage.cis.uncw.edu.
More informationAn Introduction to MPI
An Introduction to MPI Parallel Programming with the Message Passing Interface William Gropp Ewing Lusk Argonne National Laboratory 1 Outline Background The message-passing model Origins of MPI and current
More informationHPCSE - I. «MPI Programming Model - Part II» Panos Hadjidoukas
HPCSE - I «MPI Programming Model - Part II» Panos Hadjidoukas 1 Schedule and Goals 24.11.2017: MPI - part 2 asynchronous communication how MPI works study and discuss more examples 2 Outline Measuring
More informationAnomalies. The following issues might make the performance of a parallel program look different than it its:
Anomalies The following issues might make the performance of a parallel program look different than it its: When running a program in parallel on many processors, each processor has its own cache, so the
More information[4] 1 cycle takes 1/(3x10 9 ) seconds. One access to memory takes 50/(3x10 9 ) seconds. =16ns. Performance = 4 FLOPS / (2x50/(3x10 9 )) = 120 MFLOPS.
Give your answers in the space provided with each question. Answers written elsewhere will not be graded. Q1). [4 points] Consider a memory system with level 1 cache of 64 KB and DRAM of 1GB with processor
More informationDISTRIBUTED MEMORY PROGRAMMING WITH MPI. Carlos Jaime Barrios Hernández, PhD.
DISTRIBUTED MEMORY PROGRAMMING WITH MPI Carlos Jaime Barrios Hernández, PhD. Remember Special Features of Architecture Remember concurrency : it exploits better the resources (shared) within a computer.
More informationIntroduction to parallel computing concepts and technics
Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing
More informationCS 179: GPU Programming. Lecture 14: Inter-process Communication
CS 179: GPU Programming Lecture 14: Inter-process Communication The Problem What if we want to use GPUs across a distributed system? GPU cluster, CSIRO Distributed System A collection of computers Each
More informationPart One: The Files. C MPI Slurm Tutorial - TSP. Introduction. TSP Problem and Tutorial s Purpose. tsp.tar. The C files, summary
C MPI Slurm Tutorial - TSP Introduction The example shown here demonstrates the use of the Slurm Scheduler for the purpose of running a C/MPI program Knowledge of C is assumed Code is also given for the
More information