Assignment 3 Key: CSCI 351 Parallel Programming


CSCI 351 PARALLEL PROGRAMMING, FALL 2015
Assignment 3 Key

Q1. Calculate log n, log n and log n for the following:
   a. n=3
   b. n=13
   c. n=32
   d. n=123
   e. n=321

Answer:

Q2. mpi_trap_tree.c
The mpi_trap_time.c program you developed in class has each process send its calculated sub-area to P0, and P0 calculates the global area. Modify this program (save it as mpi_trap_tree.c) to use a tree-structured global sum of the sub-areas instead.

ANSWER1: Use the Assignment 1 algorithm (with a small correction to the else part)

/* File:    mpi_trap_tree2.c
 * Purpose: Implement parallel trapezoidal rule and determine its run-time
 *          vs. serial trap rule
 *
 * Input:   a, b, n
 * Output:  Estimate of the area between x = a, x = b, x-axis, and the
 *          graph of f(x) using the trapezoidal rule and n trapezoids.
 *          Use a tree-structured global sum of the process areas.
 *          Use XOR function for tree structured communication.
 *          Also output the elapsed time to run the parallel version.
 *
 * Compile: mpicc -g -Wall -o mpi_trap_tree mpi_trap_tree.c -lm
 * Run:     mpiexec -n <number of processes> ./mpi_trap_tree
 *
 * Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Barrier.
 *    2.  Start timer on each process.
 *    3.  Each process calculates "its" subinterval of integration.
 *    4.  Each process estimates the area of f(x) over its interval
 *        using the trapezoidal rule.
 *    5.  Tree structured global sum of process estimates to process 0.
 *    6.  Stop timer on each process.
 *    7.  Find max time, store on process 0.
 *    8.  Time serial trap on process 0.
 *    9.  Print speedup, efficiency.
 *
 * Note:    f(x) is hardwired.
 */
#include <stdio.h>
#include <math.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
            double h);    /* Calculate local area */
double f(double x);       /* function we're integrating */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm);
double Get_max_time(double par_elapsed, int my_rank, int p);

int main(int argc, char** argv) {
    int    my_rank;    /* My process rank           */
    int    p;          /* The number of processes   */
    double a;          /* Left endpoint             */
    double b;          /* Right endpoint            */
    int    n;          /* Number of trapezoids      */
    double h;          /* Trapezoid base length     */
    double local_a;    /* Left endpoint my process  */
    double local_b;    /* Right endpoint my process */
    int    local_n;    /* Number of trapezoids for  */
                       /* my calculation            */
    double area;       /* My subarea                */
    double total = 0;  /* Total area                */
    double start, finish, par_elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    Get_data(p, my_rank, &a, &b, &n);

    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();

    h = (b-a)/n;     /* h is the same for all processes */
    local_n = n/p;   /* So is the number of trapezoids  */
    local_a = a + my_rank*local_n*h;
    local_b = local_a + local_n*h;
    area = Trap(local_a, local_b, local_n, h);

    total = Global_sum(area, my_rank, p, MPI_COMM_WORLD);

    finish = MPI_Wtime();
    par_elapsed = finish - start;
    par_elapsed = Get_max_time(par_elapsed, my_rank, p);

    /* Print the result (the total is valid only on process 0) */
    if (my_rank == 0) {
        printf("with n = %d trapezoids, our estimate\n", n);
        printf("of the area from %f to %f = %23.16e\n", a, b, total);
        printf("parallel elapsed time = %e seconds\n", par_elapsed);
    }

    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}  /* main */

/* Function:    Get_data
 * Purpose:     Read in the data on process 0 and send to other processes
 * Input args:  p, my_rank
 * Output args: a_p, b_p, n_p
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
    int q;
    MPI_Status status;

    if (my_rank == 0) {
        printf("enter a, b, and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
        for (q = 1; q < p; q++) {
            MPI_Send(a_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
            MPI_Send(b_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
            MPI_Send(n_p, 1, MPI_INT, q, 0, MPI_COMM_WORLD);
        }
    } else {
        MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    }
}  /* Get_data */

/* Function:    Trap
 * Purpose:     Estimate a definite area using the trapezoidal rule
 * Input args:  local_a (my left endpoint)
 *              local_b (my right endpoint)
 *              local_n (my number of trapezoids)
 *              h (stepsize = length of base of trapezoids)
 * Return val:  Trapezoidal rule estimate of area from local_a to local_b
 */
double Trap(
        double local_a  /* in */,
        double local_b  /* in */,
        int    local_n  /* in */,
        double h        /* in */) {
    double area;   /* Store result in area */
    double x;
    int i;

    area = (f(local_a) + f(local_b))/2.0;
    x = local_a;
    for (i = 1; i <= local_n-1; i++) {
        x = local_a + i*h;
        area = area + f(x);
    }
    area = area*h;

    return area;
}  /* Trap */

/* Function:    f
 * Purpose:     Compute value of function to be integrated
 * Input args:  x
 */
double f(double x) {
    double return_val;

    // return_val = x*x;
    return_val = exp(sin(x));
    return return_val;
}  /* f */

/* Function:  Get_max_time
 * Purpose:   Find the maximum elapsed time across the processes
 * In args:   my_rank:      calling process' rank
 *            p:            total number of processes
 *            par_elapsed:  elapsed time on calling process
 * Ret val:   Process 0:    max of all processes' times
 *            Other procs:  input value for par_elapsed
 */
double Get_max_time(double par_elapsed, int my_rank, int p) {
    int source;
    MPI_Status status;
    double temp;

    if (my_rank == 0) {
        for (source = 1; source < p; source++) {
            MPI_Recv(&temp, 1, MPI_DOUBLE, source, 0, MPI_COMM_WORLD, &status);
            if (temp > par_elapsed) par_elapsed = temp;
        }
    } else {
        MPI_Send(&par_elapsed, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    return par_elapsed;
}  /* Get_max_time */

/* Function:    Global_sum
 * Purpose:     Compute global sum of values distributed across processes
 * Input args:  my_contrib: the calling process' contribution to the global sum
 *              my_rank:    the calling process' rank in the communicator
 *              p:          the number of processes in the communicator
 *              comm:       the communicator used for sends and receives
 * Return val:  the sum of the my_contrib values contributed by each process.
 *
 * Algorithm:   Use tree structured communication, pairing processes
 *              to communicate.
 *
 * Notes:
 *    1.  The value returned by Global_sum on processes other than 0
 *        is meaningless.
 *    2.  The pairing of the processes is done using the algorithm we
 *        developed in assignment 1:
 *
 *           divisor = 2;
 *           core_difference = 1;
 *           sum = my_value;
 *           while ( divisor <= number of cores ) {
 *              if ( my_rank % divisor == 0 ) {
 *                 partner = my_rank + core_difference;
 *                 receive value from partner core;
 *                 sum += received value;
 *              } else if ( (my_rank % (divisor/2)) == 0 ) {
 *                 partner = my_rank - core_difference;
 *                 send my sum to partner core;
 *              }
 *              divisor *= 2;
 *              core_difference *= 2;
 *           }
 */
double Global_sum(double my_value, int my_rank, int number_of_cores, MPI_Comm comm) {
    int divisor = 2;
    int core_difference = 1;
    double sum = my_value;
    int partner;
    double received_value;

    /* As written, this loop pairs every process correctly only when
       number_of_cores is a power of 2. */
    while ( divisor <= number_of_cores ) {
        if ( my_rank % divisor == 0 ) {
            partner = my_rank + core_difference;
            // receive value from partner core
            MPI_Recv(&received_value, 1, MPI_DOUBLE, partner, 0, comm,
                     MPI_STATUS_IGNORE);
            sum += received_value;
        } else if ( (my_rank % (divisor/2)) == 0 ) {
            partner = my_rank - core_difference;
            // send my sum to partner core
            MPI_Send(&sum, 1, MPI_DOUBLE, partner, 0, comm);
        }
        divisor *= 2;
        core_difference *= 2;
    }

    // make sure the tree structured communication is complete
    // and thus the global sum is calculated
    MPI_Barrier(MPI_COMM_WORLD);
    return sum;
}  /* Global_sum */

/* output
158 ~/CS351/mpi]$ mpicc -g -Wall -o mpi_trap_tree2 mpi_trap_tree2.c -lm
159 ~/CS351/mpi]$ mpiexec -n 4 ./mpi_trap_tree2
Enter a, b, and n
With n = 1024 trapezoids, our estimate
of the area from to = e+00
Parallel elapsed time = e-04 seconds
[rdissanayaka@hpc0 160 ~/CS351/mpi]$
*/

ANSWER2: Uses bitwise XOR for tree-structured communication

/* File:    mpi_trap_tree.c
 * Purpose: Implement parallel trapezoidal rule and determine its run-time
 *          vs. serial trap rule
 *
 * Input:   a, b, n
 * Output:  Estimate of the area between x = a, x = b, x-axis, and the
 *          graph of f(x) using the trapezoidal rule and n trapezoids.
 *          Use a tree-structured global sum of the process areas.
 *          Also output the elapsed time to run the parallel version.
 *
 * Compile: mpicc -g -Wall -o mpi_trap_tree mpi_trap_tree.c -lm
 * Run:     mpiexec -n <number of processes> ./mpi_trap_tree
 *
 * Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Barrier.
 *    2.  Start timer on each process.
 *    3.  Each process calculates "its" subinterval of integration.
 *    4.  Each process estimates the area of f(x) over its interval
 *        using the trapezoidal rule.
 *    5.  Tree structured global sum of process estimates to process 0.
 *    6.  Stop timer on each process.
 *    7.  Find max time, store on process 0.
 *    8.  Time serial trap on process 0.
 *    9.  Print speedup, efficiency.
 *
 * Note:    f(x) is hardwired.
 */
#include <stdio.h>
#include <math.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
            double h);    /* Calculate local area */
double f(double x);       /* function we're integrating */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm);
double Get_max_time(double par_elapsed, int my_rank, int p);

int main(int argc, char** argv) {
    int    my_rank;    /* My process rank           */
    int    p;          /* The number of processes   */
    double a;          /* Left endpoint             */
    double b;          /* Right endpoint            */
    int    n;          /* Number of trapezoids      */
    double h;          /* Trapezoid base length     */
    double local_a;    /* Left endpoint my process  */
    double local_b;    /* Right endpoint my process */
    int    local_n;    /* Number of trapezoids for  */
                       /* my calculation            */
    double area;       /* My subarea                */
    double total = 0;  /* Total area                */
    double start, finish, par_elapsed;

    /* Let the system do what it needs to start up MPI */
    MPI_Init(&argc, &argv);

    /* Get my process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out how many processes are being used */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    Get_data(p, my_rank, &a, &b, &n);

    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();

    h = (b-a)/n;     /* h is the same for all processes */
    local_n = n/p;   /* So is the number of trapezoids  */

    /* Length of each process' interval of integration = local_n*h.
       So my interval starts at: */
    local_a = a + my_rank*local_n*h;
    local_b = local_a + local_n*h;
    area = Trap(local_a, local_b, local_n, h);

    /* Add up the areas calculated by each process */
    total = Global_sum(area, my_rank, p, MPI_COMM_WORLD);

    finish = MPI_Wtime();
    par_elapsed = finish - start;
    par_elapsed = Get_max_time(par_elapsed, my_rank, p);

    /* Print the result (the total is valid only on process 0) */
    if (my_rank == 0) {
        printf("with n = %d trapezoids, our estimate\n", n);
        printf("of the area from %f to %f = %23.16e\n", a, b, total);
        printf("parallel elapsed time = %e seconds\n", par_elapsed);
    }

    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}  /* main */

/* Function:    Get_data
 * Purpose:     Read in the data on process 0 and send to other processes
 * Input args:  p, my_rank
 * Output args: a_p, b_p, n_p
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
    int q;
    MPI_Status status;

    if (my_rank == 0) {
        printf("enter a, b, and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
        for (q = 1; q < p; q++) {
            MPI_Send(a_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
            MPI_Send(b_p, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD);
            MPI_Send(n_p, 1, MPI_INT, q, 0, MPI_COMM_WORLD);
        }
    } else {
        MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    }
}  /* Get_data */

/* Function:    Trap
 * Purpose:     Estimate a definite area using the trapezoidal rule
 * Input args:  local_a (my left endpoint)
 *              local_b (my right endpoint)
 *              local_n (my number of trapezoids)
 *              h (stepsize = length of base of trapezoids)
 * Return val:  Trapezoidal rule estimate of area from local_a to local_b
 */
double Trap(
        double local_a  /* in */,
        double local_b  /* in */,
        int    local_n  /* in */,
        double h        /* in */) {
    double area;   /* Store result in area */
    double x;
    int i;

    area = (f(local_a) + f(local_b))/2.0;
    x = local_a;
    for (i = 1; i <= local_n-1; i++) {
        x = local_a + i*h;
        area = area + f(x);
    }
    area = area*h;

    return area;
}  /* Trap */

/* Function:    f
 * Purpose:     Compute value of function to be integrated
 * Input args:  x
 */
double f(double x) {
    double return_val;

    // return_val = x*x;
    return_val = exp(sin(x));
    return return_val;
}  /* f */

/* Function:  Get_max_time
 * Purpose:   Find the maximum elapsed time across the processes
 * In args:   my_rank:      calling process' rank
 *            p:            total number of processes
 *            par_elapsed:  elapsed time on calling process
 * Ret val:   Process 0:    max of all processes' times
 *            Other procs:  input value for par_elapsed
 */
double Get_max_time(double par_elapsed, int my_rank, int p) {
    int source;
    MPI_Status status;
    double temp;

    if (my_rank == 0) {
        for (source = 1; source < p; source++) {
            MPI_Recv(&temp, 1, MPI_DOUBLE, source, 0, MPI_COMM_WORLD, &status);
            if (temp > par_elapsed) par_elapsed = temp;
        }
    } else {
        MPI_Send(&par_elapsed, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    return par_elapsed;
}  /* Get_max_time */

/* Function:    Global_sum
 * Purpose:     Compute global sum of values distributed across processes
 * Input args:  my_contrib: the calling process' contribution to the global sum
 *              my_rank:    the calling process' rank in the communicator
 *              p:          the number of processes in the communicator
 *              comm:       the communicator used for sends and receives
 * Return val:  the sum of the my_contrib values contributed by each process.
 *
 * Algorithm:   Use tree structured communication, pairing processes
 *              to communicate.
 *
 * Notes:
 *    1.  The value returned by Global_sum on processes other than 0
 *        is meaningless.
 *    2.  The pairing of the processes is done using bitwise exclusive or.
 *        Here's a table showing the rule for bitwise exclusive or:
 *
 *           X Y X^Y
 *           0 0  0
 *           0 1  1
 *           1 0  1
 *           1 1  0
 *
 *        Here's a table showing the process pairing with 8 processes
 *        (r = my_rank, other column heads are bitmask):
 *
 *           r     001 010 100
 *           -     --- --- ---
 *           0 000 001 010 100
 *           1 001 000  x   x
 *           2 010 011 000  x
 *           3 011 010  x   x
 *           4 100 101 110 000
 *           5 101 100  x   x
 *           6 110 111 100  x
 *           7 111 110  x   x
 */
double Global_sum(double my_contrib, int my_rank, int p, MPI_Comm comm) {
    double sum = my_contrib;
    double temp;
    int partner;
    int done = 0;
    unsigned bitmask = (unsigned) 1;

#   ifdef DEBUG
    int my_pass = -1;
    partner = -1;
    printf("proc %d > partner = %d, bitmask = %d, pass = %d\n",
           my_rank, partner, bitmask, my_pass);
    fflush(stdout);
#   endif

    while (!done && bitmask < p) {
        partner = my_rank ^ bitmask;
#       ifdef DEBUG
        my_pass++;
        printf("proc %d > partner = %d, bitmask = %d, pass = %d\n",
               my_rank, partner, bitmask, my_pass);
        fflush(stdout);
#       endif
        if (my_rank < partner) {
            /* Receive only if the partner actually exists */
            if (partner < p) {
                MPI_Recv(&temp, 1, MPI_DOUBLE, partner, 0, comm,
                         MPI_STATUS_IGNORE);
                sum += temp;
            }
            bitmask <<= 1;
        } else {
            MPI_Send(&sum, 1, MPI_DOUBLE, partner, 0, comm);
            done = 1;
        }
    }

    /* Valid only on 0 */
    return sum;
}  /* Global_sum */

Q3. [Pacheco Q3.2] mpi_trap2.c
Modify mpi_trap_time.c (save as mpi_trap2.c) so that it will correctly estimate the integral even if comm_sz doesn't evenly divide n. (You can still assume n >= comm_sz.)

Solution: CS351/mpi/trapezoid2_mpi.c

/* File:    mpi_trap2_2.c in Melchior CS351/mpi
 * Purpose: Implement parallel trapezoidal rule allowing user input of data.
 *          Use MPI_Bcast to broadcast user input to all processes.

 *
 * Input:   a, b, n
 * Output:  Estimate of the area between x = a, x = b, x-axis, and
 *          graph of f(x) using the trapezoidal rule and n trapezoids.
 *
 * Compile: mpicc -g -Wall -o mpi_trap mpi_trap.c
 * Run:     mpiexec -n <number of processes> ./mpi_trap
 *
 * Algorithm:
 *    0.  Process 0 reads in a, b, and n, and distributes them
 *        among the processes.
 *    1.  Each process calculates the local n, local a, local b
 *        (n evenly divisible by comm_size is not assumed):
 *
 *           int quotient = n / p;
 *           int remainder = n % p;
 *           if (my_rank < remainder) {
 *              local_n = quotient + 1;
 *              local_a = a + my_rank*local_n*h;
 *              local_b = local_a + local_n*h;
 *           } else {
 *              local_n = quotient;
 *              local_a = a + my_rank*local_n*h + remainder*h;
 *              local_b = local_a + local_n*h;
 *           }
 *
 *    2.  Each process estimates the area of f(x) over its subinterval
 *        using the trapezoidal rule.
 *    3a. Each process != 0 sends its area to 0.
 *    3b. Process 0 sums the calculations received from the individual
 *        processes and prints the result.
 *
 * Note:    f(x) is hardwired. n can be evenly divisible by p or not.
 */
#include <stdio.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p);
double Trap(double local_a, double local_b, int local_n,
            double h);    /* Calculate local area */
double f(double x);       /* function we're integrating */

int main(int argc, char** argv) {
    int    my_rank;    /* My process rank           */
    int    p;          /* The number of processes   */
    double a;          /* Left endpoint             */
    double b;          /* Right endpoint            */
    int    n;          /* Number of trapezoids      */
    double h;          /* Trapezoid base length     */
    double local_a;    /* Left endpoint my process  */
    double local_b;    /* Right endpoint my process */
    int    local_n;    /* Number of trapezoids for  */
                       /* my calculation            */
    double my_area;    /* Integral over my interval */
    double total;      /* Total area                */
    int    source;     /* Process sending area      */
    int    dest = 0;   /* All messages go to 0      */
    int    tag = 0;
    MPI_Status status;

    /* Let the system do what it needs to start up MPI */
    MPI_Init(&argc, &argv);

    /* Get my process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out how many processes are being used */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    Get_data(p, my_rank, &a, &b, &n);

    h = (b-a)/n;    /* h is the same for all processes */

    int quotient = n / p;
    int remainder = n % p;
    if (my_rank < remainder) {
        // assign the extra remainder trapezoids to the first
        // remainder processes, one each
        local_n = quotient + 1;
        local_a = a + my_rank*local_n*h;
        local_b = local_a + local_n*h;
    } else {
        local_n = quotient;
        local_a = a + my_rank*local_n*h + remainder*h;
        local_b = local_a + local_n*h;
    }
    my_area = Trap(local_a, local_b, local_n, h);

    /* Add up the areas calculated by each process */
    if (my_rank == 0) {
        total = my_area;
        for (source = 1; source < p; source++) {
            MPI_Recv(&my_area, 1, MPI_DOUBLE, source, tag,
                     MPI_COMM_WORLD, &status);
            total = total + my_area;
        }
    } else {
        MPI_Send(&my_area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
    }

    /* Print the result */
    if (my_rank == 0) {
        printf("with n = %d trapezoids, our estimate\n", n);
        printf("of the area from %f to %f = %.15f\n", a, b, total);
    }

    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}  /* main */

/* Function:    Get_data
 * Purpose:     Read in the data on process 0 and send to other processes
 * Input args:  p, my_rank
 * Output args: a_p, b_p, n_p
 * Note:        _p for pointer
 */
void Get_data(int p, int my_rank, double* a_p, double* b_p, int* n_p) {
    if (my_rank == 0) {
        printf("enter a, b, and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
    }
    MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    /* n_p points to an int, so the datatype must be MPI_INT */
    MPI_Bcast(n_p, 1, MPI_INT, 0, MPI_COMM_WORLD);
}  /* Get_data */

/* Function:    Trap
 * Purpose:     Estimate a definite area using the trapezoidal rule
 * Input args:  local_a (my left endpoint)
 *              local_b (my right endpoint)
 *              local_n (my number of trapezoids)
 *              h (stepsize = length of base of trapezoids)
 * Return val:  Trapezoidal rule estimate of area from local_a to local_b
 */
double Trap(
        double local_a  /* in */,
        double local_b  /* in */,
        int    local_n  /* in */,
        double h        /* in */) {
    double my_area;   /* Store my result in my_area */
    double x;
    int i;

    my_area = (f(local_a) + f(local_b))/2.0;
    x = local_a;
    for (i = 1; i <= local_n-1; i++) {
        x = local_a + i*h;
        my_area = my_area + f(x);
    }
    my_area = my_area*h;

    return my_area;
}  /* Trap */

/* Function:    f
 * Purpose:     Compute value of function to be integrated
 * Input args:  x
 */
double f(double x) {
    double return_val;

    return_val = x*x + 1.0;
    return return_val;
}  /* f */

/* output
[rdissanayaka@hpc0 143 ~/CS351/mpi]$ mpicc -o mpi_trap2_2 mpi_trap2_2.c
[rdissanayaka@hpc0 144 ~/CS351/mpi]$ mpirun -np 4 mpi_trap2_2
mv: cannot stat '/net/people/faculty/cs/rdissanayaka/.kde-el7/share/config//profilerc.new': No such file or directory
mv: cannot stat '/usr/people/faculty/cs/rdissanayaka/.local/share/applications/mimeapps.list.new': No such file or directory
Enter a, b, and n
With n = 1029 trapezoids, our estimate
of the area from to =
[rdissanayaka@hpc0 145 ~/CS351/mpi]$
*/


More information

Parallel Programming. Using MPI (Message Passing Interface)

Parallel Programming. Using MPI (Message Passing Interface) Parallel Programming Using MPI (Message Passing Interface) Message Passing Model Simple implementation of the task/channel model Task Process Channel Message Suitable for a multicomputer Number of processes

More information

Simple examples how to run MPI program via PBS on Taurus HPC

Simple examples how to run MPI program via PBS on Taurus HPC Simple examples how to run MPI program via PBS on Taurus HPC MPI setup There's a number of MPI implementations install on the cluster. You can list them all issuing the following command: module avail/load/list/unload

More information

CS 470 Spring Mike Lam, Professor. Distributed Programming & MPI

CS 470 Spring Mike Lam, Professor. Distributed Programming & MPI CS 470 Spring 2018 Mike Lam, Professor Distributed Programming & MPI MPI paradigm Single program, multiple data (SPMD) One program, multiple processes (ranks) Processes communicate via messages An MPI

More information

Message Passing Interface. most of the slides taken from Hanjun Kim

Message Passing Interface. most of the slides taken from Hanjun Kim Message Passing Interface most of the slides taken from Hanjun Kim Message Passing Pros Scalable, Flexible Cons Someone says it s more difficult than DSM MPI (Message Passing Interface) A standard message

More information

Lesson 1. MPI runs on distributed memory systems, shared memory systems, or hybrid systems.

Lesson 1. MPI runs on distributed memory systems, shared memory systems, or hybrid systems. The goals of this lesson are: understanding the MPI programming model managing the MPI environment handling errors point-to-point communication 1. The MPI Environment Lesson 1 MPI (Message Passing Interface)

More information

Introduction in Parallel Programming - MPI Part I

Introduction in Parallel Programming - MPI Part I Introduction in Parallel Programming - MPI Part I Instructor: Michela Taufer WS2004/2005 Source of these Slides Books: Parallel Programming with MPI by Peter Pacheco (Paperback) Parallel Programming in

More information

Parallel Programming Using MPI

Parallel Programming Using MPI Parallel Programming Using MPI Prof. Hank Dietz KAOS Seminar, February 8, 2012 University of Kentucky Electrical & Computer Engineering Parallel Processing Process N pieces simultaneously, get up to a

More information

Lecture 6: Message Passing Interface

Lecture 6: Message Passing Interface Lecture 6: Message Passing Interface Introduction The basics of MPI Some simple problems More advanced functions of MPI A few more examples CA463D Lecture Notes (Martin Crane 2013) 50 When is Parallel

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 4 Message-Passing Programming Learning Objectives n Understanding how MPI programs execute n Familiarity with fundamental MPI functions

More information

Department of Informatics V. HPC-Lab. Session 4: MPI, CG M. Bader, A. Breuer. Alex Breuer

Department of Informatics V. HPC-Lab. Session 4: MPI, CG M. Bader, A. Breuer. Alex Breuer HPC-Lab Session 4: MPI, CG M. Bader, A. Breuer Meetings Date Schedule 10/13/14 Kickoff 10/20/14 Q&A 10/27/14 Presentation 1 11/03/14 H. Bast, Intel 11/10/14 Presentation 2 12/01/14 Presentation 3 12/08/14

More information

Introduction to MPI. SHARCNET MPI Lecture Series: Part I of II. Paul Preney, OCT, M.Sc., B.Ed., B.Sc.

Introduction to MPI. SHARCNET MPI Lecture Series: Part I of II. Paul Preney, OCT, M.Sc., B.Ed., B.Sc. Introduction to MPI SHARCNET MPI Lecture Series: Part I of II Paul Preney, OCT, M.Sc., B.Ed., B.Sc. preney@sharcnet.ca School of Computer Science University of Windsor Windsor, Ontario, Canada Copyright

More information

Introduction to MPI. Ricardo Fonseca. https://sites.google.com/view/rafonseca2017/

Introduction to MPI. Ricardo Fonseca. https://sites.google.com/view/rafonseca2017/ Introduction to MPI Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Distributed Memory Programming (MPI) Message Passing Model Initializing and terminating programs Point to point

More information

Report S1 C. Kengo Nakajima. Programming for Parallel Computing ( ) Seminar on Advanced Computing ( )

Report S1 C. Kengo Nakajima. Programming for Parallel Computing ( ) Seminar on Advanced Computing ( ) Report S1 C Kengo Nakajima Programming for Parallel Computing (616-2057) Seminar on Advanced Computing (616-4009) Problem S1-1 Report S1 (1/2) Read local files /a1.0~a1.3, /a2.0~a2.3. Develop

More information

Report S1 C. Kengo Nakajima

Report S1 C. Kengo Nakajima Report S1 C Kengo Nakajima Technical & Scientific Computing II (4820-1028) Seminar on Computer Science II (4810-1205) Hybrid Distributed Parallel Computing (3747-111) Problem S1-1 Report S1 Read local

More information

Tutorial: parallel coding MPI

Tutorial: parallel coding MPI Tutorial: parallel coding MPI Pascal Viot September 12, 2018 Pascal Viot Tutorial: parallel coding MPI September 12, 2018 1 / 24 Generalities The individual power of a processor is still growing, but at

More information

HIGH PERFORMANCE SCIENTIFIC COMPUTING

HIGH PERFORMANCE SCIENTIFIC COMPUTING ( HPSC 5576 ELIZABETH JESSUP ) HIGH PERFORMANCE SCIENTIFIC COMPUTING :: Homework / 8 :: Student / Florian Rappl 1 problem / 10 points Problem 1 Task: Write a short program demonstrating the use of MPE's

More information

MPI MESSAGE PASSING INTERFACE

MPI MESSAGE PASSING INTERFACE MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions

More information

Parallel Programming, MPI Lecture 2

Parallel Programming, MPI Lecture 2 Parallel Programming, MPI Lecture 2 Ehsan Nedaaee Oskoee 1 1 Department of Physics IASBS IPM Grid and HPC workshop IV, 2011 Outline 1 Introduction and Review The Von Neumann Computer Kinds of Parallel

More information

COMP/CS 605: Introduction to Parallel Computing Topic : Distributed Memory Programming: Message Passing Interface

COMP/CS 605: Introduction to Parallel Computing Topic : Distributed Memory Programming: Message Passing Interface COMP/CS 605: Introduction to Parallel Computing Topic : Distributed Memory Programming: Message Passing Interface Mary Thomas Department of Computer Science Computational Science Research Center (CSRC)

More information

Distributed Systems + Middleware Advanced Message Passing with MPI

Distributed Systems + Middleware Advanced Message Passing with MPI Distributed Systems + Middleware Advanced Message Passing with MPI Gianpaolo Cugola Dipartimento di Elettronica e Informazione Politecnico, Italy cugola@elet.polimi.it http://home.dei.polimi.it/cugola

More information

MPI: The Message-Passing Interface. Most of this discussion is from [1] and [2].

MPI: The Message-Passing Interface. Most of this discussion is from [1] and [2]. MPI: The Message-Passing Interface Most of this discussion is from [1] and [2]. What Is MPI? The Message-Passing Interface (MPI) is a standard for expressing distributed parallelism via message passing.

More information

Programming with MPI. Pedro Velho

Programming with MPI. Pedro Velho Programming with MPI Pedro Velho Science Research Challenges Some applications require tremendous computing power - Stress the limits of computing power and storage - Who might be interested in those applications?

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 4 Message-Passing Programming Learning Objectives Understanding how MPI programs execute Familiarity with fundamental MPI functions

More information

Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering

Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering Course Number EE 8218 011 Section Number 01 Course Title Parallel Computing

More information

ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 2016 Solutions Name:...

ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 2016 Solutions Name:... ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 016 Solutions Name:... Answer questions in space provided below questions. Use additional paper if necessary but make sure

More information

Message Passing Interface - MPI

Message Passing Interface - MPI Message Passing Interface - MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico October 24, 2011 Many slides adapted from lectures by

More information

Practical Course Scientific Computing and Visualization

Practical Course Scientific Computing and Visualization July 5, 2006 Page 1 of 21 1. Parallelization Architecture our target architecture: MIMD distributed address space machines program1 data1 program2 data2 program program3 data data3.. program(data) program1(data1)

More information

Chip Multiprocessors COMP Lecture 9 - OpenMP & MPI

Chip Multiprocessors COMP Lecture 9 - OpenMP & MPI Chip Multiprocessors COMP35112 Lecture 9 - OpenMP & MPI Graham Riley 14 February 2018 1 Today s Lecture Dividing work to be done in parallel between threads in Java (as you are doing in the labs) is rather

More information

Parallel Computing Paradigms

Parallel Computing Paradigms Parallel Computing Paradigms Message Passing João Luís Ferreira Sobral Departamento do Informática Universidade do Minho 31 October 2017 Communication paradigms for distributed memory Message passing is

More information

Message Passing Interface - MPI

Message Passing Interface - MPI Message Passing Interface - MPI Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico March 31, 2016 Many slides adapted from lectures by Bill

More information

Programming with MPI on GridRS. Dr. Márcio Castro e Dr. Pedro Velho

Programming with MPI on GridRS. Dr. Márcio Castro e Dr. Pedro Velho Programming with MPI on GridRS Dr. Márcio Castro e Dr. Pedro Velho Science Research Challenges Some applications require tremendous computing power - Stress the limits of computing power and storage -

More information

MPI 2. CSCI 4850/5850 High-Performance Computing Spring 2018

MPI 2. CSCI 4850/5850 High-Performance Computing Spring 2018 MPI 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

MPI introduction - exercises -

MPI introduction - exercises - MPI introduction - exercises - Paolo Ramieri, Maurizio Cremonesi May 2016 Startup notes Access the server and go on scratch partition: ssh a08tra49@login.galileo.cineca.it cd $CINECA_SCRATCH Create a job

More information

High Performance Computing Course Notes Message Passing Programming I

High Performance Computing Course Notes Message Passing Programming I High Performance Computing Course Notes 2008-2009 2009 Message Passing Programming I Message Passing Programming Message Passing is the most widely used parallel programming model Message passing works

More information

Lecture 6: Parallel Matrix Algorithms (part 3)

Lecture 6: Parallel Matrix Algorithms (part 3) Lecture 6: Parallel Matrix Algorithms (part 3) 1 A Simple Parallel Dense Matrix-Matrix Multiplication Let A = [a ij ] n n and B = [b ij ] n n be n n matrices. Compute C = AB Computational complexity of

More information

Scientific Computing

Scientific Computing Lecture on Scientific Computing Dr. Kersten Schmidt Lecture 21 Technische Universität Berlin Institut für Mathematik Wintersemester 2014/2015 Syllabus Linear Regression, Fast Fourier transform Modelling

More information

Lecture 7: Distributed memory

Lecture 7: Distributed memory Lecture 7: Distributed memory David Bindel 15 Feb 2010 Logistics HW 1 due Wednesday: See wiki for notes on: Bottom-up strategy and debugging Matrix allocation issues Using SSE and alignment comments Timing

More information

Parallel Applications Design with MPI

Parallel Applications Design with MPI Parallel Applications Design with MPI Killer applications Science Research Challanges Challenging use of computer power and storage Who might be interested in those applications? Simulation and analysis

More information

Recap of Parallelism & MPI

Recap of Parallelism & MPI Recap of Parallelism & MPI Chris Brady Heather Ratcliffe The Angry Penguin, used under creative commons licence from Swantje Hess and Jannis Pohlmann. Warwick RSE 13/12/2017 Parallel programming Break

More information

Shared-memory Programming

Shared-memory Programming Shared-memory Programming Introduction to High Performance Computing Systems (CS1645) Esteban Meneses Programming a Supercomputer Languages: Fortran, C/C++, Python. Nodes Cores Accelerators Supercomputer

More information

Distributed Memory Programming with MPI

Distributed Memory Programming with MPI Distributed Memory Programming with MPI Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna moreno.marzolla@unibo.it Algoritmi Avanzati--modulo 2 2 Credits Peter Pacheco,

More information

MPI Program Structure

MPI Program Structure MPI Program Structure Handles MPI communicator MPI_COMM_WORLD Header files MPI function format Initializing MPI Communicator size Process rank Exiting MPI 1 Handles MPI controls its own internal data structures

More information

O.I. Streltsova, D.V. Podgainy, M.V. Bashashin, M.I.Zuev

O.I. Streltsova, D.V. Podgainy, M.V. Bashashin, M.I.Zuev High Performance Computing Technologies Lecture, Practical training 9 Parallel Computing with MPI: parallel algorithm for linear algebra https://indico-hlit.jinr.ru/event/120/ O.I. Streltsova, D.V. Podgainy,

More information

Parallel Programming Using MPI

Parallel Programming Using MPI Parallel Programming Using MPI Short Course on HPC 15th February 2019 Aditya Krishna Swamy adityaks@iisc.ac.in SERC, Indian Institute of Science When Parallel Computing Helps? Want to speed up your calculation

More information

CSE 613: Parallel Programming. Lecture 21 ( The Message Passing Interface )

CSE 613: Parallel Programming. Lecture 21 ( The Message Passing Interface ) CSE 613: Parallel Programming Lecture 21 ( The Message Passing Interface ) Jesmin Jahan Tithi Department of Computer Science SUNY Stony Brook Fall 2013 ( Slides from Rezaul A. Chowdhury ) Principles of

More information

Lecture 3 Message-Passing Programming Using MPI (Part 1)

Lecture 3 Message-Passing Programming Using MPI (Part 1) Lecture 3 Message-Passing Programming Using MPI (Part 1) 1 What is MPI Message-Passing Interface (MPI) Message-Passing is a communication model used on distributed-memory architecture MPI is not a programming

More information

Introduction to MPI-2 (Message-Passing Interface)

Introduction to MPI-2 (Message-Passing Interface) Introduction to MPI-2 (Message-Passing Interface) What are the major new features in MPI-2? Parallel I/O Remote Memory Operations Dynamic Process Management Support for Multithreading Parallel I/O Includes

More information

COSC 6374 Parallel Computation. Message Passing Interface (MPI ) I Introduction. Distributed memory machines

COSC 6374 Parallel Computation. Message Passing Interface (MPI ) I Introduction. Distributed memory machines Network card Network card 1 COSC 6374 Parallel Computation Message Passing Interface (MPI ) I Introduction Edgar Gabriel Fall 015 Distributed memory machines Each compute node represents an independent

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Parallel Computing with MPI Building/Debugging MPI Executables MPI Send/Receive Collective Communications with MPI April 10, 2012 Dan Negrut,

More information

Tutorial 2: MPI. CS486 - Principles of Distributed Computing Papageorgiou Spyros

Tutorial 2: MPI. CS486 - Principles of Distributed Computing Papageorgiou Spyros Tutorial 2: MPI CS486 - Principles of Distributed Computing Papageorgiou Spyros What is MPI? An Interface Specification MPI = Message Passing Interface Provides a standard -> various implementations Offers

More information

Parallel Programming Assignment 3 Compiling and running MPI programs

Parallel Programming Assignment 3 Compiling and running MPI programs Parallel Programming Assignment 3 Compiling and running MPI programs Author: Clayton S. Ferner and B. Wilkinson Modification date: October 11a, 2013 This assignment uses the UNC-Wilmington cluster babbage.cis.uncw.edu.

More information

An Introduction to MPI

An Introduction to MPI An Introduction to MPI Parallel Programming with the Message Passing Interface William Gropp Ewing Lusk Argonne National Laboratory 1 Outline Background The message-passing model Origins of MPI and current

More information

HPCSE - I. «MPI Programming Model - Part II» Panos Hadjidoukas

HPCSE - I. «MPI Programming Model - Part II» Panos Hadjidoukas HPCSE - I «MPI Programming Model - Part II» Panos Hadjidoukas 1 Schedule and Goals 24.11.2017: MPI - part 2 asynchronous communication how MPI works study and discuss more examples 2 Outline Measuring

More information

Anomalies. The following issues might make the performance of a parallel program look different than it its:

Anomalies. The following issues might make the performance of a parallel program look different than it its: Anomalies The following issues might make the performance of a parallel program look different than it its: When running a program in parallel on many processors, each processor has its own cache, so the

More information

[4] 1 cycle takes 1/(3x10 9 ) seconds. One access to memory takes 50/(3x10 9 ) seconds. =16ns. Performance = 4 FLOPS / (2x50/(3x10 9 )) = 120 MFLOPS.

[4] 1 cycle takes 1/(3x10 9 ) seconds. One access to memory takes 50/(3x10 9 ) seconds. =16ns. Performance = 4 FLOPS / (2x50/(3x10 9 )) = 120 MFLOPS. Give your answers in the space provided with each question. Answers written elsewhere will not be graded. Q1). [4 points] Consider a memory system with level 1 cache of 64 KB and DRAM of 1GB with processor

More information

DISTRIBUTED MEMORY PROGRAMMING WITH MPI. Carlos Jaime Barrios Hernández, PhD.

DISTRIBUTED MEMORY PROGRAMMING WITH MPI. Carlos Jaime Barrios Hernández, PhD. DISTRIBUTED MEMORY PROGRAMMING WITH MPI Carlos Jaime Barrios Hernández, PhD. Remember Special Features of Architecture Remember concurrency : it exploits better the resources (shared) within a computer.

More information

Introduction to parallel computing concepts and technics

Introduction to parallel computing concepts and technics Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing

More information

CS 179: GPU Programming. Lecture 14: Inter-process Communication

CS 179: GPU Programming. Lecture 14: Inter-process Communication CS 179: GPU Programming Lecture 14: Inter-process Communication The Problem What if we want to use GPUs across a distributed system? GPU cluster, CSIRO Distributed System A collection of computers Each

More information

Part One: The Files. C MPI Slurm Tutorial - TSP. Introduction. TSP Problem and Tutorial s Purpose. tsp.tar. The C files, summary

Part One: The Files. C MPI Slurm Tutorial - TSP. Introduction. TSP Problem and Tutorial s Purpose. tsp.tar. The C files, summary C MPI Slurm Tutorial - TSP Introduction The example shown here demonstrates the use of the Slurm Scheduler for the purpose of running a C/MPI program Knowledge of C is assumed Code is also given for the

More information