Practical MPI for the Geissler group

Anna
August 12, 2011

Contents

1 Introduction
  1.1 What is MPI?
  1.2 Resources
  1.3 A tiny glossary
  1.4 MPI implementations
2 Writing MPI code
  2.1 MPI program design
  2.2 Basic functions: overhead
  2.3 Basic functions: send/receive
  2.4 Send/receive examples
3 Running MPI code
  3.1 Learning about your MPI installation
  3.2 Compiling in general
  3.3 Compiling on NERSC
  3.4 Running locally
  3.5 Submitting jobs on quaker, muesli, or lers
  3.6 Submitting jobs on NERSC

1 Introduction

1.1 What is MPI?

MPI stands for Message Passing Interface. It is a set of specifications for message-passing libraries, and it has many implementations in many languages (including C/C++, Fortran, and Python). It's good for CPU parallel-programming tasks where your processes are running the same code mostly independently, but may need to exchange small pieces of information once in a while. Typical uses for Geissler group members might be:

- data analysis (one analysis per process, each with a different datafile)
- replica exchange/parallel tempering simulations (one simulation per process, each with a different temperature or other set of parameters)

This tutorial focuses on what I learned while writing a replica exchange simulation in C++. Tasks that are probably not well suited to MPI include anything using a shared-memory model, programs that require sharing large amounts of data with complex internal structure (e.g., whole system configurations), or programs that utilize a large number of processes for only a small fraction of the total wall-clock time (because the processes hog cluster space even when they're inactive).

1.2 Resources

The group has two MPI reference books floating around:

- Parallel Programming with MPI by Peter S. Pacheco
- Parallel Programming in C with MPI and OpenMP by Michael J. Quinn

Have a glance through them to get yourself started, then start googling to answer specific debugging questions; a few of the links I found were particularly useful for runtime issues.

1.3 A tiny glossary

core: aka processor; a unit of computing hardware that executes MPI code
node: a physical computer, like your desktop; modern ones have several cores
process: what's running on a core, executing a complete copy of your code
job: aka session; the collection of all N processes that you run together as a single command-line operation
rank: the ID number of a process (between 0 and N-1)
message: a packet of information passed between two or more processes within the same job
communicator: what passes messages between processes; the only one you probably need to know about is MPI_COMM_WORLD

1.4 MPI implementations

I chose to use the C++ bindings of the OpenMPI library, a very popular implementation that's already installed on muesli, lers, quaker, and the NERSC machines, and possibly on your workstation. All of the code snippets in Section 2, and some of the compiling and running instructions, are specific to that implementation.
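
To give you a feel for what an OpenMPI program looks like before we get into the details in Section 2, here is a minimal, self-contained example of my own (not taken from the replica exchange code) that just has each process announce its rank. It should compile with mpic++ and run with mpirun as described in Section 3.

#include "mpi.h"
#include <iostream>

int main(int argc, char* argv[]) {
    // start up MPI; every process in the job executes all of this
    MPI_Init(&argc, &argv);

    // rank identifies this process; nprocs is the size of the job
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    std::cout << "hello from rank " << rank
              << " of " << nprocs << std::endl;

    // shut down MPI before exiting
    MPI_Finalize();
    return 0;
}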

This is not your only option, though! There are other widely used C/C++ implementations, most notably MPICH, which is available on the franklin and hopper NERSC machines. There's also a C++ implementation using the Boost framework that lets you pass STL types, at least one Python implementation, and lots of others. Look for one that is well documented, supports your language of choice, is already present or easy to install on the machines you want to use, works with your favorite debugging and IDE suites, etc.

2 Writing MPI code

2.1 MPI program design

Step one in any parallel computing project is figuring out what tasks in your code are parallelizable. Good candidates are tasks that involve doing the same operation many times on different pieces of data, where the result of each operation depends only on the input data to that operation and not on the output of any of the other operations. Relevant examples of this class of program tasks include MD force computation (each particle is independent), data analysis like computing g(r) (each configuration is independent), and replica exchange simulations (each replica is independent). In general, Monte Carlo is less parallelizable than molecular dynamics because a single-particle move usually depends on the result of the previous single-particle move (but ask Carl for some nice counterexamples). You don't need every part of your code to be parallelizable, but you'll probably get better speed-ups if you parallelize the computationally intensive tasks.

Step two is identifying the input and output data of the to-be-parallelized tasks. MPI is a distributed-memory system, so processes only have contact with each other by passing messages. This is nice because one process will never accidentally overwrite memory being used by another process. The flip side is that communication takes place over the network, so thinking about ways to minimize the amount of data being passed around may be worthwhile. When parallelizing the system propagation steps between replica exchange moves, I decided that the input to each replica would be its new simulation parameters (e.g., temperature, pressure, ε, or σ for an NPT Lennard-Jones system), and the output would be the set of energies of the replica's final configuration under all possible simulation parameters.

Step three is deciding how to allocate tasks to MPI processes. For MD force computation, it would be silly to have one process per particle, but it might be reasonable to assign 128 particles out of a 1024-particle simulation to each of 8 processes. For replica exchange, one replica per process makes sense. I also decided to have a master process that coordinates the simulation, collecting information from and distributing information to each replica process; the replica processes are then the slaves. A common convention for master-slave program designs is to assign rank 0 to the master. A minimal sketch of such a layout is shown below.
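
As a rough sketch of what that kind of design looks like in code (the two functions here are hypothetical placeholders, not functions from my actual simulation), the top level of a master/slave program can be as simple as a single branch on rank:

#include "mpi.h"

// hypothetical placeholders: the real versions would coordinate swaps
// (master) or propagate one replica and report its energies (slave)
void run_master() { /* collect from and distribute to every slave */ }
void run_slave()  { /* do the actual simulation work */ }

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // convention: rank 0 is the master, everyone else is a slave
    if (rank == 0)
        run_master();
    else
        run_slave();

    MPI_Finalize();
    return 0;
}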

Step four is thinking about how to incorporate this new functionality into your code. Decide how to isolate the to-be-parallelized tasks into one or more functions, what messages should be passed between which processes at what times, and which processes should run which parts of the code. (In general, all processes run the same executable, but the flow of each process through the code can be controlled with statements like if (rank == somerank).) Cycle through steps 2-4 until you have a design you're happy with.

Step five is actually writing the MPI code. Don't do step five until you've done steps 1-4, especially if your code isn't under version control. Speaking of which, step zero: put your code under version control (ask Todd about using the group git server).

2.2 Basic functions: overhead

To use the OpenMPI library in C++, put this line with your other include statements

#include "mpi.h"

and make sure the mpi.h file is in your path. Somewhere early in your program, before any other MPI command, you have to initialize MPI. This part is executed by every process, although the local value of rank will be different for each process.

// passes the command-line arguments to MPI
// (the command-line arguments don't have to do anything, though)
MPI_Init(&argc, &argv);

// initialize the variable rank to the rank of this process
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

// initialize the variable nprocs to the total number of processes
int nprocs;
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

The rank and nprocs variables (which can be named anything you like, incidentally) are very useful for controlling the flow of your code. For instance, you could include lines such as

if (rank < nprocs/2) {
    // code that only half the processes should execute
}

and the code will execute as expected because rank and nprocs have the values that make sense. Don't expect something like

for (int irank = 0; irank < nprocs; irank++) {
    // code that every process should execute once
}

because every process executes every line of code once by default; that code snippet would result in every process executing the code inside the for-loop nprocs times.
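
If what you actually want is to divide the iterations of a loop among the processes, one common pattern (a sketch of mine, not code from the replica exchange program; ntasks is assumed to be the total number of independent tasks) is to have each process take every nprocs-th iteration:

// rank 0 handles tasks 0, nprocs, 2*nprocs, ...;
// rank 1 handles tasks 1, nprocs+1, ...; and so on
for (int itask = rank; itask < ntasks; itask += nprocs) {
    // work on task itask; exactly one process handles each task
}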

Another use for rank can be finding the right input files. Because every process has the same argc and argv, you may not want your simulation parameters or data-file names to be command-line arguments. One option is to put the parameters for each process in separate files, and make a function that reads the appropriate input file and sets up that process.

// set up name of input file for this process
char input_file_name[60];
sprintf(input_file_name, "params_for_rank_%d.txt", rank);

// pass file name to file-reading function
read_input_file(input_file_name);

A big part of the point of MPI is making things go faster, so you'll probably want to know how long different parts of your code take to run in wall-time. Instead of the ctime C/C++ functions that I usually use for profiling, MPI provides the function MPI_Wtime for this purpose. It returns a double that represents the current time in seconds; the zero of time is arbitrary but fixed throughout the run-time of the process. So it's best to use MPI_Wtime in pairs:

// get current time
double starttime = MPI_Wtime();

// some code that takes time

// get current time
double endtime = MPI_Wtime();

// see how long the code took
cout << "this took " << endtime - starttime << " seconds ";
cout << "or " << (endtime - starttime)/3600/24 << " days" << endl;

One last thing you may be curious about is what physical computer your processes have found themselves on. I think there are ways to access at least part of this information in the submit script (see Section 3), but you can find out directly from within your executable too. The command for this is MPI_Get_processor_name, and it's used like so:

// initialize arguments for MPI_Get_processor_name
int namelen;
char nodename[MPI_MAX_PROCESSOR_NAME];

// call function, overwrites arguments
MPI_Get_processor_name(nodename, &namelen);

// output
cout << "rank " << rank << " running on node " << nodename << endl;
cout << "name is " << namelen << " chars long" << endl;
cout << "max name length was " << MPI_MAX_PROCESSOR_NAME << endl;

Finally, after the last MPI function has been called, you need to clean things up:

// clean up MPI
MPI_Finalize();

2.3 Basic functions: send/receive

Now let's get to the MP part of MPI: message passing. Every process can exchange messages with every other process, and there are a variety of functions that allow different sorts of communication patterns: send/receive, broadcast/reduce, gather/scatter, ring pass, etc. Send/receive is the simplest one, just one message being passed from one process to another, and that's the only one I'll cover here. The other communication methods are collective, in contrast to the point-to-point nature of send/receive. Cloud computing algorithms are typically based on collective communication (e.g., Google's patented MapReduce and Apache's open-source Hadoop), so there's a significant possibility that it's worth your while to look into collective communication options.

The functions MPI_Send and MPI_Recv have a similar syntax:

int MPI_Send(void* data_to_send, int count, MPI_Datatype datatype,
             int dest_rank, int tag, MPI_Comm communicator);

int MPI_Recv(void* data_to_recv, int count, MPI_Datatype datatype,
             int source_rank, int tag, MPI_Comm communicator,
             MPI_Status* status);

The first argument of each function is the message data itself, and all the others are the envelope that allows the message to be processed correctly on each end. Let's take a closer look at each argument.

data_to_send and data_to_recv are pointers to pre-allocated blocks of memory that hold (or will hold) the message data. Since this implementation of MPI deals with pointers and arrays instead of STL types like vectors, your code has to deal with pointers too; sorry! The memory at data_to_recv gets overwritten by MPI_Recv.

count is the number of values in the message. If you're sending an array containing 8 floats, then count should be equal to 8 in MPI_Send, and at least 8 in MPI_Recv.

datatype is the MPI equivalent of the C++ datatype that you used to initialize your message: MPI_FLOAT, MPI_DOUBLE, MPI_CHAR, etc. There's also an option for MPI_PACKED if you want to send a structure.

dest_rank must match the rank of the process calling MPI_Recv, and source_rank must match the rank of the process calling MPI_Send.

tag is there for fine-tuning the point-to-point communication, but I don't have much of a use for it. The tag in MPI_Send must be an actual integer, whereas the tag in MPI_Recv can be a wildcard like MPI_ANY_TAG.

communicator is typically MPI_COMM_WORLD, which we saw in the MPI start-up code snippet. All processes are members of MPI_COMM_WORLD, so it will probably meet your needs, but there are other options for communicators if you want something more specialized.

status is a structure of type MPI_Status, with members status.MPI_SOURCE, status.MPI_TAG, and status.MPI_ERROR (all of type int).

The return values of MPI_Send and MPI_Recv are error codes, but MPI usually just dies if something goes wrong.
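
One situation where the status argument earns its keep is receiving from whichever process happens to report first. The following sketch (my own illustration, not part of the replica exchange code) uses the MPI_ANY_SOURCE wildcard and then reads the sender's actual rank and tag back out of status:

// receive one double from whichever process sends first
double result;
MPI_Status status;
MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

// status records who actually sent the message, and with what tag
cout << "got " << result << " from rank " << status.MPI_SOURCE
     << " with tag " << status.MPI_TAG << endl;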

2.4 Send/receive examples

Here's an example where process 2 sends the array [0, 2, 4, 6, 8] to process 1, and process 1 sends the array [4, 4, 4, 4, 4] back to process 2. Note how the statically allocated arrays are passed to MPI_Send and MPI_Recv with &, and note that process 1 receives the first message before it sends the second message. (If both processes tried to send before they tried to receive, the code would hang indefinitely.)

// initialize empty static arrays
int first_message[5];
int second_message[5];
int count = 5;

// initialize status
MPI_Status status;

if (rank == 2) {
    // put some values in the first array
    for (int i = 0; i < count; i++)
        first_message[i] = i*rank;

    // send the first message with tag=count
    MPI_Send(&first_message, count, MPI_INT, 1, count, MPI_COMM_WORLD);

    // receive the second message
    MPI_Recv(&second_message, count, MPI_INT, 1, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);

} else if (rank == 1) {
    // put some values in the second array
    for (int i = 0; i < count; i++)
        second_message[i] = count - rank;

    // receive the first message
    MPI_Recv(&first_message, count, MPI_INT, 2, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);

    // send the second message with tag=0
    MPI_Send(&second_message, count, MPI_INT, 2, 0, MPI_COMM_WORLD);
}
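
If two processes really do need to exchange data and it's awkward to guarantee the send/receive ordering by hand, MPI also provides MPI_Sendrecv, which performs the send and the receive in a single call and so can't deadlock against a matching MPI_Sendrecv on the other process. I didn't end up needing it for replica exchange, so treat this as a sketch; partner_rank is assumed to hold the rank of the other process, and outgoing is assumed to have been filled in already:

// exchange five ints with a partner process in one deadlock-free call
int outgoing[5], incoming[5];
MPI_Status status;
MPI_Sendrecv(outgoing, 5, MPI_INT, partner_rank, 0,   // what we send, and to whom
             incoming, 5, MPI_INT, partner_rank, 0,   // where the reply goes, and from whom
             MPI_COMM_WORLD, &status);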

Here's a more complicated example: say you have a bunch of slave processes running simulations of a system with liquid and vapor phases, and you want the master process to make a histogram of the z-velocities of all the vapor particles in all the simulations to check against a Maxwell-Boltzmann distribution. (Patrick Varilly was doing something similar the other day.) Because each slave simulation may have a different number of vapor particles, and to avoid hard-coding the maximum possible number of vapor particles, we'll use dynamic memory allocation; note that the resulting * and & usages are a bit different than in the previous example. This example also introduces the function MPI_Get_count(&status, datatype, &real_count), which figures out the number of values actually received. The value of real_count can be less than the value of count passed to MPI_Recv, making MPI_Get_count useful for debugging in addition to how I used it here.

if (rank > 0) {  // slaves only

    // ask a function for the number of vapor particles
    int n_vapor_particles = get_number_in_vapor();

    // initialize empty dynamic array
    float * z_vels = new float[n_vapor_particles];

    // pass pointer to z_vels array to a function
    // that puts in the correct values
    get_vapor_z_vels(z_vels);

    // send to master (rank 0) with tag 1
    MPI_Send(z_vels, n_vapor_particles, MPI_FLOAT, 0, 1, MPI_COMM_WORLD);

    // free the dynamic array once the send has returned
    delete [] z_vels;

}  // end slaves

// can have other code here
// sends and receives need not be close to each other in the code file
// they just have to happen in the right order when the code is executed

if (rank == 0) {  // master only

    // initialize status
    MPI_Status status;

    // loop over all slaves
    for (int irank = 1; irank < nprocs; irank++) {

        // initialize empty dynamic array
        // assume that n_max_particles is initialized elsewhere,
        // perhaps from a configuration file
        float * z_vels = new float[n_max_particles];

        // receive from slave
        MPI_Recv(z_vels, n_max_particles, MPI_FLOAT, irank, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);

        // initialize n_vapor_particles with length of received message
        int n_vapor_particles;
        MPI_Get_count(&status, MPI_FLOAT, &n_vapor_particles);

        // output to a log filestream (initialized elsewhere)
        logfile << "master received z-velocities of " << n_vapor_particles
                << " vapor particles from slave with rank " << irank << endl;

        // loop over received values only
        for (int iparticle = 0; iparticle < n_vapor_particles; iparticle++) {
            // pass values to a histogramming function
            add_velocity_to_histogram(z_vels[iparticle]);
        }  // end loop over particles

        // free the dynamic array before the next slave's data arrives
        delete [] z_vels;

    }  // end loop over slaves
}  // end master
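
If you would rather not rely on an n_max_particles upper bound at all, the master can instead ask MPI how large the incoming message is before allocating, using MPI_Probe together with MPI_Get_count. I used the fixed-size buffer above, so this is just a sketch of the alternative, meant to sit inside the same loop over irank:

// peek at the pending message from slave irank without receiving it yet
MPI_Status status;
MPI_Probe(irank, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

// ask how many floats are waiting, then allocate exactly that many
int n_vapor_particles;
MPI_Get_count(&status, MPI_FLOAT, &n_vapor_particles);
float * z_vels = new float[n_vapor_particles];

// now the receive is guaranteed to fit
MPI_Recv(z_vels, n_vapor_particles, MPI_FLOAT, irank, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

// ... histogram the values as before, then free the buffer ...
delete [] z_vels;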

3 Running MPI code

3.1 Learning about your MPI installation

The command ompi_info outputs a bunch of information about your local OpenMPI installation, most of which I don't know how to deal with. You can grep for some specific things, for example:

[anna@quaker test_mpi]$ ompi_info | grep "Open MPI:"
Open MPI:
[anna@quaker test_mpi]$ ompi_info | grep Prefix
Prefix: /opt/openmpi

3.2 Compiling in general

MPI code must be compiled with an MPI-specific compiler. For instance, the C++ compiler that's equivalent to g++ is mpic++. The MPI compilers can be used for non-MPI code too, so you can try a test compilation of your code before you even start adding MPI functionality. First, just try your usual compilation command, replacing g++ with mpic++. If you compile on the command-line,

[anna@quaker test_mpi]$ mpic++ myprogram.cpp -o myprogram.exe

or if you use a makefile, change the value of CXX in the file, e.g.,

# CXX=g++
CXX=mpic++

If that doesn't work, you may have to add the path to the compiler to your PATH. For instance, if you found in the previous section that Prefix: /opt/openmpi, then

[anna@quaker test_mpi]$ export PATH=$PATH:/opt/openmpi/bin

or add it in your ~/.bash_profile or ~/.bashrc files. You can also find the path using which:

[anna@quaker test_mpi]$ which mpic++
/opt/openmpi/bin/mpic++

If you have trouble with library linking at runtime, it may help to add this line to the makefile:

LDLIBSOPTIONS=$(shell mpic++ --showme:link)

3.3 Compiling on NERSC

The NERSC machines use the Portland Group compilers by default, instead of the GNU compilers (e.g., g++) or the Intel compilers (which I've never used before). To swap to the GNU compilers on hopper or franklin, type

nid00007 a/anna> module swap PrgEnv-pgi PrgEnv-gnu

and compile using CC instead of mpic++. Note that this implicitly uses MPICH instead of OpenMPI, so you have to comment out any OpenMPI-specific lines in your makefile, but otherwise it works without a hitch! To compile with g++ and OpenMPI on carver, type

carver% module swap pgi gcc
carver% module swap openmpi openmpi-gcc

then compile using mpic++ as usual. Different compilers have different strengths and weaknesses, so it may be worth your time to try out the PGI and Intel ones. To see what modules are currently loaded:

carver% module list
Currently Loaded Modulefiles:
  1) pgi/10.8   2) openmpi/

3.4 Running locally

Suppose you have a non-MPI executable called myprogram.exe that takes a single command-line argument myconfigfile.txt, such that you'd typically run the program using the command

[anna@quaker test_mpi]$ ./myprogram.exe myconfigfile.txt

Then to run an MPI version of this program with 5 local processes, simply type

[anna@quaker test_mpi]$ mpirun -np 5 myprogram.exe myconfigfile.txt

Replace the 5 in the number-of-processes flag -np 5 with the actual number of processes you want to run. To run with the valgrind memory debugger and profiler, type

[anna@quaker test_mpi]$ mpirun -np 5 valgrind myprogram.exe myconfigfile.txt

When running parallel jobs locally, your speed-up will be limited by the number of cores on your local machine. One way to find out how many cores you have is to type top and then hit 1. On quaker, the first few lines of the resulting display are something like

Tasks: 2143 total, 1 running, 2142 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 16.4%sy, 0.0%ni, 83.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.3%us, 16.0%sy, 0.0%ni, 83.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.7%us, 17.4%sy, 0.0%ni, 82.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.3%us, 17.3%sy, 0.0%ni, 82.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.3%us, 16.3%sy, 0.0%ni, 82.7%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
Cpu5 : 0.7%us, 17.4%sy, 0.0%ni, 81.3%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
Mem: k total, k used, k free, k buffers
Swap: k total, 6516k used, k free, k cached

which makes me think that the quaker interactive node has 6 cores. Another source of information about the number of cores is the file /proc/cpuinfo.

3.5 Submitting jobs on quaker, muesli, or lers

Although it's definitely possible to submit jobs by typing a qsub command on the command-line, it's easier in the long run to set up a submit script that keeps track of your flags. In the submit script below, all the flags (the things prefaced by #$) could be added to your command-line call if you really wanted to. There are lots of other flags out there, some of which I should probably be using; check them out by reading the qsub man page.

test_mpi]$ cat submit_mpi.sh
#!/bin/sh

# run this script by typing the following command,
# replacing $n with the actual number of processes:
# qsub -pe orte $n ./submit_mpi.sh

# use bash as your shell
#$ -S /bin/bash

# change this to your job name
#$ -N myjobname

# run from the current working directory
#$ -cwd

# don't join stdout and stderr
#$ -j n

# export environment variables
#$ -V

echo This job is being run on $(hostname --short)
echo Running $NSLOTS processes

# change this to your actual executable and arguments
mpirun -np $NSLOTS ./myprogram.exe myconfigfile.txt

This sample MPI submit script is very similar to a non-MPI submit script, but has the variable $NSLOTS, which doesn't appear to be initialized anywhere. $NSLOTS is actually an SGE built-in variable that's initialized by the -pe orte flag, which I've chosen to keep on the command line to make it easier to run different numbers of processes. So to submit a 5-process job to quaker using this submit script, type

[anna@quaker test_mpi]$ qsub -pe orte 5 ./submit_mpi.sh

The -pe orte flag also sets the parallel environment to the value orte. There are other options for the parallel environment, such as mpich, but orte worked best for me. The qconf utility is a good source of information about things like this:

[anna@quaker test_mpi]$ qconf -spl
make
mpi
mpich
orte
[anna@quaker test_mpi]$ qconf -sp orte
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

3.6 Submitting jobs on NERSC

NERSC is the supercomputing facility at LBNL. If you need more cores than are available on our group clusters, or just want your simulations to run much much much faster without changing a line of your code, NERSC is your best bet. The NERSC user info site has lots of information about how to use their system; spend a while looking around there, especially the computational systems and queues and policies sections, to decide if NERSC is right for you. Currently, we have hours on their clusters through the Joint Center for Artificial Photosynthesis (JCAP) project, and possibly through other projects; Phill should have a rough idea of the computational resources available to us. For JCAP, start by emailing Lin-Wang Wang <lwwang@lbl.gov> to get an account on the NERSC system and a budget of JCAP hours. There's a form you have to fill out and possibly fax, then a few webforms to click through.

Expect it to take a few days before you have ssh access to the clusters. Sign in with your username and password to see how many hours you have available. Your hours can be used on any of the NERSC computers (hopper, franklin, carver, etc). Each of these computers has different software, hardware, and queue configurations, so choose one that fits your needs. Carver looks nice on paper because some queues have long maximum walltimes, but there may be a usage surcharge for using carver, and my jobs spent days in the queue before running. Hopper ended up being the best answer for me: my code starts running sooner on hopper than on carver, and hopper uses global scratch whereas franklin has a separate scratch system.

You will probably want to write the output of your jobs to a scratch directory, either local ($SCRATCH) or global ($GSCRATCH) depending on what computer you're on. Your allocated disk space is much larger in scratch than in your home directory, and I/O is much faster. You can even use scratch as if it were your home directory, e.g., you can submit your jobs from scratch. Carver has gnuplot and hopper doesn't, so another benefit of using global scratch is that you can easily run jobs on hopper and then analyze them on carver. Note that your data won't be automatically backed up no matter what directory it's in, and files may even be purged periodically, so remember to back up your data to a safe place (one or more disks hosted by the group, or NERSC's storage system HPSS).

The NERSC systems use PBS/Torque instead of SGE/Rocks for queue management. The flags are similar to the SGE flags, but are prefixed by #PBS instead of #$, and I think the flags have to be the first thing in the submit script file (i.e., no comments before or among the flags). In the sample submit scripts below, replace -q debug or -q regular with your queue of choice; replace -l walltime=00:30:00 with your actual maximum walltime (format HH:MM:SS); and replace all the other names, numbers, and directory-handling commands with reasonable values. Also note that runtime library linking errors may be resolved by swapping to your correct compiler modules within your submit script, not by anything Google might suggest about changing your $LD_LIBRARY_PATH.

There are two main differences between submit scripts on hopper and carver. First, carver uses mpirun -np $nprocs, as on quaker, whereas hopper uses aprun -n $nprocs to do the same thing. Second, they use different syntax and criteria for deciding the number of cores allotted to your job, although both only allocate cores in multiples of the number of processors per node. On carver, there are 8 processors per node, so if you want 16 processors then use the flag -l nodes=2:ppn=8. Hopper has 24 processors per node, so the number in the -l mppwidth flag must be a multiple of 24. On both machines, you will be allocated and charged for the number of processors you request with this flag, which may be larger than the number you actually utilize with the mpirun -np or aprun -n commands (for example, the hopper script below requests mppwidth=144, six full nodes, but launches only 122 processes), so plan your processor use accordingly. Be warned: if you don't use this flag, all your processes will run on a single node, making your job painfully slow and possibly running it out of memory.

Here's a submit script for the debug queue on carver.

bash-3.2$ cat submit_carver_debug.pbs
#!/bin/bash
#PBS -S /bin/bash
#PBS -N debug_job_name
#PBS -j n
#PBS -V
#PBS -q debug
#PBS -l walltime=00:30:00
#PBS -l nodes=2:ppn=8
#PBS -M your_address@host.com
#PBS -m aeb

### SET THESE VARIABLES BY HAND ###
nprocs=16   # should match the -l nodes=xx:ppn=8 line above
output_prefix=debug_output
###################################

# output compute node and number of MPI processes
echo This job is being run on $(hostname --short)
echo $nprocs

# set up with correct modules for GNU compilers
# fixes runtime errors involving incorrect linking of
# libstdc++ and GLIBCXX libraries
module swap pgi gcc
module swap openmpi openmpi-gcc
module list

# set up to use scratch
output_path=$SCRATCH/$output_prefix
echo $output_path
if [ ! -d $output_path ]; then
    mkdir $output_path
else
    # assume the directory's content shouldn't already be there
    echo "Deleting existing data in $output_path"
    rm -r $output_path/*
fi

# move to current working directory (like -cwd flag in SGE)
cd $PBS_O_WORKDIR

# run job
mpirun -np $nprocs myprogram.exe myargs

And here's a submit script for the regular queue on hopper.

anna@hopper03:/global/scratch/sd/anna/production_output> cat submit_regular_hopper.pbs
#!/bin/bash
#PBS -S /bin/bash
#PBS -N regular_job_name
#PBS -j n
#PBS -V
#PBS -q regular
#PBS -l walltime=36:00:00
#PBS -M your_address@host.com
#PBS -m aeb
#PBS -l mppwidth=144

### SET VARIABLES BY HAND ###
nprocs=122
output_prefix=production_output
#############################

# output compute node and number of MPI processes
echo This job is being run on $(hostname --short)
echo $nprocs

# set up with correct modules for GNU compilers
# I didn't check whether this is necessary for hopper
# like it is for carver, but why not
module swap PrgEnv-pgi PrgEnv-gnu
module list

# set up to use global scratch
output_path=$GSCRATCH/$output_prefix
echo $output_path
if [ ! -d $output_path ]; then
    mkdir $output_path
else
    # assume the directory should already be there
    echo "directory found at $output_path, leaving it there"
fi

# run job
cd $PBS_O_WORKDIR
aprun -n $nprocs myprogram.exe myargs
