Practical MPI for the Geissler group
Anna
August 12, 2011

Contents

1 Introduction
  1.1 What is MPI?
  1.2 Resources
  1.3 A tiny glossary
  1.4 MPI implementations
2 Writing MPI code
  2.1 MPI program design
  2.2 Basic functions: overhead
  2.3 Basic functions: send/receive
  2.4 Send/receive examples
3 Running MPI code
  3.1 Learning about your MPI installation
  3.2 Compiling in general
  3.3 Compiling on NERSC
  3.4 Running locally
  3.5 Submitting jobs on quaker, muesli, or lers
  3.6 Submitting jobs on NERSC

1 Introduction

1.1 What is MPI?

MPI stands for Message Passing Interface. It is a set of specifications for message-passing libraries, and has many implementations in many languages (including C/C++, Fortran, and Python). It's good for CPU parallel-programming tasks where your processes run the same code mostly independently, but may need to exchange small pieces of information once in a while. Typical uses for Geissler group members might be:

- data analysis (one analysis per process, each with a different datafile)
- replica exchange/parallel tempering simulations (one simulation per process, each with a different temperature or other set of parameters)

This tutorial focuses on what I learned while writing a replica exchange simulation in C++. Tasks that are probably not well suited to MPI include anything using a shared-memory model, programs that require sharing large amounts of data with complex internal structure (e.g., whole system configurations), or programs that utilize a large number of processes for only a small fraction of the total wall-clock time (because the processes hog cluster space even when they're inactive).

1.2 Resources

The group has two MPI reference books floating around:

- Parallel Programming with MPI by Peter S. Pacheco
- Parallel Programming in C with MPI and OpenMP by Michael J. Quinn

Have a glance through them to get yourself started, then start googling to answer specific debugging questions, particularly for runtime issues.

1.3 A tiny glossary

core: aka processor; a unit of computing hardware that executes MPI code
node: a physical computer, like your desktop; modern ones have several cores
process: what's running on a core, executing a complete copy of your code
job: aka session; the collection of all N processes that you run together as a single command-line operation
rank: the ID number of a process (between 0 and N-1)
message: a packet of information passed between two or more processes within the same job
communicator: what passes messages between processes; the only one you probably need to know about is MPI_COMM_WORLD

1.4 MPI implementations

I chose to use the C++ bindings of the OpenMPI library, a very popular implementation that's already installed on muesli, lers, quaker, and the NERSC machines, and possibly on your workstation. All of the code snippets in Section 2, and some of the compiling and running instructions, are specific to that implementation.
This is not your only option though! There are other widely used C/C++ implementations, most notably MPICH, which is available on the franklin and hopper NERSC machines. There's also a C++ implementation using the Boost framework that lets you pass STL types, at least one Python implementation, and lots of others. Look for one that is well documented, supports your language of choice, is already present or easy to install on the machines you want to use, works with your favorite debugging and IDE suites, etc.

2 Writing MPI code

2.1 MPI program design

Step one in any parallel computing project is figuring out what tasks in your code are parallelizable. Good candidates are tasks that involve doing the same operation many times on different pieces of data, where the result of each operation depends only on the input data to that operation and not on the output of any of the other operations. Relevant examples of this class of program tasks include MD force computation (each particle is independent), data analysis like computing g(r) (each configuration is independent), and replica exchange simulations (each replica is independent). In general, Monte Carlo is less parallelizable than molecular dynamics because a single-particle move usually depends on the result of the previous single-particle move (but ask Carl for some nice counterexamples). You don't need every part of your code to be parallelizable, but you'll probably get better speed-ups if you parallelize computationally intensive tasks.

Step two is identifying the input and output data of the to-be-parallelized tasks. MPI is a distributed-memory system, so processes only have contact with each other by passing messages. This is nice because one process will never accidentally overwrite memory being used by another process. The flip side is that communication takes place over the network, so thinking about ways to minimize the amount of data being passed around may be worthwhile.
When parallelizing the system propagation steps between replica exchange moves, I decided that the input to each replica would be its new simulation parameters (e.g., temperature, pressure, ε, or σ for an NPT Lennard-Jones system), and the output would be the set of energies of the replica's final configuration under all possible simulation parameters.

Step three is deciding how to allocate tasks to MPI processes. For MD force computation, it would be silly to have one process per particle, but it might be reasonable to assign 128 particles out of a 1024-particle simulation to each of 8 processes. For replica exchange, one replica per process makes sense. I also decided to have a master process that coordinates the simulation, collecting information from and distributing information to each replica process; the replica processes are then the slaves. A common convention for master-slave program designs is to assign rank 0 to the master.

Step four is thinking about how to incorporate this new functionality into your code. Decide how to isolate the to-be-parallelized tasks into one or more functions, what messages should be passed between which processes at what times, and which processes should run which parts of the code. (In general, all processes run the same executable, but the flow of each process through the code can be controlled with statements like if (rank == somerank).) Cycle through steps 2-4 until you have a design you're happy with.

Step five is actually writing the MPI code. Don't do step five until you've done steps 1-4, especially if your code isn't under version control. Speaking of which, step zero: put your code under version control (ask Todd about using the group git server).

2.2 Basic functions: overhead

To use the OpenMPI library in C++, put this line with your other include statements

```cpp
#include "mpi.h"
```

and make sure the mpi.h file is in your path. Somewhere early in your program, before any other MPI command, you have to initialize MPI. This part is executed by every process, although the local value of rank will be different for each process.

```cpp
// passes the command-line arguments to MPI
// (the command-line arguments don't have to do anything though)
MPI_Init(&argc, &argv);

// initialize the variable rank to the rank of this process
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

// initialize the variable nprocs to the total number of processes
int nprocs;
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
```

The rank and nprocs variables (which can be named anything you like, incidentally) are very useful for controlling the flow of your code. For instance, you could include lines such as

```cpp
if (rank < nprocs/2) {
    // code that only half the processes should execute
}
```

and the code will execute as expected because rank and nprocs have the values that make sense. Don't expect something like

```cpp
for (int irank = 0; irank < nprocs; irank++) {
    // code that every process should execute once
}
```

because every process executes every line of code once by default; that code snippet would result in every process executing the code inside that for-loop nprocs times.

Another use for rank can be finding the right input files. Because every process has the same argc and argv, you may not want your simulation parameters or data-file names to be command-line arguments. One option is to put the parameters for each process in separate files, and make a function that reads the appropriate input file and sets up that process.

```cpp
// set up name of input file for this process
char input_file_name[60];
sprintf(input_file_name, "params_for_rank_%d.txt", rank);

// pass file name to file-reading function
read_input_file(input_file_name);
```

A big part of the point of MPI is making things go faster, so you'll probably want to know how long different parts of your code take to run in wall time. Instead of the ctime C/C++ functions that I usually use for profiling, MPI provides the function MPI_Wtime for this purpose. It returns a double that represents the current time in seconds; the zero of time is arbitrary but fixed throughout the run-time of the process. So it's best to use MPI_Wtime in pairs:

```cpp
// get current time
double starttime = MPI_Wtime();

// some code that takes time

// get current time
double endtime = MPI_Wtime();

// see how long the code took
cout << "this took " << endtime - starttime << " seconds ";
cout << "or " << (endtime - starttime)/3600/24 << " days" << endl;
```

One last thing you may be curious about is what physical computer your processes have found themselves on. I think there are ways to access at least part of this information in the submit script (see Section 3), but you can find out directly from within your executable too.
The command for looking up the node name is MPI_Get_processor_name, and it's used like so:

```cpp
// initialize arguments for MPI_Get_processor_name
int namelen;
char nodename[MPI_MAX_PROCESSOR_NAME];

// call function, which overwrites its arguments
MPI_Get_processor_name(nodename, &namelen);

// output
cout << "rank " << rank << " running on node " << nodename << endl;
cout << "name is " << namelen << " chars long" << endl;
cout << "max name length was " << MPI_MAX_PROCESSOR_NAME << endl;
```

Finally, after the last MPI function has been called, you need to clean things up:

```cpp
// clean up MPI
MPI_Finalize();
```

2.3 Basic functions: send/receive

Now let's get to the MP part of MPI: message passing. Every process can exchange messages with every other process, and there are a variety of functions that allow different sorts of communication patterns: send/receive, broadcast/reduce, gather/scatter, ring pass, etc. Send/receive is the simplest one, just one message being passed from one process to another, and that's the only one I'll cover here. The other communication methods are collective, in contrast to the point-to-point nature of send/receive. Cloud computing algorithms are typically based on collective communication (e.g., Google's patented MapReduce and Apache's open-source Hadoop), so there's a significant possibility that it's worth your while to look into collective communication options.

The functions MPI_Send and MPI_Recv have a similar syntax:

```cpp
int MPI_Send(void* data_to_send, int count, MPI_Datatype datatype,
             int dest_rank, int tag, MPI_Comm communicator);

int MPI_Recv(void* data_to_recv, int count, MPI_Datatype datatype,
             int source_rank, int tag, MPI_Comm communicator,
             MPI_Status* status);
```

The first argument of each function is the message data itself, and all the others are the envelope that allows the message to be processed correctly on each end. Let's take a closer look at each argument.

data_to_send and data_to_recv are pointers to pre-allocated blocks of memory that hold (or will hold) the message data. Since this implementation of MPI deals with pointers and arrays instead of STL types like vectors, your code has to deal with pointers too, sorry! The memory at data_to_recv gets overwritten by MPI_Recv.
count is the number of values in the message. If you're sending an array containing 8 floats, then count should be equal to 8 in MPI_Send and 8 in MPI_Recv.

datatype is the MPI equivalent of the C++ datatype that you used to initialize your message: MPI_FLOAT, MPI_DOUBLE, MPI_CHAR, etc. There's also an option for MPI_PACKED if you want to send a structure.

dest_rank must match the rank of the process calling MPI_Recv, and source_rank must match the rank of the process calling MPI_Send.
tag is there for fine-tuning the point-to-point communication, but I don't have much of a use for it. The tag in MPI_Send must be an actual integer, whereas the tag in MPI_Recv can be a wildcard like MPI_ANY_TAG.

communicator is typically MPI_COMM_WORLD, which we saw in the MPI start-up code snippet. All processes are members of MPI_COMM_WORLD, so it will probably meet your needs, but there are other options for communicators if you want something more specialized.

status is a structure of type MPI_Status, with members status.MPI_SOURCE, status.MPI_TAG, and status.MPI_ERROR (all of type int).

The return values of MPI_Send and MPI_Recv are error codes, but MPI usually just dies if something goes wrong.

2.4 Send/receive examples

Here's an example where process 2 sends the array [0, 2, 4, 6, 8] to process 1, and process 1 sends the array [4, 4, 4, 4, 4] back to process 2. Note how the arrays are passed as pointers to their first elements, and note that process 1 receives the first message before it sends the second message. (If both processes tried to send before they tried to receive, the code would hang indefinitely.)

```cpp
// initialize empty static arrays
int first_message[5];
int second_message[5];
int count = 5;

// initialize status
MPI_Status status;

if (rank == 2) {
    // put some values in the first array
    for (int i = 0; i < count; i++)
        first_message[i] = i*rank;

    // send the first message with tag=count
    MPI_Send(first_message, count, MPI_INT, 1, count, MPI_COMM_WORLD);

    // receive the second message
    MPI_Recv(second_message, count, MPI_INT, 1, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
} else if (rank == 1) {
    // put some values in the second array
    for (int i = 0; i < count; i++)
        second_message[i] = count - rank;

    // receive the first message
    MPI_Recv(first_message, count, MPI_INT, 2, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);

    // send the second message with tag=0
    MPI_Send(second_message, count, MPI_INT, 2, 0, MPI_COMM_WORLD);
}
```

Here's a more complicated example: say you have a bunch of slave processes running simulations of a system with liquid and vapor phases, and you want the master process to make a histogram of the z-velocities of all the vapor particles in all the simulations, to check against a Maxwell-Boltzmann distribution. (Patrick Varilly was doing something similar the other day.) Because each slave simulation may have a different number of vapor particles, and to avoid hard-coding the maximum possible number of vapor particles, we'll use dynamic memory allocation; note that the resulting pointer usages are a bit different than in the previous example. This example also introduces the function MPI_Get_count(&status, datatype, &real_count), which figures out the number of values actually received. The value of real_count can be less than the value of count passed to MPI_Recv, making MPI_Get_count useful for debugging in addition to how I used it here.

```cpp
if (rank > 0) { // slaves only
    // ask a function for the number of vapor particles
    int n_vapor_particles = get_number_in_vapor();

    // initialize empty dynamic array
    float * z_vels = new float[n_vapor_particles];

    // pass pointer to z_vels array to a function
    // that puts in the correct values
    get_vapor_z_vels(z_vels);

    // send to master (rank 0) with tag 1
    MPI_Send(z_vels, n_vapor_particles, MPI_FLOAT, 0, 1, MPI_COMM_WORLD);

    // free the send buffer
    delete [] z_vels;
} // end slaves

// can have other code here:
// sends and receives need not be close to each other in the code file;
// they just have to happen in the right order when the code is executed

if (rank == 0) { // master only
    // initialize status
    MPI_Status status;

    // loop over all slaves
    for (int irank = 1; irank < nprocs; irank++) {
        // initialize empty dynamic array
        // assume that n_max_particles is initialized elsewhere,
        // perhaps from a configuration file
        float * z_vels = new float[n_max_particles];

        // receive from slave
        MPI_Recv(z_vels, n_max_particles, MPI_FLOAT, irank, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);

        // initialize n_vapor_particles with length of received message
        int n_vapor_particles;
        MPI_Get_count(&status, MPI_FLOAT, &n_vapor_particles);

        // output to a log filestream (initialized elsewhere)
        logfile << "master received z-velocities of " << n_vapor_particles
                << " vapor particles from slave with rank " << irank << endl;

        // loop over received values only
        for (int iparticle = 0; iparticle < n_vapor_particles; iparticle++) {
            // pass values to a histogramming function
            add_velocity_to_histogram(z_vels[iparticle]);
        } // end loop over particles

        // free the receive buffer
        delete [] z_vels;
    } // end loop over slaves
} // end master
```

3 Running MPI code

3.1 Learning about your MPI installation

The command ompi_info outputs a bunch of information about your local OpenMPI installation, most of which I don't know how to deal with. You can grep for some specific things, for example:

```shell
[anna@quaker test_mpi]$ ompi_info | grep "Open MPI:"
Open MPI:
[anna@quaker test_mpi]$ ompi_info | grep Prefix
Prefix: /opt/openmpi
```
3.2 Compiling in general

MPI code must be compiled with an MPI-specific compiler. For instance, the C++ compiler that's equivalent to g++ is mpic++. The MPI compilers can be used for non-MPI code too, so you can try a test compilation of your code before you even start adding MPI functionality. First, just try your usual compilation command, replacing g++ with mpic++. If you compile on the command line,

```shell
[anna@quaker test_mpi]$ mpic++ myprogram.cpp -o myprogram.exe
```

or if you use a makefile, change the value of CXX in the file, e.g.,

```
# CXX=g++
CXX=mpic++
```

If that doesn't work, you may have to add the path to the compiler to your PATH. For instance, if you found in the previous section that Prefix: /opt/openmpi, then

```shell
[anna@quaker test_mpi]$ export PATH=$PATH:/opt/openmpi/bin
```

or add it in your ~/.bash_profile or ~/.bashrc files. You can also find the path using which:

```shell
[anna@quaker test_mpi]$ which mpic++
/opt/openmpi/bin/mpic++
```

If you have trouble with library linking at runtime, it may help to add this line to the makefile:

```
LDLIBSOPTIONS=$(shell mpic++ --showme:link)
```

3.3 Compiling on NERSC

The NERSC machines use the Portland Group compilers by default, instead of the GNU compilers (e.g., g++) or the Intel compilers (which I've never used before). To swap to the GNU compilers on hopper or franklin, type

```shell
nid00007 a/anna> module swap PrgEnv-pgi PrgEnv-gnu
```

and compile using CC instead of mpic++. Note that this implicitly uses MPICH instead of OpenMPI, so you have to comment out any OpenMPI-specific lines in your makefile, but otherwise it works without a hitch! To compile with g++ and OpenMPI on carver, type

```shell
carver% module swap pgi gcc
carver% module swap openmpi openmpi-gcc
```

then compile using mpic++ as usual. Different compilers have different strengths and weaknesses, so it may be worth your time to try out the PGI and Intel ones. To see what modules are currently loaded:

```shell
carver% module list
Currently Loaded Modulefiles:
  1) pgi/10.8      2) openmpi/
```

3.4 Running locally

Suppose you have a non-MPI executable called myprogram.exe that takes a single command-line argument myconfigfile.txt, such that you'd typically run the program using the command

```shell
[anna@quaker test_mpi]$ ./myprogram.exe myconfigfile.txt
```

Then to run an MPI version of this program with 5 local processes, simply type

```shell
[anna@quaker test_mpi]$ mpirun -np 5 myprogram.exe myconfigfile.txt
```

Replace the 5 in the number-of-processes flag -np 5 with the actual number of processes you want to run. To run with the valgrind memory debugger and profiler, type

```shell
[anna@quaker test_mpi]$ mpirun -np 5 valgrind myprogram.exe myconfigfile.txt
```

When running parallel jobs locally, your speed-up will be limited by the number of cores on your local machine. One way to find out how many cores you have is to type top and then hit 1. On quaker, the first few lines of the resulting display are something like

```
Tasks: 2143 total, 1 running, 2142 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 16.4%sy, 0.0%ni, 83.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.3%us, 16.0%sy, 0.0%ni, 83.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.7%us, 17.4%sy, 0.0%ni, 82.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.3%us, 17.3%sy, 0.0%ni, 82.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.3%us, 16.3%sy, 0.0%ni, 82.7%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
Cpu5 : 0.7%us, 17.4%sy, 0.0%ni, 81.3%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
Mem:  ...k total, ...k used, ...k free, ...k buffers
Swap: ...k total, 6516k used, ...k free, ...k cached
```

which makes me think that the quaker interactive node has 6 cores. Another source of information about the number of cores is the file /proc/cpuinfo.

3.5 Submitting jobs on quaker, muesli, or lers

Although it's definitely possible to submit jobs by typing a qsub command on the command line, it's easier in the long run to set up a submit script that keeps track of your flags.
In the submit script below, all the flags (the things prefaced by #$) could be added to your command-line call if you really wanted to. There are lots of other flags out there, some of which I should probably be using; check them out by reading the qsub man page.
```shell
[anna@quaker test_mpi]$ cat submit_mpi.sh
#!/bin/sh

# run this script by typing the following command,
# replacing $n with the actual number of processes:
# qsub -pe orte $n ./submit_mpi.sh

# use bash as your shell
#$ -S /bin/bash
# change this to your job name
#$ -N myjobname
# run from the current working directory
#$ -cwd
# don't join stdout and stderr
#$ -j n
# export environment variables
#$ -V

echo This job is being run on $(hostname --short)
echo Running $NSLOTS processes

# change this to your actual executable and arguments
mpirun -np $NSLOTS ./myprogram.exe myconfigfile.txt
```
This sample MPI submit script is very similar to a non-MPI submit script, but has the variable $NSLOTS, which doesn't appear to be initialized anywhere. $NSLOTS is actually an SGE built-in variable that's initialized by the -pe orte flag, which I've chosen to keep on the command line to make it easier to run different numbers of processes. So to submit a 5-process job to quaker using this submit script, type

```shell
[anna@quaker test_mpi]$ qsub -pe orte 5 ./submit_mpi.sh
```

The -pe orte flag also sets the parallel environment to the value orte. There are other options for the parallel environment, such as mpich, but orte worked best for me. The qconf utility is a good source of information about things like this:

```shell
[anna@quaker test_mpi]$ qconf -spl
make
mpi
mpich
orte
[anna@quaker test_mpi]$ qconf -sp orte
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
```

3.6 Submitting jobs on NERSC

NERSC is the supercomputing facility at LBNL. If you need more cores than are available on our group clusters, or just want your simulations to run much much much faster without changing a line of your code, NERSC is your best bet. The NERSC user info site has lots of information about how to use their system; spend a while looking around there, especially the computational systems and queues-and-policies sections, to decide if NERSC is right for you. Currently, we have hours on their clusters through the Joint Center for Artificial Photosynthesis (JCAP) project, and possibly through other projects; Phill should have a rough idea of the computational resources available to us. For JCAP, start by e-mailing Lin-Wang Wang <lwwang@lbl.gov> to get an account on the NERSC system and a budget of JCAP hours. There's a form you have to fill out and possibly fax, then a few webforms to click through.
Expect it to take a few days before you have ssh access to the clusters. Sign in at the NERSC site with your username and password to see how many hours you have available. Your hours can be used on any of the NERSC computers (hopper, franklin, carver, etc.). Each of these computers has different software, hardware, and queue configurations, so choose one that fits your needs. Carver looks nice on paper because some queues have long maximum walltimes, but there may be a usage surcharge for using carver, and my jobs spent days in the queue before running. Hopper ended up being the best answer for me: my code starts running sooner on hopper than on carver, and hopper uses global scratch whereas franklin has a separate scratch system.

You will probably want to write the output of your jobs to a scratch directory, either local ($SCRATCH) or global ($GSCRATCH) depending on what computer you're on. Your allocated disk space is much larger in scratch than in your home directory, and I/O is much faster. You can even use scratch as if it were your home directory, e.g., you can submit your jobs from scratch. Carver has gnuplot and hopper doesn't, so another benefit of using global scratch is that you can easily run jobs on hopper then analyze them on carver. Note that your data won't be automatically backed up no matter what directory it's in, and files may even be purged periodically, so remember to back up your data to a safe place (one or more disks hosted by the group, or NERSC's storage system HPSS).

The NERSC systems use PBS/Torque instead of SGE/Rocks for queue management. The flags are similar to the SGE flags, but are prefixed by #PBS instead of #$, and I think the flags have to be the first thing in the submit script file (i.e., no comments before or during the flags).
In the sample submit scripts below, replace -q debug or -q regular with your queue of choice; replace -l walltime=00:30:00 with your actual maximum walltime (format HH:MM:SS); and replace all the other names, numbers, and directory-handling commands with reasonable values. Also note that runtime library-linking errors may be resolved by swapping to your correct compiler modules within your submit script, not by anything Google might suggest about changing your $LD_LIBRARY_PATH.

There are two main differences between submit scripts on hopper and carver. First, carver uses mpirun -np $nprocs, as on quaker, whereas hopper uses aprun -n $nprocs to do the same thing. Second, they use different syntax and criteria for deciding the number of cores allotted to your job, although both only allocate cores in multiples of the number of processors per node. On carver, there are 8 processors per node, so if you want 16 processors then use the flag -l nodes=2:ppn=8. Hopper has 24 processors per node, so the number in the -l mppwidth flag must be a multiple of 24. On both machines, you will be allocated and charged for the number of processors you request using this flag, which may be larger than the number you actually utilize with the mpirun -np or aprun -n commands, so plan your processor use accordingly. Be warned: if you don't use this flag, all your processes will run on a single node, making your job painfully slow and possibly running out of memory.
Here's a submit script for the debug queue on carver.

```shell
bash-3.2$ cat submit_carver_debug.pbs
#!/bin/bash
#PBS -S /bin/bash
#PBS -N debug_job_name
#PBS -j n
#PBS -V
#PBS -q debug
#PBS -l walltime=00:30:00
#PBS -l nodes=2:ppn=8
#PBS -M your_email@host.com
#PBS -m aeb

### SET THESE VARIABLES BY HAND ###
nprocs=16  # should match the -l nodes=XX:ppn=8 line above
output_prefix=debug_output
###################################

# output compute node and number of MPI processes
echo This job is being run on $(hostname --short)
echo $nprocs

# set up with correct modules for GNU compilers
# fixes runtime errors involving incorrect linking of
# libstdc++ and GLIBCXX libraries
module swap pgi gcc
module swap openmpi openmpi-gcc
module list

# set up to use global scratch
output_path=$SCRATCH/$output_prefix
echo $output_path
if [ ! -d $output_path ]; then
    mkdir $output_path
else
    # assume the directory's contents shouldn't already be there
    echo "Deleting existing data in $output_path"
    rm -r $output_path/*
fi

# move to current working directory (like -cwd flag in SGE)
cd $PBS_O_WORKDIR

# run job
mpirun -np $nprocs myprogram.exe myargs
```
And here's a submit script for the regular queue on hopper.

```shell
anna@hopper03:/global/scratch/sd/anna/production_output> cat submit_regular_hopper.pbs
#!/bin/bash
#PBS -S /bin/bash
#PBS -N regular_job_name
#PBS -j n
#PBS -V
#PBS -q regular
#PBS -l walltime=36:00:00
#PBS -M your_email@host.com
#PBS -m aeb
#PBS -l mppwidth=144

### SET VARIABLES BY HAND ###
nprocs=122
output_prefix=production_output
#############################

# output compute node and number of MPI processes
echo This job is being run on $(hostname --short)
echo $nprocs

# set up with correct modules for GNU compilers
# I didn't check whether this is necessary for hopper
# like it is for carver, but why not
module swap PrgEnv-pgi PrgEnv-gnu
module list

# set up to use global scratch
output_path=$GSCRATCH/$output_prefix
echo $output_path
if [ ! -d $output_path ]; then
    mkdir $output_path
else
    # assume the directory should already be there
    echo "directory found at $output_path, leaving it there"
fi

# run job
cd $PBS_O_WORKDIR
aprun -n $nprocs myprogram.exe myargs
More informationParallel Programming Assignment 3 Compiling and running MPI programs
Parallel Programming Assignment 3 Compiling and running MPI programs Author: Clayton S. Ferner and B. Wilkinson Modification date: October 11a, 2013 This assignment uses the UNC-Wilmington cluster babbage.cis.uncw.edu.
More informationPractical Introduction to Message-Passing Interface (MPI)
1 Practical Introduction to Message-Passing Interface (MPI) October 1st, 2015 By: Pier-Luc St-Onge Partners and Sponsors 2 Setup for the workshop 1. Get a user ID and password paper (provided in class):
More informationCS4961 Parallel Programming. Lecture 16: Introduction to Message Passing 11/3/11. Administrative. Mary Hall November 3, 2011.
CS4961 Parallel Programming Lecture 16: Introduction to Message Passing Administrative Next programming assignment due on Monday, Nov. 7 at midnight Need to define teams and have initial conversation with
More informationMPI: Parallel Programming for Extreme Machines. Si Hammond, High Performance Systems Group
MPI: Parallel Programming for Extreme Machines Si Hammond, High Performance Systems Group Quick Introduction Si Hammond, (sdh@dcs.warwick.ac.uk) WPRF/PhD Research student, High Performance Systems Group,
More informationParallel Programming with MPI: Day 1
Parallel Programming with MPI: Day 1 Science & Technology Support High Performance Computing Ohio Supercomputer Center 1224 Kinnear Road Columbus, OH 43212-1163 1 Table of Contents Brief History of MPI
More informationThe Message Passing Model
Introduction to MPI The Message Passing Model Applications that do not share a global address space need a Message Passing Framework. An application passes messages among processes in order to perform
More informationThe Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs
1 The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs http://mpi-forum.org https://www.open-mpi.org/ Mike Bailey mjb@cs.oregonstate.edu Oregon State University mpi.pptx
More informationHolland Computing Center Kickstart MPI Intro
Holland Computing Center Kickstart 2016 MPI Intro Message Passing Interface (MPI) MPI is a specification for message passing library that is standardized by MPI Forum Multiple vendor-specific implementations:
More informationUser Guide of High Performance Computing Cluster in School of Physics
User Guide of High Performance Computing Cluster in School of Physics Prepared by Sue Yang (xue.yang@sydney.edu.au) This document aims at helping users to quickly log into the cluster, set up the software
More informationCS 470 Spring Mike Lam, Professor. Distributed Programming & MPI
CS 470 Spring 2017 Mike Lam, Professor Distributed Programming & MPI MPI paradigm Single program, multiple data (SPMD) One program, multiple processes (ranks) Processes communicate via messages An MPI
More informationParallel Short Course. Distributed memory machines
Parallel Short Course Message Passing Interface (MPI ) I Introduction and Point-to-point operations Spring 2007 Distributed memory machines local disks Memory Network card 1 Compute node message passing
More informationCluster Clonetroop: HowTo 2014
2014/02/25 16:53 1/13 Cluster Clonetroop: HowTo 2014 Cluster Clonetroop: HowTo 2014 This section contains information about how to access, compile and execute jobs on Clonetroop, Laboratori de Càlcul Numeric's
More informationCS 470 Spring Mike Lam, Professor. Distributed Programming & MPI
CS 470 Spring 2018 Mike Lam, Professor Distributed Programming & MPI MPI paradigm Single program, multiple data (SPMD) One program, multiple processes (ranks) Processes communicate via messages An MPI
More informationMPI introduction - exercises -
MPI introduction - exercises - Paolo Ramieri, Maurizio Cremonesi May 2016 Startup notes Access the server and go on scratch partition: ssh a08tra49@login.galileo.cineca.it cd $CINECA_SCRATCH Create a job
More informationIntroduction to the Message Passing Interface (MPI)
Introduction to the Message Passing Interface (MPI) CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction to the Message Passing Interface (MPI) Spring 2018
More informationMPI 3. CSCI 4850/5850 High-Performance Computing Spring 2018
MPI 3 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationCSE 160 Lecture 18. Message Passing
CSE 160 Lecture 18 Message Passing Question 4c % Serial Loop: for i = 1:n/3-1 x(2*i) = x(3*i); % Restructured for Parallelism (CORRECT) for i = 1:3:n/3-1 y(2*i) = y(3*i); for i = 2:3:n/3-1 y(2*i) = y(3*i);
More informationCOSC 6374 Parallel Computation. Message Passing Interface (MPI ) I Introduction. Distributed memory machines
Network card Network card 1 COSC 6374 Parallel Computation Message Passing Interface (MPI ) I Introduction Edgar Gabriel Fall 015 Distributed memory machines Each compute node represents an independent
More informationCS 426. Building and Running a Parallel Application
CS 426 Building and Running a Parallel Application 1 Task/Channel Model Design Efficient Parallel Programs (or Algorithms) Mainly for distributed memory systems (e.g. Clusters) Break Parallel Computations
More informationPCAP Assignment I. 1. A. Why is there a large performance gap between many-core GPUs and generalpurpose multicore CPUs. Discuss in detail.
PCAP Assignment I 1. A. Why is there a large performance gap between many-core GPUs and generalpurpose multicore CPUs. Discuss in detail. The multicore CPUs are designed to maximize the execution speed
More informationMPI. (message passing, MIMD)
MPI (message passing, MIMD) What is MPI? a message-passing library specification extension of C/C++ (and Fortran) message passing for distributed memory parallel programming Features of MPI Point-to-point
More informationMPI 1. CSCI 4850/5850 High-Performance Computing Spring 2018
MPI 1 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationint sum;... sum = sum + c?
int sum;... sum = sum + c? Version Cores Time (secs) Speedup manycore Message Passing Interface mpiexec int main( ) { int ; char ; } MPI_Init( ); MPI_Comm_size(, &N); MPI_Comm_rank(, &R); gethostname(
More informationParallel Programming, MPI Lecture 2
Parallel Programming, MPI Lecture 2 Ehsan Nedaaee Oskoee 1 1 Department of Physics IASBS IPM Grid and HPC workshop IV, 2011 Outline 1 Introduction and Review The Von Neumann Computer Kinds of Parallel
More informationUBDA Platform User Gudie. 16 July P a g e 1
16 July 2018 P a g e 1 Revision History Version Date Prepared By Summary of Changes 1.0 Jul 16, 2018 Initial release P a g e 2 Table of Contents 1. Introduction... 4 2. Perform the test... 5 3 Job submission...
More informationTo connect to the cluster, simply use a SSH or SFTP client to connect to:
RIT Computer Engineering Cluster The RIT Computer Engineering cluster contains 12 computers for parallel programming using MPI. One computer, cluster-head.ce.rit.edu, serves as the master controller or
More informationDistributed Memory Programming With MPI Computer Lab Exercises
Distributed Memory Programming With MPI Computer Lab Exercises Advanced Computational Science II John Burkardt Department of Scientific Computing Florida State University http://people.sc.fsu.edu/ jburkardt/classes/acs2
More informationTutorial 2: MPI. CS486 - Principles of Distributed Computing Papageorgiou Spyros
Tutorial 2: MPI CS486 - Principles of Distributed Computing Papageorgiou Spyros What is MPI? An Interface Specification MPI = Message Passing Interface Provides a standard -> various implementations Offers
More informationMPI Message Passing Interface
MPI Message Passing Interface Portable Parallel Programs Parallel Computing A problem is broken down into tasks, performed by separate workers or processes Processes interact by exchanging information
More informationmith College Computer Science CSC352 Week #7 Spring 2017 Introduction to MPI Dominique Thiébaut
mith College CSC352 Week #7 Spring 2017 Introduction to MPI Dominique Thiébaut dthiebaut@smith.edu Introduction to MPI D. Thiebaut Inspiration Reference MPI by Blaise Barney, Lawrence Livermore National
More informationMPI MESSAGE PASSING INTERFACE
MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions
More informationComputing with the Moore Cluster
Computing with the Moore Cluster Edward Walter An overview of data management and job processing in the Moore compute cluster. Overview Getting access to the cluster Data management Submitting jobs (MPI
More informationGetting started with the CEES Grid
Getting started with the CEES Grid October, 2013 CEES HPC Manager: Dennis Michael, dennis@stanford.edu, 723-2014, Mitchell Building room 415. Please see our web site at http://cees.stanford.edu. Account
More informationCS354 gdb Tutorial Written by Chris Feilbach
CS354 gdb Tutorial Written by Chris Feilbach Purpose This tutorial aims to show you the basics of using gdb to debug C programs. gdb is the GNU debugger, and is provided on systems that
More informationAn introduction to MPI
An introduction to MPI C MPI is a Library for Message-Passing Not built in to compiler Function calls that can be made from any compiler, many languages Just link to it Wrappers: mpicc, mpif77 Fortran
More informationSolution of Exercise Sheet 2
Solution of Exercise Sheet 2 Exercise 1 (Cluster Computing) 1. Give a short definition of Cluster Computing. Clustering is parallel computing on systems with distributed memory. 2. What is a Cluster of
More informationLesson 1. MPI runs on distributed memory systems, shared memory systems, or hybrid systems.
The goals of this lesson are: understanding the MPI programming model managing the MPI environment handling errors point-to-point communication 1. The MPI Environment Lesson 1 MPI (Message Passing Interface)
More informationOur new HPC-Cluster An overview
Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization
More informationHands-on. MPI basic exercises
WIFI XSF-UPC: Username: xsf.convidat Password: 1nt3r3st3l4r WIFI EDUROAM: Username: roam06@bsc.es Password: Bsccns.4 MareNostrum III User Guide http://www.bsc.es/support/marenostrum3-ug.pdf Remember to
More informationA message contains a number of elements of some particular datatype. MPI datatypes:
Messages Messages A message contains a number of elements of some particular datatype. MPI datatypes: Basic types. Derived types. Derived types can be built up from basic types. C types are different from
More informationAnomalies. The following issues might make the performance of a parallel program look different than it its:
Anomalies The following issues might make the performance of a parallel program look different than it its: When running a program in parallel on many processors, each processor has its own cache, so the
More informationCluster User Training
Cluster User Training From Bash to parallel jobs under SGE in one terrifying hour Christopher Dwan, Bioteam First delivered at IICB, Kolkata, India December 14, 2009 UNIX ESSENTIALS Unix command line essentials
More informationProgramming Scalable Systems with MPI. Clemens Grelck, University of Amsterdam
Clemens Grelck University of Amsterdam UvA / SurfSARA High Performance Computing and Big Data Course June 2014 Parallel Programming with Compiler Directives: OpenMP Message Passing Gentle Introduction
More informationIntroduction to MPI. Ekpe Okorafor. School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014
Introduction to MPI Ekpe Okorafor School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 Topics Introduction MPI Model and Basic Calls MPI Communication Summary 2 Topics Introduction
More informationTool for Analysing and Checking MPI Applications
Tool for Analysing and Checking MPI Applications April 30, 2010 1 CONTENTS CONTENTS Contents 1 Introduction 3 1.1 What is Marmot?........................... 3 1.2 Design of Marmot..........................
More information15-440: Recitation 8
15-440: Recitation 8 School of Computer Science Carnegie Mellon University, Qatar Fall 2013 Date: Oct 31, 2013 I- Intended Learning Outcome (ILO): The ILO of this recitation is: Apply parallel programs
More informationDebugging on Blue Waters
Debugging on Blue Waters Debugging tools and techniques for Blue Waters are described here with example sessions, output, and pointers to small test codes. For tutorial purposes, this material will work
More informationIntroduction to GALILEO
Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Domenico Guida d.guida@cineca.it Maurizio Cremonesi m.cremonesi@cineca.it
More informationIntroduction to PICO Parallel & Production Enviroment
Introduction to PICO Parallel & Production Enviroment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Domenico Guida d.guida@cineca.it Nicola Spallanzani n.spallanzani@cineca.it
More informationWhat s in this talk? Quick Introduction. Programming in Parallel
What s in this talk? Parallel programming methodologies - why MPI? Where can I use MPI? MPI in action Getting MPI to work at Warwick Examples MPI: Parallel Programming for Extreme Machines Si Hammond,
More informationThe Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) CPUs
1 The Message Passing Interface (MPI): Parallelism on Multiple (Possibly Heterogeneous) s http://mpi-forum.org https://www.open-mpi.org/ Mike Bailey mjb@cs.oregonstate.edu Oregon State University mpi.pptx
More informationDomain Decomposition: Computational Fluid Dynamics
Domain Decomposition: Computational Fluid Dynamics May 24, 2015 1 Introduction and Aims This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will
More informationIntroduction to MPI. HY555 Parallel Systems and Grids Fall 2003
Introduction to MPI HY555 Parallel Systems and Grids Fall 2003 Outline MPI layout Sending and receiving messages Collective communication Datatypes An example Compiling and running Typical layout of an
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 4 Message-Passing Programming Learning Objectives n Understanding how MPI programs execute n Familiarity with fundamental MPI functions
More informationComputer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: C and Unix Overview
Computer Science 322 Operating Systems Mount Holyoke College Spring 2010 Topic Notes: C and Unix Overview This course is about operating systems, but since most of our upcoming programming is in C on a
More informationECE 574 Cluster Computing Lecture 13
ECE 574 Cluster Computing Lecture 13 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 15 October 2015 Announcements Homework #3 and #4 Grades out soon Homework #5 will be posted
More informationWorking on the NewRiver Cluster
Working on the NewRiver Cluster CMDA3634: Computer Science Foundations for Computational Modeling and Data Analytics 22 February 2018 NewRiver is a computing cluster provided by Virginia Tech s Advanced
More informationComputer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring Topic Notes: C and Unix Overview
Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring 2009 Topic Notes: C and Unix Overview This course is about computer organization, but since most of our programming is
More informationAssignment 3 MPI Tutorial Compiling and Executing MPI programs
Assignment 3 MPI Tutorial Compiling and Executing MPI programs B. Wilkinson: Modification date: February 11, 2016. This assignment is a tutorial to learn how to execute MPI programs and explore their characteristics.
More informationRecap of Parallelism & MPI
Recap of Parallelism & MPI Chris Brady Heather Ratcliffe The Angry Penguin, used under creative commons licence from Swantje Hess and Jannis Pohlmann. Warwick RSE 13/12/2017 Parallel programming Break
More informationPart One: The Files. C MPI Slurm Tutorial - TSP. Introduction. TSP Problem and Tutorial s Purpose. tsp.tar. The C files, summary
C MPI Slurm Tutorial - TSP Introduction The example shown here demonstrates the use of the Slurm Scheduler for the purpose of running a C/MPI program Knowledge of C is assumed Code is also given for the
More informationFirst day. Basics of parallel programming. RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS
First day Basics of parallel programming RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS Today s schedule: Basics of parallel programming 7/22 AM: Lecture Goals Understand the design of typical parallel
More informationSupercomputing in Plain English Exercise #6: MPI Point to Point
Supercomputing in Plain English Exercise #6: MPI Point to Point In this exercise, we ll use the same conventions and commands as in Exercises #1, #2, #3, #4 and #5. You should refer back to the Exercise
More informationParallel Programming Using MPI
Parallel Programming Using MPI Prof. Hank Dietz KAOS Seminar, February 8, 2012 University of Kentucky Electrical & Computer Engineering Parallel Processing Process N pieces simultaneously, get up to a
More informationOutline. Communication modes MPI Message Passing Interface Standard. Khoa Coâng Ngheä Thoâng Tin Ñaïi Hoïc Baùch Khoa Tp.HCM
THOAI NAM Outline Communication modes MPI Message Passing Interface Standard TERMs (1) Blocking If return from the procedure indicates the user is allowed to reuse resources specified in the call Non-blocking
More informationA Guide to Condor. Joe Antognini. October 25, Condor is on Our Network What is an Our Network?
A Guide to Condor Joe Antognini October 25, 2013 1 Condor is on Our Network What is an Our Network? The computers in the OSU astronomy department are all networked together. In fact, they re networked
More informationHigh Performance Beowulf Cluster Environment User Manual
High Performance Beowulf Cluster Environment User Manual Version 3.1c 2 This guide is intended for cluster users who want a quick introduction to the Compusys Beowulf Cluster Environment. It explains how
More informationMPI: The Message-Passing Interface. Most of this discussion is from [1] and [2].
MPI: The Message-Passing Interface Most of this discussion is from [1] and [2]. What Is MPI? The Message-Passing Interface (MPI) is a standard for expressing distributed parallelism via message passing.
More informationReusing this material
Messages Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationThe Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing
The Message Passing Interface (MPI) TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Parallelism Decompose the execution into several tasks according to the work to be done: Function/Task
More informationSupercomputing environment TMA4280 Introduction to Supercomputing
Supercomputing environment TMA4280 Introduction to Supercomputing NTNU, IMF February 21. 2018 1 Supercomputing environment Supercomputers use UNIX-type operating systems. Predominantly Linux. Using a shell
More informationTech Computer Center Documentation
Tech Computer Center Documentation Release 0 TCC Doc February 17, 2014 Contents 1 TCC s User Documentation 1 1.1 TCC SGI Altix ICE Cluster User s Guide................................ 1 i ii CHAPTER 1
More informationParallel Computing: Overview
Parallel Computing: Overview Jemmy Hu SHARCNET University of Waterloo March 1, 2007 Contents What is Parallel Computing? Why use Parallel Computing? Flynn's Classical Taxonomy Parallel Computer Memory
More informationMPI MPI. Linux. Linux. Message Passing Interface. Message Passing Interface. August 14, August 14, 2007 MPICH. MPI MPI Send Recv MPI
Linux MPI Linux MPI Message Passing Interface Linux MPI Linux MPI Message Passing Interface MPI MPICH MPI Department of Science and Engineering Computing School of Mathematics School Peking University
More informationIntroduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS. Teacher: Jan Kwiatkowski, Office 201/15, D-2
Introduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS Teacher: Jan Kwiatkowski, Office 201/15, D-2 COMMUNICATION For questions, email to jan.kwiatkowski@pwr.edu.pl with 'Subject=your name.
More informationCSE 160 Lecture 15. Message Passing
CSE 160 Lecture 15 Message Passing Announcements 2013 Scott B. Baden / CSE 160 / Fall 2013 2 Message passing Today s lecture The Message Passing Interface - MPI A first MPI Application The Trapezoidal
More informationAn Introduction to MPI
An Introduction to MPI Parallel Programming with the Message Passing Interface William Gropp Ewing Lusk Argonne National Laboratory 1 Outline Background The message-passing model Origins of MPI and current
More informationSupercomputing in Plain English
Supercomputing in Plain English An Introduction to High Performance Computing Part VI: Distributed Multiprocessing Henry Neeman, Director The Desert Islands Analogy Distributed Parallelism MPI Outline
More informationPeter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved
An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in
More informationDistributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved
An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in
More informationMessage Passing Interface
Message Passing Interface by Kuan Lu 03.07.2012 Scientific researcher at Georg-August-Universität Göttingen and Gesellschaft für wissenschaftliche Datenverarbeitung mbh Göttingen Am Faßberg, 37077 Göttingen,
More informationMPI Runtime Error Detection with MUST
MPI Runtime Error Detection with MUST At the 27th VI-HPS Tuning Workshop Joachim Protze IT Center RWTH Aachen University April 2018 How many issues can you spot in this tiny example? #include #include
More information