Elementary Parallel Programming with Examples
Reinhold Bader (LRZ), Georg Hager (RRZE)
Two Paradigms for Parallel Programming: Hardware Designs

Distributed Memory
- message passing: explicit programming required
- each processor owns private memory; processors are connected by a communication network

Shared Memory
- common address space for a number of CPUs
- access efficiency may vary: SMP, (cc)NUMA
- many programming models, potentially easier to handle
- hardware and OS support required!

[Figure: distributed-memory nodes, each a processor with its own memory, exchanging messages over a communication network, vs. shared-memory processors attached to one common memory]
Message Passing vs. Shared Memory: Programming Models

Distributed Memory
- Same program on each processor/machine (SPMD), or multiple programs with a consistent communication structure (MPMD)
- Program written in a sequential language (Fortran/C[++])
- All variables are process-local: no implicit knowledge of data on other processors
- Data exchange between processes: send/receive messages via an appropriate library
- The most tedious, but also the most flexible, way of parallelization
- Message passing standard discussed here: the Message Passing Interface (MPI)

Shared Memory
- Single program on a single machine
- A UNIX process splits off threads, which are mapped to CPUs for work distribution
- Data may be process-global or thread-local; exchange is unnecessary or happens via suitable synchronization mechanisms
- Programming models: explicit threading (hard), directive-based threading via OpenMP (easier), automatic parallelization
OpenMP Basics (1): Architecture of OpenMP
- OS view: parallel work is done by threads
- User view:
  - directives (comment lines)
  - library routines
  - environment variables (resources, scheduling)
OpenMP Basics (2): Fork and Join
- Program start: only the master thread runs
- First parallel region: a team of threads is generated (fork); work and data distribution possible via directives
- Threads join (synchronize) when leaving the parallel region
- Only the master thread executes the serial parts
- Scheduling issues: one thread per hardware thread context? one thread per core? one thread per socket?

[Figure: master thread (thread 0) forking into threads 0-3 at a parallel region and joining afterwards]
OpenMP Basics (3): Hello World in OpenMP, Fortran 95

  program hello
    use omp_lib
    implicit none
    integer :: nthr, myth
  !$omp parallel private(myth)
  !$omp single
    nthr = omp_get_num_threads()
  !$omp end single
    myth = omp_get_thread_num()
    write(6,*) 'Hello from ', myth, &
         &     ' of ', nthr
  !$omp end parallel
  end program hello

- OpenMP directives: sentinel directive [clause [(par)]]; the code enclosed by the parallel / end parallel directives is executed by all threads
- OpenMP function calls: the module omp_lib provides the interface; here we get the number of threads and the index of the executing thread
- Data scoping: myth is thread-local (private), nthr is process-global (shared)
OpenMP Basics (4): Compiling and Running (Intel Fortran Compiler)

Compile:
  ifort -openmp -o hello.exe hello.f90
- a special compiler switch activates the OpenMP directives and generates threaded code; further options are available

Run:
  export OMP_NUM_THREADS=4
  ./hello.exe
  Hello from 0 of 4
  Hello from 2 of 4
  Hello from 3 of 4
  Hello from 1 of 4
- the ordering is not reproducible!
- OpenMP environment variables define the runtime behaviour, e.g. the number of threads to be used

Compile for serial run:
  ifort -openmp_stubs -o hello.exe hello.f90 -lpthread
- uses the stub library so that the OpenMP function calls still resolve; the serial functionality of the program should be consistent with the parallel one
- the need for the thread library is unclear; alternative possibilities are available
If you're missing your favourite language...
Of course, all of this is also available for
- Fortran 77
- C
- C++
- other compilers
- other platforms
some of which will be discussed in the OpenMP lecture. Now let us turn to doing the same on a distributed-memory system: we need to use MPI.
MPI Basics (1): Initialization and Finalization
- Each processor must start/terminate an MPI process; this is usually handled automatically
- More than one process per processor is often, but not always, possible
- First call in an MPI program: initialization of the parallel machine
    call MPI_INIT(ierror)
- Last call: shut down the parallel machine
    call MPI_FINALIZE(ierror)
- ierror: integer argument for error reporting
- Only the process with rank 0 (see later) is guaranteed to return from MPI_FINALIZE
- Usually, stdout/stderr of each MPI process is redirected to the console where the program was started
MPI Basics (2): Communicator and Rank
- MPI_INIT defines the communicator MPI_COMM_WORLD
- MPI_COMM_WORLD defines the processes that belong to the parallel machine
- rank: labels the processes inside the parallel machine

[Figure: MPI_COMM_WORLD containing eight processes, labeled with ranks 0 through 7]
MPI Basics (3): Communicator and Rank
The rank identifies each process within the communicator (e.g., MPI_COMM_WORLD).
- Get the rank with MPI_COMM_RANK:
    integer rank, ierror
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
  rank = 0, 1, 2, ..., (number of processes - 1)
- Get the number of processes with MPI_COMM_SIZE:
    integer size, ierror
    call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
MPI Basics (4): Communicator and Rank
- MPI_COMM_WORLD is a globally defined handle; either it, or a communicator created from it, is required as an argument for nearly all MPI calls
- The rank is the target label for MPI messages
- The rank can define what each process should do:

    if (rank .eq. 0) then
       ! *** do work for rank 0 ***
    else
       ! *** do work for other ranks ***
    end if
MPI Basics (5): A Very Simple MPI Program in Fortran 95

  program hello
    use mpi
    integer rank, size, ierror
    call MPI_INIT(ierror)
    call MPI_COMM_SIZE &
         & (MPI_COMM_WORLD, size, ierror)
    call MPI_COMM_RANK &
         & (MPI_COMM_WORLD, rank, ierror)
    write(*,*) 'Hello World! I am ', &
         &     rank, ' of ', size
    call MPI_FINALIZE(ierror)
  end program

- MPI subroutine calls: MPI_<routine name>
- require either the include file mpif.h (in C: mpi.h) or the Fortran 90 module mpi
- last argument: the return code (in C: the function value!)
MPI Basics (6): Compiling and Running

Compile:
  mpif90 -o hello hello.f90
Run on 4 processors:
  mpirun -np 4 ./hello
Output (order undefined!):
  Hello World! I am 3 of 4
  Hello World! I am 1 of 4
  Hello World! I am 0 of 4
  Hello World! I am 2 of 4
Summary
Even from these small examples we can see that the complexity of handling MPI is much higher than for OpenMP.

Beware: OpenMP and MPI are designed for portability, but many technical details vary between platforms. Read the documentation for
- compiling and linking (is there an mpif90?)
- running (use mpirun, mpiexec, or neither?)
- treatment of command line arguments
- treatment of STDIN/STDOUT/STDERR
- is there an OpenMP Fortran 90 module / stub library?