
MPI Workshop - III (Research Staff): Cartesian Topologies in MPI and Passing Structures in MPI, Week 3 of 3

Schedule
- Course Map
- Fix environments to run MPI codes
- Cartesian Topology
  - MPI_Cart_create
  - MPI_Cart_rank, MPI_Cart_coords
  - MPI_Cart_shift
  - MPI_Gatherv, MPI_Reduce, MPI_Sendrecv
- Passing Data Structures in MPI
  - MPI_Type_struct, MPI_Type_vector, MPI_Type_hvector

Course Map

Week 1: Point to Point, Basic Collective
- MPI functional routines: MPI_SEND (MPI_ISEND), MPI_RECV (MPI_IRECV), MPI_BCAST, MPI_SCATTER, MPI_GATHER
- MPI examples: Helloworld, Swapmessage, Vector Sum

Week 2: Collective Communications
- MPI functional routines: MPI_BCAST, MPI_SCATTERV, MPI_GATHERV, MPI_REDUCE, MPI_BARRIER
- MPI examples: Pi, Matrix/vector multiplication, Matrix/matrix multiplication

Week 3: Advanced Topics
- MPI functional routines: MPI_DATATYPE, MPI_HVECTOR, MPI_VECTOR, MPI_STRUCT, MPI_CART_CREATE
- MPI examples: Poisson Equation, Passing Structures/common blocks, Parallel topologies in MPI

Poisson Equation on a 2D Grid (periodic boundary conditions)

    \nabla^2 \phi(x,y) = 4\pi\,\rho(x,y)

    \rho(x,y) = e^{-[(x - L_x/4)^2 + y^2]/a^2} - e^{-[(x - 3L_x/4)^2 + y^2]/a^2}

Serial Poisson Solver
- F90 Code
  - N x N matrices for \rho and \phi
  - Initialize \rho
  - Discretize the equation
  - Iterate until convergence
  - Output results

Update formula:

    \phi_{i,j} = \frac{1}{4}\left[\,\pi \Delta x^2\,\rho_{i,j} + \phi_{i+1,j} + \phi_{i-1,j} + \phi_{i,j+1} + \phi_{i,j-1}\,\right]

Convergence test:

    \sum_{i,j} \left|\phi^{new}_{i,j} - \phi^{old}_{i,j}\right| \Big/ \sum_{i,j} \left|\rho_{i,j}\right| < \epsilon

Serial Poisson Solver: Solution

Serial Poisson Solver (cont.)

    do j = 1, M
      do i = 1, N                                        ! Fortran: access down columns first
        phi(i,j) = rho(i,j) +                                          &
            .25 * ( phi_old( modulo(i,N)+1,   j )                      &   ! phi(i+1,j)
                  + phi_old( modulo(i-2,N)+1, j )                      &   ! phi(i-1,j)
                  + phi_old( i, modulo(j,N)+1 )                        &   ! phi(i,j+1)
                  + phi_old( i, modulo(j-2,N)+1 ) )                        ! phi(i,j-1)
      enddo
    enddo

Parallel Poisson Solver in MPI: domain decomposition onto a 3 x 5 processor grid

             Col 0   Col 1   Col 2   Col 3   Col 4
    Row 0    (0,0)   (0,1)   (0,2)   (0,3)   (0,4)
    Row 1    (1,0)   (1,1)   (1,2)   (1,3)   (1,4)
    Row 2    (2,0)   (2,1)   (2,2)   (2,3)   (2,4)

Parallel Poisson Solver in MPI: processor grid, e.g. 3 x 5 with N = M = 64

Global index ranges owned by each block of the 3 x 5 grid:
- rows (N direction): 1-22 (Row 0), 23-43 (Row 1), 44-64 (Row 2)
- columns (M direction): 1-13, 14-26, 27-39, 40-52, 53-64 (Columns 0-4)

Parallel Poisson Solver in MPI: boundary data movement each iteration

(Figure: the same 3 x 5 grid of ranks (0,0)-(2,4); each iteration, every block exchanges its boundary data with its neighbors.)

Ghost Cells: Local Indices

(Figure: the local block of P(1,2) has N_local = 21 rows and M_local = 13 columns, indexed 1..21 and 1..13; ghost cells sit in row 0, row 22, column 0, and column 14.)

Data Movement, e.g. Shift Right (East): boundary data of P(i, j) moves into the ghost cells of P(i+1, j).

Communicators and Topologies
- A communicator is a set of processors which can talk to each other.
- The basic communicator is MPI_COMM_WORLD.
- One can create new groups or subgroups of processors from MPI_COMM_WORLD or from other communicators.
- MPI allows one to associate a Cartesian or Graph topology with a communicator.

MPI Cartesian Topology Functions
- MPI_CART_CREATE( old_comm, nmbr_of_dims, dim_sizes(), wrap_around(), reorder, cart_comm, ierr )
  - old_comm = MPI_COMM_WORLD
  - nmbr_of_dims = 2
  - dim_sizes() = (np_rows, np_cols) = (3, 5)
  - wrap_around() = (.true., .true.)
  - reorder = .false. (generally set to .true.; allows the system to reorder the processor ranks for better performance)
  - cart_comm = grid_comm (name for the new communicator)
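For comparison, a minimal C sketch of the same call (the C binding drops the trailing ierr argument; the grid size and the name grid_comm follow the slide, the rest is illustrative):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm grid_comm;
        int dim_sizes[2]   = {3, 5};   /* (np_rows, np_cols)                */
        int wrap_around[2] = {1, 1};   /* periodic in both dimensions       */
        int reorder        = 1;        /* let MPI reorder ranks if it helps */

        MPI_Init(&argc, &argv);
        MPI_Cart_create(MPI_COMM_WORLD, 2, dim_sizes, wrap_around,
                        reorder, &grid_comm);
        /* ... use grid_comm for the solver ... */
        MPI_Finalize();
        return 0;
    }

Run with exactly 15 processes (3 x 5); with more, the extra ranks receive MPI_COMM_NULL.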

MPI Cartesian Topology Functions
- MPI_CART_RANK( comm, coords(), rank, ierr )
  - comm = grid_comm
  - coords() = ( coords(1), coords(2) ), e.g. (0,2) for P(0,2)
  - rank = processor rank inside grid_comm
  - returns the rank of the processor with coordinates coords()
- MPI_CART_COORDS( comm, rank, nmbr_of_dims, coords(), ierr )
  - nmbr_of_dims = 2
  - returns the coordinates of the processor in grid_comm given its rank in grid_comm
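A small C sketch of the two conversions (grid_comm as created above; the coordinates are the slide's example P(0,2)):

    int rank;
    int coords[2] = {0, 2};                        /* P(0,2) from the slide */

    MPI_Cart_rank(grid_comm, coords, &rank);       /* coordinates -> rank   */
    MPI_Cart_coords(grid_comm, rank, 2, coords);   /* rank -> coordinates   */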

MPI Cartesian Topology Functions
- MPI_CART_SUB( grid_comm, free_coords(), sub_comm, ierr )
  - grid_comm = communicator with a topology
  - free_coords() = (.false., .true.) -> (i fixed, j varies), i.e. a row communicator
  - free_coords() = (.true., .false.) -> (i varies, j fixed), i.e. a column communicator
  - sub_comm = the new sub-communicator (say row_comm or col_comm)
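In C, the row and column communicators might be formed like this (a sketch; free_coords corresponds to the remain_dims argument, and the variable names are assumptions):

    MPI_Comm row_comm, col_comm;
    int row_free[2] = {0, 1};   /* i fixed, j varies -> row communicator    */
    int col_free[2] = {1, 0};   /* i varies, j fixed -> column communicator */

    MPI_Cart_sub(grid_comm, row_free, &row_comm);
    MPI_Cart_sub(grid_comm, col_free, &col_comm);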

MPI Cartesian Topology Functions
- MPI_CART_SHIFT( grid_comm, direction, displ, rank_recv_from, rank_send_to, ierr )
  - grid_comm = communicator with a topology
  - direction = 0: i varies, column shift; N or S
  - direction = 1: j varies, row shift; E or W
  - displ = how many processors to shift over (+ or -)
  - e.g. N shift: direction = 0, displ = -1
  - S shift: direction = 0, displ = +1
  - E shift: direction = 1, displ = +1
  - W shift: direction = 1, displ = -1

MPI Cartesian Topology Functions
- MPI_CART_SHIFT( grid_comm, direction, displ, rank_recv_from, rank_send_to, ierr )
  - MPI_CART_SHIFT does not actually perform any data transfer; it returns two ranks.
  - rank_recv_from: the rank of the processor from which the calling processor will receive the new data
  - rank_send_to: the rank of the processor to which data will be sent from the calling processor
  - Note: MPI_CART_SHIFT does the modulo arithmetic if the corresponding dimension has wrap_around(*) = .true.
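For instance, the northward shift used in the solver code below could be set up in C as follows (a sketch; only the two ranks are computed, no data moves):

    int rank_recv_from, rank_send_to;

    /* direction 0 = i varies (column shift); displacement -1 = shift north */
    MPI_Cart_shift(grid_comm, 0, -1, &rank_recv_from, &rank_send_to);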

Parallel Poisson Solver: N (upward) shift in the columns

    ! N, or upward, shift: receive from P(i+1,j), send to P(i-1,j)
    direction = 0                        ! i varies
    disp      = -1                       ! i -> i-1
    top_bottom_buffer = phi_old_local(1,:)

    call MPI_CART_SHIFT( grid_comm, direction, disp,                        &
                         rank_recv_from, rank_send_to, ierr )
    call MPI_SENDRECV( top_bottom_buffer, M_local, MPI_DOUBLE_PRECISION,    &
                       rank_send_to, tag,                                   &
                       bottom_ghost_cells, M_local, MPI_DOUBLE_PRECISION,   &
                       rank_recv_from, tag, grid_comm, status, ierr )
    phi_old_local(N_local+1,:) = bottom_ghost_cells

Parallel Poisson Solver: main computation

    do j = 1, M_local
      do i = 1, N_local
        phi(i,j) = rho(i,j) +                      &
            .25 * ( phi_old( i+1, j )              &   ! phi(i+1,j)
                  + phi_old( i-1, j )              &   ! phi(i-1,j)
                  + phi_old( i, j+1 )              &   ! phi(i,j+1)
                  + phi_old( i, j-1 ) )                ! phi(i,j-1)
      enddo
    enddo

Note: all indices are now within range because of the ghost cells.

Parallel Poisson Solver: Global vs. Local Indices

    i_offset = 0
    do i = 1, coord(1)
      i_offset = i_offset + nmbr_local_rows(i)
    enddo

    j_offset = 0
    do j = 1, coord(2)
      j_offset = j_offset + nmbr_local_cols(j)
    enddo

    do j = j_offset+1, j_offset + M_local          ! global indices
      y = (real(j)-.5)/M*Ly - Ly/2
      do i = i_offset+1, i_offset + N_local        ! global indices
        x = (real(i)-.5)/N*Lx
        makerho_local(i-i_offset, j-j_offset) = f(x,y)   ! store with local indices
      enddo
    enddo

Parallel Poisson Solver in MPI: processor grid, e.g. 3 x 5 with N = M = 64

(Figure: the same 3 x 5 grid, now annotated with local indices. Globally the rows split as 1-22, 23-43, 44-64 and the columns as 1-13, 14-26, 27-39, 40-52, 53-64; locally each block is addressed as 1..N_local by 1..M_local, e.g. 1..21 by 1..13 for P(1,2).)

MPI Reduction and Communication Functions
- Point-to-point communications in the N, S, E, W shifts:
  - MPI_SENDRECV( sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status, ierr )
- Reduction operations in the computation:
  - MPI_ALLREDUCE( sendbuf, recvbuf, count, datatype, operation, comm, ierr )
  - operation = MPI_SUM, MPI_MAX, MPI_MINLOC, ...
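As an illustration, the global convergence test could be assembled from the local sums with MPI_Allreduce; this C sketch assumes local accumulators diff_local and rho_local and a tolerance eps, none of which appear by these names in the original code:

    double local[2]  = { diff_local, rho_local };  /* local sums of |phi_new - phi_old| and |rho| */
    double global[2];

    MPI_Allreduce(local, global, 2, MPI_DOUBLE, MPI_SUM, grid_comm);
    int converged = (global[0] / global[1] < eps); /* same result on every rank */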

I/O of final results. Step 1: within each row_comm, Gatherv the column blocks into matrices of size (# of local rows) x M.

(Figure: the 3 x 5 grid; the blocks of each processor row are gathered along that row.)

I/O of final results. Step 2: transpose the matrix; in row_comm, Gatherv the columns into an M x N matrix. The result is the transpose of the matrix for φ.
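A C sketch of the kind of MPI_Gatherv call used in these steps (the counts, displacements, and buffer names are illustrative assumptions, not the workshop's actual code):

    /* Gather each rank's N_local x M_local block onto rank 0 of row_comm.     */
    /* recvcounts[k] = N_local * M_local of row-rank k; displs[] are the       */
    /* running offsets (in doubles) into the receive buffer.                   */
    void gather_row_blocks(double *local_block, int N_local, int M_local,
                           double *row_block, const int *recvcounts,
                           const int *displs, MPI_Comm row_comm)
    {
        MPI_Gatherv(local_block, N_local * M_local, MPI_DOUBLE,
                    row_block, recvcounts, displs, MPI_DOUBLE,
                    0, row_comm);
    }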

MPI Workshop - III (Research Staff): Passing Structures in MPI, Week 3 of 3

Topic
- Passing Data Structures in MPI: C version
  - MPI_Type_struct
  - also see the more general facilities MPI_PACK and MPI_UNPACK (a brief sketch follows below)
  - see the code MPI_Type_struct_annotated.c in /nfs/workshops/mpi/advanced_mpi_examples
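For reference, a minimal sketch of the MPI_Pack / MPI_Unpack alternative mentioned above (buffer size and variable names are assumptions; the workshop example itself uses MPI_Type_struct):

    /* Pack an int and a double into a byte buffer, ship it as MPI_PACKED, then unpack. */
    char   buffer[64];
    int    bufsize = 64, position = 0;
    int    ival = 3;
    double dval = 2.5;

    MPI_Pack(&ival, 1, MPI_INT,    buffer, bufsize, &position, MPI_COMM_WORLD);
    MPI_Pack(&dval, 1, MPI_DOUBLE, buffer, bufsize, &position, MPI_COMM_WORLD);
    /* send `position` bytes of `buffer` with datatype MPI_PACKED ...                   */

    position = 0;   /* on the receiver, after MPI_Recv into buffer:                     */
    MPI_Unpack(buffer, bufsize, &position, &ival, 1, MPI_INT,    MPI_COMM_WORLD);
    MPI_Unpack(buffer, bufsize, &position, &dval, 1, MPI_DOUBLE, MPI_COMM_WORLD);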

A structure in C

    #define IDIM  2
    #define FDIM  3
    #define CHDIM 10

    struct buf2 {
        int   Int[IDIM];
        float Float[FDIM];
        char  Char[CHDIM];
        float x, y, z;
        int   i, j, k, l, m, n;
    } sendbuf, recvbuf;

Create a structure datatype in MPI

    /* struct buf2 { int Int[IDIM]; float Float[FDIM]; char Char[CHDIM];
       float x, y, z; int i, j, k, l, m, n; } sendbuf, recvbuf;   -- as above */

    #define NBLOCKS 5

    MPI_Datatype type[NBLOCKS] = { MPI_INT, MPI_FLOAT, MPI_CHAR,
                                   MPI_FLOAT, MPI_INT };
    int blocklen[NBLOCKS]      = { IDIM, FDIM, CHDIM, 3, 6 };

Create a structure datatype in MPI (cont.)

    MPI_Aint disp[NBLOCKS];        /* byte address of the start of each block */

    MPI_Address(&sendbuf.Int[0],   &disp[0]);
    MPI_Address(&sendbuf.Float[0], &disp[1]);
    MPI_Address(&sendbuf.Char[0],  &disp[2]);
    MPI_Address(&sendbuf.x,        &disp[3]);
    MPI_Address(&sendbuf.i,        &disp[4]);

Create a structure datatype in MPI (cont.)

    MPI_Aint     base;
    MPI_Datatype NewType;          /* the new MPI datatype we're creating */
    MPI_Status   status;
    int          i;

    base = disp[0];                /* make displacements relative to the start of the struct */
    for (i = 0; i < NBLOCKS; i++)
        disp[i] -= base;

    /* MPI calls to create the new structure datatype.  Since this is the 90's,
       we need more than to just want this new datatype, we must commit to it. */
    MPI_Type_struct(NBLOCKS, blocklen, disp, type, &NewType);
    MPI_Type_commit(&NewType);
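MPI_Address and MPI_Type_struct were deprecated in MPI-2 and removed in MPI-3; a minimal sketch of the same construction with the current calls (reusing NBLOCKS, blocklen, type, and sendbuf from above) looks like:

    MPI_Aint     disp[NBLOCKS], base;
    MPI_Datatype NewType;

    MPI_Get_address(&sendbuf.Int[0],   &disp[0]);
    MPI_Get_address(&sendbuf.Float[0], &disp[1]);
    MPI_Get_address(&sendbuf.Char[0],  &disp[2]);
    MPI_Get_address(&sendbuf.x,        &disp[3]);
    MPI_Get_address(&sendbuf.i,        &disp[4]);

    base = disp[0];
    for (int i = 0; i < NBLOCKS; i++)
        disp[i] -= base;                       /* relative displacements */

    MPI_Type_create_struct(NBLOCKS, blocklen, disp, type, &NewType);
    MPI_Type_commit(&NewType);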

Use of the new structure datatype in MPI

    /* Right shift:  rank-1 mod(np)  ->  rank  ->  rank+1 mod(np) */
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sendto   = (rank + 1) % np;
    recvfrom = (rank - 1 + np) % np;

    MPI_Sendrecv(&sendbuf, 1, NewType, sendto,   tag,
                 &recvbuf, 1, NewType, recvfrom, tag,
                 MPI_COMM_WORLD, &status);

References: Some useful books
- MPI: The Complete Reference. Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. MIT Press. (/nfs/workshops/mpi/docs/mpi_complete_reference.ps.z)
- Parallel Programming with MPI. Peter S. Pacheco. Morgan Kaufmann Publishers, Inc.
- Using MPI: Portable Parallel Programming with the Message-Passing Interface. William Gropp, E. Lusk, and A. Skjellum. MIT Press.