Introduction to TDDC78 Lab Series. Lu Li, Linköping University. Parts of slides developed by Usman Dastgeer.


Goals
- Shared- and distributed-memory systems
- Programming parallelism (typical problems)
- Approach and solve:
  - Partitioning: domain decomposition, functional decomposition
  - Communication
  - Agglomeration
  - Mapping

TDDC78 Labs: Memory-based Taxonomy

  Memory:  Distributed  |  Shared                  |  Distributed
  Labs:    1            |  2 & 3                   |  5
  Use:     MPI          |  POSIX threads & OpenMP  |  MPI

Use LAB 4 (tools) at every stage; it may save you time for LAB 5.

Information sources
- Compendium: your primary source of information
  http://www.ida.liu.se/~tddc78/labs/
  Comprehensive: environment description, lab specification, step-by-step instructions
- Others:
  Triolith: http://www.nsc.liu.se/systems/triolith/
  MPI: http://www.mpi-forum.org/docs/


Learn about MPI
- LAB 1: define MPI types, Send / Receive, Broadcast, Scatter / Gather
- LAB 5: use virtual topologies, MPI_Issend / MPI_Probe / MPI_Reduce, sending larger pieces of data
- Synchronize / MPI_Barrier

Lab 1 (TDDC78): Image Filters with MPI
- Blur & Threshold filters; see the compendium for details
- Your goal is to understand how to:
  - Define types
  - Send / Receive
  - Broadcast
  - Scatter / Gather
  - Decompose domains
  - Apply the filter in parallel
- For syntax and examples, refer to the MPI lecture slides

MPI Types Example

typedef struct {
    int    id;
    double data[10];
} buf_t;                                   // Composite type

buf_t        item;                         // Element of the type
MPI_Datatype buf_t_mpi;                    // MPI type to commit
int          block_lengths[] = { 1, 10 };                // Lengths of type elements
MPI_Datatype block_types[]   = { MPI_INT, MPI_DOUBLE };  // Types of the elements
MPI_Aint     start, displ[2];

MPI_Address( &item,         &start );
MPI_Address( &item.id,      &displ[0] );
MPI_Address( &item.data[0], &displ[1] );
displ[0] -= start;                         // Displacement relative to address of start
displ[1] -= start;                         // Displacement relative to address of start

MPI_Type_struct( 2, block_lengths, displ, block_types, &buf_t_mpi );
MPI_Type_commit( &buf_t_mpi );
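Note that MPI_Address and MPI_Type_struct are deprecated (removed in MPI-3); the same datatype can be built with MPI_Get_address and MPI_Type_create_struct. A minimal sketch of the modern equivalent, reusing the declarations from the example above:

// Sketch: MPI-3 replacement for the deprecated address/struct calls above
MPI_Get_address( &item,         &start );
MPI_Get_address( &item.id,      &displ[0] );
MPI_Get_address( &item.data[0], &displ[1] );
displ[0] -= start;                         // Displacements relative to the struct start
displ[1] -= start;

MPI_Type_create_struct( 2, block_lengths, displ, block_types, &buf_t_mpi );
MPI_Type_commit( &buf_t_mpi );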

Send-Receive

...
int s_data, r_data;
...
MPI_Request request;
MPI_Isend( &s_data, 1, MPI_INT,            // count = 1 element of type MPI_INT
           (my_id == 0) ? 1 : 0, 0, MPI_COMM_WORLD, &request );

MPI_Status status;
MPI_Recv( &r_data, 1, MPI_INT,
          (my_id == 0) ? 1 : 0, 0, MPI_COMM_WORLD, &status );
MPI_Wait( &request, &status );
...

Execution (diagram on the original slide): P0 and P1 each post SendTo(the other process) first, then RecvFrom(the other process); posting the send non-blocking lets both processes reach their receives without deadlocking.
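For this symmetric exchange pattern, MPI also provides MPI_Sendrecv, which performs the send and the receive in one call and cannot deadlock. A minimal sketch, assuming the same my_id, s_data and r_data as above:

// Sketch: the same pairwise exchange with MPI_Sendrecv (no explicit request/wait)
int partner = (my_id == 0) ? 1 : 0;
MPI_Status status;
MPI_Sendrecv( &s_data, 1, MPI_INT, partner, 0,     // send one int to partner
              &r_data, 1, MPI_INT, partner, 0,     // receive one int from partner
              MPI_COMM_WORLD, &status );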

Send-Receive Modes (1)

SEND              Standard    Synchronous   Buffered     Ready
  Blocking        MPI_Send    MPI_Ssend     MPI_Bsend    MPI_Rsend
  Non-blocking    MPI_Isend   MPI_Issend    MPI_Ibsend   MPI_Irsend

RECEIVE
  Blocking        MPI_Recv
  Non-blocking    MPI_Irecv
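Lab 5 combines a non-blocking synchronous send with probing on the receiver side, so the receiver can learn how many items arrive before posting the receive. A minimal sketch, where particle_mpi (a committed datatype), nbr (neighbour rank), outbuf/inbuf and n_out are hypothetical names, not from the lab code:

// Sender side: non-blocking synchronous send of n_out elements
MPI_Request req;
MPI_Issend( outbuf, n_out, particle_mpi, nbr, 0, MPI_COMM_WORLD, &req );

// Receiver side: probe first to learn the incoming count, then receive
MPI_Status status;
int n_in;
MPI_Probe( nbr, 0, MPI_COMM_WORLD, &status );
MPI_Get_count( &status, particle_mpi, &n_in );    // number of elements in the pending message
MPI_Recv( inbuf, n_in, particle_mpi, nbr, 0, MPI_COMM_WORLD, &status );  // inbuf must have room for n_in elements

// Sender side: complete the non-blocking send
MPI_Wait( &req, MPI_STATUS_IGNORE );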

Lab 5: Particles
- Moving particles
- Validate the pressure law: pV = nRT
- Dynamic interaction patterns: the number of particles that fly across borders is not static
- You need advanced domain decomposition; motivate your choice!

Process Topologies (0)
- By default, processors are arranged into 1-dimensional arrays
- Processor ranks are computed accordingly
- What if processors need to communicate in 2 dimensions or more?
- Use virtual topologies to achieve a 2D instead of 1D arrangement of processors, with convenient ranking schemes

Process Topologies (1)

int dims[2];                    // 2D matrix / grid
dims[0] = 2;                    // 2 rows
dims[1] = 3;                    // 3 columns
MPI_Dims_create( nproc, 2, dims );

int periods[2];
periods[0] = 1;                 // row-periodic
periods[1] = 0;                 // column-non-periodic

int reorder = 1;                // re-ordering allowed

MPI_Comm grid_comm;
MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, reorder, &grid_comm );

Process Topologies (2)

int my_coords[2];               // Cartesian process coordinates
int my_rank;                    // Process rank
int right_nbr[2];
int right_nbr_rank;

MPI_Cart_get( grid_comm, 2, dims, periods, my_coords );
MPI_Cart_rank( grid_comm, my_coords, &my_rank );

right_nbr[0] = my_coords[0] + 1;
right_nbr[1] = my_coords[1];
MPI_Cart_rank( grid_comm, right_nbr, &right_nbr_rank );
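Neighbour ranks can also be obtained directly with MPI_Cart_shift, which handles the periodic wrap-around for you. A minimal sketch, assuming the grid_comm created above:

// Sketch: shift by +1 along dimension 0 to get the ranks of both neighbours.
// In a non-periodic dimension, a missing neighbour is returned as MPI_PROC_NULL.
int src_rank, dst_rank;
MPI_Cart_shift( grid_comm, 0 /* dimension */, 1 /* displacement */,
                &src_rank, &dst_rank );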

Collective Communication (CC)

...
// One processor
for (int j = 1; j < nproc; j++) {
    MPI_Send( &message, sizeof(message_t), ... );
}
...
// All the others
MPI_Recv( &message, sizeof(message_t), ... );
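The loop above is exactly what a collective replaces: a single MPI_Bcast distributes the data to every rank in the communicator. A minimal sketch, assuming (as the elided arguments above suggest) the message is shipped as raw bytes from rank 0:

// Sketch: broadcast `message` from rank 0 to all ranks in MPI_COMM_WORLD.
// Every rank, including the root, makes the same call.
MPI_Bcast( &message, sizeof(message_t), MPI_BYTE, 0, MPI_COMM_WORLD );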

CC: Scatter / Gather
Distributing (unevenly sized) chunks of data:

sendbuf = (int *) malloc( nproc * stride * sizeof(int) );
displs  = (int *) malloc( nproc * sizeof(int) );
scounts = (int *) malloc( nproc * sizeof(int) );
for (i = 0; i < nproc; ++i) {
    displs[i]  = ...
    scounts[i] = ...
}
MPI_Scatterv( sendbuf, scounts, displs, MPI_INT,
              rbuf, 100, MPI_INT, root, comm );
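The gather direction is symmetric: MPI_Gatherv collects unevenly sized chunks back onto the root, with counts and displacements interpreted on the root side. A minimal sketch, where mybuf, mycount, recvbuf and rcounts are hypothetical names, not from the slide:

// Sketch: each rank contributes `mycount` ints; the root stores rank i's chunk
// at offset displs[i] of recvbuf and expects rcounts[i] elements from it.
MPI_Gatherv( mybuf, mycount, MPI_INT,
             recvbuf, rcounts, displs, MPI_INT, root, comm );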

Summary: Learning goals
- Point-to-point communication
- Probing / non-blocking send (choose)
- Barriers & Wait = synchronization
- Derived data types
- Collective communications
- Virtual topologies
- Send/Receive modes: use with care to keep your code portable, e.g. MPI_Bsend ("It works there but not here!")

MPI labs at home? No problem: www.open-mpi.org. Simple to install, simple to use.