Message-Passing Thought Exercise

1 Message-Passing Thought Exercise: Traffic Modelling
David Henty, EPCC

2 traffic flow
- we want to predict traffic flow, to look for effects such as congestion
- build a computer model

3 simple traffic model
- divide the road into a series of cells, each either occupied or unoccupied
- perform a number of steps
- in each step, cars move forward if the space ahead is empty
- could do this by moving pawns on a chess board

4 traffic behaviour
- the model predicts a number of interesting features: traffic lights, congestion
- more complicated models are used in practice
[figure: average speed (0.0 to 1.0) plotted against density of cars (0% to 100%)]

5 how fast can we run the model?
- measure speed in Car Operations Per Second: how many COPs?
- around 2 COPs
- but what about three people? can they do six COPs?

6 a parallel traffic model
[figure: the road divided into sections worked on in parallel, labelled A, S, B, C]

7 State Table

If R_t(i) = 0, then R_{t+1}(i) is given by:

                   R_t(i+1) = 0    R_t(i+1) = 1
    R_t(i-1) = 0        0               0
    R_t(i-1) = 1        1               1

If R_t(i) = 1, then R_{t+1}(i) is given by:

                   R_t(i+1) = 0    R_t(i+1) = 1
    R_t(i-1) = 0        0               1
    R_t(i-1) = 1        0               1

(an empty cell is filled when the cell behind holds a car; an occupied cell empties when the cell ahead is free)

8 Pseudo Code (serial)

  declare arrays old(i) and new(i), i = 0,1,...,N,N+1
  initialise old(i) for i = 1,2,...,N-1,N (e.g. randomly)
  loop over iterations
    set old(0) = old(N) and set old(N+1) = old(1)
    loop over i = 1,...,N
      if old(i) = 1
        if old(i+1) = 1 then new(i) = 1 else new(i) = 0
      if old(i) = 0
        if old(i-1) = 1 then new(i) = 1 else new(i) = 0
    end loop over i
    set old(i) = new(i) for i = 1,2,...,N-1,N
  end loop over iterations
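The serial pseudocode translates almost line for line into C. A minimal runnable sketch, assuming a road of 100 cells, 20 steps and a random initial density of about 50% (these numbers, and the moved-car counter, are illustrative choices rather than anything from the slides):

  #include <stdio.h>
  #include <stdlib.h>

  #define N     100   /* number of road cells (illustrative)   */
  #define NITER 20    /* number of update steps (illustrative) */

  int main(void)
  {
      int old[N+2], new_[N+2];   /* cells 1..N plus boundary cells 0 and N+1 */

      /* initialise the road randomly: roughly half the cells occupied */
      for (int i = 1; i <= N; i++) old[i] = rand() % 2;

      for (int iter = 1; iter <= NITER; iter++) {
          /* periodic boundary conditions */
          old[0]   = old[N];
          old[N+1] = old[1];

          int nmove = 0;
          for (int i = 1; i <= N; i++) {
              if (old[i] == 1) {              /* car moves forward if space ahead */
                  new_[i] = (old[i+1] == 1);
                  if (old[i+1] == 0) nmove++;
              } else {                        /* empty cell filled from behind    */
                  new_[i] = (old[i-1] == 1);
              }
          }

          /* prepare for the next iteration */
          for (int i = 1; i <= N; i++) old[i] = new_[i];

          printf("step %2d: %d cars moved\n", iter, nmove);
      }
      return 0;
  }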

9 Pseudo Code (serial with subroutines)

  declare arrays old(i) and new(i), i = 0,1,...,N,N+1
  initialise old(i) for i = 1,2,...,N-1,N (e.g. randomly)
  loop over iterations
    ! Implement boundary conditions
    set old(0) = old(N) and set old(N+1) = old(1)
    ! Update road
    call newroad(new, old, N)
    ! Prepare for next iteration
    set old(i) = new(i) for i = 1,2,...,N-1,N
  end loop over iterations
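In C the update routine might look as follows; the signature, and returning a count of moved cars, are assumptions made for illustration, since the slides only name the routine:

  /* Update cells 1..n of the road from old into new_, returning the
     number of cars that moved this step (a sketch of newroad). */
  int newroad(int new_[], const int old[], int n)
  {
      int nmove = 0;
      for (int i = 1; i <= n; i++) {
          if (old[i] == 1) {                  /* move only if space ahead */
              new_[i] = (old[i+1] == 1);
              if (old[i+1] == 0) nmove++;
          } else {                            /* car arrives from behind  */
              new_[i] = (old[i-1] == 1);
          }
      }
      return nmove;
  }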

10 Pseudo Code (distributed memory)

  ! assume we are running on P processes
  declare arrays old(i) and new(i), i = 0,1,...,N/P,N/P+1
  initialise old(i) for i = 1,2,...,N/P-1,N/P (e.g. randomly)
  loop over iterations
    ! Implement boundary conditions (processes arranged as a ring)
    set old(0) on this process to old(N/P) from previous process
    set old(N/P+1) on this process to old(1) from next process
    ! Update road
    call newroad(new, old, N/P)
    ! Prepare for next iteration
    set old(i) = new(i) for i = 1,2,...,N/P-1,N/P
  end loop over iterations
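The ring of processes is easy to set up from the MPI rank. A sketch (the function and variable names are assumptions, reused by the later examples, not names from the slides):

  #include <mpi.h>

  /* Compute the ring neighbours for the 1-D domain decomposition. */
  void ring_neighbours(int *prev, int *next)
  {
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      *next = (rank + 1) % size;          /* holds the cells ahead of mine  */
      *prev = (rank - 1 + size) % size;   /* holds the cells behind mine    */
  }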

11 Halo swapping

  ! Implement boundary conditions
  set old(0) on this process to old(N/P) from previous process
  set old(N/P+1) on this process to old(1) from next process

Implement this using blocking receives (e.g. MPI_Recv) and one of:
- synchronous send (routine blocks until the message is received), e.g. MPI_Ssend
- asynchronous send (message copied into a buffer, returns straight away), e.g. MPI_Bsend
- non-blocking synchronous send (no buffering but immediate return), e.g. MPI_Issend / MPI_Wait

12 Synchronous sends

  ! Implement boundary conditions
  Ssend(old(N/P), up)
  Recv (old(0), down)
  Ssend(old(1), down)
  Recv (old(N/P+1), up)

Guaranteed to deadlock: every process blocks in its first Ssend, which cannot complete until a matching receive is posted, and no process ever reaches its Recv.
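The same pattern in MPI C (a sketch: nlocal stands for N/P, prev/next come from ring_neighbours() above, and old[] holds cells 0..nlocal+1):

  /* every rank blocks in this synchronous send, so no rank ever
     reaches its receive: guaranteed deadlock */
  MPI_Ssend(&old[nlocal], 1, MPI_INT, next, 0, MPI_COMM_WORLD);
  MPI_Recv(&old[0], 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  MPI_Ssend(&old[1], 1, MPI_INT, prev, 1, MPI_COMM_WORLD);
  MPI_Recv(&old[nlocal+1], 1, MPI_INT, next, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

(One standard fix, not covered on these slides, is to use MPI_Sendrecv or to break the symmetry so that, say, even ranks send first while odd ranks receive first.)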

13 Asynchronous (buffered) sends

  ! Implement boundary conditions
  Bsend(old(N/P), up)
  Recv (old(0), down)
  Bsend(old(1), down)
  Recv (old(N/P+1), up)

Where do synchronisation issues become important?
- call newroad(new, old, N/P)? OK because we are writing new but only reading old
- set old(i) = new(i)? only OK because Bsend has copied old(1) and old(N/P)

We don't really care if/when the message is received; we do really care if/when we can safely reuse the local send buffers.
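With buffered sends the receives are always reached, but MPI requires the buffer space to be attached explicitly. A C sketch under the same assumptions as before:

  #include <mpi.h>
  #include <stdlib.h>

  /* Buffered-send halo swap: MPI_Bsend copies each message into the
     attached buffer and returns straight away. */
  void halo_swap_bsend(int *old, int nlocal, int prev, int next)
  {
      int bufsize = 2 * (sizeof(int) + MPI_BSEND_OVERHEAD);
      char *buffer = malloc(bufsize);
      MPI_Buffer_attach(buffer, bufsize);

      MPI_Bsend(&old[nlocal], 1, MPI_INT, next, 0, MPI_COMM_WORLD);
      MPI_Recv(&old[0], 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Bsend(&old[1], 1, MPI_INT, prev, 1, MPI_COMM_WORLD);
      MPI_Recv(&old[nlocal+1], 1, MPI_INT, next, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      /* detach blocks until all buffered messages have been delivered */
      MPI_Buffer_detach(&buffer, &bufsize);
      free(buffer);
  }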

14 Non-blocking (immediate) sends

  ! Implement boundary conditions
  Issend(old(N/P), up)
  Recv  (old(0), down)
  Issend(old(1), down)
  Recv  (old(N/P+1), up)

  call newroad(new, old, N/P)
  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P

15 Non-blocking (immediate) sends

  ! Implement boundary conditions
  Issend(old(N/P), up)
  Recv  (old(0), down)
  Issend(old(1), down)
  Recv  (old(N/P+1), up)

  call newroad(new, old, N/P)
  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P

  ! Wait for communications to complete before next iteration
  wait(up)
  wait(down)

16 Non-blocking (immediate) sends

  ! Implement boundary conditions
  Issend(old(N/P), up)
  Recv  (old(0), down)
  Issend(old(1), down)
  Recv  (old(N/P+1), up)

  call newroad(new, old, N/P)
  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P

  ! Wait for communications to complete before next iteration
  wait(up)
  wait(down)

Incorrect! Overwriting old is the key issue: we need to know that the boundary values of old have been sent before overwriting them.

17 Non-blocking sends: correct

  ! Implement boundary conditions
  Issend(old(N/P), up)
  Recv  (old(0), down)
  Issend(old(1), down)
  Recv  (old(N/P+1), up)

  call newroad(new, old, N/P)

  wait(up)
  wait(down)

  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P
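Putting the corrected ordering into MPI C: a sketch under the same assumptions, using MPI_Issend with blocking receives and waiting on both sends before old is overwritten. newroad() is the routine sketched after slide 9:

  #include <mpi.h>

  int newroad(int new_[], const int old[], int n);   /* from the earlier sketch */

  /* One complete iteration with non-blocking synchronous sends. */
  void traffic_step(int *old, int *new_, int nlocal, int prev, int next)
  {
      MPI_Request up_req, down_req;

      /* post the sends without blocking, then do the blocking receives */
      MPI_Issend(&old[nlocal], 1, MPI_INT, next, 0, MPI_COMM_WORLD, &up_req);
      MPI_Recv(&old[0], 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Issend(&old[1], 1, MPI_INT, prev, 1, MPI_COMM_WORLD, &down_req);
      MPI_Recv(&old[nlocal+1], 1, MPI_INT, next, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      newroad(new_, old, nlocal);   /* safe: writes new_, only reads old */

      /* old(1) and old(N/P) must not be overwritten until they are sent */
      MPI_Wait(&up_req, MPI_STATUS_IGNORE);
      MPI_Wait(&down_req, MPI_STATUS_IGNORE);

      for (int i = 1; i <= nlocal; i++) old[i] = new_[i];
  }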

18 Delaying the waits

  ! Implement boundary conditions
  Issend(old(N/P), up)
  Recv  (old(0), down)
  Issend(old(1), down)
  Recv  (old(N/P+1), up)

  call newroad(new, old, N/P)

  set old(i) = new(i) for i = 2,3,...,N/P-1
  wait(up)
  old(N/P) = new(N/P)
  wait(down)
  old(1) = new(1)
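In the C sketch above, only the tail of traffic_step() changes: the interior copy needs no waiting, and each boundary cell is copied as soon as its own send has completed:

  /* interior cells are not involved in any send */
  for (int i = 2; i <= nlocal - 1; i++) old[i] = new_[i];

  MPI_Wait(&up_req, MPI_STATUS_IGNORE);     /* old[nlocal] now safely sent */
  old[nlocal] = new_[nlocal];
  MPI_Wait(&down_req, MPI_STATUS_IGNORE);   /* old[1] now safely sent */
  old[1] = new_[1];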

19 RMA synchronisation
Similar synchronisation issues to non-blocking message passing, but worse!

20 Remote Memory Access
Imagine we can do halo swaps directly with remote reads or writes:
- where do synchronisation issues become important?
- what assumptions are you making about remote reads and writes?

Consider remote reads first:

  old(0) = old(N/P) from previous process
  old(N/P+1) = old(1) from next process
  call newroad(new, old, N/P)
  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P

21 Remote Memory Access
Imagine we can do halo swaps directly with remote reads or writes:
- where do synchronisation issues become important?
- what assumptions are you making about remote reads and writes?

Consider remote reads first (assuming reads are blocking, like Recv):

  old(0) = old(N/P) from previous process
  old(N/P+1) = old(1) from next process
  call newroad(new, old, N/P)
  ! synchronise to ensure my old values have all been read
  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P
  ! synchronise to ensure neighbours' old values have been
  ! updated before I read them on the next iteration

22 Remote Memory Access
Imagine we can do halo swaps directly with remote reads or writes:
- where do synchronisation issues become important?
- what assumptions are you making about remote reads and writes?

Consider remote writes:

  set old(0) on next process = old(N/P)
  set old(N/P+1) on previous process = old(1)
  call newroad(new, old, N/P)
  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P

23 Remote Memory Access
Imagine we can do halo swaps directly with remote reads or writes:
- where do synchronisation issues become important?
- what assumptions are you making about remote reads and writes?

Consider remote writes:

  set old(0) on next process = old(N/P)
  set old(N/P+1) on previous process = old(1)
  ! synchronise to ensure my halos on old have been updated
  call newroad(new, old, N/P)
  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P

24 Remote Memory Access
Imagine we can do halo swaps directly with remote reads or writes:
- where do synchronisation issues become important?
- what assumptions are you making about remote reads and writes?

Consider remote writes (assuming writes behave like a Bsend):

  set old(0) on next process = old(N/P)
  set old(N/P+1) on previous process = old(1)
  ! synchronise to ensure my halos on old have been updated
  call newroad(new, old, N/P)
  set old(i) = new(i) for i = 1,2,...,N/P-1,N/P
  ! synchronise to ensure my neighbours have finished with their
  ! old arrays (in newroad) before overwriting them
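MPI's one-sided interface is one way to realise the remote-write version. A sketch, assuming old[] (cells 0..nlocal+1) has been exposed in the window win with displacement unit sizeof(int); here a pair of MPI_Win_fence calls provides both synchronisations on the slide, with the opening fence of one iteration doubling as the closing synchronisation of the previous one:

  #include <mpi.h>

  int newroad(int new_[], const int old[], int n);   /* from the earlier sketch */

  /* One iteration of the remote-write scheme using MPI_Put. */
  void traffic_step_rma(int *old, int *new_, int nlocal, int prev, int next,
                        MPI_Win win)
  {
      /* opening fence: neighbours have finished with their old arrays
         (in newroad), so it is now safe to write into their halos */
      MPI_Win_fence(0, win);
      MPI_Put(&old[nlocal], 1, MPI_INT, next, 0, 1, MPI_INT, win);
      MPI_Put(&old[1], 1, MPI_INT, prev, nlocal + 1, 1, MPI_INT, win);
      /* closing fence: my halos on old have now been updated */
      MPI_Win_fence(0, win);

      newroad(new_, old, nlocal);
      for (int i = 1; i <= nlocal; i++) old[i] = new_[i];
  }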

25 Summary
- synchronisation in PGAS approaches is not simple: it is easy to write programs with subtle synchronisation errors
- think first, code later!
