Advanced MPI Programming


1. Advanced MPI Programming

Latest slides and code examples are available at

Pavan Balaji, Argonne National Laboratory, Web:
James Dinan, Intel Corp.
Torsten Hoefler, ETH Zurich, Web: http://htor.inf.ethz.ch/
Rajeev Thakur, Argonne National Laboratory, Web:

2. Hybrid Programming with Threads, Shared Memory, and GPUs

3. MPI and Threads

- MPI describes parallelism between processes (with separate address spaces)
- Thread parallelism provides a shared-memory model within a process
- OpenMP and Pthreads are common models
  - OpenMP provides convenient features for loop-level parallelism. Threads are created and managed by the compiler, based on user directives.
  - Pthreads provide more complex and dynamic approaches. Threads are created and managed explicitly by the user.

4. Programming for Multicore

- Almost all chips are multicore these days
- Today's clusters often comprise multiple CPUs per node sharing memory, and the nodes themselves are connected by a network
- Common options for programming such clusters:
  - All MPI: MPI between processes both within a node and across nodes; MPI internally uses shared memory to communicate within a node
  - MPI + OpenMP: use OpenMP within a node and MPI across nodes
  - MPI + Pthreads: use Pthreads within a node and MPI across nodes
- The latter two approaches are known as "hybrid programming"

5. Hybrid Programming with MPI+Threads

(Diagram: MPI-only programming vs. MPI+threads hybrid programming, each showing Rank 0 and Rank 1.)

- In MPI-only programming, each MPI process has a single program counter
- In MPI+threads hybrid programming, there can be multiple threads executing simultaneously
  - All threads share all MPI objects (communicators, requests)
  - The MPI implementation might need to take precautions to make sure the state of the MPI stack is consistent

6. MPI's Four Levels of Thread Safety

- MPI defines four levels of thread safety -- these are commitments the application makes to MPI:
  - MPI_THREAD_SINGLE: only one thread exists in the application
  - MPI_THREAD_FUNNELED: multithreaded, but only the main thread makes MPI calls (the one that called MPI_Init_thread)
  - MPI_THREAD_SERIALIZED: multithreaded, but only one thread at a time makes MPI calls
  - MPI_THREAD_MULTIPLE: multithreaded, and any thread can make MPI calls at any time (with some restrictions to avoid races -- see next slide)
- Thread levels are in increasing order: if an application works in FUNNELED mode, it can also work in SERIALIZED
- MPI defines an alternative to MPI_Init: MPI_Init_thread(requested, provided)
  - The application gives the level it needs; the MPI implementation returns the level it supports (a minimal sketch of this handshake follows)
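The slides do not spell the call out; as a minimal sketch (the error handling is our addition, not from the slides), a program that needs full thread support could request it and verify what the implementation returned:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* request the level we need; MPI returns the level it supports */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not supported (got %d)\n",
                    provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* ... multithreaded MPI code ... */
        MPI_Finalize();
        return 0;
    }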

7. MPI_THREAD_SINGLE

- There are no threads in the system
  - E.g., there are no OpenMP parallel regions

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int i, rank, buf[100];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < 100; i++)
            compute(buf[i]);

        /* Do MPI stuff */

        MPI_Finalize();
        return 0;
    }

8. MPI_THREAD_FUNNELED

- All MPI calls are made by the master thread
  - Outside the OpenMP parallel regions
  - In OpenMP master regions

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int i, rank, buf[100], provided;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel for
        for (i = 0; i < 100; i++)
            compute(buf[i]);

        /* Do MPI stuff */

        MPI_Finalize();
        return 0;
    }

9. MPI_THREAD_SERIALIZED

- Only one thread can make MPI calls at a time
  - Protected here by OpenMP critical regions

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int i, rank, buf[100], provided;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel for
        for (i = 0; i < 100; i++) {
            compute(buf[i]);
    #pragma omp critical
            {
                /* Do MPI stuff */
            }
        }

        MPI_Finalize();
        return 0;
    }

10. MPI_THREAD_MULTIPLE

- Any thread can make MPI calls at any time (restrictions apply)

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int i, rank, buf[100], provided;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel for
        for (i = 0; i < 100; i++) {
            compute(buf[i]);
            /* Do MPI stuff */
        }

        MPI_Finalize();
        return 0;
    }

11. Threads and MPI

- An implementation is not required to support levels higher than MPI_THREAD_SINGLE; that is, an implementation is not required to be thread safe
- A fully thread-safe implementation will support MPI_THREAD_MULTIPLE
- A program that calls MPI_Init (instead of MPI_Init_thread) should assume that only MPI_THREAD_SINGLE is supported
- A threaded MPI program that does not call MPI_Init_thread is an incorrect program (a common user error we see)

12. Specification of MPI_THREAD_MULTIPLE

- Ordering: when multiple threads make MPI calls concurrently, the outcome will be as if the calls executed sequentially in some (any) order
  - Ordering is maintained within each thread
  - The user must ensure that collective operations on the same communicator, window, or file handle are correctly ordered among threads
    - E.g., you cannot call a broadcast on one thread and a reduce on another thread on the same communicator
  - It is the user's responsibility to prevent races when threads in the same application post conflicting MPI calls
    - E.g., accessing an info object from one thread and freeing it from another thread
- Blocking: blocking MPI calls will block only the calling thread and will not prevent other threads from running or executing MPI functions

13. Ordering in MPI_THREAD_MULTIPLE: Incorrect Example with Collectives

                Process 0            Process 1
    Thread 1    MPI_Bcast(comm)      MPI_Bcast(comm)
    Thread 2    MPI_Barrier(comm)    MPI_Barrier(comm)

- P0 and P1 can have different orderings of Bcast and Barrier
- Here the user must use some kind of synchronization to ensure that either thread 1 or thread 2 gets scheduled first on both processes
- Otherwise a broadcast may get matched with a barrier on the same communicator, which is not allowed in MPI (one possible fix is sketched below)
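One way to make this pattern correct -- our sketch, not from the slides -- is to give each thread its own duplicate of the communicator, so collectives issued by different threads can never be matched against each other:

    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int provided, x = 0;
        MPI_Comm comm_t1, comm_t2;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        /* one duplicate per thread: collectives on comm_t1 can never be
         * matched against collectives on comm_t2 */
        MPI_Comm_dup(MPI_COMM_WORLD, &comm_t1);
        MPI_Comm_dup(MPI_COMM_WORLD, &comm_t2);

    #pragma omp parallel num_threads(2)
        {
            if (omp_get_thread_num() == 0)
                MPI_Bcast(&x, 1, MPI_INT, 0, comm_t1);   /* thread 1 */
            else
                MPI_Barrier(comm_t2);                    /* thread 2 */
        }

        MPI_Comm_free(&comm_t1);
        MPI_Comm_free(&comm_t2);
        MPI_Finalize();
        return 0;
    }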

14. Ordering in MPI_THREAD_MULTIPLE: Incorrect Example with RMA

    int main(int argc, char **argv)
    {
        /* Initialize MPI and RMA window */

    #pragma omp parallel for
        for (i = 0; i < 100; i++) {
            target = rand();
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
            MPI_Put(..., win);
            MPI_Win_unlock(target, win);
        }

        /* Free MPI and RMA window */
        return 0;
    }

- Different threads can lock the same process, causing multiple locks to the same target before the first lock is unlocked
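A common way to avoid this -- our sketch, not from the slides, and it assumes MPI_THREAD_MULTIPLE support -- is to open a single shared lock-all epoch up front, so threads never issue conflicting MPI_Win_lock calls, and to complete each put with a flush:

    #include <mpi.h>
    #include <stdlib.h>

    void do_puts(MPI_Win win, double *val, int nprocs)
    {
        MPI_Win_lock_all(0, win);   /* one epoch covering all targets */

    #pragma omp parallel for
        for (int i = 0; i < 100; i++) {
            int target = rand() % nprocs;  /* random target, as in the slide */
            MPI_Put(val, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
            MPI_Win_flush(target, win);    /* complete this thread's put */
        }

        MPI_Win_unlock_all(win);
    }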

15. Ordering in MPI_THREAD_MULTIPLE: Incorrect Example with Object Management

                Process 0              Process 1
    Thread 1    MPI_Bcast(comm)        MPI_Bcast(comm)
    Thread 2    MPI_Comm_free(comm)    MPI_Comm_free(comm)

- The user has to make sure that one thread is not using an object while another thread is freeing it
- This is essentially an ordering issue; the object might get freed before it is used (a sketch of one fix follows)
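A minimal sketch of one fix (ours, not from the slides): order the free after the use with a thread barrier, so the communicator cannot be freed while another thread is still using it:

    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int provided, x = 0;
        MPI_Comm comm;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_dup(MPI_COMM_WORLD, &comm);

    #pragma omp parallel num_threads(2)
        {
            if (omp_get_thread_num() == 0)
                MPI_Bcast(&x, 1, MPI_INT, 0, comm);  /* thread 1 uses comm */

    #pragma omp barrier   /* ensure the Bcast has completed on this process */

            if (omp_get_thread_num() == 1)
                MPI_Comm_free(&comm);                /* thread 2 frees comm */
        }

        MPI_Finalize();
        return 0;
    }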

16. Blocking Calls in MPI_THREAD_MULTIPLE: Correct Example

                Process 0           Process 1
    Thread 1    MPI_Recv(src=1)     MPI_Recv(src=0)
    Thread 2    MPI_Send(dst=1)     MPI_Send(dst=0)

- An implementation must ensure that the above example never deadlocks for any ordering of thread execution
- That means the implementation cannot simply acquire a thread lock and block within an MPI function; it must release the lock to allow other threads to make progress
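A runnable version of this example might look like the following sketch (our filling-in; it assumes exactly two ranks and MPI_THREAD_MULTIPLE support). Thread 1 blocks in MPI_Recv while thread 2 sends; the implementation must let the send proceed even though another thread is blocked inside MPI:

    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int provided, rank, peer, sbuf = 0, rbuf = -1;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;   /* assumes exactly 2 ranks */

    #pragma omp parallel num_threads(2)
        {
            if (omp_get_thread_num() == 0)   /* thread 1: receive */
                MPI_Recv(&rbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            else                             /* thread 2: send */
                MPI_Send(&sbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }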

17. Performance with MPI_THREAD_MULTIPLE

- Thread safety does not come for free
- The implementation must protect certain data structures or parts of code with mutexes or critical sections
- To measure the performance impact, we ran tests to measure communication performance when using multiple threads versus multiple processes
- For results, see the Thakur/Gropp paper: "Test Suite for Evaluating Performance of Multithreaded MPI Communication," Parallel Computing, 2009

18. Why is it hard to optimize MPI_THREAD_MULTIPLE?

- MPI internally maintains several resources
- Because of MPI semantics, all threads must have access to some of the data structures
  - E.g., thread 1 can post an Irecv, and thread 2 can wait for its completion -- thus the request queue has to be shared between both threads (a sketch of this case follows)
  - Since multiple threads are accessing this shared queue, it needs to be locked -- adds a lot of overhead
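For instance (our sketch, assuming MPI_THREAD_MULTIPLE), it is legal for one thread to post a receive and a different thread to wait on it, which is exactly why the request state must be visible to all threads:

    #include <mpi.h>
    #include <omp.h>

    void recv_on_two_threads(int src, MPI_Comm comm)
    {
        int buf;
        MPI_Request req;   /* shared by both threads */

    #pragma omp parallel num_threads(2)
        {
            if (omp_get_thread_num() == 0)
                MPI_Irecv(&buf, 1, MPI_INT, src, 0, comm, &req);

    #pragma omp barrier   /* make sure req is initialized before the wait */

            if (omp_get_thread_num() == 1)
                MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
    }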

19. Hybrid Programming: Correctness Requirements

- Hybrid programming with MPI+threads does not do much to reduce the complexity of thread programming
  - Your application still has to be a correct multithreaded application
  - On top of that, you also need to make sure you are correctly following MPI semantics
- Many commercial debuggers offer support for debugging hybrid MPI+threads applications (mostly for MPI+Pthreads and MPI+OpenMP)

20. Example of Problem with Threads

- Ptolemy is a framework for modeling, simulation, and design of concurrent, real-time, embedded systems
  - Developed at UC Berkeley (PI: Ed Lee)
  - It is a rigorously tested, widely used piece of software
  - Ptolemy II was first released in 2000
- Yet, on April 26, 2004, four years after it was first released, the code deadlocked!
  - The bug was lurking through 4 years of widespread use and testing!
  - A faster machine, or something else that changed the timing, caught the bug
- See "The Problem with Threads" by Ed Lee, IEEE Computer, 2006

21. An Example We Encountered Recently

- We received a bug report about a very simple multithreaded MPI program that hangs
  - Run with 2 processes
  - Each process has 2 threads
  - Both threads communicate with threads on the other process, as shown on the next slide
- We spent several hours trying to debug MPICH before discovering that the bug is actually in the user's program

22. 2 Processes, 2 Threads, Each Thread Executes this Code

    MPI_Status stat;
    int i, j;

    for (j = 0; j < 2; j++) {
        if (rank == 1) {
            for (i = 0; i < 2; i++)
                MPI_Send(NULL, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            for (i = 0; i < 2; i++)
                MPI_Recv(NULL, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
        } else { /* rank == 0 */
            for (i = 0; i < 2; i++)
                MPI_Recv(NULL, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &stat);
            for (i = 0; i < 2; i++)
                MPI_Send(NULL, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        }
    }

23. Intended Ordering of Operations

    Rank 0              Rank 1
    2 recvs (T1)        2 sends (T1)
    2 sends (T1)        2 recvs (T1)
    2 recvs (T1)        2 sends (T1)
    2 sends (T1)        2 recvs (T1)
    2 recvs (T2)        2 sends (T2)
    2 sends (T2)        2 recvs (T2)
    2 recvs (T2)        2 sends (T2)
    2 sends (T2)        2 recvs (T2)

- Every send matches a receive on the other rank

24. What Happened

                Rank 0                                Rank 1
    Thread 1    2 recvs, 2 sends, 2 recvs, 2 sends    2 sends, 2 recvs, 2 sends, 2 recvs
    Thread 2    2 recvs, 2 sends, 2 recvs, 2 sends    2 sends, 2 recvs, 2 sends, 2 recvs

- All 4 threads got stuck in receives, because the sends from one iteration got matched with receives from the next iteration

25. Possible Ordering of Operations in Practice

    Rank 0              Rank 1
    2 recvs (T1)        2 sends (T1)
    2 sends (T1)        1 recv  (T1)
    1 recv  (T1)        1 recv  (T1)
    1 recv  (T1)        2 sends (T1)
    2 sends (T1)        2 recvs (T1)
    1 recv  (T2)        2 sends (T2)
    1 recv  (T2)        1 recv  (T2)
    2 sends (T2)        1 recv  (T2)
    2 recvs (T2)        2 sends (T2)
    2 sends (T2)        2 recvs (T2)

- Because the MPI operations can be issued in an arbitrary order across threads, all threads could block in a recv call (a sketch of a fix follows)
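One common fix (ours, not from the slides) is to make the matching deterministic, for example by using the thread id as the message tag so each thread pair has its own message stream; a separate communicator per thread pair would work equally well. This assumes thread i on one rank talks to thread i on the other:

    #include <mpi.h>
    #include <omp.h>

    /* executed by each of the 2 threads on each of the 2 ranks */
    void exchange(int rank)
    {
        int i, j, tid = omp_get_thread_num();  /* thread id used as tag */
        MPI_Status stat;

        for (j = 0; j < 2; j++) {
            if (rank == 1) {
                for (i = 0; i < 2; i++)
                    MPI_Send(NULL, 0, MPI_CHAR, 0, tid, MPI_COMM_WORLD);
                for (i = 0; i < 2; i++)
                    MPI_Recv(NULL, 0, MPI_CHAR, 0, tid, MPI_COMM_WORLD, &stat);
            } else { /* rank == 0 */
                for (i = 0; i < 2; i++)
                    MPI_Recv(NULL, 0, MPI_CHAR, 1, tid, MPI_COMM_WORLD, &stat);
                for (i = 0; i < 2; i++)
                    MPI_Send(NULL, 0, MPI_CHAR, 1, tid, MPI_COMM_WORLD);
            }
        }
    }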

26. Hybrid Programming with Shared Memory

- MPI-3 allows different processes to allocate shared memory through MPI: MPI_Win_allocate_shared
- Uses many of the concepts of one-sided communication
- Applications can do hybrid programming using MPI or load/store accesses on the shared memory window
- Other MPI functions can be used to synchronize access to shared memory regions
- Can be simpler to program than threads

27. Creating Shared Memory Regions in MPI

(Diagram: MPI_COMM_WORLD is split with MPI_Comm_split_type(MPI_COMM_TYPE_SHARED) into shared-memory communicators; MPI_Win_allocate_shared then creates a shared-memory window on each.)

28. Regular RMA Windows vs. Shared Memory Windows

(Diagram: with traditional RMA windows, P0 and P1 access each other's local memory via PUT/GET; with shared memory windows, P0 and P1 perform direct load/store accesses on a common region of local memory.)

- Shared memory windows allow application processes to directly perform load/store accesses on all of the window memory
  - E.g., x[100] = 10
- All of the existing RMA functions can also be used on such memory for more advanced semantics such as atomic operations
- Can be very useful when processes want to use threads only to get access to all of the memory on the node
  - You can create a shared memory window and put your shared data in it

29. Memory Allocation and Placement

- Shared memory allocation does not need to be uniform across processes
  - Processes can allocate a different amount of memory (even zero)
- The MPI standard does not specify where the memory would be placed (e.g., which physical memory it will be pinned to)
  - Implementations can choose their own strategies, though it is expected that an implementation will try to place shared memory allocated by a process "close to it"
- The total allocated shared memory on a communicator is contiguous by default
  - Users can pass the info hint "alloc_shared_noncontig", which allows the MPI implementation to align memory allocations from each process to appropriate boundaries to assist with placement (a sketch of passing this hint follows)
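As a minimal sketch (the sizes are illustrative choices, not from the slides), the hint is passed through an info object at allocation time:

    #include <mpi.h>

    /* shmcomm is assumed to be a shared-memory communicator, e.g. from
     * MPI_Comm_split_type(MPI_COMM_TYPE_SHARED) */
    void alloc_shared(MPI_Comm shmcomm)
    {
        double *base;
        MPI_Win win;
        MPI_Info info;

        MPI_Info_create(&info);
        MPI_Info_set(info, "alloc_shared_noncontig", "true");

        /* every rank contributes 100 doubles to the shared window;
         * the hint lets the implementation align each segment */
        MPI_Win_allocate_shared(100 * sizeof(double), sizeof(double),
                                info, shmcomm, &base, &win);

        MPI_Info_free(&info);
        /* ... use the window ... */
        MPI_Win_free(&win);
    }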

30. Shared Arrays with Shared Memory Windows

    int main(int argc, char **argv)
    {
        int rank, buf[100];
        MPI_Comm comm;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, ..., &comm);
        MPI_Win_allocate_shared(..., comm, ..., &win);
        MPI_Comm_rank(comm, &rank);

        MPI_Win_lock_all(0, win);
        /* copy data to local part of shared memory */
        MPI_Barrier(comm);
        /* use shared memory */
        MPI_Win_unlock_all(win);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

(A filled-in, runnable version of this pattern follows.)
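Filled in with concrete sizes (our choices, not from the slide) and using MPI_Win_shared_query to obtain a neighbor's base pointer, the pattern might look like:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, disp;
        double *mybase, *left;
        MPI_Aint sz;
        MPI_Comm shmcomm;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &shmcomm);
        MPI_Comm_rank(shmcomm, &rank);

        /* each rank contributes 100 doubles to one shared array */
        MPI_Win_allocate_shared(100 * sizeof(double), sizeof(double),
                                MPI_INFO_NULL, shmcomm, &mybase, &win);

        MPI_Win_lock_all(0, win);
        for (int i = 0; i < 100; i++)   /* fill the local segment */
            mybase[i] = rank;
        MPI_Win_sync(win);              /* memory barrier for load/store */
        MPI_Barrier(shmcomm);           /* wait until all segments are filled */

        if (rank > 0) {                 /* read the left neighbor's segment */
            MPI_Win_shared_query(win, rank - 1, &sz, &disp, &left);
            printf("rank %d sees neighbor value %.0f\n", rank, left[0]);
        }
        MPI_Win_unlock_all(win);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }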


33. Walkthrough of 2D Stencil Code with Shared Memory Windows

- Code can be downloaded from
