Advanced MPI Programming
1 Advanced MPI Programming
Latest slides and code examples are available at
- Pavan Balaji, Argonne National Laboratory
- James Dinan, Intel Corp.
- Torsten Hoefler, ETH Zurich, Web: http://htor.inf.ethz.ch/
- Rajeev Thakur, Argonne National Laboratory
2 Hybrid Programming with Threads, Shared Memory, and GPUs
3 MPI and Threads
- MPI describes parallelism between processes (with separate address spaces)
- Thread parallelism provides a shared-memory model within a process
- OpenMP and Pthreads are common models
  - OpenMP provides convenient features for loop-level parallelism. Threads are created and managed by the compiler, based on user directives.
  - Pthreads provides more complex and dynamic approaches. Threads are created and managed explicitly by the user.
Advanced MPI, SC13 (11/17/2013)
4 Programming for Multicore
- Almost all chips are multicore these days
- Today's clusters often comprise multiple CPUs per node sharing memory, and the nodes themselves are connected by a network
- Common options for programming such clusters:
  - All MPI: MPI between processes both within a node and across nodes; MPI internally uses shared memory to communicate within a node
  - MPI + OpenMP: use OpenMP within a node and MPI across nodes
  - MPI + Pthreads: use Pthreads within a node and MPI across nodes
- The latter two approaches are known as "hybrid programming"
5 Hybrid Programming with MPI+Threads
[Figure: MPI-only programming, with Rank 0 and Rank 1 each running a single thread, vs. MPI+threads hybrid programming, with Rank 0 and Rank 1 each running multiple threads]
- In MPI-only programming, each MPI process has a single program counter
- In MPI+threads hybrid programming, there can be multiple threads executing simultaneously
  - All threads share all MPI objects (communicators, requests)
  - The MPI implementation might need to take precautions to make sure the state of the MPI stack is consistent
6 MPI's Four Levels of Thread Safety
- MPI defines four levels of thread safety; these are commitments the application makes to the MPI implementation:
  - MPI_THREAD_SINGLE: only one thread exists in the application
  - MPI_THREAD_FUNNELED: multithreaded, but only the main thread makes MPI calls (the one that called MPI_Init_thread)
  - MPI_THREAD_SERIALIZED: multithreaded, but only one thread at a time makes MPI calls
  - MPI_THREAD_MULTIPLE: multithreaded, and any thread can make MPI calls at any time (with some restrictions to avoid races; see next slide)
- Thread levels are in increasing order: if an application works in FUNNELED mode, it can work in SERIALIZED
- MPI defines an alternative to MPI_Init: MPI_Init_thread(requested, provided). The application gives the level it needs; the MPI implementation gives the level it supports.
7 MPI_THREAD_SINGLE
- There are no threads in the system
- E.g., there are no OpenMP parallel regions

    int main(int argc, char ** argv)
    {
        int i, rank, buf[100];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (i = 0; i < 100; i++)
            compute(buf[i]);
        /* Do MPI stuff */
        MPI_Finalize();
        return 0;
    }
8 MPI_THREAD_FUNNELED
- All MPI calls are made by the master thread
  - Outside the OpenMP parallel regions
  - In OpenMP master regions

    int main(int argc, char ** argv)
    {
        int i, rank, buf[100], provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    #pragma omp parallel for
        for (i = 0; i < 100; i++)
            compute(buf[i]);
        /* Do MPI stuff */
        MPI_Finalize();
        return 0;
    }
9 MPI_THREAD_SERIALIZED
- Only one thread can make MPI calls at a time
- Protected by OpenMP critical regions

    int main(int argc, char ** argv)
    {
        int i, rank, buf[100], provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    #pragma omp parallel for
        for (i = 0; i < 100; i++) {
            compute(buf[i]);
    #pragma omp critical
            {
                /* Do MPI stuff */
            }
        }
        MPI_Finalize();
        return 0;
    }
10 MPI_THREAD_MULTIPLE
- Any thread can make MPI calls any time (restrictions apply)

    int main(int argc, char ** argv)
    {
        int i, rank, buf[100], provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    #pragma omp parallel for
        for (i = 0; i < 100; i++) {
            compute(buf[i]);
            /* Do MPI stuff */
        }
        MPI_Finalize();
        return 0;
    }
11 Threads and MPI
- An implementation is not required to support levels higher than MPI_THREAD_SINGLE; that is, an implementation is not required to be thread safe
- A fully thread-safe implementation will support MPI_THREAD_MULTIPLE
- A program that calls MPI_Init (instead of MPI_Init_thread) should assume that only MPI_THREAD_SINGLE is supported
- A threaded MPI program that does not call MPI_Init_thread is an incorrect program (common user error we see)
12 Specification of MPI_THREAD_MULTIPLE
- Ordering: when multiple threads make MPI calls concurrently, the outcome will be as if the calls executed sequentially in some (any) order
  - Ordering is maintained within each thread
  - User must ensure that collective operations on the same communicator, window, or file handle are correctly ordered among threads
    - E.g., cannot call a broadcast on one thread and a reduce on another thread on the same communicator
  - It is the user's responsibility to prevent races when threads in the same application post conflicting MPI calls
    - E.g., accessing an info object from one thread and freeing it from another thread
- Blocking: blocking MPI calls will block only the calling thread and will not prevent other threads from running or executing MPI functions
13 Ordering in MPI_THREAD_MULTIPLE: Incorrect Example with Collectives

                 Process 0            Process 1
    Thread 1     MPI_Bcast(comm)      MPI_Bcast(comm)
    Thread 2     MPI_Barrier(comm)    MPI_Barrier(comm)

- P0 and P1 can have different orderings of Bcast and Barrier
- Here the user must use some kind of synchronization to ensure that either thread 1 or thread 2 gets scheduled first on both processes
- Otherwise a broadcast may get matched with a barrier on the same communicator, which is not allowed in MPI
14 Ordering in MPI_THREAD_MULTIPLE: Incorrect Example with RMA

    int main(int argc, char ** argv)
    {
        int i;
        /* Initialize MPI and RMA window */
    #pragma omp parallel for
        for (i = 0; i < 100; i++) {
            int target = rand();
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
            MPI_Put(..., win);
            MPI_Win_unlock(target, win);
        }
        /* Free MPI and RMA window */
        return 0;
    }

- Different threads can lock the same process, causing multiple locks to the same target before the first lock is unlocked
15 Ordering in MPI_THREAD_MULTIPLE: Incorrect Example with Object Management

                 Process 0             Process 1
    Thread 1     MPI_Bcast(comm)       MPI_Bcast(comm)
    Thread 2     MPI_Comm_free(comm)   MPI_Comm_free(comm)

- The user has to make sure that one thread is not using an object while another thread is freeing it
  - This is essentially an ordering issue; the object might get freed before it is used
16 Blocking Calls in MPI_THREAD_MULTIPLE: Correct Example

                 Process 0            Process 1
    Thread 1     MPI_Recv(src=1)      MPI_Recv(src=0)
    Thread 2     MPI_Send(dst=1)      MPI_Send(dst=0)

- An implementation must ensure that the above example never deadlocks for any ordering of thread execution
- That means the implementation cannot simply acquire a thread lock and block within an MPI function. It must release the lock to allow other threads to make progress.
17 Performance with MPI_THREAD_MULTIPLE
- Thread safety does not come for free
- The implementation must protect certain data structures or parts of code with mutexes or critical sections
- To measure the performance impact, we ran tests to measure communication performance when using multiple threads versus multiple processes
- For results, see the Thakur/Gropp paper: "Test Suite for Evaluating Performance of Multithreaded MPI Communication," Parallel Computing, 2009
18 Why is it hard to optimize MPI_THREAD_MULTIPLE
- MPI internally maintains several resources
- Because of MPI semantics, it is required that all threads have access to some of the data structures
  - E.g., thread 1 can post an Irecv, and thread 2 can wait for its completion; thus the request queue has to be shared between both threads
  - Since multiple threads are accessing this shared queue, it needs to be locked, which adds a lot of overhead
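The shared request queue above can be sketched in plain C with Pthreads (purely illustrative, not MPICH internals); every post and every completion must take the same lock, which is exactly the overhead the slide refers to:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Illustrative request queue shared by all threads in a process. */
typedef struct request { int id; struct request *next; } request_t;

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static request_t *queue_head = NULL;

/* e.g. thread 1 posting an Irecv */
static void post_request(request_t *req)
{
    pthread_mutex_lock(&queue_lock);   /* every post pays for the lock */
    req->next = queue_head;
    queue_head = req;
    pthread_mutex_unlock(&queue_lock);
}

/* e.g. thread 2 waiting on (completing) that request */
static request_t *complete_request(int id)
{
    pthread_mutex_lock(&queue_lock);   /* ...and so does every completion */
    request_t **p = &queue_head, *req = NULL;
    while (*p && (*p)->id != id)
        p = &(*p)->next;
    if (*p) { req = *p; *p = req->next; }
    pthread_mutex_unlock(&queue_lock);
    return req;
}
```

Because the lock is taken on every operation, it serializes all threads even when their requests are unrelated; this is why thread-optimized implementations invest in finer-grained or lock-free designs.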
19 Hybrid Programming: Correctness Requirements
- Hybrid programming with MPI+threads does not do much to reduce the complexity of thread programming
  - Your application still has to be a correct multithreaded application
  - On top of that, you also need to make sure you are correctly following MPI semantics
- Many commercial debuggers offer support for debugging hybrid MPI+threads applications (mostly for MPI+Pthreads and MPI+OpenMP)
20 Example of Problem with Threads
- Ptolemy is a framework for modeling, simulation, and design of concurrent, real-time, embedded systems
  - Developed at UC Berkeley (PI: Ed Lee)
- It is a rigorously tested, widely used piece of software
  - Ptolemy II was first released in 2000
- Yet, on April 26, 2004, four years after it was first released, the code deadlocked!
  - The bug was lurking for 4 years of widespread use and testing!
  - A faster machine or something that changed the timing caught the bug
- See "The Problem with Threads" by Ed Lee, IEEE Computer, 2006
21 An Example we encountered recently
- We received a bug report about a very simple multithreaded MPI program that hangs
  - Run with 2 processes
  - Each process has 2 threads
  - Both threads communicate with threads on the other process as shown in the next slide
- We spent several hours trying to debug MPICH before discovering that the bug is actually in the user's program
22 2 Processes, 2 Threads, Each Thread Executes this Code

    for (j = 0; j < 2; j++) {
        if (rank == 1) {
            for (i = 0; i < 2; i++)
                MPI_Send(NULL, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            for (i = 0; i < 2; i++)
                MPI_Recv(NULL, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
        }
        else {  /* rank == 0 */
            for (i = 0; i < 2; i++)
                MPI_Recv(NULL, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &stat);
            for (i = 0; i < 2; i++)
                MPI_Send(NULL, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        }
    }
23 Intended Ordering of Operations

    Rank 0            Rank 1
    2 recvs (T1)      2 sends (T1)
    2 sends (T1)      2 recvs (T1)
    2 recvs (T1)      2 sends (T1)
    2 sends (T1)      2 recvs (T1)
    2 recvs (T2)      2 sends (T2)
    2 sends (T2)      2 recvs (T2)
    2 recvs (T2)      2 sends (T2)
    2 sends (T2)      2 recvs (T2)

- Every send matches a receive on the other rank
24 What Happened

                 Rank 0                                Rank 1
    Thread 1     2 recvs, 2 sends, 2 recvs, 2 sends    2 sends, 2 recvs, 2 sends, 2 recvs
    Thread 2     2 recvs, 2 sends, 2 recvs, 2 sends    2 sends, 2 recvs, 2 sends, 2 recvs

- All 4 threads stuck in receives because the sends from one iteration got matched with receives from the next iteration
25 Possible Ordering of Operations in Practice

    Rank 0            Rank 1
    2 recvs (T1)      2 sends (T1)
    2 sends (T1)      1 recv (T1)
    1 recv (T1)       1 recv (T1)
    1 recv (T1)       2 sends (T1)
    2 sends (T1)      2 recvs (T1)
    1 recv (T2)       2 sends (T2)
    1 recv (T2)       1 recv (T2)
    2 sends (T2)      1 recv (T2)
    2 recvs (T2)      2 sends (T2)
    2 sends (T2)      2 recvs (T2)

- Because the MPI operations can be issued in an arbitrary order across threads, all threads could block in a recv call
26 Hybrid Programming with Shared Memory
- MPI-3 allows different processes to allocate shared memory through MPI
  - MPI_Win_allocate_shared
- Uses many of the concepts of one-sided communication
- Applications can do hybrid programming using MPI or load/store accesses on the shared memory window
- Other MPI functions can be used to synchronize access to shared memory regions
- Can be simpler to program than threads
27 Creating Shared Memory Regions in MPI
[Figure: MPI_COMM_WORLD is split with MPI_Comm_split_type(MPI_COMM_TYPE_SHARED) into shared-memory communicators; MPI_Win_allocate_shared then creates a shared-memory window on each communicator]
28 Regular RMA windows vs. Shared memory windows
[Figure: in traditional RMA windows, P0 and P1 each access their own local memory with load/store and the other's with PUT/GET; in shared memory windows, P0 and P1 access all of the window memory with load/store]
- Shared memory windows allow application processes to directly perform load/store accesses on all of the window memory
  - E.g., x[100] = 10
- All of the existing RMA functions can also be used on such memory for more advanced semantics such as atomic operations
- Can be very useful when processes want to use threads only to get access to all of the memory on the node
  - You can create a shared memory window and put your shared data in it
29 Memory allocation and placement
- Shared memory allocation does not need to be uniform across processes
  - Processes can allocate a different amount of memory (even zero)
- The MPI standard does not specify where the memory would be placed (e.g., which physical memory it will be pinned to)
  - Implementations can choose their own strategies, though it is expected that an implementation will try to place shared memory allocated by a process "close to it"
- The total allocated shared memory on a communicator is contiguous by default
  - Users can pass the info hint alloc_shared_noncontig, which allows the MPI implementation to align memory allocations from each process to appropriate boundaries to assist with placement
30 Shared Arrays with Shared memory windows

    int main(int argc, char ** argv)
    {
        int rank, buf[100];
        MPI_Comm comm;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ..., &comm);
        MPI_Win_allocate_shared(..., comm, ..., &win);
        MPI_Comm_rank(comm, &rank);

        MPI_Win_lock_all(0, win);
        /* copy data to local part of shared memory */
        MPI_Barrier(comm);
        /* use shared memory */
        MPI_Win_unlock_all(win);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }
33 Walkthrough of 2D Stencil Code with Shared Memory Windows
- Code can be downloaded from
Introduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS Teacher: Jan Kwiatkowski, Office 201/15, D-2 COMMUNICATION For questions, email to jan.kwiatkowski@pwr.edu.pl with 'Subject=your name.
More informationDebugging on Blue Waters
Debugging on Blue Waters Debugging tools and techniques for Blue Waters are described here with example sessions, output, and pointers to small test codes. For tutorial purposes, this material will work
More informationParallel Computing. Lecture 17: OpenMP Last Touch
CSCI-UA.0480-003 Parallel Computing Lecture 17: OpenMP Last Touch Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Some slides from here are adopted from: Yun (Helen) He and Chris Ding
More informationMark Pagel
Mark Pagel pags@cray.com New features in XT MPT 3.1 and MPT 3.2 Features as a result of scaling to 150K MPI ranks MPI-IO Improvements(MPT 3.1 and MPT 3.2) SMP aware collective improvements(mpt 3.2) Misc
More informationPARALLEL AND DISTRIBUTED COMPUTING
PARALLEL AND DISTRIBUTED COMPUTING 2010/2011 1 st Semester Recovery Exam February 2, 2011 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc. - Give your answers
More informationIntroduction to MPI. SHARCNET MPI Lecture Series: Part I of II. Paul Preney, OCT, M.Sc., B.Ed., B.Sc.
Introduction to MPI SHARCNET MPI Lecture Series: Part I of II Paul Preney, OCT, M.Sc., B.Ed., B.Sc. preney@sharcnet.ca School of Computer Science University of Windsor Windsor, Ontario, Canada Copyright
More informationFirst day. Basics of parallel programming. RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS
First day Basics of parallel programming RIKEN CCS HPC Summer School Hiroya Matsuba, RIKEN CCS Today s schedule: Basics of parallel programming 7/22 AM: Lecture Goals Understand the design of typical parallel
More informationMPI and CUDA. Filippo Spiga, HPCS, University of Cambridge.
MPI and CUDA Filippo Spiga, HPCS, University of Cambridge Outline Basic principle of MPI Mixing MPI and CUDA 1 st example : parallel GPU detect 2 nd example: heat2d CUDA- aware MPI, how
More informationSimple examples how to run MPI program via PBS on Taurus HPC
Simple examples how to run MPI program via PBS on Taurus HPC MPI setup There's a number of MPI implementations install on the cluster. You can list them all issuing the following command: module avail/load/list/unload
More informationMessage-Passing Computing
Chapter 2 Slide 41þþ Message-Passing Computing Slide 42þþ Basics of Message-Passing Programming using userlevel message passing libraries Two primary mechanisms needed: 1. A method of creating separate
More informationParallel Programming
Parallel Programming MPI Part 1 Prof. Paolo Bientinesi pauldj@aices.rwth-aachen.de WS17/18 Preliminaries Distributed-memory architecture Paolo Bientinesi MPI 2 Preliminaries Distributed-memory architecture
More informationHybrid Programming with MPI and OpenMP
Hybrid Programming with and OpenMP Fernando Silva and Ricardo Rocha Computer Science Department Faculty of Sciences University of Porto Parallel Computing 2017/2018 F. Silva and R. Rocha (DCC-FCUP) Programming
More informationCS4961 Parallel Programming. Lecture 19: Message Passing, cont. 11/5/10. Programming Assignment #3: Simple CUDA Due Thursday, November 18, 11:59 PM
Parallel Programming Lecture 19: Message Passing, cont. Mary Hall November 4, 2010 Programming Assignment #3: Simple CUDA Due Thursday, November 18, 11:59 PM Today we will cover Successive Over Relaxation.
More informationmith College Computer Science CSC352 Week #7 Spring 2017 Introduction to MPI Dominique Thiébaut
mith College CSC352 Week #7 Spring 2017 Introduction to MPI Dominique Thiébaut dthiebaut@smith.edu Introduction to MPI D. Thiebaut Inspiration Reference MPI by Blaise Barney, Lawrence Livermore National
More informationHybrid MPI/ OpenMP. Presentation. Martin Čuma Center for High Performance Computing University of Utah
Presentation Hybrid MPI/ OpenMP Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu http://www.chpc.utah.edu Slide 1 Overview Single and multilevel parallelism. Example
More information30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy
Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy Why serial is not enough Computing architectures Parallel paradigms Message Passing Interface How
More informationCS4961 Parallel Programming. Lecture 18: Introduction to Message Passing 11/3/10. Final Project Purpose: Mary Hall November 2, 2010.
Parallel Programming Lecture 18: Introduction to Message Passing Mary Hall November 2, 2010 Final Project Purpose: - A chance to dig in deeper into a parallel programming model and explore concepts. -
More informationIntroducing Task-Containers as an Alternative to Runtime Stacking
Introducing Task-Containers as an Alternative to Runtime Stacking EuroMPI, Edinburgh, UK September 2016 Jean-Baptiste BESNARD jbbesnard@paratools.fr Julien ADAM, Sameer SHENDE, Allen MALONY (ParaTools)
More informationProgramming with MPI. Pedro Velho
Programming with MPI Pedro Velho Science Research Challenges Some applications require tremendous computing power - Stress the limits of computing power and storage - Who might be interested in those applications?
More informationPoint-to-Point Communication. Reference:
Point-to-Point Communication Reference: http://foxtrot.ncsa.uiuc.edu:8900/public/mpi/ Introduction Point-to-point communication is the fundamental communication facility provided by the MPI library. Point-to-point
More informationFaculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering
Faculty of Electrical and Computer Engineering Department of Electrical and Computer Engineering Program: Computer Engineering Course Number EE 8218 011 Section Number 01 Course Title Parallel Computing
More informationCS420: Operating Systems
Threads James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threads A thread is a basic unit of processing
More informationLittle Motivation Outline Introduction OpenMP Architecture Working with OpenMP Future of OpenMP End. OpenMP. Amasis Brauch German University in Cairo
OpenMP Amasis Brauch German University in Cairo May 4, 2010 Simple Algorithm 1 void i n c r e m e n t e r ( short a r r a y ) 2 { 3 long i ; 4 5 for ( i = 0 ; i < 1000000; i ++) 6 { 7 a r r a y [ i ]++;
More informationMPI MESSAGE PASSING INTERFACE
MPI MESSAGE PASSING INTERFACE David COLIGNON, ULiège CÉCI - Consortium des Équipements de Calcul Intensif http://www.ceci-hpc.be Outline Introduction From serial source code to parallel execution MPI functions
More informationBest Practice Guide for Writing MPI + OmpSs Interoperable Programs. Version 1.0, 3 rd April 2017
Best Practice Guide for Writing MPI + OmpSs Interoperable Programs Version 1.0, 3 rd April 2017 Copyright INTERTWinE Consortium 2017 Table of Contents 1 INTRODUCTION... 1 1.1 PURPOSE... 1 1.2 GLOSSARY
More informationParallel Programming Overview
Parallel Programming Overview Introduction to High Performance Computing 2019 Dr Christian Terboven 1 Agenda n Our Support Offerings n Programming concepts and models for Cluster Node Core Accelerator
More informationMPI Message Passing Interface
MPI Message Passing Interface Portable Parallel Programs Parallel Computing A problem is broken down into tasks, performed by separate workers or processes Processes interact by exchanging information
More information