Lecture 6: MPI Programming (II) — Queuing Commands, Introduction to MPI, Collective Communication


Last Time
- Queuing commands
- Introduction to MPI: information enquiry, basic collective communication
- Some embarrassingly parallel examples
- Defined parallel efficiency & speedup

Examples
- 03.c: brute-force method to calculate the summation from 1 to a specified number
- 04.c: integration of a function using the trapezoidal rule
- 05.c: random number generation
All of these examples are known as embarrassingly (or pleasingly) parallel: they exchange a little information at the beginning and a little information at the end. These examples demonstrate excellent parallel efficiency, as will be shown (a code sketch of this pattern follows the Outline below).

Speedup & Parallel Efficiency
[Chart: speedup and parallel efficiency vs. number of processors NP, for "sum to 1E8" and "sum to 1E9", compared against perfect speedup.]
Speedup = T_NR / T_NP
Efficiency = Speedup * NR / NP
where T_NR is the computation time using NR processors, T_NP is the computation time using NP processors, NR is the number of processors in the reference configuration, and NP is the number of processors used for the computation.

Observations
- The program seems correct! The answer doesn't change with the number of processors.
- Very good parallel efficiency is observed!
- These examples (03, 04, 05.c) are known as embarrassingly (or pleasingly) parallel!

Lecture 6: MPI Programming (II)

Outline
- Two famous laws in parallel computing
- More on collective communication
- Basic point-to-point communication
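To make the "little communication at the beginning and end" point concrete, here is a minimal sketch of the trapezoidal-rule pattern from last time. It is not the actual course file 04.c; the integrand f, the interval, and the number of sub-intervals are placeholder choices. Each rank integrates its own sub-interval and a single MPI_Reduce() combines the partial results.

    #include <mpi.h>
    #include <stdio.h>

    /* Function to integrate; x*x is just a placeholder. */
    static double f(double x) { return x * x; }

    /* Trapezoidal rule on [a,b] with n sub-intervals. */
    static double trap(double a, double b, long n) {
        double h = (b - a) / n, sum = 0.5 * (f(a) + f(b));
        for (long i = 1; i < n; i++) sum += f(a + i * h);
        return sum * h;
    }

    int main(int argc, char *argv[]) {
        int rank, np;
        double a = 0.0, b = 1.0;      /* global interval            */
        long   n = 100000000L;        /* global sub-interval count  */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* Each rank works on an equal slice of [a,b]; no communication
           except the final reduction, hence embarrassingly parallel.   */
        double h = (b - a) / np;
        double local  = trap(a + rank * h, a + (rank + 1) * h, n / np);
        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) printf("integral = %.12f\n", global);
        MPI_Finalize();
        return 0;
    }

Run with, e.g., mpirun -np 4 ./a.out; the answer should not change with the number of processes, only the timing.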

Two famous laws in parallel computing
- Amdahl's Law
- Gustafson's Law

Amdahl's Law
Maximum speedup is governed by the serial fraction (non-parallelizable part) of a program. A task can be divided into a parallel (p) and a non-parallel (s, serial) fraction, with s + p = 1:
Speedup = 1 / (s + p/NP) = 1 / (s + (1 - s)/NP)
Efficiency = Speedup / NP = 1 / (s*NP + 1 - s)
- If s = 0: Speedup = NP and Efficiency = 1 (100%).
- As NP grows with s > 0: Speedup approaches 1/s and Efficiency approaches 0.
[Chart: speedup and parallel efficiency (0-100%) vs. NP/NR for a fixed serial fraction.]
Thus, we need to minimize s as much as possible:
s = serial code + communication
where "communication" is the communication overhead, which may increase with NP. One way to reduce the communication part of s is to overlap communication with computation. To be covered next time when we talk about non-blocking communication.

#1 Supercomputer: 129,600 processors
http://upload.wikimedia.org/wikipedia/commons/6/6b/amdahlslaw.png
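The Amdahl formulas above can be tabulated in a few lines of plain C. In the sketch below the serial fraction s = 0.05 is just an assumed value; the program prints the predicted speedup and efficiency for several processor counts and shows the speedup saturating at 1/s.

    #include <stdio.h>

    int main(void) {
        double s = 0.05;                    /* assumed serial fraction */
        int np_list[] = { 1, 2, 4, 8, 16, 64, 256, 1024 };

        for (int i = 0; i < 8; i++) {
            int np = np_list[i];
            double speedup    = 1.0 / (s + (1.0 - s) / np);   /* Amdahl */
            double efficiency = speedup / np;
            printf("NP=%5d  speedup=%7.2f  efficiency=%5.1f%%\n",
                   np, speedup, 100.0 * efficiency);
        }
        return 0;   /* speedup approaches 1/s = 20 as NP grows */
    }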

Gustafson's Law
As the problem to be solved increases in size, the serial fraction decreases and the parallel fraction increases:
Speedup = s + p*NP = s + (1 - s)*NP
Efficiency = Speedup / NP = s/NP + (1 - s)

A Driving Metaphor
Suppose a car is traveling between two cities 60 miles apart, and has already spent one hour traveling half the distance at 30 mph.
Amdahl's Law approximately suggests: No matter how fast you drive the last half, it is impossible to achieve a 90 mph average before reaching the second city. Since it has already taken you 1 hour and you only have a distance of 60 miles total, going infinitely fast you would only achieve 60 mph.
Gustafson's Law approximately states: Given enough time and distance to travel, the car's average speed can always eventually reach 90 mph, no matter how long or how slowly it has already traveled. For example, in the two-cities case this could be achieved by driving at 150 mph for an additional hour.
http://en.wikipedia.org/wiki/Gustafson%27s_law

MPI Summary
Information enquiry: MPI_Init(), MPI_Get_processor_name(), MPI_Get_version(), MPI_Comm_size(), MPI_Comm_rank(), MPI_Wtime(), MPI_Finalize()
Collective communication: MPI_Bcast(), MPI_Reduce()

Collective Communication (II)
Collective communication: MPI_Bcast(), MPI_Reduce(), MPI_Scatter(), MPI_Gather(), MPI_Allgather(), MPI_Allreduce(), MPI_Alltoall(), MPI_Barrier(), MPI_Scan()

06.c: performs a vector inner product
- Broadcast revisited: MPI_Bcast() to broadcast the vectors to all nodes
- Each node decides which portion of the vector to work on
- Perform the calculation
- MPI_Reduce() to sum up the inner products from the different portions of the vector
Thus this is a bad parallel algorithm for performing a vector inner product. Should really use MPI_Scatterv()!
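A sketch of the broadcast-then-reduce inner product described above (variable names and sizes are illustrative, not the actual course file): every rank receives the full vectors, works on only its own slice, and a single MPI_Reduce() combines the partial dot products.

    #include <mpi.h>
    #include <stdio.h>
    #define N 1000000

    static double x[N], y[N];

    int main(int argc, char *argv[]) {
        int rank, np;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        if (rank == 0)                      /* root fills the vectors */
            for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* Broadcast the whole vectors to every rank; this is the
           wasteful step the slide warns about.                      */
        MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(y, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Each rank picks its own slice and forms a partial dot product. */
        long lo = (long)rank * N / np, hi = (long)(rank + 1) * N / np;
        double partial = 0.0, dot = 0.0;
        for (long i = lo; i < hi; i++) partial += x[i] * y[i];

        MPI_Reduce(&partial, &dot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("dot = %f\n", dot);
        MPI_Finalize();
        return 0;
    }

The waste is visible in the code: each rank receives N elements via MPI_Bcast() but reads only N/np of them, which is exactly why the slide recommends MPI_Scatterv() instead.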

[Diagram: each of P0-P3 computes a local sum; the partial sums are combined into allsum.]

MPI Collective Communication
Collective communications can be used to transmit equal-sized arrays or unequal-sized arrays.

MPI_Scatter() / MPI_Gather()
For dividing/grouping and distributing/gathering arrays or vectors (1-D arrays) to/from all nodes within the specified communicator.
- Each node only receives part of the array.
- Each node receives/sends an equal amount of data.
- Effect = gather + broadcast, but better.
int MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)
int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
[Diagram: data layout on each process before and after the scatter operation.]
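A minimal round trip through the two calls whose prototypes are listed above (the array contents and the chunk size of 4 elements per rank are arbitrary choices for illustration): the root scatters equal-sized pieces, every rank reduces its piece to one number, and the root gathers the results.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK 4                       /* elements handed to each rank */

    int main(int argc, char *argv[]) {
        int rank, np;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        int *all = NULL;
        if (rank == 0) {                  /* root owns the full array */
            all = malloc(np * CHUNK * sizeof(int));
            for (int i = 0; i < np * CHUNK; i++) all[i] = i;
        }

        int part[CHUNK];                  /* equal-sized piece per rank */
        MPI_Scatter(all, CHUNK, MPI_INT, part, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

        int local_sum = 0;
        for (int i = 0; i < CHUNK; i++) local_sum += part[i];

        int *sums = NULL;
        if (rank == 0) sums = malloc(np * sizeof(int));
        MPI_Gather(&local_sum, 1, MPI_INT, sums, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (int r = 0; r < np; r++)
                printf("sum from rank %d = %d\n", r, sums[r]);
            free(all); free(sums);
        }
        MPI_Finalize();
        return 0;
    }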

07.c
This is an example demonstrating the use of MPI_Scatter() / MPI_Gather():
- Generate some numbers on the root node
- Scatter the generated numbers onto all nodes
- Each node prints out what it has
- Each node calculates the summation of the data it owns
- Gather the summations from all nodes
- Root prints out the data after gathering

MPI_Scatterv() / MPI_Gatherv()
For dividing/grouping and distributing/gathering arrays or vectors (1-D arrays) to/from all nodes within the specified communicator.
- Each node only receives part of the array.
- Each node does not necessarily receive/send an equal amount of data; *sendcnts and *displs give the count and displacement for each node.
int MPI_Scatterv(void *sendbuf, int *sendcnts, int *displs, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)
int MPI_Gatherv(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int *recvcnts, int *displs, MPI_Datatype recvtype, int root, MPI_Comm comm)

08.c
This is a program performing vector normalization (makes the length of the vector unity).

Other MPI collective functions
int MPI_Alltoall(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Alltoallv(void *sendbuf, int *sendcnts, int *sdispls, MPI_Datatype sendtype, void *recvbuf, int *recvcnts, int *rdispls, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Allgather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Allgatherv(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int *recvcnts, int *displs, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Barrier(MPI_Comm comm)
int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
int MPI_Op_create(MPI_User_function *function, int commute, MPI_Op *op)
    commute: 1 if the operation is commutative (a#b = b#a), 0 if not (a#b != b#a)
int MPI_Op_free(MPI_Op *op)
int MPI_Reduce_scatter(void *sendbuf, void *recvbuf, int *recvcnts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
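A sketch of the vector-normalization idea with the MPI_Scatterv()/MPI_Gatherv() pair introduced above, assuming a vector length that need not divide evenly among the ranks. This is an illustration in the spirit of the example, not the actual course file: counts and displacements are built per rank, the pieces are scattered, a global sum of squares is formed with MPI_Allreduce() so every rank can scale its own piece, and the pieces are gathered back.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(int argc, char *argv[]) {
        int rank, np, N = 10;             /* N need not divide evenly by np */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* Uneven partition: per-rank counts and displacements. */
        int *cnts = malloc(np * sizeof(int)), *displs = malloc(np * sizeof(int));
        for (int r = 0, off = 0; r < np; r++) {
            cnts[r]   = N / np + (r < N % np ? 1 : 0);
            displs[r] = off;  off += cnts[r];
        }

        double *v = NULL;
        if (rank == 0) {                  /* root builds the full vector */
            v = malloc(N * sizeof(double));
            for (int i = 0; i < N; i++) v[i] = i + 1.0;
        }

        double *loc = malloc(cnts[rank] * sizeof(double));
        MPI_Scatterv(v, cnts, displs, MPI_DOUBLE, loc, cnts[rank], MPI_DOUBLE,
                     0, MPI_COMM_WORLD);

        /* Global 2-norm: local sum of squares, then Allreduce so every
           rank can scale its own piece.                                */
        double part = 0.0, ss = 0.0;
        for (int i = 0; i < cnts[rank]; i++) part += loc[i] * loc[i];
        MPI_Allreduce(&part, &ss, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        for (int i = 0; i < cnts[rank]; i++) loc[i] /= sqrt(ss);

        MPI_Gatherv(loc, cnts[rank], MPI_DOUBLE, v, cnts, displs, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);
        if (rank == 0) printf("v[0]=%f  v[N-1]=%f\n", v[0], v[N - 1]);

        free(loc); free(cnts); free(displs);
        if (rank == 0) free(v);
        MPI_Finalize();
        return 0;
    }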

Synchronization: MPI_Barrier()
Used to synchronize: each process blocks until all processes have called this subroutine.
int MPI_Barrier(MPI_Comm comm)
Processes started up on different machines run independently from each other. Therefore, different machines may be running different portions of the code at any instant, and running at different speeds. It is sometimes necessary to ensure all processes are at the same point or at the same pace. For example, when friends go out for a long trip in different cars or motorcycles, it is necessary to set up some synchronization points so that everyone will reach the destination (especially when there are drivers who don't know how to get there). Blocking communication usually results in synchronization.
Example: 09a_noBarrier.c vs. 09b_barrier.c (compare the output)

MPI_Scan()
Performs a scan ("partial reduction") of data; also called "all-prefix-sums".
int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
Example: 10_scan.c, with count = 3 and MPI_SUM (a code sketch follows the summary below):
Before: P0: [0 1 2]  P1: [3 4 5]  P2: [6 7 8]   P3: [9 10 11]
After:  P0: [0 1 2]  P1: [3 5 7]  P2: [9 12 15]  P3: [18 22 26]

Summary
Information enquiry: MPI_Init(), MPI_Get_processor_name(), MPI_Get_version(), MPI_Comm_size(), MPI_Comm_rank(), MPI_Wtime(), MPI_Finalize()
Collective communication: MPI_Bcast(), MPI_Reduce(), MPI_Scatter(), MPI_Gather(), MPI_Allgather(), MPI_Allreduce(), MPI_Barrier(), MPI_Scan(), MPI_Alltoall()

Assignment #4
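A sketch along the lines of the 10_scan.c example described above (not the actual course file): each rank starts with three consecutive integers and MPI_Scan() produces the element-wise inclusive prefix sums across the ranks, reproducing the before/after table on the slide.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, np, send[3], recv[3];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* Rank r starts with [3r, 3r+1, 3r+2], matching the slide's data. */
        for (int i = 0; i < 3; i++) send[i] = 3 * rank + i;

        /* Inclusive prefix sum across ranks, element by element. */
        MPI_Scan(send, recv, 3, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("rank %d: [%d %d %d]\n", rank, recv[0], recv[1], recv[2]);
        MPI_Finalize();
        return 0;
    }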
