COMP4300/8300: Parallelisation via Data Partitioning. Alistair Rendell
Chapter 5: Lin and Snyder; Chapter 4: Wilkinson and Allen
Copyright (c) 2015 The Australian National University
7.1 Parallelisation Strategies
- Replicated data approach: each process has an entire copy of the data but does a subset of the computation
- Partitioning of program data across processes (most common): domain decomposition, divide and conquer
- Partitioning of program functionality (much less common): functional decomposition
- Running example: addition of n numbers, sum = \sum_{i=0}^{n-1} x_i
7.2 D&C Example #1: Simple Summation of a Vector
Divide the n numbers into m equal parts:
    x(0) ... x(n/m - 1)
    x(n/m) ... x(2n/m - 1)
    ...
    x((m-1)n/m) ... x(n - 1)
Each part contributes a partial sum; the partial sums are then combined into the final sum.
7.3 Master/Slave Send/Recv Approach
Master:
    s = n/m;
    for (i = 0, x = 0; i < m; i++, x = x + s)
        send(&numbers[x], s, i);
    sum = 0;
    for (i = 0; i < m; i++) {
        recv(&part_sum, any_processor);
        sum = sum + part_sum;
    }
Slave:
    recv(numbers, s, master);
    part_sum = 0;
    for (i = 0; i < s; i++)
        part_sum = part_sum + numbers[i];
    send(&part_sum, master);
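For concreteness, a minimal MPI rendering of this master/slave pattern might look as follows. This is a sketch, not the course's code: it assumes rank 0 is the master and also sums its own part, that the slide's generic send()/recv() map onto MPI_Send/MPI_Recv, and that n divides evenly among the processes.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    const int n = 1 << 20;               /* assumed divisible by nproc */
    int s = n / nproc;

    if (rank == 0) {                     /* master */
        double *numbers = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) numbers[i] = 1.0;
        for (int i = 1; i < nproc; i++)  /* send each slave its part */
            MPI_Send(&numbers[i * s], s, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
        double sum = 0.0, part_sum;
        for (int i = 0; i < s; i++)      /* master sums its own part */
            sum += numbers[i];
        for (int i = 1; i < nproc; i++) {
            MPI_Recv(&part_sum, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += part_sum;
        }
        printf("sum = %f\n", sum);
        free(numbers);
    } else {                             /* slave */
        double *numbers = malloc(s * sizeof(double));
        MPI_Recv(numbers, s, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        double part_sum = 0.0;
        for (int i = 0; i < s; i++) part_sum += numbers[i];
        MPI_Send(&part_sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        free(numbers);
    }
    MPI_Finalize();
    return 0;
}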
7.4 Using MPI_Scatter and MPI_Reduce
sendcount = n/m;
MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
            void *recvbuf, int recvcount, MPI_Datatype recvtype,
            int root, MPI_Comm comm);

for (i = 0; i < sendcount; i++)
    part_sum = part_sum + numbers[i];

MPI_Reduce(void *sendbuf, void *recvbuf, int count,
           MPI_Datatype datatype, MPI_Op op /* here MPI_SUM */,
           int root, MPI_Comm comm);

- NOT master/slave: the root sends data to all processes in the communicator, including itself
- Related MPI calls: MPI_Scatterv scatters variable-length pieces; MPI_Allreduce returns the result to all processes
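The same summation with these collectives, as a runnable sketch (assumptions: the data starts on rank 0 and n is divisible by the number of processes):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    const int n = 1 << 20;               /* assumed divisible by nproc */
    int chunk = n / nproc;

    double *numbers = NULL;
    if (rank == 0) {                     /* only the root holds all data */
        numbers = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) numbers[i] = 1.0;
    }
    double *mine = malloc(chunk * sizeof(double));
    MPI_Scatter(numbers, chunk, MPI_DOUBLE,
                mine, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double part_sum = 0.0, sum = 0.0;
    for (int i = 0; i < chunk; i++)      /* local partial sum */
        part_sum += mine[i];

    MPI_Reduce(&part_sum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", sum);

    free(mine);
    free(numbers);
    MPI_Finalize();
    return 0;
}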
7.5 Analysis
Sequential: n - 1 additions, thus O(n)
Parallel:
    Communication #1 (distribute parts):    t_comm1 = m(t_startup + (n/m) t_data)
    Computation #1 (partial sums):          t_comp1 = n/m - 1
    Communication #2 (return partial sums): t_comm2 = m(t_startup + t_data)
    Computation #2 (final sum):             t_comp2 = m - 1
Overall: t_p = 2m t_startup + (n + m) t_data + m + n/m - 2 = O(n + m)
Worse than the sequential code!
7.6 Domain Decomposition via Divide and Conquer
- For problems that can be recursively divided into smaller problems of the same type
- Recursive implementation of the summation problem:

int add(int *s)
{
    if (numbers(s) <= 2)
        return (s[0] + s[1]);
    else {
        divide(s, s1, s2);
        part_sum1 = add(s1);
        part_sum2 = add(s2);
        return (part_sum1 + part_sum2);
    }
}
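A runnable version of the recursive sketch, under the assumption that divide() splits the array in half; the explicit length argument and the single-element base case are added here and are not in the slide:

#include <stdio.h>

long add(const int *s, int n)
{
    if (n == 1) return s[0];            /* base cases: 1 or 2 elements */
    if (n == 2) return s[0] + s[1];
    int half = n / 2;                   /* divide ... */
    return add(s, half) + add(s + half, n - half);  /* ... and conquer */
}

int main(void)
{
    int x[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    printf("%ld\n", add(x, 8));         /* prints 28 */
    return 0;
}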
7.7 Binary Tree
- Divide and conquer with binary partitioning
- Note that the number of working processors decreases going up the tree
[Figure: binary tree over the processors; with 1 processor only P0 works, with 2 processors P0 and P1, with 4 processors (P = 4) P0-P3, with 8 processors P0-P7]
7.8 Simple Binary Tree Code
/* Binary tree broadcast:
   a) 0->1
   b) 0->2, 1->3
   c) 0->4, 1->5, 2->6, 3->7
   d) 0->8, 1->9, 2->10, 3->11, 4->12, 5->13, 6->14, 7->15 */
lo = 1;
while (lo < nproc) {
    if (me < lo) {
        id = me + lo;
        if (id < nproc) send(type, buf, lenbuf, &id);
    } else if (me < 2*lo) {
        id = me - lo;
        recv(type, buf, lenbuf, &lenmes, &id);
    }
    lo *= 2;
}
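The same tree broadcast expressed in MPI, as a minimal sketch (in practice MPI_Bcast already uses a tree internally; the int payload and the value 42 are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int me, nproc, buf = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    if (me == 0) buf = 42;                    /* value to broadcast */

    for (int lo = 1; lo < nproc; lo *= 2) {
        if (me < lo) {                        /* I already have the data */
            int dest = me + lo;
            if (dest < nproc)
                MPI_Send(&buf, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        } else if (me < 2 * lo) {             /* my turn to receive */
            MPI_Recv(&buf, 1, MPI_INT, me - lo, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }
    printf("rank %d has %d\n", me, buf);
    MPI_Finalize();
    return 0;
}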
7.9 Analysis
Assume n is a power of 2 and ignore t_startup
Communication #1 (division):
    t_comm1 = (n/2) t_data + (n/4) t_data + (n/8) t_data + ... + (n/p) t_data = (n(p-1)/p) t_data
Communication #2 (combining):
    t_comm2 = t_data log p
Computation:
    t_comp = n/p + log p
Total:
    t_p = (n(p-1)/p) t_data + t_data log p + n/p + log p
Slightly better than before, but the division step still moves O(n) data, so even as p approaches n the cost remains O(n)
7.10 Higher Order Trees
- It is possible to divide the data into higher-order trees, e.g. a quadtree
[Figure: a square region shown as the initial area, after the first division into 4 quadrants, and after the second division into 16 cells]
7.11 The Reduce and Scan Abstractions
- Summation of a vector already partitioned between processes is an example of the reduce abstraction
- Reduce: combines a set of values to produce a single value
- Scan: performs a sequential operation in parts and carries along the intermediate results
- Reduce usually involves mapping a binary (or higher-order) tree communication pattern onto the processes
- Examples (to discuss) include:
    - finding the second smallest array element
    - computing a k-way histogram
    - length of the longest run of 1s
    - index of the first occurrence of x
- See Lin and Snyder for further details
7.12 Scan
Consider the vector X = [0,1,2,3,4,5,6,7]. A scan operation replaces each element with the cumulative sum of all preceding elements (either inclusive or exclusive of the current element):
    Inclusive SCAN(X) = [0,1,3,6,10,15,21,28]
    Exclusive SCAN(X) = [0,0,1,3,6,10,15,21]
Inclusive sequential scan code (clear dependency!):
    ScanX[0] = X[0];
    for (i = 1; i < n; i++)
        ScanX[i] = ScanX[i-1] + X[i];
Parallel PRAM code (why is PRAM important?):
    for (j = 1; j <= log2(n); j++) {
        forall (k = 0; k < n; k++) in parallel {
            if (k >= POWER(2, j-1))
                X[k] = X[k - POWER(2, j-1)] + X[k];
        }
    }
7.13 Scan Example: 8 Values on 8 Nodes
for (j = 1; j <= log2(n); j++) {
    forall (k = 0; k < n; k++) in parallel {
        if (k >= POWER(2, j-1))
            X[k] = X[k - POWER(2, j-1)] + X[k];
    }
}

Node:             0      1      2      3       4       5        6        7
Initial values:   0      1      2      3       4       5        6        7
j=1 (offset 1):   0  0+1=1  1+2=3  2+3=5   3+4=7   4+5=9   5+6=11   6+7=13
j=2 (offset 2):   0      1  0+3=3  1+5=6  3+7=10  5+9=14  7+11=18  9+13=22
j=3 (offset 4):   0      1      3      6 0+10=10 1+14=15  3+18=21  6+22=28
Final values:     0      1      3      6      10      15       21       28
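MPI provides this abstraction directly: MPI_Scan computes an inclusive scan and MPI_Exscan the exclusive variant. A sketch with one value per rank, mirroring the example above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, x, incl, excl;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    x = rank;   /* X = [0,1,2,...,p-1], one element per rank */

    MPI_Scan(&x, &incl, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    MPI_Exscan(&x, &excl, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) excl = 0;   /* MPI_Exscan leaves rank 0's result undefined */

    printf("rank %d: inclusive %d, exclusive %d\n", rank, incl, excl);
    MPI_Finalize();
    return 0;
}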
7.14 D&C Example #2: Bucket Sort
- Divide the number range [0, a] into m equal regions: [0, a/m - 1], [a/m, 2a/m - 1], [2a/m, 3a/m - 1], ...
- Assign one bucket to each region
- Stage 1: numbers are placed into the appropriate buckets
- Stage 2: each bucket is sorted using a traditional sorting algorithm
- Works best if the numbers are evenly distributed over the range a
- Sequential time: t_s = n + m((n/m) log(n/m)) = n + n log(n/m) = O(n log(n/m))
7.15 Sequential Bucket Sort
[Figure: unsorted numbers are distributed into buckets, and each bucket is sorted to produce the sorted numbers]
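A minimal sequential sketch of the two stages (assumptions: values lie in [0, A) and are reasonably uniform; qsort stands in for the "traditional sorting algorithm"; all sizes are illustrative):

#include <stdio.h>
#include <stdlib.h>

#define N 16       /* number of values          */
#define M 4        /* number of buckets         */
#define A 100      /* value range [0, A)        */

static int cmp(const void *p, const void *q)
{
    return (*(const int *)p - *(const int *)q);
}

int main(void)
{
    int x[N] = {42, 7, 93, 18, 55, 3, 71, 29, 88, 12, 64, 37, 99, 21, 50, 6};
    int bucket[M][N], count[M] = {0};

    for (int i = 0; i < N; i++) {       /* stage 1: place into buckets */
        int b = x[i] * M / A;           /* bucket index for this value */
        bucket[b][count[b]++] = x[i];
    }
    int k = 0;
    for (int b = 0; b < M; b++) {       /* stage 2: sort each bucket   */
        qsort(bucket[b], count[b], sizeof(int), cmp);
        for (int i = 0; i < count[b]; i++) x[k++] = bucket[b][i];
    }
    for (int i = 0; i < N; i++) printf("%d ", x[i]);
    printf("\n");
    return 0;
}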
7.16 Parallel Bucket Sort #1
- Assign one bucket to each process
[Figure: unsorted numbers are distributed to the processes, each process fills and sorts its one bucket, giving the sorted numbers]
7.17 Parallel Bucket Sort #2
- Assign p small buckets to each process: each process places its share of the unsorted numbers into p small buckets, one per destination process
- The small buckets are then exchanged so that each process holds one big bucket covering its region, which it sorts locally
[Figure: unsorted numbers -> processes -> small buckets -> big buckets -> sorted numbers]
- Note the possible use of MPI_Alltoall:
MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
             void *recvbuf, int recvcount, MPI_Datatype recvtype,
             MPI_Comm comm);
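A sketch of the small-bucket exchange with MPI_Alltoall. This is a simplification: real small buckets differ in size, which is what MPI_Alltoallv handles; here every small bucket is padded to a fixed CHUNK and the data is a stand-in:

#include <mpi.h>
#include <stdlib.h>

#define CHUNK 4    /* padded size of each small bucket (assumption) */

int main(int argc, char **argv)
{
    int rank, p;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* small bucket i (length CHUNK) is destined for process i */
    int *small = malloc(p * CHUNK * sizeof(int));
    int *big   = malloc(p * CHUNK * sizeof(int));
    for (int i = 0; i < p * CHUNK; i++)
        small[i] = rank * 1000 + i;     /* stand-in for bucketed data */

    /* every process sends small bucket i to process i and receives one
       small bucket from every process: together these form its big bucket */
    MPI_Alltoall(small, CHUNK, MPI_INT, big, CHUNK, MPI_INT, MPI_COMM_WORLD);

    /* big[] would now be sorted locally (stage 2) */
    free(small);
    free(big);
    MPI_Finalize();
    return 0;
}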
7.18 Analysis
Initial partitioning and distribution:
    t_comp1 = n
    t_comm1 = t_startup + t_data n
Sort into small buckets:
    t_comp2 = n/p
Send to large buckets (overlapping communications):
    t_comm3 = (p-1)(t_startup + (n/p^2) t_data)
Sort of large buckets:
    t_comp4 = (n/p) log(n/p)
Total:
    t_p = t_startup + t_data n + n/p + (p-1)(t_startup + (n/p^2) t_data) + (n/p) log(n/p)
At best O(n). What would be the worst-case scenario?
7.19 D&C Example #3: Integration
Consider evaluation of the integral I = \int_a^b f(x) dx using the trapezoidal rule
[Figure: f(x) over [a, b], divided into strips of width \delta between points p and q]
7.20 Static Distribution: SPMD Model
if (i == master) {
    printf("Enter number of regions\n");
    scanf("%d", &n);
}
broadcast(&n, process_group);
region = (b-a)/p;
start = a + region*i;
end = start + region;
d = (b-a)/n;
area = 0;
for (x = start; x < end; x = x + d)
    area = area + f(x) + f(x+d);
area = 0.5 * area * d;
reduce_add(&integral, &area, processor_group);
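A runnable MPI version of this SPMD sketch (assumptions: f(x) = x*x, limits a = 0 and b = 1, and MPI_Bcast/MPI_Reduce in place of the slide's generic broadcast and reduce_add):

#include <mpi.h>
#include <stdio.h>

static double f(double x) { return x * x; }   /* example integrand */

int main(int argc, char **argv)
{
    int me, p, n = 0;
    double a = 0.0, b = 1.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (me == 0) {
        printf("Enter number of regions\n");
        scanf("%d", &n);
    }
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* everyone needs n */

    double region = (b - a) / p;     /* each rank integrates one region */
    double start = a + region * me;
    double end = start + region;
    double d = (b - a) / n;          /* strip width */

    double area = 0.0, integral = 0.0;
    for (double x = start; x < end; x += d)
        area += f(x) + f(x + d);
    area *= 0.5 * d;

    MPI_Reduce(&area, &integral, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (me == 0) printf("integral = %f (exact 1/3)\n", integral);
    MPI_Finalize();
    return 0;
}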
7.21 Adaptive Quadrature
- Not all areas require the same number of points
- When to terminate division into smaller areas is an issue
- Parallel code will have an uneven work load
[Figure: f(x) over [a, b] with regions of varying curvature requiring different resolution]
7.22 D&C Example #4: N-Body Problems
Summing long-range pairwise interactions, e.g. gravitation:
    F = G m_a m_b / r^2
where G is the gravitational constant, m_a and m_b are the masses of the two bodies, and r is the distance between them. In Cartesian space:
    F_x = (G m_a m_b / r^2) ((x_b - x_a)/r)
    F_y = (G m_a m_b / r^2) ((y_b - y_a)/r)
    F_z = (G m_a m_b / r^2) ((z_b - z_a)/r)
- What is the total force on the sun due to all other stars in the Milky Way?
- Given the force on each star we can calculate their motions
- Molecular dynamics is very similar, but the long-range forces are electrostatic
7.23 Simple Sequential Force Code
for (i = 0; i < nstars; i++)
    for (j = 0; j < nstars; j++) {
        if (i != j) {
            rij2 = (x[i]-x[j])*(x[i]-x[j]) + (y[i]-y[j])*(y[i]-y[j])
                 + (z[i]-z[j])*(z[i]-z[j]);
            Fx[i] = Fx[i] + G*m[i]*m[j]/rij2 * (x[i]-x[j]) / sqrt(rij2);
            Fy[i] = Fy[i] + G*m[i]*m[j]/rij2 * (y[i]-y[j]) / sqrt(rij2);
            Fz[i] = Fz[i] + G*m[i]*m[j]/rij2 * (z[i]-z[j]) / sqrt(rij2);
        }
    }
Aside: how could you improve this sequential code?
O(n^2): this will get very expensive for large N. Is there a better way?
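One common answer to the aside, as a sketch: exploit Newton's third law (the force of j on i is minus the force of i on j) to visit each pair once, and hoist the repeated subexpressions. The work halves, but the algorithm remains O(n^2):

#include <math.h>

void forces(int nstars, const double *x, const double *y, const double *z,
            const double *m, double *Fx, double *Fy, double *Fz, double G)
{
    for (int i = 0; i < nstars; i++)
        Fx[i] = Fy[i] = Fz[i] = 0.0;
    for (int i = 0; i < nstars; i++)
        for (int j = i + 1; j < nstars; j++) {   /* each pair once */
            double dx = x[j] - x[i], dy = y[j] - y[i], dz = z[j] - z[i];
            double rij2 = dx*dx + dy*dy + dz*dz;
            double s = G * m[i] * m[j] / (rij2 * sqrt(rij2));
            Fx[i] += s * dx;  Fx[j] -= s * dx;   /* F_ji = -F_ij */
            Fy[i] += s * dy;  Fy[j] -= s * dy;
            Fz[i] += s * dz;  Fz[j] -= s * dz;
        }
}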
7.24 Clustering
Idea: for a body at large distance r, the interactions with several bodies that are clustered together can be replaced by a single interaction with the centre of mass of the cluster.
[Figure: a body at distance r from the centre of mass of a distant cluster of stars]
7.25 Barnes-Hut Algorithm
- Start with the whole space in one cube
- Divide the cube into 8 subcubes
- Delete subcubes that contain no particles
- Subcubes with more than 1 particle are divided into 8 again
- Continue until each cube contains at most one particle
- The process creates an octree
- The total mass and centre of mass of each subcube are stored at its node
- The force on each particle is evaluated by starting at the root and traversing the tree, BUT stopping at a node if the clustering approximation can be applied
- Scaling: O(n log n)
- Load balancing is likely to be an issue for a parallel code
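A compact sketch of the tree-walk force evaluation. The slide leaves the stopping criterion open; the usual choice, assumed here, is the opening-angle test size/distance < theta. The Node layout and THETA value are illustrative:

#include <math.h>

#define THETA 0.5                /* opening-angle threshold (assumption) */

typedef struct Node {
    double mass;                 /* total mass of bodies in this cell   */
    double cx, cy, cz;           /* centre of mass of this cell         */
    double size;                 /* side length of the cell             */
    struct Node *child[8];       /* NULL children for empty subcubes    */
    int nbodies;                 /* 1 => leaf holding a single particle */
} Node;

void force(const Node *n, double px, double py, double pz, double G,
           double pm, double *Fx, double *Fy, double *Fz)
{
    if (n == NULL || n->mass == 0.0) return;
    double dx = n->cx - px, dy = n->cy - py, dz = n->cz - pz;
    double r2 = dx*dx + dy*dy + dz*dz;
    if (r2 == 0.0) return;                   /* the particle itself */
    double r = sqrt(r2);
    if (n->nbodies == 1 || n->size / r < THETA) {
        /* leaf, or cluster far enough away: use the centre of mass */
        double s = G * pm * n->mass / (r2 * r);
        *Fx += s * dx;  *Fy += s * dy;  *Fz += s * dz;
    } else {
        for (int i = 0; i < 8; i++)          /* open the cell: recurse */
            force(n->child[i], px, py, pz, G, pm, Fx, Fy, Fz);
    }
}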
7.26 Barnes-Hut Algorithm: 2D Illustration
[Figure: 2D illustration of the recursive subdivision (a quadtree) around a set of particles]