Advanced Message-Passing Interface (MPI)
Outline of the workshop

Advanced Message-Passing Interface (MPI)
Bart Oldeman, Calcul Québec McGill HPC
Bart.Oldeman@mcgill.ca

Morning: Advanced MPI
- Revision
- More on Collectives
- More on Point-to-Point
- Datatypes and Packing
- Communicators and Groups
- Topologies

Afternoon: Hybrid MPI/OpenMP
- Theory and benchmarking
- Examples

What is MPI?

MPI is a specification for a standardized library:
- You use its subroutines
- You link it with your code

History: MPI-1 (1994), MPI-2 (1997), MPI-3 (2012).

Different implementations: MPICH(2), MVAPICH(2), OpenMPI, HP-MPI, ...

MPI-3 contains a more modern Fortran interface (use mpi_f08), less prone to errors. Still very new, but implemented, for instance, in OpenMPI 1.7.
Review: MPI routines we know...

- Startup and exit: MPI_Init, MPI_Finalize
- Information on the processes: MPI_Comm_rank, MPI_Comm_size
- Point-to-point communications: MPI_Send, MPI_Recv, MPI_Irecv, MPI_Isend, MPI_Wait
- Collective communications: MPI_Bcast, MPI_Reduce, MPI_Scatter, MPI_Gather

Example: Hello from N cores

Fortran:

  PROGRAM hello
    USE mpi
    INTEGER ierr, rank, size
    CALL MPI_Init(ierr)
    CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    CALL MPI_Comm_size(MPI_COMM_WORLD, size, ierr)
    WRITE(*,*) 'Hello from processor', rank, 'of', size
    CALL MPI_Finalize(ierr)
  END PROGRAM hello

C:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from processor %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
  }

More on Collectives

- "All" functions: MPI_Allgather, MPI_Allreduce. These combine MPI_Gather/MPI_Reduce with MPI_Bcast: all ranks receive the resulting data.
- MPI_Alltoall: every rank gathers subsequent blocks from every other rank. Works like a matrix transpose.
- "v" functions: MPI_Scatterv, MPI_Gatherv, MPI_Allgatherv, MPI_Alltoallv. Instead of a single count argument, they take counts and displs arrays that specify the counts and array displacements for every rank involved.
- MPI_Barrier: synchronization.
- MPI_Abort: abort with an error code.
Exercise 1: MPI_Alltoall

Log in and compile the file alltoall.f90 or alltoall.c:

  cp /software/workshop/advancedmpi/* .
  module add ifort icc openmpi
  mpicc alltoall.c -o alltoall
  mpif90 alltoall.f90 -o alltoall

There are errors. Can you fix them? Hint: type man MPI_Alltoall to obtain the syntax for the MPI function. To submit the job, use

  msub -q class alltoall.pbs

Exercise 2: Matrix-vector multiplication

Complete the multiplication in mv.f90 or mv.c using MPI_Allgatherv. Rows of the matrix are distributed among processors. Example with rows 1 and 2 in rank 0 and row 3 in rank 1: for the 3x3 matrix A and vector x,

  v = Ax, with v_i = a_i1 x_1 + a_i2 x_2 + a_i3 x_3,

so rank 0 computes v_1 and v_2, rank 1 computes v_3, and MPI_Allgatherv gives every rank the complete result vector v.

Exercise 3: Matrix-vector multiplication

Complete the multiplication in mv2.f90 or mv2.c using MPI_Alltoallv. Columns of the matrix and the input vector are distributed among processors. Example with columns 1 and 2 in rank 0 and column 3 in rank 1: for each row i, rank 0 holds the partial sum a_i1 x_1 + a_i2 x_2 and rank 1 holds a_i3 x_3. After MPI_Alltoallv redistributes the partial sums by row, each rank adds its pieces to obtain its part of v. Note: one could also use MPI_Reduce or MPI_Allreduce here.
More on point-to-point

- MPI_Ssend: synchronous send, forced to complete only when the matching receive has been posted.
- MPI_Bsend: buffered send using a user-provided buffer.
- MPI_Rsend: ready send; must be called after the matching receive was posted. Rarely used.
- MPI_Issend, MPI_Ibsend, MPI_Irsend: non-blocking versions.
- MPI_Sendrecv[_replace]: sends and receives in one call, avoiding deadlock (like MPI_Irecv, MPI_Isend, MPI_Wait).

Note: generally plain MPI_Recv and MPI_Send are best.

Packing and Datatypes

These functions create new datatypes:
- MPI_Type_contiguous, MPI_Type_vector, MPI_Type_indexed: transfer parts of a matrix directly.
- MPI_Type_struct: transfer a struct.
- MPI_Pack, MPI_Unpack: pack and send heterogeneous data.

Note: double precision variables can (on all current machines) contain 53-bit integers without loss of precision, so an alternative is to pack manually into a double precision array.

Pack example

  integer m
  double precision x(m)
  call MPI_Pack_size(1, MPI_INTEGER, MPI_COMM_WORLD, size_int, ierr)
  call MPI_Pack_size(m, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, size_double, ierr)
  bufsize = size_int + size_double
  allocate(buffer(bufsize))
  pos = 0
  if (rank == 0) then
     call MPI_Pack(m, 1, MPI_INTEGER, buffer, bufsize, pos, MPI_COMM_WORLD, ierr)
     call MPI_Pack(x, m, MPI_DOUBLE_PRECISION, buffer, bufsize, pos, &
                   MPI_COMM_WORLD, ierr)
  endif
  call MPI_Bcast(buffer, bufsize, MPI_PACKED, 0, MPI_COMM_WORLD, ierr)
  if (rank > 0) then
     call MPI_Unpack(buffer, bufsize, pos, m, 1, &
                     MPI_INTEGER, MPI_COMM_WORLD, ierr)
     call MPI_Unpack(buffer, bufsize, pos, x, m, &
                     MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
  endif

Communicators

So far we have only used MPI_COMM_WORLD. This communicator can be split into subsets, to allow collective operations on a subset of ranks. Easiest to use: MPI_Comm_split(comm, color, key, newcomm[, ierror]):
- comm: old communicator
- color: all processes with the same color go into the same communicator
- key: determines the rank within the new communicator (can be 0 for automatic determination)
- newcomm: resulting new communicator
Topologies

Topologies group processes in an n-dimensional grid (Cartesian) or graph. Here we restrict ourselves to a Cartesian 2D grid. This helps the programmer and (sometimes) the hardware.

- MPI_Dims_create(p, n, dims): create a balanced n-dimensional grid for p processes in the n-element array dims.
- MPI_Cart_create(oldcomm, n, dims, periodic, reorder, newcomm): creates a new communicator for a grid with n dimensions given in dims, with periodicity given by the array periodic. reorder specifies whether the ranks may change in the new communicator.
- MPI_Cart_rank(comm, coords, rank): given n-dimensional coordinates, return the rank.
- MPI_Cart_coords(comm, rank, n, coords): given the rank, return the n coordinates.

Exercise 4: Matrix-vector multiplication

Complete the multiplication in mv3.f90 or mv3.c using a Cartesian topology. Blocks of the matrix are distributed among processors. Example for the 3x3 matrix A:
- rows 1-2, columns 1-2 in rank 0, coordinates (0,0)
- rows 1-2, column 3 in rank 1, coordinates (0,1)
- row 3, columns 1-2 in rank 2, coordinates (1,0)
- row 3, column 3 in rank 3, coordinates (1,1)

Each rank computes the partial sums of v = Ax for its block (e.g. rank 0 computes a_i1 x_1 + a_i2 x_2 for i = 1, 2). Use an MPI_Reduce call to obtain v. Advantage: both the vectors and the matrix can be distributed in memory.
Hybrid MPI and OpenMP

Most clusters, including Guillimin, contain multicore nodes. For Guillimin, 12 cores per node. Idea: use hybrid MPI and OpenMP: MPI for internode communication, OpenMP intranode, eliminating intranode communication. This may or may not run faster than pure MPI code.

First step: measure efficiency

Insert MPI_Wtime calls to measure wall clock time. Run for various values of p to determine scaling.

Amdahl's law

Let f be the fraction of operations in a computation that must be performed sequentially, where 0 <= f <= 1. The maximum speedup psi achievable by a parallel computer with p processors performing the computation is

  psi <= 1 / (f + (1 - f)/p)

Example: if the maximum speedup for p -> infinity is 285 (i.e. f = 1/285), then for p = 1024, psi ≈ 223.

Karp-Flatt metric

We can also determine the experimentally determined serial fraction e given the measured speedup psi:

  e = (1/psi - 1/p) / (1 - 1/p)

Examples: p = 2, psi = 1.95 gives e ≈ 0.026; p = 1024, psi = 200 gives e ≈ 0.004.
When to consider hybrid?

- If the serial portion is too expensive to parallelize using MPI but can be done using threads: definitely!
- If the problem does not scale well due to excessive communication (e increases significantly as p increases): maybe. Perhaps MPI performance can be improved instead:
  - Fewer messages (less latency).
  - Shorter messages.
  - Replace communication by computation where possible.
  Example: for broadcasts, tree-like communication is much more efficient than sending from the master process directly to all other processes (fewer messages in the master process). Analysts are here to help you optimize your code!
- Otherwise pure MPI can be just as fast. Also, you must look out for OpenMP pitfalls: caching, false sharing, synchronization overhead, races.

Example job script for Guillimin

For 48 CPU cores on 4 nodes with 12 cores each:

  #!/bin/bash
  #PBS -l nodes=4:ppn=12
  #PBS -V
  #PBS -N jobname
  cd $PBS_O_WORKDIR
  export IPATH_NO_CPUAFFINITY=1
  export OMP_NUM_THREADS=12
  mpiexec -n 4 -npernode 1 ./yourcode

The particular features of this submission script are as follows:
- export IPATH_NO_CPUAFFINITY=1: tells the underlying software not to pin each process to one CPU core, which would effectively disable OpenMP parallelism.
- export OMP_NUM_THREADS=12: specifies the number of threads used for OpenMP for each of the 4 processes.
- mpiexec -n 4 -npernode 1 ./yourcode: starts the program yourcode, compiled with MPI, in parallel on 4 nodes, with 1 process per node.
OpenMP example: parallel for (C)

  void addvectors(const int *a, const int *b, int *c, const int n)
  {
    int i;
  #pragma omp parallel for
    for (i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }

OpenMP example: parallel do (Fortran)

  subroutine addvectors(a, b, c, n)
    integer n, a(n), b(n), c(n)
    integer i
  !$OMP PARALLEL DO
    do i = 1, n
       c(i) = a(i) + b(i)
    enddo
  !$OMP END PARALLEL DO
  end subroutine

In both versions i is automatically made private because it is the loop variable; all other variables are shared. The loop is split between threads: for example, for n=10 and two threads, thread 0 does indices 0 to 4 and thread 1 does indices 5 to 9 (1 to 5 and 6 to 10 in the Fortran version).

Exercise 5: Matrix-vector multiplication

Consider again mv.c and mv.f90. Add a parallel for or parallel do pragma to the inner for/do loop to obtain a hybrid code, and submit. Measure its performance. Optional: do the same for the other two matrix-vector multiplication codes.
Programming with MPI Collectives Jan Thorbecke Type to enter text Delft University of Technology Challenge the future Collectives Classes Communication types exercise: BroadcastBarrier Gather Scatter exercise:
More informationWhat s in this talk? Quick Introduction. Programming in Parallel
What s in this talk? Parallel programming methodologies - why MPI? Where can I use MPI? MPI in action Getting MPI to work at Warwick Examples MPI: Parallel Programming for Extreme Machines Si Hammond,
More informationLecture 9: MPI continued
Lecture 9: MPI continued David Bindel 27 Sep 2011 Logistics Matrix multiply is done! Still have to run. Small HW 2 will be up before lecture on Thursday, due next Tuesday. Project 2 will be posted next
More informationIntroduction to MPI part II. Fabio AFFINITO
Introduction to MPI part II Fabio AFFINITO (f.affinito@cineca.it) Collective communications Communications involving a group of processes. They are called by all the ranks involved in a communicator (or
More informationOpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018
OpenMP 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationMPI 8. CSCI 4850/5850 High-Performance Computing Spring 2018
MPI 8 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationOpenMP and MPI parallelization
OpenMP and MPI parallelization Gundolf Haase Institute for Mathematics and Scientific Computing University of Graz, Austria Chile, Jan. 2015 OpenMP for our example OpenMP generation in code Determine matrix
More informationDistributed Memory Programming With MPI Computer Lab Exercises
Distributed Memory Programming With MPI Computer Lab Exercises Advanced Computational Science II John Burkardt Department of Scientific Computing Florida State University http://people.sc.fsu.edu/ jburkardt/classes/acs2
More informationIntroduction to parallel computing concepts and technics
Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing
More informationA few words about MPI (Message Passing Interface) T. Edwald 10 June 2008
A few words about MPI (Message Passing Interface) T. Edwald 10 June 2008 1 Overview Introduction and very short historical review MPI - as simple as it comes Communications Process Topologies (I have no
More informationCPS343 Parallel and High Performance Computing Project 1 Spring 2018
CPS343 Parallel and High Performance Computing Project 1 Spring 2018 Assignment Write a program using OpenMP to compute the estimate of the dominant eigenvalue of a matrix Due: Wednesday March 21 The program
More informationParallel Computing. Lecture 17: OpenMP Last Touch
CSCI-UA.0480-003 Parallel Computing Lecture 17: OpenMP Last Touch Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Some slides from here are adopted from: Yun (Helen) He and Chris Ding
More informationTopologies in MPI. Instructor: Dr. M. Taufer
Topologies in MPI Instructor: Dr. M. Taufer WS2004/2005 Topology We can associate additional information (beyond the group and the context) to a communicator. A linear ranking of processes may not adequately
More informationMPI Lab. How to split a problem across multiple processors Broadcasting input to other nodes Using MPI_Reduce to accumulate partial sums
MPI Lab Parallelization (Calculating π in parallel) How to split a problem across multiple processors Broadcasting input to other nodes Using MPI_Reduce to accumulate partial sums Sharing Data Across Processors
More informationParallel Programming. OpenMP Parallel programming for multiprocessors for loops
Parallel Programming OpenMP Parallel programming for multiprocessors for loops OpenMP OpenMP An application programming interface (API) for parallel programming on multiprocessors Assumes shared memory
More informationIntroduction to MPI. Ekpe Okorafor. School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014
Introduction to MPI Ekpe Okorafor School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 Topics Introduction MPI Model and Basic Calls MPI Communication Summary 2 Topics Introduction
More informationLecture 4: OpenMP Open Multi-Processing
CS 4230: Parallel Programming Lecture 4: OpenMP Open Multi-Processing January 23, 2017 01/23/2017 CS4230 1 Outline OpenMP another approach for thread parallel programming Fork-Join execution model OpenMP
More information