Lecture 6: MPI Programming (II) — Queuing Commands, Introduction to MPI, Collective Communication


Last Time
- Queuing commands
- Introduction to MPI: information enquiry, basic collective communication
- Some embarrassingly parallel examples
- Defined parallel efficiency & speedup

Examples
- 03.c: brute-force method to calculate the summation from 1 to a specified number
- 04.c: integration of a function using the trapezoidal rule
- 05.c: random number generation
All of these examples are known as embarrassingly (or pleasingly) parallel: they exchange a little information at the beginning and a little information at the end. These examples demonstrate excellent parallel efficiency, as will be shown (a code sketch of this pattern follows the Outline below).

Speedup & Parallel Efficiency
[Chart: speedup and parallel efficiency vs. number of processors NP, for "sum to 1E8" and "sum to 1E9", compared against perfect speedup.]
Speedup = T_NR / T_NP
Efficiency = Speedup * NR / NP
where T_NR is the computation time using NR processors, T_NP is the computation time using NP processors, NR is the number of processors in the reference configuration, and NP is the number of processors used for the computation.

Observations
- The program seems correct! The answer doesn't change with the number of processors.
- Very good parallel efficiency is observed!
- These examples (03, 04, 05.c) are known as embarrassingly (or pleasingly) parallel!

Lecture 6: MPI Programming (II)

Outline
- Two famous laws in parallel computing
- More on collective communication
- Basic point-to-point communication
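To make the "little communication at the beginning and end" point concrete, here is a minimal sketch of the trapezoidal-rule pattern from last time. It is not the actual course file 04.c; the integrand f, the interval, and the number of sub-intervals are placeholder choices. Each rank integrates its own sub-interval and a single MPI_Reduce() combines the partial results.

    #include <mpi.h>
    #include <stdio.h>

    /* Function to integrate; x*x is just a placeholder. */
    static double f(double x) { return x * x; }

    /* Trapezoidal rule on [a,b] with n sub-intervals. */
    static double trap(double a, double b, long n) {
        double h = (b - a) / n, sum = 0.5 * (f(a) + f(b));
        for (long i = 1; i < n; i++) sum += f(a + i * h);
        return sum * h;
    }

    int main(int argc, char *argv[]) {
        int rank, np;
        double a = 0.0, b = 1.0;      /* global interval            */
        long   n = 100000000L;        /* global sub-interval count  */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* Each rank works on an equal slice of [a,b]; no communication
           except the final reduction, hence embarrassingly parallel.   */
        double h = (b - a) / np;
        double local  = trap(a + rank * h, a + (rank + 1) * h, n / np);
        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) printf("integral = %.12f\n", global);
        MPI_Finalize();
        return 0;
    }

Run with, e.g., mpirun -np 4 ./a.out; the answer should not change with the number of processes, only the timing.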

Two famous laws in parallel computing
- Amdahl's Law
- Gustafson's Law

Amdahl's Law
Maximum speedup is governed by the serial fraction (non-parallelizable part) of a program. A task can be divided into a parallel (p) and a non-parallel (s, serial) fraction, with s + p = 1:
Speedup = 1 / (s + p/NP) = 1 / (s + (1 - s)/NP)
Efficiency = Speedup / NP = 1 / (s*NP + 1 - s)
- If s = 0: Speedup = NP and Efficiency = 1 (100%).
- As NP grows with s > 0: Speedup approaches 1/s and Efficiency approaches 0.
[Chart: speedup and parallel efficiency (0-100%) vs. NP/NR for a fixed serial fraction.]
Thus, we need to minimize s as much as possible:
s = serial code + communication
where "communication" is the communication overhead, which may increase with NP. One way to reduce the communication part of s is to overlap communication with computation. To be covered next time when we talk about non-blocking communication.

#1 Supercomputer: 129,600 processors
http://upload.wikimedia.org/wikipedia/commons/6/6b/amdahlslaw.png
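The Amdahl formulas above can be tabulated in a few lines of plain C. In the sketch below the serial fraction s = 0.05 is just an assumed value; the program prints the predicted speedup and efficiency for several processor counts and shows the speedup saturating at 1/s.

    #include <stdio.h>

    int main(void) {
        double s = 0.05;                    /* assumed serial fraction */
        int np_list[] = { 1, 2, 4, 8, 16, 64, 256, 1024 };

        for (int i = 0; i < 8; i++) {
            int np = np_list[i];
            double speedup    = 1.0 / (s + (1.0 - s) / np);   /* Amdahl */
            double efficiency = speedup / np;
            printf("NP=%5d  speedup=%7.2f  efficiency=%5.1f%%\n",
                   np, speedup, 100.0 * efficiency);
        }
        return 0;   /* speedup approaches 1/s = 20 as NP grows */
    }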

Gustafson's Law
As the problem to be solved increases in size, the serial fraction decreases and the parallel fraction increases:
Speedup = s + p*NP = s + (1 - s)*NP
Efficiency = Speedup / NP = s/NP + (1 - s)

A Driving Metaphor
Suppose a car is traveling between two cities 60 miles apart, and has already spent one hour traveling half the distance at 30 mph.
Amdahl's Law approximately suggests: No matter how fast you drive the last half, it is impossible to achieve a 90 mph average before reaching the second city. Since it has already taken you 1 hour and you only have a distance of 60 miles total, going infinitely fast you would only achieve 60 mph.
Gustafson's Law approximately states: Given enough time and distance to travel, the car's average speed can always eventually reach 90 mph, no matter how long or how slowly it has already traveled. For example, in the two-cities case this could be achieved by driving at 150 mph for an additional hour.
http://en.wikipedia.org/wiki/Gustafson%27s_law

MPI Summary
Information enquiry: MPI_Init(), MPI_Get_processor_name(), MPI_Get_version(), MPI_Comm_size(), MPI_Comm_rank(), MPI_Wtime(), MPI_Finalize()
Collective communication: MPI_Bcast(), MPI_Reduce()

Collective Communication (II)
Collective communication: MPI_Bcast(), MPI_Reduce(), MPI_Scatter(), MPI_Gather(), MPI_Allgather(), MPI_Allreduce(), MPI_Alltoall(), MPI_Barrier(), MPI_Scan()

06.c: performs a vector inner product
- Broadcast revisited: MPI_Bcast() to broadcast the vectors to all nodes
- Each node decides which portion of the vector to work on
- Perform the calculation
- MPI_Reduce() to sum up the inner products from the different portions of the vector
Thus this is a bad parallel algorithm for performing a vector inner product. Should really use MPI_Scatterv()!
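A sketch of the broadcast-then-reduce inner product described above (variable names and sizes are illustrative, not the actual course file): every rank receives the full vectors, works on only its own slice, and a single MPI_Reduce() combines the partial dot products.

    #include <mpi.h>
    #include <stdio.h>
    #define N 1000000

    static double x[N], y[N];

    int main(int argc, char *argv[]) {
        int rank, np;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        if (rank == 0)                      /* root fills the vectors */
            for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* Broadcast the whole vectors to every rank; this is the
           wasteful step the slide warns about.                      */
        MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(y, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Each rank picks its own slice and forms a partial dot product. */
        long lo = (long)rank * N / np, hi = (long)(rank + 1) * N / np;
        double partial = 0.0, dot = 0.0;
        for (long i = lo; i < hi; i++) partial += x[i] * y[i];

        MPI_Reduce(&partial, &dot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("dot = %f\n", dot);
        MPI_Finalize();
        return 0;
    }

The waste is visible in the code: each rank receives N elements via MPI_Bcast() but reads only N/np of them, which is exactly why the slide recommends MPI_Scatterv() instead.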

[Diagram: each of P0-P3 computes a local sum; the partial sums are combined into allsum.]

MPI Collective Communication
Collective communications can be used to transmit equal-sized arrays or unequal-sized arrays.

MPI_Scatter() / MPI_Gather()
For dividing/grouping and distributing/gathering arrays or vectors (1-D arrays) to/from all nodes within the specified communicator.
- Each node only receives part of the array.
- Each node receives/sends an equal amount of data.
- Effect = gather + broadcast, but better.
int MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)
int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
[Diagram: data layout on each process before and after the scatter operation.]
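A minimal round trip through the two calls whose prototypes are listed above (the array contents and the chunk size of 4 elements per rank are arbitrary choices for illustration): the root scatters equal-sized pieces, every rank reduces its piece to one number, and the root gathers the results.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK 4                       /* elements handed to each rank */

    int main(int argc, char *argv[]) {
        int rank, np;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        int *all = NULL;
        if (rank == 0) {                  /* root owns the full array */
            all = malloc(np * CHUNK * sizeof(int));
            for (int i = 0; i < np * CHUNK; i++) all[i] = i;
        }

        int part[CHUNK];                  /* equal-sized piece per rank */
        MPI_Scatter(all, CHUNK, MPI_INT, part, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

        int local_sum = 0;
        for (int i = 0; i < CHUNK; i++) local_sum += part[i];

        int *sums = NULL;
        if (rank == 0) sums = malloc(np * sizeof(int));
        MPI_Gather(&local_sum, 1, MPI_INT, sums, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (int r = 0; r < np; r++)
                printf("sum from rank %d = %d\n", r, sums[r]);
            free(all); free(sums);
        }
        MPI_Finalize();
        return 0;
    }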

07.c
This is an example demonstrating the use of MPI_Scatter() / MPI_Gather():
- Generate some numbers on the root node
- Scatter the generated numbers onto all nodes
- Each node prints out what it has
- Each node calculates the summation of the data it owns
- Gather the summations from all nodes
- Root prints out the data after gathering

MPI_Scatterv() / MPI_Gatherv()
For dividing/grouping and distributing/gathering arrays or vectors (1-D arrays) to/from all nodes within the specified communicator.
- Each node only receives part of the array.
- Each node does not necessarily receive/send an equal amount of data; *sendcnts and *displs give the count and displacement for each node.
int MPI_Scatterv(void *sendbuf, int *sendcnts, int *displs, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)
int MPI_Gatherv(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int *recvcnts, int *displs, MPI_Datatype recvtype, int root, MPI_Comm comm)

08.c
This is a program performing vector normalization (makes the length of the vector unity).

Other MPI collective functions
int MPI_Alltoall(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Alltoallv(void *sendbuf, int *sendcnts, int *sdispls, MPI_Datatype sendtype, void *recvbuf, int *recvcnts, int *rdispls, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Allgather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Allgatherv(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int *recvcnts, int *displs, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Barrier(MPI_Comm comm)
int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
int MPI_Op_create(MPI_User_function *function, int commute, MPI_Op *op)
    commute: 1 if the operation is commutative (a#b = b#a), 0 if not (a#b != b#a)
int MPI_Op_free(MPI_Op *op)
int MPI_Reduce_scatter(void *sendbuf, void *recvbuf, int *recvcnts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
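A sketch of the vector-normalization idea with the MPI_Scatterv()/MPI_Gatherv() pair introduced above, assuming a vector length that need not divide evenly among the ranks. This is an illustration in the spirit of the example, not the actual course file: counts and displacements are built per rank, the pieces are scattered, a global sum of squares is formed with MPI_Allreduce() so every rank can scale its own piece, and the pieces are gathered back.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(int argc, char *argv[]) {
        int rank, np, N = 10;             /* N need not divide evenly by np */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* Uneven partition: per-rank counts and displacements. */
        int *cnts = malloc(np * sizeof(int)), *displs = malloc(np * sizeof(int));
        for (int r = 0, off = 0; r < np; r++) {
            cnts[r]   = N / np + (r < N % np ? 1 : 0);
            displs[r] = off;  off += cnts[r];
        }

        double *v = NULL;
        if (rank == 0) {                  /* root builds the full vector */
            v = malloc(N * sizeof(double));
            for (int i = 0; i < N; i++) v[i] = i + 1.0;
        }

        double *loc = malloc(cnts[rank] * sizeof(double));
        MPI_Scatterv(v, cnts, displs, MPI_DOUBLE, loc, cnts[rank], MPI_DOUBLE,
                     0, MPI_COMM_WORLD);

        /* Global 2-norm: local sum of squares, then Allreduce so every
           rank can scale its own piece.                                */
        double part = 0.0, ss = 0.0;
        for (int i = 0; i < cnts[rank]; i++) part += loc[i] * loc[i];
        MPI_Allreduce(&part, &ss, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        for (int i = 0; i < cnts[rank]; i++) loc[i] /= sqrt(ss);

        MPI_Gatherv(loc, cnts[rank], MPI_DOUBLE, v, cnts, displs, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);
        if (rank == 0) printf("v[0]=%f  v[N-1]=%f\n", v[0], v[N - 1]);

        free(loc); free(cnts); free(displs);
        if (rank == 0) free(v);
        MPI_Finalize();
        return 0;
    }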

Synchronization: MPI_Barrier()
Used to synchronize: each process blocks until all processes have called this subroutine.
int MPI_Barrier(MPI_Comm comm)
Processes started up on different machines run independently from each other. Therefore, different machines may be running different portions of the code at any instant, and running at different speeds. It is sometimes necessary to ensure all processes are at the same point or at the same pace. For example, when friends go out for a long trip in different cars or motorcycles, it is necessary to set up some synchronization points so that everyone will reach the destination (especially when there are drivers who don't know how to get there). Blocking communication usually results in synchronization.
Example: 09a_noBarrier.c vs. 09b_barrier.c (compare the output)

MPI_Scan()
Performs a scan ("partial reduction") of data; also called "all-prefix-sums".
int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
Example: 10_scan.c, with count = 3 and MPI_SUM (a code sketch follows the summary below):
Before: P0: [0 1 2]  P1: [3 4 5]  P2: [6 7 8]   P3: [9 10 11]
After:  P0: [0 1 2]  P1: [3 5 7]  P2: [9 12 15]  P3: [18 22 26]

Summary
Information enquiry: MPI_Init(), MPI_Get_processor_name(), MPI_Get_version(), MPI_Comm_size(), MPI_Comm_rank(), MPI_Wtime(), MPI_Finalize()
Collective communication: MPI_Bcast(), MPI_Reduce(), MPI_Scatter(), MPI_Gather(), MPI_Allgather(), MPI_Allreduce(), MPI_Barrier(), MPI_Scan(), MPI_Alltoall()

Assignment #4
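A sketch along the lines of the 10_scan.c example described above (not the actual course file): each rank starts with three consecutive integers and MPI_Scan() produces the element-wise inclusive prefix sums across the ranks, reproducing the before/after table on the slide.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, np, send[3], recv[3];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* Rank r starts with [3r, 3r+1, 3r+2], matching the slide's data. */
        for (int i = 0; i < 3; i++) send[i] = 3 * rank + i;

        /* Inclusive prefix sum across ranks, element by element. */
        MPI_Scan(send, recv, 3, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("rank %d: [%d %d %d]\n", rank, recv[0], recv[1], recv[2]);
        MPI_Finalize();
        return 0;
    }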
