Parallel Computing Using OpenMP/MPI. Presented by - Jyotsna 29/01/2008


Serial Computing Solving a problem sequentially, one step at a time on a single processor.

Parallel Computing Solving a problem in parallel, with many computations carried out simultaneously.

Parallel Computer Memory Architecture Shared Memory

Parallel Computer Memory Architecture Shared Memory Multiple processors can operate independently while sharing the same memory resources. Changes that one processor makes to a memory location are visible to all other processors. Shared memory machines can be divided into two main classes based on memory access times: UMA and NUMA.

Parallel Computer Memory Architecture Shared Memory - Advantages A global address space gives a user-friendly programming perspective to memory. Data sharing between tasks is fast and uniform. Shared Memory - Disadvantages Lack of scalability: adding more CPUs increases traffic on the shared memory path. Synchronization is the programmer's responsibility.

Parallel Computer Memory Architecture Distributed Memory

Parallel Computer Memory Architecture Distributed Memory Processors have their own local memory. Memory addresses in one processor do not map to another processor, so there is no concept of a global address space or cache coherency. Each processor operates independently; access to data in another processor's memory is done via explicit communication.

Parallel Computer Memory Architecture Distributed Memory - Advantages Memory is scalable: total memory grows in proportion to the number of processors. Rapid access to a processor's own memory, without interference or the overhead of maintaining cache coherency. Cost effectiveness: can use commodity, off-the-shelf processors and networking.

Parallel Computer Memory Architecture Distributed Memory - Disadvantages The programmer is responsible for the details of data communication among processors. Existing data structures based on global memory are difficult to map onto this memory organization. Non-uniform memory access (NUMA) times.

Parallel Computer Memory Architecture Hybrid Distributed-Shared Memory

Parallel Programming Models Shared Memory: OpenMP. Threads: OpenMP, POSIX Threads (known as Pthreads). Message Passing: MPI. Data Parallel: F90, F95, HPF. Hybrid: OpenMP and MPI together.

What is OpenMP? An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism. Open specifications for Multi Processing. Portable: available on Unix and Windows NT. Standardized: expected to become an ANSI standard eventually.

Goals for OpenMP Standardization: provide a standard among a variety of shared memory architectures/platforms. Lean and mean: establish a simple, limited set of directives for programming shared memory machines; significant parallelism can be implemented using just 3 or 4 directives. Ease of use: provide the capability to incrementally parallelize a serial program (see the sketch below), and to implement both coarse-grain and fine-grain parallelism. Portability: supports Fortran (77, 90, and 95), C, and C++.
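To make incremental parallelization concrete, here is a minimal sketch in C: the loop below is a correct serial program without the directive and a parallel one with it. The array size N and the scale() routine are illustrative assumptions, not part of the slides.

#include <omp.h>

#define N 1000000

void scale(double *a, double s)
{
    /* One directive parallelizes the loop; deleting it restores the
       serial program - the incremental approach the slide describes.
       Compile with your compiler's OpenMP flag, e.g. -fopenmp. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] *= s;
}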

OpenMP Directives PARALLEL construct. Work-Sharing constructs. Combined Parallel Work-Sharing constructs. Synchronization constructs. THREADPRIVATE directive.

OpenMP Directives PARALLEL Region construct A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct.

#pragma omp parallel [clause ...] newline
    if (scalar_expression)
    private (list)
    shared (list)
    default (shared | none)
    firstprivate (list)
    reduction (operator: list)
    copyin (list)
    num_threads (integer-expression)
  structured_block
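A minimal sketch of the PARALLEL construct using a few of the clauses listed above; the thread count of 4 and the printed message are illustrative assumptions.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int base = 10;   /* ordinary variable in the serial part */

    /* firstprivate gives each thread a private copy of base,
       initialized to its value (10) at region entry. */
    #pragma omp parallel num_threads(4) firstprivate(base)
    {
        int tid = omp_get_thread_num();
        printf("thread %d sees base = %d\n", tid, base);
    }   /* implicit barrier at the end of the parallel region */
    return 0;
}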

OpenMP Directives Work-Sharing Constructs Divides the execution of the enclosed code region among the members of the team that encounter it. Does not launch new threads. There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of one.

OpenMP Directives Work-Sharing constructs DO/for: divides loop iterations among the threads (data parallelism). SECTIONS: assigns independent blocks of code to different threads (functional parallelism). SINGLE: serializes a section of code so that only one thread executes it.
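A minimal sketch of the first two forms in C: for splits loop iterations among threads (data parallelism), while sections gives each block to a different thread (functional parallelism). The taskA/taskB routines and the array size are illustrative assumptions.

#include <omp.h>

#define N 1000

void taskA(void) { /* independent work */ }
void taskB(void) { /* independent work */ }

void work(double *a)
{
    #pragma omp parallel
    {
        #pragma omp for        /* iterations split across the team */
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;
        /* implied barrier at the end of the for construct */

        #pragma omp sections   /* each section runs on one thread */
        {
            #pragma omp section
            taskA();
            #pragma omp section
            taskB();
        }
    }
}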

OpenMP Directives Synchronization constructs MASTER: only the master thread executes the enclosed code. CRITICAL: only one thread at a time may execute the enclosed code. BARRIER: all threads wait until every thread reaches the barrier, then proceed together. ATOMIC: a specific memory location must be updated atomically. ORDERED: iterations of a loop are executed in the same order as in serial execution.
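A minimal sketch of MASTER, CRITICAL, ATOMIC, and BARRIER in C; the shared counter and sum are illustrative assumptions.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int counter = 0;
    double sum = 0.0;

    #pragma omp parallel
    {
        #pragma omp critical   /* one thread at a time in this block */
        counter++;

        #pragma omp atomic     /* atomic update of one memory location */
        sum += 1.0;

        #pragma omp barrier    /* every thread waits here */

        #pragma omp master     /* only the master thread executes this */
        printf("counter = %d, sum = %.1f\n", counter, sum);
    }
    return 0;
}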

What is MPI? MPI = Message Passing Interface. MPI is a specification, not a library: it defines what such a library should be. Practical, portable, efficient, and flexible. Defined for C/C++ and Fortran.

Reasons for using MPI Standardization: the only message passing library that can be considered a standard. Portability: no need to modify your source code when you port your application to a different platform that supports (and is compliant with) the MPI standard. Functionality: over 115 routines are defined in MPI-1 alone. Availability: a variety of implementations are available, both vendor and public domain.

General MPI program structure
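The general structure is the usual initialize / query / compute-and-communicate / finalize skeleton. Here is a minimal sketch in C, assuming at least two processes; the message contents are illustrative.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (size >= 2) {
        if (rank == 0) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }
    }

    MPI_Finalize();                         /* shut the environment down */
    return 0;
}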

Virtual Topology Describes a mapping/ordering of MPI processes into a geometric "shape". 1-D: line, ring. 2-D: mesh, torus. 3-D: 3-D mesh, hypercube.
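A minimal sketch of building one of these shapes, a 2-D torus, with MPI_Cart_create; the 2x2 grid is an illustrative assumption and needs exactly four processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int dims[2]    = {2, 2};   /* a 2-D grid of processes */
    int periods[2] = {1, 1};   /* wrap-around in both dimensions: a torus */
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    if (cart != MPI_COMM_NULL) {
        int rank, coords[2];
        MPI_Comm_rank(cart, &rank);
        MPI_Cart_coords(cart, rank, 2, coords);  /* my grid position */
        printf("rank %d sits at (%d, %d)\n", rank, coords[0], coords[1]);
        MPI_Comm_free(&cart);
    }

    MPI_Finalize();
    return 0;
}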

MPI / OpenMP Pros MPI: portable to distributed and shared memory machines; scales beyond one node; no data placement problem; possible high performance. OpenMP: easy to implement parallelism; implicit synchronization; implicit communication; low latency, high bandwidth; both coarse and fine granularity; dynamic load balancing.

MPI / OpenMP Cons MPI: difficult to develop and debug; high latency, low bandwidth; explicit communication; explicit synchronization; large granularity; dynamic load balancing is difficult. OpenMP: only runs on shared memory machines; scales only within one node; possible data placement problem; average performance; no specific thread order.
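The hybrid model mentioned earlier combines the strengths of both: MPI between nodes, OpenMP threads within each node. A minimal sketch, where the summed quantity is an illustrative assumption:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

#define N 1000000

int main(int argc, char *argv[])
{
    int rank;
    double local = 0.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* OpenMP parallelizes the node-local computation:
       each process's threads together sum N terms of 1/N, so
       local ends up equal to 1.0. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++)
        local += 1.0 / N;

    /* MPI combines the per-process results across nodes. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (equals the number of processes)\n",
               global);

    MPI_Finalize();
    return 0;
}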

Thank you.