OpenMP and the PGAS Model. CMSC714, Sept 15, 2015. Guest Lecturer: Ray Chen


Last Time: Message Passing
- Natural model for distributed-memory systems
- Remote ("far") memory must be retrieved before use
- Programmer responsible for specifying:
  - Participants (single peer, collective communication, etc.)
  - Data types (MPI_CHAR, MPI_DOUBLE, etc.)
  - Logical synchronization
- How about shared-memory systems?
  - All processors can directly access any memory

OpenMP: Open Multi-Processing
- Portable interface for multiprocessor systems
  - Agnostic to architecture
- Alternative to POSIX threads
  - Thread management is too low-level for HPC applications
  - Task parallelism vs. data parallelism
- Not a language
  - Extends existing languages (C/C++/Fortran)
  - Compiler responsible for low-level thread management
- Allows for an incremental approach to parallelism

OpenMP Fork/Join Model
- A single master thread executes until a parallel region is reached
- When a parallel region is reached:
  - At the start, N-1 additional threads are spawned (fork)
  - All N threads execute the same code
  - Different code paths are achieved through unique thread IDs
  - At the end, threads are synchronized via a barrier (join)
Based on an image by A1, licensed under CC BY 3.0.

OpenMP Syntax
- Compiler directives
  - Express the parallel structure of the code
  - Clauses modify directives

    #pragma omp <directive> <clauses>
    {
        // Code region affected by directive.
    }

- Run-time library routines

    #include <omp.h>
    int id = omp_get_thread_num();

OpenMP parallel Directive
Begin a new parallel region:

    #include <stdio.h>
    #include <omp.h>

    int main(int argc, char* argv[])
    {
        #pragma omp parallel
        {
            printf("Thread %d of %d.\n",
                   omp_get_thread_num(),
                   omp_get_num_threads());
        }
        return 0;
    }

    $ gcc -fopenmp a.c
    $ ./a.out
    Thread 0 of 4.
    Thread 3 of 4.
    Thread 2 of 4.
    Thread 1 of 4.

OpenMP Data Environment
- All variables have a data-sharing attribute:
  - shared: all threads use the same copy
  - private: each thread uses local storage for this variable
  - firstprivate: like private, but the value is initialized on fork
  - lastprivate: like private, but the value is updated on join
- In general:
  - Variables defined inside a parallel region are private
  - Variables defined outside a parallel region are shared
  - Not always true: more details are available in the spec
- Data-sharing attributes can be overridden

OpenMP Worksharing Constructs
- Used to parallelize loops
  - HPC programs contain computationally intensive loops

    #pragma omp for
    for (i = 0; i < MAX; ++i) { ... }

- Restrictions:
  - Loop iterations must be independent
  - Strict rules for loop initialization, test, and increment
  - Premature termination of loops is not supported

OpenMP Example: Finding Primes

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char* argv[])
    {
        int cap = atoi(argv[1]);
        int count = 0;
        int num = 0;

        #pragma omp parallel for reduction(max:num) schedule(dynamic)
        for (int i = 2; i < cap; ++i) {
            int prime = 1;
            for (int j = 2; prime && j * j <= i; ++j)
                prime = i % j;
            if (prime) {
                #pragma omp atomic
                ++count;
                num = i;
            }
        }
        printf("Prime #%d is %d\n", count, num);
        return 0;
    }

Notes (from the slide annotations):
- Parallelize the outer loop; keep a private copy of num for each thread, and copy the maximum value back when the loop is finished (reduction(max:num)).
- Use a work queue to load-balance the threads (schedule(dynamic)).
- Synchronize the increment of count, since it is a shared variable (atomic).

OpenMP Synchronization
- #pragma omp barrier
  - Threads pause here until all reach this point
- #pragma omp critical
  - Threads execute the region one at a time
- #pragma omp single
  - Only one thread will execute the region
  - Other threads wait in an implied barrier at the end of the region
- #pragma omp master
  - Only the master thread will execute the region
  - Other threads skip the region without waiting

OpenMP Sync Puzzles
Puzzle #1:

    int x = 1;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        { ++x; }
        printf("%d\n", x);
    }

Output: 2 2 (the implied barrier at the end of the single region guarantees both threads see the increment).

OpenMP Sync Puzzles
Puzzle #2:

    int x = 1;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp master
        { ++x; }
        printf("%d\n", x);
    }

Output: 1 2 or 2 2 (master has no implied barrier, so the other thread may print before or after the increment).

OpenMP Sync Puzzles
Puzzle #3:

    int x = 1;
    #pragma omp parallel num_threads(2)
    {
        ++x;
        #pragma omp barrier
        printf("%d\n", x);
    }

Shared variable access is not automatically synchronized: the two unsynchronized increments race, so the behavior is undefined.

Limitations of OpenMP
- Not fool-proof
  - Data dependencies, conflicts, race conditions
  - Gives you plenty of rope to hang yourself with
- OpenMP and Amdahl's Law
  - What is Dagum and Menon's take on this issue?
- Computational problem size
  - USA's #1 supercomputer (Titan) has 32 GB/node
  - World's #1 supercomputer (Tianhe-2) has 64 GB/node
  - What if our datasets are larger than 64 GB?
  - "You're gonna need a bigger boat"

MPI + OpenMP
- Programming in MPI + OpenMP is hard
- Necessary for today's large distributed-memory systems

PGAS Programming Model
- Partitioned Global Address Space
  - Presents an abstracted shared address space
  - Simplifies the complexity of MPI + OpenMP
  - Exposes data/task locality for performance
- Several languages use this model
  - UPC, Fortress, HPF, X10, etc.
  - Ever heard of these?

Chapel Themes
- Designed from scratch
  - Blank slate, as opposed to a language extension
  - Allows for clearer, more expressive language semantics
- Multi-resolution design
  - Higher-level abstractions built from lower-level ones
  - Allows users to mix and match as needed
- Global-view programming
  - Data can be declared using the global problem size
  - Data can be accessed using global indices

Slide used for educational purposes under US Copyright Law, Title 17 U.S.C. Section 107.
