Module 10: Open Multi-Processing
Lecture 19: What is Parallelization?


The Lecture Contains:

What is Parallelization?
Perfectly Load-Balanced Program
Amdahl's Law
About Data
What is a Data Race?
Overview of OpenMP
Components of OpenMP
OpenMP Programming Model
OpenMP Directives
OpenMP Format
A Multi-threaded Hello World Program
OpenMP: Terminology and Behavior
The omp for Directive
The sections Directive

What is Parallelization?

Parallelization is the simultaneous use of more than one processor to complete some work. This work can be:

- A collection of program statements
- An algorithm
- A part of a program
- The problem you are trying to solve

Parallel Overhead

Parallelization introduces overhead due to:

- Creation of threads (fork())
- Joining of threads (join())
- Thread synchronization and communication, e.g. critical sections
- False sharing

Overhead increases with the number of threads; efficient parallelization minimizes this overhead. A sketch of how to make this overhead visible follows.
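The fork/join cost can be measured directly by timing an empty parallel region. The following is a minimal sketch, not part of the lecture: the repetition count REPS is an arbitrary illustrative choice; only the pragma and omp_get_wtime() are standard OpenMP.

    /* Sketch: timing an (almost) empty parallel region to expose
     * pure fork/join and barrier overhead.
     * Compile with e.g. gcc -fopenmp overhead.c (flag varies by compiler). */
    #include <stdio.h>
    #include <omp.h>

    #define REPS 1000   /* arbitrary choice for averaging */

    int main(void)
    {
        double start = omp_get_wtime();     /* OpenMP wall-clock timer */
        for (int r = 0; r < REPS; r++) {
            #pragma omp parallel
            {
                /* empty region: the only work is thread fork, join,
                 * and the implied barrier, i.e. pure overhead */
            }
        }
        double elapsed = omp_get_wtime() - start;
        printf("average fork/join overhead: %g seconds\n", elapsed / REPS);
        return 0;
    }

Running this with different OMP_NUM_THREADS settings shows the overhead growing with the number of threads, as stated above.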

Load Balancing

Perfectly Load-Balanced Program

In a perfectly balanced parallel program, no set of processors is idle while another set of processors is doing computation.

Amdahl's Law

Assume the code has a serial fraction f, the part that cannot be parallelized. Let T(1) be the execution time on one processor and T(P) the execution time on P processors, so that T(P) = T(1) * (f + (1 - f)/P). Then the speedup on P processors is given by

    S(P) = T(1) / T(P) = 1 / (f + (1 - f)/P)

As P grows, S(P) approaches 1/f: the serial fraction alone bounds the achievable speedup.

About Data

In a shared-memory parallel program, variables are either shared or private.

private variables
- Visible to one thread only
- Changes made to these variables are not visible to other threads
- Example: local variables in a function that is executed in parallel

shared variables
- Visible to all threads
- Changes made to these variables by one thread are visible to other threads
- Example: global data
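The shared/private distinction can be seen in a few lines of OpenMP C. This is a minimal sketch (the variable names total and local are illustrative, not from the lecture): total is shared, while local, declared inside the region, is private to each thread.

    /* Sketch: shared vs. private variables in a parallel region. */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int total = 0;   /* shared: one copy, visible to all threads */

        #pragma omp parallel shared(total)
        {
            /* private: declared inside the region, one copy per thread */
            int local = omp_get_thread_num();

            #pragma omp critical   /* serialize updates to the shared variable */
            total += local;
        }

        printf("sum of thread IDs = %d\n", total);
        return 0;
    }

Without the critical section, the concurrent updates to the shared variable total would be exactly the data race discussed next.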

What is a Data Race?

When two or more threads in a multithreaded shared-memory program access the same memory location, the program may produce unexpected results. A data race occurs under the following conditions:

- Two or more threads access the same memory location concurrently
- The threads do not hold any locks
- At least one access is a write

A for Loop

    for (int i = 0; i < 8; i++)
        a[i] = a[i] + b[i];

Execution in Parallel With 2 Threads

    Thread 1              Thread 2
    a[0] = a[0] + b[0]    a[4] = a[4] + b[4]
    a[1] = a[1] + b[1]    a[5] = a[5] + b[5]
    a[2] = a[2] + b[2]    a[6] = a[6] + b[6]
    a[3] = a[3] + b[3]    a[7] = a[7] + b[7]

Overview of OpenMP

What is OpenMP? An API that may be used to explicitly direct multi-threaded, shared-memory parallelism.

When to Use OpenMP for Parallelism?

- A loop is not parallelized automatically by the compiler
- Data-dependence analysis cannot determine whether it is safe to parallelize
- The granularity is not sufficient for the compiler
- The compiler lacks the information to parallelize at the highest possible level
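The three race conditions above can be demonstrated with a shared counter. This sketch is illustrative (the names racy and safe and the bound N are assumptions); the fix shown uses #pragma omp atomic, though #pragma omp critical would also work.

    /* Sketch: a data race on a shared counter, and one way to fix it. */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000   /* arbitrary iteration count */

    int main(void)
    {
        long racy = 0, safe = 0;

        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            racy++;              /* unsynchronized read-modify-write:
                                    concurrent access, a write, no locks */

        #pragma omp parallel for
        for (long i = 0; i < N; i++) {
            #pragma omp atomic   /* makes the update indivisible: no race */
            safe++;
        }

        printf("racy = %ld (often less than %d), safe = %ld\n", racy, N, safe);
        return 0;
    }

With more than one thread, racy typically ends up below N because increments are lost, while safe is always exactly N.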

Why OpenMP?

OpenMP is:
- Portable: the API is specified for C/C++ and Fortran, and supported on most major platforms, e.g. Unix and Windows
- Standardized
- Lean and mean
- Easy to use

We should parallelize only when the overhead due to parallelization is less than the speed-up obtained.

Components of OpenMP

OpenMP consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

Directives:
- Parallel regions
- Work sharing
- Synchronization
- Data sharing attributes
- Orphaning

Environment Variables:
- Number of threads
- Scheduling type
- Dynamic thread adjustment
- Nested parallelism

Runtime Environment:
- Number of threads
- Thread ID
- Dynamic thread adjustment
- Nested parallelism
- Timers
- API for locking
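All three component classes can appear in one short program. A minimal sketch (not from the lecture): the directive forks the team, the runtime routine queries the team size, and the environment variable controls it from outside the program.

    /* Sketch: directive + runtime routine + environment variable.
     * Run with e.g.  OMP_NUM_THREADS=4 ./a.out  to set the team size. */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel          /* directive: fork a team of threads */
        {
            #pragma omp single        /* one thread reports the team size */
            printf("team size = %d\n", omp_get_num_threads());
        }
        return 0;
    }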

OpenMP Programming Model

- Shared-memory, thread-based parallelism
- Explicit parallelism
- Fork-join model
- Compiler-directive based
- Nested parallelism support
- Dynamic threads

OpenMP Directives: Fortran Directive Format

Format: sentinel directive [clause ...]
Sentinel: !$OMP, C$OMP, or *$OMP

Example:
    !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(BETA,PI)

General Rules:
- Comments cannot appear on the same line as a directive
- Several Fortran OpenMP directives come in pairs and have the form shown below:

    !$OMP directive
        [structured block of code]
    !$OMP end directive

OpenMP Format: C/C++ Directive Format

Format: #pragma omp directive-name [clause ...] newline

Example:
    #pragma omp parallel default(shared) private(beta,pi)

General Rules:
- Case sensitive
- Each directive applies to at most one succeeding statement, which must be a structured block
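A consequence of the directive-based design is that the same source can be built with or without OpenMP. This sketch is an aside, not from the lecture; it assumes GCC's -fopenmp flag (the flag name varies by compiler), while the _OPENMP macro is defined by every conforming OpenMP compiler.

    /* Sketch: one source file, serial or parallel depending on the build.
     * gcc file.c            -> serial (unknown pragma is ignored)
     * gcc -fopenmp file.c   -> parallel */
    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(void)
    {
        #pragma omp parallel   /* ignored by non-OpenMP compilers */
        {
    #ifdef _OPENMP
            printf("thread %d of an OpenMP team\n", omp_get_thread_num());
    #else
            printf("compiled without OpenMP: serial execution\n");
    #endif
        }
        return 0;
    }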

OpenMP Directives: Parallel Region Construct

A parallel region is a block of code executed by multiple threads simultaneously:

    #pragma omp parallel [clause[[,] clause]...]
    {
        /* this is executed in parallel */
    }   /* (implied barrier) */

Clauses Supported:
- if (scalar-expression)
- private (list)
- firstprivate (list)
- shared (list)
- default (shared | none)
- reduction (operator: list)
- copyin (list)
- num_threads (integer-expression)

Example 1: A Multi-threaded Hello World Program

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            int id = omp_get_thread_num();
            printf("hello(%d) ", id);
        }
        return 0;
    }

Sample Output (with four threads; the ordering is nondeterministic, so two different runs are shown):

    hello(0) hello(3) hello(1) hello(2)
    hello(1) hello(2) hello(0) hello(3)

IF Clause

If an if clause is present, it must evaluate to .TRUE. (Fortran) or non-zero (C/C++) for a team of threads to be created. Otherwise, the region is executed serially by the master thread.

Example 2: A Multi-threaded Hello World Program With Clauses

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int x = 10;
        #pragma omp parallel if(x > 10) num_threads(4)
        {
            int id = omp_get_thread_num();
            printf("hello(%d) ", id);
        }
        return 0;
    }

The num_threads clause requests a certain number of threads; omp_get_thread_num() is a runtime function that returns the thread ID. Here x > 10 is false, so the region executes serially and the program prints only hello(0).

How Does It Work? OpenMP: Terminology and Behavior

- When a thread reaches a parallel directive, it creates a team of threads and becomes the master of the team.
- The master thread always has ID 0 and is part of the team.
- There is an implied barrier at the end of a parallel region.
- Thread adjustment (if enabled) is only done before entering a parallel region.
- Whether parallel regions can be nested depends on the implementation.
- An if clause can be used to guard the parallel region.
- It is illegal to branch into or out of a parallel region.
- Only a single if or num_threads clause is permitted.

Work-Sharing Constructs

A work-sharing construct divides the execution of the enclosed code region among the members of the team:
- They do not launch new threads.
- They must be enclosed in a parallel region.
- There is no implied barrier on entry, but there is an implied barrier on exit (unless nowait is specified).
- They must be encountered by all threads in the team or by none at all.

    C/C++                   Fortran
    #pragma omp for         !$OMP DO ... !$OMP END DO
    #pragma omp sections    !$OMP SECTIONS ... !$OMP END SECTIONS
    #pragma omp single      !$OMP SINGLE ... !$OMP END SINGLE

A sketch using two of these constructs follows the format below.

The omp for Directive

The iterations of the loop are distributed over the members of the team. This assumes a parallel region has already been initiated; otherwise the loop executes serially on a single processor.

Format:

    #pragma omp for [clause[[,] clause]...]
        for-loop

There is an implied barrier at exit unless the nowait clause is specified.
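As promised above, here is a minimal sketch of omp for (with nowait) and single inside one parallel region. The array a and its size N are illustrative assumptions, not from the lecture.

    /* Sketch: work-sharing inside one parallel region. */
    #include <stdio.h>
    #include <omp.h>

    #define N 8   /* arbitrary size */

    int main(void)
    {
        int a[N];

        #pragma omp parallel
        {
            #pragma omp for nowait      /* iterations divided among the team;
                                           nowait removes the exit barrier */
            for (int i = 0; i < N; i++)
                a[i] = i * i;

            #pragma omp single          /* executed by exactly one thread;
                                           implied barrier at its exit */
            printf("single executed by thread %d\n", omp_get_thread_num());
        }   /* implied barrier at the end of the parallel region */

        printf("a[%d] = %d\n", N - 1, a[N - 1]);
        return 0;
    }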

The omp for Directive: Clauses Supported

- schedule (type[, chunk])
- private (list)
- firstprivate (list)
- lastprivate (list)
- shared (list)
- reduction (operator: list)
- collapse (n)
- ordered
- nowait

Example 1: A Parallel For Loop

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            do_some_work(i);
    }

The loop variable i is made private to each thread by default; you could also make it private explicitly with the private(i) clause.

The sections Directive

It specifies that the enclosed section(s) of code are to be divided among the threads in the team:

    #pragma omp sections [clause(s)]
    {
        #pragma omp section
            /* code block 1 */
        #pragma omp section
            /* code block 2 */
        #pragma omp section
            ...
    }

Independent section directives are nested within a sections directive. Each section is executed once by a thread in the team.
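To close, a runnable sketch combining the clauses and constructs above: an omp for loop with schedule and reduction clauses, followed by a sections construct. The array size, chunk size, and the functions task_a/task_b are illustrative assumptions.

    /* Sketch: omp for with schedule + reduction, then sections. */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000   /* arbitrary size */

    static void task_a(void) { printf("task A on thread %d\n", omp_get_thread_num()); }
    static void task_b(void) { printf("task B on thread %d\n", omp_get_thread_num()); }

    int main(void)
    {
        double a[N], sum = 0.0;
        for (int i = 0; i < N; i++) a[i] = 1.0;

        /* each thread accumulates a private partial sum; OpenMP combines
           the partial sums at the end of the loop, avoiding a data race */
        #pragma omp parallel for schedule(static, 100) reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %g\n", sum);

        /* each section is executed once, by some thread in the team */
        #pragma omp parallel sections
        {
            #pragma omp section
            task_a();
            #pragma omp section
            task_b();
        }
        return 0;
    }

The reduction clause is the idiomatic way to parallelize the sum; sections suits a small, fixed number of independent tasks.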