Parallel Numerical Algorithms
1 Parallel Numerical Algorithms
[8] OpenMP
Parallel Numerical Algorithms / IST / UTokyo
2 PNA16 Lecture Plan

General Topics
1. Architecture and Performance
2. Dependency
3. Locality
4. Scheduling

MIMD / Distributed Memory
5. MPI: Message Passing Interface
6. Collective Communication
7. Distributed Data Structure

MIMD / Shared Memory
8. OpenMP
9. Cache Performance

SIMD / Shared Memory
10. GPU and CUDA
11. SIMD Performance

Special Lectures
5/30 How to use FX10 (Prof. Ohshima)
6/6 Dynamic Parallelism (Prof. Peri)
3 Memory models
- Distributed memory: each processor has its own memory, and processors communicate over a network.
- Shared memory, Uniform Memory Access (UMA): all processors access one shared memory with uniform cost.
- Shared memory, Non-Uniform Memory Access (NUMA): each processor has a local memory module, but all processors can access all of the memory (at non-uniform cost).
4 Parallel Computer Nowadays
A system is a network of nodes (distributed memory, MIMD); a node contains cores and memory (shared memory, MIMD); a core contains PUs and registers (shared memory, SIMD).

Terminology:
- Processor: any computing part (PU, core, or node)
- Computer: may be equivalent to system
- Socket: set of cores on the same die / module
- CPU: can be a socket or a core
- Sequential or serial: antonym of parallel
5 OpenMP
- A frequently used API for shared-memory parallel computing in high performance computing; FX10 supports OpenMP version 3.0.
- Shared memory, global view: you describe the whole data structure and the whole computation.
- It is not automatic parallelization: it parallelizes only where you explicitly parallelize.
- It does not guarantee correctness: it runs just as your code is written (not as you intended).
6 OpenMP Summary
Available on the OpenMP web site.
7 A tiny code with OpenMP

    #include <stdio.h>
    #include <omp.h>    /* include this header file */

    int main(void) {
        omp_set_num_threads(8);      /* the number of threads is set */
        #pragma omp parallel         /* run the next block in parallel (duplicated) */
        {
            printf("I am %d out of %d threads\n",
                   omp_get_thread_num(),    /* my thread ID */
                   omp_get_num_threads());  /* the number of threads (must be 8) */
        }
        return 0;
    }

Sample output (the order differs between runs):

    I am 1 out of 8 threads
    I am 7 out of 8 threads
    I am 0 out of 8 threads
    I am 2 out of 8 threads
    I am 3 out of 8 threads
    I am 4 out of 8 threads
    I am 5 out of 8 threads
    I am 6 out of 8 threads
14 Another tiny code

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_set_num_threads(8);
        int i;
        #pragma omp parallel for     /* parallel for-loop */
        for (i = 0; i < 10; i++)
            printf("I am %d executed by %d\n", i, omp_get_thread_num());
        return 0;
    }

Sample output (the order differs between runs):

    I am 2 executed by 1
    I am 3 executed by 1
    I am 0 executed by 0
    I am 1 executed by 0
    I am 4 executed by 2
    I am 5 executed by 2
    I am 8 executed by 4
    I am 9 executed by 4
    I am 6 executed by 3
    I am 7 executed by 3
17 Disclosing the trick

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_set_num_threads(8);
        int i;
        #pragma omp parallel    /* do the following in parallel (duplicated): all threads run the block */
        {
            printf("I am thread %d\n", omp_get_thread_num());
            #pragma omp for     /* assign one thread per iteration */
            for (i = 0; i < 10; i++)
                printf("I am %d executed by %d\n", i, omp_get_thread_num());
        }
        return 0;
    }

Sample output (the order differs between runs):

    I am thread 0
    I am 0 executed by 0
    I am 1 executed by 0
    I am thread 1
    I am 2 executed by 1
    I am 3 executed by 1
    I am thread 2
    I am 4 executed by 2
    I am 5 executed by 2
    I am thread 5
    I am thread 6
    I am thread 7
    I am thread 4
    I am 8 executed by 4
    I am 9 executed by 4
    I am thread 3
    I am 6 executed by 3
    I am 7 executed by 3
20 Another tiny code (again)
The #pragma omp parallel for in the earlier example is actually a combination of parallel and for: it creates a team of threads and then assigns the loop iterations to them.
21 Start parallel computations
#pragma omp parallel executes the following computation in parallel. The following computation can be a statement or a block. A team of threads is created:

    A;
    #pragma omp parallel
    B;
    C;

A is executed once by the initial thread, B is executed by every thread in the team, and C is executed once after the team joins.
22 Setting number of threads
There are three ways, from weakest to strongest:
1. Environment variable OMP_NUM_THREADS (weak)
2. Function void omp_set_num_threads(int)
3. Clause #pragma omp parallel num_threads(8) (strong)
23 Work-sharing
#pragma omp for: assign one thread per iteration of the following for-loop. The for-loop must be in a simple canonical form, something like for (i = 0; i < n; i++).
#pragma omp single: only one of the threads executes the following computation.
24 Some functions
- void omp_set_num_threads(int): sets the number of threads (for the next parallel execution)
- int omp_get_num_threads(void): returns the number of threads (in the current parallel execution)
- int omp_get_thread_num(void): returns my thread ID
- double omp_get_wtime(void): returns wall-clock time (in seconds)
25 Synchronization
#pragma omp barrier: wait until all the threads reach the barrier.

Timing a code:

    #pragma omp barrier
    t0 = omp_get_wtime();
    do_computations();
    #pragma omp barrier
    time = omp_get_wtime() - t0;
26 BREAK
27 Three Pitfalls
1. Shared and Private Variables
2. Race Condition
3. Weak Consistency
28 Disclosing the trick (again)
Recall the earlier example: every thread prints its thread ID, and the omp for loop then assigns one thread per iteration of the loop over i.
29 What happens if?
Comment out the #pragma omp for:

    #pragma omp parallel
    {
        printf("I am thread %d\n", omp_get_thread_num());
        //#pragma omp for
        for (i = 0; i < 10; i++)
            printf("I am %d executed by %d\n", i, omp_get_thread_num());
    }

Do all threads loop for 10 iterations each? No: the behavior is completely different.
30 What happens if?
The loop variable i is declared before omp parallel, so it is a shared variable: all eight threads read and update the single storage location for i concurrently, and the result is unpredictable.
31 Thread-private variable
Declaring i inside the parallel block makes it a private variable, so each thread gets its own copy of i and loops for its own 10 iterations:

    #pragma omp parallel
    {
        int i;    /* private variable */
        printf("I am thread %d\n", omp_get_thread_num());
        for (i = 0; i < 10; i++)
            printf("I am %d executed by %d\n", i, omp_get_thread_num());
    }
32 Shared and private variables
- Shared variable: the storage is accessible from all threads; updates must be made carefully.
- Private variable: different storage is allocated for each thread; it is allocated when the thread starts and destroyed when the thread stops.
33 Shared or private
Shared by default:
- Global variables
- Static variables
- Variables declared before omp parallel

Private by default:
- Variables declared within omp parallel
- The loop induction variable of omp for

    int q = 1024;                /* global variable: shared */

    int func(int k, int *m) {    /* k and m are private, but *m is n and thus shared */
        int x;                   /* private */
        static int c = 0;        /* static: shared */
        ...
    }

    int main(void) {
        int n = 32;              /* declared before omp parallel: shared */
        #pragma omp parallel
        {
            int z = func(q, &n); /* declared within omp parallel: private */
        }
    }
34 Clauses for parallel construct
#pragma omp parallel [clause[[,] clause] ...]
- private(variable, ...): declares the listed variables as private
- shared(variable, ...): declares the listed variables as shared
- firstprivate(variable, ...): declares the listed variables as private and initializes each copy with the value just before omp parallel
- And more clauses
35 My recommendation
Extract the parallel part as a function, and depend on the default shared/private settings:

    void do_comp(arg0, arg1) {
        #pragma omp parallel
        ...
    }

    do_comp(arg0, arg1);

- Necessary and sufficient information is passed as arguments.
- Accidental side effects are reduced: assignments to the arguments do not affect the caller's variables.
- Side effects (updates of shared variables) are possible only via pointers, global variables, etc.
36 Three Pitfalls
1. Shared and Private Variables
2. Race Condition
3. Weak Consistency
37 Race condition
Count up solutions for each type:

    int counter;
    #pragma omp parallel
    {
        ...
        if (found) {
            type = get_type();
            counter++;
        }
        ...
    }

Race condition: multiple threads access the same variable (here counter) concurrently, so increments can be lost.
38 Reduction clause
reduction(operation: variable) produces code for the reduction operation:

    int counter;
    #pragma omp parallel reduction(+: counter)
    {
        ...
        if (found) {
            type = get_type();
            counter++;
        }
        ...
    }

The clause is applicable only to scalar variables; there is no vector reduction (in OpenMP 3.0).
39 Vector updates
Count up solutions for each type, with one counter per type:

    int counter[n];
    #pragma omp parallel
    {
        ...
        if (found) {
            type = get_type();
            counter[type]++;
        }
        ...
    }

Since the reduction clause does not apply to the array, the concurrent updates to counter[type] are again a race condition.
40 Atomic Operation
#pragma omp atomic executes the next operation as an inseparable single operation. Allowed operations: x binop= expr; x++; ++x; x--; --x;

    int counter[n];
    #pragma omp parallel
    {
        ...
        if (found) {
            type = get_type();
            #pragma omp atomic
            counter[type]++;
        }
        ...
    }
41 Three Pitfalls
1. Shared and Private Variables
2. Race Condition
3. Weak Consistency
42 Producer-Consumer Signal
This is not provided by OpenMP. Don't do the following!

    int data, flag = 0;
    #pragma omp parallel num_threads(2)
    {
        if (producer) {
            data = generate_data();
            flag = 1;
        } else {  /* consumer */
            while (flag == 0);   /* wait until flag is set */
            consume_data(data);
        }
    }
43 Freedom of execution order
- The compiler can reorder operations, as long as it does not change the meaning of a sequential execution.
- The compiler can keep data in registers, without writing it to main memory, for as long as it wants.
- The hardware can reorder operations, as long as it does not change the meaning of a sequential execution.
- The hardware can keep data in cache, without writing it to main memory, for as long as it wants.

In short, the program does not run exactly as it is written!
44 Weak consistency
Consistency: a set of restrictions on the execution of concurrent programs so that the concurrent execution behaves similarly to a sequential one. But every attempt to enforce full consistency has resulted in severe degradation of performance, and yet we need some control over execution order.
Weak consistency: the order of operations is guaranteed only at special commands.
45 Memory synchronization
#pragma omp flush:
- Every memory read and write operation before the flush is made complete.
- No memory read or write operation after the flush has started yet.

Flush is rarely used by itself. It is automatically inserted:
- at barrier, atomic, and lock operations
- at entry to and exit from parallel, critical, and ordered
46 The solution

    int data;
    #pragma omp parallel
    {
        if (producer) {
            data = produce_data();
            #pragma omp barrier
        } else {  /* consumer */
            #pragma omp barrier
            consume_data(data);
        }
    }

Flush alone is not enough: the flush of the producer must happen earlier than the flush of the consumer, and the barrier enforces exactly that ordering.
47 Barrier should be inserted
- Before writing data: wait until the threads that need to read the old data have actually read it.
- After writing data: make the threads that will read the new data wait until the new data has been written.
- Before reading data: wait until the thread that produces the new data has actually produced it.
- After reading data: keep the other threads from updating the data too early.
48 Self check questions
- Explain private and shared variables. Which variables are private/shared by default?
- What is Suda's recommended style?
- What is a race condition? Show a few methods to resolve race conditions.
- What is weak consistency? What does flush do? Where are implicit flushes inserted, and where not?
- Explain why a barrier is needed before and after both reading and writing shared data.
More informationGLOSSARY. OpenMP. OpenMP brings the power of multiprocessing to your C, C++, and. Fortran programs. BY WOLFGANG DAUTERMANN
OpenMP OpenMP brings the power of multiprocessing to your C, C++, and Fortran programs. BY WOLFGANG DAUTERMANN f you bought a new computer recently, or if you are wading through advertising material because
More informationA Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh
A Short Introduction to OpenMP Mark Bull, EPCC, University of Edinburgh Overview Shared memory systems Basic Concepts in Threaded Programming Basics of OpenMP Parallel regions Parallel loops 2 Shared memory
More informationModule 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program
The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives
More informationParallel Computing. Lecture 13: OpenMP - I
CSCI-UA.0480-003 Parallel Computing Lecture 13: OpenMP - I Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Small and Easy Motivation #include #include int main() {
More informationOpenMP Programming. Aiichiro Nakano
OpenMP Programming Aiichiro Nakano Collaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science
More informationParallel Programming with OpenMP
Advanced Practical Programming for Scientists Parallel Programming with OpenMP Robert Gottwald, Thorsten Koch Zuse Institute Berlin June 9 th, 2017 Sequential program From programmers perspective: Statements
More informationOpenMP Overview. in 30 Minutes. Christian Terboven / Aachen, Germany Stand: Version 2.
OpenMP Overview in 30 Minutes Christian Terboven 06.12.2010 / Aachen, Germany Stand: 03.12.2010 Version 2.3 Rechen- und Kommunikationszentrum (RZ) Agenda OpenMP: Parallel Regions,
More informationUvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP
Parallel Programming with Compiler Directives OpenMP Clemens Grelck University of Amsterdam UvA-SARA High Performance Computing Course June 2013 OpenMP at a Glance Loop Parallelization Scheduling Parallel
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationOpenMP programming. Thomas Hauser Director Research Computing Research CU-Boulder
OpenMP programming Thomas Hauser Director Research Computing thomas.hauser@colorado.edu CU meetup 1 Outline OpenMP Shared-memory model Parallel for loops Declaring private variables Critical sections Reductions
More informationCSL 860: Modern Parallel
CSL 860: Modern Parallel Computation Hello OpenMP #pragma omp parallel { // I am now thread iof n switch(omp_get_thread_num()) { case 0 : blah1.. case 1: blah2.. // Back to normal Parallel Construct Extremely
More informationTieing the Threads Together
Tieing the Threads Together 1 Review Sequential software is slow software SIMD and MIMD are paths to higher performance MIMD thru: multithreading processor cores (increases utilization), Multicore processors
More informationOpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means
High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationIntroduction to OpenMP.
Introduction to OpenMP www.openmp.org Motivation Parallelize the following code using threads: for (i=0; i
More informationParallel Programming: OpenMP
Parallel Programming: OpenMP Xianyi Zeng xzeng@utep.edu Department of Mathematical Sciences The University of Texas at El Paso. November 10, 2016. An Overview of OpenMP OpenMP: Open Multi-Processing An
More informationRaspberry Pi Basics. CSInParallel Project
Raspberry Pi Basics CSInParallel Project Sep 11, 2016 CONTENTS 1 Getting started with the Raspberry Pi 1 2 A simple parallel program 3 3 Running Loops in parallel 7 4 When loops have dependencies 11 5
More informationLecture 14: Mixed MPI-OpenMP programming. Lecture 14: Mixed MPI-OpenMP programming p. 1
Lecture 14: Mixed MPI-OpenMP programming Lecture 14: Mixed MPI-OpenMP programming p. 1 Overview Motivations for mixed MPI-OpenMP programming Advantages and disadvantages The example of the Jacobi method
More information15-418, Spring 2008 OpenMP: A Short Introduction
15-418, Spring 2008 OpenMP: A Short Introduction This is a short introduction to OpenMP, an API (Application Program Interface) that supports multithreaded, shared address space (aka shared memory) parallelism.
More informationA brief introduction to OpenMP
A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism
More informationShared memory parallel computing
Shared memory parallel computing OpenMP Sean Stijven Przemyslaw Klosiewicz Shared-mem. programming API for SMP machines Introduced in 1997 by the OpenMP Architecture Review Board! More high-level than
More informationOpenMP. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen
OpenMP Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS16/17 Worksharing constructs To date: #pragma omp parallel created a team of threads We distributed
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationSession 4: Parallel Programming with OpenMP
Session 4: Parallel Programming with OpenMP Xavier Martorell Barcelona Supercomputing Center Agenda Agenda 10:00-11:00 OpenMP fundamentals, parallel regions 11:00-11:30 Worksharing constructs 11:30-12:00
More informationECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications
ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications Work Sharing in OpenMP November 2, 2015 Lecture 21 Dan Negrut, 2015 ECE/ME/EMA/CS 759 UW-Madison Quote of the Day Success consists
More informationThe Art of Parallel Processing
The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a
More informationOpenMP - III. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen
OpenMP - III Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationParallelization, OpenMP
~ Parallelization, OpenMP Scientific Computing Winter 2016/2017 Lecture 26 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de made wit pandoc 1 / 18 Why parallelization? Computers became faster and faster
More informationOPENMP OPEN MULTI-PROCESSING
OPENMP OPEN MULTI-PROCESSING OpenMP OpenMP is a portable directive-based API that can be used with FORTRAN, C, and C++ for programming shared address space machines. OpenMP provides the programmer with
More informationData Handling in OpenMP
Data Handling in OpenMP Manipulate data by threads By private: a thread initializes and uses a variable alone Keep local copies, such as loop indices By firstprivate: a thread repeatedly reads a variable
More informationAllows program to be incrementally parallelized
Basic OpenMP What is OpenMP An open standard for shared memory programming in C/C+ + and Fortran supported by Intel, Gnu, Microsoft, Apple, IBM, HP and others Compiler directives and library support OpenMP
More informationProgramming Shared Memory Systems with OpenMP Part I. Book
Programming Shared Memory Systems with OpenMP Part I Instructor Dr. Taufer Book Parallel Programming in OpenMP by Rohit Chandra, Leo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, Ramesh Menon 2 1 Machine
More informationOpenMP C and C++ Application Program Interface Version 1.0 October Document Number
OpenMP C and C++ Application Program Interface Version 1.0 October 1998 Document Number 004 2229 001 Contents Page v Introduction [1] 1 Scope............................. 1 Definition of Terms.........................
More informationReview. Lecture 12 5/22/2012. Compiler Directives. Library Functions Environment Variables. Compiler directives for construct, collapse clause
Review Lecture 12 Compiler Directives Conditional compilation Parallel construct Work-sharing constructs for, section, single Synchronization Work-tasking Library Functions Environment Variables 1 2 13b.cpp
More informationIntroduction to OpenMP
Introduction to OpenMP Le Yan Scientific computing consultant User services group High Performance Computing @ LSU Goals Acquaint users with the concept of shared memory parallelism Acquaint users with
More informationAlfio Lazzaro: Introduction to OpenMP
First INFN International School on Architectures, tools and methodologies for developing efficient large scale scientific computing applications Ce.U.B. Bertinoro Italy, 12 17 October 2009 Alfio Lazzaro:
More information