Scientific Programming in C XIV. Parallel programming

Susi Lehtola, 11 December 2012
Scientific Programming in C, fall 2012

Introduction

- The development of microchips will soon reach the fundamental physical limits of operation
  - quantum coherence
  - thermal loads
- Clock rates (MHz to GHz) will cease to increase
- Instead, new chips have an ever-increasing number of processor units

Parallel computing

- Using many processors requires specific techniques
  - need to choose parallel algorithms
- High-performance parallel computing has been routine in physics for decades and will become ever more important in the future

Approaches

There are two fundamentally different approaches to parallel computing:
- shared memory parallelization
- distributed memory parallelization

Shared memory

[Figure: CPU1, CPU2, CPU3 and CPU4 attached to one shared Memory]

- As the name states, the memory is shared between the processors
- No need to worry about communication
- The only problem is to make sure that the processes do not interfere with each other
  - ... and that a parallelizable algorithm is used
- Can be as easy as inserting a couple of simple lines into the source code
  - OpenMP

Distributed memory

[Figure: CPU1, CPU2, CPU3 and CPU4, each with its own Memory, connected by a Network]

- Every processor has its own memory
- No implicit access to other processors' memory
- Need to handle communication between processes manually
  - Message Passing Interface (MPI) standard
- Can also run on shared-memory systems

Hybrid parallelization

- When going to very large calculations, the overhead from MPI communication may become significant
  - can be reduced by using shared memory parallelization within the cluster nodes
  - better memory use due to less data duplication

[Figure: four nodes (CPU1 to CPU4, CPU5 to CPU8, CPU9 to CPU12, CPU13 to CPU16), each with its own shared Memory, connected by a Network]

Parallel programming

- As in optimization, it only makes sense to parallelize the CPU-intensive part
- Distributed memory parallelization can also be used to make a larger amount of memory available to the program
- Once you have a parallel algorithm, the implementation is simple
- If time is mostly spent in library routines, you can achieve speedups by using a parallelized version of the library
  - shared-memory parallelized BLAS and LAPACK in, e.g., OpenBLAS and Intel MKL
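
Why only the CPU-intensive part is worth parallelizing can be made quantitative with Amdahl's law. The slides do not state this law; the sketch below is an added illustration, and the function name amdahl_speedup and its parameters are hypothetical. It assumes that the parallel part scales perfectly and that parallelization adds no overhead.

```c
/* Ideal speedup predicted by Amdahl's law.
   p: fraction of the serial run time that can be parallelized (0 <= p <= 1)
   n: number of processor cores (n >= 1)
   Assumes perfect scaling of the parallel part and no overhead. */
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```

For instance, if 90 % of the run time can be parallelized, amdahl_speedup(0.9, 8) gives roughly 4.7 on 8 cores, not 8: the remaining serial 10 % limits the gain.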

Parallel programming, cont'd

- Distributed memory programming using MPI is discussed in the course Tools for high-performance computing (Suurteholaskennan työkalut)
  - lectured every other year
  - next time in fall 2013
- In the following I will give a short introduction to shared-memory parallelization using OpenMP

OpenMP

- The OpenMP API supports multi-platform shared-memory parallel programming in C/C++ and Fortran
  - similar interface in C and Fortran
- It defines a portable, scalable model with a simple and flexible interface for developing parallel applications on platforms from the desktop to the supercomputer

OpenMP, cont'd

Parallelizing with OpenMP can be as easy as changing

    for(i=0; i<N; i++)
      y[i]=f(x[i]);

to

    #pragma omp parallel for
    for(i=0; i<N; i++)
      y[i]=f(x[i]);

and compiling the program with the -fopenmp flag (GCC).

Conditional compilation

Since parallelization using OpenMP generally induces only very small changes to the program, it's wise to use conditional compilation when using OpenMP

    #ifdef _OPENMP
    #pragma omp parallel for
    #endif
    for(i=0; i<N; i++)
      y[i]=f(x[i]);

- If the -fopenmp flag is not given to the compiler, you will get the serial version of the code
- If it is given, you will get the parallel version of the code
- Parallelization induces some overhead, so use the serial version if you only want to use a single CPU core

Conditional compilation, cont'd

Test program:

    #include <stdio.h>

    #ifndef _OPENMP
    #define _OPENMP 0
    #endif

    int main(void) {
      printf("%i\n",_OPENMP);
      return 0;
    }

Compile:

    $ gcc test.c -o serial.x
    $ gcc -fopenmp test.c -o parallel.x

Conditional compilation, cont'd

On Fedora 18:

    $ ./serial.x
    0
    $ ./parallel.x
    $ gcc --version
    gcc (GCC) (Red Hat)
    Copyright (C) 2012 Free Software Foundation, Inc.

On Red Hat Enterprise 5:

    $ ./parallel.x
    $ gcc --version
    gcc (GCC) (Red Hat)
    Copyright (C) 2006 Free Software Foundation, Inc.

The _OPENMP macro is defined as the date (in the form yyyymm) of the OpenMP standard implemented in the compiler (3.0 and 2.5, respectively).
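
The value of _OPENMP has the form yyyymm; for example, 200805 corresponds to OpenMP 3.0 and 200505 to OpenMP 2.5. A minimal sketch of how a program might decode the value follows. The helper names openmp_year, openmp_month and print_openmp_date are hypothetical and not part of the lecture material or of the OpenMP API.

```c
#include <stdio.h>

/* Split a yyyymm value such as _OPENMP into year and month. */
int openmp_year(long version)  { return (int)(version / 100); }
int openmp_month(long version) { return (int)(version % 100); }

/* Print the date of the supported OpenMP specification,
   or a note if the program was compiled without OpenMP. */
void print_openmp_date(void) {
#ifdef _OPENMP
    printf("OpenMP specification date: %d-%02d\n",
           openmp_year(_OPENMP), openmp_month(_OPENMP));
#else
    printf("Compiled without OpenMP support.\n");
#endif
}
```

Calling print_openmp_date() from main and compiling once with and once without -fopenmp shows which branch the compiler selected.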

Shared memory parallelization workflow

[Figure: Serial code, then Parallel code split across thread 1, thread 2, thread 3 and thread 4, then Serial code]

- Parallelization is only used in compute-intensive parts

OpenMP workflow

- A parallel code segment in OpenMP begins with

      #pragma omp parallel

- At this moment the program will launch the set number of threads
  - usually defaults to the number of cores available on the system
  - unless overridden with
    - the OMP_NUM_THREADS environment variable
    - the void omp_set_num_threads(int nth) function in the code
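
As a hedged illustration of setting the thread count from within the code, the sketch below requests a number of threads with omp_set_num_threads and reports how many threads actually ran the parallel region. The function name count_threads is hypothetical. The runtime is allowed to provide fewer threads than requested, so the result is an upper bound rather than a guarantee.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return the number of threads that execute a parallel region
   after 'requested' threads (requested >= 1) have been asked for.
   Without OpenMP the region runs on a single thread. */
int count_threads(int requested) {
    int nth = 1;
#ifdef _OPENMP
    /* Applies to later parallel regions without a num_threads clause */
    omp_set_num_threads(requested);
#else
    (void)requested;  /* unused in the serial build */
#endif
#ifdef _OPENMP
#pragma omp parallel
#endif
    {
#ifdef _OPENMP
        /* One thread records the team size for everyone */
#pragma omp single
        nth = omp_get_num_threads();
#endif
    }
    return nth;
}
```

A call made with omp_set_num_threads in the code takes precedence over the OMP_NUM_THREADS environment variable.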

OpenMP hello world

    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(void) {
    #ifdef _OPENMP
      /* Number of CPUs */
      const int ncpu=omp_get_num_procs();
      /* Maximum number of threads */
      const int maxth=omp_get_max_threads();
    #else
      const int ncpu=1;
      const int maxth=1;
    #endif
      printf("%i processors in system, %i threads allowed.\n",
             ncpu,maxth);

OpenMP hello world, cont'd

    #ifdef _OPENMP
    #pragma omp parallel
    #endif
      { /* Parallel region */
    #ifdef _OPENMP
        /* Number of threads */
        const int nth=omp_get_num_threads();
        /* Index of current thread */
        const int ith=omp_get_thread_num();
    #else
        const int nth=1;
        const int ith=0;
    #endif
        printf("Hello from thread %i/%i.\n",ith,nth);
      }
    }

OpenMP hello world, cont'd

    $ ./serial.x
    1 processors in system, 1 thread allowed.
    Hello from thread 0/1.
    $ ./parallel.x
    8 processors in system, 8 threads allowed.
    Hello from thread 7/8.
    Hello from thread 1/8.
    Hello from thread 2/8.
    Hello from thread 0/8.
    Hello from thread 3/8.
    Hello from thread 5/8.
    Hello from thread 4/8.
    Hello from thread 6/8.
    $ ./parallel.x
    8 processors in system, 8 threads allowed.
    Hello from thread 5/8.
    Hello from thread 3/8.
    Hello from thread 1/8.
    Hello from thread 4/8.
    Hello from thread 6/8.
    Hello from thread 2/8.
    Hello from thread 0/8.
    Hello from thread 7/8.

OpenMP hello world, cont'd

    $ export OMP_NUM_THREADS=4
    $ ./a.out
    8 processors in system, 4 threads allowed.
    Hello from thread 0/4.
    Hello from thread 2/4.
    Hello from thread 3/4.
    Hello from thread 1/4.
    $ ./a.out
    8 processors in system, 4 threads allowed.
    Hello from thread 2/4.
    Hello from thread 0/4.
    Hello from thread 1/4.
    Hello from thread 3/4.

OpenMP hello world, cont'd

- Variables declared within the parallel block are private to each thread
- Variables declared outside the parallel block (but within scope) are shared by default
  - can be declared as private
- Execution order is not guaranteed!
  - may lead to race conditions if shared variables are modified
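
The data-sharing rules above can be sketched as follows. The function name thread_squares and the variables in it are hypothetical; the example only assumes the standard private clause and the omp_get_thread_num and omp_get_num_threads routines. Each thread writes to a different element of the shared array, so no race occurs.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Store t*t in out[t] for every participating thread index t < n.
   Returns the number of threads that ran the parallel region. */
int thread_squares(int *out, int n) {
    int nth = 1;      /* declared outside the block: shared by default */
    int tmp = -1;     /* declared outside, but made private below */
#ifdef _OPENMP
#pragma omp parallel private(tmp)
#endif
    {
        int ith = 0;  /* declared inside the block: private to each thread */
#ifdef _OPENMP
        ith = omp_get_thread_num();
#pragma omp single
        nth = omp_get_num_threads();
#endif
        tmp = ith * ith;      /* every thread writes its own copy of tmp */
        if (ith < n)
            out[ith] = tmp;   /* shared array, but distinct elements */
    }
    return nth;
}
```

Without the private(tmp) clause all threads would write to the same tmp, and out[ith] could receive a value computed by another thread.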

Race condition

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
      int i=0;
    #pragma omp parallel
      i++;
      printf("i=%i\n",i);
    }

    $ for((i=0;i<8;i++)); do ./a.out; done
    i=7
    i=8
    i=8
    i=6
    i=8
    i=7
    i=7
    i=6

A race condition occurs, leading to an incorrect result.

Race condition, cont'd

Correct functionality:
1. thread A reads i=0
2. thread A increments i by 1
3. thread A saves i=1 to memory
4. thread B reads i=1
5. thread B increments i by 1
6. thread B saves i=2

Race condition:
1. thread A reads i=0
2. thread B reads i=0
3. thread A increments i by 1
4. thread A saves i=1
5. thread B increments i by 1
6. thread B saves i=1
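
The two orderings can be replayed deterministically in serial code. The sketch below is an added illustration with hypothetical names: no real threads are involved, the local variables reg_a and reg_b stand in for the registers of threads A and B, and the interleaving is fixed by the statement order.

```c
/* Correct ordering: thread A finishes before thread B starts. */
int increments_in_sequence(void) {
    int i = 0;
    int reg_a, reg_b;     /* per-thread registers */
    reg_a = i;            /* 1. A reads i=0 */
    reg_a = reg_a + 1;    /* 2. A increments by 1 */
    i = reg_a;            /* 3. A saves i=1 */
    reg_b = i;            /* 4. B reads i=1 */
    reg_b = reg_b + 1;    /* 5. B increments by 1 */
    i = reg_b;            /* 6. B saves i=2 */
    return i;
}

/* Racy ordering: B reads i before A has saved its result. */
int increments_interleaved(void) {
    int i = 0;
    int reg_a, reg_b;
    reg_a = i;            /* 1. A reads i=0 */
    reg_b = i;            /* 2. B reads i=0 */
    reg_a = reg_a + 1;    /* 3. A increments by 1 */
    i = reg_a;            /* 4. A saves i=1 */
    reg_b = reg_b + 1;    /* 5. B increments by 1 */
    i = reg_b;            /* 6. B saves i=1: A's update is lost */
    return i;
}
```

The first function returns 2 and the second returns 1, matching the two step lists.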

Race condition, cont'd

In this case, the race condition can be prevented in multiple ways.

    #pragma omp parallel reduction(+:i)
      i++;

- Each thread works on its own private copy of i, and the private copies are summed into the shared i at the end of the parallel region
- Natural example of reduction: summation of a large array

    #include <stdlib.h>

    double sum(const double *x, size_t N) {
      size_t i;
      double res=0.0;
    #ifdef _OPENMP
    #pragma omp parallel for reduction(+:res)
    #endif
      for(i=0; i<N; i++)
        res+=x[i];
      return res;
    }
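
What the reduction clause does can be pictured by writing the reduction by hand. The following is a conceptual sketch, not a description of how any particular compiler implements the clause; the function name sum_manual is hypothetical. Each thread accumulates a private partial sum, and the partial sums are then added to the shared result one thread at a time.

```c
#include <stddef.h>

/* Sum x[0..N-1] with a hand-written reduction. */
double sum_manual(const double *x, size_t N) {
    double res = 0.0;            /* shared result */
#ifdef _OPENMP
#pragma omp parallel
#endif
    {
        double part = 0.0;       /* private partial sum of this thread */
        long i;
        /* Divide the iterations among the threads */
#ifdef _OPENMP
#pragma omp for
#endif
        for (i = 0; i < (long)N; i++)
            part += x[i];
        /* Combine the partial sums, one thread at a time */
#ifdef _OPENMP
#pragma omp critical
#endif
        res += part;
    }
    return res;
}
```

Because floating-point addition is not associative, a parallel sum may differ from the serial one in the last digits, and the difference can vary from run to run.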

Race condition, cont'd

    #pragma omp parallel
    #pragma omp atomic
      i++;

- The memory location of i is protected against multiple simultaneous writes

Example:

    extern size_t index[];
    extern float a[], *p = a, b;
    /* Protect against races among multiple updates. */
    #pragma omp atomic
    a[index[i]] += b;
    /* Protect against races with updates through a. */
    #pragma omp atomic
    p[i] -= 1.0f;
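
A typical use of atomic is a histogram, where several threads may increment the same counter. The sketch below is an added illustration; the function name histogram and its parameters are hypothetical.

```c
/* Count how often each value occurs in data[0..n-1].
   Every data[k] must be a valid index into counts, and counts
   must contain zeros on entry. */
void histogram(const int *data, long n, long *counts) {
    long k;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (k = 0; k < n; k++) {
        /* Two threads may hit the same bin, so protect the update */
#ifdef _OPENMP
#pragma omp atomic
#endif
        counts[data[k]]++;
    }
}
```

The atomic directive applies only to the single update statement that follows it, which is what keeps it cheap compared with a general lock.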

Race condition, cont'd

    #pragma omp parallel
    #pragma omp critical
      i++;

- Only one thread at a time is allowed to execute the statement i++
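
Unlike atomic, a critical section can protect a whole block of statements. The sketch below, with the hypothetical function name argmax, updates two shared variables together, which a single atomic update cannot do. It is an illustration of the directive only: every iteration enters the critical section, so the loop is effectively serialized.

```c
/* Return the index of the largest element of x[0..n-1] (n >= 1).
   If the maximum occurs more than once, which of its indices is
   returned depends on the execution order. */
int argmax(const double *x, int n) {
    int best_idx = 0;
    double best = x[0];
    int i;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 1; i < n; i++) {
        /* The comparison and both assignments must happen together */
#ifdef _OPENMP
#pragma omp critical
#endif
        {
            if (x[i] > best) {
                best = x[i];
                best_idx = i;
            }
        }
    }
    return best_idx;
}
```

In practice one would keep a private best value per thread and enter the critical section only once per thread, in the same spirit as a reduction.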

Cheat sheet

For more features see, e.g., the OpenMP cheat sheet


More information

A Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh

A Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh A Short Introduction to OpenMP Mark Bull, EPCC, University of Edinburgh Overview Shared memory systems Basic Concepts in Threaded Programming Basics of OpenMP Parallel regions Parallel loops 2 Shared memory

More information

High Performance Computing: Tools and Applications

High Performance Computing: Tools and Applications High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 2 OpenMP Shared address space programming High-level

More information

Implementation of Parallelization

Implementation of Parallelization Implementation of Parallelization OpenMP, PThreads and MPI Jascha Schewtschenko Institute of Cosmology and Gravitation, University of Portsmouth May 9, 2018 JAS (ICG, Portsmouth) Implementation of Parallelization

More information

Parallel Programming

Parallel Programming Parallel Programming OpenMP Nils Moschüring PhD Student (LMU) Nils Moschüring PhD Student (LMU), OpenMP 1 1 Overview What is parallel software development Why do we need parallel computation? Problems

More information

15-418, Spring 2008 OpenMP: A Short Introduction

15-418, Spring 2008 OpenMP: A Short Introduction 15-418, Spring 2008 OpenMP: A Short Introduction This is a short introduction to OpenMP, an API (Application Program Interface) that supports multithreaded, shared address space (aka shared memory) parallelism.

More information

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides Parallel Programming with OpenMP CS240A, T. Yang, 203 Modified from Demmel/Yelick s and Mary Hall s Slides Introduction to OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for

More information

OpenMP Fundamentals Fork-join model and data environment

OpenMP Fundamentals Fork-join model and data environment www.bsc.es OpenMP Fundamentals Fork-join model and data environment Xavier Teruel and Xavier Martorell Agenda: OpenMP Fundamentals OpenMP brief introduction The fork-join model Data environment OpenMP

More information

OpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018

OpenMP 2. CSCI 4850/5850 High-Performance Computing Spring 2018 OpenMP 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives

More information

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1 Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently

More information

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen OpenMP - II Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT

More information

Module 11: The lastprivate Clause Lecture 21: Clause and Routines. The Lecture Contains: The lastprivate Clause. Data Scope Attribute Clauses

Module 11: The lastprivate Clause Lecture 21: Clause and Routines. The Lecture Contains: The lastprivate Clause. Data Scope Attribute Clauses The Lecture Contains: The lastprivate Clause Data Scope Attribute Clauses Reduction Loop Work-sharing Construct: Schedule Clause Environment Variables List of Variables References: file:///d /...ary,%20dr.%20sanjeev%20k%20aggrwal%20&%20dr.%20rajat%20moona/multi-core_architecture/lecture%2021/21_1.htm[6/14/2012

More information

OpenMP Programming. Aiichiro Nakano

OpenMP Programming. Aiichiro Nakano OpenMP Programming Aiichiro Nakano Collaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science

More information

OpenMP I. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen

OpenMP I. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen OpenMP I Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS16/17 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT Press,

More information

Alfio Lazzaro: Introduction to OpenMP

Alfio Lazzaro: Introduction to OpenMP First INFN International School on Architectures, tools and methodologies for developing efficient large scale scientific computing applications Ce.U.B. Bertinoro Italy, 12 17 October 2009 Alfio Lazzaro:

More information

EPL372 Lab Exercise 5: Introduction to OpenMP

EPL372 Lab Exercise 5: Introduction to OpenMP EPL372 Lab Exercise 5: Introduction to OpenMP References: https://computing.llnl.gov/tutorials/openmp/ http://openmp.org/wp/openmp-specifications/ http://openmp.org/mp-documents/openmp-4.0-c.pdf http://openmp.org/mp-documents/openmp4.0.0.examples.pdf

More information

OpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means

OpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

[Potentially] Your first parallel application

[Potentially] Your first parallel application [Potentially] Your first parallel application Compute the smallest element in an array as fast as possible small = array[0]; for( i = 0; i < N; i++) if( array[i] < small ) ) small = array[i] 64-bit Intel

More information

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016

MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared

More information

Threaded Programming. Lecture 9: Alternatives to OpenMP

Threaded Programming. Lecture 9: Alternatives to OpenMP Threaded Programming Lecture 9: Alternatives to OpenMP What s wrong with OpenMP? OpenMP is designed for programs where you want a fixed number of threads, and you always want the threads to be consuming

More information

Parallel processing with OpenMP. #pragma omp

Parallel processing with OpenMP. #pragma omp Parallel processing with OpenMP #pragma omp 1 Bit-level parallelism long words Instruction-level parallelism automatic SIMD: vector instructions vector types Multiple threads OpenMP GPU CUDA GPU + CPU

More information

Shared memory parallel computing

Shared memory parallel computing Shared memory parallel computing OpenMP Sean Stijven Przemyslaw Klosiewicz Shared-mem. programming API for SMP machines Introduced in 1997 by the OpenMP Architecture Review Board! More high-level than

More information

OpenMP on Ranger and Stampede (with Labs)

OpenMP on Ranger and Stampede (with Labs) OpenMP on Ranger and Stampede (with Labs) Steve Lantz Senior Research Associate Cornell CAC Parallel Computing at TACC: Ranger to Stampede Transition November 6, 2012 Based on materials developed by Kent

More information

Shared Memory Programming With OpenMP Exercise Instructions

Shared Memory Programming With OpenMP Exercise Instructions Shared Memory Programming With OpenMP Exercise Instructions John Burkardt Interdisciplinary Center for Applied Mathematics & Information Technology Department Virginia Tech... Advanced Computational Science

More information

Parallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Parallel Programming. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Parallel Programming Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Challenges Difficult to write parallel programs Most programmers think sequentially

More information

Parallel Programming with OpenMP. CS240A, T. Yang

Parallel Programming with OpenMP. CS240A, T. Yang Parallel Programming with OpenMP CS240A, T. Yang 1 A Programmer s View of OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for defining multi-threaded shared-memory programs

More information

HPC Practical Course Part 3.1 Open Multi-Processing (OpenMP)

HPC Practical Course Part 3.1 Open Multi-Processing (OpenMP) HPC Practical Course Part 3.1 Open Multi-Processing (OpenMP) V. Akishina, I. Kisel, G. Kozlov, I. Kulakov, M. Pugach, M. Zyzak Goethe University of Frankfurt am Main 2015 Task Parallelism Parallelization

More information

Oracle Developer Studio 12.6

Oracle Developer Studio 12.6 Oracle Developer Studio 12.6 Oracle Developer Studio is the #1 development environment for building C, C++, Fortran and Java applications for Oracle Solaris and Linux operating systems running on premises

More information

OpenMP Tutorial. Seung-Jai Min. School of Electrical and Computer Engineering Purdue University, West Lafayette, IN

OpenMP Tutorial. Seung-Jai Min. School of Electrical and Computer Engineering Purdue University, West Lafayette, IN OpenMP Tutorial Seung-Jai Min (smin@purdue.edu) School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 1 Parallel Programming Standards Thread Libraries - Win32 API / Posix

More information

https://www.youtube.com/playlist?list=pllx- Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG

https://www.youtube.com/playlist?list=pllx- Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG https://www.youtube.com/playlist?list=pllx- Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG OpenMP Basic Defs: Solution Stack HW System layer Prog. User layer Layer Directives, Compiler End User Application OpenMP library

More information

PERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015

PERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015 PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples

More information

Topics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP

Topics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP Topics Lecture 11 Introduction OpenMP Some Examples Library functions Environment variables 1 2 Introduction Shared Memory Parallelization OpenMP is: a standard for parallel programming in C, C++, and

More information

CME 213 S PRING Eric Darve

CME 213 S PRING Eric Darve CME 213 S PRING 2017 Eric Darve PTHREADS pthread_create, pthread_exit, pthread_join Mutex: locked/unlocked; used to protect access to shared variables (read/write) Condition variables: used to allow threads

More information