Jukka Julku. Multicore programming: Low-level libraries. Outline: Processes and threads, TBB, MPI, UPC, OpenMP. Examples.

Multicore programming: Low-level libraries. Jukka Julku, 19.2.2009


Disclaimer. There are several low-level libraries, languages and directive-based approaches, but no silver bullets. This presentation only covers some examples of them. OpenMP is today's main subject!

Lowest of the low: threads. POSIX threads, etc. Basically the assembly language of parallel computing. But there are also more advanced threading libraries, like the Boost C++ threading library.

Intel's Threading Building Blocks (TBB). C++ library. Abstracts low-level threading details. Concept of tasks. Uses C++ templates, resembles the STL. Designed for portability and scalability. Includes a task manager to schedule tasks, concurrent data structures, concurrent algorithms, synchronization primitives and memory allocators.

Message Passing Interface (MPI). The de facto standard in distributed-memory computing. A huge number of existing parallel applications already run over MPI! Point-to-point and collective communications, topologies, etc. Multicores? Depends on the library implementation. Hybrid model (e.g. MPI combined with OpenMP) is possible.
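To make the point-to-point model concrete, here is a minimal MPI sketch in C (not from the original slides); it only uses the standard MPI C API (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Send, MPI_Recv, MPI_Finalize), and the transferred value is an illustrative assumption.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process' id    */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

    if (rank == 0) {
        value = 42;
        if (size > 1)   /* point-to-point send from rank 0 to rank 1 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}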

Unified Parallel C (UPC). C extension, a full language. SPMD model. Shared or distributed memory. Single logically partitioned address space. Threads have shared and private data. Work distribution: upc_forall(). Synchronization: barriers, locks.

OpenMP. A portable programming interface for shared-memory parallel computers. Compiler-assisted parallel programming: compiler directives, library routines and environment variables. Provides means to create teams of threads, share work between the threads, declare private or shared data, and synchronize and control exclusive access.

The basic idea. Write (sequential) code. Identify parallel regions: the main effort of parallelization. Use directives to express parallelism. Compile and run.

Motivation. Simple idea. Easy-to-learn syntax. Parallelizes (old) sequential code. Standard technique. Portable. Incremental parallelization. C/C++ and Fortran support.

Execution model. Fork-join model. Based on blocks of parallel computation. A thread team is formed at entry to a block, with a barrier at exit. Blocks may be nested.

Memory model. Shared memory, but threads also have private memory. Data transfer is controlled by directives. Relaxed memory consistency: threads may have their own view of the world between certain synchronization points. Memory is automatically synchronized at barriers, at entry and exit of critical sections, and in lock routines. A flush operation is available for the programmer. Motivation: flexibility of implementation.

Core elements. [Figure omitted. Source: Wikipedia]

Directives: C-language format: #pragma omp directive-name [clause [[,] clause] ...]. Clauses control how the directives are compiled and executed.

Parallel constructs. #pragma omp parallel [clause [[,] clause] ...] structured-block. The most important construct. Defines a parallel section; everything not inside a parallel construct is sequential. Best practice: maximize parallel regions. Spawns a team of threads, but does not do any work sharing! Implicit barrier at the end.

Example.

#include <omp.h>
#include <stdio.h>

int main()
{
    #pragma omp parallel
    {
        printf("Thread %d\n", omp_get_thread_num());
    }
    return 0;
}

Work-sharing constructs. Specify how to share work between the threads of a team. Restrictions: every thread in the team must encounter the construct, or none at all, and the sequence of work-sharing constructs and barriers must be the same for all threads. Barrier at exit by default; it can be removed with the nowait clause (see the sketch below). Orphaning (work-sharing directives outside the lexical scope of the parallel construct) is allowed.
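A minimal sketch (not from the original slides) of the default exit barrier and the nowait clause; the names a, b, n, i are illustrative assumptions, declared outside the region.

#pragma omp parallel shared(a, b, n) private(i)
{
    #pragma omp for nowait      /* no barrier after this loop; threads     */
    for (i = 0; i < n; i++)     /* may move on to the next loop directly,  */
        a[i] = a[i] * 2.0;      /* which is safe because b is independent  */

    #pragma omp for             /* implicit barrier at the end of this one */
    for (i = 0; i < n; i++)
        b[i] = b[i] + 1.0;
}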

Loop construct. #pragma omp for [clause [[,] clause] ...] for-loop. The most common work-sharing construct. Distributes loop iterations among the threads. Limited to loops whose iterations can be counted, i.e. for(init-expr; var relop b; incr-expr).

Example.

#pragma omp parallel shared(A, B, C, n) private(i)
{
    #pragma omp for schedule(dynamic, 100)
    for (i = 0; i < n; i++) {
        A[i] = B[i] + C[i];
    }
}

Sections and single constructs. #pragma omp sections [clause [[,] clause] ...] { [#pragma omp section structured-block] [#pragma omp section structured-block] ... } Distributes independent units of work; each section is executed exactly once. #pragma omp single [clause [[,] clause] ...] structured-block. The block is executed by only one thread. For example: initialization tasks between two loops (a sketch of single follows after the sections example below).

Example.

#pragma omp parallel private(i, j)
{
    #pragma omp sections
    {
        #pragma omp section
        {
            for (i = 0; i < n; i++) {
                C[i] = A[i] + B[i];
            }
        }
        #pragma omp section
        {
            /* m != n */
            for (j = 0; j < m; j++) {
                F[j] = D[j] * E[j];
            }
        }
    }
}
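As promised above, a minimal sketch of single between two loops (not from the original slides); the names a, b, scale, n are illustrative assumptions, declared outside the region.

#pragma omp parallel shared(a, b, scale, n) private(i)
{
    #pragma omp for
    for (i = 0; i < n; i++)
        a[i] = 2.0 * i;          /* first parallel loop                  */

    #pragma omp single            /* executed by exactly one thread;      */
    scale = a[n - 1];             /* implicit barrier after single        */

    #pragma omp for
    for (i = 0; i < n; i++)
        b[i] = a[i] / scale;      /* second parallel loop                 */
}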

Task construct. #pragma omp task [clause [[,] clause] ...] structured-block. #pragma omp taskwait. New in OpenMP 3.0. When the directive is encountered, a new task is created from the block. The task can be executed by any thread of the team. The taskwait construct specifies a wait on the completion of child tasks generated since the beginning of the current task. Where to use: while loops and recursive code.

Example.

void quicksort(int *array, int left, int right)
{
    int i;
    if (left >= right) {
        return;
    }
    i = partition(array, left, right);
    #pragma omp task
    quicksort(array, left, i - 1);
    quicksort(array, i + 1, right);
    #pragma omp taskwait
}

Some memory clauses. shared(list): shared by all threads. private(list): private copy for each thread. firstprivate(list): private, initialized with the variable of the same name. lastprivate(list): private, the last value is accessible after the construct (last: last iteration in sequential order, or last section). default(none | shared | private): used to state the default sharing attribute.
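A minimal sketch (not from the original slides) of firstprivate and lastprivate; the names offset, last, n are illustrative assumptions.

int i, offset = 10, last = 0;

#pragma omp parallel for firstprivate(offset) lastprivate(last)
for (i = 0; i < n; i++) {
    /* each thread starts with its own copy of offset, initialized to 10 */
    last = i + offset;
}
/* here last == (n - 1) + 10, copied back from the sequentially last iteration */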

Synchronization 1/2. The implicit barrier is not always enough. OpenMP defines a set of other synchronization mechanisms for different kinds of critical sections, plus lock routines.

Synchronization 2/2.

#pragma omp barrier
#pragma omp ordered structured-block
#pragma omp critical [(name)] structured-block
#pragma omp atomic statement
#pragma omp master structured-block
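A minimal sketch (not from the original slides) contrasting atomic and critical; the names hits, max_seen, v, n are illustrative assumptions (n declared elsewhere).

int i, v, hits = 0, max_seen = 0;

#pragma omp parallel shared(hits, max_seen, n) private(i, v)
{
    #pragma omp for
    for (i = 0; i < n; i++) {
        v = (i * i) % 101;          /* some per-iteration value            */

        #pragma omp atomic          /* cheap: protects one memory update   */
        hits++;

        #pragma omp critical        /* general mutual exclusion            */
        {
            if (v > max_seen)
                max_seen = v;
        }
    }
}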

Flush. #pragma omp flush [(list)]. Relaxed memory consistency: memory is synchronized at synchronization points (barriers, entry and exit of critical sections, and lock routines). Threads may need a consistent view between the synchronization points; flush can be used to synchronize memory in these situations. Granularity: variables.

Example.

int x, y;

#pragma omp parallel shared(x) private(y)
{
    #pragma omp master
    {
        x = 1;
    }
    y = x;
}

It is not guaranteed that x is 1, as there is no barrier! A flush(x) is needed before the assignment.

Simple, but... OpenMP cannot prevent data races, thread-safety problems, deadlocks, or memory consistency problems. The programmer must ensure that the values are the same between the synchronization points. The compiler can rearrange code, even flush operations! This may lead to hard-to-find problems. But still, OpenMP removes a lot of work and in most cases is quite a simple approach!

Performance. Just parallelizing the code with directives doesn't necessarily bring the expected benefits. Especially with a large number of threads, fine-tuning may be required. Parallelization overhead: thread management, waiting at barriers, etc.
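A minimal sketch (not from the original slides) of timing a parallel loop with omp_get_wtime() to check whether the overhead pays off; the array size and loop body are illustrative assumptions.

#include <omp.h>
#include <stdio.h>

#define N 10000000

double a[N];

int main(void)
{
    int i;
    double t0 = omp_get_wtime();    /* wall-clock time before the region */

    #pragma omp parallel for schedule(static)
    for (i = 0; i < N; i++)
        a[i] = i * 0.5;

    double t1 = omp_get_wtime();    /* wall-clock time after the region  */
    printf("Parallel loop took %f seconds\n", t1 - t0);
    return 0;
}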

Code examples If there is still time left, some more examples

The end. Any questions?

Sources. 1. Chapman, B., Jost, G. and Van der Pas, R.: Using OpenMP: Portable Shared Memory Parallel Programming. 2008. A very good introduction to OpenMP and parallel programming in general. 2. OpenMP specification 3.0. www.openmp.org 3. TBB: http://www.threadingbuildingblocks.org/ 4. MPI standard 2.0. 5. Leiserson, C. and Mirman, I.: How to Survive the Multicore Software Revolution. 6. Krawezik, G. and Cappello, F.: Performance comparison of MPI and three OpenMP programming styles on shared memory multiprocessors.