COMP528: Multi-core and Multi-Processor Computing

1 COMP528: Multi-core and Multi-Processor Computing Dr Michael K Bane, G14, Computer Science, University of Liverpool m.k.bane@liverpool.ac.uk 17

2 Background Reading
"Using OpenMP The Next Step: Affinity, Accelerators, Tasking and SIMD", van der Pas et al., MIT Press (2017). Homework: read Chapter 1 (a nice recap of v2.5 of OpenMP).
"Using OpenMP: Portable Shared Memory Parallel Programming", Chapman et al., MIT Press (2007). This covers v2.5, so it does not cover tasks, accelerators and some refinements.

3 TODAY: Performance Issues
  synchronisation / nowait
  memory model / omp flush
  first touch / why initialisation parallelism matters
  thread affinity (important) & thread placement (not so important, generally)
  the problem of false sharing
Other programming models
  tasks in OpenMP

4 False Sharing
In reality, (all) modern microprocessors use "cache lines".
This rests on a presumption from history: if we define int x[100]; it is likely that we will access x[0], then soon after x[1], and then x[2].
SO, when fetching variable x, why not fetch a whole "cache line" of memory (presumably including x, y, z) so that we have the data handy?

5 False Sharing
Fetching a cache line is (generally) good for serial code, with limited (if any) negative side effects: we can always just fetch another cache line when we need it.
BUT for threaded applications:
  one thread could fetch a cache line and update just one element, which means the line is now "dirty"
  any other threads that have fetched that cache line then have to throw away their copy and re-fetch the whole line to get whichever element they actually need
  we could have many threads "invalidating" elements on the same cache line

6 False Sharing
We could have many threads "invalidating" elements on the same cache line ==> thrashing of the cache: several threads have to keep re-fetching the same cache line much more frequently, solely because another thread updated a single item on that line.
TYPICAL WARNING SIGNS
  time of "par for" increases with #threads despite the work being independent and load balanced (or very bad efficiency even on low #threads)
  profiling/valgrind shows a very high number of cache fetches for the given work
  sharing arrays, particularly when promoting a scalar X to an array X[myThread] to allow multi-thread access to X

7 Matrix-vector batch timings
[Table: S(p) and E(p) versus #threads for three matrix shapes: 80K x 8, 800 x 800 and 8 x 80K]
3 input data sizes, all with the same number of FLOPs (640K*2), but 3 different rows:cols ratios.
Why does one of them not scale? FALSE SHARING
  data is fetched as a cache line
  lots of invalidations for 8 x 80K, since in this case y[i] += A[i][j]*x[j] has only 8 floats on the LHS = 32 bytes, which fit in a single cache line, i.e. all threads are writing to the same line
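One way to cut down the writes to the shared vector y can be sketched as follows. This is a minimal illustration, not the code used for the timings: the function name `matvec` and the flattened row-major layout of A are assumptions. Each row's dot product is accumulated in a private scalar, so y[i] is written once per row rather than once per column.

```c
#include <stddef.h>

/* y = A * x for a rows-by-cols matrix stored row-major in A.
 * Hypothetical helper: name and layout are assumptions for illustration. */
void matvec(size_t rows, size_t cols, const float *A, const float *x, float *y)
{
    #pragma omp parallel for default(none) shared(rows, cols, A, x, y)
    for (long i = 0; i < (long)rows; i++) {
        float sum = 0.0f;               /* private: declared inside the loop body */
        for (size_t j = 0; j < cols; j++) {
            sum += A[(size_t)i * cols + j] * x[j];
        }
        y[i] = sum;                     /* one write to the shared vector per row */
    }
}
```

With only 8 rows the elements of y still share a cache line, but each thread now touches that line once per row instead of 80K times.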

8 DIY Reduction ==> False Sharing?
demo: ~/OMP/falsesharing
  not good: timmattson.c
  potential fix: timmattson_padding.c
qsub/qrsh tim.sh runs each (say) 500 times
  grep Milli nopad.txt | sort --key=7 -n | head    to get fastest pre-fix
  grep Milli pad.txt | sort --key=7 -n | head      to get fastest post-fix

9 Preventing False Sharing
  padding: artificially increase the length of the allocated vector to prevent more than one thread accessing any cache line, e.g. x[numthreads] becomes x[numthreads][pad]
  different strides through the data / use of schedule(?,??) to force threads to use an improved pattern (different cache lines per thread) when accessing the allocated vector
  use a local x (i.e. private) for all the parallel work, then assign to y, shared, once finished
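The padding idea can be sketched as below. This is an illustration only and not the contents of timmattson_padding.c: the names `padded_sum`, `MAX_THREADS` and `PAD` are assumptions, and PAD = 16 doubles assumes cache lines of at most 128 bytes. Each thread accumulates into element 0 of its own padded row, so no two threads write to the same cache line.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 64   /* assumed upper limit on the team size */
#define PAD 16           /* 16 doubles = 128 bytes between per-thread slots */

/* Sum the n values in v using one padded partial sum per thread. */
double padded_sum(const double *v, size_t n)
{
    double partial[MAX_THREADS][PAD] = {{0.0}};
    int want = 1;
#ifdef _OPENMP
    want = omp_get_max_threads();
    if (want > MAX_THREADS) want = MAX_THREADS;
#endif

    #pragma omp parallel num_threads(want) shared(partial, v, n)
    {
        int id = 0, nthr = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthr = omp_get_num_threads();
#endif
        /* cyclic distribution of iterations over the threads */
        for (size_t i = (size_t)id; i < n; i += (size_t)nthr) {
            partial[id][0] += v[i];   /* only element 0 of each padded row is used */
        }
    }

    double total = 0.0;
    for (int t = 0; t < want; t++) total += partial[t][0];
    return total;
}
```

Setting PAD to 1 turns this back into the unpadded x[numthreads] layout, which makes it easy to time the two variants against each other.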

10 Tuesday Morning: general lessons from Assignment 1
Performance Issues
  a) thread placement/affinity
  b) memory placement
  c) dangers of false sharing
DATA DEPENDENCIES
TASKS
SIMD and vectorisation
Directives for Programming Accelerators

11 DEPENDENCIES

12 OpenMP Work-Sharing for loops but which can we parallelise?

13 Fibonacci
    x[0] = 1; x[1] = 1;
    for (i=2; i<n; i++) {
        x[i] = x[i-1] + x[i-2];
    }
x[0] = 1, x[1] = 1, x[2] = 2, x[3] = 3, x[4] = 5, x[5] = 8
Works fine when iterating sequentially.

14 Fibonacci
    x[0] = 1; x[1] = 1;
    #pragma omp parallel for \
        default(none) private(i) \
        shared(n) shared(x)
    for (i=2; i<n; i++) {
        x[i] = x[i-1] + x[i-2];
    }
x[0] = 1
x[1] = 1
x[2] = x[1] + x[0]
x[3] = x[2] + x[1]
Threads: #0 gets i=2,3; #1 gets i=4,5; #2 gets i=6,7
  #0: X[2]=X[1]+X[0], then X[3]=X[2]+X[1]
i.e. there is a LOOP CARRIED DEPENDENCY (a dependency between different iterations of the loop):
  for i>2, x[i] depends on x[i-1], which may be being calculated on another thread
  for j>3, x[j] depends on x[j-2], which may be being calculated on another thread

15 Fibonacci
Presume #1 is slow off the mark. Even though, from an external viewpoint, we could see that X[2] and X[3] have been computed, it is likely that #0 has done so on a cached copy and that these entries have not yet been updated in the global/shared copy of X, since there is no #pragma omp flush (X).
The loop and its directive are unchanged from slide 14.
Threads: #0 gets i=2,3; #1 gets i=4,5; #2 gets i=6,7
  #0: X[2]=X[1]+X[0], then X[3]=X[2]+X[1]
  #1 (later): X[4]=X[3]+X[2]
i.e. there is a LOOP CARRIED DEPENDENCY (a dependency between different iterations of the loop):
  for i>2, x[i] depends on x[i-1], which may be being calculated on another thread
  for j>3, x[j] depends on x[j-2], which may be being calculated on another thread

16 Fibonacci
BUT MORE REALISTICALLY, #1 will be computing X[4] before #0 has calculated the required X[3], X[2] (and so on).
The loop and its directive are unchanged from slide 14.
Threads: #0 gets i=2,3; #1 gets i=4,5; #2 gets i=6,7
  #0: X[2]=X[1]+X[0]      #1 (at the same time): X[4]=X[3]+X[2]
  #0: X[3]=X[2]+X[1]
i.e. there is a LOOP CARRIED DEPENDENCY (a dependency between different iterations of the loop):
  for i>2, x[i] depends on x[i-1], which may be being calculated on another thread
  for j>3, x[j] depends on x[j-2], which may be being calculated on another thread
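The effect of one thread running ahead can be reproduced deterministically without any threads. The sketch below is a serial stand-in for one bad interleaving, not a real race; the helper names `fib_chunk` and `fib_wrong_order` are assumptions. It runs the chunk i=4,5 (thread #1's share) before the chunk i=2,3 (thread #0's share).

```c
#include <stddef.h>

/* Compute x[i] = x[i-1] + x[i-2] for i in [lo, hi). */
static void fib_chunk(long *x, size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++) {
        x[i] = x[i - 1] + x[i - 2];
    }
}

/* Serial imitation of a bad schedule on a 6-element array:
 * "thread #1" (i=4,5) runs before "thread #0" (i=2,3). */
void fib_wrong_order(long *x)
{
    x[0] = 1; x[1] = 1;
    for (size_t i = 2; i < 6; i++) x[i] = 0;   /* not yet computed */

    fib_chunk(x, 4, 6);   /* reads x[2], x[3] before they have been written */
    fib_chunk(x, 2, 4);   /* the values arrive too late for x[4] and x[5] */
}
```

After the call, x[2] and x[3] hold the correct 2 and 3, but x[4] and x[5] are both 0 instead of 5 and 8. In a real parallel run the outcome varies from run to run, which is what makes such bugs hard to spot.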

17 One has to think about whether a given loop can be parallelised.
The compiler will merely obey the OMP directives without question, even if the logic is wrong.

18 5 minutes to determine which & how to parallelise

19 Data Dependency Analysis
Theory ==> not in COMP528
BUT
  need to be able to spot dependencies
  need to be able to consider options for removing dependencies
Quick test: if you ran the loop in the opposite direction, do you get the same results?
  Y => some loop invariance, possibly parallelisable, but ...
  N => definite ordering issues, not straightforwardly parallelisable
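The quick test can be tried out mechanically. The sketch below is illustrative only: the names `same_both_ways`, `body_dependent` and `body_independent` are assumptions. It runs a loop body forwards and backwards over identical data and compares the results. Passing the test is a hint, not a proof, that the loop can be parallelised.

```c
#include <string.h>

#define N 8

/* Loop body with a loop-carried dependency (the Fibonacci recurrence). */
static void body_dependent(long *x, int i)   { x[i] = x[i - 1] + x[i - 2]; }

/* Loop body whose iterations are independent of each other. */
static void body_independent(long *x, int i) { x[i] = 2 * x[i]; }

/* Returns 1 if running i = 2..N-1 forwards and backwards gives the same array. */
int same_both_ways(void (*body)(long *, int))
{
    long fwd[N], bwd[N];
    for (int i = 0; i < N; i++) fwd[i] = bwd[i] = 1;

    for (int i = 2; i < N; i++)      body(fwd, i);   /* original direction */
    for (int i = N - 1; i >= 2; i--) body(bwd, i);   /* opposite direction */

    return memcmp(fwd, bwd, sizeof fwd) == 0;
}
```

Here `same_both_ways(body_dependent)` returns 0 (ordering matters) and `same_both_ways(body_independent)` returns 1.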

20 OpenMP 2.5 now covered: SUMMARY
  Directives, Run-time Functions, Env. Vars
  Worksharing
  Synchronisation (& nowait clause)
  Performance
    OpenMP relaxed memory model
    first touch (& how to use it to gain performance)
    false sharing
NOW OpenMP 4.5: tasks, vectors & accelerators
"Using OpenMP The Next Step: Affinity, Accelerators, Tasking and SIMD", van der Pas et al., MIT Press (2017)

21 TASKS

22 Tasks
Rather than considering the workflow as imperative (stmt, then stmt, then stmt) with work-sharing across a for loop:
can we identify specific tasks, push them to a queue, and let the system run the next appropriate task when resources are available?
  closer to data flow
  appropriate => the user defines some dependencies, and the run-time system (RTS) runs tasks when these are resolved

23 Tasks / Tasking
  think task (as in a bundle of work), not threads
  can be used to exploit parallelism in workloads that are not a set of for loops
  also support while loops & recursion, & some dynamic load balancing
  task parallelism (using data flow)
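A while loop over a linked list is a typical case that a work-sharing for loop cannot express directly. The sketch below is an illustration under assumed names (`struct node`, `square_list`): one thread walks the list and creates a task per node, and any thread in the team may execute those tasks.

```c
#include <stddef.h>

struct node { int value; struct node *next; };

/* Square the value held in every node, using one task per node. */
void square_list(struct node *head)
{
    #pragma omp parallel
    #pragma omp single
    {
        struct node *p = head;
        while (p != NULL) {
            /* firstprivate(p): each task keeps its own copy of the pointer */
            #pragma omp task firstprivate(p)
            p->value = p->value * p->value;

            p = p->next;   /* the creating thread moves on straight away */
        }
    }   /* implicit barrier: all tasks have completed by here */
}
```

Each task touches only its own node, so no further synchronisation is needed in this example.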

24 Tasks: Concept create task

25 Tasks: Concept create task, add to task pool

26 Tasks: Concept create task

27 Tasks: Concept create task, add to pool

28 a little later we may have: create task, add to pool

29-35 BUT we can also run tasks on available resources
[Figure sequence: tasks from the pool placed on threads, drawn against a TIME axis]
  task runs on available thread
  available resource => assign task & repeat
  start new tasks when resources become available
  aha, a dependency: purple depends on all yellow having finished
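A dependency of the "purple waits for yellow" kind can be expressed with the `depend` clause, available from OpenMP 4.0. The sketch below uses assumed names (`yellow_then_purple`, `a`, `b`, `c`): two independent "yellow" tasks each produce a value, and the "purple" task is not started until both have finished.

```c
/* Two "yellow" tasks fill a and b; the "purple" task needs both of them. */
int yellow_then_purple(void)
{
    int a = 0, b = 0, c = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(a) depend(out: a)      /* yellow */
        a = 10;

        #pragma omp task shared(b) depend(out: b)      /* yellow */
        b = 32;

        /* purple: the run-time defers it until both writers above are done */
        #pragma omp task shared(a, b, c) depend(in: a, b) depend(out: c)
        c = a + b;
    }   /* implicit barrier: all three tasks have completed */

    return c;
}
```

The two yellow tasks may run in either order or at the same time; only the purple task is ordered after them.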

36 Outline of syntax
task creation is done in a parallel region, but by a given thread, so usually within a singly threaded region:
  master
  single
  (maybe critical, as long as the same task is not created more than once)
    #pragma omp task
    {
        block becomes task
    }
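Put together, the outline looks roughly like the sketch below. The function name `fill_squares` and the work done in each task are assumptions chosen only to make the skeleton complete.

```c
/* out[i] = i*i for i in [0, n), with one task per element. */
void fill_squares(int *out, int n)
{
    #pragma omp parallel              /* a team of threads is created here */
    {
        #pragma omp single            /* one thread creates all the tasks... */
        {
            for (int i = 0; i < n; i++) {
                #pragma omp task firstprivate(i) shared(out)
                {
                    out[i] = i * i;   /* ...and any thread may run this block */
                }
            }
        }
    }
}
```

Without the `single`, every thread in the team would execute the for loop and each task would be created once per thread.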

37 Running the task
run-time decides!
  might be immediate
  might be deferred
BUT the programmer can use synchronisation to ensure when a task must have been run by (or rather, where to wait until the task/s have run)
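The simplest such synchronisation point is `taskwait`, which makes the encountering task wait until its child tasks have completed. A minimal sketch, with assumed names (`sum_of_two_tasks`, `left`, `right`):

```c
/* Returns the sum of two values produced by separate tasks. */
int sum_of_two_tasks(void)
{
    int left = 0, right = 0, total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(left)
        left = 20;

        #pragma omp task shared(right)
        right = 22;

        /* Whether the tasks ran immediately or were deferred, both have
         * finished once execution continues past this point. */
        #pragma omp taskwait
        total = left + right;
    }

    return total;
}
```

Without the `taskwait`, the addition could read `left` and `right` before the tasks have written them.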

38 Definitions
  task construct: the task directive plus its structured block
  task: the actual instructions (& data) created when a thread runs ("encounters") the task construct; different encounters of the same task construct generate different tasks
  task region: the code encountered during execution of a task

39 Two Examples
Simple, non-recursive: print out adjectives, in any order
  ~/OMP/tasks/storyTelling.c, with & without taskwait
Fibonacci
  ~/OMP/tasks/ex_fib_tasks.c
  ~/OMP/tasks/ex_fib_tasks_DETAIL.c (time it)
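A recursive Fibonacci with tasks commonly takes the shape sketched below. This is not the contents of ex_fib_tasks.c; the names `fib` and `fib_tasks` are assumptions, and it uses the convention fib(0) = 0, fib(1) = 1 rather than the x[0] = x[1] = 1 array from the earlier slides.

```c
/* Recursive Fibonacci; each recursive call becomes a task. */
long fib(int n)
{
    long a, b;
    if (n < 2) return n;

    /* a and b are local variables, so they must be made shared explicitly
     * for the child tasks' results to be visible here. */
    #pragma omp task shared(a) firstprivate(n)
    a = fib(n - 1);

    #pragma omp task shared(b) firstprivate(n)
    b = fib(n - 2);

    #pragma omp taskwait      /* both children must finish before the sum */
    return a + b;
}

/* Entry point: one thread starts the recursion, the team runs the tasks. */
long fib_tasks(int n)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib(n);
    return result;
}
```

Timing this against a plain serial recursion for a moderate n is a quick way to see how much the task overheads cost when each task does so little work.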

40 overheads of tasks can be high: if omp par for fits, use it
  can arrange for a task to be on a team of threads
  can nest tasks
  can use tasks for dynamic load balancing of irregular problems
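One way to balance irregular work while keeping task overheads down is to create one task per block of items rather than one per item. The sketch below is purely illustrative: the Collatz step count is just a stand-in for work whose cost varies a lot between items, and the names `collatz_steps`, `collatz_all` and `grain` are assumptions.

```c
/* Number of Collatz steps needed to reach 1: the cost varies widely with n. */
static int collatz_steps(unsigned long n)
{
    int steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}

/* steps[i] = collatz_steps(i + 1); one task per block of `grain` items. */
void collatz_all(int *steps, int count, int grain)
{
    if (grain < 1) grain = 1;   /* guard against a non-advancing loop */

    #pragma omp parallel
    #pragma omp single
    for (int lo = 0; lo < count; lo += grain) {
        int hi = (lo + grain < count) ? lo + grain : count;

        /* Idle threads pick up whichever block is next in the pool. */
        #pragma omp task firstprivate(lo, hi) shared(steps)
        for (int i = lo; i < hi; i++) {
            steps[i] = collatz_steps((unsigned long)i + 1);
        }
    }
}
```

A larger `grain` means fewer tasks and lower overhead but coarser load balancing; a smaller one gives the opposite trade-off.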

41 Tasks
useful to know; worth consideration in depth
further reading:
  chpt 3: Using OpenMP: The Next Step
  forge.cineca.it/files/scuolacalcoloparallelo_webdav/public/anno-2016/12_Advanced_School/OpenMP_4_tasks.pdf (Cineca)

42 Tuesday Morning: general lessons from Assignment 1
Still to come
Performance Issues
  a) thread placement/affinity
  b) memory placement
  c) dangers of false sharing
DATA DEPENDENCIES
TASKS
SIMD and vectorisation
Directives for Programming Accelerators

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples Multicore Jukka Julku 19.2.2009 1 2 3 4 5 6 Disclaimer There are several low-level, languages and directive based approaches But no silver bullets This presentation only covers some examples of them is

More information

Tasking in OpenMP 4. Mirko Cestari - Marco Rorro -

Tasking in OpenMP 4. Mirko Cestari - Marco Rorro - Tasking in OpenMP 4 Mirko Cestari - m.cestari@cineca.it Marco Rorro - m.rorro@cineca.it Outline Introduction to OpenMP General characteristics of Taks Some examples Live Demo Multi-threaded process Each

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Lecture 9: Performance tuning Sources of overhead There are 6 main causes of poor performance in shared memory parallel programs: sequential code communication load imbalance synchronisation

More information

Some features of modern CPUs. and how they help us

Some features of modern CPUs. and how they help us Some features of modern CPUs and how they help us RAM MUL core Wide operands RAM MUL core CP1: hardware can multiply 64-bit floating-point numbers Pipelining: can start the next independent operation before

More information

Advanced OpenMP. Lecture 11: OpenMP 4.0

Advanced OpenMP. Lecture 11: OpenMP 4.0 Advanced OpenMP Lecture 11: OpenMP 4.0 OpenMP 4.0 Version 4.0 was released in July 2013 Starting to make an appearance in production compilers What s new in 4.0 User defined reductions Construct cancellation

More information

OpenMP 4.0. Mark Bull, EPCC

OpenMP 4.0. Mark Bull, EPCC OpenMP 4.0 Mark Bull, EPCC OpenMP 4.0 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all devices!

More information

Review. Lecture 12 5/22/2012. Compiler Directives. Library Functions Environment Variables. Compiler directives for construct, collapse clause

Review. Lecture 12 5/22/2012. Compiler Directives. Library Functions Environment Variables. Compiler directives for construct, collapse clause Review Lecture 12 Compiler Directives Conditional compilation Parallel construct Work-sharing constructs for, section, single Synchronization Work-tasking Library Functions Environment Variables 1 2 13b.cpp

More information

UPDATES. 1. Threads.v. hyperthreading

UPDATES. 1. Threads.v. hyperthreading UPDATES 1. Threads.v. hyperthreading Hyperthreadingis physical: set by BIOS, whether the operating sees 2 (not 1) logical cores for each physical core. Threads: lightweight processes running. Typically

More information

OpenMP 4.0/4.5. Mark Bull, EPCC

OpenMP 4.0/4.5. Mark Bull, EPCC OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all

More information

OpenMP. Dr. William McDoniel and Prof. Paolo Bientinesi WS17/18. HPAC, RWTH Aachen

OpenMP. Dr. William McDoniel and Prof. Paolo Bientinesi WS17/18. HPAC, RWTH Aachen OpenMP Dr. William McDoniel and Prof. Paolo Bientinesi HPAC, RWTH Aachen mcdoniel@aices.rwth-aachen.de WS17/18 Loop construct - Clauses #pragma omp for [clause [, clause]...] The following clauses apply:

More information

Parallel processing with OpenMP. #pragma omp

Parallel processing with OpenMP. #pragma omp Parallel processing with OpenMP #pragma omp 1 Bit-level parallelism long words Instruction-level parallelism automatic SIMD: vector instructions vector types Multiple threads OpenMP GPU CUDA GPU + CPU

More information

A brief introduction to OpenMP

A brief introduction to OpenMP A brief introduction to OpenMP Alejandro Duran Barcelona Supercomputing Center Outline 1 Introduction 2 Writing OpenMP programs 3 Data-sharing attributes 4 Synchronization 5 Worksharings 6 Task parallelism

More information

Review. Tasking. 34a.cpp. Lecture 14. Work Tasking 5/31/2011. Structured block. Parallel construct. Working-Sharing contructs.

Review. Tasking. 34a.cpp. Lecture 14. Work Tasking 5/31/2011. Structured block. Parallel construct. Working-Sharing contructs. Review Lecture 14 Structured block Parallel construct clauses Working-Sharing contructs for, single, section for construct with different scheduling strategies 1 2 Tasking Work Tasking New feature in OpenMP

More information

Introduction to OpenMP. Lecture 2: OpenMP fundamentals

Introduction to OpenMP. Lecture 2: OpenMP fundamentals Introduction to OpenMP Lecture 2: OpenMP fundamentals Overview 2 Basic Concepts in OpenMP History of OpenMP Compiling and running OpenMP programs What is OpenMP? 3 OpenMP is an API designed for programming

More information

OpenMP Lab on Nested Parallelism and Tasks

OpenMP Lab on Nested Parallelism and Tasks OpenMP Lab on Nested Parallelism and Tasks Nested Parallelism 2 Nested Parallelism Some OpenMP implementations support nested parallelism A thread within a team of threads may fork spawning a child team

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Lecture 2: OpenMP fundamentals Overview Basic Concepts in OpenMP History of OpenMP Compiling and running OpenMP programs 2 1 What is OpenMP? OpenMP is an API designed for programming

More information

Concurrent Programming with OpenMP

Concurrent Programming with OpenMP Concurrent Programming with OpenMP Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico October 11, 2012 CPD (DEI / IST) Parallel and Distributed

More information

COMP528: Multi-core and Multi-Processor Computing

COMP528: Multi-core and Multi-Processor Computing COMP528: Multi-core and Multi-Processor Computing Dr Michael K Bane, G14, Computer Science, University of Liverpool m.k.bane@liverpool.ac.uk https://cgi.csc.liv.ac.uk/~mkbane/comp528 2X So far Why and

More information

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Barbara Chapman, Gabriele Jost, Ruud van der Pas Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology

More information

OpenMP - III. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen

OpenMP - III. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen OpenMP - III Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT

More information

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives

More information

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen

OpenMP - II. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen OpenMP - II Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT

More information

OpenMP. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen

OpenMP. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS16/17. HPAC, RWTH Aachen OpenMP Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS16/17 Worksharing constructs To date: #pragma omp parallel created a team of threads We distributed

More information

Wide operands. CP1: hardware can multiply 64-bit floating-point numbers RAM MUL. core

Wide operands. CP1: hardware can multiply 64-bit floating-point numbers RAM MUL. core RAM MUL core Wide operands RAM MUL core CP1: hardware can multiply 64-bit floating-point numbers Pipelining: can start the next independent operation before the previous result is available RAM MUL core

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

Shared Memory Programming with OpenMP

Shared Memory Programming with OpenMP Shared Memory Programming with OpenMP (An UHeM Training) Süha Tuna Informatics Institute, Istanbul Technical University February 12th, 2016 2 Outline - I Shared Memory Systems Threaded Programming Model

More information

Parallel Programming

Parallel Programming Parallel Programming OpenMP Nils Moschüring PhD Student (LMU) Nils Moschüring PhD Student (LMU), OpenMP 1 1 Overview What is parallel software development Why do we need parallel computation? Problems

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

Programming Shared-memory Platforms with OpenMP. Xu Liu

Programming Shared-memory Platforms with OpenMP. Xu Liu Programming Shared-memory Platforms with OpenMP Xu Liu Introduction to OpenMP OpenMP directives concurrency directives parallel regions loops, sections, tasks Topics for Today synchronization directives

More information

OpenMP 4.0/4.5: New Features and Protocols. Jemmy Hu

OpenMP 4.0/4.5: New Features and Protocols. Jemmy Hu OpenMP 4.0/4.5: New Features and Protocols Jemmy Hu SHARCNET HPC Consultant University of Waterloo May 10, 2017 General Interest Seminar Outline OpenMP overview Task constructs in OpenMP SIMP constructs

More information

Make the Most of OpenMP Tasking. Sergi Mateo Bellido Compiler engineer

Make the Most of OpenMP Tasking. Sergi Mateo Bellido Compiler engineer Make the Most of OpenMP Tasking Sergi Mateo Bellido Compiler engineer 14/11/2017 Outline Intro Data-sharing clauses Cutoff clauses Scheduling clauses 2 Intro: what s a task? A task is a piece of code &

More information

CS4961 Parallel Programming. Lecture 13: Task Parallelism in OpenMP 10/05/2010. Programming Assignment 2: Due 11:59 PM, Friday October 8

CS4961 Parallel Programming. Lecture 13: Task Parallelism in OpenMP 10/05/2010. Programming Assignment 2: Due 11:59 PM, Friday October 8 CS4961 Parallel Programming Lecture 13: Task Parallelism in OpenMP 10/05/2010 Mary Hall October 5, 2010 CS4961 1 Programming Assignment 2: Due 11:59 PM, Friday October 8 Combining Locality, Thread and

More information

OpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means

OpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview

More information

Shared Memory Parallelism - OpenMP

Shared Memory Parallelism - OpenMP Shared Memory Parallelism - OpenMP Sathish Vadhiyar Credits/Sources: OpenMP C/C++ standard (openmp.org) OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openmp/#introduction) OpenMP sc99 tutorial

More information

Parallelising Scientific Codes Using OpenMP. Wadud Miah Research Computing Group

Parallelising Scientific Codes Using OpenMP. Wadud Miah Research Computing Group Parallelising Scientific Codes Using OpenMP Wadud Miah Research Computing Group Software Performance Lifecycle Scientific Programming Early scientific codes were mainly sequential and were executed on

More information

CS 470 Spring Mike Lam, Professor. Advanced OpenMP

CS 470 Spring Mike Lam, Professor. Advanced OpenMP CS 470 Spring 2017 Mike Lam, Professor Advanced OpenMP Atomics OpenMP provides access to highly-efficient hardware synchronization mechanisms Use the atomic pragma to annotate a single statement Statement

More information

Introduction to OpenMP. Lecture 4: Work sharing directives

Introduction to OpenMP. Lecture 4: Work sharing directives Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for

More information

Cray XE6 Performance Workshop

Cray XE6 Performance Workshop Cray XE6 Performance Workshop Multicore Programming Overview Shared memory systems Basic Concepts in OpenMP Brief history of OpenMP Compiling and running OpenMP programs 2 1 Shared memory systems OpenMP

More information

Introduction to OpenMP. Tasks. N.M. Maclaren September 2017

Introduction to OpenMP. Tasks. N.M. Maclaren September 2017 2 OpenMP Tasks 2.1 Introduction Introduction to OpenMP Tasks N.M. Maclaren nmm1@cam.ac.uk September 2017 These were introduced by OpenMP 3.0 and use a slightly different parallelism model from the previous

More information

CS4961 Parallel Programming. Lecture 9: Task Parallelism in OpenMP 9/22/09. Administrative. Mary Hall September 22, 2009.

CS4961 Parallel Programming. Lecture 9: Task Parallelism in OpenMP 9/22/09. Administrative. Mary Hall September 22, 2009. Parallel Programming Lecture 9: Task Parallelism in OpenMP Administrative Programming assignment 1 is posted (after class) Due, Tuesday, September 22 before class - Use the handin program on the CADE machines

More information

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi

More information

Multithreading in C with OpenMP

Multithreading in C with OpenMP Multithreading in C with OpenMP ICS432 - Spring 2017 Concurrent and High-Performance Programming Henri Casanova (henric@hawaii.edu) Pthreads are good and bad! Multi-threaded programming in C with Pthreads

More information

Overview: The OpenMP Programming Model

Overview: The OpenMP Programming Model Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP

More information

Introduction to OpenMP

Introduction to OpenMP 1.1 Minimal SPMD Introduction to OpenMP Simple SPMD etc. N.M. Maclaren Computing Service nmm1@cam.ac.uk ext. 34761 August 2011 SPMD proper is a superset of SIMD, and we are now going to cover some of the

More information

A Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh

A Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh A Short Introduction to OpenMP Mark Bull, EPCC, University of Edinburgh Overview Shared memory systems Basic Concepts in Threaded Programming Basics of OpenMP Parallel regions Parallel loops 2 Shared memory

More information

OpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono

OpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono OpenMP Algoritmi e Calcolo Parallelo References Useful references Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost and Ruud van der Pas OpenMP.org http://openmp.org/

More information

[Potentially] Your first parallel application

[Potentially] Your first parallel application [Potentially] Your first parallel application Compute the smallest element in an array as fast as possible small = array[0]; for( i = 0; i < N; i++) if( array[i] < small ) ) small = array[i] 64-bit Intel

More information

!OMP #pragma opm _OPENMP

!OMP #pragma opm _OPENMP Advanced OpenMP Lecture 12: Tips, tricks and gotchas Directives Mistyping the sentinel (e.g.!omp or #pragma opm ) typically raises no error message. Be careful! The macro _OPENMP is defined if code is

More information

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides

Parallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides Parallel Programming with OpenMP CS240A, T. Yang, 203 Modified from Demmel/Yelick s and Mary Hall s Slides Introduction to OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for

More information

Computational Mathematics

Computational Mathematics Computational Mathematics Hamid Sarbazi-Azad Department of Computer Engineering Sharif University of Technology e-mail: azad@sharif.edu OpenMP Work-sharing Instructor PanteA Zardoshti Department of Computer

More information

CS 5220: Shared memory programming. David Bindel

CS 5220: Shared memory programming. David Bindel CS 5220: Shared memory programming David Bindel 2017-09-26 1 Message passing pain Common message passing pattern Logical global structure Local representation per processor Local data may have redundancy

More information

Parallel Processing Top manufacturer of multiprocessing video & imaging solutions.

Parallel Processing Top manufacturer of multiprocessing video & imaging solutions. 1 of 10 3/3/2005 10:51 AM Linux Magazine March 2004 C++ Parallel Increase application performance without changing your source code. Parallel Processing Top manufacturer of multiprocessing video & imaging

More information

CS 470 Spring Mike Lam, Professor. Advanced OpenMP

CS 470 Spring Mike Lam, Professor. Advanced OpenMP CS 470 Spring 2018 Mike Lam, Professor Advanced OpenMP Atomics OpenMP provides access to highly-efficient hardware synchronization mechanisms Use the atomic pragma to annotate a single statement Statement

More information

OpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato

OpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato OpenMP Application Program Interface Introduction Shared-memory parallelism in C, C++ and Fortran compiler directives library routines environment variables Directives single program multiple data (SPMD)

More information

CSE 160 Lecture 8. NUMA OpenMP. Scott B. Baden

CSE 160 Lecture 8. NUMA OpenMP. Scott B. Baden CSE 160 Lecture 8 NUMA OpenMP Scott B. Baden OpenMP Today s lecture NUMA Architectures 2013 Scott B. Baden / CSE 160 / Fall 2013 2 OpenMP A higher level interface for threads programming Parallelization

More information

ECE 574 Cluster Computing Lecture 10

ECE 574 Cluster Computing Lecture 10 ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular

More information

Progress on OpenMP Specifications

Progress on OpenMP Specifications Progress on OpenMP Specifications Wednesday, November 13, 2012 Bronis R. de Supinski Chair, OpenMP Language Committee This work has been authored by Lawrence Livermore National Security, LLC under contract

More information

Alfio Lazzaro: Introduction to OpenMP

Alfio Lazzaro: Introduction to OpenMP First INFN International School on Architectures, tools and methodologies for developing efficient large scale scientific computing applications Ce.U.B. Bertinoro Italy, 12 17 October 2009 Alfio Lazzaro:

More information

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant

More information


