Parallel Computing on Multi-Core Systems
1 Parallel Computing on Multi-Core Systems
Instructor: Arash Tavakkol
Department of Computer Engineering, Sharif University of Technology
Spring 2016
2 Optimization Techniques in OpenMP Programs
Some slides come from Dr. Cristina and Professor Daniel.
3 Considerations in Using OpenMP
Optimize barrier use: barriers are expensive operations, so use as few as possible. A work-sharing loop ends with an implied barrier unless nowait is specified.

    #pragma omp parallel
    {
        ...
        #pragma omp for
        for (i=0; i<n; i++)
            ...
        #pragma omp for nowait
        for (i=0; i<n; i++)
            ...
    } /*-- End of parallel region - barrier is implied --*/
4 Optimize Barrier Use
The first two loops are independent of each other, so both can use nowait. The explicit barrier guarantees that a and c are fully updated before the third loop reads them.

    #pragma omp parallel default(none) \
            shared(n,a,b,c,d,sum) private(i)
    {
        #pragma omp for nowait
        for (i=0; i<n; i++)
            a[i] += b[i];

        #pragma omp for nowait
        for (i=0; i<n; i++)
            c[i] += d[i];

        #pragma omp barrier

        #pragma omp for nowait reduction(+:sum)
        for (i=0; i<n; i++)
            sum += a[i] + c[i];
    } /*-- End of parallel region --*/
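5 Avoid Large Critical Regions
Only the update of the shared variable a needs protection. The statement c = b * d writes the private variable c, so it does not need to be inside the critical region.

    #pragma omp parallel shared(a,b) private(c,d)
    {
        ...
        #pragma omp critical
        {
            a += 2 * c;
            c = b * d;
        }
    } /*-- End of parallel region --*/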
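A minimal sketch of keeping a critical region small, not taken from the slides: the function name scaled_sum and its arguments are hypothetical. Each thread does its arithmetic on a private partial result, and only the single update of the shared total is serialized.

```c
/* Hypothetical helper: returns the sum of 2*v[i] over all n elements. */
long scaled_sum(const int *v, int n)
{
    long a = 0;                      /* shared accumulator */
    #pragma omp parallel shared(a)
    {
        long local = 0;              /* private partial result */

        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            local += 2L * v[i];      /* private work, no lock needed */

        #pragma omp critical
        a += local;                  /* only the shared update is serialized */
    }
    return a;
}
```

Compiled without OpenMP support the pragmas are ignored and the function still returns the same value, which makes the sketch easy to check serially.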
6 Considerations in Using OpenMP (Cont'd)
Maximize parallel regions
- Overheads are associated with starting and terminating a parallel region.
- Large parallel regions offer more opportunities for using data in cache and provide a bigger context for other compiler optimizations.
- Minimize the number of parallel regions.
7 Maximize Parallel Regions
Instead of opening a separate parallel region for every loop:

    #pragma omp parallel for
    for (...) { /*-- Work-sharing loop 1 --*/ }

    #pragma omp parallel for
    for (...) { /*-- Work-sharing loop 2 --*/ }
    ...
    #pragma omp parallel for
    for (...) { /*-- Work-sharing loop N --*/ }

enclose all the loops in a single parallel region:

    #pragma omp parallel
    {
        #pragma omp for /*-- Work-sharing loop 1 --*/
        {... }

        #pragma omp for /*-- Work-sharing loop 2 --*/
        {... }
        ...
        #pragma omp for /*-- Work-sharing loop N --*/
        {... }
    }
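8 Avoid Parallel Regions in Inner Loops
With the parallel region inside the loop nest, the region overhead is paid n*n times:

    for (i=0; i<n; i++)
        for (j=0; j<n; j++)
            #pragma omp parallel for
            for (k=0; k<n; k++)
                {...}

Hoisting the parallel region out of the nest creates it once; only the work-sharing construct stays inside. Every thread now executes the i and j loops, so i and j must be private to each thread.

    #pragma omp parallel
    for (i=0; i<n; i++)
        for (j=0; j<n; j++)
            #pragma omp for
            for (k=0; k<n; k++)
                {...}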
9 Address Poor Load Balance
- Threads might have different amounts of work to do.
- The threads wait at the next synchronization point until the slowest one completes.
- Use the schedule clause.

    for (i=0; i<n; i++) {
        ReadFromFile(i,...);
        for (j=0; j<processingnum; j++)
            ProcessData(); /* lots of work here */
        WriteResultsToFile(i);
    }
10 Address Poor Load Balance

    #pragma omp parallel
    {
        /* preload data to be used in first iteration of the i-loop */
        #pragma omp single
        {ReadFromFile(0,...);}

        for (i=0; i<n; i++) {
            /* preload data for next iteration of the i-loop */
            #pragma omp single nowait
            {ReadFromFile(i+1...);}

            #pragma omp for schedule(dynamic)
            for (j=0; j<processingnum; j++)
                ProcessChunkOfData(); /* here is the work */
            /* there is a barrier at the end of this loop */

            #pragma omp single nowait
            {WriteResultsToFile(i);}
        } /* threads immediately move on to next iteration of i-loop */
    } /* one parallel region encloses all the work */
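The effect of the schedule clause is easiest to see on a loop whose iterations have unequal cost. The sketch below is illustrative only: triangular_work is a hypothetical function, the chunk size of 4 is an arbitrary choice, and the inner loop stands in for real per-item work. Iteration i costs about i units, so an even static split would leave the thread holding the last iterations with most of the work.

```c
/* Hypothetical workload: iteration i performs i units of work. */
long triangular_work(int n)
{
    long total = 0;

    /* dynamic scheduling hands out chunks of 4 iterations on demand,
       so a thread that finishes cheap iterations comes back for more */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        long cost = 0;
        for (int j = 0; j < i; j++)
            cost += 1;               /* stand-in for real work */
        total += cost;
    }
    return total;
}
```

Dynamic scheduling adds some bookkeeping per chunk, so the chunk size is a trade-off worth measuring rather than assuming.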
11 False Sharing
In systems with distributed coherent caches, when multiple threads access the same cache line and at least one of them writes to it, the result is costly invalidation misses and upgrades. When the threads actually communicate by accessing the same data, this overhead is necessary.
12 False Sharing
False sharing
- The threads do not actually communicate.
- They may be accessing unrelated data that just happens to be allocated in the same cache line.
- The costly invalidation misses and upgrades are completely unnecessary.
Solution
- Split the data accessed by the different threads across different cache lines.
13 False Sharing
Since sum1 and sum2 are defined next to each other, the compiler is likely to allocate them next to each other in memory, in the same cache line.

    int sum1;
    int sum2;

    void thread1(int v[], int v_count) {
        sum1 = 0;
        for (int i = 0; i < v_count; i++)
            sum1 += v[i];
    }

    void thread2(int v[], int v_count) {
        sum2 = 0;
        for (int i = 0; i < v_count; i++)
            sum2 += v[i];
    }
14 False Sharing
Step 1: thread1 reads sum1 into its cache. Since the line is not present in any other cache, thread1 gets it in exclusive state.
15 False Sharing
Step 2: thread2 now reads sum2. Since thread1 already had the cache line in exclusive state, this causes a downgrade of the line in thread1's cache, and the line is now in shared state in both caches.
16 False Sharing
Step 3: thread1 now writes its updated sum to sum1. Since it only has the line in shared state, it must upgrade the line and invalidate the line in thread2's cache.
17 False Sharing
Step 4: thread2 now writes its updated sum to sum2. Since thread1 has invalidated the cache line in thread2's cache, thread2 gets a coherence miss and must invalidate the line in thread1's cache, forcing thread1 to do a coherence write-back.
18 False Sharing
Step 5: The next iteration of the loops now starts, and thread1 again reads sum1. Since thread2 just invalidated the cache line in thread1's cache, thread1 gets a coherence miss. It must also downgrade the line in thread2's cache, forcing thread2 to do a coherence write-back.
19 False Sharing
Step 6: thread2 finally reads sum2. Since it has the cache line in shared state, it can read it without any coherence activity, and we are back in the same situation as after step 2.
20 False Sharing
For each iteration of the loops, steps 3 to 6 repeat, each time with costly upgrades, coherence misses and coherence write-backs. To fix a false sharing problem, make sure that the data accessed by the different threads is allocated to different cache lines.
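One possible way to give sum1 and sum2 separate cache lines is sketched below. It is not from the slides: the 64-byte line size is an assumption that should be checked for the target CPU, and padded_sum, sums and sum_into are hypothetical names. The sketch uses C11 alignas so that each slot starts on its own line, and it also accumulates in a local variable so the shared slot is written once rather than on every iteration.

```c
#include <stdalign.h>

#define CACHE_LINE 64   /* assumed line size in bytes */

/* Aligning the member to CACHE_LINE also pads the struct to that size,
   so consecutive array elements land on different cache lines. */
struct padded_sum {
    alignas(CACHE_LINE) long value;
};

struct padded_sum sums[2];   /* sums[0] for thread1, sums[1] for thread2 */

/* Same job as thread1()/thread2() above, parameterized by slot. */
void sum_into(int slot, const int v[], int v_count)
{
    long local = 0;                  /* private accumulator */
    for (int i = 0; i < v_count; i++)
        local += v[i];
    sums[slot].value = local;        /* single write to the padded slot */
}
```

The local accumulator alone removes most of the coherence traffic; the alignment covers the remaining writes to the two slots.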
21 Avoid False Sharing

    #pragma omp parallel for shared(nthreads,a) schedule(static,1)
    for (int i=0; i<nthreads; i++)
        a[i] += i;

- With nthreads = 8, a[0] through a[7] can all fall in the same cache line.
- Array padding can be used to eliminate the problem: dimension the array as a[n][8].
- Changing the indexing from a[i] to a[i][0] eliminates the false sharing.
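The padding described above might look like the following sketch. NTHREADS, PAD and bump_padded are hypothetical names, and the pad width of 8 follows the slide's a[n][8], which assumes that eight elements fill a cache line. With 4-byte ints and 64-byte lines, 16 elements per row would be needed instead.

```c
#define NTHREADS 8
#define PAD 8      /* elements per row, as on the slide (an assumption) */

/* Only a[i][0] is used; the rest of each row is padding that pushes
   neighbouring threads' elements onto different cache lines. */
int a[NTHREADS][PAD];

void bump_padded(void)
{
    #pragma omp parallel for shared(a) schedule(static, 1)
    for (int i = 0; i < NTHREADS; i++)
        a[i][0] += i;
}
```

The cost is memory: the array is PAD times larger than the unpadded version, which is acceptable for a handful of per-thread counters but not for large data.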
22 Optimizing FFT function

    void fft_c(int n, COMPLEX *x, COMPLEX *w)
    {
        COMPLEX u, temp, tm;
        COMPLEX *xi, *xip, *wptr;   /* declarations missing on the slide */
        int i, j, le, windex;

        windex = 1;
        for (le = n/2; le > 0; le /= 2) {
            wptr = w;
            for (j = 0; j < le; j++) {
                u = *wptr;
                for (i = j; i < n; i = i + 2*le) {
                    xi = x + i;
                    xip = xi + le;
                    temp.real = xi->real + xip->real;
                    temp.imag = xi->imag + xip->imag;
                    tm.real = xi->real - xip->real;
                    tm.imag = xi->imag - xip->imag;
                    xip->real = tm.real*u.real - tm.imag*u.imag;
                    xip->imag = tm.real*u.imag + tm.imag*u.real;
                    *xi = temp;
                }
                wptr = wptr + windex;
            }
            windex = 2*windex;
        }
    }

Slides come from Professor Daniel Etiemble
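23 Optimizing FFT function
Pointers are replaced by array indexing. As transcribed, the slide resets k inside the j-loop and overwrites x[i+le] before the sum is formed; both are corrected here to match the previous version.

    void fft_c(int n, COMPLEX *x, COMPLEX *w)
    {
        COMPLEX tm;
        int i, j, k, le, windex;

        windex = 1;
        for (le = n/2; le > 0; le /= 2) {
            k = 0;   /* twiddle index restarts at each stage */
            for (j = 0; j < le; j++) {
                for (i = j; i < n; i = i + 2*le) {
                    tm.real = x[i].real - x[i+le].real;
                    tm.imag = x[i].imag - x[i+le].imag;
                    /* form the sum before x[i+le] is overwritten */
                    x[i].real = x[i].real + x[i+le].real;
                    x[i].imag = x[i].imag + x[i+le].imag;
                    x[i+le].real = tm.real*w[k].real - tm.imag*w[k].imag;
                    x[i+le].imag = tm.real*w[k].imag + tm.imag*w[k].real;
                }
                k += windex;
            }
            windex = 2*windex;
        }
    }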
24 Optimizing FFT function
Real and imaginary parts move to separate arrays, and the loops are interchanged so that the innermost j-loop walks x_r and x_i with unit stride.

    void fft_c(int n, float x_r[], float x_i[], float w_r[], float w_i[])
    {
        register float tm_r, tm_i;
        int i, j, le, windex;

        windex = 1;
        for (le = n/2; le > 0; le /= 2) {
            for (i = 0; i < n; i = i + 2*le) {
                for (j = 0; j < le; j++) {
                    tm_r = x_r[i+j] - x_r[i+j+le];
                    tm_i = x_i[i+j] - x_i[i+j+le];
                    x_r[i+j] = x_r[i+j] + x_r[i+j+le];
                    x_i[i+j] = x_i[i+j] + x_i[i+j+le];
                    x_r[i+j+le] = tm_r*w_r[j*windex] - tm_i*w_i[j*windex];
                    x_i[i+j+le] = tm_r*w_i[j*windex] + tm_i*w_r[j*windex];
                }
            }
            windex = 2*windex;
        }
    }
25 Optimizing FFT function
The twiddle factors are now indexed as (n/2)*k+j, with k counting the stage, so the innermost loop also reads w_r and w_i with unit stride.

    void fft_c(int n, float x_r[], float x_i[], float w_r[], float w_i[])
    {
        register float t_r, t_i;
        int i, j, le, windex, k;

        windex = 1;
        k = 0;
        for (le = n/2; le > 0; le /= 2, k++) {
            for (i = 0; i < n; i = i + 2*le) {
                for (j = 0; j < le; j++) {
                    t_r = x_r[i+j] - x_r[i+j+le];
                    t_i = x_i[i+j] - x_i[i+j+le];
                    x_r[i+j] = x_r[i+j] + x_r[i+j+le];
                    x_i[i+j] = x_i[i+j] + x_i[i+j+le];
                    x_r[i+j+le] = t_r*w_r[(n/2)*k+j] - t_i*w_i[(n/2)*k+j];
                    x_i[i+j+le] = t_r*w_i[(n/2)*k+j] + t_i*w_r[(n/2)*k+j];
                }
            }
            windex = 2*windex;
        }
    }
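The slides stop at the serial restructuring. As a hedged sketch of how the split-array version might then be combined with the earlier OpenMP advice, the function below flattens the i- and j-loops of one stage into a single loop over the n/2 butterflies and shares it among threads inside one enclosing parallel region. The name fft_omp is hypothetical, n is assumed to be a power of two, and the twiddle tables are assumed to use the j*windex layout with w[k] = exp(-2*pi*i*k/n); none of this is claimed to be the original author's method.

```c
/* In-place radix-2 decimation-in-frequency FFT on split arrays.
   The result is left in bit-reversed order. */
void fft_omp(int n, float x_r[], float x_i[],
             const float w_r[], const float w_i[])
{
    #pragma omp parallel
    {
        int windex = 1;                      /* private to each thread */
        for (int le = n / 2; le > 0; le /= 2) {
            /* the n/2 butterflies of a stage touch disjoint elements */
            #pragma omp for
            for (int b = 0; b < n / 2; b++) {
                int j  = b % le;                  /* position in block */
                int lo = (b / le) * 2 * le + j;   /* first input */
                int hi = lo + le;                 /* second input */
                float t_r = x_r[lo] - x_r[hi];
                float t_i = x_i[lo] - x_i[hi];
                x_r[lo] += x_r[hi];
                x_i[lo] += x_i[hi];
                x_r[hi] = t_r * w_r[j * windex] - t_i * w_i[j * windex];
                x_i[hi] = t_r * w_i[j * windex] + t_i * w_r[j * windex];
            }   /* implied barrier: the stage is complete before the next */
            windex *= 2;
        }
    }
}
```

For small n the barrier after each stage is likely to cost more than the parallel work saves, so a sketch like this is only worth trying on large transforms and should be timed against the serial version.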
26 QUESTIONS?
More informationOpenMP. Application Program Interface. CINECA, 14 May 2012 OpenMP Marco Comparato
OpenMP Application Program Interface Introduction Shared-memory parallelism in C, C++ and Fortran compiler directives library routines environment variables Directives single program multiple data (SPMD)
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 9 ] Shared Memory Performance Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture
More informationCOSC 6374 Parallel Computation. Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel)
COSC 6374 Parallel Computation Introduction to OpenMP(I) Some slides based on material by Barbara Chapman (UH) and Tim Mattson (Intel) Edgar Gabriel Fall 2014 Introduction Threads vs. processes Recap of
More informationCSE-160 (Winter 2017, Kesden) Practice Midterm Exam. volatile int count = 0; // volatile just keeps count in mem vs register
Full Name: @ucsd.edu PID: CSE-160 (Winter 2017, Kesden) Practice Midterm Exam 1. Threads, Concurrency Consider the code below: volatile int count = 0; // volatile just keeps count in mem vs register void
More informationIntroduction to OpenMP
Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University History De-facto standard for Shared-Memory
More informationShared Memory Programming Paradigm!
Shared Memory Programming Paradigm! Ivan Girotto igirotto@ictp.it Information & Communication Technology Section (ICTS) International Centre for Theoretical Physics (ICTP) 1 Multi-CPUs & Multi-cores NUMA
More informationIntroduction to OpenMP
Introduction to OpenMP Ricardo Fonseca https://sites.google.com/view/rafonseca2017/ Outline Shared Memory Programming OpenMP Fork-Join Model Compiler Directives / Run time library routines Compiling and
More information