PARALLEL PROGRAMMING MULTI-CORE COMPUTERS


2016 HAWAII UNIVERSITY INTERNATIONAL CONFERENCES
SCIENCE, TECHNOLOGY, ENGINEERING, ART, MATH & EDUCATION
JUNE 10-12, 2016
HAWAII PRINCE HOTEL WAIKIKI, HONOLULU

PARALLEL PROGRAMMING MULTI-CORE COMPUTERS

ALAGHBAND, GITA
UNIVERSITY OF COLORADO DENVER
COMPUTER SCIENCE & ENGINEERING DEPARTMENT

FARDI, HAMID
UNIVERSITY OF COLORADO DENVER
ELECTRICAL ENGINEERING DEPARTMENT

Prof. Gita Alaghband, Computer Science and Engineering Department, University of Colorado Denver
Prof. Hamid Fardi, Electrical Engineering Department, University of Colorado Denver

Parallel Programming Multi-core Computers

Synopsis: With advances in hardware technology, we as educators find ourselves with multi-core computers as servers, desktops, and personal computers in our laboratories while teaching students how to design for sequential environments. We propose to develop pedagogy for teaching undergraduate students how to develop software and design algorithms for multi-core architectures in a laboratory setting. The key success and challenge of this project will be helping students to think in parallel again.

Parallel Programming Multi-core Computers

Gita Alaghband 1, Lan Vu 1, H. Z. Fardi 2

1 University of Colorado Denver, Department of Computer Science & Engineering, Campus Box 109, P.O. Box , Denver, CO
2 University of Colorado Denver, Department of Electrical Engineering, Campus Box 110, P.O. Box , Denver, CO

Abstract

With advances in hardware technology, we as educators find ourselves with multi-core computers as servers, desktops, and personal computers in our laboratories while teaching students how to design for sequential environments. We propose to develop pedagogy for teaching undergraduate students how to develop scientific computing programs for multi-core architectures in a laboratory setting. The key success and challenge of this project will be helping students to think in parallel.

Introduction and Motivation

It is time to develop methodologies for infusing parallel computing and programming into the undergraduate computer science and engineering curriculum. The two decades of the eighties and nineties were the height of supercomputing for machine design, parallel compilers, languages, and algorithm design, involving a select group of highly specialized scientists interested in high-performance computing and number-crunching applications. The surprising thing today is not that parallel computers and computation are attracting increased attention, but that so much of computing has revolved around a sequential, one-thing-at-a-time style of software design. Today, with all the advances in hardware technology, we as educators find ourselves with multi-core computers as servers, desktops, personal computers, and even handheld devices in our laboratories, while still teaching students how to design system software, algorithms, and programming languages for sequential environments. We propose to develop methodologies for teaching undergraduate students how to develop software and design algorithms for multi-core architectures in a scientific computing course in a laboratory setting. The key success and challenge of this project will be helping students to think in parallel [1]. We will develop material for junior-level computer science and engineering students who have already had some programming language courses. The major intellectual challenge of this project is developing strategies and material that will help students overcome the sequential thinking to which they have been conditioned. To accomplish this, concepts of global parallelism will be integral to our development.

Parallel Design Concepts through an Example

We introduce the idea of global parallelism through two versions of a parallel Gaussian elimination implemented in OpenMP C++. Gaussian elimination was chosen for its broad applicability across the sciences: it is used to solve the systems of linear equations that arise in engineering, chemistry, physics, economics, and other fields, with applications such as circuit simulation, fluid dynamics simulation, weather modeling, molecular dynamics, etc. OpenMP was chosen because it is the most common parallel language platform on current shared-memory multicore computer architectures. OpenMP is not a full language by itself; it is designed as an extension to a sequential programming language to support shared memory MIMD programming. OpenMP extensions exist for C, C++, and Fortran [2].

OpenMP supports a form of fork/join semantics. Execution starts with a single sequential process that forks a fixed number of threads upon encountering a parallel region. The team of threads then executes the body of the parallel region and is joined back into the original process (single stream). Every time a new parallel region is encountered a different number of threads may be forked, but the number remains fixed within each parallel region. Three user-specified environment variables manage parallel execution: num_threads (integer) specifies the number of threads; dynamic (Boolean) specifies whether the number of threads can change from one parallel region to the next; and nested (Boolean) specifies whether nested parallelism is allowed. The programs presented here use several OpenMP constructs that will not be described, in order to keep the focus on parallel programming style; the language constructs are described in detail in [2].

For the sake of simplicity and complete presentation, the code sections for the main program (Figure 1), generation of a test matrix (Figure 2), and user input (Figure 3) are presented separately from the two different parallel implementations of the Gaussian elimination. This ensures that the comparison reflects only the impact of parallel style, not the sequential method: both programs are sequentially identical (and optimized). The main program calls the ComputeGaussianElimination function; two versions of this function are discussed in detail as the focus of this paper. The LU decomposition is performed by Gaussian elimination, leaving the L and U factors stored in place of the original matrix, and the program prints the time for performing the elimination and, optionally, the L and U factors. The execution time is calculated from calls to omp_get_wtime(), assumed to return time in seconds, around the ComputeGaussianElimination function. Forward and back substitution for solving equation sets with specific right-hand sides using the L and U factors are not included in this program. For ease of readability, the OpenMP constructs are highlighted in bold green, the related parallel comments are presented in red, and function calls are highlighted in purple. The matrix is initialized in parallel in Figure 2: the threads created within the pragma omp parallel region work on different rows, i, in parallel, generating the matrix elements; the inner loop over columns is executed sequentially within each thread.
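To make the fork/join model above concrete before turning to the figures, the following minimal sketch (not part of the paper's programs; it assumes a compiler with OpenMP support, e.g. g++ -fopenmp) forks a team of four threads at a parallel region and joins back to a single stream:

    #include <cstdio>
    #include <omp.h>

    int main() {
        omp_set_num_threads(4);            // fix the team size for subsequent parallel regions
        printf("before region: one sequential stream\n");
        #pragma omp parallel               // fork: a team of threads executes this body
        {
            printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
        }                                  // implicit join: back to the single original stream
        printf("after region: one sequential stream\n");
        return 0;
    }

The team size could equally be set from outside the program with the OMP_NUM_THREADS environment variable, matching the environment-variable view described above.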

// Main Program
int main(int argc, char *argv[])
{
    int n, numThreads, isPrintMatrix;
    float **a;
    double runtime;
    bool isOk;

    if (GetUserInput(argc, argv, n, numThreads, isPrintMatrix) == false) return 1;

    // Specify number of threads created in parallel regions
    omp_set_num_threads(numThreads);

    InitializeMatrix(a, n);   // Initialize the value of matrix A[n x n]

    if (isPrintMatrix)
    {
        cout << "The input matrix" << endl;
        PrintMatrix(a, n);
    }

    runtime = omp_get_wtime();
    // Compute the Gaussian elimination for matrix a[n x n]
    isOk = ComputeGaussianElimination(a, n);
    runtime = omp_get_wtime() - runtime;

    if (isOk)   // The eliminated matrix is as below:
    {
        if (isPrintMatrix)
        {
            cout << "Output matrix:" << endl;
            PrintMatrix(a, n);
        }
        // Print computing time
        cout << "Gaussian Elimination runs in "
             << setiosflags(ios::fixed) << setprecision(2)
             << runtime << " seconds\n";
    }
    else
    {
        cout << "The matrix is singular" << endl;
    }

    DeleteMatrix(a, n);
    return 0;
}

Figure 1. Main program

// Initialize the value of matrix a[n x n]
void InitializeMatrix(float** &a, int n)
{
    a = new float*[n];
    a[0] = new float[n*n];
    for (int i = 1; i < n; i++) a[i] = a[i-1] + n;

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            if (i == j)
                a[i][j] = (((float)i+1)*((float)i+1))/(float)2;
            else
                a[i][j] = (((float)i+1)+((float)j+1))/(float)2;
        }
    }
}

// Delete matrix a[n x n]
void DeleteMatrix(float **a, int n)
{
    delete[] a[0];
    delete[] a;
}

// Print matrix
void PrintMatrix(float **a, int n)
{
    for (int i = 0; i < n; i++)
    {
        cout << "Row " << (i+1) << ":\t";
        for (int j = 0; j < n; j++) printf("%.2f\t", a[i][j]);
        cout << endl;
    }
}

Figure 2. Matrix is initialized by the program rather than input by the user

// Input User Data for Gaussian elimination program : C++ OpenMP
// Row-wise data layout & row-wise elimination
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cstdlib>   // for atoi
#include <omp.h>
#include <time.h>
using namespace std;

// Get user input of matrix dimension and printing option
bool GetUserInput(int argc, char *argv[], int& n, int& numThreads, int& isPrint)
{
    bool isOk = true;
    if (argc < 3)
    {
        cout << "Arguments:<X> <Y> [<Z>]" << endl;
        cout << "X : Matrix size [X x X]" << endl;
        cout << "Y : Number of threads" << endl;
        cout << "Z = 1: print the input/output matrix if X < 10" << endl;
        cout << "Z <> 1 or missing: do not print input/output matrix" << endl;
        isOk = false;
    }
    else
    {
        // Get matrix size
        n = atoi(argv[1]);
        if (n <= 0)
        {
            cout << "Matrix size must be larger than 0" << endl;
            isOk = false;
        }
        // Get number of threads
        numThreads = atoi(argv[2]);
        if (numThreads <= 0)
        {
            cout << "Number of threads must be larger than 0" << endl;
            isOk = false;
        }
        // Print the input/output matrix
        if (argc >= 4)
            isPrint = (atoi(argv[3]) == 1 && n <= 9) ? 1 : 0;
        else
            isPrint = 0;
    }
    return isOk;
}

Figure 3. User input values and desired printing options

Two different implementations of ComputeGaussianElimination are presented (Figures 4 and 6) to illustrate the principal difference between designing a parallel version around global parallelism and applying parallelism only where code segments are observed to be candidates for parallel expression. Figure 4 shows a parallelized version of the Gaussian elimination code in a style that is familiar to, and often used by, sequential programmers and software engineers new to parallel computation. Figure 5 is the corresponding high-level flowchart depicting the parallelization and parallel operations taking place in the implementation style of Figure 4. A single stream of control characterizes the overall program structure, with parallel operations applied to large data structures with independent elements.

// Compute the Gaussian elimination for matrix a[n x n], Version 1
bool ComputeGaussianElimination(float **a, int n)
{
    float pivot, gmax, pmax, temp;
    int pindmax, gindmax, i, j, k;
    omp_lock_t lock;            // declared and initialized but not used in this version
    omp_init_lock(&lock);

    for (k = 0; k < n-1; k++)   // Perform row-wise elimination
    {
        gmax = 0.0;
        // Find the pivot row among rows k, k+1, ... n.
        // The following OMP parallel statement creates the parallel threads. Each thread
        // works on a number of rows to find its local max value pmax.
        #pragma omp parallel shared(a,gmax,gindmax) firstprivate(n,k) private(pivot,i,j,temp,pmax,pindmax)
        {
            pmax = 0.0;
            #pragma omp for schedule(static)
            for (i = k; i < n; i++)
            {
                temp = abs(a[i][k]);
                if (temp > pmax)
                {
                    pmax = temp;
                    pindmax = i;
                }
            }
            // Update the private max value to the global gmax, one thread at a time:
            #pragma omp critical
            {
                if (gmax <= pmax)
                {
                    gmax = pmax;
                    gindmax = pindmax;
                }
            }
        } // At this point all threads join into one stream.

        if (gmax == 0) return false;  // If matrix is singular set the flag & quit

        if (gindmax != k)  // Swap rows if necessary
        {
            // The following OMP parallel statement creates the parallel threads that swap the pivot row.
            #pragma omp parallel for shared(a) firstprivate(n,k,gindmax) private(j,temp) schedule(static)
            for (j = k; j < n; j++)
            {
                temp = a[gindmax][j];
                a[gindmax][j] = a[k][j];
                a[k][j] = temp;
            }
        } // At this point all threads join into one stream.

        pivot = -1.0/a[k][k];  // Compute the pivot

        // The following OMP parallel statement creates the parallel threads that do the row reductions.
        #pragma omp parallel for shared(a) firstprivate(pivot,n,k) private(i,j,temp) schedule(dynamic)
        for (i = k+1; i < n; i++)
        {
            temp = pivot*a[i][k];
            for (j = k; j < n; j++)
                a[i][j] = a[i][j] + temp*a[k][j];
        } // At this point all threads join into one stream.
    }

    omp_destroy_lock(&lock);
    return true;
}

Figure 4. Version 1 of OpenMP C++ ComputeGaussianElimination

In contrast, in the implementation of Figure 6 the entire team of threads works over the entire flow of control rather than at specific points in the computation [3]. In this version, the parallel threads are created once, at the entry to the #pragma omp parallel region at the beginning of the function, and are joined at the end of the region, which is the last statement of the function before the return. Where sequential operation is required by the program logic, it is limited to specific program points where it coordinates the activities of the parallel threads (for example, in the

#pragma omp single construct). Figure 7 shows the high-level parallel flow diagram of this version.

// Compute the Gaussian elimination for matrix a[n x n], Version 2
bool ComputeGaussianElimination(float **a, int n)
{
    float pivot, gmax, pmax, temp;
    int pindmax, gindmax, i, j, k;
    bool isOk = true;

    // The following OMP parallel statement creates the team of threads once for the whole function.
    #pragma omp parallel shared(a,gmax,gindmax) firstprivate(n,k) private(pivot,i,j,temp,pmax,pindmax)
    {
        // Perform row-wise elimination. Each thread works on a number of rows to find the
        // local max value pmax, then updates the global variable gmax.
        for (k = 0; k < n-1; k++)
        {
            #pragma omp single
            { gmax = 0.0; }

            // Find the pivot row among rows k, k+1, ... n
            pmax = 0.0;
            #pragma omp for schedule(static)
            for (i = k; i < n; i++)
            {
                temp = abs(a[i][k]);
                if (temp > pmax)
                {
                    pmax = temp;
                    pindmax = i;
                }
            }

            // gmax is updated one thread at a time
            #pragma omp critical
            {
                if (gmax <= pmax)
                {
                    gmax = pmax;
                    gindmax = pindmax;
                }
            }
            #pragma omp barrier  // All threads have to reach this point before continuing

            if (gmax == 0.0)  // If matrix is singular set the flag & quit
            {
                isOk = false;
                break;
            }

            if (gindmax != k)  // Swap rows if necessary
            {
                #pragma omp for schedule(static)
                for (j = k; j < n; j++)
                {
                    temp = a[gindmax][j];
                    a[gindmax][j] = a[k][j];
                    a[k][j] = temp;
                }
            }

            pivot = -1.0/a[k][k];  // Compute the pivot

            #pragma omp for schedule(dynamic)  // Perform row reductions
            for (i = k+1; i < n; i++)
            {
                temp = pivot*a[i][k];
                for (j = k; j < n; j++)
                    a[i][j] = a[i][j] + temp*a[k][j];
            }
        }
    } // At this point all threads join into one stream.
    return isOk;
}

Figure 6. Version 2 of OpenMP C++ ComputeGaussianElimination

The outer for-loop over diagonal elements (k) in Figure 6 is a sequential loop executed independently by every parallel thread. The barriers embedded within this loop ensure that the threads complete iterations of the loop in close synchrony (i.e., all threads reach the barrier point before any can continue). For each diagonal element, a parallel search for the element with maximum absolute value in the pivot column below the diagonal is done within #pragma omp for schedule(static); the small amount of work per iteration makes static scheduling appropriate, avoiding the synchronization overhead of dynamic scheduling. Similarly, rows are swapped to bring the pivot into the diagonal position within another statically scheduled for loop. In the row reduction loop, each iteration involves a full row to the right of the diagonal, so the overhead may be small enough that the better load balance of dynamic scheduling becomes beneficial. This is not likely to help much here with processors of the same speed, since the work per parallel iteration is the same for all threads.
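The contrast between the two scheduling clauses can be sketched as follows (a hedged illustration only; the function name schedule_sketch and the loop bodies are hypothetical stand-ins, not code from the paper):

    #include <omp.h>

    void schedule_sketch(const float *x, float *y, int n)
    {
        // Cheap, uniform iterations: static scheduling divides them up front
        // and avoids run-time chunk hand-out overhead.
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++)
            y[i] = 2.0f * x[i];

        // Triangular work: the cost of iteration i shrinks as i grows, so dynamic
        // scheduling lets threads that finish early grab more iterations, at the
        // price of per-chunk synchronization.
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < n; i++)
        {
            float sum = 0.0f;
            for (int j = i; j < n; j++)
                sum += x[j];
            y[i] = sum;               // each iteration writes only its own y[i]; no race
        }
    }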

Results and Conclusion

The two implementations result in drastically different performance. Figure 8 shows the result of running the two versions of our Gaussian elimination programs on a matrix of dimension 6000 on a 64-core AMD Bulldozer multicore architecture. Note that the differences in execution times are due only to the number of times threads are created and joined; otherwise the two programs (even the parallel techniques employed) are identical.

[Figure 8 is a chart, not reproduced in this transcription, plotting execution time in seconds against the number of processes for the two versions, cge-omp1 and cge-omp2.]
Figure 8. Performance comparison of the two implementations of Gaussian elimination

Applying parallelism within the code only where it is observed or perceived to be applicable not only fails to exploit all the parallelism available in a given sequential code, it also prevents designing a parallel solution technique from the start. Starting the design from a sequential perspective leads to good sequential programs, which are often not a suitable base for parallel processing. In many cases the best parallel solution will perform poorly on a sequential machine; only when it is executed in parallel on a parallel computer with a sufficient number of processors do we observe its best performance. Learning about the trade-offs between parallelism and memory usage, between inherently sequential data structures and data structures that allow parallel access and operations, and about allowing more operations in a parallel version than in the sequential version solving the same problem is done most effectively when students observe these factors in a hands-on laboratory environment, with exercises that reinforce the lectures.

References

[1] Gita Alaghband, Harry F. Jordan, "Overview of the Force scientific parallel language", Journal of Scientific Programming, Volume 3, Issue 1, Spring.
[2] OpenMP Application Program Interface, available from openmp.org.
[3] Harry F. Jordan, Gita Alaghband, Fundamentals of Parallel Processing, Prentice Hall, 2003.
