PARALLEL PROGRAMMING MULTI-CORE COMPUTERS
2016 HAWAII UNIVERSITY INTERNATIONAL CONFERENCES
SCIENCE, TECHNOLOGY, ENGINEERING, ART, MATH & EDUCATION
JUNE 10-12, 2016
HAWAII PRINCE HOTEL WAIKIKI, HONOLULU

PARALLEL PROGRAMMING MULTI-CORE COMPUTERS

ALAGHBAND, GITA
UNIVERSITY OF COLORADO DENVER
COMPUTER SCIENCE & ENGINEERING DEPARTMENT

FARDI, HAMID
UNIVERSITY OF COLORADO DENVER
ELECTRICAL ENGINEERING DEPARTMENT
Prof. Gita Alaghband, Computer Science and Engineering Department, University of Colorado Denver. Prof. Hamid Fardi, Electrical Engineering Department, University of Colorado Denver.

Parallel Programming Multi-core Computers

Synopsis: With advances in hardware technology, we as educators find ourselves with multi-core computers as servers, desktops, and personal computers in our laboratories while teaching students how to design for sequential environments. We propose to develop pedagogy for teaching undergraduate students how to develop software and design algorithms for multi-core architectures in a laboratory setting. The key success and challenge of this project will be helping students to think in parallel.
Parallel Programming Multi-core Computers

Gita Alaghband 1, Lan Vu 1, H. Z. Fardi 2
1 University of Colorado Denver, Department of Computer Science & Engineering, Campus Box 109, P.O. Box , Denver, CO
2 University of Colorado Denver, Department of Electrical Engineering, Campus Box 110, P.O. Box , Denver, CO

Abstract

With advances in hardware technology, we as educators find ourselves with multi-core computers as servers, desktops, and personal computers in our laboratories while teaching students how to design for sequential environments. We propose to develop pedagogy for teaching undergraduate students how to develop scientific computing programs for multi-core architectures in a laboratory setting. The key success and challenge of this project will be helping students to think in parallel.

Introduction and Motivation

It is time to develop methodologies for the infusion of parallel computing and programming into the undergraduate computer science and engineering curriculum. The two decades of the eighties and nineties were the height of supercomputing for machine design, parallel compilers, languages, and algorithm design, involving a select group of highly specialized scientists with an interest in high-performance computing and number-crunching applications. The surprising thing today is not that parallel computers and computation are attracting increased attention, but that so much of computing has revolved around a sequential, one-thing-at-a-time style of software design. Today, with all the advances in hardware technology, we as educators find ourselves with multi-core computers as servers, desktops, personal computers, and even handheld devices in our laboratories while still teaching students how to design system software, algorithms, and programming languages for sequential environments.
We propose to develop methodologies for teaching undergraduate students how to develop software and design algorithms for multi-core architectures in a scientific computing course in a laboratory setting. The key success and challenge of this project will be helping students to think in parallel [1]. We will develop material for junior-level computer science and engineering students who have already had some programming language courses. The major intellectual challenge of this project is in developing strategies and material that will help students overcome their conditioning to think sequentially. To accomplish this, concepts of global parallelism will be integral to our development.
Parallel Design Concepts through an Example

We introduce the idea of global parallelism through two versions of a parallel Gaussian elimination implementation in OpenMP C++. The choice of Gaussian elimination is due to its vast applicability across most areas of science: it is used to solve the systems of linear equations that arise in engineering, chemistry, physics, economics, and elsewhere, with applications such as circuit simulation, fluid dynamics simulation, weather modeling, and molecular dynamics. The choice of OpenMP is due to its being the most common parallel language platform for current shared-memory multicore computer architectures. OpenMP is not a full language by itself; it is designed as an extension to a sequential programming language to support shared-memory MIMD programming. OpenMP extensions exist for C, C++, and Fortran [2].

OpenMP supports a form of fork/join semantics. Execution starts with a single sequential process that forks a fixed number of threads upon encountering a parallel region. The team of threads executes the body of the parallel region and is then joined back into the original process (a single stream). Every time a new parallel region is encountered a different number of threads may be forked, but the number remains fixed within each parallel region. Three user-specified environment variables manage parallel execution: OMP_NUM_THREADS (integer) specifies the number of threads; OMP_DYNAMIC (Boolean) specifies whether the number of threads can change from one parallel region to the next; and OMP_NESTED (Boolean) specifies whether nested parallelism is allowed. The programs presented here use several OpenMP constructs that won't be described here, in order to keep the focus on parallel programming style; the language constructs are described in detail in [2].
For the sake of simplicity and complete presentation, the code sections for the main program (Figure 1), generation of a test matrix (Figure 2), and user input (Figure 3) are presented separately from the two parallel implementations of the Gaussian elimination. This is done to ensure that the comparison reflects only the impact of parallel style, not the sequential method; both programs are sequentially identical (and optimized). The main program calls the ComputeGaussianElimination function; two versions of this function are discussed in detail as the focus of this paper. The LU decomposition is performed by Gaussian elimination, leaving the L and U factors stored in place of the original matrix; the program prints the time for performing the elimination and, optionally, the L and U factors. The execution time of the ComputeGaussianElimination function is calculated from calls to omp_get_wtime(), assumed to return time in seconds. Forward and back substitution for solving equation sets with specific right-hand sides using the L and U factors are not included in this program. For ease of readability, the OpenMP constructs are highlighted in bold green, the related parallel comments are presented in red, and function calls are highlighted in purple. The matrix is initialized in parallel in Figure 2: the threads created within the pragma omp parallel region work on different rows, i, in parallel, generating the matrix elements; the inner loop over columns is done sequentially within each parallel thread.
// Main Program
int main(int argc, char *argv[])
{
    int n, numthreads, isprintmatrix;
    float **a;
    double runtime;
    bool isok;
    if (GetUserInput(argc, argv, n, numthreads, isprintmatrix) == false) return 1;
    // Specify number of threads created in parallel regions
    omp_set_num_threads(numthreads);
    InitializeMatrix(a, n);   // Initialize the value of matrix a[n x n]
    if (isprintmatrix)
    {
        cout << "The input matrix" << endl;
        PrintMatrix(a, n);
    }
    runtime = omp_get_wtime();
    // Compute the Gaussian elimination for matrix a[n x n]
    isok = ComputeGaussianElimination(a, n);
    runtime = omp_get_wtime() - runtime;
    if (isok == true)
    {
        // The eliminated matrix is as below:
        if (isprintmatrix)
        {
            cout << "Output matrix:" << endl;
            PrintMatrix(a, n);
        }
        // Print computing time
        cout << "Gaussian Elimination runs in "
             << setiosflags(ios::fixed) << setprecision(2)
             << runtime << " seconds\n";
    }
    else
    {
        cout << "The matrix is singular" << endl;
    }
    DeleteMatrix(a, n);
    return 0;
}

Figure 1. Main program

// Initialize the value of matrix a[n x n]
void InitializeMatrix(float** &a, int n)
{
    a = new float*[n];
    a[0] = new float[n*n];
    for (int i = 1; i < n; i++) a[i] = a[i-1] + n;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            if (i == j)
                a[i][j] = (((float)i+1)*((float)i+1))/(float)2;
            else
                a[i][j] = (((float)i+1)+((float)j+1))/(float)2;
        }
    }
}

// Delete matrix a[n x n]
void DeleteMatrix(float **a, int n)
{
    delete[] a[0];
    delete[] a;
}

// Print matrix
void PrintMatrix(float **a, int n)
{
    for (int i = 0; i < n; i++)
    {
        cout << "Row " << (i+1) << ":\t";
        for (int j = 0; j < n; j++) { printf("%.2f\t", a[i][j]); }
        cout << endl;
    }
}

Figure 2. Matrix is initialized by the program rather than input by the user
// Input user data for Gaussian elimination program: C++ OpenMP
// Row-wise data layout & row-wise elimination
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <cmath>
#include <omp.h>
#include <time.h>
using namespace std;

// Get user input of matrix dimension and printing option
bool GetUserInput(int argc, char *argv[], int& n, int& numthreads, int& isprint)
{
    bool isok = true;
    if (argc < 3)
    {
        cout << "Arguments:<X> <Y> [<Z>]" << endl;
        cout << "X : Matrix size [X x X]" << endl;
        cout << "Y : Number of threads" << endl;
        cout << "Z = 1: print the input/output matrix if X < 10" << endl;
        cout << "Z <> 1 or missing: do not print input/output matrix" << endl;
        isok = false;
    }
    else
    {
        // Get matrix size
        n = atoi(argv[1]);
        if (n <= 0)
        {
            cout << "Matrix size must be larger than 0" << endl;
            isok = false;
        }
        // Get number of threads
        numthreads = atoi(argv[2]);
        if (numthreads <= 0)
        {
            cout << "Number of threads must be larger than 0" << endl;
            isok = false;
        }
        // Print the input/output matrix
        if (argc >= 4)
            isprint = (atoi(argv[3]) == 1 && n <= 9) ? 1 : 0;
        else
            isprint = 0;
    }
    return isok;
}

Figure 3. User input values and desired printing options

Two different implementations of ComputeGaussianElimination are presented (Figures 4 and 6) to illustrate the principal difference between designing a parallel version using global parallelism and applying parallelism only where code segments are observed to be candidates for parallel expression. Figure 4 shows a parallelized version of the Gaussian elimination code in a style that is familiar to, and often used by, sequential programmers and software engineers new to parallel computation. Figure 5 is the corresponding high-level flowchart depicting the parallelization and parallel operations taking place in the implementation style of Figure 4: a single stream of control characterizes the overall program structure, with parallel operations applied to large data structures with independent elements.
// Compute the Gaussian elimination for matrix a[n x n], Version 1
bool ComputeGaussianElimination(float **a, int n)
{
    float pivot, gmax, pmax, temp;
    int pindmax, gindmax, i, j, k;

    for (k = 0; k < n-1; k++)   // Perform row-wise elimination
    {
        gmax = 0.0;
        // Find the pivot row among rows k, k+1, ..., n.
        // The following OMP parallel statement creates parallel threads. Each
        // thread works on a number of rows to find its local max value pmax.
        #pragma omp parallel shared(a,gmax,gindmax) firstprivate(n,k) private(pivot,i,j,temp,pmax,pindmax)
        {
            pmax = 0.0;
            #pragma omp for schedule(static)
            for (i = k; i < n; i++)
            {
                temp = abs(a[i][k]);
                if (temp > pmax) { pmax = temp; pindmax = i; }
            }
            // Update private max value to global gmax, one thread at a time:
            #pragma omp critical
            {
                if (gmax <= pmax) { gmax = pmax; gindmax = pindmax; }
            }
        }
        // At this point all threads join into one stream.

        if (gmax == 0) return false;   // If matrix is singular, set the flag & quit

        if (gindmax != k)   // Swap rows if necessary
        {
            // The following OMP parallel statement creates parallel threads to swap the pivot row
            #pragma omp parallel for shared(a) firstprivate(n,k,gindmax) private(j,temp) schedule(static)
            for (j = k; j < n; j++)
            {
                temp = a[gindmax][j];
                a[gindmax][j] = a[k][j];
                a[k][j] = temp;
            }
        }
        // At this point all threads join into one stream.

        pivot = -1.0/a[k][k];   // Compute the pivot
        // The following OMP parallel statement creates parallel threads to do row reductions
        #pragma omp parallel for shared(a) firstprivate(pivot,n,k) private(i,j,temp) schedule(dynamic)
        for (i = k+1; i < n; i++)
        {
            temp = pivot*a[i][k];
            for (j = k; j < n; j++) { a[i][j] = a[i][j] + temp*a[k][j]; }
        }
        // At this point all threads join into one stream.
    }
    return true;
}

Figure 4. Version 1 of OpenMP C++ ComputeGaussianElimination
In contrast, in the implementation of Figure 6 the entire team of threads works over the entire flow of control rather than at specific points in the computation [3]. In this version, parallel threads are created once at the entry to the #pragma omp parallel region at the beginning of the function and are joined at the end of the region, which is the last statement of the function before the return. Where sequential operation is required by the program logic, it is limited to specific program points that coordinate the activities of the parallel threads (for example, the #pragma omp single construct). Figure 7 shows the high-level parallel flow diagram of this version.

// Compute the Gaussian elimination for matrix a[n x n], Version 2
bool ComputeGaussianElimination(float **a, int n)
{
    float pivot, gmax, pmax, temp;
    int pindmax, gindmax, i, j, k;
    bool isok = true;

    // The following OMP parallel statement creates the team of parallel threads once.
    #pragma omp parallel shared(a,gmax,gindmax) firstprivate(n,k) private(pivot,i,j,temp,pmax,pindmax)
    {
        // Perform row-wise elimination. Each thread works on a number of rows to
        // find the local max value pmax, then updates the global variable gmax.
        for (k = 0; k < n-1; k++)
        {
            #pragma omp single
            { gmax = 0.0; }

            // Find the pivot row among rows k, k+1, ..., n
            pmax = 0.0;
            #pragma omp for schedule(static)
            for (i = k; i < n; i++)
            {
                temp = abs(a[i][k]);
                if (temp > pmax) { pmax = temp; pindmax = i; }
            }
            // gmax is updated one thread at a time
            #pragma omp critical
            {
                if (gmax <= pmax) { gmax = pmax; gindmax = pindmax; }
            }
            #pragma omp barrier   // All threads must reach this point before continuing

            if (gmax == 0.0)   // If matrix is singular, set the flag & quit
            {
                isok = false;
                break;
            }

            if (gindmax != k)   // Swap rows if necessary
            {
                #pragma omp for schedule(static)
                for (j = k; j < n; j++)
                {
                    temp = a[gindmax][j];
                    a[gindmax][j] = a[k][j];
                    a[k][j] = temp;
                }
            }

            pivot = -1.0/a[k][k];   // Compute the pivot
            #pragma omp for schedule(dynamic)   // Perform row reductions
            for (i = k+1; i < n; i++)
            {
                temp = pivot*a[i][k];
                for (j = k; j < n; j++) { a[i][j] = a[i][j] + temp*a[k][j]; }
            }
        }
    }
    // At this point all threads join into one stream.
    return isok;
}

Figure 6. Version 2 of OpenMP C++ ComputeGaussianElimination
The outer for-loop (k) over diagonal elements in Figure 6 is a sequential loop executed independently by every parallel thread. The barriers embedded within this loop ensure that the threads complete iterations of the loop in close synchrony (i.e., all threads reach the barrier point before any can continue). For each diagonal element, the parallel search for the element with maximum absolute value in the pivot column below the diagonal is done within #pragma omp for schedule(static). The small amount of work per iteration makes static scheduling appropriate, avoiding the synchronization overhead of dynamic scheduling. Similarly, rows are swapped to bring the pivot into the diagonal position within another statically scheduled for loop. In the row reduction loop, each iteration involves a full row to the right of the diagonal, so the per-iteration work may be large enough to make the better load balance of dynamic scheduling beneficial. This is not likely to help much here with processors of the same speed, since the work per parallel iteration is the same for all threads.
Results and Conclusion

The two implementations result in drastically different performance. Figure 8 shows the result of running the two versions of our Gaussian elimination programs on a matrix of dimension 6000 on a 64-core AMD Bulldozer multicore architecture. Note that the differences in execution time are due only to the number of times threads are created and joined; otherwise the two programs (even the parallel techniques employed) are identical.

[Figure 8: execution time in seconds versus number of processes for Version 1 (cge-omp1) and Version 2 (cge-omp2).]
Figure 8. Performance comparison of the two implementations of Gaussian elimination

Applying parallelism within the code only where it is observed or perceived to be applicable not only fails to exploit all the parallelism possible in a given sequential code, it also prevents the design of a parallel solution technique from the start. Starting the design from a sequential perspective leads to good sequential programs, which are often not a suitable base for parallel processing. In many cases, the best parallel solution will perform poorly on a sequential machine; only when it is executed on a parallel computer with a sufficient number of processors do we observe its best performance. Learning about the trade-offs between parallelism and memory usage, between inherently sequential data structures and data structures that allow parallel access and operations, and about allowing more operations in a parallel version than in the sequential version solving the same problem is done most effectively when students observe these factors in a hands-on laboratory environment, with exercises that reinforce the lectures.

References

[1] Gita Alaghband, Harry F. Jordan, "Overview of the Force scientific parallel language," Journal of Scientific Programming, Volume 3, Issue 1, Spring.
[2] OpenMP Application Program Interface, available from the OpenMP website.
[3] Harry F. Jordan, Gita Alaghband, Fundamentals of Parallel Processing, Prentice Hall, 2003.
More informationCache Awareness. Course Level: CS1/CS2. PDC Concepts Covered: PDC Concept Locality False Sharing
Cache Awareness Course Level: CS1/CS PDC Concepts Covered: PDC Concept Locality False Sharing Bloom Level C C Programming Knowledge Prerequisites: Know how to compile Java/C++ Be able to understand loops
More informationOpenMP Shared Memory Programming
OpenMP Shared Memory Programming John Burkardt, Information Technology Department, Virginia Tech.... Mathematics Department, Ajou University, Suwon, Korea, 13 May 2009.... http://people.sc.fsu.edu/ jburkardt/presentations/
More informationAccelerator Programming Lecture 1
Accelerator Programming Lecture 1 Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences, M17 manfred.liebmann@tum.de January 11, 2016 Accelerator Programming
More informationParallel Computing. Prof. Marco Bertini
Parallel Computing Prof. Marco Bertini Shared memory: OpenMP Implicit threads: motivations Implicit threading frameworks and libraries take care of much of the minutiae needed to create, manage, and (to
More informationAlfio Lazzaro: Introduction to OpenMP
First INFN International School on Architectures, tools and methodologies for developing efficient large scale scientific computing applications Ce.U.B. Bertinoro Italy, 12 17 October 2009 Alfio Lazzaro:
More informationLecture 16: Recapitulations. Lecture 16: Recapitulations p. 1
Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently
More informationParallel Programming using OpenMP
1 OpenMP Multithreaded Programming 2 Parallel Programming using OpenMP OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard to perform shared-memory multithreading
More informationProgramming with Shared Memory PART II. HPC Fall 2007 Prof. Robert van Engelen
Programming with Shared Memory PART II HPC Fall 2007 Prof. Robert van Engelen Overview Parallel programming constructs Dependence analysis OpenMP Autoparallelization Further reading HPC Fall 2007 2 Parallel
More informationParallel Programming using OpenMP
1 Parallel Programming using OpenMP Mike Bailey mjb@cs.oregonstate.edu openmp.pptx OpenMP Multithreaded Programming 2 OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationRaspberry Pi Basics. CSInParallel Project
Raspberry Pi Basics CSInParallel Project Sep 11, 2016 CONTENTS 1 Getting started with the Raspberry Pi 1 2 A simple parallel program 3 3 Running Loops in parallel 7 4 When loops have dependencies 11 5
More informationOpenMP. Dr. William McDoniel and Prof. Paolo Bientinesi WS17/18. HPAC, RWTH Aachen
OpenMP Dr. William McDoniel and Prof. Paolo Bientinesi HPAC, RWTH Aachen mcdoniel@aices.rwth-aachen.de WS17/18 Loop construct - Clauses #pragma omp for [clause [, clause]...] The following clauses apply:
More informationMulticore Programming with OpenMP. CSInParallel Project
Multicore Programming with OpenMP CSInParallel Project March 07, 2014 CONTENTS 1 Getting Started with Multicore Programming using OpenMP 2 1.1 Notes about this document........................................
More informationSynchronization. Event Synchronization
Synchronization Synchronization: mechanisms by which a parallel program can coordinate the execution of multiple threads Implicit synchronizations Explicit synchronizations Main use of explicit synchronization
More informationby system default usually a thread per CPU or core using the environment variable OMP_NUM_THREADS from within the program by using function call
OpenMP Syntax The OpenMP Programming Model Number of threads are determined by system default usually a thread per CPU or core using the environment variable OMP_NUM_THREADS from within the program by
More informationAgenda. The main body and cout. Fundamental data types. Declarations and definitions. Control structures
The main body and cout Agenda 1 Fundamental data types Declarations and definitions Control structures References, pass-by-value vs pass-by-references The main body and cout 2 C++ IS AN OO EXTENSION OF
More informationShared Memory Parallelism - OpenMP
Shared Memory Parallelism - OpenMP Sathish Vadhiyar Credits/Sources: OpenMP C/C++ standard (openmp.org) OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openmp/#introduction) OpenMP sc99 tutorial
More informationReview. Tasking. 34a.cpp. Lecture 14. Work Tasking 5/31/2011. Structured block. Parallel construct. Working-Sharing contructs.
Review Lecture 14 Structured block Parallel construct clauses Working-Sharing contructs for, single, section for construct with different scheduling strategies 1 2 Tasking Work Tasking New feature in OpenMP
More informationCME 213 S PRING Eric Darve
CME 213 S PRING 2017 Eric Darve PTHREADS pthread_create, pthread_exit, pthread_join Mutex: locked/unlocked; used to protect access to shared variables (read/write) Condition variables: used to allow threads
More informationECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications
ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications Work Sharing in OpenMP November 2, 2015 Lecture 21 Dan Negrut, 2015 ECE/ME/EMA/CS 759 UW-Madison Quote of the Day Success consists
More informationIntroduction to OpenMP
Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University History De-facto standard for Shared-Memory
More informationIntroductory OpenMP June 2008
5: http://people.sc.fsu.edu/ jburkardt/presentations/ fdi 2008 lecture5.pdf... John Information Technology Department Virginia Tech... FDI Summer Track V: Parallel Programming 10-12 June 2008 Introduction
More information1. Match each of the following data types with literal constants of that data type. A data type can be used more than once. A.
Engineering Problem Solving With C++ 4th Edition Etter TEST BANK Full clear download (no error formating) at: https://testbankreal.com/download/engineering-problem-solving-with-c-4thedition-etter-test-bank/
More informationCompiling for GPUs. Adarsh Yoga Madhav Ramesh
Compiling for GPUs Adarsh Yoga Madhav Ramesh Agenda Introduction to GPUs Compute Unified Device Architecture (CUDA) Control Structure Optimization Technique for GPGPU Compiler Framework for Automatic Translation
More informationThe American University in Cairo Computer Science & Engineering Department CSCE 106 Fundamentals of Computer Science
The American University in Cairo Computer Science & Engineering Department CSCE 106 Fundamentals of Computer Science Instructor: Dr. Howaida Ismail Final Exam Spring 2013 Last Name :... ID:... First Name:...
More informationProgramming Shared Memory Systems with OpenMP Part I. Book
Programming Shared Memory Systems with OpenMP Part I Instructor Dr. Taufer Book Parallel Programming in OpenMP by Rohit Chandra, Leo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, Ramesh Menon 2 1 Machine
More informationThe American University in Cairo Department of Computer Science & Engineering CSCI &09 Dr. KHALIL Exam-I Fall 2011
The American University in Cairo Department of Computer Science & Engineering CSCI 106-07&09 Dr. KHALIL Exam-I Fall 2011 Last Name :... ID:... First Name:... Form I Section No.: EXAMINATION INSTRUCTIONS
More informationBITG 1113: Array (Part 1) LECTURE 8
BITG 1113: Array (Part 1) LECTURE 8 1 1 LEARNING OUTCOMES At the end of this lecture, you should be able to: 1. Describe the fundamentals of arrays 2. Describe the types of array: One Dimensional (1 D)
More informationREPETITION CONTROL STRUCTURE LOGO
CSC 128: FUNDAMENTALS OF COMPUTER PROBLEM SOLVING REPETITION CONTROL STRUCTURE 1 Contents 1 Introduction 2 for loop 3 while loop 4 do while loop 2 Introduction It is used when a statement or a block of
More informationCSCE 206: Structured Programming in C++
CSCE 206: Structured Programming in C++ 2017 Spring Exam 2 Monday, March 20, 2017 Total - 100 Points B Instructions: Total of 13 pages, including this cover and the last page. Before starting the exam,
More informationCS691/SC791: Parallel & Distributed Computing
CS691/SC791: Parallel & Distributed Computing Introduction to OpenMP 1 Contents Introduction OpenMP Programming Model and Examples OpenMP programming examples Task parallelism. Explicit thread synchronization.
More informationOverview of OpenMP. Unit 19. Using OpenMP. Parallel for. OpenMP Library for Parallelism
19.1 Overview of OpenMP 19.2 Unit 19 OpenMP Library for Parallelism A library or API (Application Programming Interface) for parallelism Requires compiler support (make sure the compiler you use supports
More informationLittle Motivation Outline Introduction OpenMP Architecture Working with OpenMP Future of OpenMP End. OpenMP. Amasis Brauch German University in Cairo
OpenMP Amasis Brauch German University in Cairo May 4, 2010 Simple Algorithm 1 void i n c r e m e n t e r ( short a r r a y ) 2 { 3 long i ; 4 5 for ( i = 0 ; i < 1000000; i ++) 6 { 7 a r r a y [ i ]++;
More informationAfter an "Hour of Code" now what?
After an "Hour Code" now what? 2016 Curtis Center Mathematics and Teaching Conference Chris Anderson Pressor and Director the Program in Computing UCLA Dept. Mathematics March 5, 2016 New push in K-12
More informationReview. Lecture 12 5/22/2012. Compiler Directives. Library Functions Environment Variables. Compiler directives for construct, collapse clause
Review Lecture 12 Compiler Directives Conditional compilation Parallel construct Work-sharing constructs for, section, single Synchronization Work-tasking Library Functions Environment Variables 1 2 13b.cpp
More informationEE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California
EE/CSCI 451 Introduction to Parallel and Distributed Computation Discussion #4 2/3/2017 University of Southern California 1 USC HPCC Access Compile Submit job OpenMP Today s topic What is OpenMP OpenMP
More informationCS4961 Parallel Programming. Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms 9/5/12. Administrative. Mary Hall September 4, 2012
CS4961 Parallel Programming Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms Administrative Mailing list set up, everyone should be on it - You should have received a test mail last night
More information1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008
1 of 6 Lecture 7: March 4 CISC 879 Software Support for Multicore Architectures Spring 2008 Lecture 7: March 4, 2008 Lecturer: Lori Pollock Scribe: Navreet Virk Open MP Programming Topics covered 1. Introduction
More informationOpenMP. António Abreu. Instituto Politécnico de Setúbal. 1 de Março de 2013
OpenMP António Abreu Instituto Politécnico de Setúbal 1 de Março de 2013 António Abreu (Instituto Politécnico de Setúbal) OpenMP 1 de Março de 2013 1 / 37 openmp what? It s an Application Program Interface
More informationUvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP
Parallel Programming with Compiler Directives OpenMP Clemens Grelck University of Amsterdam UvA-SARA High Performance Computing Course June 2013 OpenMP at a Glance Loop Parallelization Scheduling Parallel
More informationC++ Basics. Lecture 2 COP 3014 Spring January 8, 2018
C++ Basics Lecture 2 COP 3014 Spring 2018 January 8, 2018 Structure of a C++ Program Sequence of statements, typically grouped into functions. function: a subprogram. a section of a program performing
More informationDiscussion 1H Notes (Week 3, April 14) TA: Brian Choi Section Webpage:
Discussion 1H Notes (Week 3, April 14) TA: Brian Choi (schoi@cs.ucla.edu) Section Webpage: http://www.cs.ucla.edu/~schoi/cs31 More on Arithmetic Expressions The following two are equivalent:! x = x + 5;
More information