CPS343 Parallel and High Performance Computing Project 1 Spring 2018
Assignment

Write a program using OpenMP to compute an estimate of the dominant eigenvalue of a matrix.

Due: Wednesday, March 21

The program /gc/cps343/matrix/pm_seq.cc (found on the workstation cluster, along with the associated file /gc/cps343/matrix/read_matrix_seq.cc) is a sequential program that opens and reads an HDF5 file to obtain matrix data and then uses the power method to estimate the dominant eigenvalue. Starting with this program (or writing your own from scratch), use OpenMP so that your program takes advantage of all available CPU cores when carrying out the power method.

Minimum Requirement (60% of maximum): Make appropriate use of the #pragma omp parallel for directive to speed up your program. Focus on the power method section of the main program. Note that with this approach a set of threads is started to handle each for-loop and then joined when the loop completes, and this process is repeated for each parallel loop. A sketch of this loop-level approach appears after the option descriptions below.

Option 1 (additional 30%): Create a version of your program that uses a single #pragma omp parallel directive. In contrast to the original version, this version starts only one set of threads, which work collaboratively to carry out the power method and then terminate when the work is complete. You may assume that the matrix dimension is a multiple of the number of threads, so that each thread is responsible for the same number of matrix rows as every other thread.

Option 2 (additional 10%): Extend your Option 1 program so that it allows matrix dimensions that are not a multiple of the number of threads.
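To make the minimum requirement concrete, here is a minimal sketch of the loop-level approach applied to the matrix-vector product at the heart of the power method. The flat row-major array a and the names matvec, x, and y are illustrative assumptions, not the layout used by the supplied program.

    #include <omp.h>

    /* y = A x, with the rows of A divided among threads. Each outer
       iteration computes one entry of y independently of the others,
       so a single directive suffices; sum is declared inside the loop
       and is therefore private to each thread. */
    void matvec( int n, const double *a, const double *x, double *y )
    {
        #pragma omp parallel for
        for ( int i = 0; i < n; i++ )
        {
            double sum = 0.0;
            for ( int j = 0; j < n; j++ )
                sum += a[i * n + j] * x[j];
            y[i] = sum;
        }
    }

The loops that accumulate a single scalar (the vector norm and the inner product x^T y) can be handled the same way, but the directive then needs a reduction clause such as reduction(+:sum).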
The Power Method

Suppose A is an n × n matrix. The eigenvalues of A are the scalars λ which satisfy

    Av = λv                                                        (1)

and the vectors v are the associated eigenvectors. Finding the eigenvalues and eigenvectors of a matrix is an important problem in many applications. In some of these, only the largest eigenvalue of the matrix, called the dominant eigenvalue, is needed. One way to find it and its associated eigenvector is with the power method; see the Appendix for an explanation of the method.

The Power Method Algorithm

Given an n × n matrix A, a tolerance ɛ > 0, and the maximum allowed number of iterations M > 0, the general power method algorithm can be formulated as follows:

    x := (1, 1, 1, ..., 1)^T       (initial eigenvector estimate)
    λ := 0                         (initialized to any value)
    λ_0 := λ + 2ɛ                  (make sure |λ − λ_0| > ɛ)
    k := 0
    while |λ − λ_0| ≥ ɛ and k ≤ M do
        x := x / ||x||             (normalize x)
        y := Ax                    (compute next eigenvector estimate)
        λ_0 := λ                   (save previous eigenvalue estimate)
        λ := x^T y                 (compute new estimate: λ ≈ x^T Ax)
        x := y                     (eigenvector estimate for next pass)
        k := k + 1
    end while

If the while-loop terminates with k ≤ M, then we conclude the algorithm has terminated successfully; in this case λ is the dominant eigenvalue and x is the corresponding eigenvector. The power method's rate of convergence depends on the difference between the magnitude of the dominant eigenvalue and the magnitudes of the other eigenvalues. Also, the power method will fail if the matrix does not have any real eigenvalues.
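The algorithm translates almost line for line into C. The following is a minimal sequential sketch, assuming the matrix is stored in a flat row-major array; the function and variable names are illustrative, not those of the supplied program.

    #include <math.h>

    /* Sequential power method, following the algorithm above. Assumes
       a[] holds the n x n matrix in row-major order and that x[] and
       y[] are caller-supplied work vectors of length n. */
    double power_method( int n, const double *a, double *x, double *y,
                         double tol, int maxiter )
    {
        double lambda     = 0.0;
        double lambda_old = lambda + 2.0 * tol;  /* force first pass */
        int    k = 0;

        for ( int i = 0; i < n; i++ ) x[i] = 1.0;

        while ( fabs( lambda - lambda_old ) >= tol && k <= maxiter )
        {
            double norm = 0.0;                   /* x := x / ||x||   */
            for ( int i = 0; i < n; i++ ) norm += x[i] * x[i];
            norm = sqrt( norm );
            for ( int i = 0; i < n; i++ ) x[i] /= norm;

            for ( int i = 0; i < n; i++ )        /* y := A x         */
            {
                y[i] = 0.0;
                for ( int j = 0; j < n; j++ )
                    y[i] += a[i * n + j] * x[j];
            }

            lambda_old = lambda;                 /* save old value   */
            lambda = 0.0;                        /* lambda := x^T y  */
            for ( int i = 0; i < n; i++ ) lambda += x[i] * y[i];

            for ( int i = 0; i < n; i++ ) x[i] = y[i];  /* x := y    */
            k++;
        }
        return lambda;
    }

Each marked loop is independent across i (apart from the two scalar accumulations), which is what makes the loop-level #pragma omp parallel for approach straightforward.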
Parallel Implementation

For the first option you will only need to make minor changes, most of which will be inserting appropriate #pragma omp parallel for directives. The second option will require significantly more work. You should specify a single parallel thread body with something like

    #pragma omp parallel
    {
        while ( fabs( lambda - lambda_old ) > tol && numiter <= maxiter )
        {
            /* ... body of the power method ... */
        }
    }

Note that this includes the power method's main loop. You'll need to think carefully about shared variables and critical sections. You'll also need to use #pragma omp barrier to synchronize the threads at several points. The following discussion should help you think about how to create the body of each thread; a fleshed-out sketch appears at the end of this section.

Suppose we have a parallel machine with p processors. Following the PCAM design approach, we begin by partitioning the problem into many small individual tasks. We can partition the matrix A into individual rows, where a_i is a 1 × n row vector that is the i-th row of A. Then the product Ax becomes

\[
y = Ax = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{pmatrix} x
       = \begin{pmatrix} a_1 x \\ a_2 x \\ a_3 x \\ \vdots \\ a_n x \end{pmatrix}
\]

where the i-th entry of y is computed via an inner product:

\[
y_i = a_i x = \sum_{j=1}^{n} a_{ij} x_j .
\]

Each of these products is a task that can be computed in parallel. If n is larger than the number of processors p, we can assign multiple tasks to each processor during the agglomeration and mapping phases. Each task uses a single row of A and the entire vector x in order to compute a single y_i. Every task must then make its y_i value available to all other tasks, since the entire vector y is normalized to become the vector x for the next iteration.

One obvious way to agglomerate and map is to group tasks together based on the rows of A they use. The rows may be interleaved or consecutive, but it's probably most natural to group tasks that use consecutive rows of A. If there are p threads, the matrix A is partitioned into blocks A_i, i = 1, ..., p, where each block has approximately n/p rows. The product y = Ax can then be written as

\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}
  = \begin{pmatrix} A_1 x \\ A_2 x \\ \vdots \\ A_p x \end{pmatrix}
\]

and processor i can compute y_i = A_i x independently of the other processors. Note, however, that every processor must have the entire vector y before it can compute the estimate of λ and create a normalized eigenvector estimate.
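Putting these pieces together, one possible shape for the Option 1 thread body is sketched below, assuming n is a multiple of the thread count and row-major storage as before. The placement of the critical sections and barriers is one workable choice, not the only one, and the names are illustrative.

    #include <math.h>
    #include <omp.h>

    /* Option 1 sketch: a single parallel region runs the whole power
       method. lambda, lambda_old, norm, and numiter are shared; each
       thread owns rows s..e. Assumes n is a multiple of the number
       of threads. */
    double power_method_omp( int n, const double *a, double *x, double *y,
                             double tol, int maxiter )
    {
        double lambda = 0.0, lambda_old = 2.0 * tol, norm = 0.0;
        int numiter = 0;

        for ( int i = 0; i < n; i++ ) x[i] = 1.0;

        #pragma omp parallel default(shared)
        {
            const int rows = n / omp_get_num_threads();
            const int s = omp_get_thread_num() * rows;
            const int e = s + rows - 1;

            while ( fabs( lambda - lambda_old ) > tol && numiter <= maxiter )
            {
                double part = 0.0;            /* local piece of ||x||^2 */
                for ( int i = s; i <= e; i++ ) part += x[i] * x[i];
                #pragma omp critical
                norm += part;
                #pragma omp barrier           /* norm is complete       */

                for ( int i = s; i <= e; i++ ) x[i] /= sqrt( norm );
                #pragma omp barrier           /* x is fully normalized  */

                for ( int i = s; i <= e; i++ )  /* this thread's A_i x  */
                {
                    y[i] = 0.0;
                    for ( int j = 0; j < n; j++ )
                        y[i] += a[i * n + j] * x[j];
                }

                double dot = 0.0;             /* local piece of x^T y   */
                for ( int i = s; i <= e; i++ ) dot += x[i] * y[i];

                /* one thread saves the old estimate and resets the
                   shared accumulators; the single's implicit barrier
                   keeps the other threads from racing ahead */
                #pragma omp single
                {
                    lambda_old = lambda;
                    lambda = 0.0;
                    norm = 0.0;
                }
                #pragma omp critical
                lambda += dot;

                for ( int i = s; i <= e; i++ ) x[i] = y[i];
                #pragma omp barrier           /* lambda and x complete  */
                #pragma omp single
                numiter++;                    /* implicit barrier here  */
            }
        }
        return lambda;
    }

Under this structure, Option 2 reduces to replacing the rows/s/e computation with a call to a routine like the decompose1d function given below.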
Helps and Hints

Using OpenMP. To adapt your program to use OpenMP you will need to:

- compile with the -fopenmp flag
- include the header file omp.h
- insert appropriate #pragma omp directives
- think carefully about which variables should be shared between threads

Computing partition sizes. When attempting the final option you will need some way to assign an unequal number of rows to the threads. It is frequently the case in partitioning domains that we need to compute starting and ending indices in an array for a particular part of the partition. For example, if a 100-element array is to be partitioned into four parts in a balanced way, each part should have one quarter of the elements. In this case we would have:

    Part   Length   Start   End
      0      25       0      24
      1      25      25      49
      2      25      50      74
      3      25      75      99

The computations here are simple:

- the length of each part is just the total array length divided by the number of parts
- the starting index of the i-th part is i times the part length

When the array length is not a multiple of the number of parts we will have some slight imbalance, but we still want all parts to be almost the same size. For example, if we wanted to split a 100-element array into 8 parts we might find the following:

    Part   Length   Start   End        Part   Length   Start   End
      0      13       0      12          4      12      52      63
      1      13      13      25          5      12      64      75
      2      13      26      38          6      12      76      87
      3      13      39      51          7      12      88      99
The following C code will determine the starting and ending indices (inclusive) of the i-th part (starting from 0) in a zero-based n-element array that is partitioned into m parts. It is a useful routine, worth keeping around somewhere, since you might have occasion to use it in the future.

    /*
     * Computes the starting and ending displacements for the ith
     * subinterval in an n-element array given that there are m
     * subintervals of approximately equal size.
     *
     * Input:   int n  - length of array (array indexed [0]..[n-1])
     *          int m  - number of subintervals
     *          int i  - subinterval number
     * Output:  int *s - location to store subinterval starting index
     *          int *e - location to store subinterval ending index
     *
     * Suppose we want to partition a 100-element array into 3
     * subintervals of roughly the same size.  The following three
     * pairs of calls find the starting and ending indices of each
     * subinterval:
     *
     *     decompose1d( 100, 3, 0, &s, &e );   (now s =  0, e = 33)
     *     decompose1d( 100, 3, 1, &s, &e );   (now s = 34, e = 66)
     *     decompose1d( 100, 3, 2, &s, &e );   (now s = 67, e = 99)
     *
     * The subinterval length can be computed with e - s + 1.
     *
     * Based on the FORTRAN subroutine MPE_DECOMP1D in the file
     * UsingMPI/intermediate/decomp.f supplied with the book
     * "Using MPI" by Gropp et al.  It has been adapted to use
     * 0-based indexing.
     */
    void decompose1d( int n, int m, int i, int *s, int *e )
    {
        const int length  = n / m;
        const int deficit = n % m;
        *s = i * length + ( i < deficit ? i : deficit );
        *e = *s + length - ( i < deficit ? 0 : 1 );
        if ( ( *e >= n ) || ( i == m - 1 ) ) *e = n - 1;
    }
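For Option 2, each thread can call this routine once, with its own thread number, to find its block of rows. A minimal usage sketch (assuming decompose1d as defined above is in scope):

    #include <stdio.h>
    #include <omp.h>

    void decompose1d( int n, int m, int i, int *s, int *e );  /* above */

    int main( void )
    {
        const int n = 10;  /* deliberately not a multiple of most
                              thread counts */
        #pragma omp parallel
        {
            int s, e;
            decompose1d( n, omp_get_num_threads(),
                         omp_get_thread_num(), &s, &e );
            printf( "thread %d handles rows %d..%d\n",
                    omp_get_thread_num(), s, e );
        }
        return 0;
    }

With 4 threads this reports rows 0..2, 3..5, 6..7, and 8..9 (in some order), so each thread's loops simply run for ( i = s; i <= e; i++ ).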
Appendix: Derivation of the Power Method

To see why the power method works, consider the n × n matrix A with eigenvectors v_1, v_2, ..., v_n and associated eigenvalues λ_1, λ_2, ..., λ_n, where

\[
|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n| ,
\]

so λ_1 is the dominant eigenvalue. Let x be any vector that can be written in the form

\[
x = c_1 v_1 + c_2 v_2 + c_3 v_3 + \cdots + c_n v_n
\]

with c_1 ≠ 0 (to ensure that x has some component parallel to v_1). Then

\[
Ax = A(c_1 v_1 + c_2 v_2 + c_3 v_3 + \cdots + c_n v_n)
   = c_1 A v_1 + c_2 A v_2 + c_3 A v_3 + \cdots + c_n A v_n
   = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + c_3 \lambda_3 v_3 + \cdots + c_n \lambda_n v_n .
\]

When we repeatedly multiply by A we have

\[
A^k x = c_1 A^k v_1 + c_2 A^k v_2 + c_3 A^k v_3 + \cdots + c_n A^k v_n
      = c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + c_3 \lambda_3^k v_3 + \cdots + c_n \lambda_n^k v_n
\]
\[
      = c_1 \lambda_1^k \left[ v_1
        + \frac{c_2}{c_1} \left( \frac{\lambda_2}{\lambda_1} \right)^{\!k} v_2
        + \frac{c_3}{c_1} \left( \frac{\lambda_3}{\lambda_1} \right)^{\!k} v_3
        + \cdots
        + \frac{c_n}{c_1} \left( \frac{\lambda_n}{\lambda_1} \right)^{\!k} v_n \right] .
\]

Notice that since |λ_i/λ_1| < 1 for i = 2, ..., n, the bracketed expression tends to the eigenvector v_1 as k → ∞. Thus, the iteration x_{k+1} = Ax_k will converge to an eigenvector associated with the dominant eigenvalue of A. The estimated value of λ_1 can be computed with

\[
\lambda_1 = \lim_{k \to \infty} \frac{x_k^T A x_k}{x_k^T x_k}
          = \lim_{k \to \infty} \frac{x_k^T x_{k+1}}{x_k^T x_k} .
\]

It is common to normalize the vectors x_k as they are produced so that x_k^T x_k = 1. In this case

\[
\lambda_1 = \lim_{k \to \infty} x_k^T x_{k+1} .
\]
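A small concrete instance (not from the handout) shows the mechanism: with a diagonal matrix whose dominant eigenvalue is 3, the component along the second eigenvector decays geometrically.

\[
A = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
x = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = v_1 + v_2, \qquad
A^k x = \begin{pmatrix} 3^k \\ 1 \end{pmatrix}
      = 3^k \left[ v_1 + \left( \tfrac{1}{3} \right)^{\!k} v_2 \right],
\]

so the normalized iterates approach v_1 = (1, 0)^T and the estimates x_k^T x_{k+1} approach λ_1 = 3.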