CMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)


Programming Model Overview

Message passing (MPI, PVM)
- Separate address spaces
- Explicit messages to access shared data
- Send / receive (MPI 1.0), put / get (MPI 2.0)

Multithreading (Java threads, pthreads)
- Shared address space
- Only local variables on the thread stack are private
- Explicit thread creation, synchronization

Shared-memory programming (OpenMP, UPC)
- Mixed shared / separate address spaces
- Implicit threads & synchronization

Shared Memory Programming Model

Attempts to ease the task of parallel programming by hiding details
- Thread creation, messages, synchronization
- Compiler generates parallel code based on user annotations

Possibly lower performance
- Less control over synchronization, locality, message granularity

May inadvertently introduce data races
- Reading & writing the same shared memory location in a parallel loop

OpenMP

Supports parallelism for SMPs and multi-core
- Provides a simple, portable model
- Allows both shared and private data
- Provides parallel for/do loops

Includes (see the sketch below)
- Automatic support for fork/join parallelism
- Reduction variables
- Atomic statement: one thread executes the statement at a time
- Single statement: only one thread runs this code (the first thread to reach it)
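
A minimal C sketch (not from the slides) of the atomic and single directives just listed:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int count = 0;

        #pragma omp parallel
        {
            /* single: only the first thread to arrive runs this block */
            #pragma omp single
            printf("threads in team: %d\n", omp_get_num_threads());

            /* atomic: the update executes one thread at a time */
            #pragma omp atomic
            count++;
        }
        printf("count = %d\n", count);  /* equals the number of threads */
        return 0;
    }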

OpenMP Characteristics

- Both local & shared memory (depending on directives)
- Parallelism directives for parallel loops & functions
- Compilers convert into multi-threaded programs (e.g., using pthreads)
- Not supported on clusters

Example (parallel for indicates loop iterations may be executed in parallel):

    #pragma omp parallel for private(i)
    for (i = 0; i < nupdate; i++) {
        int ran = random();
        table[ ran & (TABSIZE-1) ] ^= stable[ ran >> (64-LSTSIZE) ];
    }

More on OpenMP Characteristics

- Not a full parallel language, but a language extension
- A set of standard compiler directives and library routines
- Used to create parallel Fortran, C, and C++ programs
- Usually used to parallelize loops
- Standardizes the last 15 years of SMP practice

Implementation
- Compiler directives using #pragma omp <directive>
- Parallelism can be specified for regions & loops
- Data can be (see the sketch below):
  - Private: each processor has a local copy
  - Shared: a single copy for all processors
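
A minimal sketch (not from the slides) of the private and shared clauses; the variable names are placeholders:

    #include <stdio.h>

    int main(void) {
        int n = 8;    /* shared: one copy visible to all threads */
        int tmp;      /* private below: each thread gets its own copy */

        #pragma omp parallel for shared(n) private(tmp)
        for (int i = 0; i < n; i++) {
            tmp = i * i;    /* writes go to the thread-local copy */
            printf("i=%d tmp=%d\n", i, tmp);
        }
        return 0;
    }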

OpenMP Programming Model

Fork-join parallelism (a restricted form of MIMD)
- Normally a single thread of control (master)
- Worker threads are spawned when a parallel region is encountered
- Barrier synchronization is required at the end of each parallel region

[Figure: master thread forks into worker threads at each parallel region, joining afterward]

OpenMP Example: Parallel Region

Task-level parallelism:

    double a[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        foo(id, a);
    }
    printf("all done\n");

The OpenMP compiler transforms this into (conceptually):

    double a[1000];
    omp_set_num_threads(4);
    /* one invocation per thread, each with its own id */
    foo(0,a); foo(1,a); foo(2,a); foo(3,a);
    printf("all done\n");

OpenMP Example: Parallel Loop

Loop-level parallelism with #pragma omp parallel for
- Loop iterations are assigned to threads, invoked as functions

    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        foo(i);
    }

The OpenMP compiler transforms this into (conceptually):

    #pragma omp parallel
    {
        int id, i, nthreads, start, end;
        id = omp_get_thread_num();
        nthreads = omp_get_num_threads();
        start = id * N / nthreads;        /* assigning work */
        end = (id+1) * N / nthreads;
        for (i = start; i < end; i++) {   /* iterations scheduled in blocks */
            foo(i);
        }
    }

Iteration Scheduling

Parallel for loop
- Simply specifies that loop iterations may be executed in parallel
- Actual processor assignment is up to the compiler / run-time system

Scheduling goals
- Reduce load imbalance
- Reduce synchronization overhead
- Improve data locality

Scheduling approaches (see the sketch below)
- Block (chunks of contiguous iterations)
- Cyclic (round-robin)
- Dynamic (threads request additional iterations when done)
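
A minimal sketch (not from the slides) of how OpenMP's schedule clause selects among these approaches; work() and N are placeholders:

    extern void work(int i);

    void run(int N) {
        #pragma omp parallel for schedule(static)      /* block: contiguous chunks */
        for (int i = 0; i < N; i++) work(i);

        #pragma omp parallel for schedule(static, 1)   /* cyclic: round-robin */
        for (int i = 0; i < N; i++) work(i);

        #pragma omp parallel for schedule(dynamic, 4)  /* dynamic: threads grab 4 iterations at a time */
        for (int i = 0; i < N; i++) work(i);
    }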

Parallelism May Cause Data Races

Data race
- Multiple accesses to shared data in parallel
- At least one access is a write
- Result depends on the order of the shared accesses

May be introduced by a parallel loop
- If a data dependence exists between loop iterations
- Result depends on the order loop iterations are executed

Example (each a[i] depends on its neighbors, so iterations conflict; see the sketch below):

    #pragma omp parallel for
    for (i = 1; i < n-1; i++) {
        a[i] = ( a[i-1] + a[i+1] ) / 2;
    }
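
One way to remove this race, as a minimal sketch (not from the slides, assuming a second array b of the same length): read from a and write to b, so no iteration reads a value another iteration writes.

    #pragma omp parallel for
    for (i = 1; i < n-1; i++) {
        b[i] = ( a[i-1] + a[i+1] ) / 2;   /* all reads from a, all writes to b */
    }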

Sample Fortran77 OpenMP Code

          program compute_pi
          integer n, i
          double precision w, x, sum, pi, f, a
    c     function to integrate
          f(a) = 4.d0 / (1.d0 + a*a)
          print *, 'Enter # of intervals: '
          read *, n
    c     calculate the interval size
          w = 1.0d0/n
          sum = 0.0d0
    !$OMP PARALLEL DO PRIVATE(x), SHARED(w)
    !$OMP& REDUCTION(+: sum)
          do i = 1, n
            x = w * (i - 0.5d0)
            sum = sum + f(x)
          enddo
          pi = w * sum
          print *, 'computed pi =', pi
          stop
          end

Reductions

Specialized computations in which
- Partial results may be computed in parallel
- Partial results are combined into the final result

Examples: addition, multiplication, minimum, maximum, count

OpenMP reduction variable (see the sketch below)
- Compiler inserts code to:
  - Compute partial results locally
  - Use synchronization / communication to combine results
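
A minimal C sketch (not from the slides) of an OpenMP reduction variable:

    #include <stdio.h>

    int main(void) {
        double sum = 0.0;
        /* reduction(+:sum): each thread accumulates a private partial
           sum; OpenMP combines the partial sums at the end of the loop */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000; i++) {
            sum += 1.0 / (i * (double)i);
        }
        printf("sum = %f\n", sum);   /* approaches pi^2/6 */
        return 0;
    }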

UPC

An extension to C for parallel computing

Target environments
- Distributed memory machines
- Cache-coherent multiprocessors
- Multi-core processors

Features
- Explicit control of data distribution
- Includes a parallel for statement (upc_forall)
- MPI-like run-time library support

UPC Characteristics

- Local memory, plus shared arrays accessed via global pointers
- Parallelism: single program on multiple nodes (SPMD)
- Provides the illusion of shared one-dimensional arrays

Features
- Data distribution declarations for arrays
- One-sided communication routines (memput / memget)
- Compilers translate shared pointers & generate communication
- Can cast shared pointers to local pointers for efficiency

Example:

    shared int *x, *y, z[100];
    upc_forall (i = 0; i < 100; i++; i) {   /* fourth clause: affinity */
        z[i] = *x++ + *y++;
    }

More UPC

Shared pointer
- Key feature of UPC
- Enables support for distributed memory architectures
- A local (private) pointer pointing into a shared array
- Consists of two parts: a processor number and a local address on that processor

Read operations through a shared pointer
- For nonlocal data, the compiler translates them into memget( )

Write operations through a shared pointer
- For nonlocal data, the compiler translates them into memput( )

Cast into a local private pointer (see the sketch below)
- Accesses the local portion of a shared array without communication
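
A minimal UPC sketch (not from the slides) of casting a shared pointer to a local pointer; the names are placeholders:

    #include <upc.h>

    shared int data[THREADS];   /* one element per thread (cyclic layout) */

    void bump_my_element(void) {
        /* data[MYTHREAD] has affinity to this thread, so the cast is legal */
        int *local = (int *) &data[MYTHREAD];
        (*local)++;             /* pure local access, no memget/memput */
    }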

UPC Execution Model

SPMD-based
- One thread per processor
- Each thread starts with the same entry to main

Different consistency models possible (see the sketch below)
- Strict model is based on sequential consistency
  - Results must match some sequential execution order
- Relaxed model is based on release consistency
  - Writes visible only after release synchronization
  - Increased freedom to reorder operations
  - Reduced need to communicate results

Consistency models are tricky; avoid data races altogether
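
A minimal sketch (not from the slides), assuming a UPC compiler that accepts the strict and relaxed type qualifiers:

    #include <upc.h>

    strict  shared int flag;   /* strict: sequentially consistent accesses */
    relaxed shared int data;   /* relaxed: compiler/hardware may reorder */

    void producer(void) {
        data = 42;   /* relaxed write: may be buffered or reordered */
        flag = 1;    /* strict write: earlier writes become visible first */
    }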

Forall Loop

- Forms the basis of parallelism in UPC
- Adds a fourth parameter, affinity, to the for loop (see the sketch below)
  - Where code is executed is based on affinity
  - Attempts to assign each loop iteration to the processor holding the shared data, to reduce communication
- Lacks an explicit barrier before / after execution (differs from OpenMP)
- Supports nested forall loops
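
A minimal UPC sketch (not from the slides) of the affinity clause; the array name is a placeholder:

    #include <upc.h>

    #define N 100
    shared int a[N];

    void init_array(void) {
        int i;
        /* fourth clause &a[i]: iteration i runs on the thread with
           affinity to a[i], so the write below is always local */
        upc_forall (i = 0; i < N; i++; &a[i]) {
            a[i] = i;
        }
    }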

Split-phase Barriers

Traditional barriers
- Once a thread enters the barrier, it busy-waits until all threads arrive

Split-phase (see the sketch below)
- Announce intention to enter the barrier (upc_notify)
- Perform some local operations
- Wait for the other threads (upc_wait)

Advantage
- Allows useful work while waiting for other threads to arrive

Disadvantages
- Must find work to do
- Takes time to communicate both the notify and the wait
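
A minimal UPC sketch (not from the slides) of a split-phase barrier; the two helper functions are hypothetical:

    #include <upc.h>

    extern void update_shared_data(void);  /* hypothetical: work others must see */
    extern void do_local_work(void);       /* hypothetical: needs no remote data */

    void step(void) {
        update_shared_data();

        upc_notify;        /* announce arrival at the barrier */
        do_local_work();   /* overlap local work with synchronization */
        upc_wait;          /* block until all threads have notified */
    }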