Lecture 4: OpenMP (Open Multi-Processing)

CS 4230: Parallel Programming
Lecture 4: OpenMP (Open Multi-Processing)
January 23, 2017

Outline
- OpenMP: another approach to thread-parallel programming
- Fork-join execution model
- OpenMP constructs: syntax and semantics
- Work sharing
- Thread scheduling
- Data sharing
- Reduction
- Synchronization
- count_primes hands-on!

OpenMP: A Common Thread-Level Programming Approach in HPC
- Portable across shared-memory architectures
- Incremental parallelization: parallelize individual computations in a program while leaving the rest of the program sequential
- Compiler based: the compiler generates the thread program and the synchronization
- Extends existing programming languages (Fortran, C, and C++), mainly through directives plus a few library routines
- See http://www.openmp.org

Fork-Join Model
[Figure: a master thread forks a team of threads at the start of each parallel region; the team joins back into the single master thread at the end of the region.]

OpenMP Hello World

    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        #pragma omp parallel
        {
            printf("Hello World from Thread %d!\n", omp_get_thread_num());
        }
        return 0;
    }

Compiling for OpenMP: gcc: -fopenmp, icc: -openmp, pgcc: -mp

Number of Threads
The number of threads for a parallel region is determined by (in decreasing order of precedence):
- the if clause (if it evaluates to false, the region runs with a single thread)
- the num_threads clause
- the omp_set_num_threads() library routine
- the OMP_NUM_THREADS environment variable
- the implementation default (typically one thread per CPU or core)
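As a minimal sketch (not from the slides), the following program shows two of these mechanisms interacting: the num_threads clause on a specific region overrides the value set earlier by omp_set_num_threads().

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        omp_set_num_threads(4);          /* overrides OMP_NUM_THREADS */

        #pragma omp parallel             /* runs with 4 threads */
        {
            #pragma omp single
            printf("first region: %d threads\n", omp_get_num_threads());
        }

        #pragma omp parallel num_threads(2)  /* clause overrides the call */
        {
            #pragma omp single
            printf("second region: %d threads\n", omp_get_num_threads());
        }
        return 0;
    }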

OpenMP Constructs
- Compiler directives (44), e.g. #pragma omp parallel [clause ...]
- Runtime library routines (35), e.g. #include <omp.h> provides int omp_get_num_threads(void) and int omp_get_thread_num(void)
- Environment variables (13), e.g. export OMP_NUM_THREADS=x

Work Sharing
- A work-sharing construct divides the execution of the enclosed code region among the threads of the team.
- for shares the iterations of a loop across the team of threads: #pragma omp parallel for [clause ...]
- There are also sections and single constructs (see [1]).

Work Sharing - for

    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int i, n = 10;
        #pragma omp parallel for
        for (i = 0; i < n; i++)
            printf("Hello World!\n");
        return 0;
    }

Note that the for loop must immediately follow the #pragma omp parallel for directive; it is not wrapped in braces.

Thread Scheduling
- Static: loop iterations are divided into pieces of size chunk and statically assigned to threads: schedule(static [, chunk])
- Dynamic: loop iterations are divided into pieces of size chunk and dynamically scheduled among the threads: schedule(dynamic [, chunk])
- More options: guided, runtime, auto
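A minimal sketch of the schedule clause (the triangular inner loop is an invented stand-in for uneven per-iteration work): with schedule(dynamic, 4), an idle thread grabs the next chunk of 4 iterations, which balances loops whose iterations vary in cost.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        int i;
        #pragma omp parallel for schedule(dynamic, 4)
        for (i = 0; i < 100; i++) {
            /* later iterations do more work than earlier ones */
            volatile double x = 0.0;
            int j;
            for (j = 0; j < i * 1000; j++)
                x += j * 0.5;
        }
        printf("done\n");
        return 0;
    }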

Data Sharing / Data Scope
- shared variables are shared among all threads; private variables are private to each thread
- The default is shared. The index of a work-shared loop is automatically private, but in C the indices of loops nested inside it are not, so declare them private explicitly.
- #pragma omp parallel for private(list) shared(list)
- These clauses can be used with any work-sharing construct.
- Also firstprivate, lastprivate, default, copyin, ... (see [1])
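A minimal sketch of explicit scope clauses (the total array and loop bounds are invented for illustration): the work-shared index i is private automatically, but the inner index j must be made private to avoid a data race.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        int i, j, n = 8;
        int total[8] = {0};

        /* i is predetermined private; j needs the private clause */
        #pragma omp parallel for private(j) shared(total)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                total[i] += i * j;

        printf("total[7] = %d\n", total[7]);
        return 0;
    }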

Reduction
- The reduction clause performs a reduction on the variables that appear in its list.
- A private copy of each list variable is created for each thread.
- At the end of the region, the reduction operator is applied to all the private copies, and the final result is written to the global shared variable.
- Syntax: reduction(operator : list)

Reduction

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int i, n = 1000;
        float a[1000], b[1000], sum;

        for (i = 0; i < n; i++)
            a[i] = b[i] = i * 1.0;

        sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
            sum = sum + (a[i] * b[i]);

        printf("sum = %f\n", sum);
        return 0;
    }

Source: http://computing.llnl.gov/tutorials/openmp/samples/c/omp_reduction.c

OpenMP Synchronization
- Recall the barrier from Pthreads: int pthread_barrier_wait(pthread_barrier_t *barrier);
- Implicit barriers: at the end of parallel regions and of work-sharing constructs.
- The barrier at the end of a work-sharing construct (but not of the enclosing parallel region) can be removed with the nowait clause: #pragma omp for nowait
- Explicit synchronization: single, critical, atomic, ordered, flush
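A minimal sketch (not from the slides) contrasting two of the explicit constructs: atomic protects a single memory update cheaply, while critical serializes an arbitrary block of code.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        int hits = 0;
        int last_thread = -1;

        #pragma omp parallel
        {
            #pragma omp atomic
            hits++;                      /* one protected update */

            #pragma omp critical
            {                            /* arbitrary protected block */
                last_thread = omp_get_thread_num();
            }
        }
        printf("hits = %d, last thread = %d\n", hits, last_thread);
        return 0;
    }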

Exercise
- See prime_sequential.c. How can it be improved?
- Write a thread-parallel version using what we discussed (one possible sketch follows below).
- Observe scalability with the number of threads, filling in a table:

Threads | Time (s) | Speedup
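One possible parallel version, sketched without seeing prime_sequential.c (the is_prime() helper and the counting loop are assumptions about its structure): the reduction clause avoids a shared counter, and schedule(dynamic) helps because larger candidates cost more to test.

    #include <omp.h>
    #include <stdio.h>

    /* assumed helper: trial division up to sqrt(n) */
    static int is_prime(int n) {
        if (n < 2) return 0;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return 0;
        return 1;
    }

    int main(void) {
        int n = 1000000, count = 0;
        double t0 = omp_get_wtime();

        #pragma omp parallel for reduction(+:count) schedule(dynamic, 1000)
        for (int i = 2; i <= n; i++)
            count += is_prime(i);

        printf("%d primes up to %d in %.3f s\n",
               count, n, omp_get_wtime() - t0);
        return 0;
    }

Running it with OMP_NUM_THREADS set to 1, 2, 4, ... and recording the printed time gives the entries for the table above.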

Summary
What's good?
- Only small changes are required to produce a parallel program from a sequential one (parallel formulation)
- Avoids having to express low-level mapping details
- Portable and scalable; correct on 1 processor
What's missing?
- Not completely natural if you want to write a parallel code from scratch
- Not always possible to express certain common parallel constructs
- Locality management
- Control of performance

References
[1] Blaise Barney, Lawrence Livermore National Laboratory: https://computing.llnl.gov/tutorials/openmp
[2] XSEDE HPC Workshop: OpenMP: https://www.psc.edu/index.php/136-users/training/2496-xsede-hpc-workshop-january-17-2017-openmp