What else is available besides OpenMP?


What else is available besides OpenMP? Christian Terboven, terboven@rz.rwth-aachen.de, Center for Computing and Communication, RWTH Aachen University. Parallel Programming, June 6

Other Shared Memory paradigms

o OpenMP: de facto standard, supported by all compilers. Based on POSIX / Win32 threads; those can be programmed by hand as well.

o Intel Threading Building Blocks (TBB): C++ library, fits nicely to an STL-like programming style. Provides parallel containers and operations: tasks, parallel_for, parallel_reduce, parallel_scan, parallel_sort, ... Commercial and open-source variants; Windows, Linux and Solaris.

o C++0x: the next release of C++. Will provide a memory model for multi-threading. Will provide basic thread management (currently in Boost).

Pi with OpenMP

double f(double x)
{
    return (double)4.0 / ((double)1.0 + (x*x));
}

void ComputePi()
{
    double h = (double)1.0 / (double)iNumIntervals;
    double sum = 0, x;
    #pragma omp parallel for private(x) reduction(+:sum)
    for (int i = 1; i <= iNumIntervals; i++)
    {
        x = h * ((double)i - (double)0.5);
        sum += f(x);
    }
    myPi = h * sum;
}

Pi with TBB (1/2)

// initialization of runtime
tbb::task_scheduler_init init;
CPI CalcPi(1.0 / (double)n);

// parallel reduction
tbb::parallel_reduce(tbb::blocked_range<int>(1, n+1), CalcPi);
pi = (1.0 / (double)n) * CalcPi.sum;

Advanced Topics on OpenMP

Pi with TBB (2/2)

class CPI {
public:
    double sum;

    // functor: do the actual work here
    void operator() (const tbb::blocked_range<int>& r) {
        double x = 0.0;
        for (int i = r.begin(); i != r.end(); ++i) {
            x = h * ((double)i - 0.5);
            sum += f(x);
        }
    }

    // constructor
    CPI(double) { ... }
    // splitting constructor
    CPI(CPI& other, tbb::split) { ... }
    // reduction helper
    void join(CPI& other) { sum += other.sum; }
};

// initialization of runtime
tbb::task_scheduler_init init;
CPI CalcPi(h /* = 1.0 / (double)n */);

// parallel reduction
tbb::parallel_reduce(tbb::blocked_range<int>(1, n+1), CalcPi);
pi = h * CalcPi.sum;

Pi with C++0x threads [currently: Boost] (1/2)

// spawn and join threads
thread_group t;
for (int i = 0; i < g_iNumThreads; i++)
    t.create_thread( CalcPi(i) );
t.join_all();

Pi with C++0x threads [currently: Boost] (2/2)

struct CalcPi {
    int m_iN;
    CalcPi(int _iN) : m_iN(_iN) { }

    // functor: do the actual work here
    void operator()() {
        double dPi = 0.0, dH = 1.0 / (double)g_intervals, dSum = 0;
        for (int i = m_iN + 1; i <= g_intervals; i += g_iNumThreads) {
            double dX = dH * ((double)i - 0.5);
            dSum += f(dX);
        }
        dPi = dH * dSum;
        {
            // mutual exclusion
            mutex::scoped_lock l(g_mRed);
            g_dPi += dPi;
        }
    }
};

// spawn and join threads
thread_group t;
for (int i = 0; i < g_iNumThreads; i++)
    t.create_thread( CalcPi(i) );
t.join_all();

Pi with Pthreads (1/2)

pthread_t *tid;
pthread_mutex_t reduction_mutex;

// management overhead
tid = (pthread_t*) calloc(g_iNumThreads, sizeof(pthread_t));

// spawn and join threads
for (i = 0; i < g_iNumThreads; i++)
    pthread_create(&tid[i], NULL, PIworker, NULL);
for (i = 0; i < g_iNumThreads; i++)
    pthread_join(tid[i], NULL);

Pi with Pthreads (2/2)

pthread_t *tid;
pthread_mutex_t reduction_mutex;

// outlining, the manual way
void *PIworker(void *arg) {
    int myid = pthread_self() - tid[0];
    for (int i = myid + 1; i <= g_intervals; i += g_iNumThreads) {
        /* computation of dPi */
    }
    // critical region, the manual way
    pthread_mutex_lock(&reduction_mutex);
    g_dPi += dPi;
    pthread_mutex_unlock(&reduction_mutex);
    return NULL;
}

// management overhead
tid = (pthread_t*) calloc(g_iNumThreads, sizeof(pthread_t));

// spawn and join threads
for (i = 0; i < g_iNumThreads; i++)
    pthread_create(&tid[i], NULL, PIworker, NULL);
for (i = 0; i < g_iNumThreads; i++)
    pthread_join(tid[i], NULL);

The End. Thank you for your attention!