OpenMP Implementation in GCC
Diego Novillo (dnovillo@redhat.com)
Red Hat Canada
GCC Developers Summit, Ottawa, Canada, June 2006

OpenMP
- Language extensions for shared-memory concurrency (C, C++ and Fortran)
- Designed around compiler pragmas
- Directives specify parallelism and work sharing
- Clauses specify attributes for data sharing and scheduling
- Based on the fork/join model

Programming Model: Directives
- #pragma omp (C, C++) or !$omp (Fortran)
- The compiler replaces directives with calls to the runtime library (libgomp)
- The library offers an API for querying and controlling threads and scheduling
- Runtime controls are set in the program or through environment variables: OMP_NUM_THREADS, OMP_SCHEDULE, OMP_DYNAMIC, OMP_NESTED
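As a concrete illustration of the two control paths, here is a minimal sketch (not from the slides; the team sizes are arbitrary): it sets the team size through the omp.h API and queries it from inside a parallel region.

  #include <omp.h>
  #include <stdio.h>

  int main(void)
  {
      omp_set_num_threads(4);             /* API call: overrides OMP_NUM_THREADS for this run */

      #pragma omp parallel
      {
          #pragma omp single
          printf("team size: %d\n", omp_get_num_threads());
      }
      return 0;
  }

Dropping the omp_set_num_threads call leaves control to the environment, e.g. OMP_NUM_THREADS=8 OMP_SCHEDULE="dynamic,4" ./prog.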

Programming Model
- The programmer is responsible for synchronization and sharing
- Sharing is variable based
- Directives are used for synchronization
- Original intent: a sequential run equals a parallel run; a compiler switch enables/disables the pragmas
- Invalid sequential programs are possible too: parallel algorithms
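To make the last point concrete, the following sketch (not from the slides) is a program that is only meaningful in parallel: the first section spins on a flag that the second section sets, so it relies on the two sections landing on different threads (which the standard does not strictly guarantee) and never terminates when the pragmas are disabled or only one thread is used.

  #include <omp.h>
  #include <stdio.h>

  int flag = 0;                               /* shared signalling variable */

  int main(void)
  {
      #pragma omp parallel sections shared(flag)
      {
          #pragma omp section
          {                                   /* consumer: spins until the producer signals */
              int seen = 0;
              while (!seen) {
                  #pragma omp flush(flag)
                  seen = flag;
              }
              printf("consumer: got the signal\n");
          }

          #pragma omp section
          {                                   /* producer */
              flag = 1;
              #pragma omp flush(flag)
          }
      }
      return 0;
  }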

OpenMP - Hello World

  #include <omp.h>

  main()
  {
    #pragma omp parallel
      printf ("[%d] Hello\n", omp_get_thread_num());
  }

  $ gcc -fopenmp -o hello hello.c
  $ ./hello
  [2] Hello
  [3] Hello
  [0] Hello      <-- master thread
  [1] Hello

  $ gcc -o hello hello.c
  $ ./hello
  [0] Hello

Distributing work across threads
- #pragma omp for: data/loop parallelism; partitions the iteration space according to the schedule clause
- #pragma omp sections: cobegin/coend style parallelism; sections are delimited with #pragma omp section
- #pragma omp workshare: distributes execution of Fortran FORALL, WHERE and array assignments
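A minimal sketch of the two C-visible work-sharing forms (not from the slides; array sizes and contents are arbitrary): the loop iterations are split across the team, and each section may run on a different thread.

  #include <omp.h>
  #include <stdio.h>

  #define N 100

  int main(void)
  {
      int i, a[N], b[2];

      #pragma omp parallel
      {
          /* Loop parallelism: the iteration space is partitioned per the schedule clause. */
          #pragma omp for schedule(static)
          for (i = 0; i < N; i++)
              a[i] = i * i;

          /* cobegin/coend style: each section is a unit of work for one thread. */
          #pragma omp sections
          {
              #pragma omp section
              b[0] = a[0];

              #pragma omp section
              b[1] = a[N - 1];
          }
      }

      printf("%d %d\n", b[0], b[1]);
      return 0;
  }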

Data Sharing
- Various rules determine sharing properties; most are straightforward and intuitive
- Globals and heap-allocated variables are shared
- Locals declared inside a directive body are private
- Loop iteration variables of parallel loops are private
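The default rules can be read off a small example. The sketch below is not from the slides (the names global_counter, data and tid are made up for illustration); each comment states which rule applies.

  #include <omp.h>
  #include <stdlib.h>

  int global_counter;                          /* file-scope variable: shared by default */

  void work(void)
  {
      int *data = malloc(100 * sizeof *data);  /* heap storage: shared (every thread sees the same block) */
      int i;

      #pragma omp parallel
      {
          int tid = omp_get_thread_num();      /* declared inside the directive body: private */

          #pragma omp atomic
          global_counter++;                    /* all threads update the single shared copy */

          #pragma omp for
          for (i = 0; i < 100; i++)            /* iteration variable of the parallel loop: private */
              data[i] = tid;
      }

      free(data);
  }

  int main(void) { work(); return 0; }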

Data Sharing
- Data sharing directive: threadprivate
- Data sharing clauses: shared, private, firstprivate, lastprivate, reduction
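A sketch showing the directive and several of the clauses together (not from the slides; the variable names are arbitrary): offset is copied into each thread, last keeps the value of the sequentially final iteration, sum is combined across threads, and scratch is a persistent per-thread copy.

  #include <omp.h>
  #include <stdio.h>

  int scratch;
  #pragma omp threadprivate(scratch)           /* each thread keeps its own persistent copy */

  int main(void)
  {
      int i, last = -1, offset = 10, sum = 0;

      #pragma omp parallel for firstprivate(offset) lastprivate(last) reduction(+:sum)
      for (i = 0; i < 100; i++) {
          scratch = i + offset;                /* firstprivate: offset starts as the original value (10) */
          sum += scratch;                      /* reduction: per-thread partial sums combined at the end */
          last = i;                            /* lastprivate: the value from iteration 99 survives the loop */
      }

      printf("sum = %d, last = %d\n", sum, last);
      return 0;
  }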

Synchronization
- #pragma omp single
- #pragma omp master
- #pragma omp critical
- #pragma omp barrier
- #pragma omp atomic: atomic storage update (x op= expr, x++, x--)
- #pragma omp ordered: threads enter in loop iteration order
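A compact sketch exercising most of these constructs (not from the slides; the loop bound and messages are arbitrary). Note that ordered inside the loop requires the ordered clause on the enclosing for.

  #include <omp.h>
  #include <stdio.h>

  int main(void)
  {
      int i, hits = 0;

      #pragma omp parallel
      {
          #pragma omp master                   /* only thread 0 executes this; no implied barrier */
          printf("team of %d threads\n", omp_get_num_threads());

          #pragma omp barrier                  /* everyone waits here */

          #pragma omp for ordered
          for (i = 0; i < 8; i++) {
              #pragma omp atomic               /* atomic storage update: x op= expr */
              hits += 1;

              #pragma omp ordered              /* body runs in loop-iteration order */
              printf("iteration %d\n", i);
          }

          #pragma omp critical                 /* one thread at a time */
          printf("[%d] done, hits = %d\n", omp_get_thread_num(), hits);
      }
      return 0;
  }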

Implementation
- GNU OpenMP (GOMP)
- Four components: parsing, intermediate representation, code generation, and the run-time library (libgomp)

Implementation
- Front ends: C, C++, Fortran
- High GIMPLE: new OMP nodes; common target for the FEs
- Low GIMPLE: data mappings; some expansion
- CFG
- Expand: outlining, calls to libgomp, SESE parallel regions

Original Program

  main()
  {
    int i, sum = 0;

    #pragma omp parallel for
    for (i = 0; i < 10; i++)
      {
        #pragma omp atomic
        sum += i;
      }

    printf ("sum = %d\n", sum);
  }

High GIMPLE

  main ()
  {
    int i, sum = 0;

    #pragma omp parallel shared(sum)
      {
        #pragma omp for nowait private(i)
        for (i = 0; i <= 9; i = i + 1)
          __sync_fetch_and_add_4 (&sum, i);
      }

    printf ("sum = %d\n", sum);
  }

Low GIMPLE

  main ()
  {
    int i, *D.1576, sum = 0;
    struct .omp_data_s .omp_data_o;

    .omp_data_o.sum = &sum;
    #pragma omp parallel shared(sum)             <-- to be outlined
      {
        .omp_data_i = &.omp_data_o;
        #pragma omp for nowait private(i)
        for (i = 0; i <= 9; i = i + 1)
          {
            D.1576 = .omp_data_i->sum;
            __sync_fetch_and_add_4 (D.1576, i.0);
          }
        OMP_RETURN
      }
    OMP_RETURN

    printf (&"sum = %d\n"[0], sum.1);
  }

Final expansion (main)

  main ()
  {
    int i, sum = 0;
    struct .omp_data_s .omp_data_o;

    .omp_data_o.sum = &sum;
    __builtin_GOMP_parallel_start (main.omp_fn.0, &.omp_data_o, 0);
    main.omp_fn.0 (&.omp_data_o);
    __builtin_GOMP_parallel_end ();
    printf ("sum = %d\n", sum);
  }

Final Expansion (main.omp_fn.0)

  main.omp_fn.0 (.omp_data_i)
  {
    D.1581 = __builtin_omp_get_num_threads ();
    D.1582 = (unsigned int) D.1581;
    D.1583 = __builtin_omp_get_thread_num ();
    D.1584 = (unsigned int) D.1583;
    D.1585 = 10 / D.1582;
    D.1586 = D.1585 * D.1582;
    D.1587 = D.1586 != 10;
    D.1588 = D.1585 + D.1587;
    D.1589 = D.1588 * D.1584;
    D.1590 = D.1589 + D.1588;
    D.1591 = MIN_EXPR <D.1590, 10>;
    if (D.1589 >= D.1591) goto <L2>; else goto <L0>;

  <L2>:
    return;

  <L0>:
    D.1592 = (int) D.1589;
    D.1593 = D.1592 * 1;
    i = D.1593 + 0;
    D.1594 = (int) D.1591;
    D.1595 = D.1594 * 1;
    D.1596 = D.1595 + 0;

  <L1>:;
    D.1576 = .omp_data_i->sum;
    __sync_fetch_and_add_4 (D.1576, i);
    i = i + 1;
    D.1597 = i < D.1596;
    if (D.1597) goto <L1>; else goto <L2>;
  }

Annotations on the slide: iteration space partitioning; local min/max limits.

Auto Parallelization
- OMP code can be emitted before pass_expand_omp
- OMP_SECTIONS: task parallelism
- OMP_FOR: loop parallelism
- OMP sharing clauses: data sharing
- Synchronization with the appropriate directives
- Code should be in normal form
- Works in low GIMPLE; the CFG is available

Runtime library
- Wrapper around POSIX threads
- Various system-specific performance tweaks
- Synchronization is usually a 1-to-1 mapping, except omp master (blocks threads with ID != 0)
- omp single copyprivate needs special expansion to handle the broadcast
- All scheduling variants of omp for are implemented
- Static schedules are open coded by the compiler
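For reference, a minimal user-level sketch of the features called out above (not from the slides; the array size and chunk size are arbitrary): a static schedule the compiler can open code, a dynamic schedule serviced by libgomp at run time, and a single copyprivate broadcast.

  #include <omp.h>
  #include <stdio.h>

  #define N 1000

  int main(void)
  {
      int i, n;
      double a[N];

      #pragma omp parallel private(i, n)
      {
          #pragma omp single copyprivate(n)      /* one thread sets n, the value is broadcast to the team */
          n = N;

          #pragma omp for schedule(static)       /* static schedule: open coded by the compiler */
          for (i = 0; i < n; i++)
              a[i] = i;

          #pragma omp for schedule(dynamic, 16)  /* dynamic schedule: chunks handed out by libgomp */
          for (i = 0; i < n; i++)
              a[i] = a[i] * 2.0;
      }

      printf("a[N-1] = %g\n", a[N - 1]);
      return 0;
  }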

Status and Future Work
- To be released with GCC 4.2 later this year
- Implementation available in Fedora Core 5
- Automatic parallelization planned, using the OpenMP infrastructure
- Usual tweaks and bug fixing

SPEC OMP2001 (-O2)

  Benchmark   ICC 9.0   GCC 4.2.0   Difference
  applu        154.9      147.3       -4.9%
  equake       267.2      264.5       -1.0%
  apsi         179.0      179.0        0.0%
  fma3d        139.0      133.0       -4.3%
  ammp         140.0      153.0       +9.3%
  Mean         169.11     167.31      -1.1%

  [Bar chart of SPEC OMP2001 scores (y-axis: Score, 0 to 275) for wupwise, mgrid, equake, fma3d, swim, applu, apsi, ammp and the mean, comparing ICC 9.0 and GCC 4.2.0.]