Parallel Programming. Libraries and implementations

Reusing this material

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

This means you are free to copy and redistribute the material and adapt and build on the material under the following terms: You must give appropriate credit, provide a link to the license and indicate if changes were made. If you adapt or build on the material you must distribute your work under the same license as the original.

Note that this presentation contains images owned by others. Please seek their permission before reusing these images.

http://www.archer.ac.uk support@archer.ac.uk

Outline

- How we manage software packages & libraries on ARCHER
- MPI: the distributed-memory de facto standard
  - Using MPI
- OpenMP: the shared-memory de facto standard
  - Using OpenMP
- Other parallel programming technologies
  - CUDA, OpenCL, OpenACC
- Examples of common scientific libraries

Module environment

- The module environment allows you to easily load different packages and manage different versions of packages.
- It is driven by the module command: list loaded modules, view available modules, load and unload modules.

    user@eslogin001:~> module list
    Currently Loaded Modulefiles:
      1) modules/3.2.10.2                    15) rca/1.0.0-2.0502.57212.ari
      2) eswrap/1.3.3-1.020200.1278.0        16) atp/1.8.3
      3) switch/1.0-1.0502.57058.1.58.ari    17) PrgEnv-cray/5.2.56
      4) craype-network-aries                18) pbs/12.2.401.141761
      5) craype/2.4.2                        19) craype-ivybridge
      6) cce/8.4.1                           20) cray-mpich/7.2.6
      7) cray-libsci/13.2.0                  21) packages-archer
      8) udreg/2.3.2-1.0502.9889.2.20.ari    22) bolt/0.6
      9) ugni/6.0-1.0502.10245.9.9.ari       23) nano/2.2.6
     10) pmi/5.0.7-1.0000.10678.155.25.ari   24) leave_time/1.0.0
     11) dmapp/7.0.1-1.0502.10246.8.4.ari    25) quickstart/1.0
     12) gni-headers/4.0-1.0502.10317.9.ari  26) ack/2.14
     13) xpmem/0.1-2.0502.57015.1.15.ari     27) xalt/0.6.0
     14) dvs/2.5_0.9.0-1.0502.1958.2.55.ari  28) epcc-tools/6.0

Using the module environment

    user@eslogin001:~> module avail
    PrgEnv-cray/5.1.29   PrgEnv-cray/5.2.56(default)
    PrgEnv-gnu/5.1.29    PrgEnv-gnu/5.2.56(default)
    PrgEnv-intel/5.1.29  PrgEnv-intel/5.2.56(default)
    cray-mpich/6.3.1  cray-mpich/7.1.1  cray-mpich/7.2.6(default)  cray-mpich/7.3.2  cray-mpich/7.4.2
    cray-netcdf/4.3.2  cray-netcdf/4.4.0  cray-netcdf/4.3.3.1(default)  cray-netcdf/4.4.1
    cray-petsc/3.5.2.1  cray-petsc/3.6.3.0  cray-petsc/3.6.1.0(default)  cray-petsc/3.7.2.0
    fftw/2.1.5.7  fftw/2.1.5.9  fftw/3.3.4.5(default)  fftw/3.3.4.7  fftw/3.3.4.9

    user@eslogin001:~> module load fftw
    user@eslogin001:~> module unload fftw
    user@eslogin001:~> module load fftw/2.1.5.7
    user@eslogin001:~> module swap fftw/2.1.5.7 fftw/3.3.4.9
    user@eslogin001:~> module swap PrgEnv-cray PrgEnv-gnu

MPI Library Distributed, message-passing programming

Message-passing concepts

What is MPI?

- MPI stands for Message Passing Interface.
- MPI is not a programming language.
  - There is no such thing as an MPI compiler.
- MPI is available as a library of function/subroutine calls.
  - The library implements the MPI standard.
- The C or Fortran compiler knows nothing about what MPI actually does.
  - It sees only the prototypes/interfaces of the functions/subroutines.
  - To the compiler, MPI is just another library.

The MPI standard

- MPI itself is a standard.
- It is agreed upon by approximately 100 representatives from about 40 organisations (the MPI Forum):
  - Academics
  - Industry
  - Vendors
  - Application developers
- The first standard (MPI version 1.0) was drafted in 1993.
- At the time of writing, the current standard is version 3 and version 4 is being drafted.

MPI Libraries

- The MPI Forum defines the standard; vendors and open-source developers then implement it.
- There are a number of different implementations, but all should support version 2.0 or 3.0 of the standard.
  - As with compilers, implementation details vary, but every feature in the standard should work.
  - Examples: MPICH and Open MPI
  - Cray-MPICH on ARCHER, which implements version 3.1 of the standard (optimised for Cray machines, specifically the interconnect)

Features of MPI

- MPI is a portable library used for writing parallel programs using the message-passing model.
  - You can expect MPI to be available on any HPC platform you use.
  - This aids portability between HPC machines, and MPI is trivial to install on local clusters.
- It is based on a number of processes running independently in parallel.
  - The HPC resource provides the command to launch the processes in parallel (e.g. aprun or mpiexec).
  - You can think of each process as an instance of your executable communicating with other instances.

Explicit Parallelism

- In message-passing, all the parallelism is explicit.
  - The program includes specific instructions for each communication:
    - What to send or receive
    - When to send or receive
    - Synchronisation
- It is up to the developer to design the parallel decomposition and implement it.
  - How will you divide up the problem?
  - When will you need to communicate between processes?
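One common answer to "How will you divide up the problem?" is a block decomposition, in which each process owns a contiguous range of the data. The sketch below is an illustration only and is not taken from the slides: block_range is a hypothetical helper written in plain C, and in a real MPI program the rank and size arguments would come from MPI_Comm_rank and MPI_Comm_size.

```c
#include <assert.h>

/* Hypothetical helper (not part of MPI): compute the half-open range
 * [*start, *stop) of the n items owned by process `rank` out of `size`.
 * The first (n % size) ranks receive one extra item, so the load differs
 * by at most one item between any two ranks. */
void block_range(int n, int size, int rank, int *start, int *stop)
{
    assert(n >= 0 && size > 0 && rank >= 0 && rank < size);

    int base  = n / size;   /* items that every rank receives */
    int extra = n % size;   /* leftover items to hand out     */

    *start = rank * base + (rank < extra ? rank : extra);
    *stop  = *start + base + (rank < extra ? 1 : 0);
}
```

With n = 10 and size = 4, ranks 0 to 3 own the ranges [0,3), [3,6), [6,8) and [8,10). Each process then loops only over its own range and communicates whatever its neighbours need.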

Supported features

- Point-to-point communications
  - Communications involving two processes: a sender and a receiver
  - Wide variety of semantics, including non-blocking communications
  - Other aspects such as wildcards & custom data types
- Collective communications
  - Communication that involves many processes
  - Implements all the collective communications we saw in the programming models lecture, and many more
  - Also supports non-blocking communications and custom data types

Example: MPI HelloWorld

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char* argv[])
    {
        int size, rank;

        MPI_Init(&argc, &argv);                 /* start up MPI */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */

        printf("hello world - I'm rank %d of %d\n", rank, size);

        MPI_Finalize();                         /* shut down MPI */
        return 0;
    }

OpenMP Shared-memory parallelism using directives

Shared-memory concepts

- Threads communicate by having access to the same memory space.
  - Any thread can alter any bit of data.
  - There are no explicit communications between the parallel tasks.
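As a rough illustration of threads communicating through shared memory, the sketch below has every thread read one shared array and update one shared counter, with no messages exchanged. The function count_even is a hypothetical example, not from the slides. Because any thread can alter the counter, the update is protected with an atomic directive so that concurrent increments are not lost.

```c
#include <assert.h>

/* Hypothetical example: count the even values in a shared array.
 * All threads read the shared array `a` and update the single shared
 * variable `count`. The atomic directive makes each increment
 * indivisible. Without OpenMP the pragmas are ignored and the loop
 * runs sequentially with the same result. */
int count_even(const int *a, int n)
{
    assert(n >= 0);

    int count = 0;   /* shared: visible to all threads */
    int i;

    #pragma omp parallel for shared(a, n, count) private(i)
    for (i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

For a simple sum or count like this, a reduction clause usually performs better than an atomic update; the atomic form is used here only to make the shared access visible.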

OpenMP

- OpenMP stands for Open Multi-Processing.
- It is an application programming interface (API) for shared-variable programming.
- It is a set of extensions to C, C++ and Fortran:
  - Compiler directives
  - Runtime library functions
  - Environment variables
- It is not a library interface like MPI.
- It uses directives, which are special lines in the source code with a meaning understood by the compiler.
  - They are ignored if OpenMP is disabled, and the program becomes regular sequential code.
- This is also a standard (http://openmp.org).
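The three kinds of extension can be seen together in a small sketch. The function names available_threads and scale are hypothetical; omp_get_max_threads and the _OPENMP macro are part of the OpenMP standard. The code is written so that it still compiles and gives the same answer when OpenMP is disabled, in which case the directive is ignored.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* runtime library header, used only when OpenMP is enabled */
#endif

/* Hypothetical helper: how many threads may a parallel region use?
 * With OpenMP enabled this asks the runtime library, whose default is
 * controlled by the OMP_NUM_THREADS environment variable.
 * Without OpenMP the program is sequential, so the answer is 1. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Scale an array in place. The directive asks the compiler to share the
 * loop iterations between threads; a compiler with OpenMP disabled
 * ignores it and leaves an ordinary sequential loop. */
void scale(double *x, int n, double factor)
{
    assert(n >= 0);

    int i;   /* the loop variable is made private to each thread */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        x[i] *= factor;
    }
}
```

With GCC, for example, OpenMP is enabled with the -fopenmp flag; other compilers use their own flag.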

Features of OpenMP

- Directives define parallel regions in the code.
  - OpenMP threads are active in these regions and divide the workload amongst themselves.
- The compiler needs to understand what OpenMP does.
  - It is responsible for producing the parallel code.
  - OpenMP is supported by all common compilers used in HPC.
- Parallelism is less explicit than in MPI.
  - You just specify which parts of the program you want to run in parallel.
- At the time of writing, OpenMP 4.5 is the latest version.
  - It can be used to program the Xeon Phi.

Loop-based parallelism

- The most common form of OpenMP parallelism is to parallelise the work in a loop.
  - The OpenMP directives tell the compiler to divide the iterations of the loop between the threads.

    #pragma omp parallel shared(a,b,c) private(i)
    {
        /* share the loop iterations out between the threads */
        #pragma omp for schedule(dynamic) nowait
        for (i=0; i < N; i++) {
            c[i] = a[i] + b[i];
        }
    }

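Addition example

    asum = 0.0;
    #pragma omp parallel \
        shared(a,n) private(i) \
        reduction(+:asum)
    {
        #pragma omp for
        for (i=0; i < n; i++) {
            asum += a[i];
        }
    }
    printf("asum = %f\n", asum);

Within the parallel region each thread works on a private copy of asum (myasum below), initialised to zero, over its own range of iterations. At the end of the region the private copies are added into the shared asum:

    asum = 0
    loop: i = istart,istop
        myasum += a[i]
    end loop
    asum = sum of every thread's myasum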
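To make the effect of the reduction clause concrete, the sketch below writes the same sum by hand. It is an illustration under stated assumptions, not how any particular compiler implements a reduction: manual_sum is a hypothetical function, the contiguous split of iterations into [istart, istop) is one possible choice, and the partial sums are combined in a critical section.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum an array by hand, mirroring what reduction(+:asum) arranges:
 * each thread accumulates a private partial sum (myasum) over its own
 * block of iterations, then adds it to the shared total exactly once. */
double manual_sum(const double *a, int n)
{
    assert(n >= 0);

    double asum = 0.0;   /* shared result */

    #pragma omp parallel shared(a, n, asum)
    {
        int nthreads = 1, tid = 0;   /* sequential defaults */
#ifdef _OPENMP
        nthreads = omp_get_num_threads();
        tid = omp_get_thread_num();
#endif
        /* this thread's contiguous block of iterations */
        int istart = (int)((long long)n * tid / nthreads);
        int istop  = (int)((long long)n * (tid + 1) / nthreads);

        double myasum = 0.0;   /* private: declared inside the region */
        for (int i = istart; i < istop; i++) {
            myasum += a[i];
        }

        /* combine the partial sums, one thread at a time */
        #pragma omp critical
        asum += myasum;
    }
    return asum;
}
```

The reduction clause expresses the same idea in a single line and leaves the choice of combining strategy to the compiler and runtime.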

Other parallel programming technologies

CUDA

- CUDA is an Application Program Interface (API) for programming NVIDIA GPU accelerators.
  - It is proprietary software provided by NVIDIA and should be available on all systems with NVIDIA GPU accelerators.
  - You write GPU-specific functions called kernels.
  - You launch kernels using CUDA's launch syntax from within otherwise standard C programs.
  - It includes functions to move data between CPU and GPU memory.
- It is similar to OpenMP programming in many ways, in that the parallelism is implicit in the kernel design and launch.

OpenCL

- An open, cross-platform standard for programming accelerators
  - Includes GPUs, e.g. from both NVIDIA and AMD
  - Also Xeon Phi, Digital Signal Processors, ...
- Comprises a language + library
- Harder to write than CUDA if you have NVIDIA GPUs, but portable across multiple platforms
  - Maintaining performance across platforms is difficult, however

Other parallel implementations

- Partitioned Global Address Space (PGAS)
  - Coarray Fortran, Unified Parallel C, Chapel
- Cray SHMEM, OpenSHMEM
  - Single-sided communication library
- OpenACC
  - Directive-based approach for programming accelerators
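To give a flavour of the directive-based approach, here is a minimal OpenACC sketch in C. The function saxpy is a hypothetical example and the data clauses shown are one reasonable choice, not the only one. As with OpenMP, a compiler without OpenACC support ignores the directive and runs the loop sequentially on the CPU.

```c
#include <assert.h>

/* Hypothetical example: y = alpha * x + y using an OpenACC directive.
 * An OpenACC compiler may offload the loop to an accelerator, copying
 * x to the device and y in both directions as the data clauses request.
 * Any other compiler ignores the pragma and runs the loop on the CPU. */
void saxpy(int n, float alpha, const float *x, float *y)
{
    assert(n >= 0);

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

The source stays ordinary C, which is the main attraction of directives compared with writing explicit kernels.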

Common scientific parallel libraries Two examples

PETSc

- Portable, Extensible Toolkit for Scientific Computation
- Suite of data structures & routines for the parallel and scalable solution of PDEs
- The programmer uses the library framework, which under the hood uses the parallel technologies MPI, OpenMP and/or CUDA.
- Unlike with many serial libraries, you, the programmer, are responsible for performance & scalability.

NetCDF

- Network Common Data Form
- Self-describing, machine-independent file format and implementation that is very common for writing and reading scientific data
- Parallel version supporting parallel I/O
  - Multiple processes/threads can read and write to a file concurrently
  - Built on top of MPI
- Many third-party tools, such as visualisation suites
- Again requires understanding, both from the programmer and from the user (file configuration options)

Summary

Parallel and scientific libraries

- The module environment is an easy way of managing many different software packages, their dependencies and different versions.
- Distributed memory is programmed using MPI.
- Shared memory is programmed using OpenMP.
- GPU accelerators are most often programmed using CUDA.
- There are very many software packages installed on ARCHER, but scientific libraries often require in-depth knowledge and understanding to get good performance.