Practical: a sample code

Alistair Hart, Cray Exascale Research Initiative Europe

Aims
The aim of this practical is to examine, compile and run a simple, pre-prepared OpenACC code. The aims are:
- to familiarise you with the system
- to let you explore the compiler and runtime feedback

The system
You are using a small Cray system called "raven". It is a hybrid XE6/XK6 system:
- XE6: each node is dual AMD Interlagos (total of 32 cores)
- XK6: each node is one AMD Interlagos and one Nvidia Fermi+ X2090 GPU
You log in and compile on a front-end node.
You run jobs by submitting a jobscript to the PBS batch system:
- jobs will not run from the front-end command line
- you select the XK6 nodes by submitting to a special PBS queue
There are two filesystems:
- the home directories
- the Lustre filesystem
You should submit jobs from a directory on the Lustre filesystem.

Raven
Raven is part of the Cray Marketing Partner Network:
- to use it you must agree to the Terms and Conditions
- in particular, you cannot publish or show performance figures without Cray's prior approval
Raven is not backed up. At all.
- the Lustre filesystem can be purged at any time if it gets too full
- please copy any important (but small) results files back to your home directory
- the home directories are also not backed up: no-one is going to delete them without notice, but hardware can fail
After the course ends:
- the training accounts that you are using are temporary and will be deleted at the end of the course
- it is your responsibility to copy any files that you wish to keep to another, non-Cray system before the course ends
- if you want a more permanent account to continue working on OpenACC, please contact me

Getting started
Cray uses a Linux-based environment on the login nodes.
- You will have a bash login shell by default; all the usual Linux commands are available.
- Software versions are loaded and unloaded using the module command (see man module).
- To see which modules are currently loaded, type: module list
- To see which modules are available, type: module avail
- You can wildcard the end of the names, e.g.: module avail PrgEnv*
- For more complicated grepping, you need to redirect stderr to stdout, e.g.: module avail 2>&1 | grep "Env"
- You load a new module by typing: module load <module name>
- Some modules (e.g. different compiler versions) conflict, so you should first "module unload" the old version (or use "module swap").

Programming Environments
A number of different compilers are supported. You select these by loading a Programming Environment module:
- PrgEnv-cray for CCE (the default)
- PrgEnv-pgi for PGI
- PrgEnv-gnu for gcc, gfortran
Once one of these is loaded, you can then select a compiler suite:
- CCE: module avail cce; make sure you type: module swap cce cce/81.newest
- PGI: module avail pgi; use the default module pgi/12.7.0
- Gnu: module avail gcc; use the default module gcc/4.7.1
For GPU programming (CUDA, OpenCL, OpenACC...) make sure you: module load craype-accel-nvidia20
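As an illustration, a typical module sequence in your login shell might look like the following sketch. All of the commands are taken from the notes above; the exact module versions available on raven may differ.

    # Check what is loaded and what is available
    module list
    module avail PrgEnv*
    module avail 2>&1 | grep "Env"

    # Switch from the default CCE environment to PGI (swap back the same way)
    module swap PrgEnv-cray PrgEnv-pgi

    # Enable GPU support for CUDA, OpenCL and OpenACC
    module load craype-accel-nvidia20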

Using the compilers
You use the compilers via wrapper commands: ftn for Fortran, cc for C, CC for C++.
- It doesn't matter which PrgEnv is loaded.
- The wrappers add optimisation options, architecture-specific options and all the important library paths.
- In many cases you don't need any other compiler options.
- If you want unoptimised code, you must use the option -O0.
Further information:
- The man pages for the wrapper commands give you general information.
- For more detail, see the compiler-specific man pages:
  - CCE: crayftn, craycc, crayCC
  - PGI: pgfortran, pgcc
  - GNU: gfortran, gcc
- You will need the appropriate PrgEnv module loaded to see these.

Some Cray Compilation Environment basics
CCE-specific features:
- Optimisation: -O2 is the default and you should usually use this; -O3 activates more aggressive options and could be faster or slower.
- OpenMP is supported by default; if you don't want it, use either the -hnoomp or -xomp compiler flags.
- CCE only gives minimal information to stderr when compiling. To see more, request a compiler listing file:
  - flag -ra for ftn, or -hlist=a for cc
  - writes a file with extension .lst
  - contains an annotated source listing, followed by explanatory messages
  - each message is tagged with an identifier, e.g. ftn-6430; to get more information on it, type: explain <identifier>
For a full description of the Cray compilers, see the reference manuals at http://docs.cray.com.
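For example, with the default PrgEnv-cray environment loaded, a compile-and-inspect cycle on one of the tutorial sources might look like this sketch. The source file name follows the naming scheme described later, and the listing file is assumed to take the base name of the source file.

    # GPU support plus a compile with the Fortran wrapper, requesting a listing file
    module load craype-accel-nvidia20
    ftn -ra first_example_fstatic_v00.f90

    # Read the annotated listing written alongside the source
    less first_example_fstatic_v00.lst

    # Expand a message identifier seen in the listing
    explain ftn-6430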

Compiling CUDA
Compilation:
- module load craype-accel-nvidia20
- Main CPU code is compiled with the PrgEnv "cc" wrapper: either PrgEnv-gnu for gcc, or PrgEnv-cray for craycc.
- GPU CUDA-C kernels are compiled with nvcc: nvcc -O3 -arch=sm_20 (update the -arch option for Kepler).
- The PrgEnv "cc" wrapper is used for linking; the only GPU flag needed is -lcudart (e.g. no CUDA -L flags are needed, as these are added by the cc wrapper).

Compiling OpenCL
Compilation:
- module load craype-accel-nvidia20
- Main CPU code is compiled with the PrgEnv "cc" wrapper: either PrgEnv-gnu for gcc, or PrgEnv-cray for craycc.
- GPU OpenCL kernels are compiled with nvcc.
- The PrgEnv "cc" wrapper is used for linking; the only GPU flag needed is -lOpenCL.
Alternatively:
- Use PrgEnv-gnu for all compilation; you still need -lOpenCL at link time.
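A minimal sketch of a mixed C/CUDA build following these rules is shown below. The file names main.c, kernels.cu and my_app are hypothetical placeholders, not part of the tutorial code.

    module load craype-accel-nvidia20

    # Host code through the Cray wrapper, device code through nvcc
    cc -c main.c
    nvcc -O3 -arch=sm_20 -c kernels.cu    # use a newer -arch value on Kepler

    # Link with the wrapper; only the CUDA runtime library needs naming
    cc main.o kernels.o -lcudart -o my_app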

Submitting jobs and the Lustre filesystem
You should submit jobs from the Lustre filesystem (you can compile there as well if you wish).
- Create a unique directory for yourself: mkdir -p /lus/scratch/$USER (and subdirectories if you want).
To submit a job, create a PBS jobscript:
- a skeleton script is provided as part of the tutorial materials; just rename the executable (an illustrative sketch also appears further below)
- note that the command aprun is used by the jobscript to run the executable
- submit the job using the command: qsub <jobscript name> (other options are specified in the jobscript)
- a job number (ending in .sdb) is returned
- to view the queued and running jobs: qstat
- to stop a queued or running job: qdel <job number>

The sample code
The sample code is designed to demonstrate functionality; we are not interested in performance at this stage.
It implements the simple example from the lectures:
- a 3D array a is initialised
- its values are doubled and stored in a new array b
- a checksum is calculated and compared with the expected result
These are implemented as 3 OpenACC kernels.
There are three versions of the code:
- Version 00 has all 3 kernels in the same main program; there is no attempt to keep data on the GPU between the kernels.
- Version 01 uses a data region to avoid data sloshing.
- Version 02 has a more complicated calltree: it calls a subroutine that contains an OpenACC kernel, and this kernel also contains a function call.
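Returning to the job-submission step above: the tutorial supplies its own skeleton jobscript, but purely as an illustration of the shape such a script takes, a minimal PBS script for a single-node run might look like the following. The job name, walltime, queue name and executable name are placeholders; use whatever the provided skeleton specifies.

    #!/bin/bash
    #PBS -N openacc_practical
    #PBS -l walltime=00:10:00
    #PBS -q <XK6 queue name>     # the special queue that selects the GPU nodes

    cd $PBS_O_WORKDIR            # run from the Lustre directory you submitted from
    aprun -n 1 ./a.out           # aprun launches the executable on a compute node

It would then be submitted with qsub <jobscript name>, monitored with qstat and, if necessary, stopped with qdel <job number>, as described above.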

Code versions and building them
There are versions for 4 different programming models: C or Fortran, with static or dynamic allocation of arrays.
- N.B. there is no version00 for dynamic arrays with C (see the note in version01).
- The source filenames are based on these, e.g. first_example_fstatic_v00.f90.
Get your environment right:
- make sure you have the right PrgEnv loaded (cray or pgi)
- make sure you have loaded the correct compiler version module
- make sure you have loaded the module craype-accel-nvidia20
Build the code:
- PrgEnv-cray: ftn -ra <Fortran source file> or cc -hlist=a <C source file>
- PrgEnv-pgi: ftn -Minfo=all <Fortran source file> or cc -Minfo=all <C source file>

Automation
You can do it all by hand if you wish, or use automation; there's nothing magic being done here.
Automated building:
- just type: make VERSION=[00|01|02] [F|C][static|dynamic]
- the Makefile will echo the commands it uses to build the code
- it automatically detects which PrgEnv you are using (via the PE_ENV environment variable)
- remember to type "make clean" if you switch PrgEnv modules
Automated building and job submission:
- type: bash build_submit.bash MYPE TARGET VERSION
  - MYPE should be cray or pgi
  - TARGET should be Fstatic, Fdynamic, Cstatic or Cdynamic
  - VERSION should be 00, 01 or 02
- This will:
  - load the correct modules using the script ../xk_setup.bash
  - build the code using the Makefile
  - create the directory /lus/scratch/$USER/openacc_training/practical1/<TARGET>_<VERSION>_<date>_<time>
  - write and submit a PBS jobscript
- You can then cd to this directory and look at the output.
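As a concrete illustration, building version 01 of the static Fortran code with the Cray compiler could be done with the automation described above. The argument forms follow the patterns given in the slides; the exact Makefile target names are an assumption based on that pattern.

    # Build one configuration with the Makefile (it echoes the underlying compile commands)
    make VERSION=01 Fstatic

    # Clean out old objects after switching PrgEnv modules
    make clean

    # Or build and submit in one step with the helper script
    bash build_submit.bash cray Fstatic 01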

What to check
Check correctness:
- Did the code compile correctly?
- Did the job execute?
- Was the answer correct?
Next, understand what the compiler did: examine and understand the compiler feedback.
- CCE: open the .lst file
- PGI: read the output to stdout
- Did it compile for the accelerator?
- What data did it plan to move, and when?
- How were the loop iterations scheduled?

What actually ran?
Did we actually run on the accelerator? We can ask the runtime for some feedback.
- cd to the run directory, edit the jobscript and uncomment the appropriate line:
  - CCE: set CRAY_ACC_DEBUG to 1 (least detailed) through 3 (most detailed)
  - PGI: set ACC_NOTIFY
- Resubmit the job: qsub <jobscript name>
- Examine the commentary (in the log file) and make sure you understand it.
Profiling the code:
- A quick way of profiling is to use the Nvidia compute profiler; CCE and PGI compile to PTX (as does nvcc), so this will work for all.
- Edit the jobscript and uncomment the profiling line, then resubmit the job.
- Examine the profile (in the file cuda_profile_0.log); you can change its location with the environment variable COMPUTE_PROFILE_LOG.
- This is a "blow-by-blow" account; larger codes need a more aggregated report. We will cover profiling in more detail later.
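The lines to uncomment in the jobscript will resemble the following exports. CRAY_ACC_DEBUG, ACC_NOTIFY and COMPUTE_PROFILE_LOG come from the slides; COMPUTE_PROFILE=1 is the usual switch for the legacy Nvidia compute profiler and is an assumption about what the skeleton script contains.

    # CCE runtime commentary (1 = least detailed, 3 = most detailed)
    export CRAY_ACC_DEBUG=3

    # PGI runtime commentary
    export ACC_NOTIFY=1

    # Nvidia compute profiler: enable it and (optionally) choose where the log goes
    export COMPUTE_PROFILE=1
    export COMPUTE_PROFILE_LOG=./cuda_profile_0.log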

Further work
Choose a target and repeat this for all three versions.
- Start with the Cray compiler.
- Then either repeat for a different programming model target, or try the PGI compiler.

Getting the examples
On raven:
- change to a directory where you want to work, either in your home directory or under /lus/scratch/$USER
- type: tar zxvf ~tr99/cray_openacc_training.tgz
- this creates a new directory ./cray_openacc_training; please note the file LICENCE.txt
The codes for Practical 1 are in Cray_OpenACC_training/Practical1. There is a README file that summarises these slides.
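Putting the setup steps together, a first session on raven might look like this sketch. The paths and tarball name are taken from the slides above; the case of the extracted directory name may differ from what is shown here.

    # Work on the Lustre filesystem
    mkdir -p /lus/scratch/$USER
    cd /lus/scratch/$USER

    # Unpack the tutorial materials and move to Practical 1
    tar zxvf ~tr99/cray_openacc_training.tgz
    cd Cray_OpenACC_training/Practical1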