SCALABLE HYBRID PROTOTYPE

Scalable Hybrid Prototype
- Part of the PRACE Technology Evaluation
- Objectives:
  - Enabling key applications on new architectures
  - Familiarizing users and providing a research platform
  - Whole-system benchmarking: energy efficiency, productivity and performance
- Located at CSC - IT Center for Science Ltd, Espoo, Finland
- Documentation of the system: https://confluence.csc.fi/display/hpcproto/hpc+prototypes

Current Configuration
- master: head node (frontend)
  - Users log in here
  - Program development and test runs
  - Contains a single Xeon Phi for development
  - Freely shared resource: do not use it for heavy computation or performance measurements!
- node[02-10]: compute nodes, accessible via the batch job queue system
  - node[02-05]: Xeon Phi nodes (the Xeon Phi hostnames are node[02-05]-mic0)
  - node[06-10]: Nvidia Kepler nodes

Diagram of a Xeon Phi node (figure): the host has two CPU packages (CPU0 and CPU1, 6 cores each, 20 MB L3 cache per package) connected by QPI (96 GB/s), each with a 16 GB DDR3 host RAM bank (51.2 GB/s). The Xeon Phi (MIC) card has 60 cores (c0-c59), 30 MB of L2 cache and 8 GB of GDDR5 memory (320 GB/s), and is attached over PCIe2 (8 GB/s). An FDR InfiniBand HCA (7 GB/s), also on PCIe2, connects the node to the other nodes.

First Login
- ssh to hybrid.csc.fi with your training account
  $ ssh -Y hybrid.csc.fi -l trngnn
- Create a passwordless host key
  $ ssh-keygen -f $HOME/.ssh/id_rsa -N ''
  $ cp $HOME/.ssh/id_rsa.pub $HOME/.ssh/authorized_keys
- Try logging into the MIC card (hostname mic0 or master-mic0)
  $ ssh mic0

Using Modules
- The environment-modules package is used to manage different programming environment settings
- Examples of usage:
  - To load the latest version of the Intel compilers:
    $ module load intel
  - To see all available modules:
    $ module avail
  - To see which modules are loaded:
    $ module list
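As a quick sketch of a typical workflow (assuming the intel module provides the icc compiler, and using omphello.c as a stand-in for any small OpenMP test program of your own), you could load the compilers and build both a host and a native MIC binary:

$ module load intel
$ icc -openmp omphello.c -o omphello            # host binary
$ icc -openmp -mmic omphello.c -o omphello.mic  # native Xeon Phi (k1om) binary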

Custom configuration on Hybrid
- NFS mounts: /home, /share, /usr/local
- Additional native support libraries and programs: Python, HDF5, gcc, etc.
- Small libraries and utilities (strace, etc.)
- SLURM batch job queuing system
- Execution auto-offload on the frontend
- Some common paths preset on the Phi, e.g. /opt/intel/composerxe/mic/lib64

Execution Auto-offload
- Developed at CSC, implemented on the frontend node
- Makes e.g. cross-compiling much easier
  1. Detects if a MIC binary is executed on the host (normally this fails with "cannot execute binary file")
  2. Runs the binary on the Xeon Phi using micrun
- Transparent to the end user
  - Environment variables are passed with the MIC_ prefix
  - Return values are passed correctly
- Can be disabled by setting MICRUN_DISABLE=1
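Conceptually, a micrun-style wrapper could look like the sketch below. This is only an illustration of the mechanism, not the actual CSC implementation; the k1om detection via file and the ssh-based launch onto mic0 are assumptions.

#!/bin/bash
# Illustrative micrun-style wrapper (a sketch, NOT the actual CSC micrun).
# Assumes the home directory is shared with the card over NFS.
binary=$1; shift
if file "$binary" | grep -qi 'k1om'; then     # assumed way to detect a MIC binary
    # The real tool also forwards MIC_-prefixed environment variables;
    # ssh propagates the remote exit status back to the caller.
    exec ssh mic0 "cd '$PWD' && '$binary' $*"
else
    exec "$binary" "$@"                       # a host binary runs as usual
fi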

SLURM Batch Job Queue System
- Reserves and allocates nodes to jobs
- At CSC we are moving to SLURM on all systems
- Designed for HPC from the ground up
  - Open source, extendable, lightweight
  - Becoming increasingly popular in the HPC community
- MIC support is in development
  - Offload support in v. 2.5 (Nov 2012)
  - Native/symmetric model via a helper script

SLURM commands
- Checking the queue: $ squeue
- Checking node status: $ sinfo [-r]
- Running a job interactively: $ srun [command]
- Sending a batch job: $ sbatch [job script]
For simplicity, all of the following examples use interactive execution (srun). However, for real work you should run batch jobs.
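For reference, a minimal batch script could look like the sketch below; the job name, node count and time limit are placeholder values, and the srun line mirrors the interactive examples that follow.

#!/bin/bash
#SBATCH -J omphello       # job name (placeholder)
#SBATCH -N 1              # number of nodes (placeholder)
#SBATCH -t 00:10:00       # time limit (placeholder)
srun ./omphello.mic       # same command as in the interactive examples below

Submit it with (job.sh being whatever you named the script):
$ sbatch job.sh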

Submitting interactive jobs (srun)
- Interactive shell session (remember to exit the interactive session!):
  $ srun --pty /bin/bash -l
  $ hostname
  node02
  $ exit
  $ hostname
  master
- Single thread on the MIC:
  $ srun ./omphello.mic
  Hello from thread 0 at node02-mic0
- Multiple threads on the MIC (all MIC_-prefixed environment variables are passed to the MIC card):
  $ export MIC_OMP_NUM_THREADS=2
  $ srun ./omphello.mic
  Hello from thread 0 at node02-mic0
  Hello from thread 1 at node02-mic0

Submitting an Offload Job
- Applicable to LEO, OpenCL and MKL offload
- Requires the GRES parameter to be used:
  $ srun --gres=mic:1 ./hello_offload
  Hello from offload section in node02-mic0
- If you don't use it, you get a cryptic error:
  $ srun ./hello_offload
  offload warning: OFFLOAD_DEVICES device number -1 does not correspond to a physical device
- MPI offload job:
  $ srun -n 2 --ntasks-per-node=1 ./mpihello_offload
  Hello from offload section in node02-mic0
  Hello from offload section in node03-mic0
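As a batch-script variant of the same offload job (a sketch; the time limit is a placeholder), the GRES request goes into the #SBATCH header:

#!/bin/bash
#SBATCH -N 1              # one node
#SBATCH --gres=mic:1      # reserve the node's Xeon Phi for offloading
#SBATCH -t 00:10:00       # time limit (placeholder)
srun ./hello_offload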

Submitting a native MPI job
- MPI tasks run only on the MIC cards
- Several parameters must be defined:
  - Define the number of tasks and threads with the environment variables MIC_PPN (MPI tasks per MIC) and MIC_OMP_NUM_THREADS (threads per MIC MPI task)
  - Set the number of nodes with the -N slurm flag
  - Use mpirun-mic to launch the executable; the -m flag specifies the MIC executable
  $ export MIC_PPN=4                # MPI tasks per MIC
  $ export MIC_OMP_NUM_THREADS=60   # threads per MIC MPI task
  $ srun -N 2 mpirun-mic -m ./mpiomphello.mic   # -N 2: node count, -m: MIC executable

Submitting a Symmetric Job
- MPI tasks run on both the MICs and the hosts
- Similar to a native MPI job, but with some additional parameters:
  - Define the number of threads per host MPI task with the environment variable OMP_NUM_THREADS
  - Use SLURM flags to define the number of host MPI tasks, for example -n and --ntasks-per-node
  - Add the host executable to the mpirun-mic command; the -c flag specifies the CPU (host) executable
  $ export MIC_PPN=4                # MPI tasks per MIC
  $ export MIC_OMP_NUM_THREADS=60   # threads per MIC MPI task
  $ export OMP_NUM_THREADS=6        # threads per host MPI task
  $ srun -n 2 mpirun-mic -m ./mpiomphello.mic -c ./mpiomphello   # -n 2: host MPI task count
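Collected into a batch script, a symmetric run might look like the sketch below; the task counts and executables are copied from the interactive example above, while the time limit and the exact placement of the task flags in the #SBATCH header are assumptions.

#!/bin/bash
#SBATCH -n 2                    # host MPI task count
#SBATCH --ntasks-per-node=1     # one host MPI task per node
#SBATCH -t 00:30:00             # time limit (placeholder)
export MIC_PPN=4                # MPI tasks per MIC
export MIC_OMP_NUM_THREADS=60   # threads per MIC MPI task
export OMP_NUM_THREADS=6        # threads per host MPI task
srun mpirun-mic -m ./mpiomphello.mic -c ./mpiomphello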

Further mpirun-mic settings
- The -v flag shows the underlying mpiexec command to be run
- The -h flag prints help
- You can pass additional parameters to the underlying mpiexec command by setting the environment variables MPIEXEC_FLAGS_HOST and MPIEXEC_FLAGS_MIC, for example:
  $ export MPIEXEC_FLAGS_HOST="-prepend-rank -env KMP_AFFINITY verbose"

Protip: MIC Environment Variables
- You may want to load a set of environment variables on the MIC card but not on the host
  - This might be difficult with a shared home directory
- Put a conditional like this in your $HOME/.profile to run MIC-specific environment setup commands:
  if [ `uname -m` == 'k1om' ]; then
      echo I am MIC!
  fi
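For instance, the conditional could export settings that should only apply on the card; the library path and the thread count below are hypothetical placeholders:

if [ `uname -m` == 'k1om' ]; then
    # Running on the Xeon Phi card: MIC-only settings go here.
    export LD_LIBRARY_PATH=$HOME/mic-libs/lib:$LD_LIBRARY_PATH   # hypothetical location of your own MIC builds
    export OMP_NUM_THREADS=60                                    # example thread count for the card
fi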

Protip: Cross-compiling in Practice
- A GNU cross-compiler environment for the Phi is located in /usr/x86_64-k1om-linux
- It enables building legacy libraries and applications for the Xeon Phi
- In practice it can be difficult:
  - Typical build scripts (usually ./configure) are rarely designed with good cross-compiling support
  - A varying amount of hand tuning is required
- The executable auto-offload makes things somewhat easier

Typical Cross-compile on Hybrid
1. Set environment variables to point to the cross-compiler and support libraries:
   export LDFLAGS='-L/usr/local/linux-k1om-4.7/x86_64-k1om-linux/lib/ -Wl,-rpath=/usr/local/linux-k1om-4.7/x86_64-k1om-linux/lib/'
   export CFLAGS="-I/usr/local/linux-k1om-4.7/x86_64-k1om-linux/include"
2. Run configure:
   ./configure --host=x86_64-k1om-linux [other configure flags]
3. Fix the linker flags in all Makefiles and libtool scripts, which are probably incorrect:
   for i in `find -name Makefile`; do sed -i -e 's@-m elf_x86_64@-m elf_k1om@' $i; done
   for i in `find -name libtool`; do sed -i -e 's@-m elf_x86_64@-m elf_k1om@' $i; done
4. Run make:
   make